Sophia Wang and Sanjana Satagopan’s build4good Experience with Mizzen Education
By Sophia Wang and Sanjana Satagopan
This summer, we interned at Mizzen Education to create an automated QA testing suite, an LLM evaluation and deployment pipeline, and a chatbot integrated into Mizzen’s admin and user portals.
About the internship
Context
Mizzen Education is a nonprofit that provides in- and out-of-school educators with academic resources, including lessons across a range of subjects and student levels. Mizzen maintains a digital platform that allows educators to discover and organize educational content and collaborate with the other educators they work with. This platform supports a range of resources created in tandem with partner organizations such as NASA and Nickelodeon.
As Mizzen Education grew, a few limitations of the platform emerged. Educators often needed tailored lessons that aligned with their specific learning environments, available resources, and student needs, but manually customizing lessons for each context was extremely time-consuming. In addition, educators had to leave the platform to make these edits, making it harder to collaborate with other educators or reference other materials and lessons on the platform.
On the administrator side, site administrators had to manually categorize each lesson and apply dozens of keyword tags. With hundreds of lessons on the platform, this step, like manual lesson editing, was tedious and difficult to scale. At the same time, Mizzen’s platform lacked an automated testing framework, making it difficult for Mizzen engineers to consistently and scalably introduce new features to fix these issues.
Our Solution
To address these challenges, we designed and built a large language model (LLM) evaluation and deployment pipeline that allows Mizzen engineers to create, train, and test their own custom LLMs. We used this interface to create Mizzy, a Mizzen-specific LLM agent capable of supporting both educators and administrators while referencing only Mizzen’s content.
Using Retool, we first developed the interface that allowed Mizzen engineers to create an LLM agent, prompt it with test prompts, review its responses, and provide structured ratings and feedback for improvement. This evaluation process ensured Mizzy could accurately answer questions, reference relevant lessons or content, and demonstrate a clear understanding of Mizzen’s curriculum and mission. The agent itself was built on AWS Bedrock, where we selected and configured a base model connected to a knowledge base of Mizzen materials for it to reference. The pipeline also tracked prompt histories, response quality, and agent versions, allowing us to mark iterations of the agent as deployment-ready. We designed the pipeline to be extensible, so engineers could use it to create, evaluate, and test other LLM agents for other use cases.
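The tracking layer described above can be sketched as a small data model. This is a minimal illustration in Python, not Mizzen’s actual implementation; the names (`EvaluationRecord`, `AgentVersion`) and the rating threshold are assumptions for the sketch.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class EvaluationRecord:
    """One test prompt sent to an agent version, with reviewer feedback."""
    prompt: str
    response: str
    rating: int          # e.g. 1 (poor) .. 5 (excellent)
    feedback: str = ""

@dataclass
class AgentVersion:
    """A single iteration of the agent under evaluation."""
    version: str
    records: list[EvaluationRecord] = field(default_factory=list)
    deployment_ready: bool = False

    def log(self, prompt: str, response: str, rating: int, feedback: str = "") -> None:
        """Record one prompt/response pair and its structured rating."""
        self.records.append(EvaluationRecord(prompt, response, rating, feedback))

    def review(self, min_avg_rating: float = 4.0, min_samples: int = 3) -> bool:
        """Mark the version deployment-ready once enough prompts have been
        reviewed and the average rating clears the threshold (both hypothetical)."""
        if len(self.records) >= min_samples:
            self.deployment_ready = mean(r.rating for r in self.records) >= min_avg_rating
        return self.deployment_ready
```

In a real pipeline, each `AgentVersion` would map to a Bedrock agent alias, with records persisted so the Retool interface can display prompt histories side by side.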
Once the evaluation pipeline was complete, we integrated Mizzy directly into Mizzen’s platform. On the educator-facing side, Mizzy functions as an in-platform chatbot that lets users ask questions, request modules, or find and modify activities that match specific criteria. On the administrative side, Mizzy automatically generates tags and categories for activities when they are created or edited. Finally, to ensure a safe rollout of these new features, we built an automated quality assurance suite using Playwright that runs integration and end-to-end tests on each deployment of new code, enabling future engineers to scale AI-driven functionality safely and efficiently.
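The admin-side tagging step can follow a prompt-and-parse pattern like the Python sketch below. The prompt wording and the `build_tag_prompt`/`parse_tags` helpers are illustrative assumptions, not Mizzen’s code; in the real pipeline the prompt would be sent to the Bedrock-hosted agent rather than parsed from a local string.

```python
import json

def build_tag_prompt(title: str, description: str, max_tags: int = 10) -> str:
    """Ask the model for keyword tags as a JSON array so the reply is
    machine-parseable. (Hypothetical prompt wording.)"""
    return (
        f"Suggest up to {max_tags} keyword tags for this lesson.\n"
        f"Title: {title}\nDescription: {description}\n"
        'Reply with a JSON array of lowercase strings only, e.g. ["space", "physics"].'
    )

def parse_tags(model_reply: str, max_tags: int = 10) -> list[str]:
    """Defensively parse the model reply; fall back to no tags on bad output
    so a malformed response never blocks saving the activity."""
    try:
        tags = json.loads(model_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(tags, list):
        return []
    # Normalize, dedupe while preserving order, and cap the count.
    seen: list[str] = []
    for t in tags:
        if isinstance(t, str):
            t = t.strip().lower()
            if t and t not in seen:
                seen.append(t)
    return seen[:max_tags]
```

The defensive parse matters because LLM output is not guaranteed to be valid JSON; returning an empty list keeps tagging a best-effort step that an administrator can still override by hand.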
Overall, we had an amazing experience working with Mizzen Education and build4good! We learned so much about software infrastructure, full stack development, agentic AI, and the software development lifecycle overall. Being able to take ownership of a critical feature in its entirety, spanning everything from ideation, to UI/UX design, to implementation, to deployment and testing, has also been incredibly fulfilling, and pushed us to think about systems holistically. The experience has also given us confidence in writing and deploying production-ready software for future projects.
On top of the technical growth, the mentorship and community aspects of the program have been incredibly impactful as well. Our Mizzen internship supervisors, Alex and Anton, were incredibly supportive and helped us grow as collaborators. The broader build4good program has also offered so many opportunities to connect with mentors across the public-interest technology space. Spending time with our build4good cohort in Washington, DC for the build4good convening and hackathon in particular was a highlight, and we loved getting to know one another, exploring DC, and working on projects with motivated peers who shared an excitement for technology and social impact.