Over the past few years, our team at Education Analytics has been collaborating with partner education agencies to lay the groundwork for an open framework for analytics and data warehousing based on Ed-Fi data. By “open framework,” we mean a codebase that is free and open for education agencies to use, paired with a modular approach to configure, customize, and extend the base functionality of that code. In this blog, we want to describe what our efforts have produced so far, and share the brand identities we’ve developed for these efforts.
Enable Data Union (EDU) began as a collaborative effort between education agencies (that is, districts and states) and Education Analytics. We have been working with education agencies across the country to solve a specific but widespread problem: they have integrated their data into Ed-Fi and need a good way to use that data for analytics, research, and data visualization.
The software that this collaborative effort produces will be a fully functional solution that allows education agencies to extract data out of Ed-Fi via the API, load it into a data warehouse, and then transform it into a dimensional data model in a database that is easier to work with for analytics—with room to build upon the common core of code to extend and customize. We plan to facilitate an engaged community of education agencies working with this software so they can share ideas, experiences, and code. We wanted the name and branding for this endeavor to express openness, transparency, and collaboration, while putting a clear focus on enabling the use of data in Ed-Fi for various purposes.
Stadium is the hosted version of this product that Education Analytics offers.
The goal of this product is to support education agencies whose enterprise needs go beyond what they want to manage in-house (or have the capacity to manage) with Enable Data Union, by offering additional services, maintenance, hosting, development, and support. Stadium also serves to fund development of the Enable Data Union open codebase and community. We wanted the Stadium name and branding to convey the structure, organization, and stability of the product.
We have also recently begun developing a new product, Podium, which will provide data visualization for the integrated data housed in our Ed-Fi data warehouse. Podium can help education agencies get up and running and begin to see how their integrated Ed-Fi data can be visualized and put into action.
The Data Engineering team at Education Analytics has spent the last several years working on innovations that allow us to produce actionable, research-based analytics at a lower cost for our partners. We have focused on re-using code and building expertise to solve common data, research, and analytics problems in the districts and states we work in. We have found that the key limiting factor in our efforts has been the lack of a data standard on which to build re-usable analytics code. To help overcome this, we have become actively involved in the Ed-Fi community to contribute towards standardizing the storage of student data in an interoperable framework.
What problems are we trying to solve?
In an Ed-Fi system, data are stored in an Operational Data Store (ODS). The word “operational” here indicates that the data themselves support the functional, or operational, needs of the agency that owns and uses the data. Education agencies might want to directly use their operational data in Ed-Fi to help answer questions like:
- Which students are in a teacher’s class right now?
- Can I see all of a student’s test scores and their grades from the first semester?
- How do I send data to my state for accountability purposes?
- How do I securely share data with a charter network that is under my district’s jurisdiction?
These are important operational questions that data in an ODS can answer—but those data are not necessarily well positioned to help answer analytics questions or research questions like:
- How have GPAs for socioeconomically disadvantaged students changed over the past 3 school years?
- What is the attendance rate in my district today?
- What percentage of students in a school are predicted to be proficient on state tests at the end of the upcoming school year?
- Are certain schools in my district showing faster recovery from COVID-related learning impacts than others?
For these kinds of questions, we need an analytics solution, not just an operational one. We are not the first to develop an analytics solution based on Ed-Fi, but with Enable Data Union and Stadium, we hope to solve some of the challenges we have observed in the existing analytics and data warehousing solutions. We want to help our partners avoid vendor lock-in by supporting the adoption of Ed-Fi and helping them build their data infrastructure on an open data standard. If education agencies were then to feed their data into proprietary products and data models for analytics, they would be back in the position of potentially being locked into a vendor.
We’ve seen that some proprietary products don’t have the customizability or extensibility needed to produce the wide variety of metrics that education agencies often care most about—things like:
- Rolling up attendance metrics by custom windows (like the previous week or previous month)
- Implementing locally defined on-track metrics with customized thresholds for differentiating various levels of “on-trackness”
- Mirroring how their state attributes students to schools or students to teachers so the district can compute their own metrics
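To make the first two examples concrete, here is a minimal sketch of a windowed attendance rate and a locally defined on-track rule. Everything in it is an assumption made for illustration: the record layout, the function names `attendance_rate` and `on_track_level`, and the 95% and 90% cutoffs are hypothetical and are not taken from EDU, Stadium, or any agency's business rules.

```python
from datetime import date, timedelta

# Hypothetical daily attendance records: (student_id, day, was_present).
records = [
    ("s1", date(2024, 2, 20), False),
    ("s1", date(2024, 3, 4), True),
    ("s1", date(2024, 3, 5), True),
    ("s1", date(2024, 3, 6), False),
    ("s1", date(2024, 3, 7), True),
    ("s1", date(2024, 3, 8), True),
]


def attendance_rate(records, end, window_days):
    """Share of recorded student-days marked present in the
    `window_days` calendar days ending on `end` (inclusive).
    Returns None when the window contains no records."""
    start = end - timedelta(days=window_days - 1)
    in_window = [present for _, day, present in records if start <= day <= end]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


def on_track_level(rate, thresholds=((0.95, "on track"), (0.90, "at risk"))):
    """Map a rate to a label using locally configurable cutoffs.
    Thresholds are checked in order, so list them from highest to lowest."""
    for cutoff, label in thresholds:
        if rate >= cutoff:
            return label
    return "off track"


weekly = attendance_rate(records, end=date(2024, 3, 8), window_days=7)
monthly = attendance_rate(records, end=date(2024, 3, 8), window_days=30)
```

Because the window length and the thresholds are plain parameters, an agency could swap in its own definitions without touching the rest of the logic, which is the kind of customization that closed products tend to make difficult.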
Vendors may also make decisions about data behind the walls of proprietary code that make it difficult to inspect what is happening with data transformations. We know firsthand the value of analysts being able to understand how data in a data warehouse has been transformed and what business rules have been applied “behind the scenes” in order to do rigorous, valid, trustworthy analytics with data.
Finally, perhaps one of the simplest but largest challenges we have noticed is that some of these solutions are expensive to implement (and can be difficult to maintain over time), especially for smaller districts with smaller budgets—which are the very same districts that likely do not have the extensive in-house capacity needed to build these kinds of data systems. As a result, we consistently see more rural school districts or less well-resourced districts left behind when it comes to data systems.
We think there is an incredible opportunity to collaborate with education agencies to take on tricky analytics problems by working with a common framework—rather than each agency working on their own in their own systems. In the aggregate, that approach produces a lot of waste, with education agencies needing to re-invent solutions to very similar problems and losing the ability for knowledge transfer and shared learning.
We think the time for these solutions is right now
We think the time is right for this type of effort to succeed, in no small part because the technology is at a point to make this effort feasible. The modern data stack refers to a set of cloud-native tools that work well together and allow new design paradigms in data and analytics; we think that many of the innovations in the modern data stack can be leveraged in the public education sector. As Ed-Fi continues to expand, more vendors are integrating with Ed-Fi and more education agencies have data in Ed-Fi—and they are eager for low-cost analytics solutions that work but do not lock them into one specific proprietary product.
And as the technology has evolved to enable these efforts, so too has the appetite to build collaborative approaches to implementation. Agencies have witnessed the demonstrated success of collaboration among districts in the Ed-Fi community to initiate and scale data interoperability; in the same way, we have seen a growing interest in collaboration between districts and states around analytics. Education agencies doing analytics in their own custom-built data warehouses, or in closed proprietary systems, lose the opportunity to learn from other agencies solving similar problems, and prevent others from learning from them, in turn. We think the opportunity and appetite for collaboration is high.
About EDU: The product
The product we are building uses dbt (data build tool) as the centerpiece for organizing data. Our goal with the product is to provide the flexibility to build upon a common core of code to meet the various needs of different education agencies. dbt and other elements of our technical strategy allow us to build cloud infrastructure that can run in multiple cloud environments in the future. We are starting by focusing on supporting AWS and Snowflake, given these are well-developed tools that we have experience with. The codebase for EDU will be open and available for any education agency to use. All that is required to source the data is credentials for an Ed-Fi API.
When successfully deployed, the code creates a data pipeline and a data warehouse of Ed-Fi data that analysts, BI developers, and researchers can use. From the perspective of these users, the interface to EDU is a SQL database with a star schema dimensional model. Data are transformed from how they come out of the Ed-Fi API into something more useful to answer the kinds of analytics questions we shared above.
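As a rough illustration of that transformation, the sketch below loads a few nested records as raw JSON and reshapes them into a tiny star schema with two dimension tables and one fact table. It is not the EDU code: it uses Python's built-in `sqlite3` as a stand-in for the warehouse (rather than dbt and Snowflake), the payloads are simplified and made up, and the table names `dim_student`, `dim_school`, and `fct_enrollment` are hypothetical.

```python
import json
import sqlite3

# Simplified, made-up payloads shaped loosely like nested API records.
raw_enrollments = [
    {"studentReference": {"studentUniqueId": "1001"},
     "schoolReference": {"schoolId": 10}, "entryDate": "2023-08-21"},
    {"studentReference": {"studentUniqueId": "1002"},
     "schoolReference": {"schoolId": 10}, "entryDate": "2023-08-21"},
    {"studentReference": {"studentUniqueId": "1003"},
     "schoolReference": {"schoolId": 20}, "entryDate": "2023-09-05"},
]

conn = sqlite3.connect(":memory:")

# Load: store each record untouched as a JSON string.
conn.execute("CREATE TABLE raw_enrollment (payload TEXT)")
conn.executemany(
    "INSERT INTO raw_enrollment VALUES (?)",
    [(json.dumps(r),) for r in raw_enrollments],
)

# Transform: two dimensions with surrogate keys, plus one fact table.
conn.execute("CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_unique_id TEXT UNIQUE)")
conn.execute("CREATE TABLE dim_school (school_key INTEGER PRIMARY KEY, school_id INTEGER UNIQUE)")
conn.execute("CREATE TABLE fct_enrollment (student_key INTEGER, school_key INTEGER, entry_date TEXT)")

for (payload,) in conn.execute("SELECT payload FROM raw_enrollment").fetchall():
    rec = json.loads(payload)
    student_id = rec["studentReference"]["studentUniqueId"]
    school_id = rec["schoolReference"]["schoolId"]
    # UNIQUE constraints plus INSERT OR IGNORE keep one row per student and school.
    conn.execute("INSERT OR IGNORE INTO dim_student (student_unique_id) VALUES (?)", (student_id,))
    conn.execute("INSERT OR IGNORE INTO dim_school (school_id) VALUES (?)", (school_id,))
    # The fact row stores surrogate keys looked up from the dimensions.
    conn.execute(
        """INSERT INTO fct_enrollment
           SELECT s.student_key, c.school_key, ?
           FROM dim_student s, dim_school c
           WHERE s.student_unique_id = ? AND c.school_id = ?""",
        (rec["entryDate"], student_id, school_id),
    )

# An analyst-style query against the dimensional model.
enrollment_by_school = conn.execute(
    """SELECT c.school_id, COUNT(*)
       FROM fct_enrollment f
       JOIN dim_school c ON c.school_key = f.school_key
       GROUP BY c.school_id
       ORDER BY c.school_id"""
).fetchall()
```

The point of the final query is that an analyst joins a fact table to a dimension and aggregates, without ever needing to unpack the nested reference objects from the source records.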
About EDU: The community
Enable Data Union began as a collaborative effort between Education Analytics and several education agencies that were trying to solve similar problems while using data in Ed-Fi for analytics. Our goal with the community is to draw upon the expertise and needs of the agencies we work with to contribute new solutions, build analytics applications, and support one another using common toolsets. We want the code to be a fully functional solution that districts could choose to implement themselves. We also want to plan for and support the configurations and customizations that districts and states will need in order to make a data warehousing and analytics solution really work for them. In the longer term, we hope this data model can be something that others can build tools upon, like advanced analytics and dashboarding tools. We want agencies to be confident in adopting EDU as a secure and stable solution, without being locked into a proprietary product. We are also excited about the idea of fostering more collaboration among districts and states on metrics, research, and reporting using a common technological and philosophical framework.
About Stadium: The product
Code that is open for education agencies, as it will be with EDU, is a great start. But in our work with education agencies so far, we also know that there is a desire for an option that isn’t do-it-yourself, especially at first. Many agencies we work with do not have the in-house capacity to implement an open codebase, and are seeking more hands-on support to build and maintain this type of analytics database.
We think the offerings of the Stadium product will evolve over time, but we expect it to include:
- Setup and configuration of the EDU code
- Managing cloud hosting
- Security management and consulting
- System maintenance
- Development of code modules for a particular education agency’s implementation, such as:
  - Integration of data sources that have analytic value but would not come from Ed-Fi (for example, integrating publicly available data about neighborhoods)
  - Development of code to build out metrics inside a data warehouse (for example, reproducing a state’s business rules for calculating graduation rates)
We anticipate pricing a version of Stadium in a way that is low cost enough for small education agencies to be able to participate, and we aim to use revenue from this product to invest in the development of the EDU product and support the EDU community. In addition, we plan to make modules developed by a particular education agency available to other agencies, which will help expand the functionality of the EDU product over time.
What comes next
We are currently implementing our Ed-Fi data warehouse product with an initial set of districts, district collaboratives, and one state. One of our goals is to solicit feedback and conduct usability testing on the Enable Data Union code before a larger release. The intent of the code we are building for EDU is to identify points at which implementations of Ed-Fi can work differently and build code that is flexible enough to handle those cases—and that is impossible to do without working with many different implementations of Ed-Fi. We also value getting usability feedback from real users to inform design choices for the EDU and Stadium products. These collaborations have also allowed us to earn the Ed-Fi Alliance’s API Consumer Badge for the Stadium product.
We are in the process of finalizing a licensing model for the code we are releasing to ensure free and open use for education agencies. We plan to publicly release version 1 of the EDU code by this fall, and we will be sharing more about these products as they continue to evolve and grow.