Data is oxygen for empirical research—without it, research ceases to exist. Education researchers want and need access to student-level and aggregated data to answer the questions that will drive meaningful and measurable outcomes for students and schools. A crucial roadblock most researchers face (sometimes even unknowingly) is working with educational data systems, since they are not set up to easily connect to one another. Every school district and state uses many different systems for storing and managing data, including student information systems, financial software, HR systems, assessment platforms, behavior management systems, curricular materials stored in learning object repositories, and more. This makes it extremely difficult and expensive for schools themselves—let alone researchers—to compile the data they need to answer the most pressing questions in education. And as a researcher increases their scope across multiple districts or states, this challenge gets exponentially more complex. Though data are abundant in the K-12 education ecosystem, far more data exist than are harnessed and analyzed. As a result, the field of education remains data rich but information poor.
Most educational researchers’ approach to accessing and analyzing data has been dictated by what data are readily available and accessible. Education Analytics (EA), along with other organizations committed to interoperability, such as the Ed-Fi Alliance, has been working to create data infrastructure that has the potential to completely transform the speed, efficiency, affordability, and accuracy of education research. Interoperability means that instead of relying only on data collected in manual and labor-intensive ways, researchers can tap into the power of real-time operational data: what we refer to as moving from datasets to data streams.
Interoperable data streams mean that education agencies (like state education agencies, or SEAs, and local education agencies, or LEAs) can securely and easily grant researchers access to real-time data streams that schools and districts are already generating, storing, and using. And this moment is the precipice of transformation: After a decade of work to develop and build out this technology, Ed-Fi is being implemented and scaled up in districts and states around the country. SEAs and LEAs located in South Carolina, Michigan, Colorado, Wisconsin, Texas, Arizona, Delaware, and Georgia (just to name a few) are well on their way to making interoperability a reality for their schools and districts.
So why is this moment important for researchers? What value and promise does it hold? And most importantly, “What is possible?”
How interoperability will transform education research
Interoperable data streams have almost unlimited potential to radically accelerate the pace, applicability, and scalability of research. There are three ways we expect interoperability to transform the way we do educational research:
- Interoperability makes it easy to bring together data from different source systems. Many of the questions researchers want to answer require data from different domains. Currently, the vast majority (by our rough estimate, 75% to 90%) of a researcher’s or analyst’s time is spent cleaning, conforming, and checking datasets. Interoperable data streams will allow education agencies to grant researchers access to those cross-domain data in ways that are much faster, easier, and less costly. Interoperability standards will also make scaling up to other contexts almost seamless. Instead of writing a manual, clunky, bespoke data query to extract data from a school district's unique constellation of systems, a district could run a standard, generic set of code built for the broader interoperable ecosystem to instantly grant a researcher access to conduct their approved research. And any district in the country could easily run that same code to grant that researcher access to the same kinds of data, formatted in the same way. This will mean far less time gathering, combining, and understanding the nuance of local data and significantly more time doing the interesting analysis and interpretation that makes research meaningful.
- Interoperability allows the outputs of statistical research to live in real time within the data ecosystem. Currently, researchers write code to run statistical models upon a particular static dataset (perhaps a flat file, like a .csv, or a database using a SQL query). To turn the results from those analyses into something up-to-date, actionable, and useful within the education system is an extremely manual and unscalable process: Perhaps a report is written, or a custom dashboard is built to visualize the results of a particular set of models. Because the data systems are currently so custom, it never makes sense to build something more robust, because the cost would far outweigh the benefit. Rather than research being conducted with “dead” datasets that become irrelevant and out-of-date even before the research can be published, interoperability means that education agencies can install the artifacts of that research to "live" inside the data system itself. The implication? When the source data are updated in real time, the outputs of statistical research—like numeric results, data visualizations, or generated reports—are instantly updated, too. The education agency can choose to make the research live inside the system, which radically enhances its relevance and timeliness.
- Interoperability can transform rigorous research into actionable, operational systems for educators and administrators. When interoperable data systems are established, data in one system can “talk to” data in another system instantly. This means that when data change in, say, a student information system, the change could automatically trigger an update in another system, like an IEP system. This kind of data flow allows the educators in the system to act quickly, without the massive amounts of low-value, high-cost, manual, inefficient data transmission work that it would currently require. Interoperability is one of the ways we can automate what’s possible to reduce the manual burden on education professionals and create more time and resources for them to do the meaningful, impactful work that only humans can do. Researchers can work in conjunction with educators to develop these high-speed workflows so that they draw on cutting-edge best practices from research. Not only is this an efficient replacement for manual workflows within school systems, but it’s also an opportunity for the outputs of evidence-based research to be immediately operationalized.
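To make the first transformation above more concrete, here is a minimal sketch of what "bringing together data from different source systems" looks like once every system shares a common identifier. The record layouts, field names, and the `join_on_student` helper are hypothetical and greatly simplified; a real interoperability standard defines much richer schemas, and nothing here is meant to represent any particular standard's API.

```python
from collections import defaultdict

# Hypothetical, simplified records from two source systems.
# The only assumption that matters: both use the same student_id key.
attendance = [
    {"student_id": "S1", "days_absent": 2},
    {"student_id": "S2", "days_absent": 9},
]
assessments = [
    {"student_id": "S1", "scale_score": 512},
    {"student_id": "S2", "scale_score": 468},
    {"student_id": "S3", "scale_score": 530},
]


def join_on_student(*sources):
    """Merge records from several source systems that share a student_id key."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record["student_id"]].update(record)
    return dict(merged)


# One combined record per student, with whatever fields each system supplied.
combined = join_on_student(attendance, assessments)
```

Because the join depends only on the shared key and field names, the same function would work unchanged for any agency whose data follow the same layout, which is the point of a standard: the conforming work is done once rather than once per district.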
Next, we provide 10 examples of the kinds of research questions researchers could ask and answer with interoperable data streams, to illustrate each of the three transformations listed above.
10 questions we could better answer with interoperability
- What factors increase the likelihood of teacher retention? Using data from a district’s HR-owned systems—such as teacher salaries, teacher demographics, teacher evaluation results, and principal evaluation results—plus data from finance-owned systems—such as school funding levels and changes, professional development budget use, and per pupil spending—is enough to overwhelm any researcher, without even considering student-level data, such as student demographics, student achievement, and student daily attendance patterns. Imagine a world where any education agency could choose to instantly grant you access to these data in the same, standardized way by running the same simple, standardized code as any other agency in the country. Once you have those standardized data, which are already interoperable and therefore already connected to each other, you can identify which of the variables from those pre-connected data predict something like teacher retention year-over-year. A school or district could choose to instantiate that model into their systems to make their workflows not only more evidence-based and rigorous, but also faster, cheaper, easier, and better. They can now easily do things like monitor which schools are most at-risk of teacher turnover given the current set of data in these source systems. This could activate an automatic report emailed to the school leader that specifies the top 3 data points to reflect and act on for each given teacher who may be statistically at risk of turnover.
- How can real-time effects of interventions on short-term outcomes be used to adjust intervention dosage? Far too often, when evaluating what works in schools, researchers are limited to spotty program implementation data and a narrow set of student outcomes measured over too long a time span. Imagine that instead of end-of-year test scores, aggregate attendance rates, or end-of-course grades, an education agency could grant you instant, seamless access to data points like weekly formative assessments, day-by-day attendance, or assignment completion across a huge number of schools and districts using that same intervention. Educators could look at the relationship between inputs, like student-level dosage information or fidelity of implementation indicators, and these proximal student outcomes to monitor the effectiveness of an intervention (perhaps a 1:1 tutoring program, an AVID elective, or an element of a PBIS intervention) for their specific students. Perhaps a parent or family member could be texted whenever a student experiences an increase in one of those proximal outcomes, to help further motivate and incentivize participation in the program.
- Does students’ classroom behavior influence their daily school attendance? Data like daily disciplinary records, period-by-period attendance, student-teacher links, and student-school links are captured in a student information system (SIS), but most researchers can only dream of getting access to these kinds of fine-grained operational data, if they even know to ask for them. Currently, researchers rely on “dead” (i.e., out-of-date) data extracts pulled from the SIS at some slow cadence (annually or perhaps twice a year). An interoperable ecosystem means that for any education agency that grants you permission, you could (1) design and run these predictive models on that agency's data, (2) combine the outputs of those models from different agencies outside of the data system, and (3) insert those models back into each district's data ecosystem, where they live in real time, all without ever needing to pool the underlying data from different education agencies. This win-win situation benefits the district, which gets to put that research to immediate use in its context, and the researcher, who gets to ask and answer their research question across a much larger dataset more quickly and cheaply. Districts might stand up a dashboard for their administrators to monitor how new disciplinary policies are affecting attendance in their district. Or the system might push an automated message to a school counselor or attendance officer, within a software platform they use every day, whenever a student shows a pattern of disciplinary incidents that researchers have found to predict chronic absenteeism. Not only does the research question itself become easier to answer, but its outputs could immediately help educators act based on that research.
- Do instructional choices that teachers make affect students’ learning and self-efficacy? Instructional strategies and curricular choices stored in a learning object repository (LOR), academic outcomes from formative and interim assessments, social-emotional outcomes from student surveys, and process data from educational technology platforms would be easily interconnected to estimate the impact of teachers’ choices on proximal student outcomes.
- How do we best predict our current students’ likelihood to graduate high school on time? Validated statistical models using post-hoc data on student attendance, behavior, and course completion are commonplace. Imagine how much more robust these models could become by also incorporating data related to students’ daily attendance, weekly assignment completion, semester grades, and more. Replicating this enhanced on-track model across as many education agencies across the country as want to participate and provide their interoperable data makes the model higher quality, more generalizable, and less disjointed.
- Which school-based programs are most effective at increasing students’ preparation for more rigorous coursework? Questions that local experts like school counselors or deans answer in their day-to-day jobs could become more evidence-based by researchers having scalable, immediate access to course data related to content, curriculum, technology use, and standards coverage, plus program data related to participation, dosage, fidelity of implementation, and more.
- Does midyear teacher turnover influence short-term student outcomes? Researchers could explore in near real-time the specific mechanisms through which teacher turnover impacts students in the short term, then identify evidence-based strategies to minimize disruptions and develop support systems that mitigate the negative effects of turnover.
- Do student responses on SEL and school culture/climate surveys predict their short-term outcomes? Interoperable data streams can integrate student survey data with academic assessments, daily attendance, disciplinary incidents, course assignment grades, and other relevant data points to generate a set of suggested action steps that an educator could take to support their students. Educators are far too busy and overloaded to manually track all of these different data points to directly inform their professional judgment. Interoperable data streams themselves would not make this any easier on teachers—but research can use those data to generate evidence-based action steps that are automatically pushed to teachers to evaluate, select, and implement for their students.
- What skills and attributes of the K-12 education system are most effective at promoting success in the workforce? Workforce data streams could be securely provided to the K-12 data ecosystem, so that researchers could unpack what features, programs, and attributes of a school, district, or state predict positive workforce outcomes—perhaps measures like employment rates, income levels, job satisfaction, career advancement, or employer satisfaction. Across a large swath of education agencies with interoperable data streams, these models could provide both breadth and depth to this question.
- How might AI be designed as a teaching assistant to support instructional decision making? Standing up an AI algorithm on top of interoperable data streams has nearly limitless potential to support teacher effectiveness and efficiency. When trained on data related to teaching contexts and constraints, such as grade-level standards, student diagnostic assessments, required curricular materials, time constraints, and heterogeneity of student skills in a given classroom, AI could instantly generate instructional strategies or materials for a teacher to review, including lesson plans, guided practice, quiz items, homework assignments, or project-based learning. A teacher could use a simple chat interface, like ChatGPT, as a kind of omniscient data assistant, asking questions like, "Alex Johnson just failed this quiz. What should I do next?" Statistical research models predicting outcomes from inputs could be used to train this AI tool, which would live securely inside the interoperable data system, so that it's not simply generating plausible content but is actually customizing responses based on the unique set of data in the educational data systems at that time.
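Several of the questions above rely on the workflow described under the classroom-behavior question: run a model inside each agency, then combine only the model outputs outside the data system, without ever pooling student-level data. The sketch below illustrates that idea with one common technique, inverse-variance (fixed-effect) pooling of per-agency regression slopes. This is our illustrative choice rather than a prescribed method, and the function names and the small data values are made up for the example.

```python
import math


def local_slope(x, y):
    """Runs inside one agency: OLS slope of y on x and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, used to estimate the slope's sampling variance.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err


def pool_estimates(estimates):
    """Runs outside the agencies: inverse-variance weighted average of slopes."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Made-up illustrative data: disciplinary incidents (x) and days absent (y).
# Only the (slope, standard error) pair ever leaves each agency.
agency_a = local_slope([0, 1, 2, 3, 4], [1, 2, 4, 5, 8])
agency_b = local_slope([0, 1, 2, 3, 5], [0, 3, 3, 6, 9])
pooled_slope, pooled_se = pool_estimates([agency_a, agency_b])
```

The pooled estimate is more precise than either agency's estimate on its own, which is the researcher's side of the win-win: a larger effective sample without any agency handing over raw records. A real study would likely need a richer model and a check for differences between agencies before pooling.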
Where can we go from here?
With interoperability, the answers to pressing research questions can inform instructional, policy, and systemic decisions that lead to actionable educational changes. How? Interoperability can provide a real-time picture of where students, teachers, and schools are at any given point, so that decision makers are well-positioned to make changes in strategies, practices, and policies. The potential for research to live and breathe inside education systems is palpable; the chasm between rigorous statistical research findings and day-to-day decision making in schools, districts, and states is starting to narrow as the promise of interoperability becomes realized.
The demand to understand what works in education is as great as ever. That is one of many reasons why interoperability is needed, and why EA is dedicated to making it work for and with our partners. EA is also committed to enabling researchers like us to answer the questions typically limited by data that are too expensive, too burdensome, or too siloed, while also supporting technologists to understand the use cases and needs of researchers. In bringing these two fields closer in alignment, we believe we can improve outcomes for students and educators across the country.
The good news is that interoperability will make things radically easier, cheaper, faster, and better for researchers. But it does come at a cost: researchers will be required to change their business-as-usual practices in order to effectively work with interoperable data streams. It won't be easy, but as an organization that works on both the research and technology sides, we are starting to see how both fields need to make changes to the way we operate. Want to come change the world with us and learn together? Get in touch below.