Senior Product Manager Molly Stewart and Vice President of Product Rosh Dhanawade recently interviewed Michael Christian and Sara Hu for the Education Analytics (EA) DatabasED podcast about how interoperability changed their research work. Below are highlights from the episode.

Principal Researcher Michael Christian and Research Scientist Sara Hu.

We (Michael Christian and Sara Hu) didn’t come to Education Analytics (EA) as technologists; we came as researchers. Both of us were trained as economists. Both of us spent years building models to understand what helps students learn. For most of our careers, the data side of research looked familiar: we’d request a dataset, receive a set of flat files tailored to a specific project, and start analyzing.

Then we began working with standardized Ed-Fi data in a modern cloud warehouse environment. And almost immediately, what felt possible in education research expanded. 

Who We Are and What We Work On 

Sara Hu

I earned my Ph.D. in economics from Syracuse University in 2007 and have spent my career in both research and teaching. My work as a Research Scientist at EA focuses on applied research that can inform and improve educational practice and policy. Since the start of the COVID-19 pandemic, attendance has become a central focus of my research. I’ve studied how students’ social-emotional learning skills relate to attendance, and more recently, I’ve been examining teacher value-added and its relationship to attendance outcomes. 


Michael Christian 

I’m a Principal Researcher at EA, and I earned my Ph.D. in Economics from the University of Michigan in 2004. Most of my work at EA focuses on growth and value-added models. These models use student assessments to measure the extent of student learning and then aggregate that information to the school or classroom level. We partner with states and districts to develop growth measures that meet their policy needs. 

Our First Reactions to Working with Ed-Fi Data 

Sara: It changes what “possible” looks like

Seeing Ed-Fi data in a warehouse environment was exciting because it changed my sense of what education research can do. Like many researchers, I was used to receiving a small set of flat files tailored to a specific question. That can be efficient—but it also limits you. It constrains the questions you can ask, and it often means accepting someone else’s decisions about how variables were constructed. 

With Ed-Fi data, the structure is different. I can link information across multiple domains—enrollment, course sections, school calendars, attendance events, and more—at a level of granularity that opens entirely new analytical paths. This is especially significant for attendance research. I am no longer limited to the number or percentage of days a student was absent during a school year. I can look at which course the student was absent for, at what time, and on which day. That kind of detail enables more precise measurement and more meaningful policy questions.

Example of daily attendance data in a cloud-based analytics tool.

Michael: Three game changers when using Ed-Fi data

First, there’s the sheer volume of data available, both in scope and in granularity. A state’s data warehouse contains assessments, courses, grades, student and teacher data, attendance, discipline, and more—and much of it is extremely detailed. Attendance records can exist by student, course, and day, sometimes including reasons for absence. 

Second, the way the data fit together is remarkably clean. All these different data elements connect because there are consistent “key” variables that identify schools, students, courses, course sections, class period times, and even specific dates. That structure makes it much easier to merge across domains and build the dataset you actually need. 

Third, the speed of working in a cloud-based analytics tool was eye-opening. I could build the variables I wanted directly from raw data much faster than I would have been able to in desktop statistical software. Something that was once a massive processing task—like calculating the absence rate of a student’s classmates in each course on the specific days the student was absent—becomes manageable.
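
To make the classmate calculation concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table `section_attendance`, its columns, and the sample rows are hypothetical and heavily simplified; they are not actual Ed-Fi or EA warehouse structures, and a real warehouse would likely use a different SQL dialect. The idea is a self-join: each absence is paired with every other student in the same section on the same day, and the peers' absence flags are averaged.

```python
import sqlite3

# Hypothetical, simplified table: one row per student, course section, and class day.
# These names are illustrative, not actual Ed-Fi or warehouse table names.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE section_attendance (
        student_id TEXT,
        section_id TEXT,
        class_date TEXT,
        is_absent  INTEGER,  -- 1 = absent, 0 = present
        PRIMARY KEY (student_id, section_id, class_date)
    )
    """
)
rows = [
    ("s1", "ALG1-01", "2024-09-03", 1),
    ("s2", "ALG1-01", "2024-09-03", 1),
    ("s3", "ALG1-01", "2024-09-03", 0),
    ("s4", "ALG1-01", "2024-09-03", 0),
    ("s1", "ALG1-01", "2024-09-04", 1),
    ("s2", "ALG1-01", "2024-09-04", 0),
    ("s3", "ALG1-01", "2024-09-04", 0),
    ("s4", "ALG1-01", "2024-09-04", 0),
]
conn.executemany("INSERT INTO section_attendance VALUES (?, ?, ?, ?)", rows)

# Self-join: pair each absence ("me") with every other student ("peer")
# in the same section on the same day, then average the peers' absence flags.
CLASSMATE_QUERY = """
SELECT me.student_id,
       me.section_id,
       me.class_date,
       AVG(peer.is_absent) AS classmate_absence_rate
FROM section_attendance AS me
JOIN section_attendance AS peer
  ON peer.section_id = me.section_id
 AND peer.class_date = me.class_date
 AND peer.student_id <> me.student_id
WHERE me.is_absent = 1
GROUP BY me.student_id, me.section_id, me.class_date
ORDER BY me.student_id, me.class_date
"""
classmate_rates = conn.execute(CLASSMATE_QUERY).fetchall()
for row in classmate_rates:
    print(row)
```

Because this sketch uses an inner join, a student who is the only one enrolled in a section has no peers and drops out of the result. Whether that is the right treatment is a methodological choice, not something the data decides.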

The Learning Curve 

Sara: Building on familiar patterns and skills 

For me, the learning curve had two main components: understanding the tools and building a mental model of the data. 

  1. Learning SQL and getting comfortable in the warehouse. 
    I hadn’t used SQL before, so at first it felt like learning an entirely new language. My experience working in R helped with the learning curve, and once I recognized familiar patterns—grouping, filtering, joining, summarizing—the logic started to click. The process reminded me of learning pickleball after playing tennis: different game, but transferable underlying skills.
  2. Understanding how the tables connect. 
    The second, bigger challenge was understanding how data flows through the system and how the tables relate. One example that really shaped my understanding was calculating daily course-section attendance rates. Doing that correctly meant connecting several pieces: school calendar dates, course enrollment records, and attendance event data. Once we matched students to the days they were scheduled to be in class and pulled in attendance records, we could aggregate by course section and date to calculate the proportion of enrolled students who were absent. 

Working through that process helped me develop a clear mental map of how the data elements fit together. After that, the work became much more intuitive. 
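
The daily course-section calculation described above can be sketched roughly as follows, again in Python with an in-memory SQLite database. The three tables (`calendar_dates`, `section_enrollments`, `attendance_events`) and their columns are hypothetical stand-ins for the calendar, enrollment, and attendance-event data, not the real Ed-Fi or warehouse schema. The sketch also assumes that a scheduled student with no attendance event on a given day was present, and that there is at most one event per student, section, and day.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE calendar_dates (
            school_id TEXT, calendar_date TEXT, is_instructional_day INTEGER
        );
        CREATE TABLE section_enrollments (
            student_id TEXT, section_id TEXT, school_id TEXT,
            begin_date TEXT, end_date TEXT
        );
        CREATE TABLE attendance_events (
            student_id TEXT, section_id TEXT, event_date TEXT, event_type TEXT,
            PRIMARY KEY (student_id, section_id, event_date)
        );
        """
    )
    conn.executemany("INSERT INTO calendar_dates VALUES (?, ?, ?)", [
        ("sch1", "2024-09-03", 1),
        ("sch1", "2024-09-04", 1),
        ("sch1", "2024-09-07", 0),  # non-instructional day, should be excluded
    ])
    conn.executemany("INSERT INTO section_enrollments VALUES (?, ?, ?, ?, ?)", [
        ("s1", "ALG1-01", "sch1", "2024-09-03", "2024-12-20"),
        ("s2", "ALG1-01", "sch1", "2024-09-03", "2024-12-20"),
        ("s3", "ALG1-01", "sch1", "2024-09-03", "2024-12-20"),
        ("s4", "ALG1-01", "sch1", "2024-09-04", "2024-12-20"),  # enrolls a day late
    ])
    conn.executemany("INSERT INTO attendance_events VALUES (?, ?, ?, ?)", [
        ("s1", "ALG1-01", "2024-09-03", "absent"),
        ("s2", "ALG1-01", "2024-09-04", "absent"),
        ("s3", "ALG1-01", "2024-09-04", "absent"),
    ])
    return conn


# Step 1 (CTE): match each enrolled student to the instructional days that
# fall inside the enrollment window. ISO date strings compare correctly as text.
# Step 2: left-join attendance events and aggregate by section and date.
DAILY_RATE_QUERY = """
WITH scheduled AS (
    SELECT e.student_id, e.section_id, c.calendar_date
    FROM section_enrollments AS e
    JOIN calendar_dates AS c
      ON c.school_id = e.school_id
     AND c.calendar_date BETWEEN e.begin_date AND e.end_date
    WHERE c.is_instructional_day = 1
)
SELECT s.section_id,
       s.calendar_date,
       COUNT(*) AS n_enrolled,
       SUM(CASE WHEN a.event_type = 'absent' THEN 1 ELSE 0 END) AS n_absent,
       1.0 * SUM(CASE WHEN a.event_type = 'absent' THEN 1 ELSE 0 END)
           / COUNT(*) AS absence_rate
FROM scheduled AS s
LEFT JOIN attendance_events AS a
  ON a.student_id = s.student_id
 AND a.section_id = s.section_id
 AND a.event_date = s.calendar_date
GROUP BY s.section_id, s.calendar_date
ORDER BY s.section_id, s.calendar_date
"""

conn = build_demo_db()
daily_rates = conn.execute(DAILY_RATE_QUERY).fetchall()
for row in daily_rates:
    print(row)
```

Building the denominator from the calendar and enrollment tables, instead of from the attendance events alone, is what keeps late enrollees and non-instructional days from distorting the rate.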

Michael: Documentation, grain size, and a flexible analytical structure

For me, one of the most important habits to build was referring to the dbt documentation [1]. It tells you the grain of each table (what one row actually represents), which is essential for avoiding mistakes. It also defines variables and highlights the key fields you’ll use to connect datasets across domains.
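
One way to see why grain matters is to check it directly before joining. The sketch below, in Python with an in-memory SQLite database, looks for key combinations that appear on more than one row. The helper `find_grain_violations`, the table `daily_attendance`, and its columns are all hypothetical names invented for this illustration; this is not a dbt feature or EA's tooling, just a generic uniqueness check under an assumed grain of one row per student, school, and date.

```python
import sqlite3


def find_grain_violations(conn, table, key_columns):
    """Return key combinations that appear on more than one row of `table`."""
    keys = ", ".join(key_columns)
    # Table and column names are interpolated directly into the SQL, so pass
    # only trusted, hard-coded identifiers (fine for a sketch, not for user input).
    query = f"""
        SELECT {keys}, COUNT(*) AS n_rows
        FROM {table}
        GROUP BY {keys}
        HAVING COUNT(*) > 1
        ORDER BY {keys}
    """
    return conn.execute(query).fetchall()


# Hypothetical table whose assumed grain is one row per student, school, and date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_attendance ("
    "student_id TEXT, school_id TEXT, attendance_date TEXT, is_absent INTEGER)"
)
conn.executemany("INSERT INTO daily_attendance VALUES (?, ?, ?, ?)", [
    ("s1", "sch1", "2024-09-03", 0),
    ("s1", "sch1", "2024-09-04", 1),
    ("s2", "sch1", "2024-09-03", 0),
    ("s2", "sch1", "2024-09-03", 1),  # duplicate key: violates the assumed grain
])

violations = find_grain_violations(
    conn, "daily_attendance", ["student_id", "school_id", "attendance_date"]
)
print(violations)
```

An empty result means the assumed grain holds for the data at hand. Any rows returned would be duplicated by a join on those keys, which is the kind of mistake that silently inflates counts.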

I also learned that SQL might look strange at first, but it isn’t fundamentally different from the logic we use in R, SAS, or Stata. Common table expressions (CTEs) are especially helpful because they let you build intermediate datasets step by step. Once I internalized that I didn’t have to accept raw variables “as-is”—that I could create new variables using math and conditional logic across tables—the workflow became much more flexible. 
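
As a small illustration of that step-by-step style, the sketch below chains two CTEs and then derives a new variable with conditional logic. It is written in Python against an in-memory SQLite database. The table `student_days`, its columns, and the 10 percent cutoff are assumptions chosen for the example, not definitions taken from EA's warehouse or from any particular state's policy.

```python
import sqlite3

# Hypothetical table: one row per student and school day, with an absence flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_days (student_id TEXT, school_date TEXT, is_absent INTEGER)"
)

# Twenty illustrative school days: s1 is absent on 3 of them, s2 on 1.
days = [f"2024-09-{d:02d}" for d in range(1, 21)]
rows = []
for i, day in enumerate(days):
    rows.append(("s1", day, 1 if i < 3 else 0))
    rows.append(("s2", day, 1 if i < 1 else 0))
conn.executemany("INSERT INTO student_days VALUES (?, ?, ?)", rows)

# Each CTE is an intermediate dataset, much like an intermediate data frame in R
# or a temporary dataset in SAS or Stata.
FLAG_QUERY = """
WITH day_counts AS (          -- step 1: summarize raw rows per student
    SELECT student_id,
           COUNT(*)       AS days_enrolled,
           SUM(is_absent) AS days_absent
    FROM student_days
    GROUP BY student_id
),
rates AS (                    -- step 2: compute a new variable with math
    SELECT student_id,
           1.0 * days_absent / days_enrolled AS absence_rate
    FROM day_counts
)
SELECT student_id,            -- step 3: add a flag with conditional logic
       absence_rate,
       CASE WHEN absence_rate >= 0.10 THEN 1 ELSE 0 END AS high_absence_flag
FROM rates
ORDER BY student_id
"""
flags = conn.execute(FLAG_QUERY).fetchall()
for row in flags:
    print(row)
```

Each named step can be run and inspected on its own, which makes it easier to confirm the grain and the values at every stage before trusting the final variable.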

What We’d Tell Other Researchers: Expect to Build, Document, and Collaborate 

Sara: Reflecting on my experience, here are a few lessons I’d share with researchers interested in working with standardized, relational data like Ed-Fi. 

  • Be patient during the initial on-ramp process. You’ll likely need to learn SQL and get comfortable thinking relationally—that is, joining tables, checking grain, validating assumptions. It can feel slow at first, but it becomes empowering once you get familiar with the system.
  • Documentation and collaboration aren’t optional; they’re essential. The dbt documentation reduces the friction of understanding tables and lineage. Collaboration with the analysts and data engineers who work in the data warehouse every day is crucial when you navigate business rules or define variables.
  • Be prepared to construct “research-ready” metrics. Not every metric comes prebuilt. Attendance rates, for example, require careful decisions about which days count, which enrollments count, and how absences are interpreted. Courses can be tricky too, since course naming conventions vary widely across districts. Those aren’t flaws; they’re reminders that richer data requires clearer methodological choices.
  • The opportunity lies in precision and stronger early warning systems. What excites me most is the ability to measure attendance with more nuance over time. Chronic absenteeism is strongly associated with dropout risk, and attendance behaviors are shaped by complex factors. A data system that supports longitudinal, fine-grained attendance measurement—combined with modern modeling—creates real potential for better early warning systems and more targeted support for students. 

Michael: From my perspective, the strongest advantages are the volume and detail of the data, the way it all fits together, and the speed of processing in a cloud-based analytics tool. The key is that the kinds of variables you can create, and the extent to which you can connect them, are so much broader with this kind of infrastructure. It doesn’t just make existing research workflows faster; it expands the range of questions you can realistically study.

Where We’ve Landed 

Working with standardized Ed-Fi data has reshaped how we think about education research. The data are more detailed, more connected, and more flexible than the flat files many researchers have used throughout their careers. While there’s a learning curve, it comes with a payoff: the ability to build more precise measures, ask different questions, and create analysis-ready datasets that better reflect the complexity of real schools and real students.

[1] EA uses dbt (data build tool) to produce regularly updated documentation built on the warehouse structures that are continually maintained and improved.

Interested in Learning More?

Listen to the latest episode of the DatabasED podcast, where co-hosts Molly Stewart, Senior Product Manager, and Rosh Dhanawade, Vice President of Product, join Sara Hu and Michael Christian to explore how Ed-Fi data and modern machine learning tools support more refined early warning systems for chronic absenteeism and dropout prevention. 
