What is an assessment?
Ask 10 people this question, and you’re likely to get 10 different answers—all of them accurate. That’s because assessments come in many forms, serving different purposes across the educational journey. From the first days of Kindergarten to the final stretch of high school, assessments shape how we understand student learning, progress, and readiness for the future. Some examples include:
- Early literacy screenings in Kindergarten, designed to identify foundational skills
- Summative state tests in grades 3-8, administered each spring to measure mastery of grade-level standards
- College & career readiness exams in high school, like the SAT or ACT, that help open doors to post-secondary options
The content and intended measures of these assessments vary greatly, which strongly impacts how their data and scores are stored and structured. These scores are crucial—not just for teachers and district administrators but also for our work at Education Analytics (EA). To ensure assessment data effectively serves students, it must first be organized into a coherent format that enables meaningful analysis. While our team is committed to the Ed-Fi data standard, most assessment vendors do not integrate directly with Ed-Fi, which requires us to rely on custom tools to build data mappings from vendor file exports. The data standard provides a consistent structure as a starting point, but the variety of assessment content, measures, and vendors means that standardization alone is not enough.
The need for governance became clear during our collaboration with the South Carolina Department of Education (SCDE). Our shared goal was to integrate 10 years of historical data for more than 30 assessments across all districts in South Carolina into Ed-Fi Operational Data Stores (ODS)—a massive undertaking.
Tackling this challenge required a multi-pronged approach:
- Identify and assess ambiguous areas within the Ed-Fi assessment data model
- Examine existing assessment data mapping for inconsistencies
- Engage with districts, users, and stakeholders to gather feedback
To streamline the process of loading data into Ed-Fi and make it interoperable for analytics, we developed tools called Earthmover and Lightbeam.
Earthmover and Lightbeam: Expanding Ed-Fi Integration
As an organization partnering with education agencies on Ed-Fi projects, we quickly realized a key limitation: the usefulness of downstream warehousing and analytics depends on the availability and quality of the underlying data. Traditionally, student information systems (SISs) were the primary source of data feeding into Ed-Fi implementations. This left districts dependent on the capabilities of their chosen SIS, which often provided only limited integrations. Crucially, these integrations frequently excluded data domains of significant importance to educators—such as assessments.
Recognizing this gap, we saw an opportunity to innovate. There was a clear need for tools that could enable a broader range of data integrations into Ed-Fi, empowering districts to unlock the full potential of their data ecosystems. In response, our team at EA developed Earthmover and Lightbeam, tools specifically designed to bridge these gaps and expand the scope of data available within Ed-Fi.
Both Earthmover and Lightbeam have become essential to EA’s efforts to help education agencies seamlessly integrate diverse datasets and ensure that educators have access to comprehensive, actionable insights.
Command-Line Tools for Seamless Data Integration
Earthmover and Lightbeam are open-source, command-line tools designed for simplicity and flexibility. They require only a Python environment, making them lightweight tools for managing educational data.
Earthmover transforms tabular data into text-based data formats. While particularly effective for reshaping source data into Ed-Fi-compatible JavaScript Object Notation (JSON), Earthmover is not limited to Ed-Fi-specific use cases.
Lightbeam, on the other hand, interacts directly with the Ed-Fi API, validating and transmitting JSON Lines (JSONL) files. Using these two tools together, data can move from its original format into the Ed-Fi ODS.
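To make the division of labor concrete, the following is a minimal Python sketch of that two-step flow. It is not Earthmover or Lightbeam code: the CSV columns, identifiers, namespace, API URL, and credentials are all hypothetical, and a real Ed-Fi payload would carry more fields than shown here.

```python
import csv
import json

import requests  # any HTTP client would do

# Step 1 (the role Earthmover plays): reshape tabular vendor data into Ed-Fi-shaped JSON.
records = []
with open("vendor_scores.csv", newline="") as f:  # hypothetical vendor export
    for row in csv.DictReader(f):
        records.append({
            "assessmentReference": {
                "assessmentIdentifier": "Star-RD",              # illustrative identifier
                "namespace": "uri://renaissance.example/star",  # illustrative namespace
            },
            "studentReference": {"studentUniqueId": row["student_id"]},
            "administrationDate": row["test_date"],
        })  # simplified; a real studentAssessment payload has more required fields

# Write one JSON object per line (JSONL), the format Lightbeam consumes.
with open("studentAssessments.jsonl", "w") as out:
    for record in records:
        out.write(json.dumps(record) + "\n")

# Step 2 (the role Lightbeam plays): validate and transmit each record to the Ed-Fi API.
API_BASE = "https://example.edfi.org/data/v3"   # hypothetical ODS/API base URL
headers = {"Authorization": "Bearer <token>"}   # OAuth token acquisition omitted
with open("studentAssessments.jsonl") as f:
    for line in f:
        response = requests.post(f"{API_BASE}/ed-fi/studentAssessments",
                                 headers=headers, json=json.loads(line))
        response.raise_for_status()
```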
The Power of Bundles
Our tools were built with three core principles in mind:
1. Flexibility
2. Efficiency
3. Broad applicability
Central to achieving these goals is the concept of a “bundle”—a reusable set of transformation instructions and JSON template files.
Bundles are especially valuable for integrating assessment data, where few native integrations to Ed-Fi exist. At the same time, districts across the country often share the same underlying source data structure for a particular assessment. By creating a bundle tailored to a vendor’s file specifications, we can reuse that bundle for any district with the same source data, which reduces both effort and cost. EA maintains a public repository of these bundles that are used by many of our Ed-Fi partners.
Managing Assessment Data Variability
Assessments often vary across districts, states, and years. For instance, scores might correspond to different state performance levels, or columns could be added or removed from one year to the next. These variations can complicate data integration, and our tools are designed to handle these challenges. Thanks to the Jinja templating functionality built into our tools, we can include conditional logic that allows for flexibility to adapt to these changes dynamically.
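As a concrete illustration of that conditional logic, here is a small, self-contained Python sketch using the jinja2 library directly. The column names, cutover year, and performance-level labels are invented for illustration and are not taken from any real bundle.

```python
from jinja2 import Template

# A hypothetical fragment of a JSON template: the performance-level value is
# pulled from a different source column depending on the school year, mirroring
# the kind of year-over-year variation described above.
template = Template("""
{
  "scoreResult": "{{ row.scale_score }}",
  "performanceLevel": "{% if row.school_year | int >= 2022 %}{{ row.new_level }}{% else %}{{ row.legacy_level }}{% endif %}"
}
""")

row = {"scale_score": "214", "school_year": "2023",
       "new_level": "Meets Expectations", "legacy_level": "Level 3"}
print(template.render(row=row))
```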
Additionally, we developed a feature called project composition to streamline workflows. This feature allows an Earthmover project to build upon other Earthmover projects by importing them as packages. This simplifies the process of overriding specific values, eliminating the need to copy and paste code for minor adjustments.
Why Is Assessment Data Challenging?
Prior to building bundles, our team took a deep dive into the assessment domain of Ed-Fi to ensure our mappings were accurate and representative of the standard structure. While we are all well-versed in the Ed-Fi data model, the assessment domain is unique.
The key distinctions include:
- Flexibility of the Data Model: The Ed-Fi assessment domain is designed to be highly flexible, especially with student assessment results. This requires thoughtful governance to maintain consistency across datasets.
- Diverse Sources of Data: Unlike other domains, there is no single, standardized source of assessment data. Instead, data is populated by multiple vendors, each with its own file structures and unique characteristics.
What are the impacts?
- Vendors can make vastly different decisions about how to populate the assessment-related resources.
- Downstream processes need to handle this flexibility in complicated ways.
When integrating a new assessment into Ed-Fi, critical data modeling decisions must be made at the outset. It’s at this stage that misalignments and inconsistencies occur. While not an exhaustive list, two of the most important decisions are:
1. Assessment Definition and Hierarchy
According to the Ed-Fi standard, the fields used to define an assessment (the primary key) are AssessmentIdentifier and Namespace. This seemingly mirrors the primary key structure of other resources:
- Student primary key = StudentUID
- School primary key = SchoolId
However, there are key differences that make assessment definitions more complex:
- Source Diversity: student and school IDs typically have a standard, single source—most often the Student Information System (SIS). Assessments, by contrast, originate from various vendors, each with unique data structures and conventions for defining assessments.
- Lack of Standardized Definitions: Unlike student and school IDs, which have commonly understood definitions and are well-established within source systems, the definition of an assessment ID is ambiguous.
Ed-Fi defines the AssessmentIdentifier property as "a unique number or alphanumeric code assigned to an assessment." This definition leaves room for ambiguity.
How would someone determine what makes an assessment unique? Should it be unique by:
- Grade?
- Subject?
- Year?
- All of the above?
Technically, any of these fields could be included in the assessment identifier, and we have seen a variety of combinations of them across implementations. Some examples include:
- MCAS03AESpring2018 (MCAS Grade 03 ELA - 2018. Seen in Boston)
- STAR-RD-V1 (Star Reading - Version 1. From Data Import Tool)
- CAINC-IREADY-DIAGNOSTIC-ELA (i-Ready Diagnostic ELA. From i-Ready native integration)
The impact of this is that each of these assessments will appear to be unique by different properties, leading to inconsistencies in what constitutes a “unique” assessment.
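The effect is easiest to see in code. In the hypothetical sketch below, the same grade 3 ELA administration is keyed three different ways, so grouping on the natural key treats it as three unrelated assessments; every value is invented for illustration.

```python
# Three plausible identifier conventions for the same underlying assessment.
records = [
    {"assessmentIdentifier": "MCAS03AESpring2018", "namespace": "uri://district.example/mcas"},
    {"assessmentIdentifier": "MCAS-ELA",           "namespace": "uri://vendor.example/mcas"},
    {"assessmentIdentifier": "MCAS-ELA-G3",        "namespace": "uri://state.example/mcas"},
]

# Grouping on the natural key (identifier + namespace) yields three "different"
# assessments even though the test content is the same.
unique_assessments = {(r["assessmentIdentifier"], r["namespace"]) for r in records}
print(len(unique_assessments))  # 3
```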
Early Approaches and Challenges
Previously, EA attempted to determine the AssessmentIdentifier by considering the following two questions:
- Are there components in this assessment for which scores/other metadata significantly differ?
- How does the vendor describe the assessment?
However, those questions do not have straightforward answers, and do not remove ambiguity from the process.
2. Score Standardization
Currently, EA takes a vendor-specific approach during data integration. Instead of standardizing to the default AssessmentReportingMethod descriptors, EA creates new values under the specific vendor namespace that represent the score exactly as received.
For example:
- NWEA MAP: uri://www.nwea.org/map/AssessmentReportingMethodDescriptor#RIT Scale Score
- DIBELS: uri://dibels.uoregon.edu/assessment/dibels/AssessmentReportingMethodDescriptor#Composite Score
Downstream in our edu warehouse, both of those scores are unified into a single scale_score column for analysis. This approach raises an important question: why not normalize score names at the point of data integration into Ed-Fi?
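Downstream, that unification can be as simple as a crosswalk from vendor-namespaced descriptors to standardized column names. The sketch below is a minimal Python illustration, assuming score results shaped roughly like Ed-Fi scoreResults; it is not our production warehouse code.

```python
# Crosswalk from vendor-specific reporting-method descriptors (as loaded into
# Ed-Fi) to a single standardized score name used for analysis.
SCALE_SCORE_DESCRIPTORS = {
    "uri://www.nwea.org/map/AssessmentReportingMethodDescriptor#RIT Scale Score": "scale_score",
    "uri://dibels.uoregon.edu/assessment/dibels/AssessmentReportingMethodDescriptor#Composite Score": "scale_score",
}

def standardize(score_results):
    """Collapse vendor-specific score results into standardized columns."""
    standardized = {}
    for result in score_results:
        column = SCALE_SCORE_DESCRIPTORS.get(result["assessmentReportingMethodDescriptor"])
        if column:
            standardized[column] = result["result"]
    return standardized

print(standardize([{
    "assessmentReportingMethodDescriptor":
        "uri://www.nwea.org/map/AssessmentReportingMethodDescriptor#RIT Scale Score",
    "result": "214",
}]))  # {'scale_score': '214'}
```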
The Complexity of Score Definitions
Score definitions vary significantly across vendors, making normalization at the point of ingestion challenging.
For instance:
- RIT Scale Score (NWEA MAP): Defined as a score on the “Rasch UnIT scale,” NWEA describes it as a stable, equal-interval scale that measures achievement independently of grade level. This means that the difference between scores is consistent across the scale, and scores are directly comparable regardless of test version or timing.
- Composite Score (DIBELS): Defined as a combination of multiple DIBELS scores, this metric provides an overall estimate of a student’s reading proficiency. Its calculation is complex, vendor-specific, and documented by the vendor.
Technically, both of these scores are scaled scores (a raw score that has been adjusted and converted to a standardized scale), but the calculations behind those scores are different and often unique to the vendor. The score name and vendor-specific namespace capture those differences in case this additional information is relevant to those who need to use the data. If we normalized score names at the point of ingestion, it might be unclear which specific score (and underlying calculation) the scale_score represents.
Balancing Flexibility and Accessibility
A key goal in data integration is to enable flexible analytics. However, this often comes with a tradeoff between flexibility and accessibility.
For example:
- Assessments can include multiple scores that could correctly be called scale_score, such as both a composite_score and a rasch_score. EA designates one as the scale_score while maintaining transparency about which score was selected and retaining all others. If this decision were made at data ingestion, the mapping would become opaque, and there may not be a standard place to put alternative scale scores.
- Vendors offer additional scores that would be impossible to standardize. Some examples include:
- NWEA MAP: 'Fall-To-Fall Projected Growth'
- Renaissance STAR: 'Normal Curve Equivalent'
These unique scores often cannot be normalized, and flexibility is required to map them appropriately in Ed-Fi.
Collaboration with South Carolina and Governance Reviews
Like EA, the South Carolina Department of Education is committed to aligning its data systems to the Ed-Fi data standard. As part of this effort, SCDE set out to load historical assessment data into this framework. Recognizing the challenges posed by inconsistencies across assessments, SCDE shared our focus on establishing governance recommendations to address these issues effectively. After months of reviewing existing data models and studying the similarities and differences of many assessments, our teams presented our research and recommendations to a group of more than 60 district representatives in South Carolina and gathered their feedback. Using that feedback, we refined our recommendations and established a governance committee tasked with reviewing both existing and future assessment integrations. This ongoing partnership ensures that the data remain consistent, actionable, and aligned with the Ed-Fi standard, paving the way for more effective use of assessment data across the state.
Governance Recommendations
Assessment Definitions / Hierarchy
When integrating assessments into Ed-Fi, it is essential to accurately capture the hierarchical structure of each assessment. This involves proper use of the AssessmentIdentifiers, ObjectiveAssessmentIdentifiers, and ParentObjectiveAssessment fields to reflect the true relationships within the assessment data.
To achieve this, the AssessmentIdentifier should represent the highest level at which scores are reported. Typically, this requires including two key properties in the identifier:
- Assessment Title
- Subject
By incorporating these properties, scores at the overall student assessment level can always be systematically mapped to a specific subject, which is vital for ensuring data integrity and supporting meaningful analytical reporting.
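Under this recommendation, constructing an identifier becomes mechanical. The sketch below is one possible convention, not a requirement of the standard; the exact separator and subject codes are choices a governance group would make.

```python
def build_assessment_identifier(assessment_title, subject_code):
    """Compose an AssessmentIdentifier from the two recommended properties."""
    # Normalize whitespace so "Star  Reading" and "Star Reading" key identically.
    slug = "-".join(assessment_title.split())
    return f"{slug}-{subject_code}"

print(build_assessment_identifier("Star", "MA"))  # Star-MA
print(build_assessment_identifier("Star", "RD"))  # Star-RD
```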
Why is Subject such an important field?
The subject of an assessment provides vital context for interpreting results. For example, a student might excel in reading and writing but struggle with math, or vice versa. Capturing this information allows educators to:
- Identify areas where individual students need additional support.
- Tailor instruction and interventions to meet students’ specific needs
- Gain insights into broader patterns of performance across subjects, enabling data-driven decision-making at the classroom, school, and district levels.
Take, for example, an assessment and its corresponding student assessment structured like the hypothetical sketch below:
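All identifiers, field names, and values in this sketch are invented for illustration; it is only meant to mirror the kind of model described here.

```python
# A hypothetical assessment modeled WITHOUT subject-specific identifiers or
# objective assessments: no subject is captured anywhere in the hierarchy.
assessment = {
    "assessmentIdentifier": "DistrictBenchmark",
    "namespace": "uri://district.example/assessments",
    "assessmentTitle": "District Benchmark",
    "academicSubjects": [],
}

# The corresponding student assessment reports both subjects' scores side by side.
student_assessment = {
    "assessmentReference": {"assessmentIdentifier": "DistrictBenchmark",
                            "namespace": "uri://district.example/assessments"},
    "studentReference": {"studentUniqueId": "12345"},
    "scoreResults": [
        {"assessmentReportingMethodDescriptor":
            "uri://district.example/AssessmentReportingMethodDescriptor#math_scale_score",
         "result": "512"},
        {"assessmentReportingMethodDescriptor":
            "uri://district.example/AssessmentReportingMethodDescriptor#ela_scale_score",
         "result": "498"},
    ],
}
```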
This model does not follow our governance recommendations, and as a result, there would be no systematic way to map the math_scale_score to Mathematics, and ela_scale_score to ELA. To map scores to subjects in this example, we would need to write custom logic.
How does this end up working in practice?
Take the example of the Renaissance Star assessment — there are no scores reported across subjects, so to match these governance standards, this assessment is being mapped with the following identifiers:
- Star-MA (to represent Math)
- Star-RD (to represent Reading)
- Star-EL (to represent Early Literacy)
The main concern with this disaggregation of assessment records is that analyses, or even simple downstream querying, could become more difficult if the goal is to inspect all scores within a particular assessment (such as Star) regardless of subject. Other fields, such as the assessmentFamily property, exist in the data model to address this drawback.
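As a hypothetical sketch of that mitigation, the subject-specific Star identifiers can share a common assessmentFamily value, letting downstream queries roll results back up across subjects; the titles and values here are illustrative.

```python
# Subject-specific identifiers that share a common assessmentFamily.
star_assessments = [
    {"assessmentIdentifier": "Star-MA", "assessmentFamily": "Star", "assessmentTitle": "Star Math"},
    {"assessmentIdentifier": "Star-RD", "assessmentFamily": "Star", "assessmentTitle": "Star Reading"},
    {"assessmentIdentifier": "Star-EL", "assessmentFamily": "Star", "assessmentTitle": "Star Early Literacy"},
]

# "All Star results, regardless of subject" becomes a simple filter on the family.
all_star = [a for a in star_assessments if a["assessmentFamily"] == "Star"]
print(len(all_star))  # 3
```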
However, for an assessment like PSAT/SAT, there are scores reported across subjects, so to match governance standards, this assessment is being mapped with the following identifiers:
- PSAT 8/9
- PSAT 10
- PSAT/NMSQT
- SAT
Each of these assessments contains a single “Composite” subject at the overall assessment level, as this represents the level at which scores are captured for the overall student assessment. Because each has only this single top-level subject, including the actual subject code in the identifier would be redundant. The various sections, tests, and their corresponding subjects can all be captured as objective assessments:
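A simplified, hypothetical sketch of what that hierarchy might look like for the SAT; the section names, identification codes, and subject labels are illustrative rather than an official mapping.

```python
sat = {
    "assessmentIdentifier": "SAT",
    "namespace": "uri://collegeboard.example/sat",
    "academicSubjects": ["Composite"],  # single top-level "subject"
    "objectiveAssessments": [
        {"identificationCode": "SAT-EBRW", "academicSubjects": ["English Language Arts"]},
        {"identificationCode": "SAT-Math", "academicSubjects": ["Mathematics"]},
    ],
}
```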
The Ongoing Data Governance Role
While the proposed solution helps to ensure consistency across assessments, ambiguity will still exist (and therefore, ongoing data governance oversight is still needed), particularly with regard to the following points:
- What should be included in the assessmentTitle?
  - There is often no column with this information in the student data and there might not be a clear answer from the vendor.
- How should we handle assessments that cannot fit into this structure?
  - This structure can work sufficiently well for most assessments, but there will be assessments that do not obviously fit within this framework.
  - Take NWEA MAP Reading Fluency as an example. This assessment is inherently single subject, so maybe there should be a single assessmentIdentifier: “NWEA MAP Reading Fluency”? However, there are scores captured at the form level, which signifies that those different forms should be captured as separate assessment identifiers.
  - This is a great example of needing to deeply understand the assessment in order to properly map it into the Ed-Fi structure.
- How should we capture assessments throughout history?
  - Significant changes could have occurred throughout historical years that would impact how we define an assessment, but those changes might not always be transparent to someone modeling an assessment based on the current year. Content experts on each particular assessment should provide evidence for the splitting of assessment identifiers across history.
  - “Significant” in this context is also hard to define, but could include:
    - Changes to standards
    - Changes to the underlying statistical model of scores
    - Changes in the versions of an assessment
Score Standardization
To support a wide range of analytics use cases, the current approach of standardizing downstream of the Ed-Fi ODS is acceptable. However, this method has its limitations—each system interacting directly with the ODS must handle score standardization independently. While this adds complexity, it can also provide the flexibility needed for specific analyses.
Looking ahead, a potential improvement would be incorporating a mechanism within the Ed-Fi model to maintain a mapping between original score names and standardized score names. This solution would enhance transparency by preserving the connection to the original score, while improving flexibility and functionality across different use cases. Such an enhancement could streamline processes and reduce redundancy, benefiting all systems that rely on Ed-Fi data.
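What such a mapping might contain is sketched below; the entries and standardized names are hypothetical, and the point is only that the original, vendor-specific name travels alongside its standardized counterpart.

```python
# Hypothetical crosswalk that preserves the original score name next to the
# standardized name, so the connection to the source is never lost.
score_name_crosswalk = [
    {"vendor": "NWEA MAP", "original": "RIT Scale Score", "standardized": "scale_score"},
    {"vendor": "DIBELS",   "original": "Composite Score", "standardized": "scale_score"},
]
```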
The Ongoing Role of Data Governance
While the translation of original score names to standardized score names is often straightforward, there are instances where it is more complex. As demonstrated in the examples above, scores vary widely across vendors—not only in their names but also in how they are calculated.
Common score types include:
- Scale score
- Raw score
- Performance level
- Standard error of the mean (SEM)
- Percentile
Scores like raw score, SEM, and percentile are generally easier to standardize due to their more universally accepted definitions. However, others, such as scale score and performance level, often require additional oversight to ensure consistency and clarity.
For instance, Renaissance Star assessments include multiple scores that could translate to a single performance_level column:
- RenaissanceBenchmarkCategoryName
  - Renaissance default benchmark categories; these are standard across all Renaissance customers.
- StateBenchmarkCategoryName
  - Benchmark categories defined at the state level in the Renaissance program; these values will be unique per state test (e.g., Level 1, Level 2, Level 3).
- DistrictBenchmarkCategoryName
  - Benchmark categories defined at the district level in the Renaissance program.
In this case, which vendor score maps to the standard score could depend on the analytic use case, but a default could be defined by a governance group to avoid inconsistencies.
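One way a governance group could encode such a default is a fixed preference order over the three columns above. The order shown here is hypothetical; the point is that the choice is made once, centrally, rather than per integration.

```python
# Governance-defined preference order for which Renaissance benchmark column
# feeds the standardized performance_level (the order itself is hypothetical).
PERFORMANCE_LEVEL_PREFERENCE = [
    "StateBenchmarkCategoryName",        # e.g. default to state-defined categories
    "RenaissanceBenchmarkCategoryName",  # fall back to vendor defaults
    "DistrictBenchmarkCategoryName",     # last resort: district-specific categories
]

def pick_performance_level(row):
    """Return the first populated benchmark category, following the agreed order."""
    for column in PERFORMANCE_LEVEL_PREFERENCE:
        if row.get(column):
            return row[column]
    return None

print(pick_performance_level({"StateBenchmarkCategoryName": "Level 3"}))  # Level 3
```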
Lessons Learned
The journey of determining governance recommendations for assessment data integrations into the Ed-Fi data standard has been a challenging endeavor, but it has also provided valuable insights and lessons learned:
- Devising robust data governance practices requires diverse perspectives.
- Gathering feedback from district representatives and other stakeholders in South Carolina enabled our recommendations to accurately represent and address the needs of the community. Incorporating perspectives from individuals on the ground in these districts was integral to the success of the project.
- Data governance considerations reach beyond the assessment domain.
- While the assessment data model certainly requires attention to data governance, it is not the only domain in Ed-Fi with ambiguities. The process of discovering and applying data governance recommendations is applicable across various domains. Our team is currently investigating other areas within Ed-Fi that could benefit from more standardized modeling practices.
- Developing alternate Ed-Fi data integration methods will continue to be necessary for the growth and expansion of this project.
- Investing in the development of our Earthmover and Lightbeam tools has proven worthwhile for assessment governance; however, an increasing number of use cases beyond assessments, including historical and survey data, have emerged.