Executive summary

Twenty years of technology development and large-scale implementation of SLDS projects have generated a wealth of knowledge, strategies, and successes that will be invaluable to the next phase of SLDS work.
A common approach in the technology world that leads to high-scale, high-impact work is a strategy of reference software builds. We have already seen this kind of strategy succeed in the education world with efforts like the Ed-Fi software stack.
There is a once-in-a-generation opportunity to bring together what we’ve learned from the last 20 years with a reference software stack strategy to amplify every state's SLDS efforts.
We can do this by funding an independent organization to leverage current technologies and standards (like CEDS) to build and maintain a reference software build of an SLDS, so that states can start from an SLDS that is 90% complete. They would then only need to build last-mile value instead of repeating maintenance across 58 SEAs.
Most of the technologies to do this already exist but have not yet been stitched together into an actionable strategy. This commentary is meant to spur discussion of what it will take to get there.

A brief history of the SLDS strategy

I am a child of the 80s. As a tech nerd, I remember the onset of the internet, 2400 baud modems, and the first World Wide Web pages that came about in the early 90s. I distinctly remember waiting 20 minutes in 1994 for a photo of Jupiter recently released by NASA to download pixel by pixel onto my AOL browser, just because it was cool! My parents were not too happy, of course, because back then, the internet was both inconvenient (taking up our phone line) and expensive (only having, say, 100 minutes a month to use the internet without exorbitant charges accruing). At the time, there was unlimited optimism for the future of the internet, but of course, we found out that our eyes were bigger than our stomachs. By the turn of the century, the sheer complexity of digital transformation had collapsed industries and left us with the sobering reality that it would be a multi-decade slog.

Fast forward to 2002: In the aftermath of the dot-com bubble collapse, we passed two major federal laws in education: Public Law 107-110, which amended the Elementary and Secondary Education Act (ESEA) to enact No Child Left Behind (NCLB), and Public Law 107-279, which included the Education Sciences Reform Act (ESRA) and the Educational Technical Assistance Act (ETAA). These are landmark laws that did many things outside of the scope of this commentary, but there is some necessary relevant context for the topic at hand:

NCLB required large-scale assessment (and therefore data collection) in grades 3-8, as well as other requirements for states to collect and report large-scale data to the federal government.
ESRA created the Institute of Education Sciences (IES), which would oversee the National Center for Education Research (NCER), the National Center for Education Statistics (NCES), and the National Center for Education Evaluation and Regional Assistance (NCEE). This in effect created the national infrastructure for research in education to help understand the data coming out of the requirements of NCLB and to help report to the public how the US school system was performing.
Finally, and most pertinent to the topic of this commentary, ETAA established (among other things) a grant program for Statewide Longitudinal Data Systems (SLDS).

Importantly, in the ETAA legislation, there are very few requirements for this grant program. The full text is simply:

In awarding grants under this section, the Secretary shall use a peer review process that

ensures technical quality (including validity and reliability), promotes linkages across States, and protects student privacy consistent with section 183;
promotes the generation and accurate and timely use of data that is needed—
1. for States and local educational agencies to comply with the Elementary and Secondary Education Act of 1965 (20 U.S.C. 6301 et seq.) and other reporting requirements and close achievement gaps; and
2. to facilitate research to improve student academic achievement and close achievement gaps; and
gives priority to applications that meet the voluntary standards and guidelines described in section 153(a)(5).

(Note that Section 183 requires that confidentiality and privacy of data be maintained according to other US statutes, and Section 153(a)(5) demands that NCES set voluntary standards and guidelines for developing SLDS systems in conjunction with ESEA and, separately, with privacy requirements.)

So where does that leave us in 2002? We have several laws that require states to collect and report large amounts of educational data to the federal government, with the newly created IES and NCES institutions governing a grant program that will help states build the technology to do this task. The law does not mandate a national strategy or technology plan, but rather delegates that to states and incentivizes the adoption of only voluntary standards. Also in 2002, we have come to a national realization that internet technology is not only underdeveloped but also frequently overpromises, and even the best technologists in the world are having trouble wrapping their heads around the coming growth in data.

As a researcher and technologist, I find this series of events fascinating. In quick succession, we tied a new national education research strategy to a technology funding strategy, and then a few years later, we started distributing very large amounts of money to states to try to accomplish this task. To this day, large-scale, multi-institutional, multi-domain data systems are one of the hardest technological challenges in industry and in government alike. Back in 2005, when the first grants were awarded, the challenge was immeasurable.

What have we seen since then?

Statistics are difficult to find this far back, but according to my scan of sources, a very rough estimate is that since 2005, when the first SLDS grants were awarded, the amount of data created in our society per year has increased on the order of 500x (most sources I found seem to say we created about 0.2 Zettabytes of data in 2005 and are on track for about 100 Zettabytes of data in 2022). This has created massive challenges and opportunities in the data technology, analytics, and cloud technology spaces. Many of these challenges and opportunities have been driven by the commercial sector. For instance, in the last 17 years we have seen (just to scratch the surface):

The rise of cloud computing and storage
The rise of massively high-performance database technology at very low cost
A huge increase in well-supported, open-source, large-scale technologies
Large-scale, internet-driven API interoperability initiatives
A pandemic that forced millions of students and educators to be fully remote and to adopt new technologies that generate vastly more data
The rise of subscription-based, cloud-hosted solutions as the dominant way to deliver software products

Since the SLDS law and strategy were formed, we have seen so much innovation and learning happen against the backdrop of shifting policy demands and technology advances. From my view, we have seen a few key things happen:

We have spent almost $900 million in grants over 20 years for states to build SLDS systems. This sounds like a lot of money, but it amounts to an average spend of roughly $800k/year per SEA, which for large-scale technology projects is low—amounting to about $0.60 per student per year.
This money spurred a large amount of innovation and co-funding from state departments, and there have been numerous examples of SLDS projects allowing state departments to get a handle on the ever-increasing complexity of P-20W data.
In 2009, under section 114(g) of ESRA (which gives the director of IES the ability to establish peer and technical review committees), a technical working group was commissioned under NCES to address data standards when considering SLDS work. This group eventually created the first version of the Common Education Data Standards or CEDS.
CEDS has established itself as a national standard for data definitions and has developed some projects to materialize those definitions into data structures.
49 of 50 states have been awarded at least one SLDS grant since 2005, and from what I can tell, almost all of them have built some version of a custom solution, even though they all have very similar data and processes. While SLDS has always been aimed at P-20W data, for the most part, we see K-12 data driving the states forward, with operational use rather than research being the dominant need.
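
The per-SEA and per-student figures in the first point above can be checked with some quick arithmetic. The sketch below assumes the divisor is the 58 SEAs mentioned in the executive summary, and it treats the $900 million as spread evenly over 20 years. The student count is not stated in the text, so the code only derives what the $0.60 figure would imply rather than asserting an enrollment number.

```python
# Figures quoted in the text above.
TOTAL_GRANTS_USD = 900_000_000  # "almost $900 million"
YEARS = 20                      # "over 20 years"
NUM_SEAS = 58                   # assumption: the SEA count from the executive summary
PER_STUDENT_USD = 0.60          # "about $0.60 per student per year"


def annual_spend(total: float, years: int) -> float:
    """Average grant dollars per year across all SEAs."""
    return total / years


def per_sea_spend(total: float, years: int, num_seas: int) -> float:
    """Average grant dollars per year for a single SEA."""
    return annual_spend(total, years) / num_seas


def implied_students(total: float, years: int, per_student: float) -> float:
    """Student count that the quoted per-student figure would imply."""
    return annual_spend(total, years) / per_student


per_year = annual_spend(TOTAL_GRANTS_USD, YEARS)
per_sea = per_sea_spend(TOTAL_GRANTS_USD, YEARS, NUM_SEAS)
students = implied_students(TOTAL_GRANTS_USD, YEARS, PER_STUDENT_USD)

print(f"Per year, all SEAs: ${per_year:,.0f}")
print(f"Per year, per SEA:  ${per_sea:,.0f}")
print(f"Implied students:   {students:,.0f}")
```

Under these assumptions the per-SEA average comes out just under $800k per year, consistent with the rounded figure in the text, and the per-student figure implies a base of roughly 75 million learners.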

An external observer to this whole process may mistake an organic, decentralized process for chaos and a waste of money. I propose that while organic systems can feel a little bit hard to navigate, they also teach us much about what can naturally sustain and what needs extra work. In that vein, there have been recent conversations among research organizations, states, Congress, and IES about the value of the current SLDS strategy. I believe that we know all that we need to know to embark on what the current Director of IES, Mark Schneider, calls SLDS 2.0. As Dr. Schneider comments, there are several key tenets that can take us into the next 20 years:

"Using a modern, often cloud-based, architecture
Emphasizing interoperability
Aligning coding schema and data definitions across states
Making data more widely available while remaining consistent with existing and future privacy laws at the state and federal level
Integrating data from early childhood through labor market outcomes and for other services states identify"

(Source: Modernizing IES Using Large Data Libraries, 2022)

I would add one more: a reference software build that creates an implementable example of what states could do. By reference software build, I mean a reference architecture along with a maintained implementation.

Standards alone are not enough; we need the reference build to actually facilitate innovation in the field.

The value of reference software builds

Over the past 20 years, we have seen the rise of very large open-source projects that serve as a reference for enterprises or software developers who don’t want to start from scratch on a new project. This strategy stems at least in part from the computer hardware world, where an Original Equipment Manufacturer (OEM) of computer chips would create a basic circuit board design that their chips could sit inside. That has allowed other downstream manufacturers to use the OEM's chip, and then resell that chip with only minor configuration differences, without the OEMs having to manage the whole supply chain and retail channel to sell their chips. In the end, this enables the functional separation of design of the basic idea (by the OEMs) and customization for specific use (the downstream manufacturers).

In the software world, we see a similar strategy in many spots; you might be familiar with Chromium, the reference software for the vast majority of the world’s browsers, including Google Chrome, Microsoft Edge, Opera, and Brave, to name a few. Those of us old enough to remember the browser wars may recall that as web standards were exploding in complexity, it was difficult for browser developers to keep up, which resulted in Internet Explorer monopolizing the market and stifling innovation. Open-source browsers like Firefox emerged but struggled to take the helm from Internet Explorer. Then, Google entered the stage with Chrome, which not only dominated the market but also included a reference build that allowed others to compete in the browser space. And now, almost no one would try to write a browser from scratch, as it would put them on an island of software support and standards compliance that would never end, even though the web standards at this point are very well established. So standards alone did not help; we needed the reference build to actually facilitate innovation in the field.

Similarly, in education, one other important innovation over the last 10 years has been the development of modern full stack systems that enable interoperability among data standards in education. There are several of these, but the one I know best is Ed-Fi, which is operated as a full stack, open-source reference build for K-12 transactional data storage and transport. Ed-Fi and similar bodies have been doing immensely challenging work to figure out ways to build and govern such systems in the complex world of education data. The results are not perfect, but they have been proven to work, and they slowly but surely are moving the needle towards better and less expensive outcomes.

What tools do we have now in 2022?

To summarize all of the assets that this 20-year history has created both within the education data field and outside of it, we now have:

High-performance, high-scale, highly replicable cloud data storage and database technologies
Common blueprints for open-source software development and deployment (developed by the commercial world)
Well-publicized reference software builds, both inside and outside of education, that show the value of that strategy for widescale technology adoption
Familiarity among almost every state with the challenges and opportunities of SLDS work
A recognition through standards communities like CEDS, SIF, Ed-Fi, etc. that common action among education agencies is much more powerful than trying to build on their own
The CEDS standards
Various CEDS-based technical projects (e.g., Generate) that help us understand the difficulties of implementing technology on top of the data standard
Massive amounts of learning on how to govern technology development across states
A strong need to modernize data technologies (exacerbated and exposed by the pandemic), plus a fear that we will just need to do it again in five years when we have another 10x growth in data volumes
A burgeoning notion that data standards are not enough—interoperability standards are taking over the tech world
Rumblings that a change in strategy is needed at every level of the education field
A massive need to protect data privacy and security, while at the same time increasing accessibility and possible use of data at all levels of the educational ecosystem

What are we missing?

Given everything I have written, one might assume that SLDS strategies and implementations in the United States education system are in a good spot. This seems not to be the case. States seem instead to be on a constant treadmill: build, fall behind the curve of exponential data complexity, experience crisis, modernize, rebuild, and then rinse and repeat. Even the brightest examples of success fall prey to this cycle, and in our current state, the best states can do is throw more and more money at the problem just to play catch-up. Given all of the knowledge and technology advancements over the last 20 years, we should be able to get out of this cycle. I believe we are missing a key thing: a collaborative cross-state strategy for collective action. This strategy would then allow us to develop, build, and govern a living software reference build of a full SLDS stack. This is what I propose for SLDS 2.0.

A proposal for SLDS 2.0

The diagram below shows the key architectural components needed, in my view, to enable a collaborative state strategy. In brief, these include a single point of entry for data (shown by the four boxes at the bottom of the diagram below) through modern technology (here, an API), and an implementation of a data model (in this diagram, a CEDS database layer) that is not an analytical data store but rather an operational data store—all of which enables not only automated federal reporting but also analytics and research uses.

Proposed SLDS 2.0 Architecture

Diagram of SLDS 2.0 proposed infrastructure, showing data from Early Childhood Data Systems, K-12 Data Systems, Higher Education Data Systems, and Workforce Data Systems flowing through an API transport layer and CEDS database layer to enable a reporting layer for federal reporting and an analytics/research layer for state decision making.
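
To make the flow in the diagram concrete, here is a minimal sketch of the layers wired together in memory. Every class, function, and field name below is hypothetical and chosen for illustration; none of it comes from CEDS, Ed-Fi, or any existing SLDS codebase, and a real build would sit on an actual database and a network API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four source domains at the bottom of the diagram (labels are illustrative).
DOMAINS = ("early_childhood", "k12", "higher_education", "workforce")


@dataclass
class OperationalStore:
    """Stand-in for the CEDS database layer: an operational store, not a warehouse."""
    records: List[dict] = field(default_factory=list)

    def save(self, record: dict) -> None:
        self.records.append(record)


class TransportAPI:
    """Single point of entry: every source system pushes through this layer."""

    def __init__(self, store: OperationalStore) -> None:
        self.store = store

    def submit(self, domain: str, payload: dict) -> None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        # Tag each record with its source domain before it reaches the store.
        self.store.save({**payload, "domain": domain})


def federal_report(store: OperationalStore) -> Dict[str, int]:
    """Reporting layer: a toy report that counts records per domain."""
    counts = {d: 0 for d in DOMAINS}
    for record in store.records:
        counts[record["domain"]] += 1
    return counts


def research_extract(store: OperationalStore, person_id: str) -> List[dict]:
    """Analytics/research layer: follow one person across domains."""
    return [r for r in store.records if r.get("person_id") == person_id]


store = OperationalStore()
api = TransportAPI(store)
api.submit("k12", {"person_id": "A1", "grade_level": "08"})
api.submit("workforce", {"person_id": "A1", "employed": True})

report = federal_report(store)
history = research_extract(store, "A1")
print(report)
print(len(history))
```

The point of the sketch is the shape rather than the code: the reporting and research functions both read from the same store, and nothing reaches that store except through the one entry point.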

More specifically, to make a "reference build" of a full SLDS stack a reality, here are the key features needed:

Cloud-native technology that can be deployed easily using modern deployment strategies (like Docker)
Open-source software across the stack (meaning the reference should not be built upon a particular software vendor's proprietary database technology or cloud technology)
A reference data model implementation of the CEDS data standards in a form that is conducive to data interoperability rather than optimized for analytics or end use
A reference API that allows any inbound data source to know exactly what format to push to the SLDS
A stack of software tools that allow for automation of compliance reporting activities
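
As one illustration of the reference API feature in the list above, the sketch below shows what "knowing exactly what format to push" could look like: a published contract that any source system can validate against before sending data. The schema and its field names are invented for this example and are not actual CEDS element names; a real reference build would more likely publish something like a JSON Schema or OpenAPI document.

```python
from datetime import date

# Hypothetical enrollment contract. Field names are illustrative only,
# not actual CEDS elements.
ENROLLMENT_SCHEMA = {
    "person_id": str,
    "organization_id": str,
    "entry_date": date,
    "grade_level": str,
}


def validate(record: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record conforms."""
    errors = []
    for name, expected in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in record:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return errors


good = {
    "person_id": "A1",
    "organization_id": "LEA-1",
    "entry_date": date(2022, 9, 1),
    "grade_level": "08",
}
bad = {"person_id": 123, "organization_id": "LEA-1", "entry_date": date(2022, 9, 1)}

good_errors = validate(good, ENROLLMENT_SCHEMA)
bad_errors = validate(bad, ENROLLMENT_SCHEMA)
print(good_errors)
print(bad_errors)
```

Because the contract is data rather than code, the same definition could in principle be shared by every inbound system and by the SLDS itself, which is the property the reference API is meant to provide.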

What is promising about this strategy is that there are aspects of it already in progress, which the diagram below indicates for each element. CEDS has a very robust data definition covering large swaths of the P-20W data domains, and a community willing to expand to meet the full needs of P-20W when needed. There are already examples of materialized data models for CEDS in interoperability form (CEDS IDS) and in reporting form (CEDS RDS). The CEDS JSON project just got off the ground, which is a first step towards being able to create an API that could feed these other models. There are projects like CIID Generate that have proven the value of being able to automate a slice of federal reporting from some of these example technologies. Technologies like Ed-Fi in the K-12 space are ready to deliver to a common transport specification, if one were available. Initiatives like PESC or ECIDS in other domains could follow suit.

Where We've Made Progress

Same diagram of SLDS 2.0 proposed infrastructure as above, but showing which aspects are already in progress: ECIDS and PDG? for Early Childhood Data Systems, Ed-Fi and SIF for K-12 Data Systems, PESC and question marks for Higher Education Data Systems, Nascent for Workforce Data Systems, CEDS JSON project for the API layer, CEDS IDS for the CEDS database layer, CIID Generate for the reporting layer, CEDS RDS for the analytics/research layer, and EdFacts, IPEDS, etc. for federal reporting submissions.

Because SLDS is by law a state-level initiative, this approach would require states to come together and agree to work together not on use cases, but on strategy. It does not seem to me like a federal initiative here would be successful, since the majority of these data will be used for state purposes rather than federal purposes, and so the federal government would risk creating yet another compliance activity.

A project like this would not be easy or cheap, but it is clearly possible—which we could not have said in 2002. Any takers?

Postscript

As I have thought through this strategy, a few questions immediately come to mind. I will pose them here with my own answers to them:

Who owns the software stack? Who develops it? Who governs it?

This is a key question. I believe that this stack should be open source and owned by everyone, but housed at an independent organization. This organization should be an independent convener and implementer that has no financial incentive to sell the solution in order to survive, but that has responsibility for developing the technical strategy, roadmap, and solutions with feedback from the community. There could be many funding models for this, but that would need to be thought out.

Doesn’t this strategy obviate the need for Ed-Fi, SIF, or other transport standards? If not, where do they fit?

No! These projects have their own complexities and have taken on the monumental task of unifying the Ed Tech industry around common data transport. If they are going to allow agencies to pre-integrate data for other reasons, this only gives the SLDS a leg up on this next scope, which is wider and at a lower level of detail. If SLDS 2.0 tried to take on every interoperability challenge, it would just be boiling the ocean. P-20W is already wide enough without having to go deeper into these domains.

Should the federal government drive this strategy?

Not in my opinion. The federal reporting scope is very narrow (and larger scale) compared to state operational needs, which are in turn very narrow (and larger scale) compared to local education agency needs. A federal lead here would cause too much focus on compliance and a lack of understanding of operational design. The federal government could play a role in funding this, as it is a classic case of a public good with positive externalities that would be under-invested in if each state invested based only on its own benefit. But the federal government has traditionally not funded collaborative efforts like this.

Aren't we already doing this?

A comment I have heard is, “Well, we are already doing this, and there are already CEDS SLDS systems out there.” This is of course true to a degree. The gap is that right now there is no unified strategy or software platform. It is not clear when to use the CEDS IDS or RDS. There is no single point of entry for any other system to point at in any of the P-20W domains. The current projects require proprietary software to run and are not cloud native. I'm not saying that the existing work is a problem, but rather that we have a lot of work to do to go from some database configuration scripts to a software reference platform build.

A collaborative state strategy for SLDS 2.0: A call to action