Recently we discussed a debate about how much of the improvement in test scores of students in Mississippi can be attributed to a policy of holding back more students–in particular, having kids repeat third grade would be expected to improve the average for fourth graders. Education researchers Howard Wainer, Irina Grabovsky, and Daniel Robinson expressed skepticism about claimed dramatic benefits from the Mississippi plan, but then there were good arguments on the other side. One thing is that a lot of the discussion was about what happened right after the new plan was implemented in the mid-2010s, but there have been longer-term trends in Mississippi and other states. Changes in averages are always hard to interpret because of possible compositional effects, including decisions about the age at which children start first grade, classification of students as disabled, and who’s taking the test in any given year. Also, all these comparisons are observational: as Wainer puts it, there’s no control group. On the other other hand, decisions need to be made in the absence of ironclad evidence. So I was left in a state of uncertainty.
A couple days later we learned that Wainer et al. had garbled some statistics, entirely misreporting Mississippi’s fourth and eighth grade math scores. Wainer et al. were making a general point about testing and selection, something they’d seen in various forms many times in their careers, but they were evidently not close to the data from Mississippi, even to the aggregate data that are easily available. As I discussed, I should’ve been more suspicious of their claims about the math scores earlier, given that in my earlier post I’d noticed a discrepancy between their numbers and others’ claims. After all this, I remain unsure what to think about Mississippi. It’s an observational comparison, there’s selection, there’s variation between states in how much they teach to the test, and at the individual level there are the spillover effects on the kids who are not held back . . . all sorts of things. On the other hand there are these long-term trends. Selection has to be explaining some of what is happening in Mississippi–if you hold kids back and give them the test later, or manage to exclude them from the tested population entirely, the average scores of the remaining students should rise–but it’s hard to say how much, and at some point you have to go with the data in front of you. As is often the case, we’re not just arguing about causal effects; we’re also trying to pin down what exactly is happening.
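To get a rough sense of how much mechanical lift that kind of selection can produce, here’s a minimal simulation sketch with made-up numbers (a normal score distribution and an assumed 10% retention rate, nothing estimated from the actual Mississippi data): remove the bottom slice of a cohort from the tested population and the average of the remaining students rises even though no individual score changes.

```python
# Toy illustration of a pure selection effect (made-up numbers, not Mississippi data):
# if low scorers are held back or otherwise excluded from the tested population,
# the average of the remaining students rises with no change in anyone's score.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort of 100,000 students with scores centered at 250 (sd 35);
# both the scale and the 10% retention share are assumptions for illustration.
scores = rng.normal(loc=250, scale=35, size=100_000)

cutoff = np.quantile(scores, 0.10)       # bottom 10% held back / not tested next year
tested_next_year = scores[scores > cutoff]

print(f"mean, full cohort:            {scores.mean():.1f}")
print(f"mean, tested after selection: {tested_next_year.mean():.1f}")
# With these assumptions the tested-population mean rises by several points
# purely through composition, before any real change in learning.
```

The size of the jump depends entirely on the assumed score distribution and on how many students are removed, which is part of why it’s hard to say how much of the Mississippi trend this mechanism can explain.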
In the meantime, I received an email from another education researcher, Doug Harris, who writes:
Wainer et al. also got it wrong on the other cities like New Orleans. To quote them: “We have seen several previous K–12 education ‘miracles’ that turned out to be hoaxes. Five of them were in Houston, Atlanta, the District of Columbia, El Paso, and New Orleans . . . The New Orleans miracle was caused by a natural disaster. Hurricane Katrina tragically relocated about a third of the students who came from the poorest areas. Removing thousands of low scorers immediately raised the average test scores of the students who remained.”
Several people pointed this out to me [Harris], especially because I have been studying the New Orleans school reforms for more than 10 years. My center, the Education Research Alliance for New Orleans, has published more than 50 articles about it. Our Advisory Board includes both supporters and critics of the reforms.
When I first came to New Orleans the sharp upward trend in outcomes gave me and others good reason to think this fit the first rule. The school reforms were sparked by Hurricane Katrina, which changed the city in many ways. Many families never returned, at least not to their original homes and neighborhoods. The whole city was hit hard, but low-income neighborhoods were hit a bit harder. Given the correlation between demographics and education outcomes, it was reasonable to be concerned that changes in the population, not the school reforms, drove the change in outcomes. Recognizing the problem, I spent years trying to disentangle this.
In the end, to my own surprise, it became clear that the reforms really did drive substantial improvement in a wide range of education outcomes—elementary/middle school test scores, high school graduation, college entrance exams, college attendance, and college graduation. They reduced many achievement gaps and may have reduced crime in the city (this last point is more difficult to determine with confidence). These results can be found here (ungated), in the economics journal Journal of Human Resources (gated), and in my book Charter School City (University of Chicago Press, 2020). New Orleans went from being next to last in the state on almost every measure to being about average within ten years, an improvement that has been largely maintained.
How did we isolate the Katrina effects from the school reforms? You can read our much longer articles, but here is a short take:
1. We tracked the trajectories of the individual students before and after the reforms and found that those who returned to New Orleans saw improved trajectories. Since these are the exact same students (the data were anonymized, of course), demographic change cannot explain that. [A rough sketch of this kind of within-student comparison appears after this list.]
2. We tracked all pre-Katrina students with test scores living in New Orleans before the reforms and compared those baseline scores for returning and non-returning students. They were nearly identical, and we controlled for the remaining differences.
3. We commissioned the U.S. Census to calculate the change in demographics of households with school-age children before and after Katrina and the reforms. Again, they were nearly identical.
4. We tracked students who switched into and out of New Orleans. Those who switched into New Orleans learned at slower rates before Katrina and learned at faster rates afterward.
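To make the logic of point 1 concrete, here is a minimal sketch of a within-student before/after comparison, using hypothetical column names (student_id, year, score) and a made-up helper function. It is only a schematic of the general idea, not the Education Research Alliance’s actual analysis, which also handles grade levels, comparison groups, and much else.

```python
# Schematic of a within-student before/after comparison (hypothetical data layout,
# not the actual Education Research Alliance code): because each student is compared
# to their own earlier trajectory, fixed student characteristics difference out.
import pandas as pd

def within_student_gain_change(df: pd.DataFrame, reform_year: int) -> float:
    """Average change in a student's year-to-year score gain after vs. before
    the reform, among students observed in both periods.

    Assumes one row per student-year with columns student_id, year, score,
    where scores are already standardized within grade and year.
    """
    df = df.sort_values(["student_id", "year"]).copy()
    df["gain"] = df.groupby("student_id")["score"].diff()    # year-to-year gain
    df["post"] = df["year"] >= reform_year                    # gain attributed to its later year
    per_student = (
        df.dropna(subset=["gain"])                            # first observed year has no gain
          .groupby(["student_id", "post"])["gain"].mean()
          .unstack("post")
          .dropna()                                           # keep students seen in both periods
    )
    return float((per_student[True] - per_student[False]).mean())
```

A positive value says that returning students’ post-reform trajectories improved relative to their own pre-reform trajectories, which is the kind of comparison that demographic change alone cannot produce.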
Demographic change was not the only potential problem. Given the system’s strict accountability, we wondered whether data manipulation was a driving force. We found no evidence of this. Tests for strange patterns in test responses and for miscoded high school graduation rates turned up no differences, or only slight ones, between New Orleans and the rest of the state. We also know how the improvement occurred, which provides even more confidence.
So, why do Wainer et al. call it a “hoax”? Because they apparently never looked for any evidence to back up their claim. Any basic internet search would have turned up our work. Our findings made national news. There is a “hoax” here but it’s not the one they claim.
OK, the New Orleans test scores are another story I know nothing about! What you see above is one take on them. At this point my main role is to convey these different arguments and advertise my uncertainty.