Posted by Ed at AFT
After reading Dianne Piche’s Eduwonk post on Maryland’s use of 5 as the minimum subgroup size for NCLB accountability and my colleague Beth's post below, I think it might be helpful to introduce a little Statistics 101 into the discussion. You can't get far talking about N size in isolation. The reality is that N size and confidence intervals work together.
Test results are estimates of student knowledge, and they are imperfect estimates. And size matters this way: The smaller the number of students tested, the less precise our understanding of whether the test results accurately reflect reality.
In order to adjust for this, states are allowed to introduce statistical confidence intervals in their AYP calculations. I'm not a psychometrician, but I understand that this is akin to calculating the margin of error for a poll. Doing this allows you to fairly include very small subgroups in your accountability system.
Kevin Carey at Ed Sector clearly has reservations about an accountability system that is so careful to prevent mislabeling of successful schools that it lets too many unsuccessful schools off the accountability hook. But even so, he describes this adjustment as having "merit" when done so that it creates 95 percent confidence that the school is actually performing within a given margin of error.
But as the number of students tested dwindles, the size of the margin of error expands. For very small subgroups, the margin of error can be quite large. So large, in fact, that the question of whether a state has a minimum subgroup N of 5 may, statistically speaking, be a distinction without a difference.
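To make the arithmetic concrete, here is a rough sketch using the textbook margin-of-error formula for a poll, which treats the percent of students scoring proficient as a simple proportion. This is an illustration only: the function name, the 50 percent proficiency rate, and the subgroup sizes are assumptions chosen for the example, states do not all compute their AYP confidence intervals this way, and the normal approximation used here is itself shaky for very small groups.

```python
import math

# Critical value for a two-sided 95 percent interval under the normal approximation.
Z_95 = 1.96


def margin_of_error(p, n, z=Z_95):
    """Approximate half-width of the confidence interval for a proportion.

    p is the observed share of students scoring proficient (0 to 1),
    n is the number of students tested. Hypothetical helper for illustration.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of students")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Assumed example: half the subgroup scores proficient, at several subgroup sizes.
for n in (5, 10, 30, 100):
    moe = margin_of_error(0.5, n)
    print(f"N = {n:3d}: +/- {moe * 100:.0f} percentage points")
```

Under these assumptions the margin of error is roughly 44 percentage points for a subgroup of 5 and about 10 points for a subgroup of 100, which is why a result from a very small subgroup says so little on its own.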
Although the point of NCLB is school level accountability, schools vary so widely in population characteristics and sheer size that using the individual school as the basis for determining AYP is somewhat arbitrary and inefficient. It might make sense, for AYP purposes, to cluster small schools into larger units. This would give more valid information about the progress of students in subgroups in these schools.
Any fights that might erupt over how Small School A dragged down the cluster are not going to be that different from fights within larger schools about how Teacher A’s class dragged down the school. And, similarly, it might be logical to somehow divide big schools into smaller chunks with statistically valid populations to better focus those results.
Yes, this is arbitrary, but frankly it's not any more arbitrary than what is already happening. Ultimately, the underlying solution to the questions raised in the Eduwonk post might be to redistribute students so that school level governance is configured to optimize accountability. Or to accept that disaggregation of data, as important and generally beneficial as it has been, has limits under our current system.