Education Proposals in Tonight's State of the Union
January 23, 2007 02:16 PM
The White House Web site has information about the education provisions in tonight's State of the Union address. We'll update after we've taken a look at them.
UPDATE: The supporting documents for the president's State of the Union education proposals include the misleading claim about an increase in test scores that occurred between 1999 and 2004. Most of that time period, and presumably most of the gains (though there's no way of knowing), took place before NCLB was enacted and long before it began to have an impact on students.



Comments
Serious discussions of trends in NAEP must take into account critical changes in the way the numbers were calculated, as described in the NCES reports for the 2005 reading and math assessments. Comparisons of 2005 data (and even 2002 or 2003) to previous years are simply not valid. Consider three major changes since 1998, two of them since NCLB was passed. Data and pages from the reading report (NCES 2006-451) are cited, but the same observations hold for the math data (NCES 2006-453).
1. Adding DOD schools to reporting in 2005 invalidates observations of "trends" in the achievement gap.
People citing NAEP tell us that the racial gap has narrowed. Page 6 of the 2005 reading report shows that the differences between Whites and African-Americans and between Whites and Hispanics were smaller in 2005 for grade 4; page 7 shows the same for grade 8, although there is no real narrowing for White/African-American comparisons. In 2005, for the first time, DOD schools were counted in results for the national sample. DOD schools include a relatively high percentage of African-American and Hispanic students, and their scores are significantly higher than the national average, higher than virtually any individual state. Page 37 shows the data for grade 4 and page 43 shows the data for grade 8. The 2005 DOD schools outperform the national average in many ethnic categories, but the differences are far more pronounced for minority students. Grade 4 2005 DOD Whites scored 4 points higher than the national average, but African-Americans scored 19 points higher and Hispanics scored 18 points higher. Grade 8 2005 DOD Whites scored 7 points higher than the national average, but African-Americans scored 16 points higher and Hispanics scored 23 points higher. Given that these schools were included in 2005 for the first time (check the footnotes), wouldn't these data invalidate achievement gap comparisons to 2002/2003 data or before then?
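To make the arithmetic behind this concern concrete, here is a rough sketch of how folding a higher-scoring subgroup into a pooled sample can narrow a reported gap even if no student's score changes. Only the grade 4 DOD differences (+4 for Whites, +19 for African-Americans) come from the report as cited above. The baseline means, the student counts, and the DOD share of the sample are hypothetical numbers chosen for illustration, and the sketch treats the report's "national average" as the non-DOD baseline, which is a simplification.

```python
def weighted_mean(groups):
    """Pooled mean for a list of (number_of_students, mean_score) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total


def gap(white_groups, black_groups):
    """White minus African-American pooled mean score."""
    return weighted_mean(white_groups) - weighted_mean(black_groups)


# Hypothetical baseline sample WITHOUT DOD schools (counts and means are assumptions).
white_base = (600, 230)
black_base = (160, 200)

# DOD schools: score differences taken from the grade 4 figures cited above
# (+4 for Whites, +19 for African-Americans); the counts are assumptions.
white_dod = (10, 230 + 4)
black_dod = (5, 200 + 19)

gap_before = gap([white_base], [black_base])
gap_after = gap([white_base, white_dod], [black_base, black_dod])

print(f"gap without DOD schools: {gap_before:.2f}")
print(f"gap with DOD schools:    {gap_after:.2f}")
```

How much the gap moves depends entirely on how large the DOD sample is relative to the rest of the national sample, which is assumed here rather than taken from the report. With a very small DOD share the shift would be correspondingly small, so the real effect would have to be checked against the actual sample sizes.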
2. The number of eligible students who were actually provided with accommodations has risen steadily since 1998 and should account for some improvement in scores over time.
From page 32, "... beginning with the 2002 reading assessment, NAEP would permit the use of accommodations." These accommodations were piloted in 1998 and 2000 and the report says that the 1998 data have been adjusted to reflect accommodations, but the actual effects of allowing accommodations can only be seen in table A-3 of the report (page 34). A careful look at the data on accommodations shows that the percent of students completely EXCLUDED has remained almost perfectly flat between 1998 and 2005. But the number of eligible students permitted accommodations, consistent with the policy to allow this, has increased for all accommodated groups, rising steadily with only one minor bump here or there from 1998 to 2005. This is true for every category, every grade, both reading and math. Assuming that the accommodations were necessary and performance without the accommodation was actually an underestimate of achievement, the rising USE of accommodations should result in higher scores for those students. And they are not an insignificant number of students. What was the effect of the increasing trend in USE of accommodations on average scores and percent passing over time since NCLB? If the accommodations did not result in better performance, why were accommodations allowed or necessary at all? If performance was better with accommodations, what part of the differences in scores between 2002 and 2005 can be attributed to this change in policy?
3. Changes in the way the national figures are calculated make trend lines reported here pre-2002 and post-2002 invalid.
Changes in the way the national data were calculated have resulted not only in smaller standard errors since NCLB, as the report describes, but also in differences in percent proficient and average scores. From page 33: "Beginning in 2002, the national samples have been derived from the sum of all of the state samples, instead of from a separate and smaller nationally representative sample." In other words, they changed the way state data were weighted, and the new numbers simply are not comparable to the old. The only large jump in scores in the past 10 years took place when they changed the way the national data were calculated after NCLB legislation was passed. The change took place between the 1998/2000 and 2002/2003 data, where a large jump is reported, after which scores stabilize again. Page 3 shows the grade 4 and grade 8 scores from 2002 to 2005 (G4: 219, 218, 219; G8: 264, 263, 262). Of course, legislation "mandated" participation in NAEP, after which samples were much larger. However, in other NCES NAEP reports, the authors revised previous years' data to reflect current weighting of the sample. If that was done here, it is not mentioned.
The federal government is putting a great deal of trust in these data and claiming evidence that NCLB is working. They have a test to prove it. There is a lot we can learn from NAEP.
Posted by: NAEP Comments | January 23, 2007 09:18 PM
NAEP Comments,
If you look back in our previous posts on this, you'll see we're referring to the NAEP long-term trends assessments. Since the Bush administration's claims refer to "9-year-olds" rather than 4th-graders, the claim must be based on NAEP trends results.
It seems to me you're referring to the regular NAEP tests, which are updated from time to time. The NAEP trends tests, to my knowledge, have been administered in a consistent way for decades.
Posted by: John at AFT | January 23, 2007 10:07 PM
The NAEP trends report emphasizes that the results are comparable over time, but there are indeed several critical changes in the 2004 analyses. For example, for the first time with NAEP trends, the data were calculated using nonpoststratified sample weights (p. 107). This was done for the bridge tests that provided the data for trend reporting. When a similar change was made with the Science tests in 2005 (NCES 2006-466, p. 38), the previous years' data were recalculated because the weighting of the samples resulted in different estimates. The different weighting methods were not comparable, so they recalculated the previous data. There is no indication that a similar restating of previous data was done here.
This is not to mention the fact that, with the NAEP trends test in 2004, they actually changed the primary sample unit to stay within state physical boundaries (p. 97) and for the first time used ONLY age-eligible students (p. 93).
DOD schools were also included in the NAEP trends data (p. 112) in 2004, although they may have been all along.
As the NAEP trends report notes, "Several changes were made to the long-term trend assessments in 2004 to align it with current assessment practices and policies applicable to the NAEP main assessments" (p. viii). Those include changes to the "scoring and scaling" (p. 2 and p. 67). Some of those changes were made to the modified form, which was not used as the standard here in the trend data. Some were made to both the reported bridge and modified tests.
Tests should be updated to reflect scientific advances in measurement, but it is difficult to know if a child has grown an inch when you keep changing the yardstick.
Posted by: NAEP | January 24, 2007 04:32 PM