So, as a requirement to get my credential to teach Secondary Science, I'm taking an online methods course from the UCLA Extension. So far it's been really fascinating. This week's reading has been on the subject of standardized testing.
Another difficulty stemming from the profusion of curricular aims that customized standards-based tests try to assess is that the tests themselves can’t possibly include a sufficient number of items to shed light on a student’s status regarding the curricular aims that do make the cut in a given year. More often than not, there’s only one item, or perhaps two, to assess students’ achievement of a specific curricular aim. Teachers really can’t get an accurate fix on a student’s mastery of a curricular aim based on only one or two items. Typically, the developers of customized standards-based accountability tests report students’ performances at more general levels of aggregation: amalgamations of students’ performance with respect to collections of a variety of curricular aims, some of them very different. This is what you see when accountability results are reported at the “strand” level or at the “content standard” level. But this kind of reporting obfuscates which curricular aims students have mastered.
Excerpted from “Transformative Assessment,” by W. James Popham, Copyright 2008, Association for Supervision and Curriculum Development, Alexandria, VA.
I'm not sure if I did this right, so someone correct me if I'm wrong. What I did was create a hypothetical sample class of 10 students (A–J). I listed 10 concepts covered in class and each student's true performance on them, ranging from one student (A) getting 90% correct to another (J) getting only 50%.
But as the reading pointed out, there is simply not enough time to test every concept, so test makers sample. Even if they target important, representative skills, you're still stuck with a poor reflection of what was taught in the classroom, much less of any particular student's knowledge.
In my sketch, student B happened to miss one of the sampled concepts and so scored 50%, even though he had 80% mastery overall. And student J, at only 50% overall, happened to get the sampled concepts correct and so was scored 100%.
The author goes on to argue that what standardized tests end up showing is a student's SES (socioeconomic status) and inherited aptitudes, which would explain the high correlation between test scores and demographics. To me, this is basically what NCLB has highlighted by making such a huge deal out of test scores with the public: there are wide gaps in academic achievement across demographics.
Unfortunately, the takeaway for many here is simply that teachers are doing a terrible job. If you treat test scores as a measure of teacher performance, it does look like there's some really bad teaching going on. But of course there are numerous other factors involved, and while bad teaching is bad teaching, comparing teachers by raw scores is apples and oranges. I liken it to measuring the performance of two race car drivers, one in a Ferrari and one in a Yugo. It may seem terrible to describe students as cars, but what we're really comparing is the craftsmanship and materials that went into building each one. A student can be made into a Ferrari or a Yugo over the course of his life, but by the time he gets to a given teacher, most of the raw materials are in place, and there is only so much the average teacher can do to get him up to speed.
Just a thought.