Just how flat ARE those 12th grade NAEP scores?

Perhaps you read or heard that the 12th grade NAEP reading and math scores, which were just reported, were “flat”.

Did you wonder what that meant?

The short answer is: those scores have essentially not changed since they began giving the tests! Not for the kids at the top of the testing heap, not for those at the bottom, not for Blacks, not for Whites, not for Hispanics.

No change, nada, zip.

Not even after a full dozen years of Bush’s loony No Child Left Behind Act, nor its twisted Obama-style descendant, Race to the Trough (er, Top).

I took a look at the official reports and plotted the data here, so you can see how little effect has come from all those billions spent on testing; firing veteran teachers; writing and publishing new tests and standards; and opening thousands of charter schools.

Here are the graphs:

[Graph: NAEP 12th grade reading scores by percentile, over time]

This first graph shows that other than a slight widening of the gap between the kids at the top (at the 90th percentile) and those at the bottom (at the 10th percentile) back in the early 1990s, there has been essentially no change in the average scores over the past two full decades.

I think we can assume that the test makers, who are professional psychometricians and not political appointees, tried their very best to make the test of equal difficulty every year. So those flat lines mean that there has been no change, despite all the efforts of the education secretaries of Clinton, Bush 2, and Obama. And despite the wholesale replacement of an enormous fraction of the nation’s teachers, and the handing over of public education resources to charter school operators.
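How flat is “flat”? One rough way to put a number on it is to fit a straight line through the average scores and read off the slope, in scale-score points per year. Here is a minimal Python sketch of an ordinary least-squares fit. The years and scores in it are made-up placeholders, not the NAEP results in the graphs; you would have to type in the official averages from the NAEP reports before concluding anything.

```python
# Hypothetical illustration: the years and scores below are made-up
# placeholders, NOT the actual NAEP results plotted above.
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (here: points per year)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

years = [1992, 1998, 2005, 2013]
scores = [290, 289, 287, 288]  # hypothetical average scale scores

slope = ols_slope(years, scores)
span = years[-1] - years[0]
print(f"trend: {slope:+.3f} points per year, "
      f"{slope * span:+.1f} points over {span} years")
```

If the real averages give a slope of a small fraction of a point per year, “flat” is the right word.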

[Graph: NAEP 12th grade reading scores by ethnic/racial group, over time]

 

This next graph shows much the same thing, but the data is broken down into ethnic/racial groups. Again, these lines are about as flat (horizontal) as you will ever see in the social sciences.

However, I think it’s instructive to note that the gap between, say, Hispanic and Black students on the one hand, and White and Asian students on the other, is much smaller than the gap between the 10th and 90th percentiles we saw in the very first graph: about 30 points as opposed to almost 100 points.

[Graph: NAEP 12th grade math scores by percentile, over time]

 

The third graph shows the NAEP math scores for 12th graders since 2005, the first year that test was given. The psychometricians at NAEP claim there has been a “statistically significant” change since 2005 in some of those scores, but I don’t really see it. Being “statistically significant” and being REALLY significant are two different things.
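To see how a change can be “statistically significant” without being REALLY significant, here is a minimal Python sketch. All of the numbers in it (the two averages, the spread of individual student scores, and the sample sizes) are hypothetical, not NAEP’s; NAEP’s own significance tests use more elaborate methods that account for its sampling design. The point is only that with thousands of students tested, even a 2-point gain clears the usual significance bar while amounting to a tiny fraction of a standard deviation.

```python
import math

# Hypothetical numbers for illustration only; NOT NAEP's actual
# means, standard deviations, or sample sizes.
mean_2005, mean_2009 = 150.0, 152.0  # average scale scores
sd = 35.0                            # spread of individual student scores
n_2005 = n_2009 = 9000               # students tested each year

# Simple z statistic for the difference between two independent means.
se_diff = math.sqrt(sd**2 / n_2005 + sd**2 / n_2009)
z = (mean_2009 - mean_2005) / se_diff

# Two-sided p-value from the standard normal distribution.
p = math.erfc(abs(z) / math.sqrt(2))

# Effect size (Cohen's d): the same gain measured in student-level SDs.
d = (mean_2009 - mean_2005) / sd

print(f"z = {z:.2f}, p = {p:.5f}, effect size d = {d:.3f}")
```

With these made-up inputs the gain is “significant” at any conventional level, yet it moves the average student by only about one-twentieth of a standard deviation.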

*Note: the 12th grade Math NAEP was given for the first time in 2005, unlike the 12th grade reading test.

[Graph: NAEP 12th grade math scores by ethnic/racial group, over time]

 

And here we have the same data broken down by ethnic/racial groups. Since 2009 there has been essentially no change, and there was precious little before that, except for Asian students.

Diane Ravitch correctly read all of this as a sign that everything Rod Paige, Margaret Spellings, and Arne Duncan have done is a complete and utter failure. Her conclusion, which I agree with, is that NCLB and RTTT need to be thrown out.

 

One of the Things Wrong With Testing: They Are Invalid to Begin With!

A Test Writer Comments on New York’s Common Core Tests

by dianerav

This comment was posted yesterday:

I am a former, part-time item writer for a private testing company; I wrote for many different state standards under NCLB. I must say that poorly constructed, confusing, or developmentally inappropriate items undermine the validity of standardized scores and their subsequent use in teacher evaluation. When standardized tests are properly constructed, such items, which might make it to a field test, will almost certainly be vetted during what is typically a two-year process. Many items on the Pearson math and ELA tests administered last April here in NY were written, in my opinion, in an intentionally confusing style using obtuse or arcane vocabulary. The ELA test in particular included confusing item stems and distractors that were not clearly wrong. There were far too many items that turned subjective opinions (most likely; best; author’s intent; etc.) into a “one right, three wrong” format. Many teachers were unsure of the correct answers on a number of vague and fuzzy items.
The math test included many items that were ridiculously convoluted. Although there may be other compelling arguments against VAM teacher evaluations, corrupt test writing, norm referencing (instead of criterion referenced scoring), and manipulating cut scores add up to a rather important set of reasons to invalidate the entire process.
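The commenter’s point about norm referencing and cut scores can be illustrated with a toy example. In this minimal Python sketch, with made-up scores and hypothetical cut scores, the same ten answer sheets yield 80%, 50%, or 30% “proficient” depending only on where the cut score is set. Under a purely norm-referenced rule (here, the top half by rank), raising every student’s score by 10 points leaves the number labeled “proficient” unchanged.

```python
# Hypothetical raw scores for one class of ten students (made up for illustration).
scores = [12, 15, 18, 20, 21, 23, 25, 27, 30, 33]

def percent_at_or_above(scores, cut):
    """Criterion-style rule: percent of students at or above a fixed cut score."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

def top_fraction_by_rank(scores, fraction):
    """Norm-referenced rule: the top `fraction` of students by rank,
    regardless of how high or low their absolute scores are."""
    k = round(len(scores) * fraction)
    return sorted(scores, reverse=True)[:k]

# Same answer sheets, three different (hypothetical) cut scores.
for cut in (18, 22, 26):
    print(f"cut score {cut}: {percent_at_or_above(scores, cut):.0f}% 'proficient'")

# Under the norm-referenced rule, everyone gaining 10 points changes nothing.
improved = [s + 10 for s in scores]
print(len(top_fraction_by_rank(scores, 0.5)), len(top_fraction_by_rank(improved, 0.5)))
```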

Published on October 23, 2013 at 2:44 pm

Another Weekly Roundup of News on the Movement Against Testing Mania

This is from Bob Schaeffer at FairTest, as usual:

====================================

How much more evidence do policy makers need before they recognize that test-and-punish policies have failed? Learning gains have stagnated, progress toward closing the “achievement gap” has stalled, and their constituents increasingly reject the strategy. Even some of their strongest newspaper editorial page allies — including the New York Times and Los Angeles Times — are saying that it is time to look at alternative approaches.  Enough is enough!

Major National Survey Finds Parents Strongly Oppose Standardized Testing Misuse and Overuse
http://www.aft.org/newspubs/press/2013/072213b.cfm

Standardized Tests Take Over the School Day
http://www.dailykos.com/story/2013/07/23/1225947/-Standardized-tests-take-over-the-school-day#

High-Stakes Testing Leads to Increased Incarceration
http://www.therealnews.com/t2/index.php?option=com_content&task=view&id=31&Itemid=74&jumival=10458
See FairTest infographic: “How High-Stakes Testing Feeds the School-to-Prison Pipeline”
http://fairtest.org/pipeline-infographic

Support for Common Core Testing Declines Dramatically
http://www.politico.com/story/2013/07/common-core-academic-standards-94628.html
See FairTest fact sheet: “Common Core Assessments: More Tests But Not Much Better”
http://fairtest.org/common-core-assessments-more-tests-not-much-better

ACLU Sues Rhode Island Over Grad Test Requirement
http://www.providencejournal.com/breaking-news/content/20130723-r.i.-aclu-plans-to-announce-lawsuit-over-necap-graduation-requirement.ece

Florida School Grading System a “Politically Manipulated Scam”
http://www.washingtonpost.com/blogs/answer-sheet/wp/2013/07/19/floridas-change-to-school-rating-system-called-scam/
Convoluted School Grading System Fails. Yet Jeb Bush’s Disciples Are Pushing It in Other States
http://www.tampabay.com/opinion/columns/convoluted-school-grading-system-fails-all/2131954

Erase to the Top: Tainted Philadelphia Scores Demonstrate Flaws of Test-Driven School “Reform”
http://www.citypaper.net/cover_story/Erase_to_the_Top.html

Widespread Test Cheating at New Orleans Charters and “Reconstruction” Schools
http://www.nola.com/education/index.ssf/2013/07/state_documents_detail_standar.html

Oklahoma (and Indiana) Seek Test-Company Penalties for Exam Screw-ups
http://newsok.com/oklahoma-educators-need-to-get-beyond-testing-troubles/article/3865122

Why Are Texas Test-Takers Being Held More Accountable Than Test-Makers?
http://blog.chron.com/txpotomac/2013/07/commentary-amid-school-testing-scandal-why-are-test-takers-being-held-more-accountable-than-test-makers/ 

Standardized Testing’s Casualties — another excellent Letter to the Editor
http://www.latimes.com/news/opinion/letters/la-le-0721-sunday-standardized-tests-20130721,0,6581072.story

Minnesota Plans K-12 Testing “Without the Tears and Trauma”
http://minnesota.publicradio.org/display/web/2013/07/18/daily-circuit-state-testing

Focus on Test Scores Misses the Point on Urban Education
http://www.ctpost.com/opinion/article/Ann-Evans-de-Bernard-Focus-on-test-scores-misses-4662379.php

State Testing Makes Kids Hate School
http://www.mansfieldnewsjournal.com/article/20120527/LIFESTYLE/205270316/Drat-State-testing-makes-bright-kids-hate-school

Students Tell Policy-Makers: Stop Corporate Ed “Reform” and High-Stakes Testing
http://www.students4ourschools.org/educationrevolution.html

Key School “Reform” Questions Beg for Answers
http://www.washingtonpost.com/blogs/answer-sheet/wp/2013/07/18/key-questions-begging-for-answers-about-school-reform/

Five Basic Lessons on Public Education
http://www.washingtonpost.com/blogs/answer-sheet/wp/2013/07/19/five-basic-lessons-on-public-education-short-and-long-versions/

New Books on Assessment Reform
“The Mismeasure of Education” — http://www.infoagepub.com/products/The-Mismeasure-of-Education
“Left Behind in the Race to the Top” — http://www.infoagepub.com/products/Left-Behind-in-the-Race-to-the-Top

Bob Schaeffer, Public Education Director
FairTest: National Center for Fair & Open Testing
ph- (239) 395-6773  fax- (239) 395-6779
cell- (239) 699-0468
web- http://www.fairtest.org

If the tests by which all of education is to be measured are garbage, then so are the results

On this blog I have reprinted examples of what I see as crappy test items and dissected them, hoping to show readers that those items neither made sense nor measured what they purported to measure.

However, I never worked inside the testing industry itself, so I have no direct experience of making up BS test items on an industrial scale.* My own experience, though, is that EVERY test, no matter how good, has validity and reliability problems. This passage shows that the tests on which all US educational decisions are supposed to be based are, in fact, ridiculously badly made from the beginning: they cannot possibly measure what they pretend to measure, they are unreliable, and thus they are utterly invalid. (Plus the tests are snatching at least potentially valuable class time away from our students, while enabling a handful of big corporations like Pearson (more on which below) to rake in huge dividends because they control almost the entire education market.)

============================

This comes from an interview published by Diane Ravitch ( http://dianeravitch.net/2012/12/27/11990/ )

Rebecca Rubenstein: Since your book was published in 2009, has the “standardized” testing industry improved?

Todd Farley: Not the slightest bit. There was a story in The New York Times in 2001 about how test-scoring was a wildly out-of-control industry, which quotes various employees—not me!—as saying that they faced “too little time, too much to do, not enough people.” It implies the industry was doing a terribly suspect job. Since then, the industry is about a hundred times bigger, but those problems mentioned in the Times article or in my book have never been addressed. The industry has simply grown exponentially, and there are hundreds of millions of dollars to be earned by companies that are completely unregulated—to repeat, completely unregulated, so whatever Pearson et. al. tell us, we’re supposed to say “thank you very much” and just write them a staggeringly large check—but of course things haven’t gotten any better.

In my time in test-scoring, we never had enough temporary employees to do the work; we always had too much to do and too little time to do it; and there were always financial punishments looming over our heads if we didn’t get things done. We cut whatever corners we could to get it done (I’m sorry to say). Today the work load is a hundred times bigger and the money to be made is a hundred times bigger, but the system didn’t work to begin with and of course it doesn’t work now.

The same is true in the test development business. When I worked for one publisher as a test developer, it was always a madcap race to get tests written on time, and we faced absurd deadlines and pressure to do so. The reality is that quality was always secondary to the bottom line when developing tests, and then when the Common Core standards were introduced, and tests and products needed to be written for them, our deadlines became laughably absurd; I was once involved in the development of 200 tests in two months, which I think is literally more tests than ETS has produced in its entire existence. With the Common Core standards released, all the companies knew all the other companies were racing to finish their tests and products first, so quality became even worse than secondary. It became tertiary, or “fourthiary,” or whatever. Subcontractors who had been fired for poor work were rehired; item writers were hired off Craigslist; test developers with neither teaching experience nor test development experience were given full-time jobs. It’s important to remember that at the end of the day, companies like Pearson are for-profit enterprises. They want to make money. They want to make money, so of course they do a crappy job, because the quality of the work is never anywhere near as important as their desire to make a profit, and there’s always too much work and too little time to do it.


A comment: I was at first skeptical of the claim that those “200 tests” were more than ETS has created in its entire existence. But I think he may be right. The SAT is essentially one, two, or three tests (Reading, Math, and Writing), depending on how you look at it; it just gets revised a little bit each year. Plus, there are perhaps a couple of score different Advanced Placement (AP) tests and Achievement tests in different subjects; they get revised every year, at least in the field of math (which I follow, of course) and in others.

But what Pearson is doing now is essentially trying to replace the teacher in every single grade level, for every single course, by making the entire curriculum driven by the tests, pre-tests, practice tests, and test-prep material that it provides. Yes, I do mean all of third grade. Yes, I do mean 6th grade science, music appreciation, geography, and PE. Every class. And if you count every single course or subject area in which a student might be measured, from Pre-K-3 all the way up to graduating from high school, that might in fact be roughly 200 brand-new test series! And not just end-of-course tests, by any means: a different corporate multiple-choice test every month or two!

All this corporate educa-crap is just that: crap forced down the throat of public school kids and ONLY kids in public schools.

And it won’t improve a damned thing. Except for corporate bottom lines.

Of course the children or grandchildren of Michelle Rhee, Michael Bloomberg, Arne Duncan, Eli Broad, Bill Gates, the Koch brothers, and Barack Obama will never, ever be subjected to such a poor excuse for an education.

Oh, no.

That’s just for the poor Black and Latino and White kids who are in high-poverty regions; the only way they can opt out is to go to a charter school, which might be doing any damned thing and is almost sure to be even more segregated than the nearest public school, if that’s even possible.

This is progress?

====================

* My students and I often found mistakes on tests and quizzes and assignments I made up. I used to congratulate the student and give him/her/them a point when they pointed out an error. ETS and Pearson’s responses have been rather different. Remember the famous talking pineapple question? And do you recall that essentially no-one has ever been able to explain, line by line, number by number, exactly how ANY single teacher’s VAM numbers were calculated? Has any school district ever released data showing how well VAM and supposedly ‘scientific’ classroom observation data correlate with each other? (Hint: they don’t!!)

Once again, let me urge the leadership of the Washington Teachers’ Union, and teacher unions elsewhere, to enlist a good statistician with his/her feet on the ground, and poke holes in VAM. It’s all a tissue of fabrications.

Reform the Tests! As they are, they don’t test anything important!

A brilliant article by Marion Brady, reprinted by Valerie Strauss at the Washington Post.

Brady points out that what we are actually testing with NCLB, RTTT, and so on is worse than useless. What needs to happen is that the tests themselves be drastically changed so that they actually test higher-order thinking skills. I quote only a small excerpt, to try to get you to read the entire, well-reasoned article:

“If higher order thinking skills are tested, teachers will teach them. Those who don’t know how will quickly learn.

 Of course, Pearson, McGraw-Hill, Educational Testing Service, and other test manufacturers aren’t going to volunteer to test student-initiated higher order thinking skills. Neither are the politicians they help elect and re-elect going to make them even try to do so unless they think voters give them no alternative.

So voters should give them no alternative. Unless politicians and test manufacturers can make a convincing case for not teaching the young to think, they should be told what they’ve been telling teachers who say standardized tests are a waste of time and money: “No excuses!”

It’s likely that nothing short of binding agreements between states and test manufacturers will yield the new tests. To that end, in appropriate legal language, contracts should make clear that (a) every test question in every subject will evaluate a particular, named thinking skill, (b) every test will evaluate a balanced mix of all known thinking skills, and (c) a panel of experts not connected to test manufacturers or politicians will preview all test items to assure contract compliance. No excuses.

Fairtest, Parents Across America, United Opt Out National, and other state and local organizations have strategies in place to try to persuade. Petitions and referendums invite signers. Parents, grandparents — indeed, all who care about kids and country — should get on board.

 No more multimillion dollar checks for tests that no one but manufacturers are allowed to see. No more tests the pass-fail cut scores of which can be raised and lowered to make political points. No more kids labeled and discarded, every one with a brain wired to do all sorts of amazing things. If storing trivia in short-term memory doesn’t happen to be one of those things, that shouldn’t put them out of school and on the street.”

Signs of Backlash Against Excessive Student Testing — in Texas, of all places

Signs of change?

A number of parents, teachers, AND administrators in Texas, of all places, are beginning to pull out of, or protest against, the huge number of standardized, machine-scored tests that they feel are sucking the life out of education. Or so says this article in today’s (2/4/12) New York Times.

A few excerpts:

In the Panhandle, the Hereford Independent School District superintendent may withhold her district’s test scores from the state. An Austin parent is considering a lawsuit to stop the rollout of the tests. Some legislators are mulling how to postpone some of the tests’ consequences for students.

In a high-level turnaround, Robert Scott, the commissioner of the Texas Education Agency, said Tuesday that student testing in the state had become a “perversion of its original intent” and that he looked forward to “reeling it back” in the future. Earning a standing ovation from an annual gathering of 4,000 educators that has given him chillier receptions in the past, Mr. Scott called for an accountability process that measured “every other day of a school’s life besides testing day.”

Many viewed the speech as a reversal for Mr. Scott, who has rarely spoken publicly against the role of standardized testing in public schools. He declined to talk about his remarks for this article.

“I think he sees that we are at a cusp of philosophical changes in the Legislature and across the state over what we’ve been doing the past few years with accountability and whether there’s been any worthwhile gain from all the testing we’ve done,” said Joe Smith, a former superintendent [...]

Kelli Moulton, the superintendent of Hereford I.S.D., is considering an outright rebellion. She said that she was still exploring the repercussions of refusing to send her students’ test scores to the agency but that she was encouraged by Mr. Scott’s remarks.

“We talk a lot, but nobody’s stepped off to do anything really bold,” she said. “Clearly now as a state, at least with a leader who is willing to say testing has gone too far, when do we put a stick in a wheel and say, that’s enough, stop? Because we are going to spend the next 10 years trying to slow that wheel down, and we’ve got 10 years of kids that are suffering.”

It also may be a sign of shifting political tides. [...]

What would it take to get a real public uprising against the destruction of our public school system? How do we organize a real movement in favor of having a free, publicly-funded and -run, enriching, engaging and useful education for all of our students?   

 

Published on February 4, 2012 at 6:41 pm

Lists of DC Public Schools With Suspiciously High Wrong-to-Right Erasure Rates


Please note that it was the TESTING COMPANY ITSELF that found all of these erasures over the past three years to be suspicious. They informed the State Superintendent of Education, who has exactly zero power in DC, so she asked Rhee and Henderson and their underlings to investigate. The latter group, naturally, stonewalled, and hushed the whole thing up.

As jaded and as cynical as some of you might think I am, I had no earthly idea that this evident fraud was so widespread. Naturally, Michelle Rhee has now said that she thinks that the USA Today investigation itself is an “insult” to teachers and students. I disagree. I think that everything that Michelle Rhee has done since she quit teaching in Baltimore has been an insult to parents, teachers, students, and honest administrators.

I think that the DC Inspector General’s office needs to investigate this fully and to indict the leaders who caused this fraud to happen.

There was a time when the IG’s office actually did its job and investigated serious crimes and official misdemeanors in DC, but apparently those days are over. In Atlanta, the investigation was far-reaching and has revealed widespread corruption. See this, this, this, and this. The way they got the ‘goods’ on the higher-ups was the usual: low-level teachers and counselors who did the cheating under orders or threats were offered immunity in exchange for truthful testimony.

Here is a link to the USA Today article. Lots of tables!

Is your school on the list?

By the way, classes were “flagged” as suspicious if the number of wrong-to-right erasures on a test was four or more full  standard deviations above the mean. Four standard deviations (or ‘four sigma’)  is a TREMENDOUSLY HUGE increase over the normal number of such erasures. For comparison, there are typically 2 or 3 such erasures on a single student’s test.
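Here is a minimal Python sketch of a flag of this kind. It assumes a hypothetical baseline mean and standard deviation (the real figures and the exact method used for the DC-CAS are not given here), and it simply flags any class whose average wrong-to-right erasure count sits four or more standard deviations above that baseline.

```python
# Hypothetical baseline figures; NOT the testing company's actual numbers
# or its actual procedure.
BASELINE_MEAN = 2.5  # average WTR erasures per student (text says typically 2 or 3)
BASELINE_SD = 1.0    # hypothetical standard deviation of class averages

def z_score(value, mean, sd):
    """How many standard deviations `value` lies above `mean`."""
    return (value - mean) / sd

def flag_classes(class_averages, threshold=4.0):
    """Names of classes whose average WTR erasures are at least
    `threshold` standard deviations above the baseline mean."""
    return [name for name, avg in class_averages.items()
            if z_score(avg, BASELINE_MEAN, BASELINE_SD) >= threshold]

# Hypothetical class averages (WTR erasures per student).
classes = {"Class A": 2.2, "Class B": 4.9, "Class C": 7.0, "Class D": 3.1}
print(flag_classes(classes))
```

Only a class averaging 6.5 or more erasures per student would be flagged under these made-up baseline numbers.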

To put it in more familiar terms, think about the average adult man’s height in the USA: about 5 feet 9.5 inches. Anybody within about 3 inches of that (taller or shorter) is within one “sigma” of the mean – and this means, statistically, that about 68% of all adult males are between 5’7″ and 6’1″ – and that includes this writer.

So, ‘one sigma’ in terms of adult US male height is about 3 inches of variation in height.
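To turn sigmas into heights, here is a minimal Python sketch that uses only the two numbers given above (a mean of 5 feet 9.5 inches and one sigma of about 3 inches) and prints the range covered by one, two, three, and four sigmas on either side of the mean. It is a back-of-the-envelope aid; real height data need not follow these round numbers exactly.

```python
MEAN_IN = 69.5   # about 5 feet 9.5 inches, as stated in the text
SIGMA_IN = 3.0   # one sigma of about 3 inches, as stated in the text

def feet_inches(total_inches):
    """Format a height given in inches as feet'inches"."""
    feet, inches = divmod(total_inches, 12)
    return f"{int(feet)}'{inches:g}\""

for k in range(1, 5):
    low = MEAN_IN - k * SIGMA_IN
    high = MEAN_IN + k * SIGMA_IN
    print(f"within {k} sigma: {feet_inches(low)} to {feet_inches(high)}")
```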

To be over four sigma higher than the mean adult height is to be a giant.  Here is a little table that will allow you to look at what it means. When you look at it, also try to think about the fact that there are just about one hundred million adult males in the US. (100,000,000) So to be one of those 3,200 people who are four sigma above the mean is to be, basically, a freak of nature. And there are only 28 people who are five sigma above the mean. And there are only TWO humans in the entire WORLD (over six BILLION people) who are 6 sigma above the mean. (Source is here.)

Notice that four sigma above the mean (i.e., about 6’9½″ and taller, using the numbers above) doesn’t even show up on this graph!
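The head counts above follow from the normal (bell-curve) distribution. This minimal Python sketch computes the fraction of a normal population lying more than k sigma above the mean and multiplies it by the round figure of 100 million adult US males. It assumes heights are exactly normally distributed, which real heights only approximate, especially far out in the tails. It gives roughly 3,200 men at four sigma and 28 or 29 at five sigma, and about 68% within one sigma.

```python
import math

US_ADULT_MALES = 100_000_000  # the round figure used in the text

def upper_tail(k):
    """P(Z > k) for a standard normal variable Z (exact normality assumed)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Share within one sigma of the mean, for comparison with the 68% figure.
print(f"within 1 sigma: {1 - 2 * upper_tail(1):.1%}")

for k in range(1, 6):
    share = upper_tail(k)
    print(f"more than {k} sigma above the mean: {share:.2e} of men, "
          f"about {share * US_ADULT_MALES:,.0f} people")
```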

However, in DCPS, according to the testing company that scored the DC-CAS (not according to me!) we have HUNDREDS and HUNDREDS of classes where the AVERAGE number of erasures is FOUR SIGMA above the mean.

There is only one reasonable explanation for this situation.

I will soon post documents from the previous investigations, so you can look at them for yourself. Stay tuned!

Guess who’s scoring those ‘brief constructed responses’?

This was written by one of the scorers. You can find the entire article here.

(It’s called “The Loneliness of the Long-Distance Scorer.”)

“Test-scoring companies make their money by hiring a temporary workforce each spring, people willing to work for low wages (generally $11 to $13 an hour), no benefits, and no hope of long-term employment—not exactly the most attractive conditions for trained and licensed educators. So all it takes to become a test scorer is a bachelor’s degree, a lack of a steady job, and a willingness to throw independent thinking out the window and follow the absurd and ever-changing guidelines set by the test-scoring companies. Some of us scorers are retired teachers, but most are former office workers, former security guards, or former holders of any of the diverse array of jobs previously done by the currently unemployed. When I began working in test scoring three years ago, my first “team leader” was qualified to supervise, not because of his credentials in the field of education, but because he had been a low-level manager at a local Target.

“In the test-scoring centers in which I have worked, located in downtown St. Paul and a Minneapolis suburb, the workforce has been overwhelmingly white—upwards of 90 percent. Meanwhile, in many of the school districts for which these scores matter the most—where officials will determine whether schools will be shut down, or kids will be held back, or teachers fired—the vast majority are students of color. As of 2005, 80 percent of students in the nation’s twenty largest school districts were youth of color. The idea that these cultural barriers do not matter, since we are supposed to be grading all students by the same standard, seems far-fetched, to say the least. Perhaps it would be better to outsource the jobs to India, where the cultural gap might, in some ways, be smaller.

“Many test scorers have been doing this job for years—sometimes a decade or more. Yet these are the ultimate in temporary, seasonal jobs. The Human Resources people who interview and hire you are temps, as are most of the supervisors. In one test-scoring center, even the office space and computers were leased temporarily. Whenever I complained about these things, some coworker would inevitably say, “Hey, it beats working at Subway or McDonald’s.”

“True, but does it inspire confidence to know that, for the people scoring the tests at the center of this nation’s education policy, the alternative is working in fast food? Or to know that, because of our low wages and lack of benefits, many test scorers have to work two jobs—delivering newspapers in the morning, hustling off to cashier or waitress at night, or, if you’re me (and plenty of others like me) heading home to start a second shift of test scoring for another company?

“Company communications with test-scoring employees often feel like they have been lifted from a Kafka novel. Scorers working from home almost never talk to an actual human being. Pearson sends all its communications to home scorers via e-mail, now supplemented by automated phone calls telling you to check your inbox. After the start of a project, even these e-mails cease, and scorers are forced to check the project homepage on their own initiative to find out any important changes. Remarkably, for a company entrusted with assessing students’ educational performance, messages from Pearson contain a disturbing number of misspellings, incorrect dates, typos, and missing information. Pearson’s online video orientation, for example, warns scorers that they may face “civil lawshits” from sexual harassment. Error-free communications are rare. I was considering whether this was a fair assessment, when I received a message from Pearson with the subject “Pearson Fall 2010.” The link in the e-mail took me to a survey to find out my availability—for the spring of 2011.”

A rant concerning education

There is fraud in many, many realms of work and human enterprise, including among lawyers, doctors, businessmen, accountants, engineers, policemen, nurses, painters, taxi drivers, politicians, ‘reformers’, housewives, babies, children, students, the retired, stockholders, hunter-gatherers, soldiers, officers, spies, writers like me… (Sorry if I left out your favorite group; I got tired of typing this list.) We are all sometimes crooked, no? Including some teachers.

But I think the problem is deeper. Yes, there is an awful lot of corruption and outright graft in education (as there is in many other areas). But I think that educating and raising the next generation is one of the most important things we can do. The last thing we really want is to have gangs of unemployed, disengaged kids hanging out on street corners, engaging in thuggish and criminal behavior, getting locked up for various offenses, engaging in violence, and so on, regardless of whether their freaking math and reading test scores were ‘proficient’, ‘advanced’, ‘basic’, or ‘below basic’; that’s not really important. What’s important is: are they becoming good human beings, or otherwise? And is it the sole job of the classroom teacher to fix all that? I don’t think he or she could if they tried. And, lord knows, they have been trying. And in the past 10 years they have been forced to work harder and harder, to no real human avail nor real improvement.

One could easily make the argument that we don’t spend nearly enough money on education. Heck, every single student should begin learning a foreign language soon after they learn to write their own. Plus, they should get really good coaching in some sort of physical endeavor (not necessarily a sport). Plus, they should all learn to play a musical instrument and to cook good food. And to appreciate good literature, music, and other cultures. And learn how to use various tools (metal, wood, software, and much, much more).

And to learn how society actually does function, and how it SHOULD work, why it works the way it does instead of the way it should, and to try to figure out ways to get from the actual present situation to an improved one.

We are doing very little of any of this with our most underprivileged young society members. The kids who are raised in our ghettoes very seldom get to learn any of that stuff. Instead, society waits until they do something really, really wrong, and then locks them up. But it’s really, really expensive to keep someone locked up for 30 or 40 years – at about $20,000 per prisoner per year, that’s six hundred thousand to eight hundred thousand dollars ($600,000 to $800,000) per prisoner. It would have been a lot cheaper in the long run to invest in after-school programs to seriously engage students in sports, music, and much, much more, including lots of field trips to museums, zoos, mountains, beaches, factories, farms, and much, much more.
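The cost figures are simple multiplication. Here is a trivial Python sketch using the $20,000-per-prisoner-per-year estimate above; it ignores inflation and any costs beyond that flat annual figure.

```python
COST_PER_YEAR = 20_000  # dollars per prisoner per year, the estimate in the text

def incarceration_cost(years, cost_per_year=COST_PER_YEAR):
    """Total cost of keeping one person locked up (flat annual cost, no inflation)."""
    return years * cost_per_year

for years in (30, 40):
    print(f"{years} years: ${incarceration_cost(years):,}")
```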

Instead, we are narrowing our educational goals more and more onto things that really don’t matter very much at all. (Have you actually LOOKED at the inane questions they ask on these dinky standardized NCLB tests? They were written by people who have absolutely no experience in the real world, or chose to ignore everything they ever learned about it.)

Whether DC-CAS scores go up or down at any school seems mostly to be random!

After reviewing the changes in math and reading scores at all DC public schools for 2006 through 2009, I have come to the conclusion that the year-to-year school-wide changes in those scores are essentially random. That is to say, any growth (or slippage) from one year to the next is not very likely to be repeated the next year.

Actually, it’s even worse than that. The record shows that any change from year 1 to year 2 is somewhat NEGATIVELY correlated with the change between year 2 and year 3. That is, if there is growth from year 1 to year 2, then it is a bit more likely than not that there will be a shrinkage between year 2 and year 3. Or, if the scores got worse from year 1 to year 2, then there is a slightly better-than-even chance that the scores will improve the following year.
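Why might consecutive changes be negatively correlated at all? One possibility (an assumption made here for illustration, not something these data prove) is ordinary year-to-year noise. If a school’s true level never changes and each year’s percentage is that level plus independent random noise, then a lucky year tends to be followed by a drop and an unlucky year by a rise. The minimal Python simulation below uses made-up schools; `simulate_changes` and `noise_sd` are hypothetical names. Under that idealized model the correlation between the two changes comes out near -0.5.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_changes(n_schools=20_000, noise_sd=5.0, seed=1):
    """Hypothetical schools whose true level never changes: each year's
    percent proficient is the school's fixed level plus independent noise.
    Returns the year-1-to-2 and year-2-to-3 changes for every school."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_schools):
        level = rng.uniform(20, 80)  # fixed "true" percent proficient
        y1, y2, y3 = (level + rng.gauss(0, noise_sd) for _ in range(3))
        first.append(y2 - y1)
        second.append(y3 - y2)
    return first, second


d1, d2 = simulate_changes()
print(round(pearson(d1, d2), 2))  # near -0.5 when changes are pure noise
```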

And it doesn’t seem to matter whether the same principal is kept during all three years, or whether the principals are replaced one or more times over the three-year period.

In other words, all this shuffling of principals (and teachers) and turning the entire school year into preparation for the DC-CAS seems to be futile. EVEN IF YOU BELIEVE THAT THE SOLE PURPOSE OF EDUCATION IS TO PRODUCE HIGH STANDARDIZED TEST SCORES. (Which I don’t.)

Don’t believe me? I have prepared some scatterplots, below, and you can see the raw data here as a Google Doc.

My first graph is a scatterplot for DC public schools that kept the same principal from 2005 through 2008. The x-axis shows the change in the percentage of students scoring ‘proficient’ or better on the reading tests from Spring 2006 to Spring 2007; the y-axis shows the same change from Spring 2007 to Spring 2008.
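Each dot in these scatterplots is one school. Here is a short Python sketch of how one school’s point could be computed from three years of percent-proficient figures. The schools and numbers are invented, and `scatter_point` is a hypothetical helper, not something from the original spreadsheet:

```python
# Hypothetical percent-proficient figures, for illustration only.
schools = {
    "School A": {2006: 40.0, 2007: 46.5, 2008: 43.0},
    "School B": {2006: 55.0, 2007: 51.0, 2008: 57.5},
    "School C": {2006: 30.0, 2007: 28.0, 2008: 26.5},
}


def scatter_point(pct, y1, y2, y3):
    """Return (x, y): the change from year y1 to y2 and the change
    from year y2 to y3, both in percentage points."""
    return pct[y2] - pct[y1], pct[y3] - pct[y2]


points = {name: scatter_point(pct, 2006, 2007, 2008)
          for name, pct in schools.items()}
for name, (x, y) in points.items():
    print(f"{name}: x = {x:+.1f}, y = {y:+.1f}")
```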

If there were a positive correlation between the two time intervals in question, then the schools would cluster mostly in the first and third quadrants. That would mean that if a school’s scores grew from ’06 to ’07, they also grew from ’07 to ’08; and if they went down from ’06 to ’07, they also declined from ’07 to ’08.
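The quadrant bookkeeping can be made explicit with a small Python sketch. It uses the standard numbering (1 = up then up, 2 = down then up, 3 = down then down, 4 = up then down); the pairs of changes are made up for illustration:

```python
from collections import Counter


def quadrant(x, y):
    """Quadrant of the point (x, y): 1 = up then up, 2 = down then up,
    3 = down then down, 4 = up then down. Points on an axis return 0."""
    if x > 0 and y > 0:
        return 1
    if x < 0 and y > 0:
        return 2
    if x < 0 and y < 0:
        return 3
    if x > 0 and y < 0:
        return 4
    return 0  # no change in at least one of the two intervals


# Hypothetical (first change, second change) pairs, in percentage points
pairs = [(6.5, -3.5), (-4.0, 6.5), (-2.0, -1.5), (3.0, 2.0), (5.0, -1.0)]
counts = Counter(quadrant(x, y) for x, y in pairs)
print(sorted(counts.items()))  # [(1, 1), (2, 1), (3, 1), (4, 2)]
```

A positive correlation piles schools into quadrants 1 and 3; a negative one piles them into quadrants 2 and 4.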

But that’s not what happened. In fact, in the 3rd quadrant, I see only one school (apparently M.C. Terrell) where the scores went down during both intervals. However, there are about as many schools in the second quadrant as in the first quadrant. Being in the second quadrant means that the scores declined from ’06 to ’07 but then rose from ’07 to ’08. And there appear to be about 7 schools in the fourth quadrant. Those are schools where the scores rose from ’06 to ’07 but then declined from ’07 to ’08.

I asked Excel to calculate a linear regression line of best fit between the two sets of data, and it produced the line that you see, slanting downward to the right. Notice that R-squared is 0.1998, which is rather weak. R itself, the correlation coefficient, is the square root of R-squared, taking the sign of the slope; my calculator gives me -0.447. That means the growth (or decline) from ’06 to ’07 is negatively correlated with the growth (or decline) from ’07 to ’08, but not strongly.
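For readers who want to check this kind of figure without Excel, here is a minimal Python sketch of an ordinary least-squares fit on made-up data. With a single predictor, R-squared is simply the square of Pearson’s correlation coefficient r, and r has the same sign as the slope, which is why a line slanting down to the right goes with a negative R. (Excel’s CORREL and RSQ worksheet functions return r and R-squared directly.) The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept, plus Pearson r
    and R-squared (with one predictor, R-squared equals r squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5  # same sign as the slope
    return slope, intercept, r, r * r


# Hypothetical changes in percentage points, not the actual DC data
x = [6.5, -4.0, -2.0, 3.0, 5.0, -1.0]
y = [-3.5, 6.5, -1.5, 2.0, -1.0, 4.0]
slope, intercept, r, r2 = fit_line(x, y)
print(f"slope={slope:.3f}  r={r:.3f}  R^2={r2:.3f}")
```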

OK. Well, how about during years ’07-’08-’09? Maybe Michelle Rhee was better at picking winners and losers than former Superintendent Janey? Let’s take a look at schools where she allowed the same principal to stay in place for ’07, ’08, and ’09:

Actually, this graph looks worse! There are nearly twice as many schools in quadrant four as in quadrant one! That means that there are lots of schools where reading scores went up between ’07 and ’08, but DECLINED from ’08 to ’09; but many fewer schools where the scores went up both years. In the second quadrant, I see about four schools where the scores declined from ’07 to ’08 but then went up between ’08 and ’09. Excel again provided a linear regression line of best fit, and again, the line slants down and to the right. R-squared is 0.1575, which is low. R itself is about -0.397, which is, again, rather low.

OK, what about schools where a principal got replaced? If you believe that all veteran administrators are bad and need to be replaced with new ones with limited or no experience, you might expect to see negative correlations, but with positive overall outcomes; in other words, the schools should cluster in the second quadrant. Let’s see if that’s true. First, reading changes over the period 2006-’07-’08:

Although there are schools in the second quadrant, there are also a lot in the first quadrant, and I also see more schools in quadrants 3 and 4 than in the first two graphs. According to Excel, R-squared is extremely low: 0.0504, so R is about -0.224. In other words, it is almost impossible to predict the change in one year from the change in the previous year.

Well, how about the period ’07-’08-’09? Maybe Rhee did a better job of changing principals then? Let’s see:

Nope. Once again, it looks like there are as many schools in quadrant 4 as in quadrant 1, and considerably fewer in quadrant 2. (To refresh your memory: if a school is in quadrant 2, then the scores went down from ’07 to ’08, but increased from ’08 to ’09. That would represent a successful ‘bet’ by the superintendent or chancellor. However, if a school is in quadrant 4, that means that reading scores went up from ’07 to ’08, but went DOWN from ’08 to ’09; that would represent a losing ‘bet’ by the person in charge.) Once again, the regression line slants down and to the right. The value of R-squared, 0.3115, is higher than in any previous scatterplot (I get R = -0.558), so this is the strongest negative correlation yet, which is not a good sign if you believe that superintendents and chancellors can read the future.
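Since each of these trend lines slopes downward, R is the negative square root of the R-squared value Excel reports. A quick Python check of the four reading figures quoted in this post (`r_from_r2` is a hypothetical helper name):

```python
import math

# R-squared values reported by Excel for the four reading scatterplots.
# All four trend lines slope down to the right, so R is the negative root.
reported_r2 = {
    "reading, same principal, '06-'07-'08": 0.1998,
    "reading, same principal, '07-'08-'09": 0.1575,
    "reading, principal replaced, '06-'07-'08": 0.0504,
    "reading, principal replaced, '07-'08-'09": 0.3115,
}


def r_from_r2(r2, slope_negative=True):
    """Recover Pearson r from R-squared; the sign must come from the slope."""
    r = math.sqrt(r2)
    return -r if slope_negative else r


for label, r2 in reported_r2.items():
    print(f"{label}: R = {r_from_r2(r2):.3f}")
```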

Perhaps things are more predictable with mathematics scores? Let’s take a look. First, changes in math scores during ’06-’07-’08 at schools that kept the same principal all 3 years:

Doesn’t look all that different from our first Reading graph, does it? Now, math score changes during ’07-’08-’09, schools with the same principal all 3 years:

Again, a weak negative correlation. OK, what about schools where the principal changed at least once? First, look at ’06-’07-’08:

And how about ’07-’08-’09 for schools with at least one principal change?

Again, a very weak negative correlation, with plenty of ‘losing bets’.

Notice that every single one of these graphs presented a weak negative correlation, with plenty of what I am calling “losing bets” – by which I mean cases where the scores went up from the first year to the second, but then went down from the second year to the third.

OK. Perhaps it’s not enough to change principals once every 3 or 4 years. Perhaps it’s best to do it every year or two? (Anybody who has actually been in a school knows that when the principal gets replaced frequently, then it’s generally a very bad sign. But let’s leave common sense aside for a moment.) Here we have scatterplots showing what the situation was, in reading and math, from ’07 through ’09, at schools that had 2 or more principal changes from ’06 to ’09:

and

This conclusion is not going to win me lots of friends among those who want to use “data-based” methods of deciding whether teachers or administrators keep their jobs, or how much they get paid. But facts are facts.

==============================================================================

A little bit of mathematical background on statistics:

Statisticians say that two quantities (let’s call them A and B) are positively correlated when an increase in one quantity (A) is linked to an increase in the other quantity (B). An example might be a person’s height (quantity A) and the length of that person’s foot (quantity B). Generally, the taller you are, the longer your feet are. Yes, there are exceptions, so these two things don’t have a perfect correlation, but the connection is pretty strong.

If two things are negatively correlated, that means that when one quantity (A) increases, the other quantity (B) decreases. An example would be the speed of a runner versus the time it takes to run a given distance. The higher the speed at which the athlete runs, the less time it takes to finish the race. And if you run at a lower speed, then it takes you more time to finish.

And, of course, there are things that have no correlation to speak of.
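Both examples can be put into numbers with a short Python sketch. The measurements are invented for illustration, and `pearson` is a hand-written helper computing Pearson’s correlation coefficient:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical heights and foot lengths, in centimetres
heights_cm = [150, 160, 170, 180, 190]
foot_cm = [23.0, 24.5, 25.5, 27.0, 28.0]

# Time to run 100 m at several hypothetical constant speeds: t = d / v
speeds = [5, 6, 7, 8, 9, 10]       # metres per second
times = [100 / v for v in speeds]  # seconds

print(round(pearson(heights_cm, foot_cm), 3))  # positive correlation
print(round(pearson(speeds, times), 3))        # negative correlation
```

The speed-versus-time correlation is strong but not exactly -1, because time = distance / speed traces a curve rather than a straight line.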

Published on March 13, 2010 at 3:37 pm