Kaya Henderson Really Doesn’t Know How to Run a School System

DCPS Chancellor Kaya Henderson told the city two days ago, “I want to be clear. We know what we need to do, and we have what it takes to get it done.”

That is patently untrue.

Even by her own yardsticks, namely test scores, Henderson and her kind of ‘reform’ have so far been a complete failure; continuing the churn-and-test-prep regime won’t make it any better.

As I wrote in a comment on the article in the Washington Post:

All of Henderson’s boasts of continuous progress are completely bogus. 
If you look at the scores on the DC-CAS for every single subgroup, you can see that they have stagnated since 2009, which was the year before Rhee, Kamras and Henderson implemented their trademark reforms (IMPACT, TLF, VAM “merit pay” and eliminating seniority protections for teachers). The gaps between white students and hispanic or black students have NOT narrowed since that time. There were some increases from 2006-2009, but it’s not clear how much of that was due to adults cheating, or simply because students and teachers were adapting to a brand-new test. (You may recall that the DC-CAS was administered for the very first time in 2006, and the percentages of kids deemed ‘proficient’ dropped quite a bit in comparison to what they were under the old test, especially in math.) 
Also: out of the 78 measurable goals set by Rhee and four large foundations, in order to earn that $64.5 million grant in 2009, the DCPS leadership has achieved a mere one and one-half of those goals (and I’m being generous with the one-half). That is a success rate of TWO PERCENT. 
In other words, Rhee and Henderson have an almost perfect record of failure, none of which is publicized by the media (esp. not WaPo editorial staff) but is easy to see if you look at the official OSSE statistics and are willing to dig a little bit.  
I’ve done some digging and have made some pretty easy-to-understand graphs showing how much Rhee and Henderson have failed. Look at my blog, gfbrandenburg.wordpress.com , and in particular at http://bit.ly/10mna8c , http://bit.ly/10mneEY , and http://bit.ly/1ptal1K . 
After you read those blog posts, can you explain to me why Kaya Henderson still has a job? It is so clear that mayoral control has been a complete failure!

Did Rheeformers Rhee and Henderson Actually Close Any of Those Achievement Gaps in DC Public Schools?

Part Sixteen and Final

Today we look at the black-white and hispanic-white achievement gaps in the Washington, DC public school system, which has now been under mayoral control for seven full years.

My four graphs and tables today will show how laughably pitiful their claims of success really are.

You will see that the achievement gap is pretty much unchanged since the year I retired (2009), but the gap between Rhee’s promises and reality has been getting wider and wider.

A lot of their promises had to do with closing the ‘achievement gaps’ between white and more-affluent students on the one hand, and black, hispanic, and impoverished students on the other hand. As you probably are aware, standardized test scores are very strongly linked to family income and educational levels. You may not be aware that the white population of Washington DC is generally very well-educated and fairly affluent (unlike rural white populations in, say, West Virginia or Kentucky). Washington has the highest-scoring white student body in the nation on the National Assessment of Educational Progress (NAEP), and the widest gap between the scores of white students and of hispanic or black students.

However, Michelle Rhee and her minions promised spectacular reductions in those gaps, as measured by the relative percentages of students scoring ‘proficient’ or ‘advanced’ on the DC-CAS among white students, hispanic students, black students, and students who are eligible for free and reduced-price lunches (versus those not eligible).

What I found is a complete and utter failure to make any progress whatsoever since 2008 or 2009 — the year that Rhee twisted the arms of every single principal in the school system to come up with miraculous gains, and when many of those principals (and teachers) engaged in cheating to boost the scores.

As usual, don’t just take my word for it. Look at the following four graphs and check my sources if you like.

With these graphs and tables, low numbers are GOOD because that means that the gap between white students on the one hand and black or hispanic students on the other is getting smaller. High numbers are BAD because the gap is getting bigger.
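To make the measure concrete, here is a minimal sketch of how such a gap can be computed. The percentages are hypothetical placeholders, not actual DC-CAS results, and the function name `achievement_gap` is invented for illustration.

```python
def achievement_gap(pct_white, pct_other):
    """Gap, in percentage points, between two groups' proficiency rates."""
    return pct_white - pct_other

# Hypothetical percentages of students scoring proficient or advanced
# (illustrative only; NOT actual DC-CAS data).
earlier_gap = achievement_gap(pct_white=90.0, pct_other=40.0)
later_gap = achievement_gap(pct_white=92.0, pct_other=42.0)

print(earlier_gap)              # 50.0
print(later_gap)                # 50.0
print(later_gap - earlier_gap)  # 0.0
```

In this made-up case both groups' scores rose by two points, yet the gap did not narrow at all, which is why the graphs track the gap itself rather than each group's raw percentage.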

You will notice that each graph has a solid black line — that represents what really happened.

Each graph also has a dotted red line. It represents how much Rhee et al promised that things would improve.

I don’t exactly know what they were smoking when they made those promises, but it seems like they were hallucinating that by WILL alone, and by replacing all the veteran teachers and administrators with untrained, unqualified and inexperienced newbies from TFA or TNTP, they would achieve miracles.

Again, see for yourself.

First we look at the gaps between the scores of black and white students, in math, on the DC-CAS, from 2007-2014.

promised and actual math black-white gaps, 2007-2014

Since 2009, the year that Rhee and many principals were outed as cheaters by a lengthy series of reports in USA Today, you can see that there has in fact been no progress in closing the gap. The prediction is the red, dotted line. The actual performance is the black line, which is essentially horizontal after 2009.

Now let’s look at the black-white achievement gap in reading:

promised and actual reading black-white gaps, 2007-2014

In this case, the gap between the scores of black and white students — as shown by the solid black line — has actually been growing slightly wider since 2008! As in the previous graph, the totally imaginary promises of Rhee and Henderson are the red, dotted line – a line which got farther and farther away from the truth every single year.  Some accomplishment, Rhee and Henderson and Gray!

Thirdly, we look at the gaps between hispanic and white students in math:

promised and actual hispanic-white math gaps, 2007-2014

We see here that the black line has been wiggling up and down since 2009, with the result that the gap for 2014 is almost exactly the same as the gap in 2009, while we were promised miracles. Once again, there is a very important gap that is getting much wider: the gap between the prediction and reality.

My last table and graph for the day concern the achievement gap for reading, between hispanic and white students.

promised and actual hispanic-white reading gaps, 2007-2014

As you can see, this achievement gap is now actually a bit wider than it was in either 2008 or 2009. And the gap between those promises and reality got steadily wider and wider.

Some people have told me that I’m being unfair, because Rhee and Henderson, under mayoral control, have been making tremendous progress in raising test scores and in closing the achievement gaps. I hope that this post sets the record straight: they have in fact made NO progress in closing the achievement gaps, and their predictions became more and more laughable as time went on.

Can someone explain to me why Kaya Henderson still has a job as chancellor of DC public schools?


This is my last post in this series of articles.

I’ve been examining the miraculous gains that were promised in the troubled Washington, DC public school system to see whether any of those 78 goals were reached.

Rhee and Henderson actually accomplished one and a half out of those 78 goals.
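The arithmetic behind the "two percent" figure quoted earlier can be checked in a few lines. This is only a sketch: the two numbers come straight from the text, and the variable names are invented for illustration.

```python
goals_total = 78   # measurable goals, as stated in the text
goals_met = 1.5    # "one and a half", as stated in the text

success_rate = goals_met / goals_total * 100
print(f"{success_rate:.1f}%")  # 1.9%
```

Rounded to the nearest whole number, that is about two percent.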

It is true that there have been steady improvements on the scores of DCPS students (all groups) in math on the NAEP — but those improvements began in the 1990s, a decade before Mayor Adrian Fenty got the wacky idea of hiring a totally unqualified sociopathic liar (Michelle Rhee) as Chancellor. There were also some fairly large gains in DC-CAS test scores during the first two years it was given, but that’s normal. As far as I have seen, any time any school district adopts a new standardized test, students’ test scores plummet the first year, but then rise after a year or two, as the teachers and students get used to the new format.

The sources I used to compile this data are here and here. My fifteen previous posts on this topic can be found here:

The saga so far:

What I actually had time to say …

Since I had to abbreviate my remarks, here is what I actually said:

I am Guy Brandenburg, retired DCPS mathematics teacher.

To depart from my text, I want to start by proposing a solution: look hard at the collaborative assessment model being used a few miles away in Montgomery County [MD] and follow the advice of Edwards Deming.

Even though I personally retired before [the establishment of the] IMPACT [teacher evaluation system], I want to use statistics and graphs to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

Celebrated mathematician John Ewing says that VAM is “mathematical intimidation” and a “modern, mathematical version of the Emperor’s New Clothes.”

I agree.

One of my colleagues was able to pry the value-added formula [used in DC] from [DC data honcho] Jason Kamras after SIX MONTHS of back-and-forth emails. [Here it is:]

value added formula for dcps - in mathtype format

One problem with that formula is that nobody outside a small group of highly-paid consultants has any idea what the values of any of those variables are.

In not a single case has the [DCPS] Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student and class by class.

Nor has that office shared that data with the Washington Teachers’ Union.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, for every single class a teacher has.

Now let’s look at some statistics.

My first graph is completely random data points that I had Excel make up for me [and plot as x-y pairs].

pic 3 - completely random points

Notice that even though these are completely random, Excel still found a small correlation: r-squared was about 0.08 and r was about 29%.
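The same experiment can be sketched outside Excel. The example below is an illustration under stated assumptions: the sample size of 20 and the seed are arbitrary choices (the testimony does not say how many points were plotted), and `pearson_r` is a hand-written helper, not the routine Excel uses.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)  # arbitrary seed so the run is repeatable
n_points = 20   # assumed sample size, not taken from the testimony
xs = [random.random() for _ in range(n_points)]
ys = [random.random() for _ in range(n_points)]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}, r-squared = {r * r:.2f}")

# An r-squared of 0.08 corresponds to |r| = sqrt(0.08):
print(round(math.sqrt(0.08), 2))  # 0.28
```

With a small sample, purely random points will rarely give a correlation of exactly zero, so a small nonzero r by itself says nothing about a real relationship. The square root of 0.08 is roughly in line with the "about 29%" quoted above.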

Now let’s look at a very strong case of negative correlation in the real world: poverty rates and student achievement in Nebraska:

pic  4 - nebraska poverty vs achievement

The next graph is for the same sort of thing in Wisconsin:

pic 5 - wisconsin poverty vs achievement

Again, quite a strong correlation, just as we see here in Washington, DC:

pic 6 - poverty vs proficiency in DC

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for [DCPS in] SY 2009-10, and I plotted them [as you can see below].

pic 7 - VAM versus TLF in DC IMPACT 2009-10

I would say this looks like pretty much no correlation at all. It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements [in DCPS] over time? Unfortunately, since DCPS keeps all the data hidden, we don’t know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to [find out]. Here is one such graph [showing how the same teachers, in the same schools, scored in 2008-9 versus 2009-10]:

pic 8 - value added for 2 successive years Rubenstein NYC

That is very close to random.

How about teachers who teach the same subject to two different grade levels, say, fourth-grade math and fifth-grade math? Again, random points:

pic 9 - VAM for same subject different grades NYC rubenstein

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007. Look for yourself [in the next two graphs].

pic 11 - naep 8th grade math avge scale scores since 1990 many states incl dc


pic 12 naep 4th grade reading scale scores since 1993 many states incl dc

Notice that gains began almost 20 years ago, long before mayoral control or chancellors Rhee and Henderson, long before IMPACT.

To repeat, I suggest that we throw out IMPACT and look hard at the ideas of Edwards Deming and the assessment models used in Montgomery County.

Recent Articles Against Race to the Trough and other Deformations of US Public Education

Bob Schaeffer of FairTest has been compiling weekly lists of good articles that give a view from ordinary schools and households on what it’s been like under NCLB and its successor, RTTT. Here’s Schaeffer’s latest list.   — gfb


Assessment reform pressure continued to escalate even as Hurricane Sandy slammed ashore.  Best wishes to our friends and allies in the mid-Atlantic states as they recover from the storm.

Arne Duncan’s Legacy: Doubling Down on High Stakes Testing Failures

Texas Tests Breed Schools for Scandal

Testing in Kindergarten — Whatever Happened to Story Time?

Hudson Valley Parents Rip Excess Testing

Data Missing for School Improvement Grant Claims

The MLK Imperative in an Era of “No Excuses”

Researchers Urge “Caution” in Use of Value-Added Scores

Measuring the Worth of a Teacher

The Naked Emperor: What Test Scores Don’t Tell Us

Superintendent Dissects Race to the Trough’s Flaws

Bob Schaeffer, Public Education Director
FairTest: National Center for Fair & Open Testing
ph-   (239) 395-6773    fax-  (239) 395-6779
cell-  (239) 699-0468
web- http://www.fairtest.org

Where the data came from

I neglected to give the source for the data for my last two posts. It’s at the website for what looks like a NYC radio or TV station:




if you prefer it shorter.

I will warn you that some of the spreadsheets are quite large.

BTW, I just now did a graph showing how well New York City does at predicting the value-added scores of its teachers for school year 2007-2008. The answer seems to be, not very well. Here is the scatter plot:

The correlation is, again, close to zero, even though NYC’s department of assessment and numerology has done their best to try to get it right. In fact, even though the line of best fit doesn’t fit very well, you notice that it slopes downwards to the right. That means that with kids who are predicted to improve relative to the previous year, teachers’ value-added scores are, in general, lower than predicted; whereas with kids who are predicted to do worse than the previous year, teachers’ value-added scores are, in general, a tad higher than predicted.
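For readers who want to see what a downward-sloping line of best fit means numerically, here is a minimal least-squares sketch. The five data pairs are hypothetical and are not drawn from the NYC spreadsheets; the function name and the choice of axes (predicted change on x, value-added score on y) are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data (NOT from the NYC release):
# x = predicted change for a teacher's students, y = teacher's value-added score
predicted_change = [-2.0, -1.0, 0.0, 1.0, 2.0]
value_added = [0.5, -0.3, 0.2, -0.4, -0.1]

slope, intercept = least_squares_line(predicted_change, value_added)
print(f"slope = {slope:.2f}")  # negative: the line slopes downward to the right
```

A negative slope here means that, on average, the larger the predicted improvement, the lower the value-added score, which is the pattern described above. With points this scattered, the slope is also small compared with the spread of the data.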

Not ready for prime time. And not ready to be used to base hiring and firing and bonus decisions on.

An Apology? And a look at the DQC website, as I should have done earlier

I’m thinking of asking the DQC leadership for an apology for not even allowing me to stay and observe their national summit last week.

Even though I had pre-registered for the conference.

Even if I were to solemnly promise to sit there like a good boy and not bother anybody by giving them a 2-page commentary I had written, I could not remain.

Nor even if I promised to raise my hand politely before either being ignored or called on, at their pleasure…

I guess they seriously consider me a dangerous crank. But you know something? There seem to be a lot of teachers in the classrooms or retired, old or young, as well as parents and students, who make pretty much the same points that I do. Often they make their points much more eloquently and clearly than I do on their blogs, in replies to other folks’ comments, on Facebook, and elsewhere. My major (very small, but not without significance) contribution to the debate on NCLB and the future of education has been mostly to supply some data showing that the Educational DEform movement does not seem to be based on real data.

I am still working on the wording of the letter to DQC, and am trying to figure out what to ask for.* I sent out a couple of drafts, got considerable help from those folks who edited the drafts, and now need to somehow combine all the contributions and make my letter to DQC first of all a lot shorter than it is.

Was this official DQC policy that was spelled out that Guy Brandenburg is so special that he can’t come to any of our conferences? Or more broadly, anybody who disagrees with Arne Duncan or Michelle Rhee cannot attend? Or that anybody will be barred who raises the idea that there are serious problems with a lot of the data that is being collected, and that a lot of these mathematical models are built upon an utterly shaky and weak house of cards, and give wrong results? (I refer to the sudden surge, across the nation, in “Value-added methods”.)

Or was it just someone local who really hates me? Gosh, I didn’t think I was so infamous. Should I be flattered?

Meanwhile, I began looking closer at the DQC site, which I had only glanced at earlier.

A lot of what I saw, or didn’t see, shocked me.

I had no idea that this group has been around for several years and has been busy attempting essentially to federalize/nationalize all data about schools and everything else, in all 50 states and all the counties and school boards and city councils and individual schools and teachers and classrooms. I think it’s kinda scary when anybody at all has that much information about you, AND THAT SO MUCH OF IT IS WRONG!

It’s clear now that there is a lot of fraudulent test data in a lot of schools and school districts across the country, in many cities and states. Some states have gone after the cheating problem with serious investigations that netted scores, or is it hundreds, of confessions, and long lists of people charged. A lot of folks who understand a little bit about human nature said, “I told you so. When people are put under pressure to keep their jobs by meeting impossible goals, then they cheat. People cheat to achieve large bonuses, as many bankers, politicians, and businessmen bear witness.” (Some do a little time in jail, but generally they get away with their riches with merely some bad headlines in the press for a week or so.) They also point out, “When you dangle rewards in front of people, you get lousy results. People do their best work because they really want to, deep down, not for a paycheck.” (Confirmed in many studies of learning and human response, and once again by Dr. Roland Fryer of Harvard, as I’ve mentioned earlier. Search this blog, top right of this page.)

And this fraud has sometimes devastating effects for honest teachers, and doesn’t help the kids at all. For example, suppose that teacher T has an average-achieving class as scored on the NCLB test. The kids are promoted to the next grade, where they have teacher U. Teacher U, with or without the knowledge of anybody else, cheats, somehow or other, and raises the kids’ scores significantly above where they would have been without the cheating. [There are many ways of doing it, I realized by doing some brainstorming with some other folks a while ago. Don't think I'm going to list them, at least not here. Never tried any of them!!!!!]

So, Teacher U earns a big fat bonus check, her name on certificates and programs, perhaps other benefits as well, like a promotion to master educator so she can get out of the classroom… Who knows?

Next year, those kids go to teacher V. Teacher V receives the kids, and eventually notices an enormous discrepancy between the kids’ scores from the previous spring and their achievements on tests made by the school district’s own publisher, given in the second week of school, way before teacher V had learned all of their mothers’ last names by heart or had had much time to have much of an impact on the kids at all. I mean, how could all of those kids go from “Advanced” or “Proficient” to “Basic” and “Below Basic” in the time from June 20 to about August 25? That sort of thing happens only in the magical world of high-stakes, corruptible testing as we have now.

Kids in V’s class who should have had an extra class in math or reading or whatever because they are so far behind, don’t receive those services, because they are supposedly well above average. Sometimes those extra classes really help, too.

So there is teacher V with a class of kids with contradictory data: last year’s data, which showed wonderful scores, and all of this year’s data, starting with the very first official, district-wide, standardized, multiple-choice, machine-scored, commercially-produced and -marketed test; a test supposed to “inform” the instructor what he/she should instruct. Teacher V doesn’t cheat; instead he/she works his/her butt off trying to use all that data to determine what to teach, how and when. But V still does not perform miracles. (We know that miracles are, at best, and by definition, rather rare.) In other words, V’s students score much lower on the NCLB test than the previous year’s test would supposedly predict, according to the incomprehensibly complex mathematical algorithm that the school district is using. (There are many such models; they are not all the same.)

Teacher V is notified in late July that he/she is terminated because of those bad test scores.

Remember teacher U, who cheated? He or she might be the teacher leader for that grade level next year. And might show the others just how it’s done…

But there seems to be no awareness of this by our ‘friends’ at DQC.

We all know people who have lied to us. And we’ve all told lies before. All of us do it from time to time. We only disagree about exactly how much, and when we tell those lies, and whether they are justifiable or not. Why doesn’t DQC seem to emphasize the riskiness in all of this?

Curious, and knowing that there was no time to search through the enormous DQC website by mouse, I used their “search” engine to look for a few terms, to see if they even consider the concepts of fraud and cheating.

Here is the number of hits I got for each term on their website:

Fraud – 0 hits. Zero.

Cheat – 1 hit; the reference pointed to a page on theft of social security numbers.

dishonest – 0 hits.

data – 750 hits on their website (that was just to make sure the search engine was working!)

erasure – 0.

erase – 0.

form – 400 (yup, it’s still working. They are very much into forms and other paperwork…)

erasing – 0 {I didn’t know if it would consider erasure, erase, and erasing differently. I still don’t, but none of them turned up anything}.

jeopardy – 0.

risk – 9 .

This last one is quite instructive. When I got those 9 hits from the word “risk”, one of the hits was the following, and I swear that I am not making this up:

“Guide to Protecting the Confidentiality of Personally Identifiable Information (PII).”

Not that that’s a bad thing, of course. Had you come across the acronym PII before? Wow. They’ve really been at this for a while, haven’t they? An entirely new set of acronyms…

The entire bureaucratic mind-set is stunning. Reminds me of nothing more than the US military and federal government bureaucracies, and I don’t know about you, but they scare me. I want as little as possible to do with them; and with that, my sympathies definitely lie with the people who rose up and fought a revolutionary war in this country for about 7 years and defeated the greatest naval and military power in the world at that time.  They didn’t want something like Big Brother watching over them. Not just the right-wingers want some privacy! I mean, I think everybody has a right to their own private life, and unless you’re the type of weirdo that likes everybody to know everything about you, there’s a lot that I, and just about all of you as well, want to keep private.

After all, for one thing, medical conditions can sometimes cause really enormous financial problems (like going bankrupt) if the wrong people find out about them — namely insurance companies. (Irony of ironies!)

Now some of the stuff on the DQC website seems very reasonable and logical, and I find nothing to object to at all. Individually.

But it seems to me that they really think that putting all of the quantifiable data on every person into a single database will immediately produce policy conclusions that must be promptly put into effect.

Not so fast.

What if the data you are using is utterly fraudulent in the first place?

Wouldn’t that then be like the phony data described in various books and stories by Gulag prisoners and others in the USSR or in Kafka stories – perverse incentives to do useless things to survive by cheating? (I don’t think Kafka ever quite figured out how some people can prosper and live quite beautiful lives of luxury by cheating others.)

Are those formulas really valid? After all, you aren’t using all of that data in those conclusions you just drew. You cherry-picked the numbers you wanted to get the result you wanted using a formula of your own choosing, producing the result in a clearly visible way, or a way that appeals to certain decision-makers.

What if we use different data, different formulas, and throw out stuff that we think is invalid? Well, we’d probably get different results. And what about all of those internal TFA survey documents that apparently show no positive results, or else they would have been trumpeted in the national media?

Instead, you could use this as a data base for social “scientists” to examine and try a number of different models, and see whether stuff works. But unlike with drug or medical companies, we need to publicize the results of experiments with NEGATIVE results as well. (BTW: it’s not only companies like Glaxo or Pfizer that do this sort of selective data blackout. I remember looking at the NIH alternative medicine website a few years ago to see what the results were for the various experiments that NIH had paid good money to fund – you know, on acupuncture and other stuff like that. I did this search many years after the studies had started. But NONE of them had any published results.)

Does that mean that they were all incompetent and couldn’t finish their studies?

In that case, they should pay us (i.e., the public in the form of NIH) back with interest or penalties or something.

Or does it mean that the results weren’t published because they did not show what the authors WANTED them to show?

I suspect the latter.

It also really began to bother me that so many person-years of highly skilled, highly-paid labor had been poured into this project, linking up pretty much all of the data on every single person in every state, county, city, and every single teacher and school and every student, just about completely out of the limelight. For lots and lots of money, in other words, some of it privately raised, but I’ll bet there is a lot of taxpayer money in there as well in one way or another.

Keep in mind that a relative handful of testing-and-data companies will earn enormous revenues from processing this enormous database.

While none of us citizens have had a say on this at all.

And while so much of the data is flat out WRONG.

Which seems to be utterly ignored.

Be for real: In the entire Data Quality Campaign, is there no written acknowledgment that there are cheating and erasures on high-stakes tests? No procedures to discount and correct for that? No calls for additional forensic studies of cheating patterns?

Answer: No. The word “forensic” returned no hits at all on their website.

That’s pretty disturbing.

Or did I miss something?


* no, not for money! I think that the views of teachers, parents and students in the trenches, so to speak, should be heard, and the problems with fraud and misrepresentation in test data are very serious and must be discussed. Roland Fryer’s conclusions also need to be aired. {and I speak as someone who has disagreed with him, in print, in the past.}

