What did Education Reform in DC Actually Mean?

Short answer: nothing that would actually help students or teachers. But it made for well-padded resumes for a handful of insiders.

This is an important review, by the then-director of assessment. His criticisms echo the points that I have been making along with Mary Levy, Erich Martel, Adell Cothorne, and many others.

Nonpartisan Education Review / Testimonials

Access this testimonial in .pdf format

Looking Back on DC Education Reform 10 Years After, 

Part 1: The Grand Tour

Richard P Phelps

Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). My tenure coincided with Michelle Rhee’s last nine months as Chancellor. I departed shortly after Vincent Gray defeated Adrian Fenty in the September 2010 DC mayoral primary.

My primary task was to design an expansion of the testing program that served the IMPACT teacher evaluation system, to include all core subjects and all grade levels. Despite its fame (or infamy), the test-score aspect of the IMPACT program affected only 13% of teachers: those teaching either reading or math in grades four through eight. Only those subjects and grade levels had the pre- and post-tests required for teacher “value added” measurements (VAM). Not included were most subjects (e.g., science, social studies, art, music, physical education), grades kindergarten to two, and high school.

Chancellor Rhee wanted many more teachers included. So, I designed a system that would cover more than half the DCPS teacher force, from kindergarten through high school. You haven’t heard about it because it never happened. The newly elected Vincent Gray had promised during his mayoral campaign to reduce the amount of testing; the proposed expansion would have increased it fourfold.

VAM affected teachers’ jobs. A low value-added score could lead to termination; a high score, to promotion and a cash bonus. VAM as it was then structured was obviously, glaringly flawed,[1] as anyone with a strong background in educational testing could have seen. Unfortunately, among the many new central office hires from the elite of ed reform circles, none had such a background.

Before posting a request for proposals from commercial test developers for the testing expansion plan, I was instructed to survey two groups of stakeholders—central office managers and school-level teachers and administrators.

Not surprisingly, some of the central office managers consulted requested additions or changes to the proposed testing program where they thought it would benefit their domain of responsibility. The net effect on school-level personnel would have been to add to their administrative burden. Nonetheless, all requests from central office managers would be honored. 

The Grand Tour

At about the same time, over several weeks of the late Spring and early Summer of 2010, along with a bright summer intern, I visited a dozen DCPS schools. The alleged purpose was to collect feedback on the design of the expanded testing program. I enjoyed these meetings. They were informative, animated, and very well attended. School staff appreciated the apparent opportunity to contribute to policy decisions and tried to make the most of it.

Each school greeted us with a full complement of faculty and staff on their days off, numbering several dozen educators at some venues. They believed what we had told them: that we were in the process of redesigning the DCPS assessment program and were genuinely interested in their suggestions for how best to do it.

At no venue did we encounter stand-pat knee-jerk rejection of education reform efforts. Some educators were avowed advocates for the Rhee administration’s reform policies, but most were basically dedicated educators determined to do what was best for their community within the current context. 

The Grand Tour was insightful, too. I learned for the first time of certain aspects of DCPS’s assessment system that were essential to consider in its proper design, aspects of which the higher-ups in the DCPS Central Office either were not aware or did not consider relevant. 

The group of visited schools represented DCPS as a whole in appropriate proportions geographically, ethnically, and by education level (i.e., primary, middle, and high). Within those parameters, however, only schools with “friendly” administrations were chosen. That is, we only visited schools with principals and staff openly supportive of the Rhee-Henderson agenda. 

But even they desired changes to the testing program, whether or not it was expanded. Their suggestions covered both the annual districtwide DC-CAS (or “comprehensive” assessment system), on which the teacher evaluation system was based, and the DC-BAS (or “benchmarking” assessment system), a series of four annual “no-stakes” interim tests unique to DCPS, ostensibly offered to help prepare students and teachers for the consequential-for-some-school-staff DC-CAS.[2]

At each staff meeting I asked for a show of hands on several issues of interest that I thought were actionable. Some suggestions for program changes received close to unanimous support. Allow me to describe several.

1. Move DC-CAS test administration later in the school year. Many citizens may have logically assumed that the IMPACT teacher evaluation numbers were calculated from a standard pre-post test schedule: testing a teacher’s students at the beginning of their academic year and then again at the end. In 2010, however, the DC-CAS was administered in March, three months before school year’s end. Moreover, that single administration of the test served as both post-test for the current school year and pre-test for the following school year. Thus, before a teacher even met their new students in late August or early September, almost half of the period for which teachers were judged had already transpired—the three months in the Spring spent with the previous year’s teacher and almost three months of summer vacation.

School staff recommended pushing DC-CAS administration to later in the school year. Furthermore, they advocated a genuine pre-post-test administration schedule—pre-test the students in late August–early September and post-test them in late-May–early June—to cover a teacher’s actual span of time with the students.

This suggestion was rejected because the test development firm with the DC-CAS contract required three months to score some portions of the test in time for the IMPACT teacher ratings scheduled for early July delivery, before the start of the new school year. Some small number of teachers would be terminated based on their IMPACT scores, so management demanded those scores be available before preparations for the new school year began.[3] The tail wagged the dog.

2. Add some stakes to the DC-CAS in the upper grades. Because DC-CAS test scores portended consequences for teachers but none for students, some students expended little effort on the test. Indeed, extensive research on “no-stakes” (for students) tests reveals that motivation and effort vary with a range of factors, including gender, ethnicity, socioeconomic class, the weather, and age. Generally, the older the student, the lower the test-taking effort. This disadvantaged some teachers in the IMPACT ratings for circumstances beyond their control: unlucky student demographics.

Central office management rejected this suggestion to add even modest stakes to the upper grades’ DC-CAS; no reason given. 

3. Move one of the DC-BAS tests to year end. If management rejected the suggestion to move DC-CAS test administration to the end of the school year, school staff suggested scheduling one of the no-stakes DC-BAS benchmarking tests for late May–early June. As it was, the schedule squeezed all four benchmarking test administrations between early September and mid-February. Moving just one of them to the end of the year would give the following year’s teachers a more recent reading (by more than three months) of their new students’ academic levels and needs.

Central Office management rejected this suggestion probably because the real purpose of the DC-BAS was not to help teachers understand their students’ academic levels and needs, as the following will explain.

4. Change DC-BAS tests so they cover recently taught content. Many DC citizens probably assumed that, like most tests, the DC-BAS interim tests covered recently taught content, such as that covered since the previous test administration. Not so in 2010. The first annual DC-BAS was administered in early September, just after the year’s courses commenced. Moreover, it covered the same content domain—that for the entirety of the school year—as each of the next three DC-BAS tests. 

School staff proposed changing the full-year “comprehensive” content coverage of each DC-BAS test to partial-year “cumulative” coverage, so students would only be tested on what they had been taught prior to each test administration.

This suggestion, too, was rejected. Testing the same full-year comprehensive content domain produced a predictable, flattering score rise. With each DC-BAS test administration, students recognized more of the content, because they had just been exposed to more of it, so average scores predictably rose. With test scores always rising, it looked like student achievement improved steadily each year. Achieving this contrived score increase required testing students on some material to which they had not yet been exposed, both a violation of professional testing standards and a poor method for instilling student confidence. (Of course, it was also less expensive to administer essentially the same test four times a year than to develop four genuinely different tests.)
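The arithmetic behind that contrived rise is easy to sketch. The following is a toy illustration with invented numbers, not DCPS data: if every interim test samples the same full-year content domain, and students answer taught material correctly more often than untaught material, average scores climb on schedule with each administration even when nothing unusual is happening instructionally.

```python
# Toy illustration (invented numbers, NOT DCPS data): four interim tests,
# each drawn from the SAME full-year content domain. Students answer
# already-taught material correctly more often than untaught material,
# so average scores "rise" with each administration.
p_taught, p_untaught = 0.70, 0.25   # assumed chances of a correct answer

for test, fraction_taught in enumerate([0.05, 0.30, 0.55, 0.80], start=1):
    expected = fraction_taught * p_taught + (1 - fraction_taught) * p_untaught
    print(f"Test {test}: {fraction_taught:.0%} of content taught -> "
          f"expected score {expected:.0%}")
```

The scores rise every time by construction, which is exactly why the pattern flattered the district regardless of what students actually learned.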

5. Synchronize the sequencing of curricular content across the District. DCPS management rhetoric circa 2010 attributed classroom-level benefits to the testing program. Teachers would know more about their students’ levels and needs and could also learn from each other. Yet the only student test results teachers received at the beginning of each school year were half a year old, and most of the information they received over the course of four DC-BAS test administrations was based on not-yet-taught content.

As for cross-district teacher cooperation, unfortunately there was no District-wide coordination of curricular sequences. Each teacher paced their subject matter however they wished and varied topical emphases according to their own personal preference.

It took DCPS’s Chief Academic Officer, Carey Wright, and her chief of staff, Dan Gordon, less than a minute to reject the suggestion to standardize topical sequencing across schools so that teachers could consult with one another in real time. Tallying up the votes: several hundred school-level District educators favored the proposal, two of Rhee’s trusted lieutenants opposed it. It lost.

6. Offer and require a keyboarding course in the early grades. DCPS was planning to convert all its testing from paper-and-pencil mode to computer delivery within a few years. Yet keyboarding courses were rare in the early grades. Obviously, without systemwide keyboarding training, some students would be at a disadvantage in computer-based testing.

Suggestion rejected.

In all, I had polled over 500 DCPS school staff. Not only were all of their suggestions reasonable, some were essential in order to comply with professional assessment standards and ethics. 

Nonetheless, back at DCPS’ Central Office, each suggestion was rejected without, to my observation, any serious consideration. The rejecters included Chancellor Rhee, the head of the office of Data and Accountability—the self-titled “Data Lady,” Erin McGoldrick—and the head of the curriculum and instruction division, Carey Wright, and her chief deputy, Dan Gordon. 

Four central office staff outvoted several hundred school staff (and my recommendations as assessment director). In each case, the recommended changes would have meant some additional work on their part in return for substantial improvements in the testing program. Their rhetoric was all about helping teachers and students; the fact was that the testing program wasn’t structured to help them.

What was the purpose of my several weeks of school visits and staff polling? To solicit “buy in” from school level staff, not feedback.

Ultimately, the new testing program proposal would incorporate all the new features requested by senior Central Office staff, no matter how burdensome, and not a single feature requested by several hundred supportive school-level staff, no matter how helpful. Like many others, I had hoped that the education reform intention of the Rhee-Henderson years was genuine. DCPS could certainly have benefitted from some genuine reform. 

Alas, much of the activity labelled “reform” was just for show, and for padding resumes. Numerous central office managers would later work for the Bill and Melinda Gates Foundation. Numerous others would work for entities supported by the Gates or aligned foundations, or in jurisdictions such as Louisiana, where ed reformers held political power. Most would be well paid. 

Their genuine accomplishments, or lack thereof, while at DCPS seemed to matter little. What mattered was the appearance of accomplishment and, above all, loyalty to the group. That loyalty required going along to get along: complicity in maintaining the façade of success while withholding any public criticism of or disagreement with other in-group members.

Unfortunately, in the United States what is commonly showcased as education reform is neither a civic enterprise nor a popular movement. Neither parents, the public, nor school-level educators have any direct influence. Rather, at the national level, US education reform is an elite, private club: a small group of tightly-connected politicos and academics, a mutual admiration society dedicated to the career advancement, political influence, and financial benefit of its members, supported by a gaggle of wealthy foundations (e.g., Gates, Walton, Broad, Wallace, Hewlett, Smith-Richardson).

For over a decade, The Ed Reform Club exploited DC for its own benefit. Local elites formed the DC Public Education Fund (DCPEF) to sponsor education projects, such as IMPACT, which they deemed worthy. In the negotiations between the Washington Teachers’ Union and DCPS concluded in 2010, DCPEF arranged a three-year, $64.5 million grant from the Arnold, Broad, Robertson, and Walton Foundations to fund a five-year retroactive teacher pay raise in return for contract language allowing teacher excessing tied to IMPACT, which Rhee promised would lead to annual student test score increases by 2012. The projected goals were not met; foundation support continued nonetheless.

Michelle Johnson (née Rhee) now chairs the board of a charter school chain in California and occasionally collects $30,000+ in speaker fees but otherwise seems to have deliberately withdrawn from the limelight. Despite contributing her own additional scandals after she assumed the DCPS Chancellorship, Kaya Henderson ascended to great fame and glory with a “distinguished professorship” at Georgetown; honorary degrees from Georgetown and Catholic Universities; gigs with the Chan Zuckerberg Initiative, Broad Leadership Academy, and Teach for All; and board memberships with The Aspen Institute, The College Board, Robin Hood NYC, and Teach For America. Carey Wright is now state superintendent in Mississippi. Dan Gordon runs a 30-person consulting firm, Education Counsel, that strategically partners with major players in US education policy. The manager of the IMPACT teacher evaluation program, Jason Kamras, now serves as Superintendent of the Richmond, VA public schools.

Arguably the person most directly responsible for the recurring assessment system fiascos of the Rhee-Henderson years, then Chief of Data and Accountability Erin McGoldrick, now specializes in “data innovation” as partner and chief operating officer at an education management consulting firm. Her firm, Kitamba, strategically partners with its own panoply of major players in US education policy. Its list of recent clients includes the DC Public Charter School Board and DCPS.

If the ambitious DC central office folk who gaudily declared themselves leading education reformers were not really reformers, who were the genuine education reformers during the Rhee-Henderson decade of massive upheaval and per-student expenditures three times those in the state of Utah? They were the school principals and staff whose practical suggestions were ignored by the central office glitterati. They were whistleblowers like history teacher Erich Martel, who had documented DCPS’ manipulation of student records and phony graduation rates years before the Washington Post’s celebrated investigation of Ballou High School, and who was demoted and then “excessed” by Henderson. Or school principal Adell Cothorne, who spilled the beans on test answer sheet “erasure parties” at Noyes Education Campus and lost her job under Rhee.

Real reformers with “skin in the game” can’t play it safe.

The author appreciates the helpful comments of Mary Levy and Erich Martel in researching this article. 


People are Not Cattle!

This apparently did not occur to William Sanders.

He thought that statistical methods that are useful with farm animals could also be used to measure effectiveness of teachers.

I grew up on a farm, and as both a kid and a young man I had considerable experience handling cows, chickens, and sheep. (These are generic critter photos, not the actual animals we had.)

I also taught math and some science to kids like the ones shown below for over 30 years.


Caring for farm animals and teaching young people are not the same thing.


As the saying goes: “Teaching isn’t rocket science. It’s much harder.”

I am quite sure that with careful measurements of different types of feed, medications, pasturage, and bedding, it is quite possible to figure out which mix of those elements might help or hinder the production of milk and cream from dairy cows. That’s because dairy or meat cattle (or chickens, or sheep, or pigs) are pretty simple creatures: all a farmer wants is for them to produce lots of high-quality milk, meat, wool, or eggs at the least cost to the farmer, and without getting into trouble.

William Sanders was well-known for his statistical work with dairy cows. His step into hubris and nuttiness was to translate this sort of mathematics to little humans. From Wikipedia:

“The model has prompted numerous federal lawsuits charging that the evaluation system, which is now tied to teacher pay and tenure in Tennessee, doesn’t take into account student-level variables such as growing up in poverty. In 2014, the American Statistical Association called its validity into question, and other critics have said TVAAS should not be the sole tool used to judge teachers.”

But there are several problems with this.

  • We don’t have an easily-defined and nationally-agreed-upon goal for education that we can actually measure. If you don’t believe this, try asking a random set of people what they think the primary goal of education should be, and listen to all the different ideas!
  • It’s certainly not just ‘higher test scores’ — the math whizzes who brought us “collateralization of debt-swap obligations in leveraged financings” surely had exceedingly high math test scores, but I submit that their character education (as in, ‘not defrauding the public’) was lacking. In their selfishness and hubris, they have succeeded in nearly bankrupting the world economy while buying themselves multiple mansions and yachts, yet causing misery to billions living in slums around the world and millions here in the US who lost their homes and are now sleeping in their cars.
  • Is our goal also to ‘educate’ our future generations for the lowest cost? Given the prices for the best private schools and private tutors, it is clear that the wealthy believe that THEIR children should be afforded excellent educations that include very small classes, sports, drama, music, free play and exploration, foreign languages, writing, literature, a deep understanding and competency in mathematics & all of the sciences, as well as a solid grounding in the social sciences (including history, civics, and character education). Those parents realize that a good education is expensive, so they ‘throw money at the problem’. Unfortunately, the wealthy don’t want to do the same for the children of the poor.
  • Reducing the goals of education to just a student’s scores on secretive tests in just two subjects, and claiming that it’s possible to tease out the effectiveness of ANY teacher, even those who teach neither English/Language Arts nor Math, is madness.
  • Why? Study after study (not by Sanders, of course) has shown that the actual influence of any given teacher on a student accounts for only 1% to 14% of test scores. By far the greatest influence is the student’s own family background, not the ability of a single teacher to raise test scores in April. (An effect which I have shown is chimerical — the effect one year is most likely completely different the next year!)
  • By comparison, a cow’s life is pretty simple. They eat whatever they are given (be that straw, shredded newspaper, cotton seeds, chicken poop mixed with sawdust, or even the dregs from squeezing out orange juice; no, I’m not making that up). Cows also poop, drink, pee, chew their cud, and sometimes they try to bully each other. If it’s a dairy cow, it gets milked twice a day, every day, at set times. If it’s a steer, it mostly sits around and eats (and poops and pees) until it’s time to send it off to the slaughterhouse. That’s pretty much it.
  • Gary Rubinstein and I have dissected the value-added scores for New York City public school teachers that were computed and released by the New York Times. We both found that for any given teacher who taught the same subject matter and grade level in the very same school over the period of the NYT data, there was almost NO CORRELATION between their scores from one year to the next.
  • We also showed that for teachers who were given scores in both math and reading (say, elementary teachers), there was almost no correlation between their scores in math and in reading.
  • Furthermore, with teachers who were given scores in a single subject (say, math) but at different grade levels (say, 6th and 7th grade math), you guessed it: extremely low correlation.
  • In other words, the system seemed to act like a very, very expensive and complicated random-number generator.
  • People have much, much more complicated inputs, and much more complicated outputs. Someone should have written on William Sanders’ tombstone the phrase “People are not cattle.”

Interesting fact: Jason Kamras was considered to be the architect of Value-Added measurement for teachers in Washington, DC, implemented under the notorious and now-disgraced Michelle Rhee. However, when he left DC to become head of Richmond VA public schools, he did not bring it with him.


The ‘Smoking Memo’ on Michelle Rhee’s EraserGate was leaked to John Merrow

The “smoking memo” has turned up.
The one that Michelle Rhee, Kaya Henderson, and Charles Willoughby didn’t want the public to see.

The one where the testing company expert told them all about the cheating and what steps they should take — none of which were taken.

That memo was leaked to John Merrow of Frontline. You really should read his entire article. It’s long, it’s got footnotes, and it’s excellent.



Teachers, parents, and concerned citizens should take the time to read this long, footnoted, in-depth follow-up by John Merrow (a journalist at Frontline) on the cheating scandal (by adults) in Washington, DC public schools, in particular at Noyes right here in Brookland.
The article points out several things:
(1) Rhee gave lots of money to adults who cheated
(2) She put impossible pressure on principals to cheat; they, in turn, put that pressure on their teachers
(3) The achievement gap between white and black students, and between poor kids and wealthier kids, increased on Rhee’s and Henderson’s watches; any increases in NAEP scores are continuations of trends that began under her predecessors; and DCPS students’ scores are still at the bottom of the nation
(4) Rhee, Henderson, Kamras, and IG Willoughby have steadfastly refused to investigate the cheating seriously and to do the sort of analysis that actually shows malfeasance
(5) Turnover among administrators and teachers in DCPS has turned a revolving door into a whirlwind
(6) The idealistic principal who followed Wayne Ryan at Noyes, and who was originally a great admirer of Rhee, found a lot of evidence of cheating there, but her whistleblower suit was dismissed, and she now runs a cupcake store
(7) Despite noises to the contrary by Rhee, the number of highly-paid central-office administrators has jumped; DCPS has the highest administrator-to-student ratio anywhere in the region
(8) Funds that should have been used to help students who were behind were, instead, used to pay illegitimate bonuses to dishonest adults.
Here is the URL:
A couple of key quotes:

Former DeKalb County District Attorney Robert … Wilson said that he had been following the DCPS story closely. “There’s not a shred of doubt in my mind that adults cheated in Washington,” he said. “The big difference is that nobody in DC wanted to know the truth.”


It’s easy to see how not trying to find out who had done the erasing–burying the problem–was better for Michelle Rhee personally, at least in the short term.  She had just handed out over $1.5 million in bonuses in a well-publicized celebration of the test increases[9]. She had been praised by presidential candidates Obama and McCain[10] in their October debate, and she must have known that she was soon to be on the cover of Time Magazine[11].  The public spectacle of an investigation of nearly half of her schools would have tarnished her glowing reputation, especially if the investigators proved that adults cheated–which seems likely given that their jobs depended on raising test scores.

Moreover, a cheating scandal might well have implicated her own “Produce or Else” approach to reform.  Early in her first year she met one-on-one with each principal and demanded a written, signed guarantee[12] of precisely how many points their DC-CAS scores would increase.

It’s 2013.  Is there any point to investigating probable cheating that occurred in 2008, 2009 and 2010?  After all, the children who received inflated scores can’t get a ‘do-over,’ and it’s probably too late to claw back bonuses from adults who cheated, even if they could be identified.  While erasure analysis would reveal the extent of cheating, what deserves careful scrutiny is the behavior of the leadership when it learned that a significant number of adults were probably cheating, because five years later, Rhee’s former deputy is in charge of public schools, and Rhee continues her efforts to persuade states and districts to adopt her approach to education reform–an approach that, the evidence indicates, did little or nothing to improve the public schools in our nation’s capital.

This story is bound to remind old Washington hands of Watergate and Senator Howard Baker’s famous question, “What did the President know and when did he know it?” The memo answers an echo of Baker’s question: “What did Michelle know, and when did she know it?” And the entire sordid story recalls the lesson of Watergate: “It’s not the crime; it’s the coverup.”

That Michelle Rhee named her new organization “StudentsFirst” is beyond ironic.

Utter, Stunning Failure by Rhee, Kamras, Henderson et al:

“Mr. Teachbad” did such a great job analyzing the utter failure of these contemptible liars that I hope he won’t mind my re-posting it in full:


16 MAR 2013

Well, shit…THAT didn’t work. Now what?

This is stunning.

You remember Michelle Rhee, right? She came to turn the DC public school system around. In 2007 she grabbed this city by the throat and shook it into submission.  Teachers were fired by the hundreds and principals by the dozens. Thousands have left the system because they did not want to work under the conditions Rhee and Jason Kamras, her chief teacher technician, were imposing.

That was fine with her. Screw ‘em.  She would find new people who were willing to work hard and believed in children. Millions upon millions of new dollars were found and spent on telling teachers how to teach, rewarding the lapdogs and ferreting out the infidels.

Big change never comes easy. You can’t make an omelet without breaking some eggs, etc. But if the right people have the resources and the courage to make and follow through with the tough decisions, great things can happen.

After five years, how is DCPS doing? A DC Fiscal Policy Institute study released earlier this week has evaluated the work of Rhee and her successor, Kaya “sucks-to-be-me” Henderson. A write-up of the study by Emma Brown can also be found at the Washington Post.

The principal finding of the study was that the “share of students scoring at a proficient level at the typical school fell slightly between 2008 and 2012.”

Whatchutalkinboutwillis? Seriously? Read that again. Oh…my…God.

But hold on. That can’t really say everything. And what the hell is a “typical school”? Let’s disaggregate the data.

Fair enough. The first thing to notice is that public charter schools are doing better than DCPS schools; not by a huge amount, but it is noticeable and across the board. So there’s that.

More importantly, interesting patterns are revealed when looking at schools across these five years by income quintiles. Then, as now, the best performing schools are in the wealthiest parts of town and the worst performing schools are in the poorest parts of town. That almost goes without saying. But have schools in the poorest parts of the city begun to catch up? After all, that’s what this is supposed to be all about: closing the achievement gap. How’s that going?

There’s no easy way to say this, so I’ll just come out with it:

      Proficiency rates have increased in the four wards with the highest incomes. Proficiency rates have fallen in the four wards with the lowest incomes.

So, Michelle, Kaya and Jason…it appears you have managed to INCREASE the size of the Achievement Gap in Washington, DC. And, Michelle, you are now trying to export your great ideas to the entire country? If the three of you don’t feel stupid by now, you’re even dumber than I thought. You should all resign. Immediately.

But maybe there’s hope. There is a new plan. Not just any plan, but a strategic plan. The study notes that DCPS’s new Capital Commitment plan (yawn) sets the “ambitious goal of increasing proficiency rates at the 40 lowest performing schools by 40 percentage points by 2017….Given the DC CAS score trends over the past four years, it would appear that DCPS needs to undertake substantial changes to the way it operates to make this goal a reality.”

Wait. Didn’t we just do that?

——— Mr. Teachbad


Two Powerful Posts From Current Teachers

First, we have “Florence”, a DCPS teacher, on the utter BS from Kamras and Henderson on recruiting new teachers while trashing the current ones:


Second, we have teacher Abby Breaux in Louisiana explaining why she can’t take it any more:



Published on March 12, 2013 at 9:55 pm

The Correlation Between ‘Value-Added’ Scores and Observation Scores in DCPS under IMPACT is, in fact, Exceedingly Weak

As I suspected, there is nearly no correlation between the scores obtained by DCPS teachers on two critical measures.

I know this because someone leaked me a copy of the entire summary spreadsheet, which I will post on the web at Google Docs shortly.

As usual, a scatter plot does an excellent job of showing how ridiculous the entire IMPACT evaluation system is. It doesn’t predict anything to speak of.

Here is the first graph.

Notice that the r^2 value is quite low: 0.1233, or about 12%. Not quite a random distribution, but fairly close. Certainly not something that should be used to decide whether someone gets to keep their job or earn a bonus.

The Aspen Institute study apparently used R rather than r^2; they reported R values of about 0.35, which is about what you get when you take the square root of 0.1233.
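That relationship is easy to verify for yourself. Here's a quick Python sketch; the 0.1233 is the r^2 from the scatter plot above, and its square root is the R the Aspen study would have reported:

```python
import math

# r^2 from the scatter plot of observation scores vs. value-added scores
r_squared = 0.1233

# The correlation coefficient R is just the square root of r^2
r = math.sqrt(r_squared)

print(round(r, 3))  # 0.351, matching the ~0.35 the Aspen study reported
```

Either way you express it, the two measures barely move together.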

Here is the second graph, which plots teachers’ ranks on the classroom observations versus their ranks on the Value Added scores. Do you see any correlation?


Remember, this is the correlation that Jason Kamras said was quite strong.

Some Thoughts on the Wysocki Case

Since I'm so special and my comments are so wonderful (syke), I thought I would offer a "revised and extended" version of my comments on Bill Turque's excellent exposé today in the Washington Post.
Look at the scatter plots of the NYC value-added data. They are essentially random clouds, meaning that the VAM score one year essentially explains nothing about what's going to happen the next year. Or look here, or here, or here, or here, or here, or here (by Gary Rubenstein).
Would you trust a medical test of some sort that is only correct seven percent of the time?
(Yes, I said SEVEN percent (7%). Not seventy percent (70%).)
That’s about how useful a VAM measure is, going by the only city-wide historical data that we have been able to look at so far — that from NYC.
(The LATimes seems to want at least $500 to allow me to look at such a spreadsheet for the LAUSD teachers, and from what Bill Turque writes, DCPS has lots of data as well and won’t release it. Because they are ashamed at how unreliable and useless it is. So I don’t have any other city except NYC.)
Utter nonsense.
I am glad that Ms. Wysocki spoke up and that Mr. Turque wrote it up.
This is good journalism.
(By the way, if you look at one of Gary R's posts (his part 3), you will see that he found that if you track the NYC public school STUDENTS' scores from year to year, they lie in a nice, messy, cigar-shaped blob that goes up and to the right. The KIPP schools don't change that. The charter schools don't change that. Nor do the regular public schools.) Meaning that the students who score well one year score pretty well the next year. Big surprise, huh? Teachers' influence — well, we still don't know how to measure it accurately. Maybe in a few decades we'll figure out something. But not now. We need to throw the educational DEformers out of office so that they can't hurt anybody else.


There are many excellent reasons to reject VAM as a means of making decisions that affect either teachers’ or students’ lives. One of those reasons is the story of Ms. Wysocki – and she’s not the only one.

I have recently done some simple scatterplots using Excel on the now-publicly-available New York City value-added database. I won’t bore you with the details, but I was able to compare value-added scores for school years 0506, 0607, and 0708 — comparing the exact same teachers in the exact same schools teaching the exact same subjects at the exact same grade level.

In any case, if you look at a number of my recent columns, you will see scatter plots where I paired the value-added scores for each pair of years. And what I discovered was that the LACK of correlation was simply overwhelming.

From one year to the next, the variation was phenomenal. It was almost as bad as someone rolling dice or throwing darts. Not quite as random as that, but close.

The r-squared correlation values, the part of one year's value-added score that statisticians say "explains" the next year's score, were between 0.05 and 0.08. Yup, five to eight percent. For something like VAM to be effective and useful, in my opinion, it should have an r or r^2 value in the high 80s or 90s (meaning 80% to 90% or more, if you prefer); I'm not even picky about which. (Tricky fact here: when you deal with decimals between 0 and 1, r^2 is SMALLER than r. But you knew that, right?)

You heard me right.

It's like I'm saying that for VAM to be useful, its r^2 value (its ability to predict anything) should score in the range of 70% to 99%. However, as a whole, the predictive value of VAM is less than TEN PERCENT. Think about it. What kind of grade do you give someone (like Jason Kamras) who, with this system, is earning, essentially, scores of 6% to 9%?

That means they fail. UTTERLY. It's not even close.

And it’s not Ms. Wysocki who is failing. It’s that pompous ass, Jason Kamras, and his idol Erik Hanushek.

Seriously, would you trust a medical or forensic test of anything if it only gives you the right answer less than ten percent of the time?

Please, see the charts for yourself.

BTW: Obviously other stuff changed in NYC (different kids, possible curriculum changes, etc.), but this is about as close as you can get to a controlled experiment in education, holding most teacher stuff constant.

If you read the propaganda from Kamras and Rhee and their followers and funders, you would think that these scores would be very strongly correlated. And I'm talking about VALUE-ADDED SCORES, not raw scores. Supposedly a teacher who is really strong is going to have a high value-added score every year, with only a very few exceptions, right? And if you continue drinking the VAM Kool-Aid, you would also believe that teachers with low VAM scores would do a crappy job year after year after year as they wait in their cushy teachers' chairs to retire with a cast-gold pension without lifting a finger to do anything in class.

Facts are, however, stubborn things. And having data in hand allows us to see whether the educational DEformers' claims are correct. Are VAM scores for teachers as consistent as, say, students' IQ, standardized test, and SAT scores? (I bet those correlate pretty strongly within any one student: middling scores on any one of those tests will probably accompany middling scores on the others; kids who get high SAT scores in high school probably did quite well on their state's NCLB tests, if they took them; and kids who get put through an IQ test and end up with low scores generally get low scores on the others as well. I put that in layman's terms; statisticians have various ways of manipulating the numbers to come up with formulas that they find very meaningful (and they are), but they are often a bit much for the public to digest, unfortunately.)

However, a good scatter plot can equal many, many words, equations, or individual numbers for standard deviations, r-squared, linear or quadratic correlations, and the like.

If, on a two-variable graph, you plot one score for a single teacher against the same type of score for the same teacher the VERY NEXT YEAR (or the year after that), teaching the same subject in the same school building and at the same grade level, and you get a blob that looks like a cigar sloping up to the right at a 45-degree angle, that means the scores are strongly and positively correlated from one year to the next. Such a graph would imply that high-flyers are consistently good and that scumbag lazy teachers are do-nothing idiots every year. If instead you have an elongated blimp that points down and to the right, then you have a strong negative correlation, which means that teachers who do well on the quantity measured on the X-axis do POORLY on the quantity measured on the Y-axis, whatever they may be. The skinnier the blimpy blob, the stronger the correlation.

If you look at the graphs I made, which took me very, very little effort (Excel did all the hard work), you will see that this isn’t at all what we have. We have the next closest thing to absolute randomness: a big round blob with almost no direction at all.

Don't believe me? Go look at the data yourself. Fire up a copy of Excel or any other spreadsheet. Put in the entire column of 0506 overall value-added scores, expressed however you like (percentiles or whatever the NYC computed VA value is), and do the same thing for either 0607 or 0708. (That may take some searching.) Ask your spreadsheet to plot them as a scatter plot. See if you don't get a nearly formless blob. Ask Excel to calculate a linear regression. See what r^2 coefficient you get.
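If you'd rather skip the spreadsheet, the same recipe is a few lines of Python. The columns below are made-up stand-ins for the real 0506 and 0607 value-added columns (you'd paste in the actual ones), generated with a weak true correlation of about 0.25 to mimic what the NYC data shows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two columns of the NYC spreadsheet: the same teachers'
# value-added scores in consecutive years, with a true correlation of 0.25.
n = 5000
year1 = rng.standard_normal(n)
year2 = 0.25 * year1 + np.sqrt(1 - 0.25**2) * rng.standard_normal(n)

# This is the same number Excel's linear-trendline "R squared" reports.
r = np.corrcoef(year1, year2)[0, 1]
r_squared = r**2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

With the real columns in place of year1 and year2, the r^2 that comes out sits in the same sad 0.05-to-0.08 neighborhood.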

Don’t take my word for it. Have your computer do the hard work. See if I’m telling the truth or lying (like Kamras, Rhee, Bloomberg, and all the rest of them).

So this, in part, explains the insanity of the DCPS ‘value-added’ algorithm, and how a promising young teacher like Ms. Wysocki got fired.


What I found when looking at the value-added scores in NYC for 2005-8 was that a teacher in the top quintile among his/her fellow teachers (i.e., a "high flyer" by VAM) would have less than one chance in four of being anywhere in the top quintile the next year or two years later, even if teaching the exact same subject in the same school at the same grade level. And about a 20% chance of being in the bottom half in the second year. In other words, there was next to no correlation in scores from one year to the next.
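That churn is exactly what a weak correlation predicts. Here is a little simulation; the scores are synthetic, not the NYC data, and the 0.25 year-to-year correlation is roughly what an r^2 of 0.05 to 0.08 implies:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two years of simulated VAM scores for the same teachers, with a true
# year-to-year correlation of 0.25.
n = 200_000
rho = 0.25
year1 = rng.standard_normal(n)
year2 = rho * year1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Of the teachers in the top quintile in year 1, how many are still there in year 2?
top1 = year1 > np.quantile(year1, 0.8)
stayed_on_top = np.mean(year2[top1] > np.quantile(year2, 0.8))

print(f"Top-quintile teachers still in the top quintile next year: {stayed_on_top:.0%}")
```

It comes out to roughly 30 percent, not much better than the 20 percent that pure dart-throwing chance would give you.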
Plus, I found that the "confidence interval" was extremely wide. For a typical teacher (one at the median), the uncertainty in their value-added score was FIFTY-SEVEN PERCENTILE RANKS.
Yup, you read that right.
If your printout says you are really at the 45th percentile (not so great; more than half the teachers' computed scores are higher than yours), they openly admit that your real score might be anywhere from, say, the 23rd percentile (really wretched) all the way up to the 80th percentile (really great).
I’m not making this up.
See the data for yourself if you don’t believe me.
A question for Bill Turque:
Did you ever ask Kaya Henderson whether she had any innocent explanation regarding Wayne Ryan, a literal poster boy for Michelle Rhee and Value Added and all that, who was all but indicted in the USA Today investigative series as a massive and serial eraser and changer of test answers in order to inflate test scores (and, not coincidentally, his bank account and official status)? Did she have any explanation as to why Wayne Ryan suddenly disappeared off the edge of the DC radar by "resigning" to "pursue other interests" soon after the report came out?
Has your wife actually looked at the year-to-year correlations of value-added scores in the New York City public school system? Entire spreadsheets are on line and easy to manipulate. Has she seen how low the correlation coefficients are from one year to the next for the exact same teachers teaching the exact same grade level, the identical subjects, in the exact same school buildings? I find it astonishing that anybody could be so stupid as to base ANYTHING of substance on numbers that are so close to numerology or throwing darts blindfolded at a chart on the wall to decide who gets fired and who gets a bonus.
An exchange with a certain know-it-all:
FormerMCPSStudent wrote:
“DCPS is the 51% quality education in the US. The “veteran” teachers have proven themselves worthless.  
“The DC School Board protected these teachers for decades as a politically connected jobs program. No one who’s a “veteran” really earned their way into DCPS with talent or skill. They just had a connected uncle.”
Someone else wrote back,
“And what does this have to do with the article? Looks like you went to MCPS not DCPS, so where are your facts coming from?
 FormerMCPSStudent wrote back,
“Having graduated from MCPS schools, I’m educated well enough to tell the difference between quality and crap.”
I then commented,
“Well, FormerMCPSStudent, perhaps you should try teaching in DCPS and showing the rest of us lazy, lame brain veteran teachers how easy it is to produce miracles?  
"I'm sure DCPS is hiring – teachers are constantly quitting in frustration or retiring, and the DC-CAS is administered next month. Surely you can pull out a 4.0 on your value-added measures in both English and math at the 5th grade level? You have a full five weeks to get it done!"
Published on March 7, 2012 at 3:35 pm

DCPS Administrators Won’t or Can’t Give a DCPS Teacher the IMPACT Value-Added Algorithm

Does this sound familiar?

A veteran DCPS math teacher at Hardy MS has been asking DCPS top administrator Jason Kamras for details, in writing, on exactly how the "Value-Added" portion of the IMPACT teacher evaluation system is calculated for teachers. To date, she has still not received an answer.

How the "Value-Added" portion of IMPACT actually works is rather important: for a lot of teachers, it's about half of their annual score. The general outline of the VAM is explained in the IMPACT documents, but none of the details. Supposedly, all of the scores of a teacher's students in April are compared with all of those same students' scores from the previous April; then the socio-economic status and current achievement scores of those students are taken into account somehow, and the teacher is labeled with a single number that supposedly shows how much his or her students gained during that year with respect to all other similar students.

But how those comparisons are made is totally unclear. So far I have heard that the algorithm, or mathematical procedure, that is used is designed so that exactly half of all teachers are deemed, in non-technical terms, 'below average' in that regard — which of course will set them up to be fired sooner or later. Whether that's an accurate description of the algorithm, I don't know. Ms. Bax told me she heard that DCPS expects teachers with almost all Below-Basic students to achieve tremendous gains with them. However, my own educational research indicates the opposite.
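The 'half of all teachers below average by construction' description is only a rumor, but it's easy to see how percentile ranking would produce exactly that outcome. A minimal illustration with made-up numbers (emphatically not the actual DCPS algorithm, which was never published):

```python
import numpy as np

rng = np.random.default_rng(1)

# Suppose every teacher's students made large, strongly positive gains this year.
gains = rng.normal(loc=20, scale=3, size=1001)

# Converting the gains to percentile ranks erases the absolute improvement:
ranks = gains.argsort().argsort()            # 0 = lowest gain, 1000 = highest
percentiles = ranks / (len(gains) - 1) * 100

below_average = np.mean(percentiles < 50)
print(f"{below_average:.1%} of teachers rank 'below average' despite universal gains")
```

Half the teachers land below the median no matter how much every single student actually improved; the ranking guarantees it.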

In any case, Kamras and his other staff haven’t put any details in writing. Yet.

At one place Kamras writes that "we use a regression equation to determine this score." OK: Bax, Kamras, and I all teach or taught math. We all understand a fair amount about regression equations. But there are lots of such equations! Just saying that a regression equation is involved is like saying Newton "used algebraic equations" in writing "Principia Mathematica," or that Tolstoy used "words and sentences" when he wrote "War and Peace." And just about equally informative.
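For readers wondering what "a regression equation" could even mean here, this is the garden-variety textbook version: predict each student's current score from the prior year's score, and call the leftover (the residual) the teacher's contribution. I stress this is a generic sketch with made-up numbers, not the actual, undisclosed DCPS/Mathematica model:

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up student data: last year's scale score and this year's.
n_students = 300
prior = rng.normal(500, 50, size=n_students)
current = 0.8 * prior + rng.normal(100, 30, size=n_students)

# "We use a regression equation": fit current ~ prior by least squares.
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# Each student's residual = actual minus predicted score.
residuals = current - predicted

# One teacher's raw "value added" would be the mean residual of her students.
teacher_va = residuals[:25].mean()
print(f"teacher's mean residual: {teacher_va:.2f}")
```

Real models pile on student demographics, measurement-error corrections, and shrinkage, but the residual-of-a-prediction core is the same, and so is the noise problem: a mean of 25 noisy residuals bounces around a lot from year to year.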

I attach a series of emails between Ms. Bax, an 8th grade math teacher, and Mr. Kamras and a few other people in DCPS Central Administration. The emails were supplied to me, in frustration, by Ms. Bax. I used color to try to make it clear who was writing what: green for Ms. Bax, and reds, browns, and pinks for the various administrators. Note that this exchange of emails started in September of 2010.

Perhaps publicizing this exchange might prod Mr. Kamras to reveal details of a system that has already been shown by Mathematica (the same group that designed the system) to be highly flawed and unreliable?


From: "Bax, Sarah (MS)" <sarah.bax@dc.gov>
Date: Mon, 13 Sep 2010 17:12:41 -0400
To: Jason Kamras <jason.kamras@dc.gov>
Subject: Impact


I hope the year is off to a great start for you.

I am writing concerning the IMPACT IVA score calculations.  I am very  frustrated with this process on a number of fronts.  First, I would like to have an actual explanation of how the growth scores are calculated. As they have been explained, the process seems quite flawed in actually measuring teacher effectiveness.  Further, I would like to know if teachers have any recourse in having their scores reexamined, etc.

Last year, 89% of the eighth graders at Hardy scored in the Proficient or Advanced range in Mathematics.  As the sole eighth grade mathematics teacher last year, I taught almost all of the students except for a handful that were pulled for special education services.  Beyond this accomplishment, I am extremely proud to report that 89% of our Black students were at that Proficient or Advanced level.

With statistics like these, I take issue with a report that scores my IVA at 3.4 (to add insult to this injury, even under your system if my students had earned just one-tenth more of a growth point, my IVA would be a 3.5 and I would be considered highly effective).

Frankly, I teach among the best of the best in DCPS– with very few of us rated highly effective.  The IMPACT scoring system has had a terrific negative impact on morale at our school.


Sarah Bax


From: Kamras, Jason (MS)
Sent: Tue 9/14/2010 7:50 AM
To: Bax, Sarah (MS)
Subject: Re: Impact

Hi Sarah,

I’m disappointed to hear how frustrated you are. Can you give me a call at 202-321-1248 to discuss?



Jason Kamras

Director, Teacher Human Capital



I really do not have the time to call to discuss my concerns.  If you would forward the requested information regarding specific explanation about the growth scores calculation process I would be most obliged.

I would like specifics about the equation.  Please forward my inquiry to one of your technical experts so that he or she may email me with additional information about the mathematical model.




From: Barber, Yolanda (OOC)
Sent: Mon 12/20/2010 2:12 PM
To: Bax, Sarah (MS)
Subject: FW: IMPACT Question

Ms. Bax,

Sorry for the barrage of emails, but I received a response concerning your question.  Please read the response below.  I hope this helps.  Please let me know if you’d like to continue with our session on the 4th.  Thanks again.


Yolanda Barber

Master Educator | Secondary Mathematics

District of Columbia Public Schools

Office of the Chancellor


From: Rodberg, Simon (OOC)
Sent: Monday, December 20, 2010 2:05 PM
To: Barber, Yolanda (OOC); Lindy, Benjamin (DCPS); Gregory, Anna (OOC)
Subject: RE: IMPACT Question

Hi Yolanda,

We will be doing more training, including full information on Ms. Bax’s question, this spring. We’d like to give a coherent, full explanation at that time rather than give piecemeal  answers to questions in the meantime.

Thanks, and I hope you enjoy your break.


Simon Rodberg

Manager, IMPACT Design, Office of Human Capital



I got notice a couple of weeks ago that I have jury duty on your office hours day at Hardy so I won’t be able to make the appointment.  I’m sorry to miss you, but appreciate your efforts to send my concerns to the appropriate office.

The response below is obviously no help at all as it clearly indicates the Office of Human Capital is unwilling to answer my specific question regarding the calculations involved in determining my rating.  I believe my only request was to have an accurate description of how the expected growth score is calculated.  My question has been left unanswered since last spring.  Can you imagine if a student of mine asked how his or her grade was determined and I told them I couldn’t provide a coherent explanation right now, but see me in a year?

Thanks again for your help.  I look forward to meeting you in person in the future!




From: Bax, Sarah (MS)
Sent: Tuesday, December 21, 2010 11:09 AM
To: Rodberg, Simon (OOC)
Cc: Henderson, Kaya (OOC)
Subject: FW: Appointment #80 (from DCPS Master Educator Office Hours Signup)

Mr. Rodberg,

I am requesting a response to my inquiry below:    ‘explanation of actual algorithm to determine predicted growth score’.


S. Bax


Ms. Bax,

What’s a good phone number to reach you on? I think it would be easiest to explain over the phone.

Thank you, and happy holidays.


Simon Rodberg

Manager, IMPACT Design, Office of Human Capital


From: Kamras, Jason (DCPS) [mailto:jason.kamras@dc.gov]
Sent: Sun 12/26/2010 1:42 PM
To: Bax, Sarah (MS)
Cc: Henderson, Kaya (OOC)
Subject: Value-added calculation

Hi Sarah,

The Chancellor informed me that you’re looking for a more detailed explanation of how your “predicted” score is calculated. In short, we use a regression equation to determine this score. If you’d like to know more about the specifics of the equation, please let me know and I can set up a time for your to meet with our technical experts.

Happy New Year!


Jason Kamras

Chief, Office of Human Capital


On 12/27/10 12:17 PM, “Bax, Sarah (DCPS-MS)” <sarah.bax@dc.gov> wrote:


I have requested an explanation of the value-added calculation since September, with my initial request beginning with you (see email exchange pasted below).  I would like specifics about the equation.  Please forward my inquiry to one of your technical experts so that he or she may email me with additional information about the mathematical model.




On 12/27/10 12:23 PM, “Kamras, Jason (DCPS)” <jason.kamras@dc.gov> wrote:

My deepest apologies, Sarah. I’ll set this up as soon as I get back.

Jason Kamras

Chief, Office of Human Capital

-----Original Message-----

From: Kamras, Jason (DCPS) [mailto:jason.kamras@dc.gov]
Sent: Tue 1/25/2011 11:02 PM
To: Bax, Sarah (MS)
Subject: FW: Value-added calculation

Hi Sarah,

I just wanted to follow up on this. When could we get together to go over the equation?

Hope you’re well,


Jason Kamras

Chief, Office of Human Capital


From: Bax, Sarah (MS)
Sent: Fri 1/28/2011 1:15 PM
To: Kamras, Jason (DCPS)
Subject: RE: Value-added calculation


I really would just like something in writing that I can go over– and then I could contact you if I have questions.  It is difficult to carve out meeting time in my schedule.




From: "Bax, Sarah (DCPS-MS)" <sarah.bax@dc.gov>
Date: Thu, 10 Feb 2011 14:05:43 -0500
To: Jason Kamras <jason.kamras@dc.gov>
Subject: FW: Value-added calculation


I didn’t hear back from you after this last email.




From: Kamras, Jason (DCPS) [mailto:jason.kamras@dc.gov]
Sent: Thu 2/10/2011 6:00 PM
To: Bax, Sarah (MS)
Subject: Re: Value-added calculation

Ugh. So sorry, Sarah. The only thing we have in writing is the technical report, which is being finalized. It should be available on our website this spring. Of course, let me know if you’d like to meet before then.



Jason Kamras

Chief, Office of Human Capital


On Feb 25, 2011, at 9:29 PM, “Bax, Sarah (MS)” <sarah.bax@dc.gov> wrote:


How do you justify evaluating people by a measure [for] which you are unable to provide explanation?



Sat, February 26, 2011 11:25:33 AM


To be clear, we can certainly explain how the value-added calculation works. However, you’ve asked for a level of detail that is best explained by our technical partner, Mathematica Policy Research. When I offered you the opportunity to sit down with them, you declined.

As I have also noted previously, the detail you seek will be available in the formal Technical Report, which is being finalized and will be posted to our website in May. I very much look forward to the release, as I think you’ll be pleased by the thoughtfulness and statistical rigor that have guided our work in this area.

Finally, let me add that our model has been vetted and approved by a Technical Advisory Board of leading academics from around the country. We take this work very seriously, which is why we have subjected it to such extensive technical scrutiny.


Jason Kamras
Chief, Office of Human Capital


To be clear, I did not decline the opportunity to speak with your technical partner.  On December 27th I wrote to you, “I would like specifics about the equation.  Please forward my inquiry to one of your technical experts so that he or she may email me with additional information about the mathematical model.” I never received a response to this request.

In addition, both you and Mr. Rodberg offered to provide information about the equation to me on the phone or in person, but have yet to agree to send any information in writing.  You have stated, “I just wanted to follow up on this.
When could we get together to go over the equation?”  Mr. Rodberg wrote, “What’s a good phone number to reach you on? I think it would be easiest to explain over the phone.”

Why not transpose the explanation you would offer verbally to an email?  Please send in writing the information that you do know about how the predicted growth score is calculated.  For instance, I would expect you are familiar with what variables are considered and which data sources are used to determine their value.  Let me know what you would tell me if I were to meet with you.

As a former teacher, you must realize the difficulty in arranging actual face-time meetings given my teaching duties.  And as a former mathematics teacher, I would imagine you could identify with my desire to have an understanding of the quantitative components of my evaluation.


Published on February 27, 2011 at 8:59 pm