More Problems With Value-Added Measurements for Teachers

I finally got around to reading and skimming the MATHEMATICA reports on VAM for schools and individual teachers in DCPS.

At first blush, it’s pretty impressive mathematical and statistical work. It looks like they were very careful to address lots of possible problems, and they have lots of nice Greek letters and very learned and complicated mathematical formulas, with tables giving the values of many of the variables in their model. They even use large words like heteroscedasticity to scare off those not really adept at professional statistics (which would include even me). See pages 12 – 20 for examples of this mathematics of intimidation, as John Ewing of MfA and the AMS has described it. Here is one such learned equation:
[Image: value-added equation]
However clever and complex a model might be, it needs to do a good job of explaining and describing reality, or it’s just another failed hypothesis that needs to be rejected (like the theories of the 4 humours or the Aether). One needs to actually compare the model’s predictions with the real world and see how well they hold up.
Which is precisely what these authors do NOT do, even though they claim that “for teachers with the lowest possible IMPACT score in math — the bottom 3.6 percent of DCPS teachers — one can say with at least 99.9 percent confidence that these teachers were below average in 2010.” (p. 5)
Among other things, such a model would need to be consistent over time, i.e., reliable. Every indication I have seen, including in other cities that the authors themselves cite (NYC–see p. 2 of the 2010 report) indicates that individual value-added scores for a given teacher jump around randomly from year to year in cases of a teacher working at the exact same school, exact same grade level, exact same subject; or in cases of a teacher teaching 2 grade levels in the same school; or in cases of a teacher teaching 2 subjects, during the same year. Those correlations appear to be in the range of 0.2 to 0.3, which is frankly not enough to judge who is worth receiving large cash bonuses or a pink slip.
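To get a feel for what a year-to-year correlation of about 0.3 means in practice, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the scores are made-up standard normal numbers, not DCPS or NYC data, and the function names are hypothetical. It draws pairs of "year 1" and "year 2" scores with a chosen correlation and asks what share of the bottom fifth in year 1 is still in the bottom fifth in year 2.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_scores(rho, n_teachers, seed):
    """Made-up scores for two years with population correlation rho."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_teachers):
        a = rng.gauss(0, 1)
        b = rng.gauss(0, 1)
        year1.append(a)
        # Mixing a and b this way gives correlation rho with year 1.
        year2.append(rho * a + math.sqrt(1 - rho ** 2) * b)
    return year1, year2


def share_staying_in_bottom(year1, year2, fraction=0.2):
    """Share of the bottom `fraction` in year 1 that is also there in year 2."""
    n = len(year1)
    k = int(n * fraction)
    cut1 = sorted(year1)[k]
    cut2 = sorted(year2)[k]
    bottom1 = [i for i in range(n) if year1[i] < cut1]
    stayed = sum(1 for i in bottom1 if year2[i] < cut2)
    return stayed / len(bottom1)


year1, year2 = simulate_scores(rho=0.3, n_teachers=20000, seed=1)
r = pearson_r(year1, year2)
stay = share_staying_in_bottom(year1, year2)
print(f"sample r = {r:.2f}, share staying in bottom fifth = {stay:.2f}")
```

If the scores were perfectly stable, the share staying in the bottom fifth would be 1.0; if they were pure noise, it would be about 0.2. Changing `rho` shows how quickly that share falls as the correlation weakens.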
Unless something obvious escaped me, the authors do not appear to mention any study of how teachers’ IVA scores vary over time or from class to class, even though they had every student’s DC-CAS scores from 2007 through the present (see footnote, page 7).
In neither report do they acknowledge the possibility of cheating by adults (or students).
They do acknowledge on page 2 that a 2008 study found low correlations between proficiency gains and value-added estimates for individual schools in DCPS from 2005-2007. They attempt to explain that low correlation by “changes in the compositions of students from one year to the next” — which I doubt. I suspect it’s that neither one is a very good measure.
They also don’t mention anything about correlations between value-added scores and classroom-observation scores. From the one year of data that I received, this correlation is also very low. It is possible that this correlation is tighter today than it used to be, but I would be willing to wager tickets to a professional DC basketball, hockey, or soccer game that it’s not over 0.4.
The authors acknowledge that “[t]he DC CAS is not specifically designed for users to compare gains across grades.” Which means, they probably shouldn’t be doing it. It’s also the case that many, many people do not feel that the DC-CAS does a very good job of measuring much of anything useful except the socio-economic status of the student’s parents.
In any case, the mathematical model they have made may be wonderful, but real data so far suggests that it does not predict anything useful about teaching and learning.
Published on January 21, 2014 at 11:08 am

What I actually had time to say …

Since I had to abbreviate my remarks, here is what I actually said:

I am Guy Brandenburg, retired DCPS mathematics teacher.

To depart from my text, I want to start by proposing a solution: look hard at the collaborative assessment model being used a few miles away in Montgomery County [MD] and follow the advice of Edwards Deming.

Even though I personally retired before [the establishment of the] IMPACT [teacher evaluation system], I want to use statistics and graphs to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

Celebrated mathematician John Ewing says that VAM is “mathematical intimidation” and a “modern, mathematical version of the Emperor’s New Clothes.”

I agree.

One of my colleagues was able to pry the value-added formula [used in DC] from [DC data honcho] Jason Kamras after SIX MONTHS of back-and-forth emails. [Here it is:]

[Image: value-added formula for DCPS, in MathType format]

One problem with that formula is that nobody outside a small group of highly-paid consultants has any idea what the values of any of those variables are.

In not a single case has the [DCPS] Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student and class by class.

Nor has that office shared that data with the Washington Teachers’ Union.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, for every single class a teacher has.

Now let’s look at some statistics.

My first graph is completely random data points that I had Excel make up for me [and plot as x-y pairs].

[Graph 3: completely random points]

Notice that even though these are completely random, Excel still found a small correlation: r-squared was about 0.08 and r was about 29%.
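Anyone can repeat this experiment without Excel. The sketch below is only an illustration under my own assumptions: I am guessing at a small scatter of 30 points (the number of points in the graph is not stated), and the function names are made up. It generates random x-y pairs, computes r and r-squared, and then repeats the exercise many times to see how large r can get purely by chance.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_scatter(n_points, rng):
    """n_points completely random (x, y) pairs, each coordinate in [0, 1)."""
    xs = [rng.random() for _ in range(n_points)]
    ys = [rng.random() for _ in range(n_points)]
    return xs, ys


rng = random.Random(2014)  # fixed seed so the run is repeatable
N_POINTS = 30              # assumed size of the scatter plot

# One random scatter plot.
xs, ys = random_scatter(N_POINTS, rng)
r = pearson_r(xs, ys)
r_squared = r ** 2
print(f"one random plot: r = {r:.2f}, r-squared = {r_squared:.2f}")

# Many random scatter plots: how big does |r| get by luck alone?
abs_rs = []
for _ in range(1000):
    xs, ys = random_scatter(N_POINTS, rng)
    abs_rs.append(abs(pearson_r(xs, ys)))
max_abs_r = max(abs_rs)
print(f"largest |r| in 1000 random plots: {max_abs_r:.2f}")
```

The fewer points in the plot, the larger the correlations that turn up by accident, which is why a small nonzero r on its own tells you very little.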

Now let’s look at a very strong case of negative correlation in the real world: poverty rates and student achievement in Nebraska:

[Graph 4: Nebraska, poverty vs. achievement]

The next graph is for the same sort of thing in Wisconsin:

[Graph 5: Wisconsin, poverty vs. achievement]

Again, quite a strong correlation, just as we see here in Washington, DC:

[Graph 6: poverty vs. proficiency in DC]

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for [DCPS in] SY 2009-10, and I plotted them [as you can see below].

[Graph 7: VAM versus TLF in DC IMPACT, 2009-10]

I would say this looks like pretty much no correlation at all. It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements [in DCPS] over time? Unfortunately, since DCPS keeps all the data hidden, we don’t know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to [find out]. Here is one such graph [showing how the same teachers, in the same schools, scored in 2008-9 versus 2009-10]:

[Graph 8: value-added scores for 2 successive years, NYC (Rubenstein)]

That is very close to random.

How about teachers who teach the same subject to two different grade levels, say, fourth-grade math and fifth-grade math? Again, random points:

[Graph 9: VAM for same subject, different grades, NYC (Rubenstein)]

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007. Look for yourself [in the next two graphs].

[Graph 11: NAEP 8th grade math average scale scores since 1990, many states including DC]


[Graph 12: NAEP 4th grade reading scale scores since 1993, many states including DC]

Notice that gains began almost 20 years ago, long before mayoral control or chancellors Rhee and Henderson, long before IMPACT.

To repeat, I suggest that we throw out IMPACT and look hard at the ideas of Edwards Deming and the assessment models used in Montgomery County.

The Correlation Between ‘Value-Added’ Scores and Observation Scores in DCPS under IMPACT is, in fact, Exceedingly Weak

As I suspected, there is nearly no correlation between the scores obtained by DCPS teachers on two critical measures.

I know this because someone leaked me a copy of the entire summary spreadsheet, which I will post on the web at Google Docs shortly.

As usual, a scatter plot does an excellent job of showing how ridiculous the entire IMPACT evaluation system is. It doesn’t predict anything to speak of.

Here is the first graph.

Notice that the r^2 value is quite low: 0.1233, or about 12%. Not quite a random distribution, but fairly close. Certainly not something that should be used to decide whether someone gets to keep their job or earn a bonus.

The Aspen Institute study apparently used R rather than r^2; they reported R values of about 0.35, which is about what you get when you take the square root of 0.1233.
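For anyone who wants to check the arithmetic, here is the conversion between the two numbers as a tiny sketch. The only inputs are the 0.1233 from my graph and the 0.35 reported in the study; the variable names are just mine.

```python
import math

r_squared = 0.1233            # r^2 from the scatter plot
r = math.sqrt(r_squared)      # the corresponding correlation coefficient
print(round(r, 3))            # 0.351

reported_r = 0.35             # roughly what the Aspen Institute study reported
print(round(reported_r ** 2, 4))  # 0.1225
```

So an R of 0.35 and an r^2 of about 12% describe the same weak relationship; the first number merely looks larger.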

Here is the second graph, which plots teachers’ ranks on the classroom observations versus their ranks on the Value Added scores. Do you see any correlation?


Remember, this is the correlation that Jason Kamras said was quite strong.

Excellent DCPS Teacher Fired For Low Value-Added Scores — Because of Cheating?

You really need to read this article in today’s Washington Post, by Bill Turque. It describes the situation of Sarah Wysocki, a teacher at MacFarland, who was given excellent evaluations by her administrators during her second year; but since her “Value-Added” scores were low for the second year in a row, she was fired.–motivating-and-fired/2012/02/04/gIQAwzZpvR_story.html


Ms. Wysocki raises the possibility that someone cheated at Barnard, the school where a lot of her students had attended the previous year; she said that there were students who scored “advanced” in reading who could, in fact, not read at all.

Curious, I looked at the OSSE data for Barnard and found that the percentages of “advanced” students in grades 3 and 4 had what looks to me to be some rather suspicious drops from SY2009-10 to SY 2010-2011, at a school that apparently has a 70% to 80% free-or-reduced-price lunch population:

Grade 3, reading, 2010: 11% “advanced” but only 3% the next year;

Grade 4, reading, 2010: 29% “advanced”, but only 7% the next year.

Ms. Wysocki raised the accusation of cheating, but, as usual, DCPS administration put a bunch of roadblocks in the way and deliberately failed to investigate.

And naturally, Jason Kamras thinks he’s doing a peachy job and that there is nothing wrong with IMPACT or DC’s method of doing value-added computations.

Published on March 7, 2012 at 10:38 am

Why One Teacher Quit DCPS Rather Than Continue Under IMPACT’s VAM

A very poignant commentary on how the rise of IMPACT has, in fact, ruined education in Washington, DC public schools. A few paragraphs:

“Before last school year, I had worked crazy hours and given up much of my life for work, but only because I loved my job and really believed in what I was doing.  Last year, my mindset was completely different.  I started doing everything I was doing because I was scared of what would happen if I didn’t do those things.  I was no longer motivated by a passion for teaching and learning, nor was I trying to develop myself into the great teacher I had once dreamed of becoming; I was motivated by a fear of being stigmatized a loser, and I was trying to do whatever it would take not to be considered one.

“Not to give away the ending to my story, but in this process I burnt out and lost faith in what I was doing in teaching.  I grew tired of caring so much about a test that I didn’t really care that much about.  I became frustrated with having to pass up opportunities to teach skills and concepts that I really thought my students needed to learn in order to teach them things I knew they were going to be tested on.  I couldn’t stand the taskmaster role I had to take on as a teacher.  Basically, I became sick of caring too much about all of the wrong things, and not enough about the things that really mattered.
“In my quest to prove my worth and value, I started to feel worthless and easily replaceable.  Even worse, I felt like I was being told at every opportunity possible by the district to do this or that better.  And they weren’t telling me to do the things I knew I should be doing – if anything, the system seemed to be encouraging my worst behaviors, and seemingly suggesting I might even want to do them more intensely.  At every turn, I was presented with more data, more practice test scores, and more suggestions for how I might do things differently in order to get those test scores even further up.”
The rest of the article is here:
Published on January 25, 2012 at 3:46 pm

DCPS Administrators Won’t or Can’t Give a DCPS Teacher the IMPACT Value-Added Algorithm

Does this sound familiar?

A veteran DCPS math teacher at Hardy MS has been asking DCPS top administrator Jason Kamras for details, in writing, on exactly how the “Value-Added” portion of IMPACT teacher evaluation system is calculated for teachers. To date, she has still not received an answer.

How the “Value-Added” portion of IMPACT actually works is rather important: for a lot of teachers, it’s about half of their annual score. The general outline of the VAM is explained in the IMPACT documents, but none of the details. Supposedly, all of a teacher’s students’ scores in April are compared with those same students’ scores from the previous April; then the socio-economic status and current achievement scores of those students are taken into account somehow, and the teacher is labeled with a single number that supposedly shows how much his or her students gained during that year relative to all other similar students.
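To make that general outline concrete, here is a deliberately bare-bones sketch of one way such a calculation could go. I want to be loud about this: it is NOT the DCPS or Mathematica formula, which is exactly what has not been disclosed. It is a textbook-style illustration under my own assumptions: one predictor (last April's score), an ordinary least-squares line, and a teacher's number defined as the average amount by which his or her students beat or missed their predicted scores. It ignores socio-economic variables and every other adjustment a real model would make. The student records and function names are invented.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns {teacher: mean of (actual - predicted) over that teacher's students}.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    slope, intercept = fit_line(priors, currents)

    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior   # the "predicted" score
        residuals[teacher].append(current - predicted)
    return {t: sum(v) / len(v) for t, v in residuals.items()}


# Invented data: two hypothetical teachers, three students each.
records = [
    ("A", 40, 48), ("A", 55, 60), ("A", 70, 76),
    ("B", 42, 41), ("B", 58, 57), ("B", 73, 70),
]
scores = value_added(records)
for teacher, score in sorted(scores.items()):
    print(teacher, round(score, 2))
```

Notice one built-in feature of any scheme like this: the residuals of a least-squares fit sum to zero, so the teachers' numbers are relative by construction. With equal class sizes, if one teacher comes out above zero, another must come out below it.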

But how those comparisons are made is totally unclear. So far I have heard that the algorithm, or mathematical procedure, is designed so that exactly half of all teachers are deemed, in non-technical terms, ‘below average’ in that regard, which of course will set them up to be fired sooner or later. Whether that’s an accurate description of the algorithm, I don’t know. Ms. Bax told me she heard that DCPS expects teachers with almost all Below-Basic students to achieve tremendous gains with their students. However, my own educational research indicates the opposite.

In any case, Kamras and his other staff haven’t put any details in writing. Yet.

At one place Kamras writes that “we use a regression equation to determine this score.” OK, Bax and Kamras and I all teach or taught math. We all understand a fair amount about regression equations. But there are lots of such equations! Just saying that a regression equation is involved is like saying Newton “used algebraic equations” in writing “Principia Mathematica”, or that Tolstoy used “words and sentences” when he wrote “War and Peace.” And just about equally informative.

I attach a series of emails between Ms. Bax, an 8th grade math teacher, and Mr. Kamras and a few other people in DCPS Central Administration. The emails were supplied to me, in frustration, by Ms. Bax. I used color to try to make it clear who was writing what: Green is Ms. Bax, and reds and browns and pinks denote those written by various administrators. Note that this exchange of emails started in September of 2010.

Perhaps publicizing this exchange might prod Mr. Kamras to reveal details on a system that has already been shown by Mathematica (the same group that designed the system) to be highly flawed and unreliable?


From: “Bax, Sarah (MS)” Date: Mon, 13 Sep 2010 17:12:41 -0400

To: Jason Kamras Subject: Impact


I hope the year is off to a great start for you.

I am writing concerning the IMPACT IVA score calculations.  I am very  frustrated with this process on a number of fronts.  First, I would like to have an actual explanation of how the growth scores are calculated. As they have been explained, the process seems quite flawed in actually measuring teacher effectiveness.  Further, I would like to know if teachers have any recourse in having their scores reexamined, etc.

Last year, 89% of the eighth graders at Hardy scored in the Proficient or Advanced range in Mathematics.  As the sole eighth grade mathematics teacher last year, I taught almost all of the students except for a handful that were pulled for special education services.  Beyond this accomplishment, I am extremely proud to report that 89% of our Black students were at that Proficient or Advanced level.

With statistics like these, I take issue with a report that scores my IVA at 3.4 (to add insult to this injury, even under your system if my students had earned just one-tenth more of a growth point, my IVA would be a 3.5 and I would be considered highly effective).

Frankly, I teach among the best of the best in DCPS– with very few of us rated highly effective.  The IMPACT scoring system has had a terrific negative impact on morale at our school.


Sarah Bax


From: Kamras, Jason (MS) Sent: Tue 9/14/2010 7:50 AM To: Bax, Sarah (MS) Subject: Re: Impact

Hi Sarah,

I’m disappointed to hear how frustrated you are. Can you give me a call at 202-321-1248 to discuss?



Jason Kamras

Director, Teacher Human Capital



I really do not have the time to call to discuss my concerns.  If you would forward the requested information regarding specific explanation about the growth scores calculation process I would be most obliged.

I would like specifics about the equation.  Please forward my inquiry to one of your technical experts so that he or she may email me with additional information about the mathematical model.




From: Barber, Yolanda (OOC) Sent: Mon 12/20/2010 2:12 PM To: Bax, Sarah (MS)

Subject: FW: IMPACT Question

Ms. Bax,

Sorry for the barrage of emails, but I received a response concerning your question.  Please read the response below.  I hope this helps.  Please let me know if you’d like to continue with our session on the 4th.  Thanks again.


Yolanda Barber

Master Educator | Secondary Mathematics

District of Columbia Public Schools

Office of the Chancellor


From: Rodberg, Simon (OOC) Sent: Monday, December 20, 2010 2:05 PM

To: Barber, Yolanda (OOC); Lindy, Benjamin (DCPS); Gregory, Anna (OOC) Subject: RE: IMPACT Question

Hi Yolanda,

We will be doing more training, including full information on Ms. Bax’s question, this spring. We’d like to give a coherent, full explanation at that time rather than give piecemeal  answers to questions in the meantime.

Thanks, and I hope you enjoy your break.


Simon Rodberg

Manager, IMPACT Design, Office of Human Capital



I got notice a couple of weeks ago that I have jury duty on your office hours day at Hardy so I won’t be able to make the appointment.  I’m sorry to miss you, but appreciate your efforts to send my concerns to the appropriate office.

The response below is obviously no help at all as it clearly indicates the Office of Human Capital is unwilling to answer my specific question regarding the calculations involved in determining my rating.  I believe my only request was to have an accurate description of how the expected growth score is calculated.  My question has been left unanswered since last spring.  Can you imagine if a student of mine asked how his or her grade was determined and I told them I couldn’t provide a coherent explanation right now, but see me in a year?

Thanks again for your help.  I look forward to meeting you in person in the future!




From: Bax, Sarah (MS) Sent: Tuesday, December 21, 2010 11:09 AM To: Rodberg, Simon (OOC)

Cc: Henderson, Kaya (OOC) Subject: FW: Appointment #80 (from DCPS Master Educator Office Hours Signup)

Mr. Rodberg,

I am requesting a response to my inquiry below:    ‘explanation of actual algorithm to determine predicted growth score’.


S. Bax


Ms. Bax,

What’s a good phone number to reach you on? I think it would be easiest to explain over the phone.

Thank you, and happy holidays.


Simon Rodberg

Manager, IMPACT Design, Office of Human Capital


From: Kamras, Jason (DCPS) Sent: Sun 12/26/2010 1:42 PM

To: Bax, Sarah (MS) Cc: Henderson, Kaya (OOC) Subject: Value-added calculation

Hi Sarah,

The Chancellor informed me that you’re looking for a more detailed explanation of how your “predicted” score is calculated. In short, we use a regression equation to determine this score. If you’d like to know more about the specifics of the equation, please let me know and I can set up a time for your to meet with our technical experts.

Happy New Year!


Jason Kamras

Chief, Office of Human Capital


On 12/27/10 12:17 PM, “Bax, Sarah (DCPS-MS)” wrote:


I have requested an explanation of the value-added calculation since September, with my initial request beginning with you (see email exchange pasted below).  I would like specifics about the equation.  Please forward my inquiry to one of your technical experts so that he or she may email me with additional information about the mathematical model.




On 12/27/10 12:23 PM, “Kamras, Jason (DCPS)” wrote:

My deepest apologies, Sarah. I’ll set this up as soon as I get back.

Jason Kamras

Chief, Office of Human Capital

—–Original Message—–

From: Kamras, Jason (DCPS) Sent: Tue 1/25/2011 11:02 PM

To: Bax, Sarah (MS) Subject: FW: Value-added calculation

Hi Sarah,

I just wanted to follow up on this. When could we get together to go over the equation?

Hope you’re well,


Jason Kamras

Chief, Office of Human Capital


From: Bax, Sarah (MS) Sent: Fri 1/28/2011 1:15 PM To: Kamras, Jason (DCPS)

Subject: RE: Value-added calculation


I really would just like something in writing that I can go over– and then I could contact you if I have questions.  It is difficult to carve out meeting time in my schedule.




From: “Bax, Sarah (DCPS-MS)” Date: Thu, 10 Feb 2011 14:05:43 -0500

To: Jason Kamras Subject: FW: Value-added calculation


I didn’t hear back from you after this last email.




From: Kamras, Jason (DCPS) Sent: Thu 2/10/2011 6:00 PM

To: Bax, Sarah (MS) Subject: Re: Value-added calculation

Ugh. So sorry, Sarah. The only thing we have in writing is the technical report, which is being finalized. It should be available on our website this spring. Of course, let me know if you’d like to meet before then.



Jason Kamras

Chief, Office of Human Capital


On Feb 25, 2011, at 9:29 PM, “Bax, Sarah (MS)” wrote:


How do you justify evaluating people by a measure [for] which you are unable to provide explanation?



Sat, February 26, 2011 11:25:33 AM


To be clear, we can certainly explain how the value-added calculation works. However, you’ve asked for a level of detail that is best explained by our technical partner, Mathematica Policy Research. When I offered you the opportunity to sit down with them, you declined.

As I have also noted previously, the detail you seek will be available in the formal Technical Report, which is being finalized and will be posted to our website in May. I very much look forward to the release, as I think you’ll be pleased by the thoughtfulness and statistical rigor that have guided our work in this area.

Finally, let me add that our model has been vetted and approved by a Technical Advisory Board of leading academics from around the country. We take this work very seriously, which is why we have subjected it to such extensive technical scrutiny.


Jason Kamras
Chief, Office of Human Capital


To be clear, I did not decline the opportunity to speak with your technical partner.  On December 27th I wrote to you, “I would like specifics about the equation.  Please forward my inquiry to one of your technical experts so that he or she may email me with additional information about the mathematical model.” I never received a response to this request.

In addition, both you and Mr. Rodberg offered to provide information about the equation to me on the phone or in person, but have yet to agree to send any information in writing.  You have stated, “I just wanted to follow up on this. When could we get together to go over the equation?”  Mr. Rodberg wrote, “What’s a good phone number to reach you on? I think it would be easiest to explain over the phone.”

Why not transpose the explanation you would offer verbally to an email?  Please send in writing the information that you do know about how the predicted growth score is calculated.  For instance, I would expect you are familiar with what variables are considered and which data sources are used to determine their value.  Let me know what you would tell me if I were to meet with you.

As a former teacher, you must realize the difficulty in arranging actual face-time meetings given my teaching duties.  And as a former mathematics teacher, I would imagine you could identify with my desire to have an understanding of the quantitative components of my evaluation.


Published on February 27, 2011 at 8:59 pm