What I actually had time to say …

Since I had to abbreviate my remarks, here is what I actually said:

I am Guy Brandenburg, retired DCPS mathematics teacher.

To depart from my text, I want to start by proposing a solution: look hard at the collaborative assessment model being used a few miles away in Montgomery County [MD] and follow the advice of Edwards Deming.

Even though I personally retired before [the establishment of the] IMPACT [teacher evaluation system], I want to use statistics and graphs to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

Celebrated mathematician John Ewing says that VAM is “mathematical intimidation” and a “modern, mathematical version of the Emperor’s New Clothes.”

I agree.

One of my colleagues was able to pry the value-added formula [used in DC] from [DC data honcho] Jason Kamras after SIX MONTHS of back-and-forth emails. [Here it is:]

value added formula for dcps - in mathtype format

One problem with that formula is that nobody outside a small group of highly-paid consultants has any idea what the values of any of those variables are.

In not a single case has the [DCPS] Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student and class by class.

Nor has that office shared that data with the Washington Teachers’ Union.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, for every single class a teacher has.

Now let’s look at some statistics.

My first graph is completely random data points that I had Excel make up for me [and plot as x-y pairs].

pic 3 - completely random points

Notice that even though these are completely random, Excel still found a small correlation: r-squared was about 0.08 and r was about 29%.
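For readers without Excel, the same experiment can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet behind the graph: the number of points (20) and the random seed are assumptions, and `pearson_r` is a helper name invented here. It generates unrelated x and y values and computes r and r-squared from the standard formula.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(2013)  # arbitrary seed so the run is repeatable
n_points = 20              # assumed; the graph's point count is not stated

# Two columns of random numbers with no relationship to each other.
xs = [rng.random() for _ in range(n_points)]
ys = [rng.random() for _ in range(n_points)]

r = pearson_r(xs, ys)
r_squared = r ** 2
print(f"r = {r:+.3f}, r-squared = {r_squared:.3f}")
```

Changing the seed or the number of points changes the result; with only a handful of points, r is rarely exactly zero even though nothing connects the two columns.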

Now let’s look at a very strong case of negative correlation in the real world: poverty rates and student achievement in Nebraska:

pic  4 - nebraska poverty vs achievement

The next graph is for the same sort of thing in Wisconsin:

pic 5 - wisconsin poverty vs achievement

Again, quite a strong correlation, just as we see here in Washington, DC:

pic 6 - poverty vs proficiency in DC

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for [DCPS in] SY 2009-10, and I plotted them [as you can see below].

pic 7 - VAM versus TLF in DC IMPACT 2009-10

I would say this looks like pretty much no correlation at all. It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements [in DCPS] over time? Unfortunately, since DCPS keeps all the data hidden, we don’t know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to [find out]. Here is one such graph [showing how the same teachers, in the same schools, scored in 2008-9 versus 2009-10]:

pic 8 - value added for 2 successive years Rubenstein NYC

That is very close to random.

How about teachers who teach the same subject to two different grade levels, say, fourth-grade math and fifth-grade math? Again, random points:

pic 9 - VAM for same subject different grades NYC rubenstein

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007. Look for yourself [in the next two graphs].

pic 11 - naep 8th grade math avge scale scores since 1990 many states incl dc


pic 12 naep 4th grade reading scale scores since 1993 many states incl dc

Notice that gains began almost 20 years ago, long before mayoral control or chancellors Rhee and Henderson, long before IMPACT.

To repeat, I suggest that we throw out IMPACT and look hard at the ideas of Edwards Deming and the assessment models used in Montgomery County.

Poverty Isn’t Destiny?

Quite a few Ed Deformers say that Poverty Isn’t Destiny. They say that it doesn’t matter if a child has been subjected to lead poisoning, separation from parents, violent or otherwise cruel child abuse, inadequate nutrition, and has lacked dental or health care and the love and care of a family during the first, crucial years. All it takes is for a Bright Young Thing fresh out of college to work her butt off for two years before she goes to work for a bank — and all of those handicaps will be overcome, with no extra dollars invested, and maybe even less!

Or maybe not.

Lots of teachers have been working their butts off for many decades, doing their best, believe it or not (for the most part).

Here are three graphs from Wisconsin that show how close the connection is between poverty rates and student achievement levels at all of the schools for which the state provides data. My data come from here and are for SY 2011-2012. In fact, you can download the entire spreadsheet for the state of Wisconsin if you click on this link:


In all three graphs, the percentage of poor students at the school is along the horizontal (X) axis. In the first two, the average achievement score at the school is along the vertical (Y) axis.

In this first graph, Wisconsin uses a 100-point scale for overall student achievement.

wisconsin school overall student ach score by pct of poor kids

That is an incredibly strong correlation between poverty levels and student achievement. The smaller the proportion of poor students at a school, the better the achievement scores at that school.

I had Excel compute two “trend” lines: one straight, in black, and one curved, in red, following a third-degree polynomial, since it looks like we have a serious “Matthew effect” going on here. In either case, the R-squared and R values are very high, showing that poverty is, in fact, destiny for a lot of kids.

The next graph is for reading only, but it shows essentially the same trend. School reading scores go from 0 to 50.

Wisconsin school READING scores by pct of poor kids

There are very few real-life correlations between two entities stronger than what you see in these two graphs.

This next graph is a little different, for two reasons: the y-axis is math, and it’s the percent of students deemed ‘proficient’ on whatever test Wisconsin is using. It also shows a very strong correlation.

wisconsin school poverty rate versus percent of students proficient in MATH

Teacher VAM scores aren’t stable over time in Florida, either

I quote from a paper studying whether value-added scores for the same teachers tend to be consistent. In other words, does VAM allow us a chance to pick out the crappy teachers and give bonuses to the good ones?

The answer, in complicated language, is essentially NO, but here is how they word it:

“Recently, a number of school districts have begun using measures of teachers’ contributions to student test scores or teacher “value added” to determine salaries and other monetary rewards.

In this paper we investigate the precision of value-added measures by analyzing their inter-temporal stability.

We find that these measures of teacher productivity are only moderately stable over time, with year-to-year correlations in the range of 0.2-0.3.”

Or in plain English, and if you know anything at all about scatter plots and linear correlation, those scores wander all over the place and should never be used to provide any serious evidence about anything. Speculation, perhaps, but not policy or hiring or firing decisions of any sort.

They do say that they have some statistical tricks that allow them to make the correlation look better, but I don’t trust that sort of thing.  It’s not real.

Here’s a table from the paper. Look at those R values, and note that if you squared those correlation constants (go ahead, use your calculator on your cell phone) you get numbers that are way, way smaller – like what I and Gary Rubenstein reported concerning DCPS and NYCPS.

For your convenience, I circled the highest R value, 0.61, in middle schools on something called the normed FCAT-SSS, whatever that is (go ahead and look it up if it interests you) in Duval County, Florida, one of the places where they had data. I also circled the lowest R value, 0.07, in Palm Beach County, on the FCAT-NRT, whatever that is.

I couldn’t resist, so 0.56^2 is about 0.31 as an r-squared, which is moderate. There is only one score anywhere near that high, 0.56, out of 24 such correlation calculations. The lowest value is 0.07, and if we square that and round it off we get an r-squared value of 0.005, shockingly low, essentially none at all.

The median correlation constant is about 0.285, which I indicated by circling two adjacent values of 0.28 and 0.29 in green. If you square that value you get r^2=0.08, which is nearly useless. Again.
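The squaring above is easy to check without a cell-phone calculator. The sketch below simply squares the r values quoted in this post (0.56, 0.285, 0.07) along with the 0.2 and 0.3 endpoints from the paper's abstract; the dictionary labels are descriptive names chosen here, not terms from the paper.

```python
# r values quoted in the post and in the paper's abstract.
r_values = {
    "value squared in the text": 0.56,
    "median of the 24 correlations": 0.285,
    "lowest in the table": 0.07,
    "low end of the quoted range": 0.2,
    "high end of the quoted range": 0.3,
}

# r-squared is the share of variation in one year's scores
# that a straight-line fit on the other year's scores accounts for.
r_squared = {label: r ** 2 for label, r in r_values.items()}

for label, r in r_values.items():
    print(f"{label}: r = {r:.3f}, r-squared = {r_squared[label]:.3f}")
```

Even the top of the paper's 0.2-0.3 range corresponds to an r-squared below 0.1.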

I’m really sorry, but even though this paper was published four years ago, it’s still under wraps, or so it says?!?! I’m not supposed to quote from it? Well, to hell with that. It’s important data, for keereissake!

The title and authors are as follows, and perhaps they can forgive me. I don’t know how to contact them anyway. Does anybody have their contact information? Here is the title, credits, and warning:

Daniel F. McCaffrey (The RAND Corporation); Tim R. Sass (Florida State University); J. R. Lockwood (The RAND Corporation)
Original Version: April 9, 2008
This Version: June 27, 2008

*This paper has not been formally reviewed and should not be cited, quoted, reproduced, or retransmitted without the authors’ permission. This material is based on work supported by a supplemental grant to the National Center for Performance Initiatives funded by the United States Department of Education, Institute of Education Sciences. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these organizations.

Published on March 30, 2012 at 3:58 pm

The Correlation Between ‘Value-Added’ Scores and Observation Scores in DCPS under IMPACT is, in fact, Exceedingly Weak

As I suspected, there is nearly no correlation between the scores obtained by DCPS teachers on two critical measures.

I know this because someone leaked me a copy of the entire summary spreadsheet, which I will post on the web at Google Docs shortly.

As usual, a scatter plot does an excellent job of showing how ridiculous the entire IMPACT evaluation system is. It doesn’t predict anything to speak of.

Here is the first graph.

Notice that the r^2 value is quite low: 0.1233, or about 12%. Not quite a random distribution, but fairly close. Certainly not something that should be used to decide whether someone gets to keep their job or earn a bonus.

The Aspen Institute study apparently used R rather than r^2; they reported R values of about 0.35, which is about what you get when you take the square root of 0.1233.

Here is the second graph, which plots teachers’ ranks on the classroom observations versus their ranks on the Value Added scores. Do you see any correlation?


Remember, this is the correlation that Jason Kamras said was quite strong.

More Value Added Comparisons

Someone who professes to understand Value-Added scores better than I do claims that my graphs for NYC are meaningless because the scores for 2007 were inflated; he claimed that the overall year-to-year and year-to-career value-added correlation coefficients are much higher than what I found. Thus, VA is really useful, and the problem lies only with my particular graphs.

Taking this objection seriously, I decided to leave out SY 0607, and compare SY 0506 to SY 0708. Same exact teachers, same exact subjects and grade levels, same exact schools, obviously different (but quite similar) kids.

Here is the scatterplot of what I found. Again, I asked Excel to calculate a line of best fit, and it drew it. Notice that the r-squared correlation value is about 0.05 — seriously LOW. Notice also that this scatterplot is basically a blob again, again a classic example of one variable showing very little correlation with another. (West Virginia’s map has a much more defined shape!) In any case, there are lots (hundreds? thousands?) of teachers with positive VA scores in the first year and negative VA scores the third year, and vice-versa. Only an easily countable handful of teachers have scores of +0.2 or better both years, or worse than -0.2 both years. Out of all of the thousands of teachers. And I bet those are all accidents as well.

So, in other words, I find, as did Gary Rubenstein, that there is extremely little correlation between two things that should be, you would think, very close to a perfect 1.00 correlation. (In the real world, of course, you almost never get a 1.00 correlation between any real entities or quantities.)

However, when you are talking about the scores of teachers who have been teaching IN THE SAME SCHOOL, THE SAME SUBJECT, THE SAME GRADE LEVEL for three straight years, then you would think that their performances would be rather similar all three years. If anything, they would normally get better unless they had suffered some sort of physically or mentally debilitating injury or illness (often from old age and the incredible amount of stress). In particular, a lot of teachers will admit to you that they absolutely sucked at teaching during their first year, but that they then figured out a lot of those errors and tried not to make the same ones the next year, so they really improved, or else they quit.

But these folks didn’t quit. These are at the very least three-year veterans, which in DC would make them eligible for department or grade-level chair at their school as a result of seniority alone, since so many of the older teachers have quit or retired, and the turnover and attrition over the last few years among the newest hires in our school system is probably unprecedented in the history of education. (Perhaps not, but it’s a subject I’d like to pursue.)


Finally, I admit that I exaggerated a bit (for effect) when I said that the shapes of these graphs, and the very low computed values for the r-squared coefficient of linear correlation, made value-added about as predictive as numerology. I thought about that particular exaggeration and wondered how serious it was. Even though I have taken a fairly large number of courses on calculating probabilities and distributions, working such things out by hand is always a bit fraught with error: Have we counted all of the possibilities? Have we left any out? Have we double-counted any of them? Is there a much better, faster, or less error-prone method hidden right around the corner?


To make a long story short: the Monte Carlo method is a great way of deciding, say, how likely something is to happen. It’s called “Monte Carlo” because it’s very much like gambling in a casino, except you aren’t betting anything except your time. You just roll some dice (they might be funny-looking non-cubical polyhedra) or spin a wheel or throw darts or spatter paint or vaporized metal… And then you see what happens, and draw conclusions. Today, it’s really easy to do.

So I decided to see whether, in fact, the number of letters in the teachers’ names had any correlation with their Value Added scores. (I thought it was possible, tho not very likely.) Excel found that the r-squared value was about 0.000000. That is zero correlation, my friends. Here is one such scatterplot:

The vertical axis, which goes up the middle, is the number of letters in the teacher’s first name times the number of letters in their last name as listed in the spreadsheet. The horizontal axis, which is at the bottom of the page, is their 2005-2006 value-added score, which can be either negative (theoretically bad) or positive (supposedly good). To me, it sort of looks like a bush that hasn’t been pruned in several years – a classic case of no correlation at all.

I asked Excel to draw and calculate the line of best fit. It’s the green, nearly-horizontal line near the center of the graph. Notice the r-squared value: 6E-05, which for all of you innumerates out there, means 0.00006, which is seriously smaller (three orders of magnitude smaller) than 0.05; i.e., one-thousandth as big.

Notice that I’m only using r-squared. Someone objected that I should use just r. If you want, take the square root of all of the correlations I had my computer calculate, and you’ll get r. Compare and contrast.

So, in any case, I definitely did exaggerate.

Whether DC-CAS scores go up or down at any school seems mostly to be random!

After reviewing the changes in math and reading scores at all DC public schools for 2006 through 2009, I have come to the conclusion that the year-to-year school-wide changes in those scores are essentially random. That is to say, any growth (or slippage) from one year to the next is not very likely to be repeated the next year.

Actually, it’s even worse than that. The record shows that any change from year 1 to year 2 is somewhat NEGATIVELY correlated with the change between year 2 and year 3. That is, if there is growth from year 1 to year 2, then it is a bit more likely than not that there will be a shrinkage between year 2 and year 3. Or, if the scores got worse from year 1 to year 2, then there is a slightly better-than-even chance that the scores will improve the following year.

And it doesn’t seem to matter whether the same principal is kept during all three years, or whether the principals are replaced one or more times over the three-year period.

In other words, all this shuffling of principals (and teachers) and turning the entire school year into preparation for the DC-CAS seems to be futile. EVEN IF YOU BELIEVE THAT THE SOLE PURPOSE OF EDUCATION IS TO PRODUCE HIGH STANDARDIZED TEST SCORES. (Which I don’t.)

Don’t believe me? I have prepared some scatterplots, below, and you can see the raw data here as a Google Doc.

My first graph is a scatterplot relating the changes in percentages of students scoring ‘proficient’ or better on the reading tests from Spring 2006 to Spring 2007 on the x-axis, with changes in percentages of students scoring ‘proficient’ or better in reading from ’07 to ’08 on the y-axis, at DC Public Schools that kept the same principals for 2005 through 2008.

If there were a positive correlation between the two time intervals in question, then the scores would cluster mostly in the first and third quadrants. And that would mean that if scores grew from ’06 to ’07 then they also grew from ’07 to ’08; or if they went down from ’06 to ’07, then they also declined from ’07 to ’08.

But that’s not what happened. In fact, in the 3rd quadrant, I only see one school – apparently M.C. Terrell – where the scores went down during both intervals. However, there are about as many schools in the second quadrant as in the first quadrant. Being in the second quadrant means that the scores declined from ’06 to ’07 but then rose from ’07 to ’08. And there appear to be about 7 schools in the fourth quadrant. Those are schools where the scores rose from ’06 to ’07 but then declined from ’07 to ’08.
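Counting points by eye is error-prone, so the quadrant tally can also be done mechanically. The sketch below is a hypothetical helper, not part of the original analysis: the function name `quadrant` and the sample pairs are made up for illustration and are NOT the DC school data.

```python
from collections import Counter

def quadrant(first_change, second_change):
    """Return 1-4 for the scatterplot quadrant, or 0 if either change is zero."""
    if first_change > 0 and second_change > 0:
        return 1  # up, then up
    if first_change < 0 and second_change > 0:
        return 2  # down, then up
    if first_change < 0 and second_change < 0:
        return 3  # down, then down
    if first_change > 0 and second_change < 0:
        return 4  # up, then down
    return 0      # sits on an axis

# Made-up (first change, second change) pairs in percentage points.
changes = [(4.0, -2.5), (-3.0, 6.0), (5.5, 1.0),
           (-1.0, -2.0), (7.0, -4.0), (-2.0, 3.5)]

counts = Counter(quadrant(dx, dy) for dx, dy in changes)
for q in (1, 2, 3, 4):
    print(f"quadrant {q}: {counts[q]} schools")
```

With real data, a positive correlation would show up as most schools landing in quadrants 1 and 3, and a negative one as most landing in quadrants 2 and 4.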

I asked Excel to calculate a regression line of best fit between the two sets of data, and it produced the line that you see, slanted downwards to the right. Notice that R-squared is 0.1998, which is rather weak. If we look at R, the correlation coefficient (the square root of R-squared, with a negative sign here because the line slopes downward), my calculator gives me -0.447, which means again that the growth (or decline) from ’06 to ’07 is negatively correlated with the growth (or decline) from ’07 to ’08 – but not in a strong manner.

OK. Well, how about during years ’07-’08-’09? Maybe Michelle Rhee was better at picking winners and losers than former Superintendent Janey? Let’s take a look at schools where she allowed the same principal to stay in place for ’07, ’08, and ’09:

Actually, this graph looks worse! There are nearly twice as many schools in quadrant four as in quadrant one! That means that there are lots of schools where reading scores went up between ’07 and ’08, but DECLINED from ’08 to ’09; but many fewer schools where the scores went up both years. In the second quadrant, I  see about four schools where the scores declined from ’07 to ’08 but then went up between ’08 and ’09. Excel again provided a linear regression line of best fit, and again, the line slants down and to the right. R-squared is 0.1575, which is low. R itself is about -0.397, which is, again, rather low.

OK, what about schools where a principal got replaced? If you believe that all veteran administrators are bad and need to be replaced with new ones with limited or no experience, you might expect to see negative correlations, but with positive overall outcomes; in other words, the scores should cluster in the second quadrant. Let’s see if that’s true. First, reading changes over the period 2006-’07-’08:

Although there are schools in the second quadrant, there are also a lot in the first quadrant, and I also see more schools in quadrants 3 and 4 than we’ve seen in the first two graphs. According to Excel, R-squared is extremely low: 0.0504, which means that R is about -0.224, which means, essentially, that it is almost impossible to predict what the changes would be from one year to the next.

Well, how about the period ’07-’08-’09? Maybe Rhee did a better job of changing principals then? Let’s see:

Nope. Once again, it looks like there are as many schools in quadrant 4 as in quadrant 1, and considerably fewer in quadrant 2. (To refresh your memory: if a school is in quadrant 2, then the scores went down from ’07 to ’08, but increased from ’08 to ’09. That would represent a successful ‘bet’ by the superintendent or chancellor. However, if a school is in quadrant 4, that means that reading scores went up from ’07 to ’08, but went DOWN from ’08 to ’09; that would represent a losing ‘bet’ by the person in charge.) Once again, the line of regression slants down and to the right.  The value of R-squared, 0.3115, is higher than in any previous scatterplot (I get R = -0.558) which is not a good sign if you believe that superintendents and chancellors can read the future.
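A square root by itself is never negative, so the sign of R has to come from the direction of the regression line. The sketch below recomputes R from the four reading R-squared values quoted in this post; `signed_r` is a helper name invented here, and -1.0 merely stands in for "slopes downward," since the actual slopes are not given.

```python
import math

def signed_r(r_squared, slope):
    """Square root of R-squared, carrying the sign of the fitted slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# R-squared values quoted in the text for the four reading scatterplots.
# Every regression line is described as slanting down to the right,
# so a placeholder slope of -1.0 supplies the negative sign.
reading_r_squared = [0.1998, 0.1575, 0.0504, 0.3115]
reading_r = [round(signed_r(r2, -1.0), 3) for r2 in reading_r_squared]

for r2, r in zip(reading_r_squared, reading_r):
    print(f"R-squared = {r2:.4f}  ->  R = {r:+.3f}")
```

The results agree with the R values given in the text: -0.447, -0.397, -0.224, and -0.558.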

Perhaps things are more predictable with mathematics scores? Let’s take a look. First, changes in math scores during ’06-’07-’08 at schools that kept the same principal all 3 years:

Doesn’t look all that different from our first Reading graph, does it? Now, math score changes during ’07-’08-’09, schools with the same principal all 3 years:

Again, a weak negative correlation. OK, what about schools where the principals changed at least once? First look at ’06-’07-’08:

And how about ’07-’08-’09 for schools with at least one principal change?

Again, a very weak negative correlation, with plenty of ‘losing bets’.

Notice that every single one of these graphs presented a weak negative correlation, with plenty of what I am calling “losing bets” – by which I mean cases where the scores went up from the first year to the second, but then went down from the second year to the third.

OK. Perhaps it’s not enough to change principals once every 3 or 4 years. Perhaps it’s best to do it every year or two? (Anybody who has actually been in a school knows that when the principal gets replaced frequently, then it’s generally a very bad sign. But let’s leave common sense aside for a moment.) Here we have scatterplots showing what the situation was, in reading and math, from ’07 through ’09, at schools that had 2 or more principal changes from ’06 to ’09:


This conclusion is not going to win me lots of friends among those who want to use “data-based” methods of deciding whether teachers or administrators keep their jobs, or how much they get paid. But facts are facts.


A little bit of mathematical background on statistics:

Statisticians say that two quantities (let’s call them A and B) are positively correlated when an increase in one quantity (A) is linked to an increase in the other quantity (B). An example might be a person’s height (for quantity A) and length of a person’s foot (for quantity B). Generally, the taller you are, the longer your feet are. Yes, there are exceptions, so these two things don’t have a perfect correlation, but the connection is pretty strong.

If two things are negatively correlated, that means that when one quantity (A) increases, then the other quantity (B) decreases. An example would be the speed of a runner versus the time it takes to run a given distance.  The higher the speed at which the athlete runs, the less time it takes to finish the race. And if you run at a lower speed, then it takes you more time to finish.

And, of course, there are things that have no correlation to speak of.

Published on March 13, 2010 at 3:37 pm