What I actually had time to say …

Since I had to abbreviate my remarks, here is what I actually said:

I am Guy Brandenburg, retired DCPS mathematics teacher.

To depart from my text, I want to start by proposing a solution: look hard at the collaborative assessment model being used a few miles away in Montgomery County [MD] and follow the advice of Edwards Deming.

Even though I personally retired before [the establishment of the] IMPACT [teacher evaluation system], I want to use statistics and graphs to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

Celebrated mathematician John Ewing says that VAM is “mathematical intimidation” and a “modern, mathematical version of the Emperor’s New Clothes.”

I agree.

One of my colleagues was able to pry the value-added formula [used in DC] from [DC data honcho] Jason Kamras after SIX MONTHS of back-and-forth emails. [Here it is:]

value added formula for dcps - in mathtype format

One problem with that formula is that nobody outside a small group of highly-paid consultants has any idea what the values of any of those variables are.

In not a single case has the [DCPS] Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student and class by class.

Nor has that office shared that data with the Washington Teachers’ Union.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, for every single class a teacher has.

Now let’s look at some statistics.

My first graph is completely random data points that I had Excel make up for me [and plot as x-y pairs].

pic 3 - completely random points

Notice that even though these are completely random, Excel still found a small correlation: r-squared was about 0.08 and r was about 29%.
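If you want to try this at home without Excel, here is a minimal sketch in Python. The number of points per plot (20), the seed, and the function names are my own assumptions for illustration; they are not taken from the graph above. The idea is simply to make up many batches of random x-y pairs and see how often pure noise produces an r at least as big as the roughly 0.28 that goes with an r-squared of 0.08.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def random_r(n_points, rng):
    """Correlation between two unrelated lists of random numbers."""
    xs = [rng.random() for _ in range(n_points)]
    ys = [rng.random() for _ in range(n_points)]
    return pearson_r(xs, ys)

rng = random.Random(2012)   # arbitrary seed so the run is repeatable
n_points = 20               # assumed size of one scatter plot (hypothetical)
sample_rs = [random_r(n_points, rng) for _ in range(1000)]

# Share of purely random plots whose |r| is at least 0.28
share_above_0_28 = sum(abs(r) >= 0.28 for r in sample_rs) / len(sample_rs)
print(f"Random plots with |r| >= 0.28: {share_above_0_28:.1%}")
```

With small batches of points, a noticeable fraction of the random plots will show a "correlation" of that size, which is the whole point: a small r by itself tells you next to nothing.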

Now let’s look at a very strong case of negative correlation in the real world: poverty rates and student achievement in Nebraska:

pic  4 - nebraska poverty vs achievement

The next graph is for the same sort of thing in Wisconsin:

pic 5 - wisconsin poverty vs achievement

Again, quite a strong correlation, just as we see here in Washington, DC:

pic 6 - poverty vs proficiency in DC

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for [DCPS in] SY 2009-10, and I plotted them [as you can see below].

pic 7 - VAM versus TLF in DC IMPACT 2009-10

I would say this looks like pretty much no correlation at all. It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements [in DCPS] over time? Unfortunately, since DCPS keeps all the data hidden, we don’t know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to [find out]. Here is one such graph [showing how the same teachers, in the same schools, scored in 2008-9 versus 2009-10]:

pic 8 - value added for 2 successive years Rubenstein NYC

That is very close to random.

How about teachers who teach the same subject to two different grade levels, say, fourth-grade math and fifth-grade math? Again, random points:

pic 9 - VAM for same subject different grades NYC rubenstein

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007. Look for yourself [in the next two graphs].

pic 11 - naep 8th grade math avge scale scores since 1990 many states incl dc

 

pic 12 naep 4th grade reading scale scores since 1993 many states incl dc

Notice that gains began almost 20 years ago, long before mayoral control or chancellors Rhee and Henderson, long before IMPACT.

To repeat, I suggest that we throw out IMPACT and look hard at the ideas of Edwards Deming and the assessment models used in Montgomery County.

Recent Articles Against Race to the Trough and other Deformations of US Public Education

Bob Schaeffer of FairTest has been compiling weekly lists of good articles that give a view from ordinary schools and households on what it’s been like under NCLB and its successor, RTTT. Here’s Schaeffer’s latest list.   — gfb

=====================================================

Assessment reform pressure continued to escalate even as Hurricane Sandy slammed ashore.  Best wishes to our friends and allies in the mid-Atlantic states as they recover from the storm.

Arne Duncan’s Legacy: Doubling Down on High Stakes Testing Failures
http://education.nationaljournal.com/2012/10/what-has-arne-done-for-us.php#2258005

Texas Tests Breed Schools for Scandal
http://www.texastribune.org/texas-education/public-education/guest-column-tests-breed-schools-scandal/

Testing in Kindergarten — Whatever Happened to Story Time?
http://www.chicagoreader.com/chicago/testing-consumes-kindergarten-class-time-in-chicago/Content?oid=7740293

Hudson Valley Parents Rip Excess Testing
http://newyork.newsday.com/westchester/westchester-now-1.3784383/hudson-valley-parents-rip-excess-school-testing-1.4152170

Data Missing for School Improvement Grant Claims
http://blogs.edweek.org/edweek/campaign-k-12/2012/10/transparency_watch_obama_has_t.html

The MLK Imperative in an Era of “No Excuses”
http://www.dailykos.com/story/2012/10/27/1149535/-The-MLK-Imperative-in-an-Era-of-No-Excuses

Researchers Urge “Caution” in Use of Value-Added Scores
http://www.edweek.org/ew/articles/2012/10/25/10valueadd.h32.html

Measuring the Worth of a Teacher
http://www.latimes.com/news/local/la-me-teacher-evals-20121029,0,592261,full.story

The Naked Emperor: What Test Scores Don’t Tell Us
http://www.psychologytoday.com/blog/young-minds/201210/what-test-scores-dont-tell-us-the-naked-emperor

Superintendent Dissects Race to the Trough’s Flaws
http://www.washingtonpost.com/blogs/answer-sheet/wp/2012/10/31/school-superintendent-to-thomas-friedman-why-you-are-wrong-about-race-to-the-top/

Bob Schaeffer, Public Education Director
FairTest: National Center for Fair & Open Testing
ph-   (239) 395-6773    fax-  (239) 395-6779
cell-  (239) 699-0468
web- http://www.fairtest.org
Published on October 31, 2012 at 4:41 pm

Where the data came from

I neglected to give the source for the data for my last two posts. It’s at the website for what looks like a NYC radio or TV station:

http://www.ny1.com/content/top_stories/156599/now-available–2007-2010-nyc-teacher-performance-data#doereports

or

http://tinyurl.com/836a8cj

if you prefer it shorter.

I will warn you that some of the spreadsheets are quite large.

BTW, I just now did a graph showing how well New York City does at predicting the value-added scores of its teachers for school year 2007-2008. The answer seems to be, not very well. Here is the scatter plot:

The correlation is, again, close to zero, even though NYC’s department of assessment and numerology has done their best to try to get it right. In fact, even though the line of best fit doesn’t fit very well, you notice that it slopes downwards to the right. That means that with kids who are predicted to improve relative to the previous year, teachers’ value-added scores are, in general, lower than predicted; whereas with kids who are predicted to do worse than the previous year, teachers’ value-added scores are, in general, a tad higher than predicted.
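For anyone who wants to check a slope like this for themselves, here is a rough sketch of how a line of best fit and its correlation can be computed by ordinary least squares. The five (predicted, actual) pairs below are numbers I made up purely for illustration; they are NOT from the NYC spreadsheets, and `fit_line` is just a name I chose.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Made-up illustration data: predicted change vs. actual value-added score
predicted = [-2.0, -1.0, 0.0, 1.0, 2.0]
actual = [0.5, -0.8, 0.6, -0.9, 0.1]

slope, intercept, r = fit_line(predicted, actual)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")
```

A negative slope together with an r close to zero is the pattern described above: the line tilts downward to the right, but the points are scattered so widely around it that the prediction tells you very little.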

Not ready for prime time. And not ready to be used to base hiring and firing and bonus decisions on.

Published on March 4, 2012 at 2:39 pm

An Apology? And a look at the DQC website, as I should have done earlier

I’m thinking of asking the DQC leadership for an apology for not even allowing me to stay and observe their national summit last week.

Even though I had pre-registered for the conference.

Even if I were to solemnly promise to sit there like a good boy and not bother anybody by giving them a 2-page commentary I had written, I could not remain.

Nor even if I promised to raise my hand politely before either being ignored or called on, at their pleasure…

I guess they seriously consider me a dangerous crank. But you know something? There seem to be a lot of teachers in the classrooms or retired, old or young, as well as parents and students, who make pretty much the same points that I do. Often they make their points much more eloquently and clearly than I do, on their blogs, in replies to other folks’ comments, on Facebook, and elsewhere. My major (very small, but not without significance) contribution to the debate on NCLB and the future of education has been mostly to supply some data showing that the Educational DEform movement does not seem to be based on real data.

I am still working on the wording of the letter to DQC, and am trying to figure out what to ask for.* I sent out a couple of drafts, got considerable help from those folks who edited the drafts, and now need to somehow combine all the contributions and make my letter to DQC first of all a lot shorter than it is.

Was this an official, spelled-out DQC policy: that Guy Brandenburg is so special that he can’t come to any of their conferences? Or, more broadly, that anybody who disagrees with Arne Duncan or Michelle Rhee cannot attend? Or that anybody will be barred who raises the idea that there are serious problems with a lot of the data that is being collected, and that a lot of these mathematical models are built like a shaky house of cards and give wrong results? (I refer to the sudden surge, across the nation, in “Value-added methods”.)

Or was it just someone local who really hates me? Gosh, I didn’t think I was so infamous. Should I be flattered?

Meanwhile, I began looking closer at the DQC site, which I had only glanced at earlier.

A lot of what I saw, or didn’t see, shocked me.

I had no idea that this group has been around for several years and has been busy attempting essentially to federalize/nationalize all data about schools and everything else, in all 50 states and all the counties and school boards and city councils and individual schools and teachers and classrooms. I think it’s kinda scary when anybody at all has that much information about you, AND WHEN SO MUCH OF IT IS WRONG!

It’s clear now that there is a lot of fraudulent test data in a lot of schools and school districts across the country, in many cities and states. Some states have gone after the cheating problem with serious investigations that netted scores, or is it hundreds, of confessions, and long lists of people charged. A lot of folks who understand a little bit about human nature said, “I told you so. When people are put under pressure to keep their jobs by meeting impossible goals, then they cheat. People cheat to achieve large bonuses, as many bankers, politicians, and businessmen bear witness.” (Some do a little time in jail, but generally they get away with their riches with merely some bad headlines in the press for a week or so.) They also point out, “When you dangle rewards in front of people, you get lousy results. People do their best work because they really want to, deep down, not for a paycheck.” (Confirmed in many studies of learning and human response, and once again by Dr. Roland Fryer of Harvard, as I’ve mentioned earlier. Search this blog, top right of this page.)

And this fraud sometimes has devastating effects for honest teachers, and doesn’t help the kids at all. For example, suppose that teacher T has an average-achieving class as scored on the NCLB test. The kids are promoted to the next grade, where they have teacher U. Teacher U, with or without the knowledge of anybody else, cheats, somehow or other, and raises the kids’ scores significantly above where they would have been without the cheating. [There are many ways of doing it, I realized by doing some brainstorming with some other folks a while ago. Don't think I'm going to list them, at least not here. Never tried any of them!!!!!]

So, Teacher U earns a big fat bonus check, her name on certificates and programs, perhaps other benefits as well, like a promotion to master educator so she can get out of the classroom… Who knows?

Next year, those kids go to teacher V. V eventually notices an enormous discrepancy between the kids’ scores from the previous spring and their achievement on tests made by the school district’s own publisher, given in the second week of school, way before teacher V had learned all of their mothers’ last names by heart or had had much time to have much of an impact on the kids at all. I mean, how could all of those kids go from “Advanced” or “Proficient” to “Basic” and “Below Basic” in the time from June 20 to about August 25? That sort of thing happens only in the magical world of high-stakes, corruptible testing that we have now.

Kids in V’s class who should have had an extra class in math or reading or whatever because they are so far behind, don’t receive those services, because they are supposedly well above average. Sometimes those extra classes really help, too.

So there is teacher V with a class of kids with contradictory data: last year’s data, which showed wonderful scores, and all of this year’s data, starting with the very first official, district-wide, standardized, multiple-choice, machine-scored, commercially-produced and -marketed test; a test supposed to “inform” the instructor what he/she should instruct. Teacher V doesn’t cheat; instead he/she works his/her butt off trying to use all that data to determine what to teach, how and when. But V still does not perform miracles. (We know that miracles are, at best, and by definition, rather rare.) In other words, V’s students score much lower on the NCLB test than the previous year’s test would supposedly predict, according to the incomprehensibly complex mathematical algorithm that the school district is using. (There are many such models; they are not all the same.)
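To make the arithmetic of that trap concrete, here is a deliberately oversimplified sketch. It assumes that a teacher's "value added" is just this year's class average minus a prediction based on last year's reported average. That is my own toy assumption, not the actual formula used by any school district (the real models are far more complicated), and every number and name below is hypothetical.

```python
def value_added(prior_year_score, current_score, expected_gain=0.0):
    """Toy value-added: actual score minus the score predicted from last year."""
    predicted = prior_year_score + expected_gain
    return current_score - predicted

true_level = 50.0       # what the class can really do, in made-up scale points
cheating_boost = 15.0   # hypothetical inflation of the scores reported under U

score_under_t = true_level                    # honest score with teacher T
score_under_u = true_level + cheating_boost   # inflated score with teacher U
score_under_v = true_level                    # honest score with teacher V

va_u = value_added(score_under_t, score_under_u)
va_v = value_added(score_under_u, score_under_v)
va_v_honest_baseline = value_added(true_level, score_under_v)

print(f"Teacher U (cheated):                 {va_u:+.1f}")
print(f"Teacher V (vs. inflated baseline):   {va_v:+.1f}")
print(f"Teacher V (vs. honest baseline):     {va_v_honest_baseline:+.1f}")
```

In this toy version the kids never changed at all, yet U looks like a star and V looks like a failure, purely because V's prediction was built on U's inflated numbers.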

Teacher V is notified in late July that he/she is terminated because of those bad test scores.

Remember teacher U, who cheated? He or she might be the teacher leader for that grade level next year. And might show the others just how it’s done…

But there seems to be no awareness of this by our ‘friends’ at DQC.

We all know people who have lied to us. And we’ve all told lies before. All of us do it from time to time. We only disagree about exactly how much, and when we tell those lies, and whether they are justifiable or not. Why doesn’t DQC seem to emphasize the riskiness in all of this?

Curious, and knowing that there was no time to search through the enormous DQC website by mouse, I used their “search” engine to look for a few terms, to see if they even consider the concepts of fraud and cheating.

Here is the number of hits I got for each term on their website:

Fraud – 0 hits. Zero.

Cheat – 1 hit; the reference pointed to a page on theft of social security numbers.

dishonest – 0 hits.

data – 750 hits on their website (that was just to make sure the search engine was working!)

erasure – 0.

erase – 0.

form – 400 (yup, it’s still working. They are very much into forms and other paperwork…)

erasing – 0 {I didn’t know if it would consider erasure, erase, and erasing differently. I still don’t, but none of them turned up anything}.

jeopardy – 0.

risk – 9.

This last one is quite instructive. When I got those 9 hits from the word “risk”, one of the hits was the following, and I swear that I am not making this up:

“Guide to Protecting the Confidentiality of Personally Identifiable Information (PII).”

Not that that’s a bad thing, of course. Had you come across the acronym PII before? Wow. They’ve really been at this for a while, haven’t they? An entirely new set of acronyms…

The entire bureaucratic mind-set is stunning. Reminds me of nothing more than the US military and federal government bureaucracies, and I don’t know about you, but they scare me. I want as little as possible to do with them; and with that, my sympathies definitely lie with the people who rose up and fought a revolutionary war in this country for about 7 years and defeated the greatest naval and military power in the world at that time.  They didn’t want something like Big Brother watching over them. Not just the right-wingers want some privacy! I mean, I think everybody has a right to their own private life, and unless you’re the type of weirdo that likes everybody to know everything about you, there’s a lot that I, and just about all of you as well, want to keep private.

After all, for one thing, medical conditions can sometimes cause really enormous financial problems (like going bankrupt) if the wrong people find out about them — namely insurance companies. (Irony of ironies!)

Now some of the stuff on the DQC website seems very reasonable and logical, and I find nothing to object to at all. Individually.

But it seems to me that they really think that putting all of the quantifiable data on every person into a single database will immediately produce policy conclusions that must be promptly put into effect.

Not so fast.

What if the data you are using is utterly fraudulent in the first place?

Wouldn’t that then be like the phony data described in various books and stories by Gulag prisoners and others in the USSR or in Kafka stories – perverse incentives to do useless things to survive by cheating? (I don’t think Kafka ever quite figured out how some people can prosper and live quite beautiful lives of luxury by cheating others.)

Are those formulas really valid? After all, you aren’t using all of that data in those conclusions you just drew. You cherry-picked the numbers you wanted to get the result you wanted using a formula of your own choosing, producing the result in a clearly visible way, or a way that appeals to certain decision-makers.

What if we use different data, different formulas, and throw out stuff that we think is invalid? Well, we’d probably get different results. And what about all of those internal TFA survey documents that apparently show no positive results, or else they would have been trumpeted in the national media?

Instead, you could use this as a database for social “scientists” to examine and try a number of different models, and see whether stuff works. But unlike with drug or medical companies, we need to publicize the results of experiments with NEGATIVE results as well. (BTW: it’s not only companies like Glaxo or Pfizer that do this sort of selective data blackout. I remember looking at the NIH alternative medicine website a few years ago to see what the results were for the various experiments that NIH had paid good money to fund – you know, on acupuncture and other stuff like that. I did this search many years after the studies had started. But NONE of them had any published results.)

Does that mean that they were all incompetent and couldn’t finish their studies?

In that case, they should pay us (i.e., the public in the form of NIH) back with interest or penalties or something.

Or does it mean that the results weren’t published because they did not show what the authors WANTED them to show?

I suspect the latter.

It also really began to bother me that so many person-years of highly skilled, highly-paid labor had been poured into this project, linking up pretty much all of the data on every single person in every state, county, city, and every single teacher and school and every student, just about completely out of the limelight. For lots and lots of money, in other words, some of it privately raised, but I’ll bet there is a lot of taxpayer money in there as well in one way or another.

Keep in mind that a relative handful of testing-and-data companies will earn enormous revenues from processing this enormous database.

While none of us citizens have had a say on this at all.

And while so much of the data is flat out WRONG.

Which seems to be utterly ignored.

Be for real: In the entire Data Quality Campaign, is there no written acknowledgment that there are cheating and erasures on high-stakes tests? No procedures to discount and correct for that? No calls for additional forensic studies of cheating patterns?

Answer: No. The word “forensic” returned no hits at all on their website.

That’s pretty disturbing.

Or did I miss something?

==========

* no, not for money! I think that the views of teachers, parents and students in the trenches, so to speak, should be heard, and the problems with fraud and misrepresentation in test data are very serious and must be discussed. Roland Fryer’s conclusions also need to be aired. {and I speak as someone who has disagreed with him, in print, in the past.}

Published on January 25, 2012 at 10:53 am