Against Proposed DoE Regulations on ESSA

This is from Monty Neill:

===========

Dear Friends,

The U.S. Department of Education (DoE) has drafted regulations for
implementing the accountability provisions of the Every Student Succeeds
Act (ESSA). The DoE proposals would continue the test-and-punish practices imposed by the failed No Child Left Behind (NCLB) law. The draft over-emphasizes standardized exam scores, mandates punitive interventions not required in law, and extends federal micro-management. The draft regulations would also require states to punish schools in which large numbers of parents refuse to let their children be tested. When DoE makes decisions that should be made locally in partnership with educators, parents, and students, it takes away the local voices that ESSA tried to restore.

You can help push back against these dangerous proposals in two ways:

First, tell DoE it must drop harmful proposed regulations. You can
simply cut and paste the Comment below into DoE’s website at
https://www.regulations.gov/#!submitComment;D=ED-2016-OESE-0032-0001
or adapt it into your own words. (The text below is part of FairTest’s
submission.) You could emphasize that the draft regulations steal the opportunity ESSA provides for states and districts to control accountability and thereby silence the voices of educators, parents, students, and others.

Second, urge Congress to monitor the regulations. Many Members have
expressed concern that DoE is trying to rewrite the new law, not draft
appropriate regulations to implement it. Here’s a letter you can easily
send to your Senators and Representative asking them to tell leaders of
Congress’ education committees to block DoE’s proposals:
https://actionnetwork.org/letters/tell-congress-department-must-drop-proposed-accountability-regulations.

Together, we can stop DoE’s efforts to extend NCLB policies that the
American people and Congress have rejected.

FairTest

Note: the DoE website has a character limit; if you add your own comments, you will likely need to cut some of the text below:

_You can cut and paste this text into the DoE website:_

I support the Comments submitted by FairTest on June 15 (Comment #).
Here is a slightly edited version:

While the accountability provisions in the Every Student Succeeds Act
(ESSA) are superior to those in No Child Left Behind (NCLB), the
Department of Education’s (DoE) draft regulations intensify ESSA’s worst
aspects and will perpetuate many of NCLB’s most harmful practices. The
draft regulations over-emphasize testing, mandate punishments not
required in law, and continue federal micro-management. When DoE makes
decisions that should be set at the state and local level in partnership
with local educators, parents, and students, it takes away local voices
that ESSA restores. All this will make it harder for states, districts
and schools to recover from the educational damage caused by NCLB – the
very damage that led Congress to fundamentally overhaul NCLB’s
accountability structure and return authority to the states.

The DoE must remove or thoroughly revise five draft regulations:

_DoE draft regulation 200.15_ would require states to lower the ranking
of any school that does not test 95% of its students or to identify it
as needing “targeted support.” No such mandate exists in ESSA. This
provision violates statutory language specifying that ESSA does not override “a
State or local law regarding the decision of a parent to not have the
parent’s child participate in the academic assessments.” This regulation
appears designed primarily to undermine resistance to the overuse and
misuse of standardized exams.

_Recommendation:_ DoE should simply restate ESSA language allowing the
right to opt out as well as its requirements that states test 95% of
students in identified grades and factor low participation rates into
their accountability systems. Alternatively, DoE could write no
regulation at all. In either case, states should decide how to implement
this provision.

_DoE draft regulation 200.18_ transforms ESSA’s requirement for
“meaningful differentiation” among schools into a mandate that states
create “at least three distinct levels of school performance” for each
indicator. ESSA requires states to identify their lowest performing five
percent of schools as well as those in which “subgroups” of students are
doing particularly poorly. Neither provision necessitates creation of
three or more levels. This proposal serves no educationally useful
purpose. Several states have indicated they oppose this provision
because it obscures rather than enhances their ability to precisely
identify problems and misleads the public. This draft regulation would
pressure schools to focus on tests to avoid being placed in a lower
level. Performance levels are also another way to attack schools in
which large numbers of parents opt out, as discussed above.

_DoE draft regulation 200.18_ also mandates that states combine multiple
indicators into a single “summative” score for each school. As Rep. John
Kline, chair of the House Education Committee, pointed out, ESSA
includes no such requirement. Summative scores are simplistically
reductive and opaque. They encourage the flawed school grading schemes
promoted by diehard NCLB defenders.

_Recommendation:_ DoE should drop this draft regulation. It should allow
states to decide how to use their indicators to identify schools and
whether to report a single score. Even better, the DoE should encourage
states to drop their use of levels.

_DoE draft regulation 200.18_ further proposes that a state’s academic
indicators together carry “much greater” weight than its “school
quality” (non-academic) indicators. Members of Congress differ as to the
intent of the relevant ESSA passage. Some say it simply means more than
50%, while others claim it implies much more than 50%. The phrase “much
greater” is likely to push states to minimize the weight of non-academic
factors in order to win plan approval from DoE, especially since the
overall tone of the draft regulations emphasizes testing.

_Recommendation:_ The regulations should state that the academic
indicators must count for more than 50% of the weighting in how a state
identifies schools needing support.

_DoE draft regulation 200.18_ also exceeds limits ESSA placed on DoE
actions regarding state accountability plans.

_DoE draft regulation 200.19_ would require states to use 2016-17 data
to select schools for “support and improvement” in 2017-18. This leaves
states barely a year for implementation, too little time to overhaul
accountability systems. It will have the harmful consequence of
encouraging states to keep using a narrow set of test-based indicators
and to select only one additional “non-academic” indicator.

_Recommendation:_ The regulations should allow states to use 2017-18
data to identify schools for 2018-19. This change is entirely consistent
with ESSA’s language.

Lastly, we are concerned that an additional effect of these unwarranted
regulations will be to unhelpfully constrain states that choose to
participate in ESSA’s “innovative assessment” program.


Monty Neill, Ed.D.; Executive Director, FairTest; P.O. Box 300204,
Jamaica Plain, MA 02130; 617-477-9792; http://www.fairtest.org; Donate
to FairTest: https://donatenow.networkforgood.org/fairtest

Judge in NY State Throws Out ‘Value-Added Model’ Ratings

I am pleased that in an important, precedent-setting case, a judge in New York State has ruled that using Value-Added measurements to judge the effectiveness of teachers is ‘arbitrary’ and ‘capricious’.

The case involved teacher Sheri Lederman, and was argued by her husband.

“New York Supreme Court Judge Roger McDonough said in his decision that he could not rule beyond the individual case of fourth-grade teacher Sheri G. Lederman because regulations around the evaluation system have been changed, but he said she had proved that the controversial method that King developed and administered in New York had provided her with an unfair evaluation. It is thought to be the first time a judge has made such a decision in a teacher evaluation case.”

In case you were unaware of it, VAM is a statistical black box used to predict how a hypothetical student is supposed to score on a Big Standardized Test one year, based on the scores of every other student that year and in previous years. Any deviation (up or down) from that predicted score is attributed to the teacher.
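
To make that concrete, here is a rough sketch in Python of the general idea behind a residual-based value-added score: predict each student’s score from the prior year’s score, then credit (or blame) the teacher for whatever is left over, averaged across that teacher’s students. Everything below (the data, the single predictor, the ten hypothetical teachers) is made up for illustration; the real models pile on many more covariates and proprietary adjustments.

```python
# A toy illustration of the value-added idea -- NOT any district's actual model.
# Real VAM systems add many covariates, shrinkage, and proprietary adjustments.
import numpy as np

rng = np.random.default_rng(0)

n_students = 300
teacher = rng.integers(0, 10, size=n_students)   # 10 hypothetical teachers
prior = rng.normal(500, 50, size=n_students)     # last year's test score
# This year's score: mostly last year's score plus noise.
current = 0.8 * prior + 100 + rng.normal(0, 30, size=n_students)

# Step 1: predict this year's score from last year's (simple linear regression).
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# Step 2: the residual (actual minus predicted) is what the model credits or
# blames the teacher for, averaged over that teacher's students.
residual = current - predicted
vam_score = {t: residual[teacher == t].mean() for t in range(10)}

for t, v in sorted(vam_score.items()):
    print(f"teacher {t}: 'value added' = {v:+.1f} points")
```

Note that in this toy data the teachers contribute nothing at all, yet the averaging still hands each one a nonzero “value added” number.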

Gary Rubinstein and I have looked into how stable those VAM scores are in New York City, where we had actual scores to work with (leaked by the New York Times and other newspapers). We found that they were inconsistent and unstable in the extreme! When you graph one year’s score against the next year’s score, you find essentially no correlation at all, meaning that a teacher who is assigned the exact same grade level, in the same school, with very similar students, can score high one year, low the next, and middling the third, or any combination of those. Very, very few teachers got scores that were consistent from year to year. Even teachers who taught two or more grade levels of the same subject (say, 7th and 8th grade math) had no consistency from one subject to the next. See my blog (not all on New York City) here, here, here, here, here, here, here, here, here, here, and here. See Gary R’s six-part series on his blog here, here, here, here, here, and here, as well as a less technical explanation here.

Mercedes Schneider has done similar research on teachers’ VAM scores in Louisiana and came up with the same sorts of results that Rubinstein and I did.

Which led all three of us to conclude that the entire VAM machinery was invalid.

And which is why the case of Ms. Lederman is so important. Similar cases have been filed in numerous states, but this is apparently the first one where a judgment has been reached.

(Also read this and this.)

A 3 minute news segment on the NY lawsuit against Value-Added Modeling

Even-handed video from Al-Jazeera interviewing some of the people involved in the lawsuit against Value-Added Model evaluations of teachers in New York State.

You may have heard of the lawsuit – it was filed by elementary teacher Sheri Lederman and her lawyer, who is also her husband.

Parents, students, and administrators had nothing but glowing praise for teacher Lederman. In fact, Sheri’s principal is quoted as saying,

“any computer program claiming Lederman ‘ineffective’ is fundamentally flawed.”

Lederman herself states,

“The model doesn’t work. It’s misrepresenting an entire profession.”

Statistician Aaron Pallas of Columbia University states,

“In Sheri’s case, she went from a 14 out of 20, which was fully in the effective range, to 1 out of 20 [the very next year], ineffective, and we look at that and say, ‘How can this be? Can a teacher’s performance really have changed that dramatically from one year to the next?’

“And if the numbers are really jumping around like that, can we really trust that they are telling us something important about a teacher’s career?”

Professor Pallas could perhaps have used one of my graphs as a visual aid, to help show just how much those scores do jump around from year to year, as you see here. This one shows how raw value-added scores for teachers in New York City in school year 2005-2006 correlated with the scores of those very same teachers, teaching the exact same subjects to students of the same grade level in the very same schools, one year later. Gary Rubinstein has similar graphs. You can look here, here, here, or here if you want to see some more from me on this topic.

The plot that follows is a classic case of ‘nearly no correlation whatsoever’ that we now teach to kids in middle school.

In other words, yes, teachers’ scores do indeed jump around like crazy from year to year. If you were above average on VAM one year – that is, anywhere to the right of the Y axis – it is quite likely that you will end up below the X axis (and hence below average) the next year. Or not.

I am glad somebody is finally taking this to court, because this sort of mathematics of intimidation has got to stop.

nyc raw value added scores sy 0506 versus 0607

Important Article Shows that ‘Value-Added’ Measurements are Neither Valid nor Reliable

As you probably know, a handful of agricultural researchers and economists have come up with extremely complicated “Value-Added” Measurement (VAM) systems that purport to be able to grade teachers’ output exactly.

These economists (Hanushek, Chetty and a few others) claim that their formulas are magically mathematically able to single out the contribution of every single teacher to the future test scores and total lifetime earnings of their students 5 to 50 years into the future. I’m not kidding.

Of course, those same economists claim that the teacher is the single most important variable affecting their students’ school trajectories – not family background or income, nor peer pressure, nor even whole-school variables. (Many other studies have shown that the effect of any individual teacher, or of all teachers combined, is pretty small – from 1% to 14% of the entire variation – which corresponds to what I found during my 30 years of teaching … i.e., not nearly as much of an impact as I would have liked [or feared], one way or another…)

Diane Ravitch has brought to my attention an important study by Stuart Yeh at the University of Minnesota that (once again) refutes those claims, which are being used right now, in state after state and county after county, to randomly fire large numbers of teachers who have tried to devote their lives to helping students.

According to the study, here are a few of the problems with VAM:

1. As I have shown repeatedly using the New York City value-added scores that were printed in the NY Times and NY Post, teachers’ VAM scores vary tremendously over time. (More on that below; note that if you use VAM scores, 80% of ALL teachers should be fired after their first year of teaching.) Plus, RAND researchers found much the same thing in North Carolina. Also see this. And this.

2. Students are not assigned randomly to teachers (I can vouch for that!) or to schools, and there are always a fair number of students for whom no prior or future data is available, because they move to other schools or states, or drop out, or whatever; and those students with missing data are NOT randomly distributed, which pretty much makes the whole VAM setup an exercise in futility.

3. The tests themselves often don’t measure what they are purported to measure. (Complaints about the quality of test items are legion…)

Here is an extensive quote from the article. It’s a section that Ravitch didn’t excerpt, so I will, with a few sentences highlighted by me, since it concurs with what I have repeatedly claimed on my blog:

A largely ignored problem is that true teacher performance, contrary to the main assumption underlying current VAM models, varies over time (Goldhaber & Hansen, 2012). These models assume that each teacher exhibits an underlying trend in performance that can be detected given a sufficient amount of data. The question of stability is not a question about whether average teacher performance rises, declines, or remains flat over time.

The issue that concerns critics of VAM is whether individual teacher performance fluctuates over time in a way that invalidates inferences that an individual teacher is “low-” or “high-” performing.

This distinction is crucial because VAM is increasingly being applied such that individual teachers who are identified as low-performing are to be terminated. From the perspective of individual teachers, it is inappropriate and invalid to fire a teacher whose performance is low this year but high the next year, and it is inappropriate to retain a teacher whose performance is high this year but low next year.

Even if average teacher performance remains stable over time, individual teacher performance may fluctuate wildly from year to year.  (my emphasis – gfb)

While previous studies examined the intertemporal stability of value-added teacher rankings over one-year periods and found that reliability is inadequate for high-stakes decisions, researchers tended to assume that this instability was primarily a function of measurement error and sought ways to reduce this error (Aaronson, Barrow, & Sander, 2007; Ballou, 2005; Koedel & Betts, 2007; McCaffrey, Sass, Lockwood, & Mihaly, 2009).

However, this hypothesis was rejected by Goldhaber and Hansen (2012), who investigated the stability of teacher performance in North Carolina using data spanning 10 years and found that much of a teacher’s true performance varies over time due to unobservable factors such as effort, motivation, and class chemistry that are not easily captured through VAM. This invalidates the assumption of stable teacher performance that is embedded in Hanushek’s (2009b) and Gordon et al.’s (2006) VAM-based policy proposals, as well as VAM models specified by McCaffrey et al. (2009) and Staiger and Rockoff (2010) (see Goldhaber & Hansen, 2012, p. 15).

The implication is that standard estimates of impact when using VAM to identify and replace low-performing teachers are significantly inflated (see Goldhaber & Hansen, 2012, p. 31).

As you also probably know, the four main ‘tools’ of the billionaire-led educational DEform movement are:

* firing lots of teachers

* breaking their unions

* closing public schools and turning education over to the private sector

* changing education into tests to prepare for tests that get the kids ready for tests that are preparation for the real tests

They’ve been doing this for almost a decade now under No Child Left Untested and Race to the Trough, and none of these ‘reforms’ has been shown to make any actual improvement in the overall education of our youth.


What I actually had time to say …

Since I had to abbreviate my remarks, here is what I actually said:

I am Guy Brandenburg, retired DCPS mathematics teacher.

To depart from my text, I want to start by proposing a solution: look hard at the collaborative assessment model being used a few miles away in Montgomery County [MD] and follow the advice of W. Edwards Deming.

Even though I personally retired before [the establishment of the] IMPACT [teacher evaluation system], I want to use statistics and graphs to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

Celebrated mathematician John Ewing says that VAM is “mathematical intimidation” and a “modern, mathematical version of the Emperor’s New Clothes.”

I agree.

One of my colleagues was able to pry the value-added formula [used in DC] from [DC data honcho] Jason Kamras after SIX MONTHS of back-and-forth emails. [Here it is:]

value added formula for dcps - in mathtype format

One problem with that formula is that nobody outside a small group of highly-paid consultants has any idea what the values of any of those variables are.

In not a single case has the [DCPS] Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student and class by class.

Nor has that office shared that data with the Washington Teachers’ Union.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, for every single class a teacher has.

Now let’s look at some statistics.

My first graph is of completely random data points that I had Excel make up for me [and plot as x-y pairs].

pic 3 - completely random points

Notice that even though these are completely random, Excel still found a small correlation: r-squared was about 0.08 and r was about 29%.

Now let’s look at a very strong case of negative correlation in the real world: poverty rates and student achievement in Nebraska:

pic  4 - nebraska poverty vs achievement

The next graph is for the same sort of thing in Wisconsin:

pic 5 - wisconsin poverty vs achievement

Again, quite a strong correlation, just as we see here in Washington, DC:

pic 6 - poverty vs proficiency in DC

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for [DCPS in] SY 2009-10, and I plotted them [as you can see below].

pic 7 - VAM versus TLF in DC IMPACT 2009-10

I would say this looks pretty much like no correlation at all. It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements [in DCPS] over time? Unfortunately, since DCPS keeps all the data hidden, we don’t know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to [find out]. Here is one such graph [showing how the same teachers, in the same schools, scored in 2008-9 versus 2009-10]:

pic 8 - value added for 2 successive years Rubenstein NYC

That is very close to random.

How about teachers who teach the same subject to two different grade levels, say, fourth-grade math and fifth-grade math? Again, random points:

pic 9 - VAM for same subject different grades NYC rubenstein

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007. Look for yourself [in the next two graphs].

pic 11 - naep 8th grade math avge scale scores since 1990 many states incl dc

 

pic 12 naep 4th grade reading scale scores since 1993 many states incl dc

Notice that gains began almost 20 years ago, long before mayoral control or chancellors Rhee and Henderson, long before IMPACT.

To repeat, I suggest that we throw out IMPACT and look hard at the ideas of W. Edwards Deming and the assessment models used in Montgomery County.

My Testimony Yesterday Before DC City Council’s Education Subcommittee ‘Roundtable’

Testimony of Guy Brandenburg, retired DCPS mathematics teacher before the DC City Council Committee on Education Roundtable, December 14, 2013 at McKinley Tech

 

Hello, Mr. Catania, audience members, and any other DC City Council members who may be present. I am a veteran DC math teacher who began teaching in Southeast DC about 35 years ago, and spent my last 15 years of teaching at Alice Deal JHS/MS. I taught everything from remedial 7th grade math through pre-calculus, as well as computer applications.

Among other things, I coached MathCounts teams at Deal and at Francis JHS, with my students often taking first place against all other public, private, and charter schools in the city and going on to compete against other state teams. As a result, I have several boxes full of trophies and some teaching awards.

Since retiring, I have been helping Math for America – DC (which is totally different from Teach for America) in training and mentoring new but highly skilled math teachers in DC public and charter schools; operating a blog that mostly concerns education; teaching astronomy and telescope making as an unpaid volunteer; and also tutoring [as a volunteer] students at the school closest to my house in Brookland, where my daughter attended kindergarten about 25 years ago.

But this testimony is not about me; as a result, I won’t read the previous paragraphs aloud.

My testimony is about how the public is being deceived with bogus statistics into thinking things are getting tremendously better under mayoral control of schools and under the chancellorships of Rhee and Henderson.

In particular, I want to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve their methods of instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

I will try to show this mostly with graphs made by me and others, because in statistics, a good scatter plot is worth many a word or formula.

John Ewing, who is the president of Math for America and is a former executive director of the American Mathematical Society, wrote that VAM is “mathematical intimidation” and not reliable. I quote:

pic 1 john ewing

 

In case you were wondering how the formula goes, this is all that one of my colleagues was able to pry from Jason Kamras after SIX MONTHS of back-and-forth emails asking for additional information:

pic 2 dcps iva vam formula

One problem with that formula is that nobody outside a small group of highly-paid consultants has any idea what the values of any of those variables are. What’s more, many of those variables are themselves composed of lists or matrices (“vectors”) of other variables.

In not a single case has the Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student, class by class, test score by test score.

Nor has that office shared that data with the Washington Teachers’ Union.

It’s the mathematics of intimidation, lack of accountability, and obfuscation.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, such as those for IVA and classroom observations.

To put a personal touch to my data, one of my former Deal colleagues shared with me that she resigned from DCPS specifically because her IVA scores kept bouncing around with no apparent reason. In fact, the year that she thought she did her very best job ever in her entire career – that’s when she earned her lowest value-added score. She now teaches in Montgomery County and recently earned the distinction of becoming a National Board Certified teacher – a loss for DCPS students, but a gain for those in Maryland.

Bill Turque of the Washington Post documented the case of Sarah Wysocki, an excellent teacher with outstanding classroom observation results, who was fired by DCPS for low IVA scores. She is now happily teaching in Virginia. I am positive that these two examples can be multiplied many times over.

Now let’s look at some statistics. As I mentioned, in many cases, pictures and graphs speak more clearly than words or numbers or equations.

My first graph is of completely random data points that should show absolutely no correlation with each other, meaning that they are not linked to each other in any way. I had my Excel spreadsheet make two lists of random numbers, and I plotted those as the x- and y-variables on the following graph.

pic 3 - completely random points

I asked Excel also to draw a line of best fit and to calculate the correlation coefficient R and R-squared. It did so; as you can see, R-squared is very low, about 0.08 (eight percent). R, the square root of R-squared, is about 29 percent.

Remember, those are completely random numbers generated by Excel.
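
For anyone who would rather reproduce that experiment outside of Excel, here is a rough Python equivalent (my own sketch, not part of the testimony): two columns of random numbers, a least-squares line of best fit, and the resulting R and R-squared. With pure noise, any correlation that shows up, like the roughly 0.08 R-squared in the graph above, is just chance.

```python
# Reproducing the "two columns of random numbers" experiment in Python
# instead of Excel. Any correlation that appears is pure chance.
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(50)   # 50 random x values
y = rng.random(50)   # 50 random y values, unrelated to x

r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, 1)   # least-squares line of best fit

print(f"R         = {r:+.3f}")
print(f"R-squared = {r**2:.3f}")
print(f"best-fit line: y = {slope:.3f} x + {intercept:.3f}")
```

Run it a few times with different seeds and the small spurious correlation will bounce around zero, which is exactly the point.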

Now let’s look at a very strong correlation of real numbers: poverty rates and student achievement in a number of states. The first one is for Nebraska.

pic  4 - nebraska poverty vs achievement

R would be about 94% in this case – a very strong correlation indeed.

The next one is for Wisconsin:

pic 5 - wisconsin poverty vs achievement

Again, quite a strong correlation – a negative one: the poorer the student body, the lower the average achievement, which we see repeated in every state and every country in the world. Including DC, as you can see here:

 

pic 6 - poverty vs proficiency in DC

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for all DCPS teachers for SY 2009-10, and I plotted them. Is this a strong correlation, or not?

pic 7 - VAM versus TLF in DC IMPACT 2009-10

I would say this looks pretty much like no correlation at all. What on earth are these two things measuring? It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements over time? If they are stable, that would mean that we might be able to use them to weed out the teachers who consistently score at the bottom, and reward those who consistently score at the top.

Unfortunately, since DCPS keeps all the data hidden, we don’t exactly know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to see.

Here is one such graph:

pic 8 - value added for 2 successive years Rubenstein NYC

That is very close to random.

How about teachers who teach the same subject to two different grade levels (say, fourth-grade math and fifth-grade math)? Again, random points:

pic 9 - VAM for same subject different grades NYC rubenstein

One thing that all veteran teachers agree on is that they stunk at their job during their first year and got a lot better their second year. This should show up on value-added graphs of year 1 versus year 2 scores for the same teachers, right?

Wrong.

Take a look:

pic 10 - VAM first yr vs second year same teacher rubenstein nyc

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007.

Graphs and the NAEP show a different story. We won’t know until next week how DCPS and the charter schools did, separately, for 2013, but the following graphs show that reading and math scores for DC fourth- and eighth-graders have been rising fairly steadily for nearly twenty years – since long before mayoral control or the appointments of our two chancellors (Rhee and Henderson).

 

 

pic 13 - naep reading 8th since 1998 scale scores many states incl dc

 

pic 12 naep 4th grade reading scale scores since 1993 many states incl dc

pic 11 - naep 8th grade math avge scale scores since 1990 many states incl dc

 

 

A New-ish Study Showing Problems with Value-Added Measurements

I haven’t read this yet, but it looks useful:

http://www.ets.org/Media/Research/pdf/PICANG14.pdf

Published on December 12, 2013 at 10:02 am

Another Study Showing that VAM is Bogus and Unstable

Here is another study that shows that Value-Added measurements for teachers are extremely unstable over time. It’s by one Mercedes K. Schneider, and it was done for Louisiana. You can read all of the details yourself. Here I am going to reproduce a couple of the key tables:

stability of VAM ratings or not

and I also quote some of her explanation:

“Each number in the table is a percentage of teachers in the study/actual number of teachers who were first ranked one way using 2008-09 student test scores (reading to the left) then ranked either the same way (bolded diagonal) or a different way (all numbers not bolded) using 2009-10 student test scores (reading at the top). For example, the percentage 4.5% (23 teachers) in Table 6 (immediately above this text) represents the percentage of ELA teachers originally ranked in 2008-09 in the top 91-99% (reading to the left) but reranked in 2009-10 in the bottom 1-10% (reading at the top of the column) given that the teachers changed nothing in their teaching. 
.
“Thus, these two tables represent how poorly the standardized tests classify teachers (2008-09) then reclassify teachers (2009-10) into their original rankings. Tables 5 and 6 are a test of the consistency of using standardized tests to classify teachers. It is like standing on a bathroom scale; reading your weight; stepping off (no change in your weight); then, stepping on the scale again to determine how consistent the scale is at measuring your weight. Thus, if the standardized tests are stable (consistent) measures, they will reclassify teachers into their original rankings with a high level of accuracy. This high level of accuracy is critical if school systems are told they must use standardized tests to determine employment and merit pay decisions. I have bolded the cells on the diagonals of both tables to show just how unstable these two standardized tests are at classifying then correctly reclassifying teachers. If the iLEAP and LEAP-21 were stable, then the bolded percentages on the diagonals of both tables would be very high, almost perfect (99%).
.
“Here is what we see from the diagonal in Table 5: 
.
“If a math teacher is originally ranked as the lowest, without altering his or her teaching, the teacher will likely be re-ranked in the lowest category only 26.8% of the time. Conversely, without altering his/her teaching, a math teacher ranked as the highest would likely be re-ranked in the highest group only 45.8% of the time even if she/he continued to teach the same way. (…)
“A math teacher originally ranked in the highest category will be re-ranked in the middle category 35.1% of the time and re-ranked in the lowest category 1.8% of the time. These alterations in ranking are out of the teacher’s control and do not reflect any change in teaching. Even though 1.8% might seem low, notice that in the study alone, this represented 8 math teachers, 8 real human beings, who could potentially lose their jobs and face the stigma of being labeled “low performers.”
.
“As we did for Table 5, let’s consider the diagonal for Table 6:
.
“If an ELA teacher is originally ranked as the lowest, without altering his or her teaching, the teacher will likely be re-ranked in the lowest category only 22.3% of the time. Conversely, without altering his/her teaching, an ELA teacher ranked as the highest would likely be re-ranked in the highest group only 37.5% of the time even if she/he continued to teach the same way.  (…)
.
“An ELA teacher originally ranked in the highest category will be re-ranked in the middle category 37.1% of the time and re-ranked in the lowest category 4.5% of the time. These alterations in ranking are out of the teacher’s control and do not reflect any change in teaching.
.
“Even though 4.5% might seem low, notice that in the study alone, this represented 23 ELA teachers who could potentially lose their jobs and face the stigma of being labeled “low performers.””
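
Schneider’s tables are, in effect, transition matrices: put each teacher into a ranking band in year one and again in year two, then count how often the band stays the same (the bolded diagonal). For anyone who wants to build the same kind of table from their own two years of scores, here is a rough sketch in Python; the band labels, cut-points, and simulated data are my own illustrative placeholders, not hers.

```python
# Sketch of a year-over-year ranking-stability table like Schneider's.
# The bands, cut-points, and fake data below are illustrative placeholders.
import numpy as np
import pandas as pd

def band(scores, labels=("bottom third", "middle third", "top third")):
    """Split scores into equal-sized percentile bands (here: thirds)."""
    return pd.qcut(scores, q=len(labels), labels=list(labels))

# Fake two years of VAM scores for 200 teachers, weakly correlated (r ~ 0.3).
rng = np.random.default_rng(1)
year1 = rng.normal(size=200)
year2 = 0.3 * year1 + np.sqrt(1 - 0.3**2) * rng.normal(size=200)

# Rows: year-1 band; columns: year-2 band; cells: percent of each row.
table = pd.crosstab(pd.Series(band(year1), name="year 1"),
                    pd.Series(band(year2), name="year 2"),
                    normalize="index")
print((100 * table).round(1))
# If rankings were stable, the diagonal would be close to 100%.
```

Even with a year-to-year correlation deliberately built into the fake data, the diagonal of a table like this comes out nowhere near 100%, which is exactly Schneider’s point about the real Louisiana numbers.
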
———–
Your thoughts? To leave a comment, click on the way-too-tiny “Leave a Comment” button below.
Published on January 8, 2013 at 12:04 pm

Law and Statistics at Work in Combating Rhee-Style Educational Deforms

This article is worth reading a couple of times. It makes the point that teachers and their unions need to look into the legal and statistical framework that supposedly upholds Value-Added Methodologies (VAM) and to challenge both.

I agree!

http://schoolfinance101.wordpress.com/2012/03/31/firing-teachers-based-on-bad-vam-versus-wrong-sgp-measures-of-effectiveness-legal-note/

Published on April 18, 2012 at 7:20 am

Teacher VAM scores aren’t stable over time in Florida, either

I quote from a paper studying whether value-added scores for the same teachers tend to be consistent. In other words, does VAM give us a chance to pick out the crappy teachers and give bonuses to the good ones?

The answer, in complicated language, is essentially NO, but here is how they word it:

“Recently, a number of school districts have begun using measures of teachers’ contributions to student test scores or teacher “value added” to determine salaries and other monetary rewards.

In this paper we investigate the precision of value-added measures by analyzing their inter-temporal stability.

We find that these measures of teacher productivity are only moderately stable over time, with year-to-year correlations in the range of 0.2-0.3.”

Or in plain English, and if you know anything at all about scatter plots and linear correlation, those scores wander all over the place and should never be used to provide any serious evidence about anything. Speculation, perhaps, but not policy or hiring or firing decisions of any sort.
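
To get a feel for what a year-to-year correlation of roughly 0.2 to 0.3 actually buys you, here is a quick back-of-the-envelope simulation (my own illustration, not from the paper): draw pairs of scores correlated at 0.25 and ask how often a teacher flagged in the bottom 20% one year lands in the bottom 20% again the next year.

```python
# What does a year-to-year correlation of about 0.25 mean for identifying
# "low performers"? A quick simulation -- my illustration, not the paper's data.
import numpy as np

rng = np.random.default_rng(7)
r = 0.25
n = 100_000  # simulated teacher score pairs

year1 = rng.normal(size=n)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.normal(size=n)

flagged = year1 <= np.quantile(year1, 0.20)    # bottom 20% in year 1
low_again = year2 <= np.quantile(year2, 0.20)  # bottom 20% in year 2
persistence = low_again[flagged].mean()

print(f"Of teachers in the bottom 20% in year 1, about {persistence:.0%} "
      f"are in the bottom 20% again in year 2 (correlation = {r}).")
```

If the rankings carried no information at all, that number would be 20%; with a correlation of 0.25 it comes out at only around 30%, far short of anything you would want to stake someone’s job on.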

They do say that they have some statistical tricks that allow them to make the correlation look better, but I don’t trust that sort of thing.  It’s not real.

Here’s a table from the paper. Look at those R values, and note that if you square those correlation coefficients (go ahead, use the calculator on your cell phone) you get numbers that are way, way smaller – like what Gary Rubinstein and I reported concerning DCPS and NYCPS.

For your convenience, I circled the highest R value, 0.61, in middle schools on something called the normed FCAT-SSS, whatever that is (go ahead and look it up if it interests you), in Duval County, Florida, one of the places where they had data. I also circled the lowest R value, 0.07, in Palm Beach County, on the FCAT-NRT, whatever that is.

I couldn’t resist, so 0.56^2 is about 0.31 as an r-squared, which is moderate. There is only one score anywhere near that high (0.56) out of the 24 such correlation calculations. The lowest value is 0.07, and if we square that and round it off we get an r-squared value of 0.005, shockingly low — essentially none at all.

The median correlation coefficient is about 0.285, which I indicated by circling two adjacent values of 0.28 and 0.29 in green. If you square that value you get r^2 = 0.08, which is nearly useless. Again.

I’m really sorry, but even though this paper was written four years ago, it’s still under wraps, or so it says?!?! I’m not supposed to quote from it? Well, to hell with that. It’s important data, for keereissake!

The title and authors are as follows, and perhaps they can forgive me. I don’t know how to contact them anyway. Does anybody have their contact information? Here is the title, credits, and warning:

THE INTERTEMPORAL STABILITY OF TEACHER EFFECT ESTIMATES*
by Daniel F. McCaffrey (The RAND Corporation), Tim R. Sass (Florida State University), and J. R. Lockwood (The RAND Corporation)
Original version: April 9, 2008; this version: June 27, 2008

*This paper has not been formally reviewed and should not be cited, quoted, reproduced, or retransmitted without the authors’ permission. This material is based on work supported by a supplemental grant to the National Center for Performance Initiatives funded by the United States Department of Education, Institute of Education Sciences. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these organizations.

Published on March 30, 2012 at 3:58 pm