Bob Schaeffer’s Weekly Roundup of News on Testing Mania

This is entirely from Bob Schaeffer:

==============================================

With public schools closing for the summer, many states are reviewing their 2015-2016 testing experience (once again, not a pretty picture) and planning to implement assessment reforms in coming years.  You can help stop the U.S. Department of Education from promoting testing misuse and overuse by weighing in on proposed Every Student Succeeds Act regulations.

National
Act Now to Stop Federal Regulations That Reimpose Failed No Child Left Behind Test-and-Punish Policies

https://actionnetwork.org/letters/tell-congress-department-must-drop-proposed-accountability-regulations

Alaska
State Preps for Implementing New Federal Education Law
http://skagwaynews.com/school-preps-for-phasing-out-no-child-left-behind-policies/

Delaware
Teacher Evaluations Could Be Less Focused on Test Scores
http://www.delawareonline.com/story/news/education/2016/06/20/test-scores-evaluations/86134396/

Florida
Legal Fight Looms Over Third Grade Retention Based on Test Participation
http://www.sun-sentinel.com/local/palm-beach/fl-opt-out-retention-20160619-story.html
Florida Parents Pressure School Board on Test-Use Policies
http://www.bradenton.com/news/local/education/article84734742.html

Georgia
School Chief Addresses Testing Meltdown
http://getschooled.blog.myajc.com/2016/06/17/state-school-chief-on-milestones-meltdown-were-fixing-it/

Indiana
Panel Unclear on Vision for New Assessments
http://indianapublicmedia.org/stateimpact/2016/06/14/istep-panel-unclear-vision-assessment/

Kansas
State Testing Time Will Be Reduced
http://www.kake.com/story/32231184/state-test-time-to-be-reduced

Kentucky
Feds Respond to State’s Accountability Plan Concerns
http://www.courier-journal.com/story/news/education/2016/06/16/us-ed-dept-responds-accountability-concerns/86010782/

Maryland
State Commission Passes the Buck on Reducing Testing to Local Schools
http://baltimorepostexaminer.com/testing-commission-wraps-asking-local-school-systems-finish-work/2016/06/15
Maryland Students Say Too Much Testing
http://www.baltimoresun.com/news/opinion/readersrespond/bs-ed-testing-letter-20160617-story.html

Massachusetts
Schools to Help Map Assessments of the Future
http://www.capenews.net/bourne/news/bourne-to-help-map-future-of-school-assessments/article_4048811d-eddc-5195-ad20-eec61eb86a60.html

Missouri
Schools Are More Than Test Scores
http://ccheadliner.com/opinion/local-viewpoint-jtsd-is-more-than-its-test-scores/article_0c9d7b60-3305-11e6-a685-cf3e9a4ffb56.html

New York
Test Flexibility for Students with Learning Disabilities is Step in Right Direction
http://www.lohud.com/story/opinion/editorials/2016/06/15/regents-disabilities-graduation-rule-change-editorial/85885818/
New York Families Fight Back Against Opt-Out Punishments
https://www.washingtonpost.com/news/answer-sheet/wp/2016/06/16/how-some-students-who-refused-to-take-high-stakes-standardized-tests-are-being-punished/

Ohio
State Eases Some Test Score Cut Offs
http://www.mydaytondailynews.com/news/news/state-eases-some-test-score-levels/nrgQZ/

Oklahoma
Legislature Ends Exit Exam Graduation Requirement
http://www.tulsaworld.com/homepagelatest/what-last-minute-change-in-student-testing-law-means-for/article_f69102e3-97c2-52bc-b616-4fcab147a186.html

Tennessee
State Comptroller Finds Computer Testing Problems Widespread
http://www.tennessean.com/story/news/education/2016/06/20/tennessee-comptroller-lists-online-test-issues-every-state/86137098/
Tennessee Testing Is “In a Transition Phase”
http://www.chalkbeat.org/posts/tn/2016/06/14/theme-of-junes-testing-task-force-meeting-were-in-a-transition-phase/

Texas
Scrapped STAAR Scores Add to Standardized Testing Frustration
http://www.breitbart.com/texas/2016/06/15/scrapped-staar-scores-add-frustration-standardized-testing-texas/
Texas Legislator Says State Should Not Pay for Flawed Tests
http://amarillo.com/news/local-news/2016-06-13
Texas Study Panel Not Yet Ready to Ditch State Standardized Exams
http://keranews.org/post/study-panel-not-ready-ditch-staar

Utah
State Residents Give Failing Grade to Common Core Standardized Testing
http://www.sltrib.com/news/4001870-155/tribune-poll-utahns-give-failing-grades

Wisconsin
Test Changes Render Year-to-Year Comparisons Useless
http://www.wiscnews.com/baraboonewsrepublic/opinion/editorial/article_8b7bf9a8-5825-5791-a621-d02ed86c3b63.html

International
Nine Out of Ten British Teachers Say Test Prep Focus Hurts Students’ Mental Health
https://www.tes.com/news/school-news/breaking-news/nine-10-teachers-believe-sats-preparation-harms-childrens-mental

University Admission
If High School GPA Is the Best Predictor of College Outcomes, Why Do Schools Cling to ACT/SAT?
http://getschooled.blog.myajc.com/2016/06/15/if-gpa-is-the-best-predictor-of-college-success-why-do-colleges-cling-to-act-and-sat/

Worth Reading
Opt-Out Movement Reflects Genuine Concerns of Parents
http://educationnext.org/opt-out-reflects-genuine-concerns-of-parents-forum-testing/
Worth Reading Study Finds More Testing, Less Play in Kindergarten
http://www.npr.org/sections/ed/2016/06/21/481404169/more-testing-less-play-study-finds-higher-expectations-for-kindergartners
Worth Reading Test Scores Are Poor Predictors of Life Outcomes
https://janresseger.wordpress.com/2016/06/17/test-scores-poor-indicator-of-students-life-outcomes-and-school-quality-new-consensus/

Bob Schaeffer, Public Education Director
FairTest: National Center for Fair & Open Testing
office-   (239) 395-6773   fax-  (239) 395-6779
mobile- (239) 699-0468
web-  http://www.fairtest.org

Against Proposed DoE Regulations on ESSA

This is from Monty Neill:

===========

Dear Friends,

The U.S. Department of Education (DoE) has drafted regulations for
implementing the accountability provisions of the Every Student Succeeds
Act (ESSA). The DoE proposals would continue test-and-punish practices
imposed by the failed No Child Left Behind (NCLB) law. The draft
over-emphasizes standardized exam scores, mandates punitive
interventions not required in law, and extends federal micro-management.
The draft regulations would also require states to punish schools in
which large numbers of parents refuse to let their children be tested.
When DoE makes decisions that should be made locally, in partnership
with educators, parents, and students, it takes away the local voices
that ESSA tried to restore.

You can help push back against these dangerous proposals in two ways:

First, tell DoE it must drop harmful proposed regulations. You can
simply cut and paste the Comment below into DoE’s website at
https://www.regulations.gov/#!submitComment;D=ED-2016-OESE-0032-0001
<https://www.regulations.gov/#%21submitComment;D=ED-2016-OESE-0032-0001>
or adapt it into your own words. (The text below is part of FairTest’s
submission.) You could emphasize that the draft regulations steal the
opportunity ESSA provides for states and districts to control
accountability and thereby silence the voices of educators, parents,
students, and others.

Second, urge Congress to monitor the regulations. Many Members have
expressed concern that DoE is trying to rewrite the new law, not draft
appropriate regulations to implement it. Here’s a letter you can easily
send to your Senators and Representative asking them to tell leaders of
Congress’ education committees to block DoE’s proposals:
https://actionnetwork.org/letters/tell-congress-department-must-drop-proposed-accountability-regulations.

Together, we can stop DoE’s efforts to extend NCLB policies that the
American people and Congress have rejected.

FairTest

Note: The DoE website has a character limit; if you add your own comments,
you will likely need to cut some of the text below:

You can cut and paste this text into the DoE website:

I support the Comments submitted by FairTest on June 15 (Comment #).
Here is a slightly edited version:

While the accountability provisions in the Every Student Succeeds Act
(ESSA) are superior to those in No Child Left Behind (NCLB), the
Department of Education’s (DoE) draft regulations intensify ESSA’s worst
aspects and will perpetuate many of NCLB’s most harmful practices. The
draft regulations over-emphasize testing, mandate punishments not
required in law, and continue federal micro-management. When DoE makes
decisions that should be set at the state and local level in partnership
with local educators, parents, and students, it takes away local voices
that ESSA restores. All this will make it harder for states, districts
and schools to recover from the educational damage caused by NLCB – the
very damage that led Congress to fundamentally overhaul NCLB’s
accountability structure and return authority to the states.

The DoE must remove or thoroughly revise five draft regulations:

_DoE draft regulation 200.15_ would require states to lower the ranking
of any school that does not test 95% of its students or to identify it
as needing “targeted support.” No such mandate exists in ESSA. This
provision violates statutory language that ESSA does not override “a
State or local law regarding the decision of a parent to not have the
parent’s child participate in the academic assessments.” This regulation
appears designed primarily to undermine resistance to the overuse and
misuse of standardized exams.

_Recommendation:_ DoE should simply restate ESSA language allowing the
right to opt out as well as its requirements that states test 95% of
students in identified grades and factor low participation rates into
their accountability systems. Alternatively, DoE could write no
regulation at all. In either case, states should decide how to implement
this provision.

_DoE draft regulation 200.18_ transforms ESSA’s requirement for
“meaningful differentiation” among schools into a mandate that states
create “at least three distinct levels of school performance” for each
indicator. ESSA requires states to identify their lowest performing five
percent of schools as well as those in which “subgroups” of students are
doing particularly poorly. Neither provision necessitates creation of
three or more levels. This proposal serves no educationally useful
purpose. Several states have indicated they oppose this provision
because it obscures rather than enhances their ability to precisely
identify problems and misleads the public. This draft regulation would
pressure schools to focus on tests to avoid being placed in a lower
level. Performance levels are also another way to attack schools in
which large numbers of parents opt out, as discussed above.

_DoE draft regulation 200.18_ also mandates that states combine multiple
indicators into a single “summative” score for each school. As Rep. John
Kline, chair of the House Education Committee, pointed out, ESSA
includes no such requirement. Summative scores are simplistically
reductive and opaque. They encourage the flawed school grading schemes
promoted by diehard NCLB defenders.

_Recommendation:_ DoE should drop this draft regulation. It should allow
states to decide how to use their indicators to identify schools and
whether to report a single score. Even better, the DoE should encourage
states to drop their use of levels.

_DoE draft regulation 200.18_ further proposes that a state’s academic
indicators together carry “much greater” weight than its “school
quality” (non-academic) indicators. Members of Congress differ as to the
intent of the relevant ESSA passage. Some say it simply means more than
50%, while others claim it implies much more than 50%. The phrase “much
greater” is likely to push states to minimize the weight of non-academic
factors in order to win plan approval from DoE, especially since the
overall tone of the draft regulations emphasizes testing.

_Recommendation:_ The regulations should state that the academic
indicators must count for more than 50% of the weighting in how a state
identifies schools needing support.

_DoE draft regulation 200.18_ also exceeds limits ESSA placed on DoE
actions regarding state accountability plans.

_DoE draft regulation 200.19_ would require states to use 2016-17 data
to select schools for “support and improvement” in 2017-18. This leaves
states barely a year for implementation, too little time to overhaul
accountability systems. It will have the harmful consequence of
encouraging states to keep using a narrow set of test-based indicators
and to select only one additional “non-academic” indicator.

_Recommendation:_ The regulations should allow states to use 2017-18
data to identify schools for 2018-19. This change is entirely consistent
with ESSA’s language.

Lastly, we are concerned that an additional effect of these unwarranted
regulations will be to unhelpfully constrain states that choose to
participate in ESSA’s “innovative assessment” program.


Monty Neill, Ed.D.; Executive Director, FairTest; P.O. Box 300204,
Jamaica Plain, MA 02130; 617-477-9792; http://www.fairtest.org; Donate
to FairTest: https://donatenow.networkforgood.org/fairtest

Judge in NY State Throws Out ‘Value-Added Model’ Ratings

I am pleased that in an important, precedent-setting case, a judge in New York State has ruled that using Value-Added measurements to judge the effectiveness of teachers is ‘arbitrary’ and ‘capricious’.

The case involved teacher Sheri Lederman, and was argued by her husband.

“New York Supreme Court Judge Roger McDonough said in his decision that he could not rule beyond the individual case of fourth-grade teacher Sheri G. Lederman because regulations around the evaluation system have been changed, but he said she had proved that the controversial method that King developed and administered in New York had provided her with an unfair evaluation. It is thought to be the first time a judge has made such a decision in a teacher evaluation case.”

In case you were unaware of it, VAM is a statistical black box used to predict how a hypothetical student is supposed to score on a Big Standardized Test in a given year, based on the scores of every other student that year and in previous years. Any deviation (up or down) from that predicted score is attributed to the teacher.
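To make that description concrete, here is a deliberately oversimplified sketch of the core value-added logic: predict this year's score from last year's, then attribute whatever is left over to the teacher. This is my own illustration, not any state's actual formula (those are far more elaborate and largely secret), and all of the data and the ten teachers are hypothetical.

```python
# A deliberately oversimplified sketch of the value-added idea (NOT any state's
# actual formula): predict each student's score from last year's score, then
# credit or blame the teacher for the average leftover difference.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 students with prior-year and current-year scores,
# each assigned to one of 10 hypothetical teachers.
n_students = 200
prior = rng.normal(500, 50, n_students)                 # last year's scores
current = 0.9 * prior + rng.normal(60, 40, n_students)  # this year's scores
teacher = rng.integers(0, 10, n_students)               # teacher assignments

# Step 1: predict this year's score from last year's, using all students.
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# Step 2: the residual (actual minus predicted) is what gets attributed to the teacher.
residual = current - predicted

# Step 3: a teacher's "value-added" is the mean residual of his or her students.
# Note: in this toy data the teacher has no real effect at all, so any nonzero
# "value-added" printed below is pure noise.
for t in range(10):
    print(f"Teacher {t}: value-added estimate = {residual[teacher == t].mean():+.1f} points")
```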

Gary Rubinstein and I have looked into how stable those VAM scores are in New York City, where we had actual scores to work with (leaked by the New York Times and other newspapers). We found that they were inconsistent and unstable in the extreme. When you graph one year’s score against the next year’s score, there is essentially no correlation at all, meaning that a teacher who is assigned the exact same grade level, in the same school, with very similar students, can score high one year, low the next, and middling the third, or any combination of those. Very, very few teachers got scores that were consistent from year to year. Even teachers who taught two or more grade levels of the same subject (say, 7th and 8th grade math) had no consistency from one grade to the next. See my blog posts (not all on New York City) here, here, here, here, here, here, here, here, here, here, and here. See Gary Rubinstein’s six-part series on his blog here, here, here, here, here, and here, as well as a less technical explanation here.

Mercedes Schneider has done similar research on teachers’ VAM scores in Louisiana and came up with the same sorts of results that Rubinstein and I did.

Which led all three of us to conclude that the entire VAM machinery was invalid.

And which is why the case of Ms. Lederman is so important. Similar cases have been filed in numerous states, but this is apparently the first one where a judgement has been reached.

(Also read this and this.)

A Concise Primer on Privatization from Marion Brady

This is a concise primer, written by Marion Brady, on how the 1/100 of 1% have been privatizing our schools and getting away with it. -GFB

Advice column for pundits and politicians

https://www.washingtonpost.com/news/answer-sheet/wp/2016/01/07/a-primer-on-the-damaging-movement-to-privatize-public-schools/

Privatizing public schools: A primer for pundits and politicians

 

When, about thirty years ago, corporate interests began their highly organized, well-funded effort to privatize public education, you wouldn’t have read or heard about it. They didn’t want to trigger the debate that such a radical change in an important institution warranted.

If, like most pundits and politicians, you’ve supported that campaign, it’s likely you’ve been snookered. Here’s a quick overview of the snookering process.

 

The pitch

 

Talking Points: (a) Standardized testing proves America’s schools are poor. (b) Other countries are eating our lunch. (c) Teachers deserve most of the blame. (d) The lazy ones need to be forced out by performance evaluations. (e) The dumb ones need scripts to read or “canned standards” telling them exactly what to teach. (f) The experienced ones are too set in their ways to change and should be replaced by fresh Five-Week-Wonders from Teach for America. (Bonus: Replacing experienced teachers saves a ton of money.) (g) Public (“government”) schools are a step down the slippery slope to socialism.

 

Tactics

 

Education establishment resistance to privatization is inevitable, so (a) avoid it as long as possible by blurring the lines between “public” and “private.” (b) Push school choice, vouchers, tax write-offs, tax credits, school-business partnerships, profit-driven charter chains. (c) When resistance comes, crank up fear with the “They’re eating our lunch!” message. (d) Contribute generously to all potential resisters—academic publications, professional organizations, unions, and school support groups such as the PTA. (e) Create fake “think tanks,” give them impressive names, and have them do “research” supporting privatization. (f) Encourage investment in teacher-replacer technology—internet access, iPads, virtual schooling, MOOCs, etc. (g) Pressure state legislators to make life easier for profit-seeking charter chains by taking approval decisions away from local boards and giving them to easier-to-lobby state-level bureaucrats. (h) Elect the “right” people at all levels of government. (When they’re campaigning, have them keep their privatizing agenda quiet.)

 

Weapon

 

If you’ll read the fine-print disclaimers on high-stakes standardized tests, you’ll see how grossly they’re being misused, but they’re the key to privatization. The general public, easily impressed by numbers and mathematical razzle-dazzle, believes competition is the key to quality, so it wants quality quantified even though that can’t be done. Machine-scored tests don’t measure quality. They rank.

It’s hard to rank unlike things so it’s necessary to standardize. That’s what the Common Core State Standards (CCSS) do. To get the job done quickly, Bill Gates picked up the tab, got the CCSS “legitimized” by getting important politicians to sign off on them, then handed them to teachers as a done deal.

The Standards make testing and ranking a cinch. They also make making billions a cinch. Manufacturers can use the same questions for every state that has adopted the Standards or facsimiles thereof.

If challenged, test fans often quote the late Dr. W. Edwards Deming, the world-famous quality guru who showed Japanese companies how to build better stuff than anybody else. In his book The New Economics, they say, Deming wrote, “If you can’t measure it, you can’t manage it.”

Here’s the whole sentence as he wrote it: “It is wrong to suppose that if you can’t measure it, you can’t manage it—a costly myth.”

 

Operating the weapon

 

What’s turned standardized testing into a privatizing juggernaut are pass-fail “cut scores” set by politicians. Saying kids need to be challenged, they set the cut score high enough to fail many (sometimes most) kids. When the scores are published, they point to the high failure rate to “prove” public schools can’t do the job and should be closed or privatized. Clever, huh?

The privatizing machinery is in place. Left alone, it’ll gradually privatize most, but not all, public schools. Those that serve the poorest, the sickest, the handicapped, the most troubled, the most expensive to educate—those will stay in what’s left of the public schools.

 

Weapon malfunction

 

Look at standardized tests from kids’ perspective. Test items (a) measure recall of secondhand, standardized, delivered information, or (b) require a skill to be demonstrated, or (c) reward an ability to second-guess whoever wrote the test item. Because kids didn’t ask for the information, because the skill they’re being asked to demonstrate rarely has immediate practical use, and because they don’t give a tinker’s dam what the test-item writer thinks, they have zero emotional investment in what’s being tested.

As every real teacher knows, no emotional involvement means no real learning. Period. What makes standardized tests look like they work is learner emotion, but it’s emotion that doesn’t have anything to do with learning. The ovals get penciled in to avoid trouble, to please somebody, to get a grade, or to jump through a bureaucratic hoop to be eligible to jump through another bureaucratic hoop. When the pencil is laid down, what’s tested, having no perceived value, automatically erases from memory.

 

Before you write…

 

If you want to avoid cranking out the usual amateurish drivel about standardized testing that appears in the op-eds, editorials, and syndicated columns of the mainstream media, ask yourself a few questions about the testing craze: (a) Should life-altering decisions hinge on the scores of commercially produced tests not open to public inspection? (b) How wise is it to only teach what machines can measure? (c) How fair is it to base any part of teacher pay on scores from tests that can’t evaluate complex thought? (d) Are tests that have no “success in life” predictive power worth the damage they’re doing?

Here’s a longer list of problems you should think about before you write.

 

Perspective

America’s schools have always struggled—an inevitable consequence, first, of a decision in 1893 to narrow and standardize the high school curriculum and emphasize college prep; second, from a powerful strain of individualism in our national character that eats away support for public institutions; third, from a really sorry system of institutional organization. Politicians, not educators, make education policy, basing it on the simplistic conventional wisdom that educating means “delivering information.”

In fact, educating is the most complex and difficult of all professions. Done right, teaching is an attempt to help the young align their beliefs, values, and assumptions more closely with what’s true and real, escape the bonds of ethnocentrism, explore the wonders and potential of humanness, and become skilled at using thought processes that make it possible to realize those aims.

Historically, out of the institution’s dysfunctional organizational design came schools with lots of problems, but with one redeeming virtue. They were “loose.” Teachers had enough autonomy to do their thing. So they did, and the kids that some of them coached brought America far more than its share of patents, scholarly papers, scientific advances, international awards, and honors.

Notwithstanding their serious problems, America’s public schools were once the envy of the world. Now, educators around that world shake their heads in disbelief (or maybe cheer?) as we spend billions of dollars to standardize what once made America great—un-standardized thought.

A salvage operation is still (barely) possible, but not if politicians, prodded by pundits, continue to do what they’ve thus far steadfastly refused to do—listen to people who’ve actually worked with real students in real classrooms, and did so long enough and thoughtfully enough to know something about teaching.

 

Note: I invite response, especially from those in positions of influence or authority who disagree with me.

Marion Brady mbrady2222@gmail.com


A 3 minute news segment on the NY lawsuit against Value-Added Modeling

An even-handed video from Al Jazeera interviewing some of the people involved in the lawsuit against Value-Added Model evaluations of teachers in New York State.

You may have heard of the lawsuit – it was filed by elementary teacher Sheri Lederman and her lawyer, who is also her husband.

Parents, students, and administrators had nothing but glowing praise for teacher Lederman. In fact, Sheri’s principal is quoted as saying,

“any computer program claiming Lederman ‘ineffective’ is fundamentally flawed.”

Lederman herself states,

“The model doesn’t work. It’s misrepresenting an entire profession.”

Statistician Aaron Pallas of Columbia University states,

“In Sheri’s case, she went from a 14 out of 20, which was fully in the effective range, to 1 out of 20 [the very next year], ineffective, and we look at that and say, ‘How can this be? Can a teacher’s performance really have changed that dramatically from one year to the next?’

“And if the numbers are really jumping around like that, can we really trust that they are telling us something important about a teacher’s career?”

Professor Pallas could perhaps have used one of my graphs as a visual aid to help show just how much those scores do jump around from year to year, as you can see here. This one shows how raw value-added scores for teachers in New York City in school year 2005-2006 correlated with the scores of those very same teachers, teaching the exact same subjects to the same grade levels of students in the very same schools, one year later. Gary Rubinstein has similar graphs. You can look here, here, here, or here if you want to see some more from me on this topic.

The plot that follows is a classic case of ‘nearly no correlation whatsoever’ – the kind of pattern we now teach kids to recognize in middle school.

In other words, yes, teachers’ scores do indeed jump around like crazy from year to year. If you were above average on VAM one year – that is, anywhere to the right of the Y-axis – it is quite likely that you will end up below the X-axis (and hence below average) the next year. Or not.

I am glad somebody is finally taking this to court, because this sort of mathematics of intimidation has got to stop.

[Figure: NYC raw value-added scores, SY 2005-06 versus SY 2006-07]
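For readers who want to try the same year-over-year comparison themselves, here is a minimal sketch of the procedure behind a plot like the one above: pair each teacher’s score in one year with the same teacher’s score the following year, compute the correlation, and draw the scatter. The scores below are simulated stand-ins (set to roughly the 0.2-0.3 correlation range mentioned elsewhere in these posts); with the leaked NYC file you would load two columns of matched teacher scores instead.

```python
# Sketch of the year-over-year comparison: pair each teacher's value-added score
# in year 1 with the same teacher's score in year 2, compute the correlation,
# and draw the scatter. Scores here are simulated stand-ins, not the leaked NYC data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_teachers = 1000
true_r = 0.25  # roughly the 0.2-0.3 range mentioned elsewhere in these posts

year1 = rng.normal(0, 1, n_teachers)
year2 = true_r * year1 + np.sqrt(1 - true_r**2) * rng.normal(0, 1, n_teachers)

r = np.corrcoef(year1, year2)[0, 1]
print(f"Year-to-year correlation: r = {r:.2f}, r-squared = {r**2:.2f}")

plt.scatter(year1, year2, s=5, alpha=0.4)
plt.axhline(0, color="gray")
plt.axvline(0, color="gray")
plt.xlabel("Value-added score, year 1")
plt.ylabel("Value-added score, year 2 (same teacher)")
plt.title("Same teachers, consecutive years")
plt.show()
```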

Comment by Duane Swacker on Stuart Yeh’s Study, at Diane Ravitch’s Blog

I hope Duane Swacker will not mind my reposting one of his long comments, left after the recent blog post by Diane Ravitch about Professor Stuart Yeh’s study on the lack of validity of Value-Added Metrics.

=================

To understand the COMPLETE INSANITY that is VAM & SLO/SGP read and understand Noel Wilson’s never refuted nor rebutted destruction of educational standards and standardized testing (of which VAM & SLO/SGP are the bastard stepchildren) in “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700

Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.

1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.

2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).

3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students capabilities and grades them accordingly), the General Frame-think standardized testing that claims to have a “scientific” basis, the Specific Frame-think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen, and the Responsive Frame-think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback. Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.

4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”

In other words, all the logical errors involved in the process render any conclusions invalid.

5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.

6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or to put it in more mundane terms: crap in, crap out.

7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”

In other words, it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!

One final note with Wilson channeling Foucault and his concept of subjectivization:

“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”

In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.

Important Article Shows that ‘Value-Added’ Measurements are Neither Valid nor Reliable

As you probably know, a handful of agricultural researchers and economists have come up with extremely complicated “Value-Added” Measurement (VAM) systems that purport to be able to grade teachers’ output exactly.

These economists (Hanushek, Chetty and a few others) claim that their formulas are magically mathematically able to single out the contribution of every single teacher to the future test scores and total lifetime earnings of their students 5 to 50 years into the future. I’m not kidding.

Of course, those same economists claim that the teacher is the single most important variable affecting their students’ school and life trajectories – not family background or income, nor peer pressure, nor even whole-school variables. (Many other studies have shown that the effect of any individual teacher, or of all teachers, is pretty small – from 1% to 14% of the entire variation, which corresponds to what I found during my 30 years of teaching … i.e., not nearly as much of an impact as I would have liked [or feared], one way or another…)

Diane Ravitch has brought to my attention an important study by Stuart Yeh at the University of Minnesota that (once again) refutes those claims, which are being used right now, in state after state and county after county, to randomly fire large numbers of teachers who have tried to devote their lives to helping students.

According to the study, here are a few of the problems with VAM:

1. As I have shown repeatedly using the New York City value-added scores that were printed in the NYTimes and NYPost, teachers’ VAM scores vary tremendously over time. (More on that below; note that if you use VAM scores, 80% of ALL teachers should be fired after their first year of teaching.) Plus, RAND researchers found much the same thing in North Carolina. Also see this. And this.

2. Students are not assigned randomly to teachers (I can vouch for that!) or to schools, and there are always a fair number of students for whom no prior or future data is available, because they move to other schools or states, or drop out, or whatever; and those students with missing data are NOT randomly distributed, which pretty much makes the whole VAM setup an exercise in futility.

3. The tests themselves often don’t measure what they are purported to measure. (Complaints about the quality of test items are legion…)

Here is an extensive quote from the article. It’s a section that Ravitch didn’t excerpt, so I will, with a few sentences highlighted by me, since it concurs with what I have repeatedly claimed on my blog:

A largely ignored problem is that true teacher performance, contrary to the main assumption underlying current VAM models, varies over time (Goldhaber & Hansen, 2012). These models assume that each teacher exhibits an underlying trend in performance that can be detected given a sufficient amount of data. The question of stability is not a question about whether average teacher performance rises, declines, or remains flat over time.

The issue that concerns critics of VAM is whether individual teacher performance fluctuates over time in a way that invalidates inferences that an individual teacher is “low-” or “high-” performing.

This distinction is crucial because VAM is increasingly being applied such that individual teachers who are identified as low-performing are to be terminated. From the perspective of individual teachers, it is inappropriate and invalid to fire a teacher whose performance is low this year but high the next year, and it is inappropriate to retain a teacher whose performance is high this year but low next year.

Even if average teacher performance remains stable over time, individual teacher performance may fluctuate wildly from year to year.  (my emphasis – gfb)

While previous studies examined the intertemporal stability of value-added teacher rankings over one-year periods and found that reliability is inadequate for high-stakes decisions, researchers tended to assume that this instability was primarily a function of measurement error and sought ways to reduce this error (Aaronson, Barrow, & Sander, 2007; Ballou, 2005; Koedel & Betts, 2007; McCaffrey, Sass, Lockwood, & Mihaly, 2009).

However, this hypothesis was rejected by Goldhaber and Hansen (2012), who investigated the stability of teacher performance in North Carolina using data spanning 10 years and found that much of a teacher’s true performance varies over time due to unobservable factors such as effort, motivation, and class chemistry that are not easily captured through VAM. This invalidates the assumption of stable teacher performance that is embedded in Hanushek’s (2009b) and Gordon et al.’s (2006) VAM-based policy proposals, as well as VAM models specified by McCaffrey et al. (2009) and Staiger and Rockoff (2010) (see Goldhaber & Hansen, 2012, p. 15).

The implication is that standard estimates of impact when using VAM to identify and replace low-performing teachers are significantly inflated (see Goldhaber & Hansen, 2012, p. 31).

As you also probably know, the four main ‘tools’ of the billionaire-led educational DEform movement are:

* firing lots of teachers

* breaking their unions

* closing public schools and turning education over to the private sector

* changing education into tests to prepare for tests that get the kids ready for tests that are preparation for the real tests

They’ve been doing this for almost a decade now under No Child Left Untested and Race to the Trough, and none of these ‘reforms’ has been shown to make any actual improvement in the overall education of our youth.


Two VAMboozlers down, but many more to go

Two of the foremost promoters of the junk science known as VAM (Value-Added Measurements) have just resigned, one in Tennessee and one in Louisiana: Kevin Huffman and John Ayers.

Yay! But there are several dozen more who need to be fired across the country as well.

(Huffman is the ex-husband of the notorious liar and self-promoting former chancellor of DC Public Schools, Michelle Rhee, who is now selling fertilizer. Huffman was once chosen by the pro-EduDeformer Washington Post editorial board as its main educational pundit.)

Audrey Amrein-Beardsley has the details on her blog, VAMboozled.

 

More Problems With Value-Added Measurements for Teachers

I finally got around to reading and skimming the MATHEMATICA reports on VAM for schools and individual teachers in DCPS.

.
At first blush, it’s pretty impressive mathematical and statistical work. It looks like they were very careful to take care of lots of possible problems, and they have lots of nice Greek letters and very learned and complicated mathematical formulas, with tables giving the values of many of the variables in their model. They even use large words like heteroscedasticity to scare off those not really adept at professional statistics (which would include even me). See pages 12-20 for examples of this mathematics of intimidation, as John Ewing of MfA and the AMS has described it. Here is one such learned equation:
[Figure: value-added equation from the Mathematica report]
BUT:
.
However clever and complex a model might be, it needs to do a good job of explaining and describing reality, or it’s just another failed hypothesis that needs to be rejected (like the theories of the four humours or the aether). One needs to actually compare the model’s predictions with the real world and see how well they match.
.
Which is precisely what these authors do NOT do, even though they claim that “for teachers with the lowest possible IMPACT score in math — the bottom 3.6 percent of DCPS teachers — one can say with at least 99.9 percent confidence that these teachers were below average in 2010.” (p. 5)
.
Among other things, such a model would need to be consistent over time, i.e., reliable. Every indication I have seen, including in other cities that the authors themselves cite (NYC–see p. 2 of the 2010 report) indicates that individual value-added scores for a given teacher jump around randomly from year to year in cases of a teacher working at the exact same school, exact same grade level, exact same subject; or in cases of a teacher teaching 2 grade levels in the same school; or in cases of a teacher teaching 2 subjects, during the same year. Those correlations appear to be in the range of 0.2 to 0.3, which is frankly not enough to judge who is worth receiving large cash bonuses or a pink slip.
.
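To see why a year-to-year correlation of 0.2 to 0.3 is far too weak for bonus-or-pink-slip decisions, here is a toy simulation of my own (it is not taken from the Mathematica reports): it assumes scores in consecutive years are jointly normal with a correlation of 0.3 and asks how often a teacher flagged in the bottom 10% one year would be flagged again the next year. In this toy setup, the large majority of teachers flagged one year do not land in the bottom 10% again the next year, which is exactly the kind of instability described above.

```python
# Toy simulation (not from the Mathematica reports): if value-added scores in
# consecutive years correlate at only r = 0.3, how often does a teacher flagged
# in the bottom 10% in year 1 land in the bottom 10% again in year 2?
import numpy as np

rng = np.random.default_rng(42)
r = 0.3
n = 100_000  # simulated teachers

year1 = rng.normal(size=n)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.normal(size=n)

flagged_y1 = year1 <= np.quantile(year1, 0.10)
flagged_both = flagged_y1 & (year2 <= np.quantile(year2, 0.10))

share = flagged_both.sum() / flagged_y1.sum()
print(f"Share of year-1 'bottom 10%' teachers flagged again in year 2: {share:.0%}")
```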
Unless something obvious escaped me, the authors do not appear to mention any study of how teachers’ IVA scores vary over time or from class to class, even though they had every student’s DC-CAS scores from 2007 through the present (see footnote, page 7).
.
In neither report do they acknowledge the possibility of cheating by adults (or students).
.
They do acknowledge on page 2 that a 2008 study found low correlations between proficiency gains and value-added estimates for individual schools in DCPS from 2005-2007. They attempt to explain that low correlation by “changes in the compositions of students from one year to the next” — which I doubt. I suspect it’s that neither one is a very good measure.
.
They also don’t mention anything about correlations between value-added scores and classroom-observations scores. From the one year of data that I received, this correlation is also very low. It is possible that this correlation is tighter today than it used to be, but I would be willing to wager tickets to a professional DC basketball, hockey, or soccer game that it’s not over 0.4.
.
The authors acknowledge that “[t]he DC CAS is not specifically designed for users to compare gains across grades.” Which means, they probably shouldn’t be doing it. It’s also the case that many, many people do not feel that the DC-CAS does a very good job of measuring much of anything useful except the socio-economic status of the student’s parents.
.
In any case, the mathematical model they have made may be wonderful, but real data so far suggests that it does not predict anything useful about teaching and learning.

What I actually had time to say …

Since I had to abbreviate my remarks, here is what I actually said:

I am Guy Brandenburg, retired DCPS mathematics teacher.

To depart from my text, I want to start by proposing a solution: look hard at the collaborative assessment model being used a few miles away in Montgomery County [MD] and follow the advice of Edwards Deming.

Even though I personally retired before [the establishment of the] IMPACT [teacher evaluation system], I want to use statistics and graphs to show that the Value-Added measurements that are used to evaluate teachers are unreliable, invalid, and do not help teachers improve instruction. To the contrary: IVA measurements are driving a number of excellent, veteran teachers to resign or be fired from DCPS to go elsewhere.

Celebrated mathematician John Ewing says that VAM is “mathematical intimidation” and a “modern, mathematical version of the Emperor’s New Clothes.”

I agree.

One of my colleagues was able to pry the value-added formula [used in DC] from [DC data honcho] Jason Kamras after SIX MONTHS of back-and-forth emails. [Here it is:]

[Figure: the value-added formula used by DCPS, in MathType format]

One problem with that formula is that nobody outside a small group of highly paid consultants has any idea what the values of any of those variables are.

In not a single case has the [DCPS] Office of Data and Accountability sat down with a teacher and explained, in detail, exactly how a teacher’s score is calculated, student by student and class by class.

Nor has that office shared that data with the Washington Teachers’ Union.

I would ask you, Mr. Catania, to ask the Office of Data and Accountability to share with the WTU all IMPACT scores for every single teacher, including all the sub-scores, for every single class a teacher has.

Now let’s look at some statistics.

My first graph consists of completely random data points that I had Excel make up for me [and plot as x-y pairs].

[Figure: completely random points]

Notice that even though these are completely random, Excel still found a small correlation: r-squared was about 0.08 and r was about 29%.
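[Aside, for readers of this post rather than the Council: here is a short sketch that reproduces this little experiment without Excel. Generate a couple dozen purely random (x, y) points and compute r and r-squared. The exact values depend on the random draw, but with a sample this small, chance alone routinely produces a visibly nonzero r-squared even though there is no relationship at all.]

```python
# Reproducing the "random points still show a small correlation" demonstration:
# purely random (x, y) pairs, report r and r-squared. With a small sample,
# chance alone routinely gives a nonzero r-squared even though x and y are unrelated.
import numpy as np

rng = np.random.default_rng(7)
n_points = 25  # roughly the size of the Excel demonstration; purely illustrative

x = rng.random(n_points)
y = rng.random(n_points)

r = np.corrcoef(x, y)[0, 1]
print(f"{n_points} random points: r = {r:.2f}, r-squared = {r**2:.2f}")
```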

Now let’s look at a very strong case of negative correlation in the real world: poverty rates and student achievement in Nebraska:

[Figure: Nebraska poverty vs. achievement]

The next graph is for the same sort of thing in Wisconsin:

[Figure: Wisconsin poverty vs. achievement]

Again, quite a strong correlation, just as we see here in Washington, DC:

[Figure: poverty vs. proficiency in DC]

Now, how about those Value-Added scores? Do they correlate with classroom observations?

Mostly, we don’t know, because the data is kept secret. However, someone leaked to me the IVA and classroom observation scores for [DCPS in] SY 2009-10, and I plotted them [as you can see below].

[Figure: VAM scores versus TLF classroom-observation scores, DC IMPACT, SY 2009-10]

I would say this looks like pretty much no correlation at all. It certainly gives teachers no assistance on what to improve in order to help their students learn better.

And how stable are Value-Added measurements [in DCPS] over time? Unfortunately, since DCPS keeps all the data hidden, we don’t know how stable these scores are here. However, the New York Times leaked the value-added data for NYC teachers for several years, and we can look at those scores to [find out]. Here is one such graph [showing how the same teachers, in the same schools, scored in 2008-9 versus 2009-10]:

[Figure: value-added scores for the same NYC teachers in two successive years (Rubinstein)]

That is very close to random.

How about teachers who teach the same subject to two different grade levels, say, fourth-grade math and fifth-grade math? Again, random points:

[Figure: VAM scores for the same subject taught at different grade levels, NYC (Rubinstein)]

One last point:

Mayor Gray and chancellors Henderson and Rhee all claim that education in DC only started improving after mayoral control of the schools, starting in 2007. Look for yourself [in the next two graphs].

[Figure: NAEP 8th-grade math average scale scores since 1990, many states including DC]

 

[Figure: NAEP 4th-grade reading average scale scores since 1993, many states including DC]

Notice that gains began almost 20 years ago, long before mayoral control or chancellors Rhee and Henderson, long before IMPACT.

To repeat, I suggest that we throw out IMPACT and look hard at the ideas of Edwards Deming and the assessment models used in Montgomery County.
