Part Two: Cheating in DCPS

DC Education Reform Ten Years After, Part 2: Test Cheats

Richard P Phelps

Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). For temporal context, I arrived after the first of the infamous test cheating scandals and left just before the incident that spawned a second. Indeed, I filled a new position created to both manage test security and design an expanded testing program. I departed shortly after Vincent Gray, who opposed an expanded testing program, defeated Adrian Fenty in the September 2010 DC mayoral primary. My tenure coincided with Michelle Rhee’s last nine months as Chancellor. 

The recurring test cheating scandals of the Rhee-Henderson years may seem extraordinary but, in fairness, DCPS was more likely than the average US school district to be caught because it received a much higher degree of scrutiny. Given how tests are typically administered in this country, the incidence of cheating is likely far greater than news accounts suggest, for several reasons: 

·      in most cases, those who administer tests—schoolteachers and administrators—have an interest in their results;

·      test security protocols are numerous and complicated, yet they are the responsibility of ordinary, non-expert school personnel, guaranteeing their inconsistent application across schools and over time; 

·      after-the-fact statistical analyses are not legal proof—the odds of a certain amount of wrong-to-right erasures in a single classroom on a paper-and-pencil test being coincidental may be a thousand to one, but one-in-a-thousand is still legally plausible; and

·      after-the-fact investigations based on interviews are time-consuming, scattershot, and uneven. 
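The third point becomes clearer with a little arithmetic: the "thousand to one" odds apply to one classroom, but a district screens many classrooms at once. The sketch below assumes, hypothetically, about a thousand classrooms in one administration (roughly the scale of a single DC test administration), a one-in-a-thousand chance that a given erasure pattern arises innocently, and classrooms that behave independently. These figures are illustrative assumptions, not data from any DC analysis.

```python
def chance_flags(n_classrooms, p_false_alarm):
    """Expected number of classrooms flagged by coincidence alone, and the
    probability that at least one is flagged, assuming independence."""
    expected = n_classrooms * p_false_alarm
    p_at_least_one = 1 - (1 - p_false_alarm) ** n_classrooms
    return expected, p_at_least_one


if __name__ == "__main__":
    # Hypothetical figures: 1,000 classrooms, one-in-a-thousand odds each.
    expected, p_any = chance_flags(1000, 0.001)
    print(f"expected chance flags: {expected:.2f}")      # 1.00
    print(f"P(at least one chance flag): {p_any:.3f}")  # 0.632
```

Under these assumptions, roughly one innocent classroom would be expected to show the pattern, so an accused teacher can always argue that theirs is that one.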

Still, there were measures that the Rhee-Henderson administrations could have adopted to substantially reduce the incidence of cheating, but they chose none that might have been effective. Rather, they dug in their heels, insisted that only a few schools had issues, which they thoroughly resolved, and repeatedly denied any systematic problem.  

Cheating scandals

From 2007 to 2009, rumors percolated of an extraordinary level of wrong-to-right erasures on the test answer sheets at many DCPS schools. “Erasure analysis” is one among several “red flag” indicators that testing contractors calculate to monitor cheating. The testing companies take no responsibility for investigating suspected test cheating, however; that is the responsibility of the customer, the local or state education agency. 

In her autobiographical account of her time as DCPS Chancellor, Michelle Johnson (née Rhee) wrote (p. 197):

“For the first time in the history of DCPS, we brought in an outside expert to examine and audit our system. Caveon Test Security – the leading expert in the field at the time – assessed our tests, results, and security measures. Their investigators interviewed teachers, principals, and administrators.

“Caveon found no evidence of systematic cheating. None.”

Caveon, however, had not looked for “systematic” cheating. All they did was interview a few people at several schools where the statistical anomalies were more extraordinary than at others. As none of those individuals would admit to knowingly cheating, Caveon branded all their excuses as “plausible” explanations. That’s it; that is all that Caveon did. But Caveon’s statement that they found no evidence of “widespread” cheating, despite not having looked for it, would be frequently invoked by DCPS leaders over the next several years.[1]

Incidentally, prior to the revelation of its infamous decades-long, systematic test cheating, the Atlanta Public Schools had similarly retained Caveon Test Security and was, likewise, granted a clean bill of health. Only later did the Georgia state attorney general swoop in and reveal the truth. 

In its defense, Caveon would note that several cheating prevention measures it had recommended to DCPS were never adopted.[2] None of the cheating prevention measures that I recommended were adopted, either.

The single most effective means for reducing in-classroom cheating would have been to rotate teachers on test days so that no teacher administered a test to his or her own students. It would not have been that difficult to randomly assign teachers to different classrooms on test days.
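The random reassignment described above is, in effect, a permutation with no fixed points (a derangement). A minimal sketch, assuming each teacher's name stands in for their own classroom and ignoring real-world constraints such as grade level or subject; the function name `rotate_teachers` and the sample names are hypothetical:

```python
import random


def rotate_teachers(teachers, seed=None):
    """Map each teacher to the classroom they will proctor on test day.

    Hypothetical sketch: {"A": "B"} means teacher A proctors
    teacher B's students. No teacher is assigned their own class.
    """
    if len(set(teachers)) != len(teachers):
        raise ValueError("teacher names must be unique")
    if len(teachers) < 2:
        raise ValueError("rotation needs at least two teachers")
    order = list(teachers)
    random.Random(seed).shuffle(order)
    # Hand each teacher in the shuffled order the NEXT teacher's classroom;
    # the last wraps around to the first. Because the order forms a single
    # cycle, nobody ever lands in their own classroom.
    return {order[i]: order[(i + 1) % len(order)] for i in range(len(order))}


if __name__ == "__main__":
    school = ["Adams", "Baker", "Chen", "Diaz", "Evans"]
    for teacher, room in rotate_teachers(school, seed=2010).items():
        print(f"{teacher} proctors {room}'s class")
```

One design consequence of the single-cycle approach: with three or more teachers, no two teachers simply swap classrooms, so a pair cannot quietly cover for each other. It does not sample uniformly from all possible derangements, which is a trade-off a real scheduling system might handle differently.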

The single most effective means for reducing school administrator cheating would have been to rotate test administrators on test days so that none managed the test materials for their own schools. The visiting test administrators would have been responsible for keeping test materials away from the school until test day, for distributing sealed test booklets to the rotated teachers on test day, and for collecting the re-sealed test booklets at the end of testing and immediately removing them from the school. 

Instead of implementing these, or a number of other feasible and effective test security measures, DCPS leaders increased the number of test proctors, assigning each of a few dozen central office staff a school to monitor. Those proctors could not reasonably manage the volume of oversight required: a single DC test administration could encompass a hundred schools and a thousand classrooms.


So, what effort, if any, did DCPS make to counter test cheating? They hired me, but then rejected all my suggestions for increasing security. Also, they established a telephone tip line. Anyone who suspected cheating could report it, even anonymously, and, allegedly, their tip would be investigated. 

Some forms of cheating are best investigated through interviews. Probably the most frequent forms of cheating at DCPS (teachers helping students during test administrations and school administrators looking at test forms prior to administration) leave no statistical residue. Eyewitness testimony is the only type of legal evidence available in such cases, but it is not only inconsistent; it can also be socially destructive. 

I remember two investigations best: one occurred in a relatively well-to-do neighborhood with well-educated parents active in school affairs; the other in one of the city’s poorest neighborhoods. Superficially, the cases were similar: an individual teacher was accused of helping his or her own students with answers during test administrations. Making a case against either elementary school teacher required sworn testimony from eyewitnesses, that is, students, eight-to-ten-year-olds. 

My investigations, then, consisted of calling children into the principal’s office one-by-one to be questioned about their teacher’s behavior. We couldn’t hide the reason we were asking the questions. And, even though each student agreed not to tell others what had occurred in their visit to the principal’s office, we knew we had only one shot at an uncorrupted jury pool. 

Though the accusations against the two teachers were similar and the cases against them equally strong, the outcomes could not have been more different. In the high-poverty neighborhood, the students seemed suspicious and said little; none would implicate the teacher, whom they all seemed to like. 

In the more prosperous neighborhood, students were more outgoing, freely divulging what they had witnessed. The students had discussed the alleged coaching with their parents who, in turn, urged them to tell investigators what they knew. During his turn in the principal’s office, the accused teacher denied any wrongdoing. I wrote up each interview, then asked each student to read and sign it. 

Thankfully, that accused teacher made a deal and left the school system a few weeks later. Had he not, we would have required the presence in court of the eight-to-ten-year-olds to testify under oath against their former teacher, who taught multi-grade classes. Had that prosecution not succeeded, the eyewitness students could have been routinely assigned to his classroom the following school year.

My conclusion? Only in certain schools is the successful prosecution of a cheating teacher through eyewitness testimony even possible. And even where it is possible, it consumes inordinate amounts of time and exacts a high price, turning young innocents against authority figures they naturally trusted. 

Cheating blueprints

Arguably the most widespread and persistent testing malfeasance in DCPS received little attention from the press. Moreover, it was directly propagated by District leaders, who published test blueprints on the web. Put simply, test “blueprints” are lists of the curricular standards (e.g., “student shall correctly add two-digit numbers”) and the number of test items included in an upcoming test related to each standard. DC had been advance-publishing its blueprints for years.

I argued that the way DC did it was unethical. The head of the Division of Data & Accountability, Erin McGoldrick, however, defended the practice, claimed it was common, and cited its existence in the state of California as precedent. The next time she and I met for a conference call with one of DCPS’s test providers, Discovery Education, I asked their sales agent how many of their hundreds of other customers advance-published blueprints. His answer: none.

In the state of California, the location of McGoldrick’s only prior professional experience, blueprints were, indeed, published in advance of test administrations. But their tests were longer than DC’s and all standards were tested. Publication of California’s blueprints served more to remind the populace what the standards were in advance of each test administration. Occasionally, a standard considered to be of unusual importance might be assigned a greater number of test items than the average, and the California blueprints signaled that emphasis. 

In Washington, DC, the tests used in judging teacher performance were shorter, covering only some of each year’s standards. So, DC’s blueprints showed everyone well in advance of the test dates exactly which standards would be tested and which would not. For each teacher, this posed an ethical dilemma: should they “narrow the curriculum” by teaching only that content they knew would be tested? Or, should they do the right thing and teach all the standards, as they were legally and ethically bound to, even though it meant spending less time on the to-be-tested content? It’s quite a conundrum when one risks punishment for behaving ethically.

Monthly meetings were convened to discuss issues with the districtwide testing program, the DC Comprehensive Assessment System (DC-CAS), administered to comply with the federal No Child Left Behind (NCLB) Act. All public schools, both DCPS and charters, administered those tests. At one of these regular meetings, two representatives from the Office of the State Superintendent of Education (OSSE) announced plans to repair the broken blueprint process.[3]

The State Office employees argued thoughtfully and reasonably that it was professionally unethical to advance publish DC test blueprints. Moreover, they had surveyed other US jurisdictions in an effort to find others that followed DC’s practice and found none. I was the highest-ranking DCPS employee at the meeting and I expressed my support, congratulating them for doing the right thing. I assumed that their decision was final.

I mentioned the decision to McGoldrick, who expressed surprise and speculated that it might not have been made at the highest level of the organizational hierarchy. Wasting no time, she met with other DCPS senior managers, and the proposed change was forthwith shelved. In that, and other ways, the DCPS tail wagged the OSSE dog. 

* * *

It may be too easy to blame ethical deficits alone for the recalcitrant attitude of Rhee-Henderson-era ed reformers toward test security. The columnist Peter Greene insists that knowledge deficits among self-appointed education reformers also matter: 

“… the reformistan bubble … has been built from Day One without any actual educators inside it. Instead, the bubble is populated by rich people, people who want rich people’s money, people who think they have great ideas about education, and even people who sincerely want to make education better. The bubble does not include people who can turn to an Arne Duncan or a Betsy DeVos or a Bill Gates and say, ‘Based on my years of experience in a classroom, I’d have to say that idea is ridiculous bullshit.’”

“There are a tiny handful of people within the bubble who will occasionally act as bullshit detectors, but they are not enough. The ed reform movement has gathered power and money and set up a parallel education system even as it has managed to capture leadership roles within public education, but the ed reform movement still lacks what it has always lacked–actual teachers and experienced educators who know what the hell they’re talking about.”

In my twenties, I worked for several years in the research department of a state education agency. My primary political lesson from that experience, consistently reinforced subsequently, is that most education bureaucrats tell the public that the system they manage works just fine, no matter what the reality. They can get away with this because they control most of the evidence and can suppress it or spin it to their advantage.

In this proclivity, the DCPS central office leaders of the Rhee-Henderson era proved themselves to be no different than the traditional public-school educators they so casually demonized. 

US school systems are structured to be opaque and, it seems, both educators and testing contractors like it that way. For their part, and contrary to their rhetoric, Rhee, Henderson, and McGoldrick passed on many opportunities to make their system more transparent and accountable.

Education policy will not improve until control of the evidence is ceded to genuinely independent third parties, hired neither by the public education establishment nor by the education reform club.

The author gratefully acknowledges the fact-checking assistance of Erich Martel and Mary Levy.

[1] A perusal of Caveon’s website clarifies that their mission is to help their clients–state and local education departments–not get caught. Sometimes this means not cheating in the first place; other times it might mean something else. One might argue that, ironically, Caveon could be helping its clients to cheat in more sophisticated ways and cover their tracks better.

[2] Among them: test booklets should be sealed until the students open them and resealed by the students immediately after; and students should be assigned seats on test day and a seating chart submitted to test coordinators (necessary for verifying cluster patterns in student responses that would suggest answer copying).

[3] Yes, for those new to the area, the District of Columbia has an Office of the “State” Superintendent of Education (OSSE). Its domain of relationships includes not just the regular public schools (i.e., DCPS), but also other public schools (i.e., charters) and private schools. Practically, it primarily serves as a conduit for funneling money from a menagerie of federal education-related grant and aid programs.

What did Education Reform in DC Actually Mean?

Short answer: nothing that would actually help students or teachers. But it’s made for well-padded resumes for a handful of insiders.

This is an important review, by the then-director of assessment. His criticisms echo the points that I have been making along with Mary Levy, Erich Martel, Adell Cothorne, and many others.

Looking Back on DC Education Reform 10 Years After, Part 1: The Grand Tour

Richard P Phelps

Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). My tenure coincided with Michelle Rhee’s last nine months as Chancellor. I departed shortly after Vincent Gray defeated Adrian Fenty in the September 2010 DC mayoral primary.

My primary task was to design an expansion of the testing program that served the IMPACT teacher evaluation system so that it would include all core subjects and all grade levels. Despite its fame (or infamy), the test score aspect of the IMPACT program affected only 13% of teachers, those teaching either reading or math in grades four through eight. Only those subjects and grade levels included the pre- and post-tests required for teacher “value added” measurements (VAM). Not included were most subjects (e.g., science, social studies, art, music, physical education), kindergarten through grade two, and high school.

Chancellor Rhee wanted many more teachers included. So, I designed a system that would cover more than half the DCPS teacher force, from kindergarten through high school. You haven’t heard about it because it never happened. The newly elected Vincent Gray had promised during his mayoral campaign to reduce the amount of testing; the proposed expansion would have increased it fourfold.

VAM affected teachers’ jobs. A low value-added score could lead to termination; a high score, to promotion and a cash bonus. VAM as it was then structured was obviously, glaringly flawed,[1] as anyone with a strong background in educational testing could have seen. Unfortunately, among the many new central office hires from the elite of ed reform circles, none had such a background.

Before posting a request for proposals from commercial test developers for the testing expansion plan, I was instructed to survey two groups of stakeholders—central office managers and school-level teachers and administrators.

Not surprisingly, some of the central office managers consulted requested additions or changes to the proposed testing program where they thought it would benefit their domain of responsibility. The net effect on school-level personnel would have been to add to their administrative burden. Nonetheless, all requests from central office managers would be honored. 

The Grand Tour

At about the same time, over several weeks in the late spring and early summer of 2010, I visited a dozen DCPS schools along with a bright summer intern. The alleged purpose was to collect feedback on the design of the expanded testing program. I enjoyed these meetings. They were informative, animated, and very well attended. School staff appreciated the apparent opportunity to contribute to policy decisions and tried to make the most of it.

Each school greeted us with a full complement of faculty and staff on their days off, numbering several dozen educators at some venues. They believed what we had told them: that we were in the process of redesigning the DCPS assessment program and were genuinely interested in their suggestions for how best to do it. 

At no venue did we encounter stand-pat, knee-jerk rejection of education reform efforts. Some educators were avowed advocates for the Rhee administration’s reform policies, but most were simply dedicated educators determined to do what was best for their community within the current context. 

The Grand Tour was insightful, too. I learned for the first time of certain aspects of DCPS’s assessment system that were essential to consider in its proper design, aspects that the higher-ups in the DCPS Central Office either were not aware of or did not consider relevant. 

The group of visited schools represented DCPS as a whole in appropriate proportions geographically, ethnically, and by education level (i.e., primary, middle, and high). Within those parameters, however, only schools with “friendly” administrations were chosen. That is, we only visited schools with principals and staff openly supportive of the Rhee-Henderson agenda. 

But even they desired changes to the testing program, whether or not it was expanded. Their suggestions covered both the annual districtwide DC-CAS (or “comprehensive” assessment system), on which the teacher evaluation system was based, and the DC-BAS (or “benchmarking” assessment system), a series of four annual “no-stakes” interim tests unique to DCPS, ostensibly offered to help prepare students and teachers for the consequential-for-some-school-staff DC-CAS.[2]

At each staff meeting I asked for a show of hands on several issues of interest that I thought were actionable. Some suggestions for program changes received close to unanimous support. Allow me to describe several.

1. Move DC-CAS test administration later in the school year. Many citizens may have logically assumed that the IMPACT teacher evaluation numbers were calculated from a standard pre-post test schedule, testing a teacher’s students at the beginning of their academic year together and then again at the end. In 2010, however, the DC-CAS was administered in March, three months before the end of the school year. Moreover, that single administration of the test served as both pre- and post-test: post-test for the current school year and pre-test for the following one. Thus, before teachers even met their new students in late August or early September, almost half of the year for which they were judged had already transpired: the three months in the spring spent with the previous year’s teacher and almost three months of summer vacation. 

School staff recommended pushing DC-CAS administration to later in the school year. Furthermore, they advocated a genuine pre-post-test administration schedule (pre-test the students in late August–early September and post-test them in late May–early June) to cover a teacher’s actual span of time with the students.

This suggestion was rejected because the test development firm with the DC-CAS contract required three months to score some portions of the test in time for the IMPACT teacher ratings scheduled for early July delivery, before the start of the new school year. A small number of teachers would be terminated based on their IMPACT scores, so management demanded those scores be available before preparations for the new school year began.[3] The tail wagged the dog.

2. Add some stakes to the DC-CAS in the upper grades. Because DC-CAS test scores portended consequences for teachers but none for students, some students expended little effort on the test. Indeed, extensive research on “no-stakes” (for students) tests reveals that motivation and effort vary by a range of factors including gender, ethnicity, socioeconomic class, the weather, and age. Generally, the older the student, the lower the test-taking effort. This disadvantaged some teachers in the IMPACT ratings for circumstances beyond their control: unlucky student demographics. 

Central office management rejected this suggestion to add even modest stakes to the upper grades’ DC-CAS; no reason was given. 

3. Move one of the DC-BAS tests to year end. If management rejected the suggestion to move DC-CAS test administration to the end of the school year, school staff suggested scheduling one of the no-stakes DC-BAS benchmarking tests for late May–early June. As it was, the schedule squeezed all four benchmarking test administrations between early September and mid-February. Moving just one of them to the end of the year would give the following year’s teachers a more recent reading (by more than three months) of their new students’ academic levels and needs.

Central Office management rejected this suggestion probably because the real purpose of the DC-BAS was not to help teachers understand their students’ academic levels and needs, as the following will explain.

4. Change DC-BAS tests so they cover recently taught content. Many DC citizens probably assumed that, like most tests, the DC-BAS interim tests covered recently taught content, such as that covered since the previous test administration. Not so in 2010. The first annual DC-BAS was administered in early September, just after the year’s courses commenced. Moreover, it covered the same content domain—that for the entirety of the school year—as each of the next three DC-BAS tests. 

School staff proposed changing the full-year “comprehensive” content coverage of each DC-BAS test to partial-year “cumulative” coverage, so students would only be tested on what they had been taught prior to each test administration.

This suggestion, too, was rejected. Testing the same full-year comprehensive content domain produced a predictable, flattering score rise. With each DC-BAS test administration, students recognized more of the content, because they had just been exposed to more of it, so average scores predictably rose. With test scores always rising, it looked like student achievement improved steadily each year. Achieving this contrived score increase required testing students on some material to which they had not yet been exposed, both a violation of professional testing standards and a poor method for instilling student confidence. (Of course, it was also less expensive to administer essentially the same test four times a year than to develop four genuinely different tests.)

5. Synchronize the sequencing of curricular content across the District. DCPS management rhetoric circa 2010 attributed classroom-level benefits to the testing program. Teachers would know more about their students’ levels and needs and could also learn from each other. Yet the only student test results teachers received at the beginning of each school year were half a year old, and most of the information they received over the course of four DC-BAS test administrations was based on not-yet-taught content.

As for cross-district teacher cooperation, unfortunately there was no cross-District coordination of common curricular sequences. Each teacher paced their subject matter however they wished and varied topical emphases according to their own personal preference.

It took DCPS’s Chief Academic Officer, Carey Wright, and her chief of staff, Dan Gordon, less than a minute to reject the suggestion to standardize topical sequencing across schools so that teachers could consult with one another in real time. Tallying up the votes: several hundred school-level District educators favored the proposal, two of Rhee’s trusted lieutenants opposed it. It lost.

6. Offer and require a keyboarding course in the early grades. DCPS was planning to convert all its testing from paper-and-pencil mode to computer delivery within a few years. Yet, keyboarding courses were rare in the early grades. Obviously, without systemwide keyboarding instruction, some students would be at a disadvantage in computer-based testing.

Suggestion rejected.

In all, I had polled over 500 DCPS school staff. Not only were all of their suggestions reasonable, some were essential in order to comply with professional assessment standards and ethics. 

Nonetheless, back at DCPS’ Central Office, each suggestion was rejected without, to my observation, any serious consideration. The rejecters included Chancellor Rhee, the head of the office of Data and Accountability—the self-titled “Data Lady,” Erin McGoldrick—and the head of the curriculum and instruction division, Carey Wright, and her chief deputy, Dan Gordon. 

Four central office staff outvoted several-hundred school staff (and my recommendations as assessment director). In each case, the changes recommended would have meant some additional work on their parts, but in return for substantial improvements in the testing program. Their rhetoric was all about helping teachers and students; but the facts were that the testing program wasn’t structured to help them.

What was the purpose of my several weeks of school visits and staff polling? To solicit “buy in” from school level staff, not feedback.

Ultimately, the new testing program proposal would incorporate all the new features requested by senior Central Office staff, no matter how burdensome, and not a single feature requested by several hundred supportive school-level staff, no matter how helpful. Like many others, I had hoped that the education reform intention of the Rhee-Henderson years was genuine. DCPS could certainly have benefitted from some genuine reform. 

Alas, much of the activity labelled “reform” was just for show, and for padding resumes. Numerous central office managers would later work for the Bill and Melinda Gates Foundation. Numerous others would work for entities supported by the Gates or aligned foundations, or in jurisdictions such as Louisiana, where ed reformers held political power. Most would be well paid. 

Their genuine accomplishments, or lack thereof, while at DCPS seemed to matter little. What mattered was the appearance of accomplishment and, above all, loyalty to the group. That loyalty required going along to get along: complicity in maintaining the façade of success while withholding any public criticism of or disagreement with other in-group members.

Unfortunately, in the United States what is commonly showcased as education reform is neither a civic enterprise nor a popular movement. Neither parents, the public, nor school-level educators have any direct influence. Rather, at the national level, US education reform is an elite, private club: a small group of tightly connected politicos and academics, a mutual admiration society dedicated to the career advancement, political influence, and financial benefit of its members, supported by a gaggle of wealthy foundations (e.g., Gates, Walton, Broad, Wallace, Hewlett, Smith-Richardson). 

For over a decade, The Ed Reform Club exploited DC for its own benefit. Local elites formed the DC Public Education Fund (DCPEF) to sponsor education projects, such as IMPACT, which they deemed worthy. In the negotiations between the Washington Teachers’ Union and DCPS concluded in 2010, DCPEF arranged a three-year grant of $64.5M from the Arnold, Broad, Robertson and Walton Foundations to fund a five-year retroactive teacher pay raise in return for contract language allowing teacher excessing tied to IMPACT, which Rhee promised would lead to annual student test score increases by 2012. Projected goals were not met; foundation support continued nonetheless.

Michelle Johnson (née Rhee) now chairs the board of a charter school chain in California and occasionally collects $30,000+ in speaker fees but otherwise seems to have deliberately withdrawn from the limelight. Despite contributing additional scandals of her own after she assumed the DCPS Chancellorship, Kaya Henderson ascended to great fame and glory with a “distinguished professorship” at Georgetown; honorary degrees from Georgetown and Catholic Universities; gigs with the Chan Zuckerberg Initiative, Broad Leadership Academy, and Teach for All; and board memberships with The Aspen Institute, The College Board, Robin Hood NYC, and Teach For America. Carey Wright is now state superintendent in Mississippi. Dan Gordon runs a 30-person consulting firm, Education Counsel, which strategically partners with major players in US education policy. The manager of the IMPACT teacher evaluation program, Jason Kamras, now works as Superintendent of the Richmond, VA public schools. 

Arguably the person most directly responsible for the recurring assessment system fiascos of the Rhee-Henderson years, then Chief of Data and Accountability Erin McGoldrick, now specializes in “data innovation” as partner and chief operating officer at an education management consulting firm. Her firm, Kitamba, strategically partners with its own panoply of major players in US education policy. Its list of recent clients includes the DC Public Charter School Board and DCPS.

If the ambitious DC central office folk who gaudily declared themselves leading education reformers were not really reformers, who were the genuine education reformers during the Rhee-Henderson decade of massive upheaval and per-student expenditures three times those in the state of Utah? They were the school principals and staff whose practical suggestions were ignored by central office glitterati. They were whistleblowers like history teacher Erich Martel, who had documented DCPS’s manipulation of student records and its phony graduation rates years before the Washington Post’s celebrated investigation of Ballou High School, and who was demoted and then “excessed” by Henderson. Or school principal Adell Cothorne, who spilled the beans on test answer sheet “erasure parties” at Noyes Education Campus and lost her job under Rhee. 

Real reformers with “skin in the game” can’t play it safe.

The author appreciates the helpful comments of Mary Levy and Erich Martel in researching this article. 

Against Proposed DoE Regulations on ESSA

This is from Monty Neill:


Dear Friends,

The U.S. Department of Education (DoE) has drafted regulations for
implementing the accountability provisions of the Every Student Succeeds
Act (ESSA). The DoE proposals would continue test-and-punish practices
imposed by the failed No Child Left Behind (NCLB) law. The draft
over-emphasizes standardized exam scores, mandates punitive
interventions not required in law, and extends federal micro-management.
The draft regulations would also require states to punish schools in
which larger numbers of parents refuse to let their children be tested.
When DoE makes decisions that should have been set locally in
partnership with educators, parents, and students, it takes away local
voices that ESSA tried to restore.

You can help push back against these dangerous proposals in two ways:

First, tell DoE it must drop harmful proposed regulations. You can
simply cut and paste the Comment below into DoE’s website at!submitComment;D=ED-2016-OESE-0032-0001
or adapt it into your own words. (The text below is part of FairTest’s
submission.) You could emphasize that the draft regulations steal the
opportunity ESSA provides for states and districts to control
accountability and thereby silence the voice of educators, parents,
students and others.

Second, urge Congress to monitor the regulations. Many Members have
expressed concern that DoE is trying to rewrite the new law, not draft
appropriate regulations to implement it. Here’s a letter you can easily
send to your Senators and Representative asking them to tell leaders of
Congress’ education committees to block DoE’s proposals:

Together, we can stop DoE’s efforts to extend NCLB policies that the
American people and Congress have rejected.


Note: The DoE website has a character limit; if you add your own comments,
you likely will need to cut some of the text below:

_You can cut and paste this text into the DoE website:_

I support the Comments submitted by FairTest on June 15 (Comment #).
Here is a slightly edited version:

While the accountability provisions in the Every Student Succeeds Act
(ESSA) are superior to those in No Child Left Behind (NCLB), the
Department of Education’s (DoE) draft regulations intensify ESSA’s worst
aspects and will perpetuate many of NCLB’s most harmful practices. The
draft regulations over-emphasize testing, mandate punishments not
required in law, and continue federal micro-management. When DoE makes
decisions that should be set at the state and local level in partnership
with local educators, parents, and students, it takes away local voices
that ESSA restores. All this will make it harder for states, districts
and schools to recover from the educational damage caused by NCLB – the
very damage that led Congress to fundamentally overhaul NCLB’s
accountability structure and return authority to the states.

The DoE must remove or thoroughly revise five draft regulations:

_DoE draft regulation 200.15_ would require states to lower the ranking
of any school that does not test 95% of its students or to identify it
as needing “targeted support.” No such mandate exists in ESSA. This
provision violates statutory language that ESSA does not override “a
State or local law regarding the decision of a parent to not have the
parent’s child participate in the academic assessments.” This regulation
appears designed primarily to undermine resistance to the overuse and
misuse of standardized exams.

_Recommendation:_ DoE should simply restate ESSA language allowing the
right to opt out as well as its requirements that states test 95% of
students in identified grades and factor low participation rates into
their accountability systems. Alternatively, DoE could write no
regulation at all. In either case, states should decide how to implement
this provision.

_DoE draft regulation 200.18_ transforms ESSA’s requirement for
“meaningful differentiation” among schools into a mandate that states
create “at least three distinct levels of school performance” for each
indicator. ESSA requires states to identify their lowest performing five
percent of schools as well as those in which “subgroups” of students are
doing particularly poorly. Neither provision necessitates creation of
three or more levels. This proposal serves no educationally useful
purpose. Several states have indicated they oppose this provision
because it obscures rather than enhances their ability to precisely
identify problems and misleads the public. This draft regulation would
pressure schools to focus on tests to avoid being placed in a lower
level. Performance levels are also another way to attack schools in
which large numbers of parents opt out, as discussed above.

_DoE draft regulation 200.18_ also mandates that states combine multiple
indicators into a single “summative” score for each school. As Rep. John
Kline, chair of the House Education Committee, pointed out, ESSA
includes no such requirement. Summative scores are simplistically
reductive and opaque. They encourage the flawed school grading schemes
promoted by diehard NCLB defenders.

_Recommendation:_ DoE should drop this draft regulation. It should allow
states to decide how to use their indicators to identify schools and
whether to report a single score. Even better, the DoE should encourage
states to drop their use of levels.

_DoE draft regulation 200.18_ further proposes that a state’s academic
indicators together carry “much greater” weight than its “school
quality” (non-academic) indicators. Members of Congress differ as to the
intent of the relevant ESSA passage. Some say it simply means more than
50%, while others claim it implies much more than 50%. The phrase “much
greater” is likely to push states to minimize the weight of non-academic
factors in order to win plan approval from DoE, especially since the
overall tone of the draft regulations emphasizes testing.

_Recommendation:_ The regulations should state that the academic
indicators must count for more than 50% of the weighting in how a state
identifies schools needing support.

_DoE draft regulation 200.18_ also exceeds limits ESSA placed on DoE
actions regarding state accountability plans.

_DoE draft regulation 200.19_ would require states to use 2016-17 data
to select schools for “support and improvement” in 2017-18. This leaves
states barely a year for implementation, too little time to overhaul
accountability systems. It will have the harmful consequence of
encouraging states to keep using a narrow set of test-based indicators
and to select only one additional “non-academic” indicator.

_Recommendation:_ The regulations should allow states to use 2017-18
data to identify schools for 2018-19. This change is entirely consistent
with ESSA’s language.

Lastly, we are concerned that an additional effect of these unwarranted
regulations will be to unhelpfully constrain states that choose to
participate in ESSA’s “innovative assessment” program.

Monty Neill, Ed.D.; Executive Director, FairTest; P.O. Box 300204,
Jamaica Plain, MA 02130; 617-477-9792; Donate
to FairTest:

A Six-Page Graphic on How to Persuade Parents that Common Core Testing is Really Wonderful

I will copy part of one page so you can get the flavor of it without following the link. I don’t know whom to attribute this to, since it’s apparently unsigned. It looks like it has Gates or Walton money behind it, since it’s so slick.

How would you argue against these arguments?

Thanks to Anthony Cody for uncovering this.

how to talk about testing

Just how flat ARE those 12th grade NAEP scores?

Perhaps you read or heard that the 12th grade NAEP reading and math scores, which just got reported, were “flat”.

Did you wonder what that meant?

The short answer is: those scores have essentially not changed since they began giving the tests! Not for the kids at the top of the testing heap, not for those at the bottom, not for blacks, not for whites, not for Hispanics.

No change, nada, zip.

Not even after a full dozen years of Bush’s loony No Child Left Behind Act, nor its twisted Obama-style descendant, Race to the Trough, er, Top.

I took a look at the official reports and plotted them here, so you can see how little effect all those billions spent on testing, firing veteran teachers, writing and publishing new tests and standards, and opening thousands of charter schools have had.

Here are the graphs:

NAEP 12th-grade reading scores by percentile over time

This first graph shows that, other than a slight widening of the gap between the kids at the top (at the 90th percentile) and those at the bottom (at the 10th percentile) back in the early 1990s, there has been essentially no change in the average scores over the past two full decades.

I think we can assume that the test makers, who are professional psychometricians and not political appointees, tried their very best to make the test of equal difficulty every year. So those flat lines mean that there has been no change, despite all the efforts of the education secretaries of Clinton, Bush 2, and Obama. And despite the wholesale replacement of an enormous fraction of the nation’s teachers, and the handing over of public education resources to charter school operators.

NAEP 12th-grade reading scores by group over time


This next graph shows much the same thing, but the data is broken down into ethnic/racial groups. Again, these lines are about as flat (horizontal) as you will ever see in the social sciences.

However, I think it’s instructive to note that the gap between, say, Hispanic and Black students on the one hand, and White and Asian students on the other, is much smaller than the gap between the 10th and 90th percentiles we saw in the very first graph: about 30 points as opposed to almost 100 points.
NAEP 12th-grade math scores by percentile over time


The third graph shows the NAEP math scores for 12th graders since 2005, the first year that this test was given. The psychometricians at NAEP claim there has been a “statistically significant” change since 2005 in some of those scores, but I don’t really see it. Being “statistically significant” and being REALLY significant are two different things.

*Note: the 12th grade Math NAEP was given for the first time in 2005, unlike the 12th grade reading test.

NAEP 12th-grade math scores by group over time


And here we have the same data broken down by ethnic/racial groups. Since 2009 there has been essentially no change, and there was precious little before that, except for Asian students.

Diane Ravitch correctly read all of this as a sign that everything Rod Paige, Margaret Spellings, and Arne Duncan have done is a complete and utter failure. Her conclusion, which I agree with, is that NCLB and RTTT need to be thrown out.


One of the Things Wrong With Testing: They Are Invalid to Begin With!

A Test Writer Comments on New York’s Common Core Tests

by dianerav

This comment was posted yesterday:

I am a former part-time item writer for a private testing company; I wrote items for many different state standards under NCLB. I must say that poorly constructed, confusing, or developmentally inappropriate items undermine the validity of standardized scores and their subsequent use in teacher evaluation. When standardized tests are properly constructed, such items, which might make it to a field test, will almost certainly be weeded out during what is typically a two-year vetting process. Many items on the Pearson math and ELA tests administered last April here in NY were written, in my opinion, in an intentionally confusing style using obtuse or arcane vocabulary. The ELA test in particular included confusing item stems and distractors that were not clearly wrong. There were far too many items that turned subjective opinions (most likely; best; author’s intent; etc.) into a “one right, three wrong” format. Many teachers were unsure of the correct answers on a number of vague and fuzzy items.
The math test included many items that were ridiculously convoluted. Although there may be other compelling arguments against VAM teacher evaluations, corrupt test writing, norm referencing (instead of criterion-referenced scoring), and manipulating cut scores add up to a rather important set of reasons to invalidate the entire process.

Published on October 23, 2013 at 2:44 pm
Another Weekly Roundup of News on the Movement Against Testing Mania

This is from Bob Schaeffer at FairTest, as usual:


How much more evidence do policy makers need before they recognize that test-and-punish policies have failed? Learning gains have stagnated, progress toward closing the “achievement gap” has stalled, and their constituents increasingly reject the strategy. Even some of their strongest newspaper editorial page allies — including the New York Times and Los Angeles Times — are saying that it is time to look at alternative approaches.  Enough is enough!

Major National Survey Finds Parents Strongly Oppose Standardized Testing Misuse and Overuse

Standardized Tests Take Over the School Day

High-Stakes Testing Leads to Increased Incarceration
See FairTest infographic: “How High-Stakes Testing Feeds the School-to-Prison Pipeline”

Support for Common Core Testing Declines Dramatically
See FairTest fact sheet: Common Core Assessments: More Tests But Not Much Better

ACLU Sues Rhode Island Over Grad Test Requirement

Florida School Grading System a “Politically Manipulated Scam”
Convoluted School Grading System Fails. Yet Jeb Bush’s Disciples Are Pushing It in Other States

Erase to the Top: Tainted Philadelphia Scores Demonstrate Flaws of Test-Driven School “Reform”

Widespread Test Cheating at New Orleans Charters and “Reconstruction” Schools

Oklahoma (and Indiana) Seek Test-Company Penalties for Exam Screw-ups

Why Are Texas Test-Takers Being Held More Accountable Than Test-Makers?

Standardized Testing’s Casualties: another excellent Letter to the Editor

Minnesota Plans K-12 Testing “Without the Tears and Trauma”

Focus on Test Scores Misses the Point on Urban Education

State Testing Makes Kids Hate School

Students Tell Policy-Makers: Stop Corporate Ed “Reform” and High-Stakes Testing

Key School “Reform” Questions Beg for Answers

Five Basic Lessons on Public Education

New Books on Assessment Reform
“The Mismeasure of Education”
“Left Behind in the Race to the Top”

Bob Schaeffer, Public Education Director
FairTest: National Center for Fair & Open Testing
ph- (239) 395-6773  fax- (239) 395-6779
cell- (239) 699-0468

If the tests by which all of education is to be measured are garbage, then so are the results

On this blog I have reprinted examples of what I see as crappy test items and dissected them, hoping to show readers that those items neither made sense nor measured what they purported to measure.

However, I never worked inside the testing industry itself, so I have no direct experience of making up BS test items on an industrial scale.* My own experience is that EVERY test, no matter how good, has validity and reliability problems. This passage shows that the tests on which all US educational decisions are supposed to be based are, in fact, ridiculously badly made from the beginning: they cannot possibly measure what they pretend to measure, they are unreliable, and they are thus utterly invalid. (Plus, the tests snatch at least potentially valuable class time away from our students, while enabling a handful of big corporations like Pearson (more on which below) to rake in huge dividends, because those companies control almost the entire education market.)


This comes from an interview published by Diane Ravitch:

Rebecca Rubenstein: Since your book was published in 2009, has the “standardized” testing industry improved?

Todd Farley: Not the slightest bit. There was a story in The New York Times in 2001 about how test-scoring was a wildly out-of-control industry, which quotes various employees—not me!—as saying that they faced “too little time, too much to do, not enough people.” It implies the industry was doing a terribly suspect job. Since then, the industry is about a hundred times bigger, but those problems mentioned in the Times article or in my book have never been addressed. The industry has simply grown exponentially, and there are hundreds of millions of dollars to be earned by companies that are completely unregulated—to repeat, completely unregulated, so whatever Pearson et al. tell us, we’re supposed to say “thank you very much” and just write them a staggeringly large check—but of course things haven’t gotten any better.

In my time in test-scoring, we never had enough temporary employees to do the work; we always had too much to do and too little time to do it; and there were always financial punishments looming over our heads if we didn’t get things done. We cut whatever corners we could to get it done (I’m sorry to say). Today the work load is a hundred times bigger and the money to be made is a hundred times bigger, but the system didn’t work to begin with and of course it doesn’t work now.

The same is true in the test development business. When I worked for one publisher as a test developer, it was always a madcap race to get tests written on time, and we faced absurd deadlines and pressure to do so. The reality is that quality was always secondary to the bottom line when developing tests, and then when the Common Core standards were introduced, and tests and products needed to be written for them, our deadlines became laughably absurd; I was once involved in the development of 200 tests in two months, which I think is literally more tests than ETS has produced in its entire existence. With the Common Core standards released, all the companies knew all the other companies were racing to finish their tests and products first, so quality became even worse than secondary. It became tertiary, or “fourthiary,” or whatever. Subcontractors who had been fired for poor work were rehired; item writers were hired off Craigslist; test developers with neither teaching experience nor test development experience were given full-time jobs. It’s important to remember that at the end of the day, companies like Pearson are for-profit enterprises. They want to make money. They want to make money, so of course they do a crappy job, because the quality of the work is never anywhere near as important as their desire to make a profit, and there’s always too much work and too little time to do it.


A comment: I was at first skeptical of the claim that those “200 tests” were more than ETS has created in its entire existence. But I think he may be right: The SAT is essentially one, or two, or three tests (Reading, Math, and Writing), depending on how you look at it; it just gets revised a little bit each year. Plus, there are perhaps a couple of score different Advanced Placement (AP) tests and Achievement tests in different subjects; they get revised every year, at least in math (which I follow, of course) and some other fields.

But what Pearson is doing now is essentially trying to replace the teacher in every single grade level, for every single course, by making the entire curriculum driven by the tests and pre-tests and practice tests and test prep material provided by them. Yes, I do mean all of third grade. Yes, I do mean 6th grade science, music appreciation, and geography and PE. Every class. And if you count every single course or subject area that a student might be measured by from Pre-K-3 all the way up to graduating from high school, that might in fact be roughly 200 brand-new test series! Not just end-of-course tests, either. A different corporate multiple-choice test every month or two!

All this corporate educa-crap is just that: crap forced down the throat of public school kids and ONLY kids in public schools.

And it won’t improve a damned thing. Except for corporate bottom lines.

Of course the children or grandchildren of Michelle Rhee, Michael Bloomberg, Arne Duncan, Eli Broad, Bill Gates, the Koch brothers, and Barack Obama will never, ever be subjected to such a poor excuse for an education.

Oh, no.

That’s just for the poor black and Latino and white kids who are in high-poverty regions; the only way they can opt out is to go to a charter school, which might be doing any damned thing and is almost sure to be even more segregated than the nearest public school, if that’s even possible.

This is progress?


* My students and I often found mistakes on tests and quizzes and assignments I made up. I used to congratulate the student and give him/her/them a point when they pointed out an error. ETS and Pearson’s responses have been rather different. Remember the famous talking pineapple question? And do you recall that essentially no one has ever been able to explain, line by line, number by number, exactly how ANY single teacher’s VAM numbers were calculated? Has any school district ever released data showing how well VAM and supposedly ‘scientific’ classroom observation data correlate with each other? (Hint: they don’t!!)

Once again, let me urge the leadership of the Washington Teachers’ Union, and teacher unions elsewhere, to enlist a good statistician with his/her feet on the ground, and poke holes in VAM. It’s all a tissue of fabrications.

Reform the Tests! As they are, they don’t test anything important!

A brilliant article by Marion Brady, reprinted by Valerie Strauss at the Washington Post.

Brady points out that what we are actually testing with NCLB, RTTT and so on is worse than useless. What needs to happen is that the tests themselves must be drastically changed so that they actually measure higher-order thinking skills. I only quote a small excerpt to try to get you to read the entire, well-reasoned article:

“If higher order thinking skills are tested, teachers will teach them. Those who don’t know how will quickly learn.

 Of course, Pearson, McGraw-Hill, Educational Testing Service, and other test manufacturers aren’t going to volunteer to test student-initiated higher order thinking skills. Neither are the politicians they help elect and re-elect going to make them even try to do so unless they think voters give them no alternative.

So voters should give them no alternative. Unless politicians and test manufacturers can make a convincing case for not teaching the young to think, they should be told what they’ve been telling teachers who say standardized tests are a waste of time and money: “No excuses!”

It’s likely that nothing short of binding agreements between states and test manufacturers will yield the new tests. To that end, in appropriate legal language, contracts should make clear that (a) every test question in every subject will evaluate a particular, named thinking skill, (b) every test will evaluate a balanced mix of all known thinking skills, and (c) a panel of experts not connected to test manufacturers or politicians will preview all test items to assure contract compliance. No excuses.

FairTest, Parents Across America, United Opt Out National, and other state and local organizations have strategies in place to try to persuade. Petitions and referendums invite signers. Parents, grandparents — indeed, all who care about kids and country — should get on board.

 No more multimillion dollar checks for tests that no one but manufacturers are allowed to see. No more tests the pass-fail cut scores of which can be raised and lowered to make political points. No more kids labeled and discarded, every one with a brain wired to do all sorts of amazing things. If storing trivia in short-term memory doesn’t happen to be one of those things, that shouldn’t put them out of school and on the street.”

Signs of Backlash Against Excessive Student Testing — in Texas, of all places

Signs of change?

A number of parents, teachers, AND administrators in Texas, of all places, are beginning to pull out from, or protest against, the huge number of standardized machine-scored tests that they feel are sucking the life out of education. Or at least that’s what this article in today’s New York Times (2/4/12) describes.

A few excerpts:

In the Panhandle, the Hereford Independent School District superintendent may withhold her district’s test scores from the state. An Austin parent is considering a lawsuit to stop the rollout of the tests. Some legislators are mulling how to postpone some of the tests’ consequences for students.

In a high-level turnaround, Robert Scott, the commissioner of the Texas Education Agency, said Tuesday that student testing in the state had become a “perversion of its original intent” and that he looked forward to “reeling it back” in the future. Earning a standing ovation from an annual gathering of 4,000 educators that has given him chillier receptions in the past, Mr. Scott called for an accountability process that measured “every other day of a school’s life besides testing day.”

Many viewed the speech as a reversal for Mr. Scott, who has rarely spoken publicly against the role of standardized testing in public schools. He declined to talk about his remarks for this article.

“I think he sees that we are at a cusp of philosophical changes in the Legislature and across the state over what we’ve been doing the past few years with accountability and whether there’s been any worthwhile gain from all the testing we’ve done,” said Joe Smith, a former superintendent […]

Kelli Moulton, the superintendent of Hereford I.S.D., is considering an outright rebellion. She said that she was still exploring the repercussions of refusing to send her students’ test scores to the agency but that she was encouraged by Mr. Scott’s remarks.

“We talk a lot, but nobody’s stepped off to do anything really bold,” she said. “Clearly now as a state, at least with a leader who is willing to say testing has gone too far, when do we put a stick in a wheel and say, that’s enough, stop? Because we are going to spend the next 10 years trying to slow that wheel down, and we’ve got 10 years of kids that are suffering.”

It also may be a sign of shifting political tides. […]

What would it take to get a real public uprising against the destruction of our public school system? How do we organize a real movement in favor of having a free, publicly-funded and -run, enriching, engaging and useful education for all of our students?   


Published on February 4, 2012 at 6:41 pm
