Part Two: Cheating in DCPS

DC Education Reform Ten Years After, 

Part 2: Test Cheats

Richard P Phelps

Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). For temporal context, I arrived after the first of the infamous test cheating scandals and left just before the incident that spawned a second. Indeed, I filled a new position created to both manage test security and design an expanded testing program. I departed shortly after Vincent Gray, who opposed an expanded testing program, defeated Adrian Fenty in the September 2010 DC mayoral primary. My tenure coincided with Michelle Rhee’s last nine months as Chancellor. 

The recurring test cheating scandals of the Rhee-Henderson years may seem extraordinary but, in fairness, DCPS was more likely than the average US school district to be caught because it received a much higher degree of scrutiny. Given how tests are typically administered in this country, the incidence of cheating is likely far greater than news accounts suggest, for several reasons: 

·      in most cases, those who administer tests—schoolteachers and administrators—have an interest in their results;

·      test security protocols are numerous and complicated yet, nonetheless, the responsibility of non-expert ordinary school personnel, guaranteeing their inconsistent application across schools and over time; 

·      after-the-fact statistical analyses are not legal proof—the odds of a certain amount of wrong-to-right erasures in a single classroom on a paper-and-pencil test being coincidental may be a thousand to one, but one-in-a-thousand is still legally plausible; and

·      after-the-fact investigations based on interviews are time-consuming, scattershot, and uneven. 

Still, there were measures that the Rhee-Henderson administrations could have adopted to substantially reduce the incidence of cheating, but they chose none that might have been effective. Rather, they dug in their heels, insisted that only a few schools had issues, which they thoroughly resolved, and repeatedly denied any systematic problem.  

Cheating scandals

From 2007 to 2009, rumors percolated of an extraordinary level of wrong-to-right erasures on the test answer sheets at many DCPS schools. “Erasure analysis” is one among several “red flag” indicators that testing contractors calculate to monitor cheating. The testing companies take no responsibility for investigating suspected test cheating, however; that responsibility falls to the customer, the local or state education agency.

In her autobiographical account of her time as DCPS Chancellor, Michelle Johnson (née Rhee) wrote (p. 197):

“For the first time in the history of DCPS, we brought in an outside expert to examine and audit our system. Caveon Test Security – the leading expert in the field at the time – assessed our tests, results, and security measures. Their investigators interviewed teachers, principals, and administrators.

“Caveon found no evidence of systematic cheating. None.”

Caveon, however, had not looked for “systematic” cheating. All they did was interview a few people at several schools where the statistical anomalies were more extraordinary than at others. As none of those individuals would admit to knowingly cheating, Caveon branded all their excuses as “plausible” explanations. That’s it; that is all that Caveon did. But, Caveon’s statement that they found no evidence of “widespread” cheating—despite not having looked for it—would be frequently invoked by DCPS leaders over the next several years.[1]

Incidentally, prior to the revelation of its infamous decades-long, systematic test cheating, the Atlanta Public Schools had similarly retained Caveon Test Security and was, likewise, granted a clean bill of health. Only later did the Georgia state attorney general swoop in and reveal the truth. 

In its defense, Caveon would note that several cheating prevention measures it had recommended to DCPS were never adopted.[2] None of the cheating prevention measures that I recommended were adopted, either.

The single most effective means for reducing in-classroom cheating would have been to rotate teachers on test days so that no teacher administered a test to his or her own students. It would not have been that difficult to randomly assign teachers to different classrooms on test days.

The single most effective means for reducing school administrator cheating would have been to rotate test administrators on test days so that none managed the test materials for their own schools. The visiting test administrators would have been responsible for keeping test materials away from the school until test day, distributing sealed test booklets to the rotated teachers on test day, and for collecting re-sealed test booklets at the end of testing and immediately removing them from the school.

Instead of implementing these, or a number of other feasible and effective test security measures, DCPS leaders increased the number of test proctors, assigning each of a few dozen or so central office staff a school to monitor. Those proctors could not reasonably manage the volume of oversight required. A single DC test administration could encompass a hundred schools and a thousand classrooms.

Investigations

So, what effort, if any, did DCPS make to counter test cheating? They hired me, but then rejected all my suggestions for increasing security. Also, they established a telephone tip line. Anyone who suspected cheating could report it, even anonymously, and, allegedly, their tip would be investigated. 

Some forms of cheating are best investigated through interviews. Probably the most frequent forms of cheating at DCPS—teachers helping students during test administrations and school administrators looking at test forms prior to administration—leave no statistical residue. Eyewitness testimony is the only type of legal evidence available in such cases, but it is not just inconsistent, it may be socially destructive. 

I remember two investigations best: one occurred in a relatively well-to-do neighborhood with well-educated parents active in school affairs; the other in one of the city’s poorest neighborhoods. Superficially, the cases were similar—an individual teacher was accused of helping his or her own students with answers during test administrations. Making a case against either elementary school teacher required sworn testimony from eyewitnesses, that is, students—eight-to-ten-year olds. 

My investigations, then, consisted of calling children into the principal’s office one-by-one to be questioned about their teacher’s behavior. We couldn’t hide the reason we were asking the questions. And, even though each student agreed not to tell others what had occurred in their visit to the principal’s office, we knew we had only one shot at an uncorrupted jury pool. 

Though the accusations against the two teachers were similar and the cases against them equally strong, the outcomes could not have been more different. In the high-poverty neighborhood, the students seemed suspicious and said little; none would implicate the teacher, whom they all seemed to like. 

In the more prosperous neighborhood, students were more outgoing, freely divulging what they had witnessed. The students had discussed the alleged coaching with their parents who, in turn, urged them to tell investigators what they knew. During his turn in the principal’s office, the accused teacher denied any wrongdoing. I wrote up each interview, then requested that each student read and sign. 

Thankfully, that accused teacher made a deal and left the school system a few weeks later. Had he not, we would have required the presence in court of the eight-to-ten-year olds to testify under oath against their former teacher, who taught multi-grade classes. Had that prosecution not succeeded, the eyewitness students could have been routinely assigned to his classroom the following school year.

My conclusion? Only in certain schools is the successful prosecution of a cheating teacher through eyewitness testimony even possible. But, even where possible, it consumes inordinate amounts of time and, otherwise, comes at a high price, turning young innocents against authority figures they naturally trusted. 

Cheating blueprints

Arguably the most widespread and persistent testing malfeasance in DCPS received little attention from the press. Moreover, it was directly propagated by District leaders, who published test blueprints on the web. Put simply, test “blueprints” are lists of the curricular standards (e.g., “student shall correctly add two-digit numbers”) and the number of test items included in an upcoming test related to each standard. DC had been advance publishing its blueprints for years.

I argued that the way DC did it was unethical. The head of the Division of Data & Accountability, Erin McGoldrick, however, defended the practice, claimed it was common, and cited its existence in the state of California as precedent. The next time she and I met for a conference call with one of DCPS’s test providers, Discover Education, I asked their sales agent how many of their hundreds of other customers advance-published blueprints. His answer: none.

In the state of California, the location of McGoldrick’s only prior professional experience, blueprints were, indeed, published in advance of test administrations. But their tests were longer than DC’s and all standards were tested. Publication of California’s blueprints served more to remind the populace what the standards were in advance of each test administration. Occasionally, a standard considered to be of unusual importance might be assigned a greater number of test items than the average, and the California blueprints signaled that emphasis. 

In Washington, DC, the tests used in judging teacher performance were shorter, covering only some of each year’s standards. So, DC’s blueprints showed everyone well in advance of the test dates exactly which standards would be tested and which would not. For each teacher, this posed an ethical dilemma: should they “narrow the curriculum” by teaching only that content they knew would be tested? Or, should they do the right thing and teach all the standards, as they were legally and ethically bound to, even though it meant spending less time on the to-be-tested content? It’s quite a conundrum when one risks punishment for behaving ethically.

Monthly meetings convened to discuss issues with the districtwide testing program, the DC Comprehensive Assessment System (DC-CAS)—administered to comply with the federal No Child Left Behind (NCLB) Act. All public schools, both DCPS and charters, administered those tests. At one of these regular meetings, two representatives from the Office of the State Superintendent of Education (OSSE) announced plans to repair the broken blueprint process.[3]

The State Office employees argued thoughtfully and reasonably that it was professionally unethical to advance publish DC test blueprints. Moreover, they had surveyed other US jurisdictions in an effort to find others that followed DC’s practice and found none. I was the highest-ranking DCPS employee at the meeting and I expressed my support, congratulating them for doing the right thing. I assumed that their decision was final.

I mentioned the decision to McGoldrick, who expressed surprise and speculated that it might not have been made at the highest level in the organizational hierarchy. Wasting no time, she met with other DCPS senior managers and the proposed change was forthwith shelved. In that, and other ways, the DCPS tail wagged the OSSE dog.

* * *

It may be too easy to finger ethical deficits for the recalcitrant attitude toward test security of the Rhee-Henderson era ed reformers. The columnist Peter Greene insists that knowledge deficits among self-appointed education reformers also matter: 

“… the reformistan bubble … has been built from Day One without any actual educators inside it. Instead, the bubble is populated by rich people, people who want rich people’s money, people who think they have great ideas about education, and even people who sincerely want to make education better. The bubble does not include people who can turn to an Arne Duncan or a Betsy DeVos or a Bill Gates and say, ‘Based on my years of experience in a classroom, I’d have to say that idea is ridiculous bullshit.’”

“There are a tiny handful of people within the bubble who will occasionally act as bullshit detectors, but they are not enough. The ed reform movement has gathered power and money and set up a parallel education system even as it has managed to capture leadership roles within public education, but the ed reform movement still lacks what it has always lacked–actual teachers and experienced educators who know what the hell they’re talking about.”

In my twenties, I worked for several years in the research department of a state education agency. My primary political lesson from that experience, consistently reinforced subsequently, is that most education bureaucrats tell the public that the system they manage works just fine, no matter what the reality. They can get away with this because they control most of the evidence and can suppress it or spin it to their advantage.

In this proclivity, the DCPS central office leaders of the Rhee-Henderson era proved themselves to be no different than the traditional public-school educators they so casually demonized. 

US school systems are structured to be opaque and, it seems, both educators and testing contractors like it that way. For their part, and contrary to their rhetoric, Rhee, Henderson, and McGoldrick passed on many opportunities to make their system more transparent and accountable.

Education policy will not improve until control of the evidence is ceded to genuinely independent third parties, hired neither by the public education establishment nor by the education reform club.

The author gratefully acknowledges the fact-checking assistance of Erich Martel and Mary Levy.

Access this testimonial in .pdf format

Citation:  Phelps, R. P. (2020, September). Looking Back on DC Education Reform 10 Years After, Part 2: Test Cheats. Nonpartisan Education Review / Testimonials. https://nonpartisaneducation.org/Review/Testimonials/v16n3.htm


[1] A perusal of Caveon’s website clarifies that their mission is to help their clients–state and local education departments–not get caught. Sometimes this means not cheating in the first place; other times it might mean something else. One might argue that, ironically, Caveon could be helping its clients to cheat in more sophisticated ways and cover their tracks better.

[2] Among them: test booklets should be sealed until the students open them and resealed by the students immediately after; and students should be assigned seats on test day and a seating chart submitted to test coordinators (necessary for verifying cluster patterns in student responses that would suggest answer copying).

[3] Yes, for those new to the area, the District of Columbia has an Office of the “State” Superintendent of Education (OSSE). Its domain of relationships includes not just the regular public schools (i.e., DCPS), but also other public schools (i.e., charters) and private schools. Practically, it primarily serves as a conduit for funneling money from a menagerie of federal education-related grant and aid programs.

What did Education Reform in DC Actually Mean?

Short answer: nothing that would actually help students or teachers. But it’s made for well-padded resumes for a handful of insiders.

This is an important review, by the then-director of assessment. His criticisms echo the points that I have been making along with Mary Levy, Erich Martel, Adell Cothorne, and many others.

Nonpartisan Education Review / Testimonials

Looking Back on DC Education Reform 10 Years After, 

Part 1: The Grand Tour

Richard P Phelps

Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). My tenure coincided with Michelle Rhee’s last nine months as Chancellor. I departed shortly after Vincent Gray defeated Adrian Fenty in the September 2010 DC mayoral primary.

My primary task was to design an expansion of that testing program that served the IMPACT teacher evaluation system to include all core subjects and all grade levels. Despite its fame (or infamy), the test score aspect of the IMPACT program affected only 13% of teachers, those teaching either reading or math in grades four through eight. Only those subjects and grade levels included the requisite pre- and post-tests required for teacher “value added” measurements (VAM). Not included were most subjects (e.g., science, social studies, art, music, physical education), grades kindergarten to two, and high school.

Chancellor Rhee wanted many more teachers included. So, I designed a system that would cover more than half the DCPS teacher force, from kindergarten through high school. You haven’t heard about it because it never happened. The newly elected Vincent Gray had promised during his mayoral campaign to reduce the amount of testing; the proposed expansion would have increased it fourfold.

VAM affected teachers’ jobs. A low value-added score could lead to termination; a high score, to promotion and a cash bonus. VAM as it was then structured was obviously, glaringly flawed,[1] as anyone with a strong background in educational testing could have seen. Unfortunately, among the many new central office hires from the elite of ed reform circles, none had such a background.

Before posting a request for proposals from commercial test developers for the testing expansion plan, I was instructed to survey two groups of stakeholders—central office managers and school-level teachers and administrators.

Not surprisingly, some of the central office managers consulted requested additions or changes to the proposed testing program where they thought it would benefit their domain of responsibility. The net effect on school-level personnel would have been to add to their administrative burden. Nonetheless, all requests from central office managers would be honored. 

The Grand Tour

At about the same time, over several weeks of the late Spring and early Summer of 2010, along with a bright summer intern, I visited a dozen DCPS schools. The alleged purpose was to collect feedback on the design of the expanded testing program. I enjoyed these meetings. They were informative, animated, and very well attended. School staff appreciated the apparent opportunity to contribute to policy decisions and tried to make the most of it.

Each school greeted us with a full complement of faculty and staff on their days off, numbering several dozen educators at some venues. They believed what we had told them: that we were in the process of redesigning the DCPS assessment program and were genuinely interested in their suggestions for how best to do it.

At no venue did we encounter stand-pat knee-jerk rejection of education reform efforts. Some educators were avowed advocates for the Rhee administration’s reform policies, but most were basically dedicated educators determined to do what was best for their community within the current context. 

The Grand Tour was insightful, too. I learned for the first time of certain aspects of DCPS’s assessment system that were essential to consider in its proper design, aspects of which the higher-ups in the DCPS Central Office either were not aware or did not consider relevant. 

The group of visited schools represented DCPS as a whole in appropriate proportions geographically, ethnically, and by education level (i.e., primary, middle, and high). Within those parameters, however, only schools with “friendly” administrations were chosen. That is, we only visited schools with principals and staff openly supportive of the Rhee-Henderson agenda. 

But even they desired changes to the testing program, whether or not it was expanded. Their suggestions covered both the annual districtwide DC-CAS (or “comprehensive” assessment system), on which the teacher evaluation system was based, and the DC-BAS (or “benchmarking” assessment system), a series of four annual “no-stakes” interim tests unique to DCPS, ostensibly offered to help prepare students and teachers for the consequential-for-some-school-staff DC-CAS.[2]

At each staff meeting I asked for a show of hands on several issues of interest that I thought were actionable. Some suggestions for program changes received close to unanimous support. Allow me to describe several.

1. Move DC-CAS test administration later in the school year. Many citizens may have logically assumed that the IMPACT teacher evaluation numbers were calculated from a standard pre-post test schedule, testing a teacher’s students at the beginning of their academic year together and then again at the end. In 2010, however, the DC-CAS was administered in March, three months before school year end. Moreover, that single administration of the test served as both pre- and post-test, posttest for the current school year and pretest for the following school year. Thus, before a teacher even met their new students in late August or early September, almost half of the year for which teachers were judged had already transpired—the three months in the Spring spent with the previous year’s teacher and almost three months of summer vacation. 

School staff recommended pushing DC-CAS administration to later in the school year. Furthermore, they advocated a genuine pre-post-test administration schedule—pre-test the students in late August–early September and post-test them in late-May–early June—to cover a teacher’s actual span of time with the students.

This suggestion was rejected because the test development firm with the DC-CAS contract required three months to score some portions of the test in time for the IMPACT teacher ratings scheduled for early July delivery, before the start of the new school year. Some small number of teachers would be terminated based on their IMPACT scores, so management demanded those scores be available before preparations for the new school year began.[3] The tail wagged the dog.

2. Add some stakes to the DC-CAS in the upper grades. Because DC-CAS test scores portended consequences for teachers but none for students, some students expended little effort on the test. Indeed, extensive research on “no-stakes” (for students) tests reveals that motivation and effort vary by a range of factors including gender, ethnicity, socioeconomic class, the weather, and age. Generally, the older the student, the lower the test-taking effort. This disadvantaged some teachers in the IMPACT ratings for circumstances beyond their control: unlucky student demographics.

Central office management rejected this suggestion to add even modest stakes to the upper grades’ DC-CAS; no reason given. 

3. Move one of the DC-BAS tests to year end. If management rejected the suggestion to move DC-CAS test administration to the end of the school year, school staff suggested scheduling one of the no-stakes DC-BAS benchmarking tests for late May–early June. As it was, the schedule squeezed all four benchmarking test administrations between early September and mid-February. Moving just one of them to the end of the year would give the following year’s teachers a more recent reading (by more than three months) of their new students’ academic levels and needs.

Central Office management rejected this suggestion probably because the real purpose of the DC-BAS was not to help teachers understand their students’ academic levels and needs, as the following will explain.

4. Change DC-BAS tests so they cover recently taught content. Many DC citizens probably assumed that, like most tests, the DC-BAS interim tests covered recently taught content, such as that covered since the previous test administration. Not so in 2010. The first annual DC-BAS was administered in early September, just after the year’s courses commenced. Moreover, it covered the same content domain—that for the entirety of the school year—as each of the next three DC-BAS tests. 

School staff proposed changing the full-year “comprehensive” content coverage of each DC-BAS test to partial-year “cumulative” coverage, so students would only be tested on what they had been taught prior to each test administration.

This suggestion, too, was rejected. Testing the same full-year comprehensive content domain produced a predictable, flattering score rise. With each DC-BAS test administration, students recognized more of the content, because they had just been exposed to more of it, so average scores predictably rose. With test scores always rising, it looked like student achievement improved steadily each year. Achieving this contrived score increase required testing students on some material to which they had not yet been exposed, both a violation of professional testing standards and a poor method for instilling student confidence. (Of course, it was also less expensive to administer essentially the same test four times a year than to develop four genuinely different tests.)

5. Synchronize the sequencing of curricular content across the District. DCPS management rhetoric circa 2010 attributed classroom-level benefits to the testing program. Teachers would know more about their students’ levels and needs and could also learn from each other. Yet, the only student test results teachers received at the beginning of each school year were half a year old, and most of the information they received over the course of four DC-BAS test administrations was based on not-yet-taught content.

As for cross-district teacher cooperation, unfortunately there was no cross-District coordination of common curricular sequences. Each teacher paced their subject matter however they wished and varied topical emphases according to their own personal preference.

It took DCPS’s Chief Academic Officer, Carey Wright, and her chief of staff, Dan Gordon, less than a minute to reject the suggestion to standardize topical sequencing across schools so that teachers could consult with one another in real time. Tallying up the votes: several hundred school-level District educators favored the proposal, two of Rhee’s trusted lieutenants opposed it. It lost.

6. Offer and require a keyboarding course in the early grades. DCPS was planning to convert all its testing from paper-and-pencil mode to computer delivery within a few years. Yet, keyboarding courses were rare in the early grades. Obviously, without systemwide training in keyboarding, some students would be at a disadvantage in computer-based testing.

Suggestion rejected.

In all, I had polled over 500 DCPS school staff. Not only were all of their suggestions reasonable, some were essential in order to comply with professional assessment standards and ethics. 

Nonetheless, back at DCPS’ Central Office, each suggestion was rejected without, to my observation, any serious consideration. The rejecters included Chancellor Rhee, the head of the office of Data and Accountability—the self-titled “Data Lady,” Erin McGoldrick—and the head of the curriculum and instruction division, Carey Wright, and her chief deputy, Dan Gordon. 

Four central office staff outvoted several hundred school staff (and my recommendations as assessment director). In each case, the changes recommended would have meant some additional work on their parts, but in return for substantial improvements in the testing program. Their rhetoric was all about helping teachers and students, but the fact was that the testing program wasn’t structured to help them.

What was the purpose of my several weeks of school visits and staff polling? To solicit “buy in” from school level staff, not feedback.

Ultimately, the new testing program proposal would incorporate all the new features requested by senior Central Office staff, no matter how burdensome, and not a single feature requested by several hundred supportive school-level staff, no matter how helpful. Like many others, I had hoped that the education reform intention of the Rhee-Henderson years was genuine. DCPS could certainly have benefitted from some genuine reform. 

Alas, much of the activity labelled “reform” was just for show, and for padding resumes. Numerous central office managers would later work for the Bill and Melinda Gates Foundation. Numerous others would work for entities supported by the Gates or aligned foundations, or in jurisdictions such as Louisiana, where ed reformers held political power. Most would be well paid. 

Their genuine accomplishments, or lack thereof, while at DCPS seemed to matter little. What mattered was the appearance of accomplishment and, above all, loyalty to the group. That loyalty required going along to get along: complicity in maintaining the façade of success while withholding any public criticism of or disagreement with other in-group members.

Unfortunately, in the United States what is commonly showcased as education reform is neither a civic enterprise nor a popular movement. Neither parents, the public, nor school-level educators have any direct influence. Rather, at the national level, US education reform is an elite, private club: a small group of tightly connected politicos and academics, a mutual admiration society dedicated to the career advancement, political influence, and financial benefit of its members, supported by a gaggle of wealthy foundations (e.g., Gates, Walton, Broad, Wallace, Hewlett, Smith-Richardson).

For over a decade, The Ed Reform Club exploited DC for its own benefit. Local elites formed the DC Public Education Fund (DCPEF) to sponsor education projects, such as IMPACT, which they deemed worthy. In the negotiations between the Washington Teachers’ Union and DCPS concluded in 2010, DCPEF arranged a 3-year grant of $64.5M from the Arnold, Broad, Robertson and Walton Foundations to fund a 5-year retroactive teacher pay raise in return for contract language allowing teacher excessing tied to IMPACT, which Rhee promised would lead to annual student test score increases by 2012. Projected goals were not met; foundation support continued nonetheless.

Michelle Johnson (née Rhee) now chairs the board of a charter school chain in California and occasionally collects $30,000+ in speaker fees but, otherwise, seems to have deliberately withdrawn from the limelight. Despite contributing her own additional scandals after she assumed the DCPS Chancellorship, Kaya Henderson ascended to great fame and glory with a “distinguished professorship” at Georgetown; honorary degrees from Georgetown and Catholic Universities; gigs with the Chan Zuckerberg Initiative, Broad Leadership Academy, and Teach for All; and board memberships with The Aspen Institute, The College Board, Robin Hood NYC, and Teach For America. Carey Wright is now state superintendent in Mississippi. Dan Gordon runs a 30-person consulting firm, Education Counsel, that strategically partners with major players in US education policy. The manager of the IMPACT teacher evaluation program, Jason Kamras, now works as Superintendent of the Richmond, VA public schools.

Arguably the person most directly responsible for the recurring assessment system fiascos of the Rhee-Henderson years, then Chief of Data and Accountability Erin McGoldrick, now specializes in “data innovation” as partner and chief operating officer at an education management consulting firm. Her firm, Kitamba, strategically partners with its own panoply of major players in US education policy. Its list of recent clients includes the DC Public Charter School Board and DCPS.

If the ambitious DC central office folk who gaudily declared themselves leading education reformers were not really reformers, who were the genuine education reformers during the Rhee-Henderson decade of massive upheaval and per-student expenditures three times those in the state of Utah? They were the school principals and staff whose practical suggestions were ignored by central office glitterati. They were whistleblowers like history teacher Erich Martel, who had documented DCPS’ student records’ manipulation and phony graduation rates years before the Washington Post’s celebrated investigation of Ballou High School, and was demoted and then “excessed” by Henderson. And they were principals like Adell Cothorne, who spilled the beans on test answer sheet “erasure parties” at Noyes Education Campus and lost her job under Rhee.

Real reformers with “skin in the game” can’t play it safe.

The author appreciates the helpful comments of Mary Levy and Erich Martel in researching this article. 


More on the “false positive” COVID-19 testing problem

I used my cell phone last night to go into the problem of faulty testing for COVID-19, based on a NYT article. As a result, I couldn’t make any nice tables. Let me remedy that and also look at a few more assumptions.

This table summarizes the testing results on a theoretical group of a million Americans tested, assuming that 5% of the population actually has coronavirus antibodies, and that the tests being given have a false negative rate of 10% and a false positive rate of 3%. Reminder: a ‘false negative’ result means that you are told that you don’t have any coronavirus antibodies but you actually do have them, and a ‘false positive’ result means that you are told that you DO have those antibodies, but you really do NOT. I have tried to highlight the numbers of people who get incorrect results in the color red.

Table A

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 50,000 | 10% | 45,000 | 5,000 |
| Actually Negative | 950,000 | 3% | 28,500 | 921,500 |
| Totals | 1,000,000 | | 73,500 | 926,500 |
| Percent we assume are actually positive | 5% | Accuracy Rating | 61.2% | 99.5% |

As you can see, using those assumptions, if you get a lab test result that says you are positive, that will be correct only about 61% of the time. Which means that you need to take another test, or perhaps two more tests, to see whether they agree.
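For anyone who wants to check the arithmetic or try other assumptions, here is a minimal Python sketch of the same bookkeeping. The function name and the dictionary keys are my own labels, not anything official; the inputs are the assumptions stated for Table A.

```python
def screening_table(population, prevalence, false_pos_rate, false_neg_rate):
    """One round of testing: counts of results and how often each result is right."""
    actually_pos = population * prevalence
    actually_neg = population - actually_pos

    true_pos = actually_pos * (1 - false_neg_rate)   # have antibodies, test says positive
    false_neg = actually_pos * false_neg_rate        # have antibodies, test says negative
    false_pos = actually_neg * false_pos_rate        # no antibodies, test says positive
    true_neg = actually_neg * (1 - false_pos_rate)   # no antibodies, test says negative

    return {
        "test_positive": true_pos + false_pos,
        "test_negative": false_neg + true_neg,
        # Share of 'positive' results that are actually correct
        "positive_accuracy": true_pos / (true_pos + false_pos),
        # Share of 'negative' results that are actually correct
        "negative_accuracy": true_neg / (true_neg + false_neg),
    }


# Table A assumptions: one million people, 5% actually positive,
# 3% false positive rate, 10% false negative rate.
table_a = screening_table(1_000_000, 0.05, 0.03, 0.10)
print(f"Positive results correct: {table_a['positive_accuracy']:.1%}")
print(f"Negative results correct: {table_a['negative_accuracy']:.1%}")
```

Changing the three rates in the last call is enough to explore any of the other scenarios discussed in this post.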

The next table assumes again a true 5% positive result for the population and a false negative rate of 10%, but a false positive rate of 14%.

Table B

Assume 5% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 50,000 | 10% | 45,000 | 5,000 |
| Actually Negative | 950,000 | 14% | 133,000 | 817,000 |
| Totals | 1,000,000 | | 178,000 | 822,000 |
| Percent we assume are actually positive | 5% | Accuracy Rating | 25.3% | 99.4% |

Note that in this scenario, if you get a test result that says you are positive, that is only going to be correct one-quarter of the time (25.3%)! That is useless!

Now, let’s assume a lower percentage of the population actually has the COVID-19 antibodies, say, two percent. Here are the results if we assume a 3% false positive rate:

Table C

Assume 2% really exposed, 3% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 20,000 | 10% | 18,000 | 2,000 |
| Actually Negative | 980,000 | 3% | 29,400 | 950,600 |
| Totals | 1,000,000 | | 47,400 | 952,600 |
| Percent we assume are actually positive | 2% | Accuracy Rating | 38.0% | 99.8% |

Notice that in this scenario, if you get a ‘positive’ result, it is likely to be correct only a little better than one-third of the time (38.0%).

And now let’s assume 2% actual exposure, 14% false positive, 10% false negative:

Table D

Assume 2% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 20,000 | 10% | 18,000 | 2,000 |
| Actually Negative | 980,000 | 14% | 137,200 | 842,800 |
| Totals | 1,000,000 | | 155,200 | 844,800 |
| Percent we assume are actually positive | 2% | Accuracy Rating | 11.6% | 99.8% |

This time, the chances of a ‘positive’ test result being accurate are even worse, less than one in eight (11.6%), which means that this level of accuracy is not going to be useful to the public at large.

Final set of assumptions: 3% actual positive rate, and excellent tests with only 3% false positive and false negative rates:

Table E

Assume 3% really exposed, 3% false positive rate, 3% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 30,000 | 3% | 29,100 | 900 |
| Actually Negative | 970,000 | 3% | 29,100 | 940,900 |
| Totals | 1,000,000 | | 58,200 | 941,800 |
| Percent we assume are actually positive | 3% | Accuracy Rating | 50.0% | 99.9% |

In this scenario, if you test positive, that result is only going to be correct half of the time (50.0%), because the true positives and the false positives are equally numerous.

All is not lost, however. Suppose we re-test all the people who tested positive in this last group (that’s a bit over fifty-eight thousand people, in Table E). Here are the results:

Table F

Assume 50.0% really exposed, 3% false positive rate, 3% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 29,100 | 3% | 28,227 | 873 |
| Actually Negative | 29,100 | 3% | 873 | 28,227 |
| Totals | 58,200 | | 29,100 | 29,100 |
| Percent we assume are actually positive | 50.0% | Accuracy Rating | 97.0% | 97.0% |

Notice that 97% accuracy rating for positive results! Much better!

What about our earlier scenario, in table B, with a 5% overall exposure rating, 14% false positives, and 10% false negatives — what if we re-test all the folks who tested positive? Here are the results:

Table G

Assume 25.3% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 45,000 | 10% | 40,500 | 4,500 |
| Actually Negative | 133,000 | 14% | 18,620 | 114,380 |
| Totals | 178,000 | | 59,120 | 118,880 |
| Percent we assume are really positive | 25.3% | Accuracy Rating | 68.5% | 96.2% |

This is still not very good: the re-test is going to be accurate only about two-thirds of the time (68.5%) that it says you really have been exposed, and would correctly clear you about 96% of the time. So we would need to run yet another test on those who again tested positive in Table G. If we do it, the results are here:

Table H

Assume 68.5% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|---|---|---|---|---|
| Actually Positive | 40,500 | 10% | 36,450 | 4,050 |
| Actually Negative | 18,620 | 14% | 2,607 | 16,013 |
| Totals | 59,120 | | 39,057 | 20,063 |
| Percent we assume are really positive | 68.5% | Accuracy Rating | 93.3% | 79.8% |

This result is much better (93.3% for a positive result), but note that this requires THREE TESTS on each of these supposedly positive people to see if they are in fact positive. It also means that if they get a ‘negative’ result on this third test, that’s likely to be correct only about four-fifths of the time (79.8%).

So, no wonder that a lot of the testing results we are seeing are difficult to interpret! This is why science requires repeated measurements to separate the truth from fiction! And it also explains some of the snafus committed by our current federal leadership in insisting on not using tests offered from abroad.
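The re-testing idea can also be sketched in a few lines of Python. This is only an illustration under one big assumption: that the errors on each repeat test are independent of the earlier ones (in real life, whatever caused one false positive might well cause another). The function name is mine, and the example uses the 5% / 3% / 10% assumptions from the first scenario.

```python
def retest_positives(population, prevalence, false_pos_rate, false_neg_rate, rounds):
    """Accuracy of a 'positive' label after each round of re-testing the positives.

    Assumes test errors are independent from one round to the next.
    """
    # People still labeled positive, split by whether they truly have antibodies.
    truly_positive = population * prevalence
    truly_negative = population * (1 - prevalence)

    accuracies = []
    for _ in range(rounds):
        # Each round keeps the true positives who test positive again...
        truly_positive *= (1 - false_neg_rate)
        # ...and the true negatives who are (wrongly) flagged positive again.
        truly_negative *= false_pos_rate
        accuracies.append(truly_positive / (truly_positive + truly_negative))
    return accuracies


# 5% actually positive, 3% false positive rate, 10% false negative rate.
for round_number, accuracy in enumerate(retest_positives(1_000_000, 0.05, 0.03, 0.10, 3), start=1):
    print(f"After test {round_number}: a positive label is correct {accuracy:.1%} of the time")
```

With these inputs the accuracy of a positive label climbs from about 61% after one test to about 98% after two, which is the whole argument for repeated measurements.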

 

============

EDIT at 10:30 pm on 4/25/2020: I found a few minor mistakes and corrected them, and tried to format things more clearly.

PISA shows great US education progress under Common Core, charter proliferation, reforms. (JUST KIDDING!)

If there is anything that the recent PISA results show, it’s that the promises by David Coleman, Bill Gates, Michelle Rhee, Betsy DeVos, Arne Duncan, Barack Obama, and others of tremendous achievement increases and closing socioeconomic gaps with their ‘reforms’ were completely unfulfilled. I am copying and pasting here how American students have done on the PISA, a test given in many, many countries, since 2006. There have been tiny changes over the past dozen years in the scores of American students in reading, math, and science, but virtually none have been statistically significant, according to the statisticians who compiled and published the data.

Then again, nearly any classroom teacher you talked to over the past decade or two of educational ‘reforms’ in American classrooms could have told you why and how it was bound to fail.

Look for yourself:

PISA results through 2018

 

Source: https://www.oecd.org/pisa/publications/PISA2018_CN_USA.pdf

 

EDIT: I meant David Coleman the educational reform huckster, not Gary Coleman the actor!

 

More Educational Miracles (Not!)

I have prepared charts and graphs for 8th grade NAEP average scale scores for black, Hispanic, and white students in various jurisdictions: the entire nation; all large cities; Washington DC; Florida; Michigan; and Mississippi.

You will see that there was a general upwards trend in math from about 1992 to roughly 2007 or 2009, but the scores have mostly leveled off during the last decade. I included Michigan, since that is the state where current Education Secretary Betsy DeVos has had the mo$t per$sonal influence, but that influence doesn’t look to be positive.

While it’s good that DC’s black students no longer score the lowest in the nation (that would be Michigan – see the first graph), there is another feature of my fair city: very high-performing white students (generally with affluent, well-educated parents) in its unfortunately rather segregated public schools, as you can see in the last graph.

Naep 8th grade math, black students, various places

naep math, hispanic, 8th grade, various places

naep 8th grade math, white students, various places

Not So Fast, Betsy DeVos!

I attended the official roll-out of the results of the 2019 National Assessment of Educational Progress (NAEP) a couple of days ago at the National Press Club here in DC on 14th Street NW, and listened to the current education secretary, Betsy DeVos, slam public schools and their administrators as having accomplished nothing while spending tons of money. She and other speakers held up DC, Mississippi, and Florida as examples to follow. DeVos basically advocated abandoning public schools altogether, in favor of giving each parent a “backpack full of cash” to do whatever they want with.

Some other education activists I know here in DC shared their thoughts with me, and I decided to look at the results for DC’s white, black, and Hispanic students over time as reported on the NAEP’s official site. (You can find them here, but be prepared to do quite a bit of work to get them and make sense out of them!)

I found that it is true that DC’s recent increases in scores on the NAEP for all students, and for black and Hispanic students, are higher than in other jurisdictions.

However, I also found that those increases were happening at a HIGHER rate BEFORE DC’s mayor was given total control of DC’s public schools; BEFORE the appointment of Michelle Rhee; and BEFORE the massive DC expansion of charter schools.

Here are two graphs (which I think show a lot more than a table does) which give ‘average scale scores’ for black students in math at grades 4 and 8 in DC, in all large US cities, and in the nation as a whole. I have drawn a vertical red line at the year 2008, separating the era before mayoral control of schools (when we had an elected school board) and the era afterwards (starting with appointed chancellor Michelle Rhee and including a massive expansion of the charter school sector). These results include both regular DC Public School students and the charter school sector, but not the private schools.

I asked Excel to produce linear trend lines for the average scale scores for black students in DC starting in 1996 through 2007, and also for 2009 through 2019. It wasn’t obvious to my naked eye, but the improvement rates, or slopes of those lines, were TWICE AS HIGH before mayoral control. At the 4th grade level, the improvement rate was 2.69 points per year BEFORE mayoral control, but only 1.34 points per year afterwards.

Yes, that is a two-to-one ratio AGAINST mayoral control & massive charter expansion.

At the 8th grade level, same time span, the slope was 1.53 points per year before mayoral control, but 0.77 points per year afterwards.

Again, just about exactly a two-to-one ratio AGAINST the status quo that we have today.
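For readers who don’t use Excel, the slope of a trend line can be computed by hand with ordinary least squares, which is presumably what Excel’s linear trendline does. The sketch below uses made-up placeholder numbers, NOT actual NAEP scores, simply to show the calculation; the function name is my own.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# PLACEHOLDER data for illustration only (not real NAEP results):
# scores that rise by exactly 4 points every 2 years.
years = [2000, 2002, 2004, 2006]
scores = [200, 204, 208, 212]

slope = least_squares_slope(years, scores)
print(f"Improvement rate: {slope:.2f} points per year")  # 2.00 for this placeholder data
```

To repeat the comparison in this post, you would run the function twice, once on the pre-2008 scores and once on the scores from 2009 onward, and compare the two slopes.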

pre and post Rhee, 4th grade NAEP, black students in DC, nation, large cities

pre and post Rhee, 8th grade NAEP, black students in DC, large cities, and nation

Charter schools do NOT get better NAEP test results than regular public schools

It is not easy to find comparisons between charter schools and regular public schools, partly because the charter schools are not required to be nearly as transparent or accountable as regular public schools. (Not in their finances, nor in requests for public records, nor for student or teacher disciplinary data, and much more.) At the state or district level, it has in the past been hard or impossible to find comparative data on the NAEP (National Assessment of Educational Progress).

We all have heard the propaganda that charter and voucher schools are so much better than regular public schools, because they supposedly get superior test scores and aren’t under the thumb of  those imaginary ‘teacher union thugs’.

However, NCES has released results where they actually do this comparison. Guess what: there is next to no difference between the scores of US charter schools and regular public schools on the NAEP in either reading or math at either the 4th grade or 8th grade level! In fact, at the 12th grade, regular public schools seem to outscore the charter schools by a significant margin.

Take a look at the two graphs below, which I copied and pasted from the NCES website. The only change I made was to color the bar representing the charter schools orange. Note that there is no data available for private schools as a whole.

public vs charter vs catholic, naep, math

If you aren’t good at reading graphs, the one above says that on a 500-point scale, in 2017 (which was the last year for which we have results), at the 4th grade, regular public school students scored an average of 239 points in math, three points higher than charter school students (probably not a significant difference). At the 8th grade level, the two groups scored identically: 282 points. At the 12th grade, in 2015, regular public school students outscored charter school students by a score of 150 to 133 on a 300-point scale (I suspect that difference IS statistically significant). We have no results from private schools, but Catholic schools do have higher scores than the public or charter schools.

The next graph is for reading. At the 4th grade, charter school students in 2017 outscored regular public school students by a totally-insignificant 1 point (222 to 221 on a 500 point scale) and the same thing happened at the 8th grade level (266 to 265 on a 500 point scale). However, at the 12th grade, the regular public school students outscore their charter school counterparts by a score of 285 to 269, which I bet is significant.

charter vs public vs catholic, naep, reading, 2017

 

 

“And forgive our debts, as we forgive those who owe us!”

The title of this post might remind you of part of the so-called Lord’s Prayer, which in English is usually rendered “And forgive us our trespasses, as we forgive those who have trespassed against us.”

This sounds like forgiving sins, but in Latin, which I studied for about six years, the prayer is really about forgiving debts:

“et dimitte nobis debita nostra sicut et nos dimittimus debitoribus nostris”

I don’t know enough Greek to be able to comment on the original meaning of the words as apparently written down in the New Testament in that language, but it is generally accepted that Jesus (if he really existed) spoke Aramaic – but only a few of his (alleged) words were recorded in that language, since the entire NT was written in Greek, not in Hebrew or Latin, and definitely not in English!

The following book makes the argument that forgiving debts, wholesale, was essential if you wanted to avoid stratification of society into a class of oligarchs and a class of everybody else, who were essentially little better than slaves. They make the point that compounded interest grows exponentially and without limit, but economic growth does NOT: it follows a logistic curve at best, which means that there are certain limits.

For example, while bacteria growing in a petri dish appear to grow exponentially for some hours, perhaps for a few days, eventually, there is no more uncontaminated agar for the bacteria to eat, and they start drowning in their own waste products. So despite what one learns in most Algebra classes (including my own), bacterial growth is actually logistic, not exponential. However, unless debt is periodically forgiven – which seldom if ever happens these days – the debtors end up drowning in debt, as you might be able to discern from this little graph I made:

logistic versus exponential growth
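If you would like to play with the two curves yourself, here is a small Python sketch. All of the numbers in it (a starting value of 100, a 5% annual rate, and a ceiling of 1,000) are hypothetical values I picked for illustration; they are not taken from the book or from the graph.

```python
import math


def compound_debt(principal, rate, years):
    """Debt growing with annually compounded interest: no upper limit."""
    return principal * (1 + rate) ** years


def logistic_growth(initial, rate, capacity, years):
    """Logistic curve: looks exponential early on, then levels off at `capacity`."""
    return capacity / (1 + (capacity - initial) / initial * math.exp(-rate * years))


# Hypothetical parameters: start at 100, grow at 5% per year, economy capped at 1,000.
for year in (0, 25, 50, 100, 200):
    debt = compound_debt(100, 0.05, year)
    economy = logistic_growth(100, 0.05, 1000, year)
    print(f"year {year:3d}: debt = {debt:12,.0f}   economy = {economy:7,.0f}")
```

Both curves start at 100 and look similar for the first couple of decades, but the logistic one can never pass its ceiling while the compounded debt keeps multiplying.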

I haven’t read the book, but the review is most interesting. Here is a quote:

Nowhere, Hudson shows, is it more evident that we are blinded by a deracinated, by a decontextualized understanding of our history than in our ignorance of the career of Jesus. Hence the title of the book: And Forgive Them Their Debts and the cover illustration of Jesus flogging the moneylenders — the creditors who do not forgive debts — in the Temple. For centuries English-speakers have recited the Lord’s Prayer with the assumption that they were merely asking for the forgiveness of their trespasses, their theological sins: “… and forgive us our trespasses, as we forgive those who trespass against us….” is the translation presented in the Revised Standard Version of the Bible. What is lost in translation is the fact that Jesus came “to preach the gospel to the poor … to preach the acceptable Year of the Lord”: He came, that is, to proclaim a Jubilee Year, a restoration of deror for debtors: He came to institute a Clean Slate Amnesty (which is what Hebrew דְּרוֹר connotes in this context).

So consider the passage from the Lord’s Prayer literally: … καὶ ἄφες ἡμῖν τὰ ὀφειλήματα ἡμῶν: “… and send away (ἄφες) for us our debts (ὀφειλήματα).” The Latin translation is not only grammatically identical to the Greek, but also shows the Greek word ὀφειλήματα revealingly translated as debita: … et dimitte nobis debita nostra: “… and discharge (dimitte) for us our debts (debita).” There was consequently, on the part of the creditor class, a most pressing and practical reason to have Jesus put to death: He was demanding that they restore the property they had rapaciously taken from their debtors. And after His death there was likewise a most pressing and practical reason to have His Jubilee proclamation of a Clean Slate Amnesty made toothless, that is to say, made merely theological: So the rich could continue to oppress the poor, forever and ever. Amen.

The Math Teacher’s Job is Neither to Teach the Lesson, Nor to Help Individual Students Who are Struggling!

….but rather, to prepare a lesson from which ALL the students can learn!

… according to the way that Japanese math teachers are taught their craft, as described below. You will find that these methods, which include Lesson Study, are pretty much the exact opposite of American “Direct Instruction” or “Teaching Like A Champion.”  Given that nobody claims that Japanese students lag behind American ones in math or science, perhaps we in the US could profit from examining how other nations’ teachers do it. Note also that this description is of mathematics lessons in elementary school, not middle or high school.

Please read the following description and leave comments on what you think.

*************************************
From Tom McDougal. Lesson Study Alliance, Chicago [and brought to my attention by Jerry Becker. – GFB]
*************************************
It’s not the teacher’s job to teach the students!

By Tom McDougal

What?? You might be thinking. What else could the teacher’s job be but to teach?

The teacher’s job is to ensure that students learn, all of them, we hope, though we know we will usually fall short.

In Japan, most (elementary) math lessons are designed as  “teaching through problem solving” lessons (TtP). A teaching through problem solving lesson typically includes the following parts:

 
1.  introduce the problem
2.  explicitly pose the task for students
3.  students work on the task (5-10 minutes)
4.  share student ideas
5.  compare and discuss the ideas for the purpose of learning new mathematics
6.  summarize major points from the lesson
7.  student reflections

(There is sometimes overlap, and a back-and-forth between some of these, e.g. #4 & #5 may be combined.)

While students are working on the task (#3), the teacher walks around the room, monitoring their progress. Japanese educators have a term for this, kikkan shido, or “[providing] guidance between the desks.” They recognize that there are different ways to do kikkan shido, and it is often a subject of discussion in Lesson Study. During planning, for example, a team will usually discuss how – or whether – the teacher should respond to a student who exhibits a particular misconception; during the post-lesson discussion, there may be argument about whether the kikkan shido was effective. And, it is considered a skill that new teachers need to develop.

Teachers who are inexperienced with TtP lessons often make an unfortunate error while doing kikkan shido: they see a student who is struggling, or who has done something wrong, and they stop and help that student. After several minutes the teacher moves on, encounters another student who is having trouble, helps that student, and so on. Then, suddenly, time is up, and the lesson ends.

There are at least four important drawbacks to this type of kikkan shido. First, as my description suggests, it uses up a lot of time. The teacher may never get around to all of the students, and other students who need help may never get it. Second, by addressing misconceptions privately rather than publicly, the teacher deprives other students of the opportunity to analyze those misconceptions and learn why they are incorrect. Any experienced teacher knows that certain misconceptions are very common, so when one student makes an error that stems from a common misconception, that offers an opportunity to “inoculate” other students against making the same error sometime later.

The third problem with tutoring students individually is that it conflicts with the whole premise of teaching through problem solving. You expect that some, or even all, of the students will have difficulty with the task; that’s why it’s called “problem solving” and not “practice.” Teaching through problem solving involves an expectation that students will have difficulty, but that the comparison and discussion phase will address their difficulties and that, by the end of the lesson, all (or almost all) of the students will have learned what they need to know.

And fourth, we want to help students learn to give viable arguments and to critique the reasoning of others, the third Standard for Mathematical Practice in the Common Core State Standards. To accomplish this, we need for students to share and discuss different, perhaps conflicting solutions. Students need to do the critiquing, not the teacher.

 
Of course, some errors are simply the result of sloppiness, or otherwise unrelated to the main learning goals of the lesson. So when the teacher sees an error while conducting kikkan shido, he or she has to decide: should this be addressed privately or publicly? What should I say to this student? Do I expect that, by the end of the lesson, this student will understand what he or she has done wrong? This is a tricky decision, and an important part of lesson planning is anticipating different student responses, correct and incorrect, and deciding ahead of time how to handle them.

Caring teachers naturally feel drawn to help struggling students: they feel like it is their duty to help those students right now. To counteract that impulse, I say, bluntly:

It is not the teacher’s job to teach the students. It’s the teacher’s job to create a lesson that teaches the students.

 

DC’s Black, Hispanic and White Students Progress on the NAEP Under Mayoral Control and Before – 8th Grade Reading

8th grade naep reading, DC, B + W + H

We are looking at the average scale scores for 8th grade black, Hispanic, and white students in DC on the NAEP reading tests over the past two decades. Ten years ago, Washington DC made the transition from a popularly-elected school board to direct mayoral control of the school system. Michelle Rhee and Kaya Henderson, our first and second Chancellors under the new system, promised some pretty amazing gains if they were given all that power and many millions of dollars from the Walton, Arnold, and Broad foundations, and I showed that almost none of their promises worked out.

In the graph above, the vertical, dashed, green line shows when mayoral control was imposed, shortly after the end of school in 2007, so it marks a convenient end-point for school board control and a baseline for measuring the effects of mayoral control.

For 8th grade black students in reading in DC, their average scale scores went from 233 in 1998 to 238 in 2007, under the elected school board, which is a (very small) rise of 5 points in 9 years, or about 0.6 points per year. Under mayoral control, their scores went from 238 to 240, which is an even tinier increase of 2 points in 10 years, or 0.2 points per year.

Worse, not better.

For the Hispanic students, scores only increased from 246 to 249 before we had chancellors, or 3 points in 9 years, or about 0.3 points per year. After mayoral control, their scores went DOWN from 249 to 242 in 10 years, or a decrease of 0.7 points per year.

Again, worse, not better: going in the wrong direction entirely.
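The per-year rates above are just the change in score divided by the number of years. Here is that arithmetic as a tiny Python sketch, using the scores and time spans quoted in this post; the function and label names are my own.

```python
def points_per_year(start_score, end_score, years):
    """Average change in scale score per year between two test administrations."""
    return (end_score - start_score) / years


rates = {
    "black students, elected board (1998 to 2007)": points_per_year(233, 238, 9),
    "black students, mayoral control (10 years)": points_per_year(238, 240, 10),
    "Hispanic students, elected board (9 years)": points_per_year(246, 249, 9),
    "Hispanic students, mayoral control (10 years)": points_per_year(249, 242, 10),
}

for label, rate in rates.items():
    print(f"{label}: {rate:+.1f} points per year")
```

This is a simple end-point comparison, not a trend line, so it ignores whatever happened in the years in between.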

For white DC 8th graders, it’s not possible to make the same types of comparisons, because there were not sufficient numbers of white eighth-grade students in DC taking the test during five of the last ten test administrations for the NCES statisticians to give reliable results. However, we do know that in 2005 (pre-mayoral control) white 8th graders in DC scored 301 points. And since the mayors and the chancellors took over direct control of education in DC, not once have white students scored that high.

Again, worse, not better.

Why do we keep doing the same things that keep making things worse?

==============================================

My previous posts on this topic:

  1. https://gfbrandenburg.wordpress.com/2018/04/20/comparing-dcs-4th-grade-white-black-and-hispanic-students-in-the-math-naep/
  2. https://gfbrandenburg.wordpress.com/2018/04/17/the-one-area-where-some-dc-students-improved-under-mayoral-control-of-education/
  3. https://gfbrandenburg.wordpress.com/2018/04/17/more-flat-lines-4th-grade-reading-for-hispanic-and-white-students-dc-and-nationwide/
  4. https://gfbrandenburg.wordpress.com/2018/04/17/one-area-with-a-bit-of-improvement-4th-grade-math-for-black-students-on-the-naep/
  5. https://gfbrandenburg.wordpress.com/2018/04/16/was-there-any-progress-in-8th-grade-math-on-the-naep-in-dc-or-elsewhere/
  6. https://gfbrandenburg.wordpress.com/2018/04/16/progress-perhaps-with-8th-grade-white-students-in-dc-on-naep-after-mayoral-control/
  7. https://gfbrandenburg.wordpress.com/2018/04/16/maybe-there-was-progress-with-hispanic-students-in-dc-and-elsewhere/
  8. https://gfbrandenburg.wordpress.com/2018/04/16/just-how-much-success-has-there-been-with-the-reformista-drive-to-improve-scores-over-the-past-20-years/

 
