When the conservative authors Christopher Rufo and Christopher Brunet accused Harvard’s Claudine Gay last month of having committed plagiarism in her dissertation, they were clearly motivated by a culture-war opportunity. Gay, the school’s first Black president—and, for some critics, an avatar of the identity-politics bureaucracy on college campuses—had just flubbed testimony before Congress about anti-Semitism on campus. She was already under pressure to resign. Evidence of scholarly misconduct was just the parsley decorating an anti-wokeness blue-plate special.
But soon enough, the integrity of Gay’s research became the central issue in a scandal that appears to have led to her resignation on Tuesday. It turned out that the New York Post had gone to Harvard in October with separate allegations of plagiarism in her published articles; and then, earlier this week, still more examples were produced. “My critics found instances in my academic writings where some material duplicated other scholars’ language, without proper attribution,” Gay wrote in a New York Times op-ed, shortly after she’d stepped down. She acknowledged having made “citation errors,” and has in recent weeks requested a handful of formal corrections to published works. Still, she avowed in her op-ed, “I have never misrepresented my research findings, nor have I ever claimed credit for the research of others.”
I haven’t either—at least as far as I know. For the past couple of decades, I’ve been a professor at elite research universities; I’ve published 150 or so scholarly articles and conference papers, and 10 books. Might any of these contain the sort of improprieties that led to a university president’s downfall? I felt sure the answer was no, but the question lingered in my mind and was echoed in the claims of the other academics who have lately rushed to Gay’s defense. Some people argued that her citation practices were not egregious or even that they represented business as usual. “If that’s going to count as plagiarism,” one professor wrote, “all writers are vulnerable to it, and anyone who writes anything controversial can expect to suffer for it.” If all writers were vulnerable, was I?
A version of this question lies at the core of many disagreements over Gay’s departure. Does her now-acknowledged sloppiness really stand out among her peers? What would happen if the same degree of scrutiny were applied to the work of any other scholar? In short: Is the baseline rate of these transgressions in academia high or low?
I had no idea. So, as a simple experiment, I decided to launch a targeted plagiarism investigation of myself to see if similar scrutiny of my dissertation, performed for no good reason, could deliver similar results. Perhaps I, too, am guilty of some carelessness that might be taken—maybe out of context, perhaps in bad faith—as a sign of scholarly malfeasance. I promised my editor ahead of time that I’d come clean about whatever I found, reporting any misdeeds to my university’s research-integrity office and facing applicable consequences.
I’ve had a comfortable, 20-year career in academia; perhaps this would be the end of it.
How to do it? The instances of copying in Claudine Gay’s dissertation that I’ve seen are not the kind that jump right out at you, but they are near-direct quotations of other scholars’ work, presented in the form of paraphrases. Brunet and Rufo appear to have reviewed her roughly 200-page text systematically, and I wanted to hew as close to their methods as possible. When I reached out to ask how they’d performed their analysis, Brunet said “No comment” and Rufo didn’t answer. (Isabel Vincent, the Post reporter who had received separate plagiarism allegations from an anonymous source in October, also declined to offer any details.)
I suspected that the probe had been carried out using one of the several plagiarism-detection software packages that are now available for private use. Jonathan Bailey, a copyright and plagiarism consultant who also runs the plagiarism-news website Plagiarism Today, told me that the analysis of Gay’s dissertation is likely to have been carried out with iThenticate, an online service run by the same company that operates the popular student-oriented plagiarism detector Turnitin. “When dealing with cases of research integrity, the best tool is iThenticate,” he said. Turnitin has cooperative agreements with academic publishers, which allows the software to check a document for text shared with sources that would otherwise be hidden behind paywalls or in library archives. “It’s a pricey tool, but in this space, it’s easily the best one out there,” Bailey added. (Turnitin didn’t respond when I asked whether iThenticate might have been used to investigate Gay’s work.)
On December 29, I downloaded my thesis from the institutional repository at UCLA, where I had earned my doctorate, signed up for an iThenticate account, and arranged for The Atlantic to pay the standard rate of $300 to analyze my dissertation’s 68,038 words.
Then I started to wonder what the hell I was doing. I had fairly strong confidence in the integrity of my work. My dissertation is about how to do cultural criticism of computational works such as software, simulations, and video games—a topic that was novel enough in 2004, when I filed it, that there wasn’t a ton of material for me to copy even if I’d wanted to. But other factors worked against me. Like Gay, who submitted her dissertation in 1997, I wrote mine during a period when computers were commonplace but the scholarly literature wasn’t yet easily searchable. That made it easier for acts of plagiarism, whether intended or not, to go unnoticed. Was it really worth risking my career to overturn those rocks?
On the principle that only a coward hides from the truth, I pressed the “Upload” button on the iThenticate website, waited for the progress bar to fill, then closed my laptop. When I came back for my report the next day, it felt a little like calling up my doctor’s office for the news, possibly bad, about whatever test they had run on my aging, mortal body. I took a breath and clicked to see my result.
It was 74. Was I a plagiarist? This, apparently, was my answer. Plagiarism isn’t normally summed up as a number, so I didn’t know quite how to respond. It seemed plausible that 74 might be a good score. Turns out it wasn’t: The number describes what percentage of a document’s material is similar to text from its database of reference works. My result—my 74—suggested that three-quarters of my dissertation had been copied from other sources. “What the heck?” I said aloud, except I didn’t say heck.
This seemed wrong to me. I was there when I wrote the thing, and I’d have remembered copying seven out of every 10 words from other sources, even 20 years later. Turns out it was wrong. I wrote the dissertation from 2002 to 2004, and the plagiarism software checks a work against whatever it finds—even if the compared text was published later. As Bailey told me, “iThenticate doesn’t detect plagiarism. It detects copied or similar text.” From there, Bailey said, “You have to do a lot of manual work.”
So I started doing the manual work.
The first, most obvious source of my plagiarism score was the fact that I’d subsequently published a book based on my dissertation (a common practice in academia), which itself appeared in many forms throughout the iThenticate database. In other words, the software suggested that I’d plagiarized my dissertation from a future version of myself. But to confirm each of these false positives, a gumshoe plagiarism sleuth like myself has to go through the report and click on each allegedly copied source individually.
Once I’d excluded the literal copies of (and commentaries upon) my own work from the analysis, my similarity index dropped to 26 percent. Phew! But iThenticate still listed 288 possible sources of copying. Exonerating myself was going to take a while.
I noticed that a lot of the matches were citations of other books, articles, or materials. iThenticate has a checkbox to “Exclude bibliography,” so I ticked it. Now my score was down to 23. Other matches were literal quotes, which I had quoted with footnotes to their sources. Ticking another checkbox, “Exclude quotes,” brought my similarity index to 9.
Most of the remaining matches were boilerplate chaff. The institutional-archive copy of my dissertation had added a line to the footer of each page, “Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.” iThenticate had matched a dozen or more other dissertations with the same notice, including “Pathogenesis of Bartonella Henselae in the Domestic Cat” and “Hyperdeprivation and Race-Specific Homicide, 1980–1990.” Laboriously excluding those and similar materials left me with 87 potential instances of plagiarism, and a similarity index of 3.
I carefully reviewed the matches that remained. Some were just citations of my work. Others were appropriately footnoted quotations that I’d used, but that iThenticate hadn’t construed as such because they were indented in the text. I also had to click through titles or other proper names that were showing up as copied phrases. Bibliographic citations that the filter hadn’t caught came up too. So did a lot of textual noise—phrases such as to preserve the, which appeared in similar patterns across unrelated materials.
After a couple of hours of work, I still had 60 individual entries to review, each requiring precision mousing to assess and exclude. Determined to see if I’d copied any original work according to the software, I persisted—after all, some of the instances of plagiarism that had sunk Claudine Gay were measured in the tens of words. But not one single match that iThenticate had found amounted to illegitimate copying. In the end, my dissertation’s fraud factor had dropped from 74 percent to zero.
The story I’ve told above has been fact-checked by The Atlantic, although the checking did not replicate the several hours of manual verification. And I realize that on some level I’m just asking you to trust me when I report that the work I analyzed does not include uncited text from other authors. I can only hope the same is true of all my other published research.
Does this imply that Gay’s record is unusual among professors? Not in and of itself. Her field of quantitative social science may have different standards for textual reference. The sciences are more concerned with the originality of research findings than with the descriptions of experiments. But it does at least refute the case that this was nothing more than academic jaywalking, or, in its purest straw-man form, that everybody does it.
But even if there’s substance to this Harvard scandal, I’m more afraid of what it may portend. The result of my experiment brought me no relief, only a new anxiety. The very ease of the self-investigation, conducted at a relatively modest cost with the help of powerful technology, hints at how a full-bore plagiarism war could end up playing out. In her New York Times op-ed, Gay admitted that she’d been wrong to copy text without attribution. She also characterized the campaign against her as part of a coordinated attempt to undermine educational institutions and their leaders. On both counts, she was right.
More attacks like this are sure to follow. After Gay resigned, Rufo announced that he would contribute $10,000 to a “‘plagiarism hunting’ fund” meant to “expose rot” and “restore truth.” That’s enough dough to test a few dozen dissertations or a few hundred articles with iThenticate, and their authors wouldn’t be able to dismiss the findings solely as the product of “bad faith.” I suppose that’s good news for companies such as Turnitin. (Academics may be getting their just deserts for subjecting students to constant surveillance with the company’s student-focused plagiarism-detection software.)
If a plagiarism war does break out, I suspect that universities and their leaders will end up fighting it defensively, with bureaucratic weapons directed inward. “If I were a school looking to appoint a new president,” Bailey told me, “I’d consider doing this kind of analysis before doing so.” To run standard plagiarism checks on top brass may end up seeming reasonable, but with that policy in place, what’s to stop beleaguered administrators from insisting on the same—best practices!—before any faculty hire or award of tenure? Academic publishers could demand iThenticate-style checks on all submissions. Legislatures could demand plagiarism-assessment reports from state colleges, with a special focus on fields that are purportedly “woke.”
Plagiarism assessment, with automated accusations and manual rebuttals, could become a way of life, a necessary evil brought about by, yes, the bad actors who seek to undermine educational institutions and their leaders. That isn’t likely to improve academic work, but it would certainly make higher education worse.