No one knows what it’s like to be the bad man

November 23, 2012

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

1. The ideal policy

Basbøll, as always, gets right to the point:

Andrew Gelman is not the plagiarism police because there is no such thing as the plagiarism police.

But, he continues:

There is, at any self-respecting university and any self-respecting academic journal, a plagiarism policy, and there sure as hell is a “morality” of writing in the world of scholarship. The cardinal rule is: don’t use other people’s words or ideas without attributing those words or ideas to the people you got them from.

What to do when the plagiarism (or, perhaps, sloppy quotation, to use a less loaded word) comes to light?

Everyone makes mistakes, but if you make one you have to correct it. Don’t explain why your mistake isn’t very serious or “set things right” by pointing to the “obvious” signs of your good intentions. . . . Don’t say you’ve cleared it with the original author. The real victim of your crime is not the other writer; it’s your reader. That’s whose trust you’ve betrayed.

But what if the copying-without-full-attribution is indeed mere sloppiness, an honest mistake? The correction should still be made:

Nobody said anything about motive. All they did was to point out an error of scholarship that needs to be fixed.

2. The example

Basbøll’s discussion arose out of a case I posted the other day, not an example I was personally aware of but an instance of minor plagiarism that someone sent in to me in my unofficial role as “not the plagiarism police.” My post elicited a range of comments, with some people saying that the published paper should be corrected to make the sourcing clearer, and others saying that the copying was no big deal. Hence Basbøll’s argument that the requirement of clarification should not be contingent on intention. If I break a dish, I should replace it, even if I knocked it over by accident.

Several hours after my entry had been posted, I received a friendly email from one of the authors of the article:

Just want to set things right: our paper is an experimental test of the JPE modeling paper. It is based on that paper; we cite the JPE paper 8 (!) times in our paper, comparing our results to their prediction. Our paper is an empirical test of the JPE model. As we clearly state in the paper, the motivating economic issue and literature are the same. This is why we cite them 8 times! Our paper also benefited from comments (prior to publication) from the authors of the JPE paper.

I reacted to this by appending the above paragraph to my blog post (with attribution, of course!) and adding:

Given this, it sounds like credit was given fairly. And so I would change my above “it does seem a bit tacky” sentence to: “It seems like a mistake to not use quotation marks, even in a case such as this where the work is clearly labeled as following up from an earlier paper.”

I did not say that the unsourced quotation was not plagiarism, but I felt that it was not such a problem, hence I felt that the email I’d received had cleared things up. That said, just as I am not the plagiarism police, I’m not the plagiarism arbiter, and me saying it’s ok doesn’t make it ok. Basbøll has a good point above, and I like that he detaches the consequences and the remedy from the intentions.

Perhaps one problem is the analogy of plagiarism to theft. It’s hard to steal something by accident. Even when somebody “takes” something by accident (“hey, that bag was just sitting there on that bench!”), it’s likely he had an idea there was an owner. Instead, maybe we should analogize plagiarism to breaking a dish, which it’s possible to do by accident or even unknowingly via sloppy behavior (I say this as someone who is sloppy and breaks things sometimes). If I break a dish, I feel bad and I apologize, but I don’t feel like a bad person, I just see it as part of the cost of doing business, given my general level of obliviousness.

3. The problem

So where’s the difficulty? When I received that email, I felt bad, in that I had singled out these authors (I had not mentioned them by name, but their names were not hard to find via Google). I posted a correction right away. (Much of this was due to the cordial nature of that email. Several months ago I had an unrelated instance of a blog post that annoyed someone who sent me an obnoxious cease-and-desist style email. That time I had no particular desire to make a correction; in fact, I was tempted to post the entire letter online, but then I thought better of it.)

Here’s the deal (for me): if someone clearly seems to be acting like a “bad guy,” I have no problem slamming him. For example, Marc Hauser didn’t share his data, then denied and denied, even when his own lab assistants were telling their stories. Frank Fischer threatened legal action against the student who discovered his plagiarism. Ed Wegman did a Chris Rock and denied even after the evidence was obvious, and he also threw a former student under the bus, even though there were several instances of plagiarism in Wegman-authored publications not involving that student. Karl Weick, when his plagiarism was uncovered, bobbed and weaved, got cute, and never apologized or explained.

But once there’s some ambiguity and I must dismount from my moral high horse, it gets tougher.

In the example discussed above the authors seem reasonable (hey, one of them sent me a nice email), and after all they did cite the earlier article 8 times. If someone wrote a paper in a high-profile journal to replicate one of my papers and they cited me so clearly, I wouldn’t mind if they copied some lines from my exposition. (As Basbøll notes, it’s not just about the authors of the original and new papers, the scholarly community is also involved. So I’m not saying that the copying-without-full-sourcing is OK, just that it doesn’t seem bad in the way of some of those other cases.)

Or consider Bruno Frey, the self-plagiarist, whom I actually met once (when we were introduced, he said, “Gelman—you wrote that zombies paper,” which of course made me happy). He demonstrated the strong form of Arrow’s theorem by publishing the same article in 5 different journals. That’s not cool; on the other hand, it was more of a waste of people’s time than anything else. Self-plagiarism isn’t as bad as real plagiarism, and Frey did in fact apologize (and it was a real apology, not a non-apology apology of the form, “I’m sorry there was something I did that led to a perception of wrongdoing” etc.). Sure, talk is cheap, but the point is that many people who are caught don’t apologize. Apology is an important step because it establishes the principle that this is something that shouldn’t be done. Anyway, the Frey case seemed ambiguous to me, so I wasn’t so comfortable with mocking him.

Here’s another example. Carl Blackman is a scientist who, many years ago, published a weak statistical analysis and then refused to share his data. This annoyed me enough that I used it as the central example in my first Chance column on ethics. When he and his colleague Dennis House saw my article, they were annoyed and wrote letters to the editor (which I posted and discussed here). Although I did not back down from my claim that their refusal to share data was unethical, I did feel a little bad about the situation: my impression now is that Blackman and House were in over their heads, and that they did not perceive any general duty to share data with other researchers, even though they worked at a government agency. Hence I felt the larger problem was not with these particular people but rather with the lack of general agreement that researchers have a duty to share data and methods where possible. I did think Blackman should’ve apologized (if for no other reason than that the progress of science in his subfield would have been advanced by a better published analysis of his data), but I can understand that people find it difficult to apologize after what they perceive as an unprovoked attack.

Just as with tabloid-headline crimes (or, more recently, with outrage-of-the-week bloggers), it’s much more fun to write about these cases when there’s a clear and unambiguous bad guy. Once you start sympathizing with the perp, it’s harder to keep tasing and clubbing. That’s why Basbøll’s attitude, harsh and unforgiving as it might seem, is probably the right approach, to separate the act from the motivation.

P.S. Is there a term for reverse plagiarism, where you write something and attribute it to someone else? When the “someone else” is not a real person, this is called “sock puppetry” (some notorious examples in the blogosphere include Mary Rosh, Dilbert, and that cranky middle-aged guy from the New Republic who doesn’t like bloggers). But what if, for example, you edit a Wikipedia entry and then cite Wikipedia? I don’t think this behavior is unethical, but it does involve a bit of misdirection. One might call it “text laundering.”

P.P.S. This is silly, I know, but . . . I get such a satisfying feeling of power every time I hit option-o to type the ø in Basbøll. I should program such simple key combinations for à, è, and é as well; those guys really slow me down.

