A remarkable new “sting” of the “diet research-media complex” was just revealed. It tells us little we didn’t already know and has potentially caused a fair amount of damage, spread across millions of people. It does, however, offer an opportunity to explore the importance of prospective group review of non-consensual human subjects research—and the limits of IRBs applying the Common Rule in serving that function in contexts like this.
Journalist John Bohannon, two German reporters, a doctor and a statistician recruited 16 German subjects through Facebook into a three-week randomized controlled trial of diet and weight loss. One-third were told to follow a low-carb diet, one-third were told to cut carbs but add 1.5 ounces of dark chocolate (about 230 calories) per day, and one-third served as control subjects and were told to make no changes to their current diet. They were all given questionnaires and blood tests in advance to ensure they didn’t have diabetes, eating disorders, or other conditions that would make the study dangerous for them, and these tests were repeated after the study. They were each paid 150 Euros (~$163) for their trouble.
But it turns out that Bohannon, the good doctor (who had written a book about dietary pseudoscience), and their colleagues were not at all interested in studying diet. Instead, they wanted to show how easy it is for bad science to be published and reported by the media. The design of the diet trial was deliberately poor. It involved only a handful of subjects, had a poor balance of age and of men and women, and so on. But, through the magic of p-hacking, they managed to produce several statistically significant results: eating chocolate accelerates weight loss and leads to healthier cholesterol levels and increased well-being. They wrote up the results, submitted the manuscript to twenty predatory journals known to publish basically anything for a price, picked one of the “multiple” ones that accepted it, and wrote a check. (The journal that published the paper now says it did so “by mistake” and has removed it from its website, but you can read it here.) Then they cooked up a press release, sent it out to numerous journalists and media outlets, and waited. Several (of various quality, focus, and circulation) took the bait and passed on to their readers the good news that adding chocolate to a “strict” low-carb diet would “accelerate weight loss.” As Bohannon brags:
It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash. From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages. It was discussed on television news shows. It appeared in glossy print, most recently in the June issue of Shape magazine (“Why You Must Eat Chocolate Daily”, page 128).
Then, yesterday, Bohannon exposed his own “con” in io9, an online science and technology magazine, under a headline that blared: I Fooled Millions Into Thinking Chocolate Helps Weight Loss.
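The p-hacking at the heart of the trial design is worth pausing on, because it is easy to quantify. As a rough illustration (a toy Monte Carlo sketch, not the team’s actual analysis; the count of 18 endpoints is merely illustrative), measuring many outcomes on one small sample all but guarantees that something will cross the p < 0.05 line by chance:

```python
import random

def prob_any_significant(n_outcomes, alpha=0.05, n_sims=100_000, seed=1):
    """Monte Carlo sketch: under the null hypothesis, each outcome's p-value
    is distributed Uniform(0, 1). Return the estimated chance that at least
    one of n_outcomes falls below alpha, i.e., looks "statistically
    significant" purely by luck."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_outcomes))
        for _ in range(n_sims)
    )
    return hits / n_sims

# If a tiny trial tracks, say, 18 endpoints (weight, cholesterol, sleep,
# well-being, ...), the family-wise false-positive rate is roughly
# 1 - 0.95**18, about 60%.
print(round(prob_any_significant(18), 2))
```

In other words, a small trial that measures a dozen or more endpoints can nearly always “find” something; which endpoint turns up significant is simply the luck of the draw, and that is the one that makes the headline.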
Was the sting ethical? In answering that question, it’s helpful to separate two discussions: whether the “protocol” (here, the diet trial protocol plus the rest of the scheme to get the results published and reported) was substantively ethical (including questions about informed consent, debriefing, appropriate risk-benefit ratio, and risk minimization) and whether there was appropriate process (typically, IRB review) before that protocol went into effect.
Bohannon has confirmed that he did not submit any piece of this to an IRB or European equivalent for review. I’ll leave to one side whether IRB review for the diet trial was required under German law. (Unlike the U.S., many other Western countries require IRB review only for biomedical research, and it’s unclear whether a study of diet constitutes a medical trial—as opposed to, say, a “wellness” trial.) In the U.S., had an academic either received federal funding for this work or simply been an affiliate of a university that either by institutional policy or by federal contract subjects all human subjects research to IRB review, the sting—all of it—would have been required to undergo IRB review and approval. But although Bohannon has a Ph.D. in molecular biology and has had academic affiliations—perhaps ironically, as a visiting scholar in Harvard’s Program in Ethics and Health—he is a journalist and is not, therefore, legally bound to obtain IRB review for anything he does, no matter how risky or deceptive. (Most journals require a statement that IRB approval occurred before they will publish a paper involving human subjects, but that’s not a legal requirement and anyway, to no one’s surprise, the journal that accepted the shoddy study for publication within 24 hours and published it online as soon as the check cleared never asked about IRBs.)
But was IRB review ethically required? The IRB system is highly imperfect. Hardly anyone likes it. Some think it’s overly burdensome relative to its (undemonstrated) benefits, while others think it’s largely toothless and should be replaced with an even more rigorous institution entirely. Even federal regulators aren’t happy with it. They proposed twenty pages’ worth of changes to the rules that IRBs apply (known as the Common Rule), though their apparent inability to agree on which of these changes is worth making has delayed final revisions for nearly four years. The IRB system is also modeled on biomedical trials, and is ill-suited to some other kinds of research. (More on this below.)
Whatever one thinks about the particular system of IRB review that we currently have in the U.S., it is especially important to get some sort of third-party prospective review of your plans when you don’t plan on asking your subjects whether they want to participate or not.
Let’s back up. Why are we even talking about the benefits of third-party review of nonconsensual research? Doesn’t human subjects research always require informed consent, at least ethically, if not legally? Actually, no, and for good reasons. For those of you who didn’t just close your browser window in anger and are still reading, it’s helpful to remember that “human subjects research” is an incredibly broad category of activity that includes low- or no-risk but potentially massively beneficial things like posting a checklist of best hygienic and other practices in an ICU to determine its effect, if any, on deadly and expensive catheter infections. Sometimes, an important question affecting people’s welfare—perhaps even the subjects’ own welfare—urgently needs to be answered, informing subjects about the study would badly bias the results, and the risks of the research are minimal or nonexistent. Under those circumstances, federal regulations allow nonconsensual “human experimentation,” and rightly so (ethically speaking). The government quite properly sponsors social science “stings” of landlords to determine the existence of racial discrimination in housing, for instance. You can’t tell a discriminatory landlord that you’re investigating housing discrimination and expect her not to alter her behavior in ways that mask the discrimination. (For several other examples, see pp. 295-298 here.)
But here’s the thing. As I’ve said before in the context of the Facebook mood contagion experiment and in other contexts involving risky behavior by journalists, if you’re not going to ask prospective subjects to weigh the risks and potential benefits of participating (say, because that would badly bias the results or because your subjects are not competent adults) and are instead going to deceptively conscript them into service as subjects, then it is right and good to ask people who are less personally and professionally invested in the study than you to put eyes on your plans to ensure that they really don’t pose any more than minimal risk to subjects and that whatever risks they do impose are justified by the expected benefits. This holds true whether you’re an academic, a journalist, a corporate data scientist, or a citizen scientist.
To think that this kind of third-party check on bias is a good idea, you don’t have to believe that we are in the presence of diabolical Mad Scientists. You just have to believe in human nature.
So how does this sting’s risk-benefit balance measure up in this regard? Not very well, I think, though there are different views and I’m open (as always) to being persuaded otherwise. Part of the point of prospective group review is the group part—different people bring different perspectives and different knowledge to the table. What follows are the kinds of questions and concerns I would have raised had someone come to me with this plan in advance.
For any conclusive analysis, I’d want to know more facts. On the risk side, for instance, I’d want to know about the health risks, if any, to the German trial participants of following a strict low-carb diet for three weeks, with or without added chocolate. I’d similarly want to know about the health risks to the “millions” of people who might adopt the low-carb + chocolate diet as a result of media coverage of the shoddy study, none of whom, unlike the clinical trial subjects, were cleared for the diet by blood tests and questionnaires. But, on the other hand, I’d want to know the probability that members of the public would be uniquely persuaded by reporting of this shoddy study and weren’t already persuaded to reduce their consumption of carbs and/or increase their consumption of dark chocolate by the considerable media reports of prior studies that have found things to like about both diet choices.
As for the potential benefits, to my mind, this sting tells us almost nothing that we didn’t already know. (I’ll come back in a minute to who I mean when I say “we,” and also to another purpose the sting might have served—indeed, the one Bohannon has said it was intended to serve.) We already knew that many journals will publish virtually anything you give them, with no peer review and no editing, for a price. Indeed, Bohannon’s first sting found that 157 journals accepted the dubious paper while only 98 rejected it. In io9, Bohannon also points to two lists of well-known predatory journals from which he selected the journals to submit his study to. The fact that “multiple” such journals of the 20 to which he submitted this time once again did so is unsurprising. And, like his first sting, this tells us nothing about how open access journals compare to subscription journals, since the dubious manuscripts in both cases were submitted only to a selection of the former group.
We already knew that underpowered studies can lead to results that are statistically significant but are likely false positives that would not replicate in larger samples.
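To make that concrete, here is a minimal simulation (a sketch under simplifying assumptions: two arms of five subjects, known unit variance, a z-test rather than a t-test) of the “winner’s curse” in underpowered trials. Even when a real but modest effect exists, the results that clear the significance bar wildly exaggerate it, and an exact replication usually fails:

```python
import math
import random

def winners_curse(n=5, true_effect=0.3, sims=20_000, z_crit=1.96, seed=2):
    """Simulate tiny two-group trials with a real but modest effect.
    Among the trials that come out 'significant', report (a) the average
    observed effect, which is inflated relative to true_effect, and (b) how
    often an identical replication is also significant (roughly the power)."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n)  # standard error of the difference in group means
    sig_effects, replicated = [], 0
    for _ in range(sims):
        diff = rng.gauss(true_effect, se)        # original study's estimate
        if abs(diff) / se > z_crit:              # "p < 0.05"
            sig_effects.append(abs(diff))
            rep = rng.gauss(true_effect, se)     # exact replication, same n
            replicated += abs(rep) / se > z_crit
    return sum(sig_effects) / len(sig_effects), replicated / len(sig_effects)

mean_sig, rep_rate = winners_curse()
print(f"true effect: 0.3, mean 'significant' estimate: {mean_sig:.2f}, "
      f"replication rate: {rep_rate:.0%}")
```

With five subjects per arm, any estimate that reaches significance must exceed about 1.24, more than four times the assumed true effect, and the replication rate hovers near the study’s power, well under ten percent. A tiny trial that reports a significant result is thus reporting either a fluke or a caricature of the truth.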
We already knew that journalists sometimes credulously report shoddy science (and poorly report good science), perhaps especially diet “science,” whether because that science draws clicks, because the journalists are naïve about p-hacking and other aspects of science, or because they are overworked. Here, several media outlets reported the chocolate diet story, without apparently interviewing any outside researchers. Bohannon says he was genuinely surprised by this result:
Could we get something published? Probably. But beyond that? I thought it was sure to fizzle. We science journalists like to think of ourselves as more clever than the average hack. After all, we have to understand arcane scientific research well enough to explain it. And for reporters who don’t have science chops, as soon as they tapped outside sources for their stories—really anyone with a science degree, let alone an actual nutrition scientist—they would discover that the study was laughably flimsy.
Bohannon is either punking us again or this Science correspondent hasn’t been paying attention (my strong bet is on some version of the former). Academics complain loudly when some popularizers of science play fast and loose with the facts, seemingly on purpose. Scientists regularly mock science reporting published in even the most august outlets. Gary Schwitzer and his team of forty at Health News Review evaluate the accuracy of health care journalism. I agree with him that the media reaction to the miracle chocolate diet is, unfortunately, not at all surprising.
And, finally, although the sting doesn’t even attempt to demonstrate this, we already know that too many members of the public are too credulous—and too busy living their lives—to sort the wheat from the chaff in the media’s coverage of science, especially “science” that tells them what they want to hear, like that eating chocolate will accelerate weight loss.
In sum, the fact that multiple journals that were already recognized as predatory agreed to publish, and multiple media outlets reported, one piece of shoddy science that was explicitly designed to appeal to readers (Bohannon says the doctor chose dark chocolate for the sting because it’s “a favorite of the ‘whole food’ fanatics”) is neither surprising nor especially illuminating.
Now, it’s true that the “we” in the above paragraph refers to scientists, careful specialist science reporters, other science communicators, and the like, and not, alas, to the general public, to many non-science academics, and to many policymakers. These latter groups continue to disseminate and consume shoddy science, which—along with the fact that shoddy science is conducted and published in the first place, while careful replications and failures to replicate are more likely to collect dust in the proverbial file drawer—is an enormous problem, indeed, an enormous ethical problem.
What we need are feasible solutions to make these groups aware of this problem, not more evidence of the problem that perversely contributes to the problem itself.
How might this sting contribute to the problem of shoddy science being disseminated to the public? Bohannon et al. caused shoddy science to be published in numerous outlets whose collective circulation surely dwarfs the readership of io9, where Bohannon announced his sting. Just as when a sensational story is printed on page 1 and its correction is printed on page 14, it seems likely that far more people will read and be affected by the shoddy science than by the announcement that it was all a ruse.
And it’s not just about sheer numbers: readers of Shape, for instance, are unlikely to be readers of io9. So Shape readers are both unlikely to be disabused of the idea Bohannon et al. planted that they should be on a low-carb, high-chocolate diet and unlikely to gain enlightenment about the limits of science publishing and reporting and the need to be more critical consumers of the same in the future. I suspect many of io9’s science- and tech-savvy readers, conversely, were likely less susceptible to the shoddy science to begin with.
Bohannon has suggested that his goal was less to contribute to generalized knowledge than to “make reporters and readers alike more skeptical.” That makes a bit more sense, given that, as I’ve suggested, there is ample evidence within the relevant expert communities of poorly conducted, published, and reported science. And “making” reporters and readers less credulous is indeed a noble end. But, in the parlance of research ethics, was his sting well designed to achieve that important end?
If you deceive someone and then debrief them, they may learn something valuable from the experience, and that could be a benefit of the study. But that’s a delicate business. They might just as easily retreat into defensiveness or anger—especially when you communicate this lesson publicly by declaring that you’ve “exploit[ed] journalists’ incredible laziness” to “fool millions” with your “con.” To the extent that journalistic laziness drives the reporting of shoddy science, I’m as irritated as Bohannon. But if the goal is to get those journalists to change their behavior, how you communicate your deception matters. There is some behavioral science around the effectiveness of shame and humiliation as learning tools. Did Bohannon et al. consult it? Or did they develop their plan ad hoc, based on mere intuition about what works?
Even assuming that deception and, in the case of journal editors and reporters, public humiliation are effective means of altering behavior, debriefing all of these parties would be necessary for that effect to occur. Did Bohannon and colleagues have a plan for debriefing each set of subjects? Could they have had a feasible plan? With respect to reporters and readers, perhaps the best they could do is (1) alert reporters to the sting and request (2) a correction to the original piece and (3) a new story about it that readers may be more likely to see. But ultimately Bohannon et al. have no control over how outlets handle news of their sting, so this is a pretty big gamble. NPR reports this morning that some outlets that had covered the story have appended corrections to it, which is great. And the fact that NPR and other outlets are covering the sting will also help spread the word to readers who may have been snookered by news of the miraculous chocolate diet. But Bohannon’s team could not have known that this would happen, he has not suggested that he has done anything other than pen the gloating io9 piece to make it happen (not even requesting a retraction from the journal that published the study, which seems to have acted unilaterally in pulling the piece), and almost no matter what, there will inevitably be readers of the original stories, some of them months old, who never learn the truth.
It may be the public that is most harmed by the sting. It’s worth noting, then, that even had IRB review been required, the public would not be deemed “subjects” in need of protecting. Bohannon and his colleagues could be said to have “intervened,” via reporters, in the lives of Shape, Bild, Huffington Post, and other readers, but they collected no data “about” those readers, such as the effects on them of being exposed to shoddy science. As a result, readers were not “subjects” for purposes of federal regulations. The IRB system is explicitly not designed to protect everyone who might be adversely affected by a study. In this case, as it happens, it is not designed to protect those who might have been most adversely affected and certainly the most “innocent” of those affected.
Conversely, the system invites exquisite conversations among IRB members about the myriad ways a study might conceivably harm anyone who does meet the definition of “human subject,” even if—as might be thought to be the case with the journalists and predatory journal editors here—those subjects are at least partially culpable for any research-related harm that befalls them. An IRB that took seriously its role under the Common Rule would have had to ask about the risks to the punked journalists’ and editors’ employment, for instance, along with all of the psychosocial risks of being deceived and publicly embarrassed. IRBs are a bit like doctors in this regard: They don’t refuse to help the bank robber who enters the Emergency Department with a gunshot wound.
What about the other subjects of the sting, the publishers and journalists? Maybe some journalists will be more careful going forward, lest they, too, get punked. But that assumes something about what causes poor science journalism in the first place. Reporting science well is a tricky business. Most journalists are overworked and underpaid. The odds of being caught in a Bohannon-like sting are sufficiently small that it would probably be rational to continue to cut corners. They can always point to the fact that the study they reported was, after all, published in an academic journal.
And what of those journals? As for the ones that will print anything for a buck (or a Euro), I hold out little hope that such stings will have much effect on them at all. Truly predatory journals can and do close shop and reopen under a new name overnight, when it suits them. They are not motivated by scientific integrity, so shaming them for lacking such integrity is unlikely to alter their behavior. As Bohannon himself notes in io9, “there are plenty of journals that care more about money than reputation.”
My current thinking is one of skepticism, then, about the risk-benefit profile of this sting, especially as it was conducted. Similar ends might have been more successfully reached with less risk. Why, for instance, was it necessary to dupe 16 Germans into a three-week trial that Bohannon et al. knew would demonstrate absolutely nothing about the effectiveness of different diets? Bohannon would be the first to tell you that he could have fabricated the data and still gotten it published within 24 hours by a predatory journal—or, ahem, Science. Perhaps the fact that the trial was actually conducted gives the sting added punch (although I’m not sure that inducing people to publish and report fabricated data would have been any less titillating). But is that marginal punch worth the costs to subjects? It’s not actually clear what the German diet trial subjects were told, other than that their participation “was part of a documentary film about dieting” (actually, the film version of the io9 reveal, set to air soon in Germany). But assuming that subjects falsely believed that that film was documenting a trial designed and intended to determine the effectiveness of different diets, potential costs include wasted time and effort and, worse, ensuing distrust of the medical research establishment.
Similarly, given that Bohannon et al. couldn’t realistically be certain they would be able to debrief reader-subjects, why not focus on reporter-subjects? They could have written the press release, done the interviews, and taken the ruse just up to the point of publication, then pulled the plug. That would have avoided injecting more bad science into the lives of “millions” and, by avoiding public humiliation, might additionally have been more effective in altering reporters’ behavior.
The only ironically reassuring thing about all this is that we are already so bombarded with shoddy diet advice masquerading as science that reports of this study may amount to a drop in the bucket that had little effect in altering anyone’s behavior. (Recall that in defining minimal research risk according to how it relates to the kinds of risks subjects encounter daily, the Common Rule, much like the expectation of privacy test, bootstraps sorry states of affairs—like constant media bombardment with reports of shoddy diet “science”—to define risk down.) Even so, deliberately disseminating bad science, with no plan for unringing the bell, is a terrible precedent for researchers, journalists, and everyone else.