Big, If True | Stats + Stories Episode 234 / by Stats Stories

Andrew Gelman (@StatModeling) is a professor of statistics and political science, and director of the Applied Statistics Center at Columbia University. His research interests include voting behavior and outcomes, campaign polling, criminal justice issues, social network structure, and statistical and research methods. He has received the Outstanding Statistical Application award three times from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40.

Episode Description

Most articles that appear in academic journals are kind of mundane in that they're extending the work of scholars who have come before, or sometimes taking an old theory in a new direction. There are those moments, however, when a piece of research holds the possibility of fundamentally remaking a field. How should those articles be handled? What's the ethical way to review such research? That's the focus of this episode of Stats and Stories with guest Andrew Gelman.

+Timestamps

Could you just describe what a Big If True article is? (1:37), Editor motivations and making a splash (9:00), How can reviewers be better? (12:47), Attributing credit in a post-publication review system (15:21), Why you felt compelled to start your ethics column (18:43), Changing thoughts? (25:03)


+Full Transcript

Rosemary Pennington Most articles that appear in academic journals are kind of mundane in that they're extending the work of scholars who have come before, or sometimes taking an old theory in a new direction. There are those moments, however, when a piece of research holds the possibility of fundamentally remaking a field. How should those articles be handled? What's the ethical way to review such research? That's the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist John Bailer, Chair of Miami's Department of Statistics. Our guest today is Andrew Gelman, a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. His research interests include voting and criminal justice issues, social network structure, and statistical and research methods. Gelman received the Outstanding Statistical Application Award three times from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies Award for outstanding contributions by a person under the age of 40. He's also recently authored a piece in CHANCE about the ethics of publishing Big If True articles. Andrew, thank you so much for joining us again here on Stats and Stories.

Andrew Gelman Happy to be here again.

Rosemary Pennington Could you just describe what a Big If True article is?

Andrew Gelman I'll give three examples. One was the paper published in the medical journal The Lancet in 1998, claiming that vaccines caused health risks; that was later retracted and is considered one of the most embarrassing papers ever published. In 2011, the Journal of Personality and Social Psychology published a paper claiming that Cornell students had extrasensory perception, and that made its way to the New York Times. In 1994, our very own journal Statistical Science published a paper purporting to report on a Bible code, a code in the Bible that was supposedly predicting events thousands of years later. Now, I'm not so sure about the story of the vaccine paper, but the ESP paper and the Bible Code paper were notable, first, because, yes, big if true: if you really can measure extrasensory perception, that would be big news; it seems to violate our understanding of science and would be worthy of a New York Times headline at the very least. Similarly, if the Bible were really predicting things reliably thousands of years later, that's indeed news. The second thing was that it seemed pretty clear the editors of the journals did not believe these papers; it seemed to me they were publishing them out of a kind of sense of fairness or obligation. Like, if we don't publish this paper just because we don't believe it, then we're inserting ourselves into the process. So I think they were published for kind of procedural reasons. Of course, I don't think the editors of the journals noticed the problems with the papers. Now, again, I'll focus on the ESP paper and mention the Bible Code paper a little. The ESP paper and the Bible Code paper have since both been refuted, both empirically and methodologically. They've been empirically refuted in that people have tried to replicate the ESP studies and don't find anything. The ESP paper was methodologically refuted because people went back to it and realized that the author was using a lot of forking paths. If you read it, the ESP paper looks impressive, because it has, I believe, nine different experiments, all of which report successful results. But if you look carefully, the different experiments use different methods to analyze the data in different ways. So we understand, methodologically, how it was possible for researchers like that to get apparently statistically significant results: it kind of looks like the probability of that happening should be one in 20 to the ninth power, but that's not the case at all. So, empirically refuted, and also refuted at a methodological level. I guess you could say there was a theoretical refutation only in that there was never any serious mechanism proposed for this to work. The Bible Code paper was empirically refuted not by trying to use the Bible to make more predictions, but because some skeptics applied the same method to other books, I think they used War and Peace and other long books, and you can find similar patterns. And again, the methodological refutation was that there were forking paths: there were enough things you could look at that you could find a pattern. At the time we referred to it as a very easy word-search problem, where you can kind of make up your own rules for what counts. Now, if you look at these two papers, it's obvious how bad they are. One thing that's striking is that when the ESP paper came out, there was this whole thing like, well, we can't just reject it; it's using standard methods, that wouldn't be right.

But you look at it carefully, and it's like, yeah, these were really bad methods. And it's a little like the story of how Arthur Conan Doyle was fooled by the fairy photographs. The author of the Sherlock Holmes stories was fooled by obviously fake photos that some kids made showing fairies flying around in their garden. These are not good fakes; they're just obviously fake photos. Sherlock Holmes was very skeptical, but his author, not so much. What's striking is not just that he got fooled, but that it wasn't close. This wasn't like a fuzzy recording that might sound like something; we're not talking about the Zapruder film or anything like that, right? Very clearly, in retrospect, bad. But at the time, people didn't realize. So what I wanted to get to was: what if you're a journal editor in this position? If you see the paper is really bad, you reject it; that's easy. But what if you don't realize it? And that will happen. Because even if you do see the paper is really bad and you reject it, and you say, this is really bad, you did all these things wrong, do you think the author is going to say, oh, shoot, I did all these things wrong, I was wrong? No, of course not. The author is going to be very annoyed at your review report, and the author will send it to another journal, and then they'll keep sending it to another journal until a journal publishes it. So all you have to do is find one. So what is the journal editor supposed to do? What's the right answer? There are two answers that won't work. One answer is just: don't publish it, because it's ridiculous. ESP? Cornell students? Like, come on, that's a joke, right? But that's not good, because if you follow that rule, then you lose your chance at big discoveries. Right, FOMO. The other extreme would be to say: just publish it, be open-minded, science progresses by refutation, what's the big deal, so what if they published these papers? The problem is that there would be nothing wrong with publishing everything, but they don't publish everything. So there's an asymmetry: will they publish a paper saying, I did an experiment and I found that Cornell students have no ESP? No, they're not going to publish that, because we already knew that. Are they going to publish experiments showing that you cannot predict recent elections from the Bible? No, they're not going to do that. And so you get a bias in favor of false things. So that seems like a problem, too. That was the quandary I came to when I started writing this piece.
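A minimal simulation sketch of the forking-paths point above: under purely null data, giving the analyst even a handful of defensible analysis choices per experiment pushes the per-experiment false-positive rate well above the nominal 5%, so nine "significant" experiments are far less surprising than the naive one-in-20-to-the-ninth calculation suggests. The numbers of subjects, analysis paths, and the correlation between paths below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects = 100   # hypothetical subjects per experiment
n_analyses = 8     # hypothetical number of defensible analysis choices ("paths")
n_sims = 20_000    # simulated experiments, all with zero true effect

def significant(x):
    """Nominal two-sided 5% z-test of 'mean = 0'."""
    z = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
    return abs(z) > 1.96

false_positives = 0
for _ in range(n_sims):
    base = rng.normal(0.0, 1.0, n_subjects)          # null data: no effect at all
    for _ in range(n_analyses):
        # Each "path" is a slightly different coding of the same data,
        # modeled here as the base data plus extra noise (so paths are correlated).
        path = base + rng.normal(0.0, 0.5, n_subjects)
        if significant(path):
            false_positives += 1
            break                                     # analyst stops at the first "hit"

rate = false_positives / n_sims
print("Nominal per-experiment false-positive rate: 0.050")
print(f"Rate with {n_analyses} forking paths:       {rate:.3f}")
print(f"Naive chance of nine significant results:   {0.05 ** 9:.1e}")
print(f"With this one source of flexibility alone:  {rate ** 9:.1e}")
```

Even this single mechanism raises the chance of a run of "successes" by several orders of magnitude; combine it with flexible definitions of what counts as a success and selective reporting, and nine nominally significant experiments stop being strong evidence.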

John Bailer So, you know, when I was reading this, it's funny, because I wrote down "journal FOMO" in my comments. It seems like there's this real incentive to try to get the attention that will get you the coverage in the Times, so it's kind of this incentive process. And I like how you explored this with the idea of these motivations, the journal motivations, like you noted, a sort of fairness and open-mindedness and not censoring things. But there's also this idea of, well, you want to make sure that you get out there and get this claim, just in case it's there, because it's going to bring your journal a lot of attention.

Andrew Gelman I'll also say I agree, but I don't want to frame it as the journal editor being kind of greedy for fame or whatever. It's more a combination of factors that reinforce each other, but part of it is a kind of bending over backward: like, oh, this is so ridiculous, I really shouldn't be unfair to it. And that's where, again, you get these kinds of asymmetries, because there are all sorts of ridiculous things that they don't publish. So who decides what the ridiculous thing is?

John Bailer So let me follow up. You also commented on and stepped through the refutation, based both on this empirical exploration and on a kind of deep, careful methodological look and review. At one level that could be a sign of science working. But in some sense, I don't think that kind of refutation ever gets the attention, the splash, that the incorrect claim does.

Andrew Gelman Yeah, well, the refutation gets less attention. With these kinds of studies it hardly matters, because not many people were believing in ESP and the Bible Code, or if they were, they weren't really doing anything with it except believing it. A lot of people passively believe in biblical fundamentalism and extrasensory perception; I looked it up, and surveys say 45% of Americans say they believe in ghosts, 65% believe in the supernatural, and 65% believe in God. So people believe in things that there's disputed evidence for, and people believe in things that aren't explained by the usual laws of physics. But the refutation doesn't matter, because someone's not going to stop believing in ghosts just because you shot down some particular ghost, right?

John Bailer Yeah, but vaccines. Casting suspicion on those, that's a pretty scary one to have pushed out there. And then...

Andrew Gelman No, I agree. And you do get these big-if-true theories, right? Like, it would be huge if ESP were really happening. But there are theories that were proposed and refuted scientifically. There are some famous examples in psychology, like certain aspects of social priming: there were studies that were promoted claiming that certain very small, imperceptible interventions could have large effects on behavior, and many of them have been refuted, but the ideas are still out there. So yeah, I agree. But that's unavoidable, too. I don't know; that's a separate question, let me put it that way. Like, how would there be a way for refuted things to go back to the swamp where they came from, rather than just kind of living forever? That's a good question.

Rosemary Pennington 12:47
You're listening to Stats and Stories, and today we're talking with Columbia University's Andrew Gelman about the ethics of publishing research, particularly Big If True research articles. You talk a lot in your column about the role of editors, but I wonder, too, about the process of peer review. I do a fair amount of reviewing in communication journals, and there's always a sort of tension where, as the reviewer, you're trying to comment on the research but also couching things. Like, if something feels a little funny, you're thinking, well, there could be something interesting here if the author did a bit more digging, although there might not be. You're always sort of trying to couch your critique a little bit. And I wonder what we might do in relation to the peer review process itself to help with some of this. Obviously, you have the issue of the editor, who might feel conflicted about whether to let some of this stuff through, but at some point those of us who are doing the reviewing also have to figure out how to handle things that come through that could seem, on their surface, like they were done correctly, but then also seem kind of questionable.

Andrew Gelman
I think it would be better to have open reviews, and some journals do that now. Because we don't actually know what the reviews said here. Like, did they say who reviewed them? Were they reviewed at all? Where did this go? When I review a paper, I just remember that I'm providing information. I used to be asked to review a paper and think, oh, this is a heavy responsibility, do I think the paper should be accepted or not? But then I realized that's not my job. I'm providing information to the editor. So I get the paper, I spend, you know, five or ten minutes, I write a review, and then that's information that the editor can use as they see fit. More generally, I prefer the idea of post-publication review. Pre-publication review is inefficient because every paper gets reviewed multiple times, and some of the worst papers get reviewed more because they get sent from journal to journal to journal. Meanwhile, the papers that appear in top journals get the least scrutiny and the most attention: they got three reviews, they appeared, and then they're out there. Post-publication review has the delightful feature that people only do it if they feel like it, and so the papers that are more interesting will get more reviews. So it seems to naturally solve that problem.

John Bailer
I liked your idea about this option. You had talked about a couple of different ideas for what an editor might do, but the idea of publishing the data, the raw data, without necessarily publishing the article, seemed like a really pretty fascinating idea. Although then I found myself thinking about how people are going to get credit for their contributions. How is this going to be evaluated in the system? But it does seem, particularly for these challenging, potentially big-if-true stories, that it makes for a pretty reasonable option. Can you talk about why you thought that?

Andrew Gelman
So, I would think that if these papers have value, well, sure, the authors will get credit for sticking their necks out and making bold claims, you know, go for it, but the scientific value would be in their data and their analysis, so they should supply their data. There are kind of three parts to it. First, the data should be public, the raw data. Sometimes there's some kind of confidentiality, say you're doing a study of dangerous practices, but that's not the case in any of these. So the first thing is to make the data available, which isn't necessarily done. The second, though, is to take the burden away from the author, because right now what's required is a kind of coherent argument. What I wrote was that the usual way to go is to publish a highly polished article making strong conclusions, along with statistical arguments all pointing in the direction of those conclusions, basically an expanded version of that five-paragraph essay you learned about in high school: here's what we're going to tell you, here it is, here's the evidence supporting it, and by the way, at the end, this is what we just told you. That's what students and researchers are told; that's how you're supposed to write a scientific paper. You get a lot of advice, but you don't get much advice on how to write a scientific paper if you think what you found is probably wrong; they don't tell you much about that. People do put limitations sections in their papers, and that's good, so I don't mean they're terrible, but the limitations section is not always the main point. So if you can get credit for publishing the data, then you don't have to be certain about the conclusions. Now, of course, a top journal might say, we're not going to publish this, because it's a pile of data and it's not clear there's really any ESP. Well, if they really feel that way, they shouldn't publish it, right? These people can publish it somewhere else. If the top journal feels it needs to be convinced, that's another story, and they can request a replication. If you want to run the Boston Marathon, you first have to get a good enough time in another marathon. So if you want to publish in the top journal, maybe what happens is they ask you to register the study and publish it somewhere else first, and then the top journal says, well, this is sort of suspicious, so we're going to ask you to do that. That can be done. There are a lot of options once you recognize the goals that the journals have.

Rosemary Pennington You've been writing this column about the ethics of statistical research for a while now. Can you talk a bit about how that got started, and why you felt compelled to start writing about it?

Andrew Gelman
It started, I think it was 2012, and the editor at the time asked if I wanted to do a column. I wanted to do a column about ethics and statistics, because I was bothered by how people talk about ethics and statistics and how people talk about ethics more generally. I would say my problem with how people talked about ethics and statistics back then is similar to my problem with how people talk about ethics in data science now. I felt like there were three things going on. The first was that ethics is important. It's clear that unethical behavior exists and that we personally sometimes face ethical trade-offs; it's an important topic. The second is that we are not trained as ethicists. I'm really skeptical that it means anything to be trained as an ethicist, for that matter. Some people know more about the topic than others, but I don't think it's a technical subject in the same way as being trained to be a veterinarian or a physicist or a truck driver or whatever; I don't think of it quite that way. But we certainly are not used to thinking systematically about ethics, even though we know it's important. The third is a tendency to focus on a few specific flagship problems, which flattens the discussion. So let me talk about how I came into ethics and statistics. I felt that the discussion of ethics and statistics focused on questions like: is it ethical to do a randomized trial if you think the treatment may be better? People say things like, you need equipoise between the treatment and the control. If you think the control is better, it's unethical to give the treatment, and if you think the treatment is better, it's unethical to give the control. But from a Bayesian standpoint, equipoise makes no sense to me, because if you have equipoise and you get one data point, you no longer have equipoise. And also, just from a real-life standpoint, that's not the case; obviously we're in different situations and different settings. There are real ethical questions, like when do you stop the trial, but I felt that the framing was kind of naive. And then when people would try to rescue it by saying, well, this is an ethical design if the probability is in this range and it's not statistically significant, I felt like they were missing some of the big parts of the story. In this case, a big issue is a kind of trolley problem: it's a trade-off between the people in the study and the general population. So ethical questions ultimately have to revolve around how many people are in the population and what's happening to the people in the study. I don't mean that there's a formula or an easy answer, but to answer it without talking about the population, and without talking about the people in the sample, seems wrong to me. Now, the second thing that bothered me was what I call the L.A. Law version of ethics. It's an old TV show, and in the show there'd be good guys and bad guys, and often the lawyers would have to defend the bad guy.
You get these very subtle ethical questions, right? Like, what if I'm a company and I'm polluting by sending sludge into the river? Is that unethical? Well, it's legal. So would it be unethical if it's legal? If I don't do it, my company will go out of business, I'll be defeated by competitors from countries that lack environmental laws, and I'll have to fire everybody. So really, it would be unethical not to dump this sludge, right? Or what if it is illegal? Well, then you pay a fine; it would be unethical for me not to dump this sludge and pay the fine. What if I could bribe some politicians and get them to look the other way? Well, maybe that's the most ethical way, because of the alternative. You can get into this kind of extreme consequentialism, in which you can make any clever argument. And I associated that with a lot of ethics talk. I associate ethics talk with either very specific arguments that are missing huge parts of the problem, or a kind of clever story where you can make an argument for anything, a debate-team approach to ethics. So I had this idea that I wanted to do something different. I talked with my wife, Carolyn, who teaches social work and has taught classes on ethics, and I gave her my complaints. And she said, that's not what ethics is about. When I said it's about people justifying all these things, she said no, ethics is about people making decisions in difficult situations and balancing things out. In social work this is very clear, because social work is all about the person in her environment, and that's how it's distinct from various academic fields. So I started with my first ethics column by asking, what is an ethical dilemma? I wanted to be very clear: an ethics problem arises when you are considering an action that, first, benefits you or some cause you support; second, hurts or reduces benefits to others; and third, violates some rule. It seemed to me that's what it's about: there's a trade-off and a rule. Of course, you can have trade-offs without the rule, but typically there is something that you're supposed to do. And I felt that discussions of ethics in statistics tended to be bloodless, while discussions of ethics in journalism tended to be too clever and too focused on the idea that you can come up with an argument for anything. So I wanted to write a column with specific ethical dilemmas that involve these trade-offs and try to think about them from scratch. That was my goal.
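A minimal numerical sketch of the equipoise point above, under an assumed toy model: give each trial arm a uniform Beta(1, 1) prior on its success probability, so the prior probability that the treatment is better is exactly 50%, then observe a single success on the treatment arm. The posterior probability already shifts to about two thirds, so exact equipoise cannot survive even one data point.

```python
import numpy as np

rng = np.random.default_rng(0)
draws = 1_000_000

# Assumed toy setup: uniform Beta(1, 1) priors on each arm's success probability.
# Under this prior, P(treatment arm is better) is exactly 0.5 (equipoise).
p_treat_prior = rng.beta(1, 1, draws)
p_ctrl_prior = rng.beta(1, 1, draws)
print("Prior P(treatment better):    ", (p_treat_prior > p_ctrl_prior).mean())  # ~0.500

# Observe one data point: a single success on the treatment arm.
# The treatment arm's posterior is Beta(2, 1); the control arm is unchanged.
p_treat_post = rng.beta(2, 1, draws)
p_ctrl_post = rng.beta(1, 1, draws)
print("Posterior P(treatment better):", (p_treat_post > p_ctrl_post).mean())    # ~0.667
```

The exact posterior probability here is 2/3. The specific model is hypothetical, but the qualitative point is the one made above: a state of exactly 50/50 is gone as soon as any data arrive, which is why equipoise is argued to be the wrong frame for trial ethics.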

Rosemary Pennington
Do you feel like your thoughts about ethics have changed at all since you started writing the column?

Andrew Gelman Well, what I've spent my time writing about has changed in some ways, I guess. When I started, I was becoming more aware of a lot of bad research that had been published, but it was less of a concern. The example I gave in the first column was something that had happened decades earlier, when I was in grad school. I had read a paper; it didn't have a million citations, but in a certain subfield it was being much discussed. I went to the library, found the article, typed in the data, it wasn't raw data, but various estimates and standard errors, made some graphs, and realized that they could have done better in their analysis. I contacted them and asked if they could share the data, and they said no: everything was in lab notebooks, and they would have had to photocopy it. It wouldn't have been that hard, but they didn't; they just wouldn't. So I was really thinking that paper was not scientific fraud, or anything close. I just don't think they 100% knew what they were doing. But at the time I was thinking of science reform mostly in that way, like we should be able to do better if we have more data available. I was just less aware of there being a systematic problem of bad science. So that's a whole angle of ethics that has come up a lot, which wasn't originally what I was thinking about. Some of the things we've written about have just been flat-out people lying, making up data. I don't want to talk about that all the time, because if you talk too much about fraud, or about cases that maybe aren't fraud but involve no malfeasance, if you talk about misconduct all the time, then people who don't commit flat-out misconduct think they must be doing everything right. And then conversely, if you point out that someone did something wrong, they think you're accusing them of misconduct, which is frustrating. But yeah, some of it's just that, just bad numbers out there.

Rosemary Pennington Well, that's all the time we have for this episode of Stats and Stories. Andrew, thank you so much for joining us today.

Andrew Gelman Oh, sure. Thank you for inviting me.

Rosemary Pennington Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories.