The Statistical Detective | Stats + Stories Episode 226 / by Stats Stories

Kristin Sainani (née Cobb) (@KristinSainani) is an associate professor at Stanford University. She teaches statistics and writing; works on statistical projects in sports medicine; and writes about health, science and statistics for a range of audiences. She authored the health column Body News for Allure magazine for a decade. She is the statistical editor for the journal Physical Medicine & Rehabilitation; and has authored a statistics column, Statistically Speaking, for this journal since 2009. She is also the associate editor for statistics at Medicine & Science in Sports & Exercise. She teaches the popular Massive Open Online Course (MOOC) Writing in the Sciences on Coursera, and also offers an online medical statistics certificate program through the Stanford Center for Professional Development. She was the recipient of the 2018 Biosciences Award for Excellence in Graduate Teaching at Stanford University.

Episode Description

No matter how careful a researcher or statistician is, there's the possibility that an error exists in reported data. The trick as a reader is figuring out how to identify errors and then understand what they might mean. Learning how to be a statistical detective is the focus of this episode of Stats and Stories with guest Kristin Sainani (née Cobb).

Full Transcript

Rosemary Pennington
No matter how careful a researcher or statistician is, there's the possibility that an error may exist in reported data. The trick as a reader is figuring out how to identify errors and then understand what they might mean. Learning how to be a statistical detective is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist John Bailer, Chair of Miami's Statistics Department. Our guest today is Kristin Sainani. Sainani is a professor at Stanford University, where she teaches statistics and writing, works on statistical projects in sports medicine, and writes about health, science, and stats for a range of audiences. She's also the statistical editor of the journal Physical Medicine & Rehabilitation and has authored a statistics column, Statistically Speaking, for the journal since 2009. In 2020, she published a Statistically Speaking column on how to be a statistical detective. Kristin, thank you so much for joining us today.

Kristin Sainani
Thanks for having me.

Rosemary Pennington
I guess, just to get started, I'm wondering sort of what made you feel like you had to write this column about being, I keep wanting to say, a data detective, because I like the alliteration, but this statistical detective?

Kristin Sainani
Yeah, I feel like a lot of what I do in my work is being a statistical detective. I do a lot of statistical review, and I also sometimes will vet things for journalists behind the scenes, and just in reading papers, I end up playing that role a lot. And I think what's interesting is, you know, people get really intimidated by statistics, and I think a lot of people don't feel confident in their ability to vet statistics. And I like to tell them that with the majority of the errors I'm detecting in statistical review, it's not, you know, some subtle problem in a fancy statistical model. It's things like: you got your zero and one backwards, you've miscoded something, you've lost half your dataset from table one to table two. They're really basic things that you don't need a statistics degree to detect. So I thought it would be fun to write a column about it, here are some tips and tools you can use that rely largely on common sense and, you know, adding and subtracting, to detect what should be obvious errors in papers. But these errors often slip through.
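A minimal sketch of the kind of "adding and subtracting" check described above, with entirely invented table names and counts: do the group sizes reported in one table match those in the next, and do subgroup counts sum to the stated totals?

```python
# Minimal sanity checks of the kind described above: do the group sizes
# reported in "Table 1" and "Table 2" of a paper agree, and do subgroup
# counts add up to the stated totals? All names and numbers are invented.

table1 = {"treatment": 120, "control": 118}    # n per group as reported in Table 1
table2 = {"treatment": 61, "control": 118}     # n per group as reported in Table 2
subgroups = {                                  # subgroup counts that should sum to Table 1
    "treatment": [30, 29, 31, 30],
    "control": [60, 58],
}

for group, n1 in table1.items():
    n2 = table2.get(group)
    if n2 != n1:
        print(f"{group}: Table 1 reports n={n1} but Table 2 reports n={n2}")
    total = sum(subgroups.get(group, []))
    if total != n1:
        print(f"{group}: subgroup counts sum to {total}, not the reported n={n1}")
```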

John Bailer
You call it stat detective, but you know, as I hear you describing it, I think you're almost being a forensic specialist. You're trying to figure out who killed this particular study, and why did they do it, or maybe how they reported it?

Kristin Sainani
If you like mysteries and that kind of thing, it's actually fun, right, to dig back to the original paper and see what went wrong and try to figure it out. Yeah, exactly.

Rosemary Pennington
The question I have, when I was reading this, is, you know, there are so many of us who are in a position to do reviewing. I do communication research, and I do a fair amount of reviewing for journals. And occasionally there are studies that come across my table where I'm like, this seems okay, but there's always this moment where I'm like, I am not quite sure if I have the statistical depth to really understand what the data is saying. It seems like it's saying what it's supposed to be saying, but there's always that kind of struggle as you are reviewing something, because you want to make sure you're not allowing bunk through. I guess, what sort of tips do you have, coming out of your own experiences or maybe from this column, for those of us who are not as statistically literate as we try to review the stats in the work we look over?

Kristin Sainani
So I think, first of all, just, you know, trust your gut, and if something seems wrong, you definitely want to double-check it and use common sense to ask basic questions. Do the numbers make sense? Do they seem plausible? And again, these aren't things that require a degree in stats. In my teaching of statistics, I try to teach my students to be a little skeptical of everything they see. I think there is a lot of bunk in the literature, and approaching things with a little skepticism is always a good thing.

John Bailer
So it's funny to hear you say that, because I often think about a lot of these intro classes as being courses in data self-defense. That's part of what you're trying to help people do. But as you were talking about some of these ideas, like the common sense or the simple arithmetic or some of the stat-checking programs, could you kind of dive in and give some examples of when common sense gave some insight about a particular problem in a paper, or a case where the simple arithmetic, you know, plays out for you?

Kristin Sainani
Right, and actually a great example is that after I published that column, I did a webinar on the same topic. I was pointing out tips, things like: you should look for outliers in scatter plots, and sometimes outliers can have an undue influence on your statistics, and they're often easy to catch. Right after the webinar, I was contacted by a grad student and his advisor from Australia. They were in strength and conditioning research, and they said, hey, we tried applying some of your tips to one of the most highly cited meta-analyses in our field; we think we found an error, can you look at it for us? So sure enough, I pulled up the paper, and if you look at figure one, there was a scatter plot. This is a meta-analysis, so the data points represent groups of athletes from different studies. Right away, you can see there's this outlier in the upper left-hand corner, and it's clearly making the correlation coefficient bigger than it should be. But even worse than that, if you just stopped and thought for a minute about what the numbers meant, what that dot represented, somebody should have recognized before this that it was an obviously implausible data point. The x-axis was looking at improvements in sprinting ability in a group of athletes, and the y-axis was looking at improvements in squatting ability. They were in standardized effect sizes, so standard deviation units, and the coordinates of this outlier were a five standard deviation improvement in sprinting and, wait for it, a 15 standard deviation improvement in squatting ability. And that is, just at face value, not possible.

John Bailer
So did you try that routine yourself?

Kristin Sainani
Right? Like, what were they giving these athletes? So for those who don't think in standard deviation units, to translate that back to the original units, we actually pulled the underlying paper, and it would have meant that this group of athletes, on average, went from being able to squat about 120 kilograms to something like 250 kilograms, you know, overnight. So it was obviously an error. We dug into the paper, and we figured out what had happened: when they were calculating the standardized effect size, the authors had put standard error, rather than standard deviation, in the denominator. Turns out that's going to make your effect sizes artificially large. This is an error I have seen a lot when reviewing meta-analyses. And this led us to find some other errors in that paper, and then to ask the question, you know, how often do these errors occur in meta-analyses in their field? So the grad student put together the 20 top-cited meta-analyses in strength and conditioning research, and we systematically reviewed them for errors. Turns out about 45% of them had at least one effect size where they had swapped standard error and standard deviation, and 85% of the articles had at least one of the statistical errors we were looking for. So really, you know, again, a 15 standard deviation improvement shouldn't make it through review. Editors should see it and recognize it as wrong, and yet it gets through review. People don't notice it.
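A minimal worked sketch of the arithmetic behind that error, using invented numbers rather than the data from the meta-analysis in question: dividing a mean change by the standard error of the mean instead of the standard deviation inflates a standardized effect size by roughly the square root of the sample size.

```python
import math

# Invented pre/post data for one study in a hypothetical meta-analysis,
# illustrating the standard deviation vs. standard error mix-up.
n = 25                  # athletes in the group
mean_change = 12.0      # mean improvement in squat, kg
sd_change = 15.0        # standard deviation of the individual changes, kg
se_change = sd_change / math.sqrt(n)   # standard error of the mean change

correct_es = mean_change / sd_change   # standardized effect size (change / SD)
inflated_es = mean_change / se_change  # the error: change / SE

print(f"correct effect size:  {correct_es:.2f}")   # 0.80
print(f"inflated effect size: {inflated_es:.2f}")  # 4.00, larger by sqrt(n) = 5
```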

Rosemary Pennington
As you're talking about this, you know, we're living in this moment where people are so skeptical of science, right? And it's a place where you want to tread very carefully, because you want people to, you know, trust data, trust information. And yet it seems like there are all these ways where things can get tripped up. And I wonder, as you're doing your detective or forensics work, as John talked about it, how do you navigate talking about this in a way that doesn't make people go, oh my God, science is terrible, we can't trust anybody? How do we talk about this in a way that's productive and fruitful and doesn't just, you know, feed all the worst impulses?

Kristin Sainani
That is a great question. There is really, yeah, this tension of wanting to point out the errors but not throw out the baby with the bathwater. It doesn't mean all of science is bad. It does mean that there are a lot of published papers that I don't trust, and being able to differentiate well-done science from bunk is a really important skill. But I think pointing out these errors and being transparent actually, at the end of the day, is going to increase confidence. I mean, it's a long-term process, but if you are more transparent, and you admit when you make errors and correct them, so that science is a self-correcting process when done right, then that's eventually the thing that builds trust in science. So hopefully, in the long run, this process of detecting errors and fixing them and being transparent actually increases trust.

John Bailer
Yeah, that's certainly an aspiration, and I think that's a great point, really an outstanding point. When you're talking about these standardized effect sizes of 15 or five, you know, I was just thinking, gosh, I hope that a student from an intro stat class would have that kind of gut-punch reaction to something that large. And, you know, the fact that you're talking about established scientists and journals reviewing this and not having that, like, holy cow, this couldn't possibly be right, reaction. So I like that. It's common sense, but just using basic fundamental principles that you've learned in an intro class about data analysis to do a quick check on your first read seems like part of what's embedded in what you're suggesting?

Kristin Sainani
Absolutely. And I think it says something not so great about how statistics is taught a lot of the time, in that we have students, researchers, who wouldn't recognize this. I think some of it boils down to the way statistics gets taught, sometimes in a very cookbook fashion, unfortunately. It doesn't emphasize thinking and asking questions and being skeptical and thinking about what the numbers mean. And that's a big problem.

John Bailer
That helps a lot. That example for the common sense piece was really interesting and helpful. The second point in your data detective piece was touching on the idea of simple arithmetic. So what's an example of where that kind of simple arithmetic has helped you catch some of these unusual or inappropriate or incorrect values?

Kristin Sainani
Yeah, so another fun example, I think, comes from the health column I wrote for Allure magazine for many years, which is actually a magazine that focuses on beauty. I would cover topics like obesity and exercise and skincare, so I was always looking for headlines about exercise or, you know, how people can eat better. And there was a headline that came out, it was something like, exercise labels beat out calorie counts when keeping teenagers away from junk food. The idea was kind of interesting: instead of labeling food with, this has 200 calories, what if you were to put on the food, you need to run for an hour to burn off the calories in this food? That might motivate people more to stay away from bad foods. So, really interesting idea. But when I went and pulled the paper, hoping I might be able to write about it for my health column, I looked at table one. This will require a little setup, but basically what they did is they went to four stores and counted up how often teenagers were buying sugary beverages as opposed to non-sugary beverages. Then they implemented some interventions. One of the interventions was they posted a sign that said, did you know that a bottle of soda has 200 calories, something like that. And then, for another few weeks, they posted a sign that said something like, did you know that you'll have to jog for 50 minutes to burn off the calories in a bottle of soda? So they counted up the sugary beverage purchases. Before the intervention, at baseline, 93% of the teenagers were buying sugary beverages. When they posted the calorie count sign, that dropped to 87%. When they posted the exercise label sign, it was 86%. Okay, 86% versus 87%. This is not hard arithmetic; those are virtually identical. There is no real meaningful difference between those numbers. So how does that get translated to exercise labels beat out calorie counts? It's this funny thing called an odds ratio. This is a statistic that comes out when you do a model called logistic regression, which is used for binary outcomes, and that's what the authors had done. They stuck their data into these models and reported odds ratios. Well, odds ratios are this funny measure that can be easily misinterpreted, and in fact, when you have common outcomes, like most teenagers buying sugary beverages, it can really exaggerate effects if you misinterpret the odds ratio. So that's what had happened: that 86 versus 87% got magnified through the misinterpretation of odds ratios. So it's a really good example where everybody can look at the difference between 86% and 87% and get that right, and then it gets kind of twisted around, and people see in it what they want to see in it. Oh, look, it's a big difference.
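A minimal sketch of the odds-ratio arithmetic, using only the percentages quoted above; the study's raw counts and adjusted regression model are not reproduced here. For a common outcome like this, a one-percentage-point difference between the two signs turns into odds ratios that are easy to misread as a meaningful gap.

```python
# Odds-ratio sketch using only the percentages quoted above (93%, 87%, 86%);
# the original study's raw counts and adjusted model are not reproduced here.

def odds(p):
    """Convert a proportion to odds, e.g. 0.90 -> 9.0 (nine to one)."""
    return p / (1 - p)

conditions = {
    "baseline": 0.93,
    "calorie-count sign": 0.87,
    "exercise-label sign": 0.86,
}

baseline_odds = odds(conditions["baseline"])
for name, p in conditions.items():
    odds_ratio = odds(p) / baseline_odds
    print(f"{name}: {p:.0%} buying sugary drinks, "
          f"odds {odds(p):.2f}, odds ratio vs baseline {odds_ratio:.2f}")

# The two signs differ by one percentage point (87% vs 86%), but their odds
# ratios relative to baseline (about 0.50 vs 0.46) are easy to misread as one
# intervention being meaningfully stronger than the other.
```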

Rosemary Pennington
You're listening to Stats and Stories, and today we're talking statistics and data with Stanford University's Kristin Sainani. Kristin, you mentioned that you have worked with journalists to help vet things, and I, you know, in a past life, was a journalist who did a lot of science and medical reporting. I did a reporting workshop in San Francisco on science writing, and one of the things that I'll never forget, because I also share this with my students when they're thinking about doing this, is that a way to help you navigate is to adopt a scientist, right? If you're looking at a study and you want to cover it, you can reach out to your adopted scientist and be like, help! Help! I have no idea what these numbers mean, John. Right. And it was beautiful. I was lucky; I was working in Alabama, at the public radio station associated with the University of Alabama at Birmingham, and there were a lot of really cool people at UAB who were just kind of helpful. And I wonder what that process is like for you. Do you work with journalists? What kinds of things are you helping them navigate through? And is there a particular moment you could talk about, when you felt like you had a really big impact on how something got reported because you helped a journalist work something out?

Kristin Sainani
Sure. Yeah. I mean, it's a fun process to help journalists behind the scenes to vet statistics, because sometimes things shouldn't be written about, or it's easy to get them wrong, and you feel like you're helping the journalist get things right. And I think it's also great, you know, for my scientist hat, too, because sometimes the journalists are pretty savvy and they can have a good gut feeling when they think, hey, something doesn't seem right here. So sometimes I end up using those examples in my teaching and my statistics column, or even end up writing a paper for the academic literature about some of them. So they end up being kind of interesting cases for me. Well, one that ended up being a much longer consult than I anticipated at the beginning: I was actually at a science writing conference in San Francisco, giving a talk on a panel with a journalist, Christie Aschwanden, who was then at FiveThirtyEight.com. She was writing a book on exercise science, and I happened to do my statistical work in sports medicine. So she said, hey, you know, I found this funny statistical method that's being used in sports science, and could you look at some papers for me? So I ended up looking at these papers for her. This was a method that had gained some popularity in sports science and was doing some funny things. I ended up looking at these papers and realizing there were some problems; they were basically misinterpreting confidence intervals. And then they had made some wild claims, like that their method could somehow improve both type one and type two error at the same time compared with standard hypothesis testing, when, of course, mathematically, type one and type two error, false positives and false negatives, trade off. So anyway, I had a whole phone conversation with her, and I left that conversation thinking I should write something up in the academic literature about this. So I ended up writing a paper about it, and that ended up being a whole thing, because there were people who really were wedded to this method. It was popular. I wasn't the first person to criticize it; there were other statisticians who had criticized it. But I ended up writing several more papers about it. So that really got me more into the area of sports science than I'd ever been before.
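A minimal illustration of the type one / type two error trade-off mentioned here, not the specific method or papers discussed in the episode: for a simple one-sided z-test with a fixed sample size and a fixed true effect (both values invented), tightening the type I error rate necessarily raises the type II error rate.

```python
from statistics import NormalDist

# Illustration of the type I / type II trade-off: a one-sided two-sample
# z-test with a fixed sample size and a fixed true effect (both invented).
# Lowering alpha (fewer false positives) raises beta (more false negatives).

z = NormalDist()
n_per_group = 20                      # hypothetical sample size per group
true_effect = 0.5                     # hypothetical true standardized difference
se = (2 / n_per_group) ** 0.5         # SE of the difference in SD units

for alpha in (0.10, 0.05, 0.01):
    z_crit = z.inv_cdf(1 - alpha)                  # one-sided critical value
    power = 1 - z.cdf(z_crit - true_effect / se)   # P(reject | true effect)
    beta = 1 - power                               # type II error rate
    print(f"alpha={alpha:.2f}  type II error={beta:.2f}  power={power:.2f}")
```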

John Bailer
Yeah, I think it's a really great story, to think about this kind of collaboration, this connection that you had with journalists, and then, wait a minute, there's more to the story, and that ultimately led to investigation and science on your part. That's a really neat outcome to have resulted. I want to get back to Allure magazine. John's a huge fan. I read some of your columns in preparing for our conversation today, and one of the things I was really struck by is not just that I really do look better, I am better looking, after a couple of drinks, but the fact that when you were reporting on that study, you essentially boiled it down to a paragraph and change. I mean, really, after the first lead-in, you were talking, in a paragraph, about a study that was conducted, some of the methods involved in the study, the results of the study and its implications, and actually setting up some biological reasons and explanations for why that might have occurred. I found that to be really fascinating and really difficult. So I just want you to talk a little bit about how you take a story like this, you know, this headline that really grabs your attention enough to write a piece about it, a very short piece, and then how do you boil it down to something so concise and so direct?

Kristin Sainani
Right. I mean, it's something we're not really good at as scientists, getting to the heart of the matter, and I wish that we were better at getting right to that take-home message and the most important points for the audience. I would often picture my audience: I was writing a health column for young women, so it might be somebody, you know, on the beach reading about this, and this is the piece of health information that they're going to get. So you had to boil down the whole study into what's most important for this audience. I learned a lot about writing, and about things I can take back to writing for any audience: how to boil things down, get the most important information in, write short. That's all very useful for scientists. I wish more scientists had that training, because maybe we could do a better job then in writing in the scientific literature. One of the things I spend a lot of time doing is trying to teach scientists to write better in the scientific literature.

Rosemary Pennington
I wonder what advice you might have for the individuals, like your readers, who are reading Allure or any other popular publication that's reporting on science, right? There's a lot of them. What advice would you have for the reader who's not statistically savvy, again going back to that person, about how to navigate these stories and be able to judge for themselves? Maybe they don't have access to the data, but is there a way of reading these kinds of columns or write-ups that might help them figure out what is really worth trusting? Because, you know, I used to read women's health magazines, and there would always be these articles about, you know, coffee is great for you, green tea is bad for you, and it just sort of changes a lot. How do you navigate that in a way that, you know, leaves you feeling informed and not frustrated or misled?

Kristin Sainani
I think, you know, just like I teach my statistics students, you always want to be skeptical. And you always want to understand, as a consumer of this kind of article, that this is one study at a time. Unfortunately, as journalists, we're often covering news, and it's some study that was just published, and that's isolated from the body of work. So you don't want to take any one study in isolation and read too much into it. Realize that there's a bigger body of work out there, and this one study is not the be-all and end-all, you know. So take things skeptically and don't overreact; I don't mean you should run out and drink a lot of coffee. I'm all pro coffee, but whether it's great for health, I love the studies that tell you that wine, chocolate, and coffee are good for your health, but I'm not sure I trust all of them.

Rosemary Pennington
I like those. I feel good about my choices when I read them.

John Bailer
I think your point about the single study, and, you know, the skepticism about it, but the fact that the single study is new and is news, and that it has to be in context. I'm wondering, if I were writing such a piece, how would I add that last sentence after the results of this study, that this is something that's been seen in some other work? I mean, in some ways that kind of wrap-up can connect to the relevance of why it's even a story in the first place.

Kristin Sainani
Yeah, it would be great if journalists always paid more attention to putting those individual studies in context, and there's a number of ways to do it. Sometimes it's great when you can interview scientists, and you can get the scientists themselves to put things in context. I really like it when you have a good quote that puts things in context for your reader or gives practical advice. A lot of what I did in my column was practical advice for a young woman on how to be healthy: here's some practical advice.

Rosemary Pennington
We're sort of running towards the end. I know you've talked a lot about this journalistic piece, and you mentioned a little earlier that you wish more scientists could write more to the heart of the matter. Before we leave, I wonder if you could share your thoughts on how a statistician or any researcher can actually do that, get to the point and communicate effectively. What has worked for you, and what do you think might be helpful for others?

Kristin Sainani
Yeah, I mean, I spend a lot of time talking about how we don't write well in the scientific literature and how that's a huge problem. And, you know, as a science journalist, I learned to demystify things, and that's what I feel like a lot of what I'm doing is. As a science journalist, your goal is to make something complicated as accessible as possible to as many people as possible. And when I move over to my academic hat, I often feel like we're doing exactly the opposite in academia, that it's almost like we're working really hard to make things obscure and to make things sound a lot more convoluted and complicated than they actually are. So I wish more scientists would think about the fact that the goal is not to obscure, to sound smarter, to make things sound really hard and fancy; your goal always should be to make things as accessible as possible to as many people as possible. And if we, as scientists, had that in the back of our minds more, just trying to write clearly and concisely, I think that would actually do a lot to improve science. Statisticians certainly are guilty of this too, you know, making things sound too dense. Could we simplify it and make statistics accessible to more people? That would do a lot to improve statistics, I think.

John Bailer
You know, I remember many years ago I found a copy of an editorial, I think it was from a physics journal, and it said that language is a scientist's instrument, used with precision. The idea of having that kind of focus of precision and care with language, as well as with the science that you do, to me was a really compelling point, and it's something that I continue to think about. I can't tell you where I found it, but I can tell you what I took away from it. And I think that's part of the heart of what I'm hearing in some of what you're saying.

Kristin Sainani
Yeah, it's so important. If you look at a lot of the scientific literature and read through it, I can't understand half the things that people write, and I'm a scientist with a lot of training. And it's because we don't write in a clear and concise style, for the most part, in the literature. That means all sorts of bad things: for one, nobody else can use your science, and you get a lot of bad science hidden in obscure language. It's hard to build on science that you can't even understand. It's why a lot of things slip through peer review, I think, that are not well done, because people look at them and they can't really be understood, so you can't really critique them well.

Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. Kristin, thank you so much for joining us today.

Kristin Sainani
Thanks so much for having me.

Rosemary Pennington
Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter @StatsAndStories, on Apple Podcasts, or other places where you find podcasts. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at StatsAndStories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.