How To Be An Ethical Data Journalist | Stats + Stories Episode 134 / by Stats Stories

Bekah McBride is a science writer and communications specialist who has worked with both companies and universities to turn data and research into applicable and actionable messages that inspire change. She holds a B.S. in Life Science Communication with an emphasis in Business from the University of Wisconsin-Madison and an M.A. in Journalism from the University of Nebraska-Lincoln. Her work has been published on DataJournalism.com and in Significance.

Rosemary Pennington: Understanding data has become only more important with the proliferation of social media and the circulation it enables of claims and counterclaims. Figuring out what claims are founded on data and facts and what aren’t can be a difficult task, one that, ideally, journalists should be able to help the public navigate. Data journalism and its ethics are the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a production of Miami University’s Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me are regular panelists John Bailer, chair of Miami’s Statistics Department, and Richard Campbell, former chair of Media, Journalism and Film. Our guest today is Bekah McBride. She's the Marketing and Communications Coordinator at the Nelson Institute for Environmental Studies. McBride is also a science writer and communication specialist who has a particular concern for the communication of data. She's recently authored an article for Significance Magazine about the do's and don'ts of ethical data journalism. Bekah, thank you so much for being here today.

Bekah McBride: Thank you for having me.

Pennington: Just to start us off, so we have some ground to start this conversation on: how do you define data journalism?

McBride: Sure. Well, I think that from a technical standpoint, data journalism can be defined as obtaining, cleaning and analyzing data for use in telling stories. But I think on a larger level, it's a really great way to bring voices to the forefront, to understand the current state of affairs going on in the world, to hold a magnifying glass up to some systemic issues and really showcase what’s going on. But I also think that when it’s done in a way that isn’t ethical, it can really perpetuate stereotypes and systemic issues.

John Bailer: You’ve talked about a number of best practices when you’ve written about data journalism, ranging from the idea of inconsistencies and missing pieces and unpacking concepts to the golden rule of journalism, becoming educated, and then one of my favorites- I had to save it for last- beware the bell curve. You know, could you talk a little bit about that and maybe some of the other practices that you mentioned?

McBride: Yeah, that’s one of my favorites as well. So, basically when I started this project, I really wanted it to be something that could be used in education. I wanted anyone from a high school journalist up through a professional to be able to take what I had learned and utilize it really readily, and so I turned it into a five-step process. So, the first one is interrogate the data. And so, in terms of that, what I mean is data and facts are not synonymous. So just because a scientist or someone has a data point, it doesn't necessarily mean that we should take that at face value. So, I encourage folks to kind of look at how the data was gathered and who might have been left out of the gathering of the data, and identify those inconsistencies, and when we find them, as journalists, we should showcase that to the reader. The second part is to create context. So, I think that a good part of, you know, great reporting is showcasing not just what are these numbers, not just the visual graphic, but what do these numbers mean? Who is represented in the numbers? Going out and interviewing folks who may be represented by the numbers and getting that kind of real story is important. One example might be, you know, we see a lot of data on the education gap. So instead of just saying well, here's the gap, we could say well, what are schools doing about that gap? So, I think unpacking those concepts and creating that context is key. The third one is your favorite and my favorite too: beware of the bell curve. So, in terms of that, what I mean is sometimes we’ll see a case of a journalist going out and interviewing perhaps one or two people from a particular community that may have been impacted by that dataset, but those folks could be on the tail end of that bell curve. That might not be representative of the general public.
And so I think it's important- and what I learned through my research- is to go out and look for folks who are from different age ranges, different financial backgrounds, different education backgrounds, really understand what that group looks like and don't just take one person or two people’s word for it. The fourth one is avoid harm. And that’s a pretty basic one for journalism. We have lots of codes of ethics. For example, I am a science writer, so I look at the National Association of Science Writers’ Code of Ethics. Going to school for journalism, I looked at the Society of Professional Journalists’ Code of Ethics. There are lots of codes of ethics out there, but I think at the end of the day the core value is just to avoid harm to those in the news, and to do what's ethically correct. And then the fifth one is just that education is key. As I was doing my research I found that there really aren't yet a lot of classes available; there just aren't a lot of resources available for students who are interested in not just being a journalist, but really looking at data as a journalist. One of the particular studies I found reported that of the 113 schools surveyed, only 59 offered any class related to data that was available to journalists. And so I think that part of what I was trying to put out there is like hey, as journalists, we could do a little bit of a better job of preparing students for dealing with the data that they're very likely going to encounter in today's world.
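As a rough illustration of the "beware of the bell curve" step (a sketch, not something discussed in the episode), a reporter could check whether the people they interviewed sit near the middle of a dataset or far out in its tails. The function name `tail_flags`, the threshold of 1.5 standard deviations, and all of the numbers below are hypothetical and chosen only for demonstration.

```python
from statistics import mean, stdev

def tail_flags(values, interviewees, threshold=1.5):
    """Return {name: z_score} for interviewees whose value lies more than
    `threshold` sample standard deviations from the group mean."""
    mu = mean(values)
    sigma = stdev(values)
    flags = {}
    for name, value in interviewees.items():
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flags[name] = round(z, 2)
    return flags

# Hypothetical household incomes (in thousands) for a community in a dataset.
community = [32, 35, 38, 40, 41, 43, 45, 47, 50, 52, 55, 58]

# Hypothetical interviewees and their incomes (in thousands).
sources = {"Source A": 44, "Source B": 95}

# Anyone listed here is far from the typical member of the group.
flags = tail_flags(community, sources)
print(flags)
```

A flagged source is not necessarily a bad interview; the point is only that their experience may not stand in for the whole group, which is a reason to seek out additional voices.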

Richard Campbell: Bekah, a lot of your writing, especially the Significance Magazine piece, reminded me of a lot of the similarities there are between statisticians and journalists. And one that came to mind, which I think deals with ethics, is that a lot of times journalists feel like their obligation is just to tell the story, tell what's happening and get out of the way, let readers decide. And I think some statisticians might believe that too, John can speak to that, but do journalists and statisticians have obligations beyond just the facts? And this might speak to your notion of the larger context. Can you talk a little bit about that?

McBride: Yeah. So as I was doing my research, you know, I spoke to many different people, from statisticians to journalists to professors, and I really didn’t find anybody who didn't agree that journalists do have that kind of ethical standard where they need to create that context. In particular, I was pointed to a research study from Stanford that was done in 2014, and it was called the Racial Disparities in Incarceration Upping Acceptance of Punitive Policies. And basically, what it did is it went out and it told people lies about punitive policies, just to see how that would impact their perceptions. And what they found is that exposing people to extreme racial disparities in the prison population actually increased their fear of crime and increased their acceptance of the policies that lead to those disparities. And so, to me what that means is that as journalists, if we’re just putting something out there and leaving it up to the audience to create that context, we’re possibly perpetuating stereotypes and even increasing fear. And so, I think that ethically it is our job to create context and showcase why these disparities are occurring, what could possibly be done about them, and show that these are real humans; they're not just numbers that we’re talking about.

Bailer: You know, when I was thinking about what you were talking about with ethics in the context of data journalism, I found myself looking at all these other statements that you see in statistics, for example, in practice. So, the ASA, the American Statistical Association, has ethical guidelines for statistical practice; the UN has Fundamental Principles of Official Statistics that include statements associated with, you know, collecting, processing, storing and presenting data in responsible ways, and the idea of individual data being strictly confidential. I even saw that Coursera is hosting a data science ethics class that’s starting soon and is regularly offered. This seems to be even more visible in practice now, with the ease with which data can be obtained, and often the ease with which it’s obtained without the knowledge of the people whose data are being obtained. So how is this balance dealt with in terms of practice, and in terms of where you can secure data from many different sources without ever having any kind of approval or informed consent? How does a data journalist wrestle with this?

McBride: Sure, so when I was interviewing folks, one person I worked with in particular was a data journalist, and I asked him a lot of questions about this, and I think you're absolutely correct that, frankly, for journalists, professional or those going to school, there are a lot of ethical questions about this and no one's really sure what the right answer is. But in doing my research, I feel that the right answer is when we find these data sets, we need to not only identify where we got that data set from, but probably take the time to kind of clean that data set, if that makes sense. And so, what I mean by clean is go ahead and look through it and see if there are inconsistencies, see what's in the data, what might have been left out of the data, is there maybe something suspicious about that data? Especially if they’ve perhaps found it on some sort of random website, you know, could there be bots in there? Are they real humans? How was this obtained? I think that’s where the ethical standard is: for the journalists to really truly understand where that data is coming from, and if there are inconsistencies or there are questions about it, to at least point those out to the reader if you are going to use that data set. I think, ideally, you want to try to find data that you know is well vetted- that doesn't mean you shouldn't still clean it, but, you know, you don't want to just be searching the internet for random data sets.
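To make the idea of looking through a data set for inconsistencies a little more concrete, here is a minimal sketch of the kind of first-pass check a reporter might run before using a data set. It is an illustration only, not a method described in the episode; the function name `interrogate`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

def interrogate(records, required_fields, id_field):
    """Summarize basic quality problems in a list of dict records:
    how many rows there are, how often each required field is empty,
    and which identifiers appear more than once."""
    missing = Counter()
    for row in records:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
    id_counts = Counter(row.get(id_field) for row in records)
    duplicates = sorted(k for k, n in id_counts.items() if k is not None and n > 1)
    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicate_ids": duplicates,
    }

# Hypothetical survey records with a missing age, an empty ZIP code,
# and one respondent id that appears twice.
records = [
    {"id": 1, "age": 34, "zip": "53703"},
    {"id": 2, "age": None, "zip": "53703"},
    {"id": 2, "age": 51, "zip": ""},
    {"id": 3, "age": 29, "zip": "68508"},
]

report = interrogate(records, required_fields=["age", "zip"], id_field="id")
print(report)
```

A summary like this does not say whether the data are trustworthy, but it surfaces gaps and duplicates that can then be investigated and, if they remain unresolved, disclosed to the reader.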

Campbell: So, Bekah, you're reminding me here of a lot of times when I read news articles and compare them to, say, a scientific report, where you often see the limits of the study. Journalists hardly ever do that. They don't talk about what they didn't find in the story. So, when I'm reading the news, I'm often saying, why didn't they ask this question? Why didn't they talk to this person? Do you think there's a place for that? I mean, what I'm trying to get at here is what journalists can learn from statisticians and from science that could improve their stories.

McBride: Yes, so I understand that today's newsroom is tough to work in; there are certainly time limitations, there are word limitations, there are a lot of limitations. So, I understand why a lot of journalists aren’t yet doing this. I am, personally, as a part of this project, advocating for exactly what you’re talking about, which is that we do need to have the limitations of the study, or at least some form of that context, laid out when we're talking about data. I not only think it would help inform the reader, but I think it would make us more ethical journalists who are really doing our best to stop misinformation, which I think is a part of our job.

Pennington: You're listening to Stats and Stories and today we're talking about the ethics of data journalism with Bekah McBride, who has authored an article on the subject for Significance Magazine. Bekah, I've been following a lot of the conversation about the data around the Coronavirus outbreak on Twitter. And yesterday as I was sort of prepping for our conversation, I saw- I can't remember if the woman is an epidemiologist or has some other role in public health, but she tweeted out that she's been continually tweeting information and data about the pandemic, and she said in one of her tweets that activity on her account had been sort of stymied because so many people were reporting her to Twitter for misinformation because they didn't like what she was sharing. They didn't like the information, right? So, she's sharing, you know, W.H.O. statistics and C.D.C., so it's all of this data that's being churned out of all these places, right? Public Health departments that people find problems with, and it sort of reminded me of a conversation we've been having a lot at Stats and Stories about this issue of trust around facts and data. And I wonder if it has become more difficult to be a data journalist given this environment where there does seem to be a distrust of facts, and how that sort of makes this idea of ethics maybe even more potent and important to consider.

McBride: Yeah, well interestingly enough, I actually got the idea for this project in 2014 when the Ebola epidemic was going on. I was reading a lot in the news and thought, well gosh, some of this isn't quite right, and some of this is really good, but people aren't taking it seriously, and why is it that, you know, folks look at data in such a kind of strange way? And so that’s a great question; I think it applies today with the COVID-19 pandemic that we have going on. And so, I guess, you know, I would say that in terms of being a science writer, which is what I am- a little bit more than a data journalist, but I certainly use data in my writing- it has become more difficult. I think that we’re held to an even higher standard because there is some distrust going on out there. And that's part of the reason that I feel like these ethics are really important, and I think that letting the public know that we're following guidelines and that we do care about the ethics, and that we are showcasing the limitations and sourcing our data- I think that will help with the trust. I think there are perhaps some larger systemic issues that we have to work on as a society when it comes to data and science, but I do really believe that showcasing the ways in which we're getting the data, showcasing those limitations and really being clear that this isn't a journalist’s opinion, this is vetted data- will help in terms of people trusting that more.

Campbell: A follow-up on the virus question here, because we're in the midst of watching these daily briefings from the president and the folks that are on the virus committee. One of the things that's frustrating to me, a lot of times, is the questions that the journalists are asking. The data right now is very uncertain; for a lot of it, we don't know what some of the answers are. We don't have the studies, but journalists ask questions- asking for some certainty when we don't really have any. Are there questions that you would like to see them asking that aren't being asked? And are you frustrated sometimes, too, by what's not being talked about in this very difficult time?

McBride: Sure. Well, I don't know if I have a question off the top of my head; that's a good one. But I would say that there are frustrations for me in terms of generally how things are being reported. I mean, certainly there are journalists doing a wonderful job, but I do think that when something like this is happening, we run the risk of two things. First, when we are just reporting on anything that comes out of any particular person’s mouth, we run the risk of perpetuating misinformation. So, I think while someone in a very high position may be saying something, I think as journalists, ethically, we still want to vet that. We want to make sure that that statistic is correct before we report on it. Certainly, you know, we use quotes all the time in journalism, but it is still our job to be reporting on correct information, and so I think that's one risk we run, and we want to be careful about that. I think that the second risk we run is, of course, fearmongering or being accused of fearmongering. And so again, that's where it's really important in my mind to use facts and statistics rather than just quoting folks, or just kind of throwing things out there that may have happened at a news conference. I think it’s very important for us to do our own research and try to figure out what’s going on and then report on that.

Bailer: I think it's even harder now because a lot of what's done with the pandemic are projections from models, and the data that's available might be on things like infectivity, or typical times to recovery, or, you know, age-related case fatality rates. Some of this is still unknown, and they’re projections from models, and those model projections- at some point they’re extrapolations into the future and we're not quite sure, and as we get more data the models get tuned, so then the model predictions might change. And I think that there's so much uncertainty embedded in this that when some of the numbers change, the public finds that to be really disorienting. It's almost like, well, I can't trust any of this because it keeps changing. Whereas the story is a little more subtle. It's that we're making projections into the future, and as we accrue more data and more information, we can make better projections. How do we communicate that? I mean, as a science writer you certainly are in the situation where you have to communicate where there's uncertainty in the system. How do you communicate that in a way to say that's just part of what's there; there's ignorance now? And let me just add, as a quick addendum, there's also a variability we'd expect in a system. That's inherent in any population that we study, so the fact that, you know, your time to recovery might be five days and mine might be ten, that may just be natural within this human population. So what are things that we can do to better communicate this uncertainty and variability to the public?

McBride: Sure, so I think first of all what you mentioned is modeling, and I think it’s important when folks are reporting on these numbers that they mention that this is a model, and is in fact just a statistical projection. And you know, I think sometimes as journalists it can be like, oh, we can get a really good headline here if we say, you know, 200,000 deaths coming, but that’s not really the ethical thing to do. Of course, we want to say, well, there's a model that projects that if we do not take things seriously and take these steps, X could happen, and so I think that’s one thing to think about. I did see some great data journalism from the New York Times recently, which I thought was a kind of interesting take on this, where they actually had a model that the folks who come to read it could mess with and kind of pull the curve to showcase, okay, if we all follow along with the rules and stay home, here's approximately how many cases they're estimating, versus if we don't, here's what could happen. And they specified that it was a model and they showcased the way in which the model works, and you could actually kind of play with it as the reader, which I think was a really good take on that because it really allowed people to play with the numbers and understand how their own behaviors could impact those numbers.

Campbell: One of the things that I liked that I've read in your writing is this quote that you use, stories are just data with a soul. Can you talk about that a little bit and what that means to you?

McBride: Yeah, so I have to give a little bit of credit to Brené Brown; I think she was the first one that kind of said something along those lines. But when I was working on this paper, I just thought, you know, so many folks think of numbers as this arbitrary concept. I think when people see statistics, there's something like a little cold about it, maybe. I don't think people see that that statistic may represent a human, or an animal, or whatever it is that we are talking about. I think where journalists can come in is to give that piece of data a soul, to like make it human, or animal, or you know- showcase that that's not just a number, that’s something that exists out there, and what does that number really mean? And so, when I talk about data having a soul, what I mean is that context behind the data. What is that number, and what story does that number have to tell, and how can that story help us better understand our world?

Bailer: You talk some about the importance of nurturing trust in the public in terms of the reports that we do. I think that's critical for official statistics when you look at government agencies around the world that are producing information for their public to consume and make decisions based on. It's critical, but one aspect of nurturing trust in the message that we deliver, and incorporating some of the nuance in that message as you've described in your work, is also having the public demand it from us. You know, how do we help the public also develop this appreciation and expectation that there is more to this story? That perhaps these numbers aren't given with exactness, that there's not this inherent precision, and that in fact there's uncertainty and variability. What are things we can do to help educate the public to demand that from us?

McBride: Sure, that's a great question and I think it kind of is the million-dollar question, right now. How do we get the public to not only appreciate journalism for what it provides, but also to appreciate facts and data and science for what it can provide? And I'm not sure I have you know the perfect answer, but I do think-

Bailer: None of us do.

McBride: But I do think there are some things that we can do. I think, you know, in terms of showcasing our value, I think that we’re doing that right now, especially with this COVID-19. I'm seeing a lot of publications putting out, you know, free material right now, taking down the paywall, you know, doing their best to really showcase the information that we can provide, doing things like The New York Times did with interactive data. I think when people are going through a scary time like right now, we showcase how important journalism is and the role it can play in stopping misinformation, so I think we have a great opportunity right now to do that. I think, as I said earlier, showcasing the ways in which we’re being ethical and we're being truthful- the more we can show that to the public, I think the better. Whether that's, you know, literally laying out a code- like I did here- that people can follow, or more importantly showcasing in the articles, well, here's where I found the data, you know, here's what it means, here's the limitations, just really being upfront and truthful with the reader.

Bailer: So, one thing that I would ask, and this is a frequent question here, is: Bekah, help us teach the next generation both of journalists and statisticians, given your insights. I mean Rosemary, I think you and I need to be sitting down and developing some stuff based on Bekah's challenge, but that's just in general. Bekah, what might you advise us in terms of helping prepare the next generation of data journalists? And I would also ask about the statisticians, you know? How do we get our statisticians involved and prepared to work in this arena as well?

McBride: Sure. So as part of what led to the Significance article that you're referring to, I worked on some research for my master’s degree, and as part of that I actually created a curriculum that could be used for college-level students who are interested in learning about the ethics of data journalism in particular. And as a part of that I thought, you know, there are some fun projects in there. For example, in one they could look at the internet and try to find some good data journalism and some really bad data journalism, and really try to understand where the missteps are and what goes really well. I wanted them to obviously go through these five steps of the code of ethics that I developed and talk about what that means. And I think that, you know, really the important part is that whether you’re a journalism student, or a statistics student, or frankly any student at this point, I think that there is room for talking about the ethics of data and how it's going to be utilized in your particular place of work, because in today’s world I think data is everywhere and I think people need to be informed on what’s ethical use of data, so I think that across the board, whether you're a statistician or a journalist, there’s room for that.

Pennington: Well Bekah, that's all the time we have for this episode of Stats and Stories. Thank you so much for being here today.

McBride: Thank you for having me.

Pennington: Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you'd like to share your thoughts on the program, send your email to Statsandstories@miamioh.edu, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.