As researchers and medical professionals struggle to get a handle on the COVID-19 pandemic, journalists struggle to tell the pandemic’s story, with many news outlets increasingly turning to infographics and data visualizations to help them do so. Visualizing data for news is the focus of this episode of Stats and Stories with guest Harry Stevens.
Harry Stevens joined The Washington Post as a graphics reporter in 2019. He is part of the team that won the 2020 Pulitzer Prize in Explanatory Reporting for its climate change-focused series. He previously worked at Axios, where he designed news graphics and worked on data-driven investigations. Stevens's journalism career has also included stints at the Hindustan Times in New Delhi, India, and the Salt Lake Tribune in Utah.
Full Transcript
Rosemary Pennington: As researchers and medical professionals struggle to get a handle on the COVID-19 pandemic, journalists struggle to tell the pandemic’s story, with many news outlets increasingly turning to infographics and data visualizations to help them do so. Visualizing data for news is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a production of Miami University’s departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me are regular panelists John Bailer, chair of Miami’s Statistics department, and Richard Campbell, former chair of Media, Journalism and Film. Our guest today is Harry Stevens. Stevens is a graphics reporter at the Washington Post and produced a story in March about how disease outbreaks spread exponentially and how to flatten curves. It’s since become the most read story in Washington Post history. Harry, thank you so much for being here.
Harry Stevens: Thanks so much for having me. It’s great to be here virtually with you guys.
Pennington: Yeah, so your story went viral and I remember seeing it everywhere and it has so many different kinds of – I mean, so many different kinds of visualizations, how did you as you were thinking through the story decide how you were going to visualize this information?
Stevens: Sure, so the idea for the – so basically for people who haven’t seen the story- it features a series of very simple simulations of bouncing balls moving around in a rectangle and when the balls collide either with the walls of the rectangle or with each other they bounce off in another direction. And so you start out by making one of those balls quote-unquote sick, which is just to say making it a different color than the rest of them and then when a sick ball collides with a healthy ball, the healthy ball gets sick too or becomes the same color as a sick ball. And so, you can watch these simulations play out over the course of thirty seconds or so, and see how the disease spreads first slowly with the first infection being transmitted and then very very quickly. And then you can introduce certain parameters into the simulation; so you can try to put up a big wall in between some of the balls so that they can’t get to each other or you can make it so some of the balls don’t move, and by changing the parameters of the simulation you can sort of show ways that disrupt the network effect of spreading disease and give people a sense of how to slow down the spread of something through a network. So, the idea of the bouncing balls came to me actually just from sort of some fun experimentations that I had been doing on my computer over the weekend. I’m not- I don’t have, like, a computer science background, so a lot of the code that I’ve learned to write has just been from me doing these kinds of experiments on the weekend and reading tutorials online. So, one of the things that I was always curious about was collision detection. And so, what happens when two circles occupy the same space? They’re moving at a certain angle and a certain speed so when they’ve collided with each other what happens, and how do you represent that in code? And so, I had done some experiments with it- actually a series of experiments. 
I think the first one I did was two years ago and it was just like how to make it so that if a circle hits the side of a screen it bounces off in the other direction, which might be easy if you know geometry, but I had to look up all the formulas for like what is the angle of reflection based on the angle of incidence and how to make the ball bounce. And then once you’ve done that you have to figure out how to make the balls bounce off of each other when they hit each other, which is much more complicated. And I think like a year later I was like, yeah, I want to come back and revisit this code, and so I got the balls to bounce off of each other. And I mean there are things that are probably more interesting to like computer science people than to maybe all of your listeners, but just making the algorithm efficient because you have to compare the position of every ball to every other ball at each tick, but there are ways to make it more efficient so that you can add more balls to the simulation and it doesn’t crash your computer, and there’s all kinds of interesting things here. So, I had just been doing these kinds of experiments long before COVID-19 was a word that anybody knew, and I just thought they were fun to look at; they were fun to watch. And I think that that’s kind of part of the key to the success of the story is that like, even if you’re not talking about a disease, even if you’re not trying to teach somebody anything, it’s just visually engaging to watch these balls bounce around. And so that can draw people in or give them a door to step through to what it is that I am trying to teach them. So yeah, so like I had the bouncing ball thing already working, I had already written the code for it mostly; there were still some bugs, like sometimes the balls would get stuck together and I had to figure out how to make it so that that didn’t happen, but it was mostly- that part was done.
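For readers curious about the mechanics mentioned here, the following is a minimal sketch in Python (not the JavaScript Stevens says he used) of two ideas: bouncing off a wall so that the angle of reflection equals the angle of incidence, and finding overlapping balls without comparing every ball to every other ball. The function names and the uniform-grid approach are assumptions chosen for illustration; the transcript does not say which optimization he actually used.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def reflect(vx, vy, nx, ny):
    """Reflect velocity (vx, vy) off a surface with unit normal (nx, ny).

    Angle of reflection equals angle of incidence: v' = v - 2 (v . n) n.
    """
    dot = vx * nx + vy * ny
    return vx - 2 * dot * nx, vy - 2 * dot * ny


def overlaps(a, b, radius):
    """Two equal circles overlap when their centers are closer than 2 * radius."""
    return math.hypot(b[0] - a[0], b[1] - a[1]) < 2 * radius


def pairs_brute_force(centers, radius):
    """Compare every ball with every other ball: O(n^2) checks per tick."""
    return {(i, j)
            for (i, a), (j, b) in combinations(enumerate(centers), 2)
            if overlaps(a, b, radius)}


def pairs_grid(centers, radius):
    """Bucket balls into square cells of side 2 * radius.

    Overlapping balls must sit in the same or an adjacent cell, so each ball
    is only compared with the balls in its 3 x 3 neighborhood of cells.
    """
    cell = 2 * radius
    grid = defaultdict(list)
    for i, (x, y) in enumerate(centers):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    found = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and overlaps(centers[i], centers[j], radius):
                            found.add((i, j))
    return found


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    centers = [(rng.uniform(0, 200), rng.uniform(0, 200)) for _ in range(300)]
    print(len(pairs_grid(centers, 5.0)), "overlapping pairs found")
```

Both pair-finding functions return the same set of pairs; the grid version simply skips comparisons between balls that are too far apart to touch, which is one common way to let a simulation hold more balls per tick.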
So, we were in a room with some editors and some other graphics reporters talking about you know, like, how can we move our coverage forward of COVID-19? And it was early March, so at that point, the President was still not taking it seriously. A lot of people across the country still hadn’t internalized the idea that the problem wasn’t necessarily them getting sick; the problem was that they could pass it on to somebody else. Like there were still people like spring-breakers who were just like, you know, I just want to party, and I don’t care if I get sick, as if that were somehow a brave position to have. But it’s like dude, it’s not about you getting sick, it’s about like killing my grandma. And so when you have these simulations you can see very easily that like, one transmission earlier on can infect somebody all the way across on the other side of the simulation very very quickly, even though the original sick person never had any interaction with the healthy person over on the other side. And so I think just seeing that maybe was something that a lot of people needed to internalize this idea that like things can spread very very quickly in a network if we don’t do anything to try to slow them down.
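The simulation Stevens describes in this answer, with balls bouncing around a rectangle and a "sick" ball turning any ball it touches sick, can be sketched roughly as follows. This is a Python approximation under stated assumptions, not the Washington Post's code: the `Ball` class, the `simulate` function, the `fixed_fraction` parameter, and all default values are hypothetical, and there is no recovery stage.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    sick: bool = False
    fixed: bool = False  # a ball that never moves


def simulate(n_balls=80, width=400.0, height=300.0, radius=4.0, speed=2.0,
             ticks=800, fixed_fraction=0.0, seed=1):
    """Return the number of sick balls after each tick (hypothetical sketch)."""
    rng = random.Random(seed)
    balls = []
    for k in range(n_balls):
        fixed = k > 0 and rng.random() < fixed_fraction
        angle = rng.uniform(0.0, 2.0 * math.pi)
        balls.append(Ball(
            x=rng.uniform(radius, width - radius),
            y=rng.uniform(radius, height - radius),
            vx=0.0 if fixed else speed * math.cos(angle),
            vy=0.0 if fixed else speed * math.sin(angle),
            sick=(k == 0),  # ball 0 starts out sick and always moves
            fixed=fixed,
        ))

    history = []
    for _ in range(ticks):
        for b in balls:
            b.x += b.vx
            b.y += b.vy
            # Wall bounce: flip the velocity component that points outward.
            if (b.x < radius and b.vx < 0) or (b.x > width - radius and b.vx > 0):
                b.vx = -b.vx
            if (b.y < radius and b.vy < 0) or (b.y > height - radius and b.vy > 0):
                b.vy = -b.vy

        for i in range(n_balls):
            for j in range(i + 1, n_balls):
                a, b = balls[i], balls[j]
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                if dist >= 2 * radius or dist == 0.0:
                    continue
                # Contact: the infection passes if either ball is sick.
                a.sick = b.sick = a.sick or b.sick
                # Speed at which the pair is closing along the line of centers.
                nx, ny = dx / dist, dy / dist
                closing = (a.vx - b.vx) * nx + (a.vy - b.vy) * ny
                if closing <= 0:
                    continue  # already separating, so the balls do not stick
                if a.fixed:    # b bounces off an immovable ball
                    b.vx += 2 * closing * nx
                    b.vy += 2 * closing * ny
                elif b.fixed:  # a bounces off an immovable ball
                    a.vx -= 2 * closing * nx
                    a.vy -= 2 * closing * ny
                else:          # equal masses swap their closing components
                    a.vx -= closing * nx
                    a.vy -= closing * ny
                    b.vx += closing * nx
                    b.vy += closing * ny
        history.append(sum(b.sick for b in balls))
    return history


if __name__ == "__main__":
    free = simulate(ticks=300)
    mostly_still = simulate(ticks=300, fixed_fraction=0.75)
    print("sick after 300 ticks, everyone moving:", free[-1])
    print("sick after 300 ticks, most balls still:", mostly_still[-1])
```

Plotting the returned list against the tick number gives the kind of curve the story is about. Raising `fixed_fraction` is one way to experiment with the "some of the balls don't move" scenario; the exact numbers depend on the seed and on the made-up defaults above.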
John Bailer: So, Harry I mean the question that was really burning when I saw this was, you know, when will the vaccine or effective treatment be available for simulitis? I thought it was really effective that you basically abstracted the kind of key features of this story of infection. You know, you didn’t get caught up on kind of all the other nuance that’s part of this, you know, how long people are – you know, would they be contagious? You know, what’s the pool of people that are susceptible to this? And I thought, you know, how did you kind of boil that down? What were some of the things you thought about in sort of extracting those key features?
Stevens: So, after publishing this story I worked on another story about these so-called SEIR models, which, for those of you who aren’t familiar with those, it’s like an agent-based model where people, or the agents, pass through these sort of four stages of the disease. So, they start as susceptible, which is the S, and they become exposed, which is the E, and then become infected, which is the I, and then R stands for like removed, which means they’ve either recovered or they’ve died. And so, when you’re building these models, there are certain parameters that you have to take into account like, with a real- so you’re trying to model a real disease. So, you’re like how long is the infectious period? Like, how long do people remain infectious for? How long is the incubation period? Like, how long does it take from when you were first exposed to when you become…? Anyway, there’s all sorts of- what’s the contact rate? And various other parameters that you have to build into these models to get them to reflect anything close to reality. And even then it’s a model, so it’s not supposed to be reality; it’s just supposed to give us some idea about future potential outcomes. And anyway so, that’s a long way of saying I didn’t know any of that when I started doing this story and even by the time I published it- like when I published it I didn’t even know what an SEIR model was. And I actually think that was, ironically, kind of helpful for me. Like, that I didn’t know how complex it would be, or all of the things that I could take into account. Like it made it so that there was no way I could do anything other than something that was really simple. Like, I knew that I wanted it to show something spreading through a network because I thought that the idea of exponential growth was not something that was intuitively understood, certainly not by me and I don’t think by most people. And so, if you can just show that, then that’s really all that I wanted to show.
And one conversation that I had- I was talking to Lauren Gardner, who is an epidemiologist at Johns Hopkins University, way back in February and we were just talking about the model that her team uses to try to forecast the growth of coronavirus- it was very very early on at that point so, so much was unknown. But she was talking about just the complexity of the model that they use and how it was computationally extremely intensive. So, like it would be hard to run that in a browser. And that was the conversation that really helped me understand that like there was no way that I could model a real disease. That was where the idea for simulitis came from. Just like we’re not trying to forecast a real disease, we don’t have to map the ticks of the simulation to any kind of real unit of time, you know, there’s not like a second of the simulation represents a day or an hour or a month; there’s no mapping to real-time because there doesn’t need to be, right? Like that’s not what you’re trying to show, you’re not trying to forecast an actual disease. You’re just trying to show people how network effects work and how to slow them down. And so, I think that by having a very simple goal, that helped like really limit what the design space- like, it helped focus us and limit what I was trying to accomplish, and that made it so that it was easier to teach something that was like simple but important.
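For reference, the four stages Stevens lists (susceptible, exposed, infected, removed) are commonly written as a set of ordinary differential equations. The form below is the standard textbook version, offered as an assumption about what such a model looks like rather than as the specific model used in the Post's follow-up story. Here $N$ is the population size, $\beta$ is the contact (transmission) rate, $1/\sigma$ is the average incubation period, and $1/\gamma$ is the average infectious period.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E \\
\frac{dI}{dt} &= \sigma E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
```

The four right-hand sides sum to zero, so $S + E + I + R = N$ stays constant: people only move between stages.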
Richard Campbell: So where did this come from? Because you both are able to do this, but you’re also a very good writer and reporter-
Stevens: Thank you.
Campbell: So, that’s unusual in journalists, don’t you think?
Stevens: It is unusual; I think less unusual than maybe it was a decade ago. I think that- so I started out in journalism as a writer and a reporter, which I think has been really valuable for me because I learned about, like, how to collect information, how to interview sources, how to frame a story before I learned any of the graphics and code stuff. So, all the graphics and code stuff are building on top of that foundation that I already had, and like how to explain things to people, how to tell a story and how to find sources. So, I’m glad that I did it in that order. I also got- like I went to journalism school in 2013-2014, and I took a class on data visualization. So, I had never really thought very deeply about information design, and that class really opened my eyes up to how powerful it can be, how much information you can communicate visually. And so, like, taking that class was really helpful to me; it also introduced me to JavaScript. I’d never written code before, so we just did like some basic stuff, but it was enough of a building block. Like, once you know about Stack Overflow then you can pretty much learn anything.
Pennington: You’re listening to Stats and Stories and today we’re talking to Washington Post graphics reporter Harry Stevens. So, you- before the Washington Post you were working at Axios and have worked a few other places and have done a number of different kinds of data visualizations. When you are approaching a story that is going to be data-rich that you want to help an audience understand through this sort of graphic presentation, how do you think about- how do you approach that storytelling? Because you are- even though it is a graphic, right, and you’re dealing with data, you still at the end of the day have to communicate some kind of story. So, how do you approach that when you’re thinking about the kind of graphics you’re going to use in a story?
Stevens: Sure, so I mean making a news graphic is similar to writing a news story in that like you have to consider the information that’s going in and like how you’re collecting it. So like, if you’re just writing a story and reporting it, like you go find sources, you interview them, you maybe find documents, you read the documents and figure out what they mean, whereas with a graphics story usually you’re finding datasets, but you have to interrogate a dataset with the same rigor that you would interrogate a source or a document. So, you have to figure out how that dataset is deficient, how the data was collected, whether there are certain biases inherent in how that data was collected. So, the same kind of reporting that you would need to apply to any sort of journalistic endeavor, you need to apply when you’re working with a dataset, and then once you analyze the data- I mean, so, if you’re doing a story with data, it’s not like you just have a dataset and you’re just playing around with it like for whatever might come out of it. Like, usually you have a hypothesis and you’re trying to see if the analysis bears out that hypothesis, and like a lot of times it doesn’t, and you don’t have a story. And that’s the same with any kind of reporting. Like, you might think that something is going on at City Hall but then you interview everybody and they’re like no, that’s not happening, and then you don’t have a story. So anyway, once you’ve done the analysis- like for me the graphics side of things is really the fun part. I try to make things that just look really cool and that are engaging to people and really fun, and so I mean part of it is like I can only make things that are cool if they communicate the central idea. So, like all of the aesthetic decisions for me come from the purpose of what I’m trying to communicate.
So, like, I don’t know if you guys play chess but in chess, there’s – like people say tactics flow from a superior strategy so I look at that in the same way as like making a graphic. The aesthetic decisions flow from like your communication strategy. So, like what is it that I want to tell people? What do I want to get out of this? And then you know try to delete everything that doesn’t serve that purpose and then like once I’ve really gotten that it’s like refining it. So, you know, making it look beautiful or making it look clean, or you know, adding some kind of visual flair, but generally, it’s a balance right? Because you do – in the news business you need to make something that catches people’s eye and that is really cool, but you also need to communicate something as well. So, they have to work hand in hand but I guess if you have to get rid of one you’d get rid of the flair because you need to communicate something and that’s the most important thing.
Bailer: I’m curious: as you go through these types of representations, you know dealing with the uncertainty and the inputs. You know, I’ve seen that you’ve looked at a couple of different scenarios that might play out in a simulation. Do you have other ways that you help recognize and convey the fact that these models do have imprecision, they do have uncertainty that are part of it. And the input that’s provided in these models isn’t known and possibly can’t be known. So, what are some of the things that you’ve done that try to convey that uncertainty and variability?
Stevens: A lot- every journalist I think or like graphic journalist that’s covering COVID-19 right now is grappling with this problem. So, another story I did that I had mentioned earlier was about how these SEIR models work. And so one of the things that we did there was again we used a fake disease because we weren’t again trying to say anything about COVID-19, we were just trying to help people understand how the models work, so we used simulitis again. And this story, by the way, didn’t do nearly as well in terms of traffic, but I think that it was a bit more- I don’t know, I think that for the people who liked it and really wanted to dig in, I think that they enjoyed it. But for that one we just let people adjust the parameters themselves to see how that might affect the output, and then we tried to explain it using quotes from real epidemiologists that we had interviewed about like how grappling with uncertainty is at the very core of what they do. So, like the purpose of these models is not to like open up your crystal ball and tell people exactly what is going to happen. It’s just to help people who need to make important decisions understand like the range of possibilities and like how their decisions might affect the outcome. So like, you know, I can’t- no epidemiologist can tell you how many people are going to die of COVID-19 or how many people are going to be infected or when we’re going to hit the peak and when it’s going to start going down. Like, there’s just no way to do that with any kind of certainty. Like even people who predict the weather, for example, get it wrong all the time. And they’ve had many many many more decades to deal with that phenomenon and to prepare their models, and there’s probably more certainty going in, but you can’t just get it right every time because nature is chaotic and it’s hard to predict the future.
But the point is like, again, not to predict the future, but just to understand how our decisions can affect the range of possible outcomes. So that’s one thing that story tried to communicate. Other places have done a pretty good job, too. I think like 538 now has a tracker of all of the different model outputs and so just comparing them with each other I think is useful, like wow, there’s a really wide range of possible outcomes that these things are predicting, and trying to explain to people, like, what are the inputs going in? But yeah, I mean inevitably there’s going to be skepticism on the part of the public, I think, about a lot of these models because there’s like a sort of general misconception about what their purpose is and how they function. And then they’re inevitably quote-unquote wrong because they didn’t predict the future correctly and then people say the whole model is useless. But of course, that was not what the purpose of the model was to begin with.
Campbell: So, one of the things I love about your work, and it follows up on John’s question about uncertainty, is journalists are really good at telling about what just happened or what happened yesterday, but you’re talking about what might happen. So here’s a story idea you already may be thinking about that because universities and colleges all over the country are thinking about should we open in the Fall, so it seems to me that you have a model that would suggest what happens at a place like Miami University when all these kids from all over the country come back, they’re not- you know, they’re living in their own places, they’re gathering they’re having parties on the weekends. Can you do a model that’s going to show what might happen as we decide whether we’re going to stay online for another semester or whether we’re going to you know try to go back to some kind of business as usual? Help us out here.
Stevens: I would definitely leave that to the professional epidemiologists because like, I’m not a statistician. For the story that we did on like the different disease models, we managed to find like a really basic SEIR model because it like has to solve these ordinary differential equations and I didn’t know how to do that so, fortunately, somebody, like a scientist, I think it was at like Los Alamos- he was a smart guy anyway and so he had written the code for this SEIR model that we used- far beyond the ability I had to do it myself, but certainly like it’s a really important question and like you mentioned, like it does seem, I mean, if I think about my life as a college student and then add like an extremely infectious disease that spreads quickly through that experience, I can imagine basically everybody getting sick. So, it definitely does seem like a very dangerous situation and a recipe for disaster. Particularly because it doesn’t sound like there’s going to be a vaccine any time soon, so I definitely don’t envy the college administrators that have to deal with this.
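To give a sense of what "solving these ordinary differential equations" involves, here is a minimal Python sketch that steps a textbook SEIR model forward in small time increments (the forward Euler method). It is not the code Stevens refers to, and every name and number in it is a hypothetical placeholder: `seir_euler`, its parameters, and the rates are illustrative and are not estimates for COVID-19 or any real disease.

```python
def seir_euler(beta, sigma, gamma, population=1000.0, initial_infected=1.0,
               days=160, steps_per_day=10):
    """Integrate a textbook SEIR model with forward Euler steps.

    beta  : contact (transmission) rate per day
    sigma : 1 / average incubation period in days
    gamma : 1 / average infectious period in days
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    dt = 1.0 / steps_per_day
    s = population - initial_infected
    e = 0.0
    i = initial_infected
    r = 0.0
    daily = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            newly_exposed = beta * s * i / population * dt   # S -> E
            newly_infectious = sigma * e * dt                # E -> I
            newly_removed = gamma * i * dt                   # I -> R
            s -= newly_exposed
            e += newly_exposed - newly_infectious
            i += newly_infectious - newly_removed
            r += newly_removed
        daily.append((s, e, i, r))
    return daily


if __name__ == "__main__":
    # Hypothetical rates: the second run halves the contact rate.
    for beta in (0.6, 0.3):
        run = seir_euler(beta=beta, sigma=0.2, gamma=0.1)
        peak_day, peak = max(enumerate(day[2] for day in run), key=lambda p: p[1])
        print(f"beta={beta}: peak of {peak:.0f} infectious on day {peak_day}")
```

Lowering `beta` in this sketch produces a lower, later peak, which is the "flatten the curve" idea in its simplest form. Production models typically use more careful numerical solvers and many more parameters, which is part of why Stevens defers to professional epidemiologists.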
Pennington: Harry, you have mentioned math and coding, two things which most journalists don’t get in undergrad and are also terrified of. I speak as a journalist who was thankful the only math classes I had to take were logic and statistics when I was an undergrad. So what advice would you give to journalists who want to try to work with visualizations but might be semi-scared of what goes into it, right? Because it does seem like it’s this black box where I need to know all these things and I can’t do it well. What advice would you give to someone who wants to explore this?
Stevens: So, I had the exact same experience in undergrad. Like, I had to take a math class for my – to graduate and I took logic. Now going back, like, I wish I had taken more math classes because it ended up being something that I use in my job all of the time and you know, there’s like- there’s just a knowledge gap between me and like people who have had more formal education that I work to close all of the time, and I wish I didn’t have to. But, that being said, like- so, when I started doing code and working with spreadsheets like in journalism like I didn’t know much math and I still don’t really know much; I know a little bit more but yeah, I mean a lot of times just measures of central tendency, like average and median. Those are like pretty useful mathematical tools to help you try to figure out what’s going on and you don’t need to know a lot of math to do those things, and a spreadsheet will actually do them for you. So, a lot of times it’s just about like using the tools that are available. Like, you don’t even need to know how to code. Like, I did sort of data journalism pieces for a couple of years without knowing really how to code well; I used Excel- in fact, I still use Excel for a lot of data analysis. I mean, if I’m going to do something that’s a little bit more complicated or needs to be reproducible, yeah, like I’ll use [inaudible] or I’ll use JavaScript, but a lot of times I still use Excel. And I think that like, I mean Excel is not great; it does introduce a lot of errors that you have to be aware of. Like it will change your dates for you without you wanting it to do that, and various other things that can definitely be problematic, but it’s like it’s still a better tool than nothing. And so, I definitely think, like, if you can learn how to do a pivot table in Excel, you’re going to know a lot more than most other journalists.
Like if you want to start to use data in your reporting just go online and Google how to use a pivot table and suddenly you’ll realize that like you have this new superpower that you didn’t have before. So, I think you learn incrementally, and I guess- it’s part of it is like having the mindset like this is not too scary. Like I’m just going to try to learn one thing at a time and get better as it goes.
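The two tools Stevens mentions in this answer, measures of central tendency and pivot tables, can also be tried outside a spreadsheet. The sketch below uses only Python's standard library; the salary figures, the inspection records, and the `pivot_sum` helper are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical salaries: one very large value pulls the mean far above
# the median, which is why reporters often check both.
salaries = [38_000, 41_000, 44_000, 47_000, 250_000]
print("mean:  ", mean(salaries))
print("median:", median(salaries))


def pivot_sum(rows, index, column, value):
    """Group rows by `index` and `column` and sum `value`, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[column]] += row[value]
    return {key: dict(cells) for key, cells in table.items()}


# Hypothetical records, one per inspection.
inspections = [
    {"city": "Oxford", "year": 2019, "violations": 4},
    {"city": "Oxford", "year": 2020, "violations": 7},
    {"city": "Oxford", "year": 2020, "violations": 2},
    {"city": "Hamilton", "year": 2019, "violations": 5},
    {"city": "Hamilton", "year": 2020, "violations": 1},
]

table = pivot_sum(inspections, index="city", column="year", value="violations")
for city, by_year in table.items():
    print(city, by_year)
```

The pivot collapses five individual records into one row per city and one column per year, which is the same summary a spreadsheet pivot table would produce from this data.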
Bailer: Yeah, I’m going to ask the complementary question. As someone who got out of writing because I did despise the subjectivity of assessments when I was in [inaudible] and I decided that, thank God, that there was a place for people like me where I didn’t have to deal with it, but yet now what I do is- what I do more than anything else is write. So what kind of advice do you give to the people that are coming from the quantitative side that are doing data analysis and modeling, but still recognize the importance of that communication and integrating the important part of the story that goes with this? What kind of suggestions do you have for folks like me?
Stevens: Yeah, that’s a great question. I think one thing you can do is read The Elements of Style; that’s a great book. I read it and it helped my writing so much. You know, use strong active verbs, delete unnecessary words, you know, verbs are stronger than adjectives, stuff like that. The other thing is like empathy is the biggest thing both for writing and for making graphics. Like, you have to have empathy with your readers. Things are just hard to understand, generally, in life and so you have to work really hard to make things easy to understand. Like, really try to put yourself in the shoes of one reader and think like, what are some possible ways that this sentence might be difficult to interpret, or this paragraph might be difficult to interpret? And just make it better that way. I really think just like having empathy with your readers or your viewers can make your work so much better.
Pennington: Well Harry, that’s all the time we have for this episode, thank you so much for being here.
Bailer: Yes, thank you for being here.
Stevens: Thanks for having me. This was really fun.
Pennington: Stats and Stories is a partnership between Miami University’s departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.