Balancing Rigor And Entertainment When Telling Stories About Data | Stats + Stories Episode 49 / by Stats Stories


Nick Thieme (@FurrierTranform) is a research fellow at the University of California Hastings Institute for Innovation Law and a freelance writer for a variety of outlets. Currently, his work focuses on AI regulation, cybersecurity, and pharmaceutical patent trolling. His writing has appeared in Slate Magazine, BuzzFeed News, and Significance Magazine. He was the 2017 AAAS Mass Media Fellow at Slate Magazine, writing about technology, science, and statistics. In particular, he wrote about the role of p-values in the replication crisis, the overblown, existential fear caused by the dangers of AI, and the NotPetya cyberattacks in Ukraine. He is also part of an ASA team currently in the process of helping found a new blog, aimed at undergraduates, practicing statisticians, and the general public.


Rosemary Pennington : We all know data can be complex. Communicating it to an audience in a way that is understandable while still full of research nuance and rigor is sometimes a challenge. This becomes even more so when that data is connected to a controversial topic. Say, the debate over vaccinations, for example. One way to bridge that gap is to use data in storytelling, helping audiences get to the meat of an issue without bogging them down too much in the technical details. Communicating complicated data stories to a general audience is the focus of this episode of Stats and Stories. Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. I am Rosemary Pennington. Joining me in the studio: our regular panelists John Bailer, Chair of Miami's Statistics Department, and Richard Campbell, Chair of Media, Journalism and Film. Today's guest is Data Researcher Nick Thieme. Thieme is a Research Fellow at the University of California Hastings Institute for Innovation Law, where he explores Artificial Intelligence Regulation, Cyber Security and Pharmaceutical Patent Trolling. He was also the American Statistical Association's 2017 Mass Media Fellow, writing about Science, Tech, and Stats for Slate Magazine. Thanks for being here today, Nick.

Nick Thieme : Thanks a ton for having me on!

Pennington : I know you're also involved in an ASA effort to launch a blog designed to communicate to a broad audience. Where does your interest in communicating data come from?

Thieme : Yeah, that project, the Statbites project, is actually really, really exciting. And I think it ties in closely to my interests. So, I came from Carnegie Mellon where I did my undergrad. And for a long time I had this conception, I think, of statistics as being sort of a dry field, and a field that didn't lend itself nicely to storytelling. And I had a professor, Cosma Shalizi, who would give us assignments like, "Here's the dataset, tell us something about it". I struggled at first, because I think I was taking a very robotic approach to it, and as the semester went on I started to realize that the fun of the projects, and also the best use of them, was to try and not just say something about the data, but say something about it in an interesting and narrative way. Ever since that course, I've wanted to communicate interesting subjects in interesting ways, and it seems like data storytelling is a useful way to go about that.

John Bailer: And then the natural question, what's been your favorite data story that you've told so far?

Thieme : Oh. That's a good question! There's a couple of them. One of my favorites is one that didn't really get very much buzz, being sort of small. So, when I was at the University of Maryland I worked with a professor, Nick Diakopoulos, who is now, I think, at Northwestern. So, we worked on a project looking at gentrification in Washington DC. That's obviously a huge issue, and something that's happening not just in DC, but in a lot of cities. One of the things we wanted to take into account was that right now the measurement of gentrification happens slowly. Right? Like, it takes time for building permits to get approved, it takes time for a Starbucks to get built on the corner. But something that can be measured much faster is changes in wi-fi speed. Or the building of Capital Bikeshare stations. If you know anything about DC, it's a bike sharing program. So, these new, innovative-type changes happen much quicker than the older measures of gentrification. So, what we did was we took a lot of open data that DC, kudos to them, makes available, and analyzed it across different census tracts in DC, and what we were able to see is that if you use these technological gentrification indicators, it sort of predicts the brick and mortar gentrification. And it was interesting to see that you could "get out in front" of brick and mortar gentrification by looking at this tech gentrification. Because at least in my eyes, after all, the point of that work and of data stories is to make a difference. So, if you can figure out when gentrification is occurring before it occurs then I think you're in a good place to combat it.

Bailer : So, what's been the hardest data story you tried to tell? The one that you struggled the most to tell that narrative…

Thieme : Funny enough, that's an easier one. I think it is the p-value story. Right, so I think that one of the hardest things for scientists going into science journalism is talking about the things you know well. Right? Because you're talking to people who don't necessarily know these ideas so well.

Richard Campbell : See that would be me. So, what's a P-value?

Thieme : Yeah, so if you want to put it sort of colloquially, right? It gives you a measure of how likely you were to see data at least as extreme as what you saw, under some previously specified hypothesis. So, the example that I ended up using was this drug example, right? If I have two drugs and I posit that they have the same effect on some disease, well, the p-value I'll get is the probability of having observed a difference in effectiveness at least as large as the one I saw, given that there was no difference.

Bailer: And all the assumptions for the methods that you're using and applying, so there's all the subtlety that goes with that as well.

Thieme: Yeah
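The two-drug example above can be made concrete with a permutation test. The following Python sketch is an illustration only: the recovery outcomes are invented (they are not data from the episode or from Thieme's article), and `permutation_p_value` is a hypothetical helper name. It estimates how often a difference at least as large as the observed one shows up when the drug labels are shuffled, which is what "given that there was no difference" means in practice. It also carries the kind of assumption Bailer mentions: shuffling labels is only valid if patients are exchangeable under the no-difference hypothesis.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Under the hypothesis that both drugs work equally well, the group
    labels carry no information, so we shuffle them and count how often
    the shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance so that ties are counted despite float rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented outcomes: 1 = patient recovered, 0 = patient did not recover.
drug_a = [1] * 14 + [0] * 6   # 14 of 20 recovered
drug_b = [1] * 8 + [0] * 12   # 8 of 20 recovered

print("estimated p-value:", permutation_p_value(drug_a, drug_b))
```

A small value here would say only that a gap this large is unusual if the drugs were truly equivalent. It would not give the probability that the drugs are equivalent.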

Campbell : John's question has got at "the heart" of what you're trying to do, which is fairly unusual. I think John and I know this because we've been working at this for a long time, but you're in this territory between statisticians and journalists, or between the numbers and the narrative, so just in general, what do you think the challenges are there? What obstacles have you faced? Both from statisticians, that community, and from the journalistic community? When you start talking about data storytelling. For many people those are two words that don't belong together.

Thieme : I think the biggest struggle I've found, and I think this kind of ties in with the struggles, or the "push back" that I've seen both from the journalism world and the science world, is trying to strike a balance between rigor and entertainment. I think this is broad in terms of... I think this applies to science journalism, to statistics, to data-driven stories, all of these things. Because the goal, I think, one of the goals of data storytelling is, you're trying to communicate information. Science journalism, the same. But if the only goal of it was to communicate the raw information, then, you know, you would go read the abstracts of science articles instead of reading the story in the New York Times. So, I think trying to find that balance is hard, really hard. And, you know, funny enough I was listening to the episode of this show with Alberto Cairo, which was phenomenal, and definitely intimidating for me to try and… but he was talking about the five things he thinks make a data visualization successful. And two of them, beauty and functionality, seem to be really applicable to data stories as well. I think that that same tradeoff of rigor and beauty is the exact same tradeoff, or at least it is to me, that he was talking about in data visualizations.

[MUSIC]

Pennington : You're listening to Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics. The topic today is data storytelling. I'm Rosemary Pennington. Joining me are panelists Miami University's statistics department chair John Bailer and Media Journalism and Film Department chair Richard Campbell. Our special guest is data researcher Nick Thieme. Now, Nick you wrote a piece for BuzzFeed that was talking about the issue of doxing? Which has been in the news off and on for a while. And I think is a very complicated issue, and I think that you did a really nice job of explaining what doxing is and how it works and how organizations like Twitter try to figure out ways of curbing doxing. Even though it's kind of out of their control. How do you approach reporting a story like that, writing a story like that? Thinking about someone who has studied statistics and has studied data, but you're trying to communicate it to a lay audience, how do you approach telling that story?

Thieme : I think I approach it… I try to approach it, at least, from the perspective of whoever is suffering at the hands of doxing. Or at least that's how I try to approach that one. Doxing at its core, you're taking someone's personal information that they don't want made public, and you're making it public for them. That's a huge violation. So, it felt like the most important thing I could do with that article was, one, not take more of people's private information and make it public, but two, report that story in a way that made clear the dangers of doxing, but also the responsibility and the ability that companies like Twitter have to stop it. Right? I think that there's news that on December 14th [2017] Twitter is going to start enforcing their harassment rules. So, it's clear that they have this ability, right? And kudos to them for doing it. But I tried to approach the story from a perspective that, hopefully, made that clear. That this was their responsibility, and that they could actually address it.

Bailer : I enjoyed your story on "Poor people don't eat more fast food". Talking about the idea that the thing that's influencing consumption wasn't income but hours worked, and I thought that that was a nice piece. And my question to you is, can you talk through the flow of where you started in terms of research study and then how you extracted the story that you wanted to tell and then ultimately how you packaged it, for this piece in Slate?

Thieme : Of course. So, it all started, I think that was the Ohio State article by the researcher Jay Zagorsky. And it started from that article, I stumbled on to his work. He's a very interesting researcher. In the past, he wrote an article talking about how the "freshman 15" was fiction. So, he specializes in dispelling myths, I think. Which fits very well with Slate. So, I stumbled on that article and as soon as I saw that headline, it was clear that it would fit well for a Slate-type piece. It ties in with the Twitter answer a little bit. The idea there, or the grounding principle, was that, again, we have this narrative we tell ourselves. There are people who are being affected by what we're telling ourselves. In this case the lower SES people, who we assume are eating fast food because we, to some degree, look down on fast food and look down on lower SES people. So, the important part of that story, and what I tried to keep in mind while I was writing it, was that dispelling the fiction about fast food consumption was to some degree also trying to dispel the fiction about how we look at lower SES people.

Pennington : Something that strikes me as you talk about both the BuzzFeed article and the fast food article is that your focus seems to not be on the data but on the people.

Thieme : Yeah, I think that's fair. And I actually… I think I appreciate you saying that. The data is a means to an end as I see it. I think there's a reason why we're here talking about data driven storytelling, right? It's the data that drives the story but ultimately what we're trying to produce is a story, right? And the narrative of that has to be first and foremost focused, I think on the people inside of it.

Campbell : I'll give you this as a two-part question: one, things that you see journalists doing in stories about data that drive you crazy, and two, are there stories, important, significant stories, that journalists aren't covering because the data and the numbers seem kind of overwhelming?

Thieme : So, as for something that drives me crazy, this is sort of my pet peeve. So, there's a lot of uncritical reporting on, say, data for social good. It strikes me as being similar to the uncritical reporting on computer science for the social good that we saw maybe like 5-6 years ago. There was this new app. And since apps can be good they will be good. I think that's true also of data science, right? You'll end up with, and not to name names, but there's a particular project that sponsors people to learn machine learning tools that will help in some social scenario. And every year there's a new class. And every year I have kept up with them a little bit. The projects don't really go anywhere. But when the projects are first released they get a lot of good coverage and a lot of good press saying "hey, it's great that this is being done", and because it's a 24-hour news cycle the story sort of dies there. I think it's important not just to point out that something's being done but to stick with it and see that it eventually turns into something productive, and that drives me crazy. Maybe the story that I wish was covered more, if it was more comprehensible, is, I guess, the architecture of A.I., and how that architecture necessarily influences those models themselves. You look at stories about A.I. and they talk about A.I. thinking, and they talk about the rise of killer robots, and this is less the complexity of the data but, I think, more the complexity of the model, and I wish that people would spend more time talking about the actual architecture of A.I. and how they make decisions, back-fitting - that whole thing. Because I think knowing more about the model itself, about, say, GANs, would help people understand that this alarmist notion that A.I. is going to, you know, that your toaster is going to rise up and take over the household, is fiction.

[MUSIC]

Pennington : You're listening to Stats and Stories. Our discussion today is on data storytelling. Our guest is data researcher Nick Thieme. Now Nick, you referenced GANs in your response to the last question. I'm just going to ask you to explain who GANs is, because I know a GANs in communication that may not be the same GANs you're referencing.

Thieme : Okay, absolutely. So, there is this particular class of artificial intelligence algorithms called generative adversarial networks. And they're very much in vogue. The idea is super, super cool. Basically, you have two neural networks and you set them up to duel against one another, and in fighting back and forth they get better. So, this summer I wrote an article for Slate interviewing a researcher at the University of Colorado by the name of Jeff Primm, and he used GANs to produce images that would look like volcanoes or flowers. And these were totally original images that looked distinctly like that. And the way that he explained to me that they work is, you can use this forgery analogy, basically. You have one neural network that is a forger, and one neural network that is a regulator. And the forger produces a forgery and tries to get it past the regulator. And the regulator tries to catch it. If the regulator predicts that it's a forgery, he turns back and, like the worst regulator in the world, says, "well, the reason you messed up is because you didn't put these pixels in the top left corner of your image". So, the forger gets better. And so, they go back and forth like this until the forger produces what are pretty close to impossible-to-detect forgeries.
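The forger-and-regulator loop described above can be sketched in a few lines of plain Python. This is a deliberately tiny toy under stated assumptions, not the image-generating system from the Slate article: the "real" data are numbers drawn from a normal distribution with mean 1 (an arbitrary choice), the forger has a single parameter `shift`, and the regulator is a one-input logistic classifier. All names (`train_toy_gan`, `REAL_MEAN`, `shift`, `w`, `c`) are hypothetical. The point is the feedback: the regulator's gradient tells the forger in which direction its fakes look wrong.

```python
import math
import random

REAL_MEAN = 1.0  # assumed target: "real" samples come from N(1, 1)


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=3000, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    shift = -2.0     # forger: fake sample = shift + standard normal noise
    w, c = 0.0, 0.0  # regulator: D(x) = sigmoid(w * x + c), "probability x is real"

    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, 1.0) for _ in range(batch)]
        fake = [shift + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Regulator step: gradient descent on -[log D(real) + log(1 - D(fake))].
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (d - 1.0) * x
            grad_c += d - 1.0
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Forger step: gradient descent on -log D(fake). The regulator's
        # weight w is the "feedback" on which direction looks more real.
        grad_shift = sum((sigmoid(w * x + c) - 1.0) * w for x in fake) / batch
        shift -= lr * grad_shift

    return shift, w, c


shift, w, c = train_toy_gan()
print("forger's learned shift:", shift, "target mean:", REAL_MEAN)
```

Because this regulator can only compare where the numbers are centred, the forger only learns the mean of the real data. Real GANs use deep networks on both sides, so the feedback can be far more detailed, like the pixel-level advice in the forgery analogy.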

Bailer: So, when you said architecture of A.I. and neural nets, you're really raising the bar in terms of the conceptual framework that someone has to deal with. So, you said you're frustrated with how that's conveyed. Well, here's your chance to take this to a larger audience. So, what does it mean to say the architecture of artificial intelligence? What do you mean by that? And how would you describe a neural net in a tweetable form? Well, maybe two tweets?

Thieme : I think I would preface by saying I was hoping someone much smarter than me would be able to answer this question. I guess maybe taking my best stab at it, what is important to say about neural nets, is that they're taking input data, transforming them in a series of ways, right? You have a particular nonlinear function, and making predictions based upon that. That is obviously a very cartoonish explanation. But when I talk about wanting people to talk more about the architecture I think what I'm trying to point out is that, instead of saying the GANs think, right? Or instead of saying that these AI are making conscious choices, to make it clear that there is this series of functions that get applied to some input data that necessarily produce a prediction.

Bailer : You just have some algorithm behind the scenes. It's not some sentient thing that's happening.

Thieme : Yeah, exactly.
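The "series of functions applied to some input data" description can be shown directly. In the Python sketch below, the weights are hand-picked, hypothetical numbers (a trained network would learn them from data), and `linear`, `relu`, and `predict` are illustrative names. The prediction is nothing more than two weighted sums with a nonlinear function in between, so the same input always produces the same output.

```python
import math


def linear(weights, biases, inputs):
    # One layer: each output is a weighted sum of the inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    # The nonlinear step: negative values are replaced by zero.
    return [max(0.0, v) for v in values]


def sigmoid(s):
    # Squashes a score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-s))


# Hypothetical, hand-picked parameters: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 0.5]]
B1 = [0.0, -0.5, 0.25]
W2 = [[1.0, -2.0, 0.5]]
B2 = [0.1]


def predict(inputs):
    hidden = relu(linear(W1, B1, inputs))   # transform the input
    (score,) = linear(W2, B2, hidden)       # combine into one score
    return sigmoid(score)                   # turn the score into a prediction


print(predict([1.0, 2.0]))
```

Changing the architecture means changing how many of these layers there are and how they are wired together, which is why it shapes what the model can and cannot do.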

Campbell : So, again you're in this terrain between data and journalism and you must be driven nuts by this anti-science, anti-numbers, "fake news" craze. Do you have any sort of insights on how to combat this, and what you do over and above the work you're doing which I think is probably the primary way?

Thieme : It's hard. It's really hard. I think this thing people talk about in terms of echo chamber is the fact that people listening to this right now are people who are, you know, they already believe in the news that they get from reputable places. Or at the very least they pick places we would agree on as being reputable. So obviously that's a huge issue, and how to combat echo chambers I have no idea. But I think maybe this goes back to the idea of science journalism as being entertaining versus being rigorous. And it has to have the rigor. So, I'm reading a book called October by China Mieville and it's a story of the 1917 Russian revolution. Which is something that normally I wouldn't read. But China Mieville is one of my favorite fiction authors. He's written some incredible post-modern fiction, he's a beautiful writer. And because of his beautiful writing, I've gotten interested in this topic that I otherwise never would have examined. And I think that to some degree, maybe one way out of this, is to focus on the entertainment side of journalism, of course doing it rigorously, right? But appealing to people across political spectrums on an entertainment level. I think that's one of the biggest values that entertainment journalism can bring to the table.

Bailer: How did you get to where you are now? Where did you start out in terms of your course of study and how did it evolve to this point? If we were advising some of our current students and they wanted to do what you're doing, what might you give them in terms of the wisdom of your experience?

Thieme : I've never heard it put as "the wisdom". I've never heard it said that I had any wisdom, so it's nice to be in this seat, I guess. Um, how did I get here? So, I started as a statistics major. All my formal training is technical, so I have a bachelor's in statistics, one Master's in applied math, and I just graduated with another Master's in Computer Science. But I think I was lucky to have professors who emphasized the story side, or I guess, as it was put, "the human side" of statistics and math and all of their human applications. So, I think the way I got here was through technical training but being fortunate to have people who made me understand that technical skills can be used in social fields. And how to do this? Well, apply for things you're not qualified for. There was no way I was qualified for this AAAS fellowship that I had this summer. But I applied and I got some good letters of recommendation and I guess got lucky and got it! And it really did change my life. So, I guess, yeah, maybe that's my recommendation.

Campbell : So, at some point you learned how to write, so how did that happen?

Thieme : Yeah, that's funny enough, something I haven't introspected on all that much. As far as… there is this saying that if you don't have time to read you don't have time to write, I think a lot of the training I got in writing was just from being an avid reader my whole life. I mentioned Cosma Shalizi earlier, reading his blog Three-Toed Sloth going through college and his emphasis on the storytelling side of statistics really helped me become a better writer. And obviously I have to say I worked at the writing center at Maryland, helping graduate students improve their journal articles to make them more legible. So, the director of that institute Linda Macri helped me tremendously in terms of writing, and in terms of giving me a language with which to describe how writing should be improved.

Campbell : Very good.

Pennington : Nick, we are just about out of time, but before we go I want to ask you if you have recommendations for places you think people should be going to read good data storytelling. I know a lot of us probably read stuff that the New York Times produces, and BuzzFeed does some really good work, but are there other under-read spaces people should be exploring?

Thieme : Wow, so my personal favorite is BuzzFeed, I think BuzzFeed just does an absolutely incredible job, but the other, I think maybe my second or first favorite, depending on how you count, is ProPublica. They do absolutely incredible data journalism, and their piece "Machine Bias", looking at how A.I. can bias criminal sentencing along racial boundaries, I think is one of the best pieces I've ever read, so yeah, I think they would probably be my recommendation.

Pennington : Great, well, thank you so much. That's all the time we have for this episode of Stats and Stories. Nick Thieme, thanks for joining us today.

Thieme : Thank you so much for having me, this was awesome.

Pennington : Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter or iTunes. If you'd like to share your thoughts on our program send your email to statsandstories@miamioh.edu, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.