Austin Fast is a journalist based in Phoenix with over a decade of radio, TV, print and web experience, currently focusing on data analysis and investigative work at National Public Radio. He specializes in data analysis on NPR's investigations team, often collaborating with reporters from NPR Member stations across the country. Before coming to NPR, Fast reported for KJZZ in Phoenix and covered the world's largest wild salmon fishery at KDLG in Dillingham, Alaska. He's also written breaking news at a Cincinnati TV station and taught English overseas with the Peace Corps.
Episode Description
Data have always been important to the work of journalists. From Jacob Riis’ reporting on how the other half lived in late 1800s New York City to stories about gun violence in 2022, journalists need numbers to tell their stories. But not every reporter is trained to find and work with data. For those who want to dive into investigative journalism, which often depends on complicated data, learning the skills to clean and analyze statistical information is a crucial part of the job. That is the focus of this episode of Stats and Stories with guest Austin Fast.
Full Transcript
Rosemary Pennington
Data has always been important to the work of journalists. From Jacob Riis reporting on how the other half lived in the late 1800s in New York City to stories about gun violence in 2022, journalists need numbers to tell their stories. But not every reporter is trained to find and work with data. For those who want to dive into investigative journalism, which often depends on complicated data, learning the skills to clean and analyze statistical information is a crucial part of the job. One NPR reporter's journey into the field of journalistic data analysis is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Austin Fast. Fast specializes in data analysis on the NPR investigations team, often collaborating with reporters from NPR member stations across the country. Before coming to NPR, Fast reported for KJZZ in Phoenix and covered the world's largest wild salmon fishery at KDLG. He's also written breaking news at a Cincinnati TV station and taught English overseas with the Peace Corps. And importantly, he's an alum of the journalism program here at Miami. Austin, thank you so much for joining us.
Austin Fast
Yes, thank you for having me.
Rosemary Pennington
I just want to ask how you got interested in using data in your reporting?
Austin Fast
Yeah, well, it started, I mean, you mentioned that I worked for KDLG, which is a small NPR member station in Dillingham, Alaska. If you like to eat salmon, there's a good chance that it comes from that area of Alaska. And so accordingly, a lot of the reporting I was doing was focused on numbers about how many salmon were being caught by the fishermen and how many salmon were escaping up the rivers to continue the species, because something that they're really focused on there is making sure that they fish sustainably. And so as I was doing my reporting there, I realized there were so many more stories that I could be looking into if I just knew how to analyze all these numbers that were coming from the Department of Fish and Wildlife in Alaska. You know, every day they had all sorts of numbers, and I was reporting them across the airwaves to the fishermen who wanted to hear them and depended on them to kind of direct where they might, you know, cast their nets that day. But I realized there were trends within this data and these numbers that I was probably missing and just didn't know how to dig into. And so that's really how I got started and interested. And I realized a good way to kind of expand and learn those skills was to actually get a master's degree at Arizona State. They have a program specifically in investigative journalism. And you mentioned that I was in the Peace Corps. The Peace Corps has this great program where they will fund master's studies. So it was a perfect storm, because ASU has this program that focuses on data journalism and investigative journalism, and the Peace Corps, you know, has funding, so it was a win-win. And that's really how I got into it. And would you like me to tell you more about the program itself?
John Bailer
Yeah. I mean, the immediate follow up for me was, what does it mean to train in investigative journalism? What are some of the components of what you did as part of the training?
Austin Fast
Sure, so the program at Arizona State, it's an 18-month program. And so the entire semester, they call it boot camp. So it really is training. It's bringing in people, you know, who haven't necessarily worked in journalism. For example, we had a paleontologist who wanted to become a journalist, and we had people who had been English teachers, like I was in the Peace Corps. And, you know, we had a person who did fraud investigation for a bank. So they could come from all sorts of walks of life, and they wanted to become journalists. And so we had this boot camp, where we were learning, you know, all the basics of radio, TV and print journalism, but one of the classes that I was most excited about was specifically data journalism. I was taught by Sarah Cohen, who was formerly of the New York Times, Washington Post and St. Petersburg Times, and she really was able to lay it out and start with the basics, you know: What is data? What are numbers, and what do they mean? I, to be honest, had never worked with Excel pivot tables before that class. You know, I took the news and numbers class with you, John, 13 years ago, as we talked about a little bit ago, but I don't know if we talked about pivot tables in that class. And so we kind of learned the basics of, you know, what you can do with them, and how you can do what Sarah Cohen called interviewing the data. Just like journalists interview people, you can also interview the data. You can ask questions like, how many people are doing this? How many people are doing that? What are the extremes, the outliers? Because a lot of times, the people on either end of a dataset are particularly interesting, journalistically. And so we started with Excel, and then we moved into a statistical programming language called R, which I'm sure you're familiar with.
I would assume. And so really, I think one of the main things I took away from that class is that, as a data journalist, you'll never know all the answers. There's always a new package in R or Python to do what you want to do, and you're not going to know everything or memorize it all. And that's not the point. I think a major part of that class was to learn how to, you know, ask questions and find the answers for yourself. So I would say a major part of being a data journalist is actually googling a lot of stuff, like being proficient at googling. For example, the other day I wanted to calculate distances using latitude and longitude in R, from one point to another, and I had, you know, thousands of points. And I thought, well, surely there's an R package for that somewhere, and did a little bit of googling. And sure enough, there it is, I found it, and, you know, 20 minutes later I'm able to take thousands of points and calculate the distance between them in R. And I didn't know how to do that before that, but really the major takeaway of that class was, you know, you need to be inquisitive and ask questions, and learn how to find answers for yourself.
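The distance calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python rather than the unnamed R package from the interview, it uses the haversine great-circle formula (which may not be what that package does), and the function names and approximate coordinates are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distances_from(origin, points):
    """Distance in km from one (lat, lon) origin to each (lat, lon) in points."""
    return [haversine_km(origin[0], origin[1], lat, lon) for lat, lon in points]


# Hypothetical usage with rough coordinates (illustrative values only).
phoenix = (33.45, -112.07)
other_points = [(59.04, -158.46), (39.10, -84.51)]
print(distances_from(phoenix, other_points))
```

The same function applied in a loop or list comprehension scales to thousands of points without any extra dependencies.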
Rosemary Pennington
Was there a story you were working on? So it sounds like, you know, in a classroom environment, I think it can be really easy to feel like you did get confident, right? And look, I know how to do this. But then when you go out into the real world, you're like, oh, no, like, at least it was for me, like multiple times. And as a journalism student, like, Yes, I know how to do this thing, and then went into the real world. And I'm like, oh, no, what do I do now? And I wonder, was there a story for you that you were working on or you worked on in grad school, where some of this kind of came together for you where you were taking what you were learning in that class? And then were able to really see how it was helping you as a reporter?
Austin Fast
Yes, yes. So as part of the capstone of that program at Arizona State, you spend an entire semester working on one investigation. And so at the time, COVID was just, you know, ramping up. It was in 2020 when I was in that class, and we focused on how COVID-19 was affecting people experiencing homelessness, which was something that hadn't really been looked at yet. And so I got put in charge of the data element of that project. I didn't know much about, you know, housing, and I definitely wasn't an expert on COVID, because none of us were. But again, like I said, I think a key part of being a data journalist, or any journalist, is asking questions and finding the experts who can answer those questions for you. So what I did was I reached out to a bunch of demographers and sociologists and epidemiologists and asked them: if I wanted to see where the places are where people experiencing homelessness are most affected by COVID, what would I do? And they were able to point me towards some datasets, you know, the county health, what is it called, the County Health Rankings that I think the Robert Wood Johnson Foundation puts out every year, and some other datasets that are gathered by, you know, various national organizations. And I was able to create what we called a vulnerability index to see which counties, of all the 3,200 counties across the country, might be most affected, in terms of their people experiencing homelessness being affected by COVID. And then I was able to narrow it down to about 43 counties using what I learned in that data journalism class. And we then sent records requests to all those counties asking for, you know, communications with some keywords like COVID, homeless, housing, shelter, those types of things.
And we were able to get back from a lot of those cities and counties some really interesting emails about how the people in charge there were dealing or not dealing with this crisis as it was unfolding. And it led to a great report that was published by The Associated Press and ran in newspapers all across the country. And so that was a great way to see what I learned in a class and then, you know, put it into practice in the real world.
John Bailer
Yeah, I think it's interesting that part of the work you were describing was this identification of these potentially rich data sources. So there was this question of, what does the world of data look like for the problem of interest to you? And then, how do you go through and kind of process that, that kind of data munging step that we talked about? And then finally, what kind of analyses are appropriate and supportive of probing this question that you started with?
Austin Fast
And I mean, I would say this really relies on, and I'll just reiterate, it relies on finding what one of my professors at ASU called a Sherpa, you know, someone who's an expert, because I wasn't an expert. And that's a key part of that process. Could you repeat the second part of your question?
John Bailer
Well, yeah, that's assuming there was a question, Austin. That was very generous of you. You know, it's just it's I was reflecting on kind of what you were saying about just this investigative journalism and kind of the data journalism component of it, there were at least three components that I was hearing, one was just identifying the source. Then the second part was, what kind of skills were you needing and using to process that information to a usable form. And then lastly, some of the analyses that you were executing. And when, as you look at those components, what typically has been the hardest part of doing an analysis and some of these data journalism, investigative reporting activities you've done?
Austin Fast
One of the difficulties that I run into a lot is that people who aren't experienced in working with data, a lot of times they think that it's kind of a magic bullet. Like, oh, I have this story I'm reporting, and I'm hearing these anecdotes from one or two, or let's be generous and say five or ten people, and they're convincing anecdotes, and I want to do a story about it. But wouldn't it be great if we had some data to back it up? And you know, sometimes when I've been pulled into those projects, the data doesn't quite back up what the anecdotes are saying. And so that always leads to an uncomfortable conversation where a data journalist has to say, well, you know, I'm not quite sure, they may be one of those outliers that I mentioned a little bit ago. And yes, they are interesting, because, you know, journalism is all about stories and human emotion, and that's still a valid story about their experience, but I don't know that the data supports a broader claim. And ethically, you know, I'm not sure that you should use this data source. And so that's definitely one of the difficulties that I've come up against in working with other journalists who don't have as much of that data literacy as data journalists generally do.
Rosemary Pennington
You're listening to Stats and Stories, and today we're talking to NPR’s Austin Fast. Austin, you do work at NPR, you are doing data analysis on the investigations desk, what does a typical day look like for you?
Austin Fast
Is there really a typical day? Yeah, every day is different, which, I mean, that's why I wanted to be a journalist. You know, that's what I love about journalism, that no two days are the same. But a lot of times what I do on the investigations desk at NPR is partner with reporters from member stations. You know, NPR has hundreds of stations all across the country. And last year, NPR started this initiative called the station collaboration team within the investigations desk, where they are really focusing on, you know, for example, KDLG that I mentioned, in Dillingham. It's a very small station in a rural part of Alaska. You can't expect them to have the resources to know how to program in R or Python. You know, they're just trying to get their newscasts out most of the time. They have literally two or three news staff. And so a lot of times I'll get pulled in to help when, you know, a member station reporter has some idea, something that would be a great story, but they just don't have the data expertise to do it. Sometimes I'm doing research and, you know, looking at new datasets and trying to produce a national investigation on my own. You mentioned a little bit ago, John, the nursing license story that I did. So this was published earlier this year. And just to summarize, basically, boards of nursing across the country, some of them were taking a very long time to approve licenses for registered nurses and LPNs, licensed practical nurses. That's a problem when we are in the midst of a pandemic and there are nursing shortages that have been ongoing for years. You don't want to have any bottleneck at all.
And so that was an example where I was hearing not about nurses, but I was hearing from my friends in Phoenix that they were having trouble finding pharmacy technicians, actually, is what I heard. And I thought, well, look at health care. Who's the backbone of health care? It's nurses. There's millions of them across this country, and, you know, if anything goes wrong in hospitals or doctors' offices, they're supporting it. And so I started looking at nurses and hearing the same thing was going on. And looking back, I may be a little more hesitant to just kind of willy-nilly embark on a national project. I thought I'll just send 52 requests to 52 state boards of nursing, I'm including Puerto Rico and DC in that. And it turned out, you know, it takes a long time to work through the bureaucracy of over 50 different boards of nursing. And so, to go back to your question from a second ago about some of the difficult things, the hardest thing was that there were 52 boards, and I got 32 responses and 32 different responses. You know, the data was very unclean. It was in all different formats. And I had to be able to, you know, clean it up and make sure that they're actually comparable one to the other. And so that's a long answer about what my typical day is, but that was just to give you some examples.
John Bailer
Well, I think it's helpful for us to hear a little bit about the workflow that you describe about going through a project like this. I guess, you know, now that you've mentioned that story, can you talk about some of the major takeaways that you learned from the analyses? What are some of the endpoints that you looked at, and some of the comparisons that you made between states?
Austin Fast
Sure. So I mean, once I got all the data, it was a very simple analysis. I was really just looking at the date the nurse submitted an application and the date that it was approved. I mean, it's subtraction, a count of days, but then I was able to calculate, you know, state medians. And our visuals team at NPR made this great visualization, and it was interactive, you could choose by state, and it would highlight the dots showing, for example, Pennsylvania was taking, I forget the exact number, but it was something like 120 days was their median time, compared to Vermont, which was practically one day approving licenses. And so we saw a real range. And, you know, visualizing is part of the reporting process. A lot of times journalists will think of visualization as, you know, a final product to show your results to the audience. But I think it really can be part of the reporting. And that's a great thing, you know, most data journalists, I would say, use visualization to find who to focus on. And so I made some simple scratch charts to see where the medians were, and I saw, okay, Pennsylvania, Texas, California, those are at the top, they have really long median processing times of over 100 days each. And then we reached out to some of our member station reporters in those places and, you know, pulled them on board to get them to focus on their regions. And it was just a really nice collaboration to show, you know, where this is happening in different parts of the country, but then also the national scale of it.
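The analysis described above, subtracting the submission date from the approval date and taking a median per state, can be sketched as follows. This is a minimal illustration in Python, not the code used for the NPR story; the function name, the record layout, and the sample records are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median


def median_days_by_state(records):
    """Median number of days between submission and approval, per state.

    records: iterable of (state, submitted_date, approved_date) tuples.
    """
    waits = defaultdict(list)
    for state, submitted, approved in records:
        waits[state].append((approved - submitted).days)  # simple date subtraction
    return {state: median(days) for state, days in waits.items()}


# Made-up records for two placeholder states (not real licensing data).
start = date(2021, 1, 4)
sample = [
    ("State A", start, start + timedelta(days=1)),
    ("State A", start, start + timedelta(days=2)),
    ("State A", start, start + timedelta(days=3)),
    ("State B", start, start + timedelta(days=100)),
    ("State B", start, start + timedelta(days=120)),
]

# Sort slowest-first, the kind of ranking a scratch chart would show.
for state, days in sorted(median_days_by_state(sample).items(),
                          key=lambda item: item[1], reverse=True):
    print(state, days)
```

The median is used rather than the mean because a handful of very long waits would otherwise dominate a state's figure.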
Rosemary Pennington
Austin, you said something just a second ago about sort of how the data can help you figure out whom to focus on in your reporting. And I'm looking at one of your stories from NPR about, actually, people who are missing from reporting. It's the story about how millions of people are missing from the CDC COVID data. And I wonder, how do you find people that aren't there in data?
Austin Fast
Oh, that's a good question. That story actually started because I was curious. I had done a story right before that looking at the disparities between rural and urban counties and their COVID vaccination rates. And the next idea I wanted to look at was to see what the racial differences are. And so the CDC has this dataset, it's supposed to have every single case of COVID that was reported. Again, this was last year, so, you know, it was earlier than we are now, so think of it as less was known at that time. And I very quickly realized that I could not do an analysis on race, because in the data it says, you know, African American, Caucasian, Asian American and then unknown, and the unknown, or the no response, for many of these counties and states, it was most of them. And I thought, well, I can't do that. And so this is a case where the story isn't always what you expect it's going to be. You know, you need to be flexible to go where the reporting takes you. And so the story then became, okay, the CDC has this other tracking system that's basically just a tally of all the cases, and that tally was millions of cases above what was in the CDC dataset that I was just describing. And so that's how I was able to see that, okay, their tally is X number. The, quote unquote, full dataset is this number, which is much smaller. What's going on here? And that really led to an interesting story about how our system of public health across the country works. I had one expectation that, okay, the CDC collects all this, it's all there. And I found out very quickly that our system is not like that. Every single county is kind of an island on its own that does its own thing and then reports that to the state. And then the state reports to the CDC, voluntarily, meaning they don't have to if they don't want to. And in some states, for example, Texas, they have only reported about 2% of their cases to the CDC.
And so the story became, you know, in a state like Texas, which has millions of people and also millions of COVID cases, what effect does that have on epidemiologists' ability to analyze and make decisions about, you know, what we should do in a pandemic? And that was the story, you know, like, what's going on. And unfortunately, with investigative journalism, sometimes there's not a resolution. I wasn't able to find, you know, a resolution, but our job is just to point out the issue and hope that the people who do work in the government or, you know, whatever these agencies are, can take that reporting and then, you know, might be able to create some positive change.
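The comparison described above, a case-by-case dataset set against a separate aggregate tally, comes down to a ratio per state. The sketch below is only an illustration of that idea in Python; the function name, the dictionary layout, and the numbers are hypothetical and are not taken from the CDC data or the NPR analysis.

```python
def reporting_share(line_level_counts, aggregate_tallies):
    """Fraction of each state's tallied cases that appear in the case-level dataset.

    line_level_counts: dict of state -> number of rows in the case-level data.
    aggregate_tallies: dict of state -> total cases in the separate tally.
    States with a zero tally are skipped to avoid dividing by zero.
    """
    shares = {}
    for state, tally in aggregate_tallies.items():
        if tally > 0:
            shares[state] = line_level_counts.get(state, 0) / tally
    return shares


# Made-up counts for placeholder states (illustrative values only).
line_level = {"State A": 950, "State B": 30}
tallies = {"State A": 1000, "State B": 1500, "State C": 200}

for state, share in sorted(reporting_share(line_level, tallies).items(),
                           key=lambda item: item[1]):
    print(f"{state}: {share:.0%} of tallied cases appear in the case-level data")
```

Sorting by the smallest share first surfaces the states where the gap between the two systems is largest, which is where follow-up reporting would start.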
John Bailer
So, you know, I'm curious, I was going to say how did people, but how did some of the states respond to these two stories? I mean, I would think that you're shining the light on some good performers and also shining the light on some bad performers. And how did they react? I'm just curious, did you get any kind of response from this story that was interesting to see?
Austin Fast
Well, for example, I mentioned Vermont was doing great. They were very happy to talk to me, obviously, you might not be surprised, because they looked good. And I mentioned California. They were actually great to work with. It just took a long time. Obviously, they're not thrilled that, you know, they're not coming out looking as great in the story. But, you know, I'm not trying to make anyone look bad in these stories. I'm just reporting the facts, you know, what the data says, and how do you explain that? And so I tried to give them their due and give them, you know, an opportunity to respond. I will definitely say that sometimes, like, for example, Texas was not happy to talk with me about that. And they had all these reasons for, you know, why it is the way that it is, that that's not how public health surveillance works in Texas. And, you know, it goes to things like our system of federalism, the way, you know, we go back to, like, the Constitution in the 1700s, and reasons why we have a federal system and they don't have to send that up to the CDC. And I'm just thinking, okay, I respect that. That's our system. But still, you know, you have to realize you only have 3% of your cases at the CDC. How does that affect the ability of the CDC to make informed decisions?
Rosemary Pennington
Austin, as you may be aware, having studied journalism now at the undergrad and graduate level, there is often a very cliched fear of numbers that surrounds journalism students, which, being a journalism professor, I would like to state is not so cliche. I wonder what advice you would have for a student who wants to do this kind of investigative work but maybe is a little bit nervous about the stats part of that, because a lot of my students are.
Austin Fast
I would say that data journalism sounds very fancy and maybe intimidating. But like I mentioned with the nursing license story, I was subtracting, you know, one date from the other. A lot of times data journalism is not, you know, created by looking at all these complicated statistical tests and analyses. You know, it can be, and certainly there are great data journalists who do that sort of thing, but I come at it as a journalist first. That's what I learned originally, and I moved into data journalism. There are other data journalists who started as statisticians and moved into the journalism part. And so you can choose, you know, your level of how deep you want to go into the numbers. There's great data journalism that comes out of just, you know, looking at medians and averages and just very simple things that you probably learned in, you know, no later than junior high or elementary school. And I would hope that everyone can do simple addition, subtraction and multiplication, fingers crossed. And also, like I mentioned before, there are so many resources online. There are all sorts of websites and information. There's datajournalism.com. One I turn to all the time is stackoverflow.com, which is basically a message board for people who are, you know, using Python or R and don't know how to do something, and there are just so many answers on there of things you can try. And so I guess my last piece of advice would just be, you know, just try it. And you might be surprised at how much you can learn, you know, even just by devoting a little bit of time to try to master some of these concepts.
John Bailer
So as a follow-up to that, you know, one of the things that I think about as a stat person who's also interested in journalism and how numerical information is conveyed is that often only a point estimate is reported. There's often just this precision of a single number, a central measure, without kind of telling the story of the variability, or the story of maybe some of the uncertainty in the system. How do you balance the focus on message with the acknowledgement of the variation and uncertainty that are part of the analysis and the data that you're analyzing?
Austin Fast
Right, that's something I've talked about a lot with people on my team at NPR. I mean, numbers are estimates, and especially when you're looking at big datasets, they're estimates. And I think by giving a level of precision out to, you know, two or three decimal points, you can give a false sense of security that this is, you know, written in stone, and this is what it is. And so maybe in radio we have a little bit of leeway, because to get the message across we have to round anyways. When you're listening to an audio story and someone says, let's say, I don't know, 65.1%, that's going to fly over listeners' heads. They're thinking about all the decimals and whatnot. And so in radio we round anyways. We would say two thirds, even though it's not precisely what the analysis says. Our focus is on getting the point across to someone who's driving down the street, yelling at their kid in the backseat who, you know, dropped their pacifier on the floor or whatever. You know, radio listeners are doing other things. And so a lot of times we can't put those in radio stories. And so, I mean, both things kind of go hand in hand, because it is kind of problematic anyways, just because you don't want to give them a false sense of how certain these numbers are.
John Bailer
Well, you know, Austin, the one thing we didn't mention throughout this is that you contributed to a Stats and Stories episode. So as we come to a close here, I'm going to issue a listener challenge: see if you can identify which episode Austin contributed to.
Rosemary Pennington
Good luck. Well, that's all the time we have for this episode of Stats and Stories. Austin, thank you so much for joining us today.
Austin Fast
Yes, thank you so much for having me. It's been a pleasure.
John Bailer
It's a delight to get to interact with you again. Thanks for taking the time, and wishing you quite well in all this really interesting work that you're doing.
Rosemary Pennington
Thank you so much. Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.