James J. Cochran is associate dean for research with the University of Alabama’s Culverhouse College. He is also a professor of statistics and the Rogers-Spivey Research Fellow.
Ryan McNeill is the London-based deputy editor for the Reuters global data team. During his more than 10 years at Reuters, he has worked on investigations revealing underground markets for adopted children, America’s failure to prepare for sea-level rise, failures by governments across the US to stop the spread of antibiotic resistant infections, the scale of Africa’s illicit gold flows to Dubai, how and where humans are raising the risk of zoonotic spillover around the globe, and ethnic cleansing in Sudan. In 2024, he was part of teams that won two Overseas Press Club of America Awards, using satellite imagery and remote sensing methods to document human rights violations – such as the burning of villages and documentation of mass graves in Darfur – and reveal how humans are raising the risk of another global pandemic.
Episode Description
Coronaviruses, Ebola, Marburg, Nipah, SARS: what do these diseases have in common? Habitat loss resulting in closer interactions between infected bats and uninfected humans is one factor. What other factors are driving the growth of zoonotic diseases, and where is the spillover risk the greatest?
Full Transcript
John Bailer
Coronaviruses, Ebola, Marburg, Nipah, SARS: what do these diseases have in common? Habitat loss resulting in closer interactions between infected bats and uninfected humans is one factor. What other factors are driving the growth of zoonotic diseases, and where is the spillover risk the greatest? Our guests today include a journalist who recently completed a multi-story investigation of these changes, along with a statistician involved in evaluating the modeling and prediction efforts. This Reuters project used two decades of outbreak and environmental data to identify the locations on Earth with the highest risk of bat-borne viruses spilling over to humans. I'm John Bailer. I'm joined by Rosemary Pennington, chair of the department of media, journalism and film. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining us today are Ryan McNeill and Jim Cochran. McNeill is the London-based deputy editor for the Reuters global data team. In 2024, he was part of teams that won two Overseas Press Club of America Awards, using satellite imagery and remote sensing methods to document human rights violations, such as the burning of villages and mass graves in Darfur, and to reveal how humans are raising the risk of another global pandemic. Cochran is the Mike and Kathy Moran Endowed Chair of business. Cochran was a founding co-chair of Statistics Without Borders and a member of the founding committee for the INFORMS Pro Bono Analytics initiative. He's also an ASA fellow and recipient of the ASA Founders Award. Thank you both so much for joining us today.
James Cochran
Nice to be here.
John Bailer
So Ryan, can you start with a description of what inspired this Reuters series of articles related to bat virus zones, risk zones around the world?
Ryan McNeill
Yeah. I mean, thinking back, I think it really started with my colleague, Deb Nelson, who I've worked with on numerous investigations since joining Reuters. We both have an affection for environmental investigations, and she's really excellent at mining academic literature for news stories. If you fast forward a little bit to the height of the pandemic, I was here in London, and she had been asked to look into the origins of the coronavirus, of SARS-CoV-2, which, of course, will probably never really be known. But she got to thinking about the next one, and how do we prevent it? At that point, I had been helping our teams here in the UK cover the UK government's response to the pandemic. Eventually, though, she kept sending me these papers. And there was one, led by David Heyman, that had looked at various environmental factors and links with coronavirus observations. I had been spending some time learning about how to use satellite imagery and thinking about it in terms of environmental investigations, and I looked at this, and I remember thinking, perhaps stupidly and a little bit arrogantly, couldn't we do this for the whole earth? And that was kind of how it started. A lot of the previous research had used actual observations of the virus, whether in a human or an animal species, wherever its presence had been identified. We were going to do something different. We were going to look at the actual spillovers to humans. So that's really how it all got started. As many journalistic investigations start, you're trying to answer a question: okay, well, what about the next one? How do we stop it? Well, to stop it, you have to be able to figure out where it's happening. And so that's what kind of set us off on this multi-year journey.
Rosemary Pennington
In this article, I'm looking at what you call the places where the viruses are sort of leaping from animals to humans, jump zones. And I am curious, how did you identify jump zones, and what were the challenges of doing this on a global scale?
Ryan McNeill
Wow, so we're really going deep here. So, first of all, we needed to figure out where these things have happened in the past, and that was sort of the first bar to hit. Sorry, actually, I need to look up the name of the company that had given us an initial list. So the first thing was identifying where these spillovers had happened. We knew all along, based on the academic literature, that the predictors data was going to be, quote, unquote, the easy part; the hard part was going to be figuring out where these events had occurred. We were able to get a starting list of the spillovers, but what we had to do then was turn it into a GIS data set. We had to turn it into an electronic mapping format, and that required a lot of our own research to try and place these events. Sometimes we had really good locations, either through previous academic literature or public health reports or various other bits of information. Other times we had to do our own reporting to see if we could figure it out. We're lucky at Reuters: we have 2,300 journalists around the world, and in hundreds of countries. So sometimes we have that ability to get ground information. I think there was one incident in the DRC where we were able to track down the village where it happened, because we had people who had covered that story and knew. So it was a lot of work like that. And then there was also the building of the predictors data set. We had divided the world into these five-kilometer-square pixels, zones, if you will, and we were trying to do our analysis at that level. But that is a very large data set. And even with today's modern computing infrastructure, if I made a mistake in my code or something like that, it would take days of reprocessing to rebuild it, which, as you can imagine, frustrates editors a whole lot.
John Bailer
This is great. I mean, I'd like to maybe help unpack this a little more. So, you know, I'm gonna dive deep on three different things. One is just some basic terms: when you talk about spillover, help paint the picture of what you mean by spillover. Secondly, talk a little bit about the outcomes that you're tracking, and then, thirdly, about what was contained in that set of predictors that you were using to try to predict outcomes. So, homework problem one: what do you mean by spillover in this context of future pandemic concern?
Ryan McNeill
So if we're using coronaviruses as an example, and Ebola and other well-known ones, Marburg, Nipah, those are circulating in wildlife, in this particular case, in bats, right? But there are diseases that circulate in rodents and various other animals. In the case of bats, for reasons we don't know yet, they don't get sick from the viruses they carry, but they can transmit them. That process of giving the disease, so to speak, to humans, either directly or indirectly, that is a spillover.
James Cochran
So now it's a spillover, and that can be bites or through feces or through saliva, all kinds of ways that bats can transfer viruses to humans.
John Bailer
Yeah. And I remember, in one of the stories, reading about just consuming mango that might have the saliva of a bat on it as being one of the potential routes by which a human might be exposed to this disease.
Ryan McNeill
I was just gonna say, I think one of the most fascinating parts of the whole series was the trail of figuring out how Nipah was causing infections in Bangladesh due to consumption of date palm sap, which is a delicacy in this part of the world. Scientists eventually were able to trace it to the fact that the bats like to gorge on the date palm sap for the same reasons you and I might enjoy it, and they urinate in it, and then we drink it. And that's how it's spreading to people.
John Bailer
So you've described a kind of spillover. The outcome is going to be some of these diseases that you've described, the ones that are originally circulating in wildlife populations and are not a concern until somehow there's this conversion into human exposure. So then the last question that I had asked was, what were some of the variables that you were thinking about that might be predictive of there being more risk of this spillover?
Ryan McNeill
And see, I think this is where Jim comes in, and various other experts. We wanted to try and figure out what predictors should be included, and to do this, we basically mined academic literature. Again, if any journalists are listening to this, the admonition I always give when I teach is that some of the best stories are hiding in academic literature, and we don't do enough to take advantage of that. It's what I preach all the time. But back to the point at hand: it was a matter of going through these papers and talking with Jim. I remember one of the things that we tried at his suggestion was actually including the value of the latitude and longitude, which is something that we hadn't considered. But it's, of course, very important. And there were various discussions about how to prepare the data in the first place, to make sure we weren't running with scissors. That's where expertise like Jim's and other academics' was really important.
James Cochran
What we were really curious about in this study, though, too, is whether or not deforestation led to bats circulating more among human populations and leaving their traces, which would cause this zoonotic transfer that we've been talking about. So this is very much a sort of climate change and an environmental issue, if it goes the way we thought it would go.
Ryan McNeill
And we also had to confront issues like, for example, the deforestation data that we used as a predictor is an annual data set. And there's this kind of idea: okay, if a chunk of trees with a critical bat roost or more is chopped down in, say, December, are you saying that the risk from December doesn't carry over into January? And so there was a lot of discussion. Should we include cumulative deforestation? Should we include recent deforestation? And should we include nearby deforestation? If you think about each of these pixels, should we also include the pixels in each direction? And we ended up on yes to all of them, trying to understand what would really have an effect.
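The three choices described here (cumulative, recent, and nearby deforestation) can be sketched as per-pixel features. This is a minimal illustration only, not the Reuters pipeline: the function name `deforestation_features`, the `recent_years` window, the eight-neighbor rule, and the tiny grid of made-up values are all assumptions introduced for the example.

```python
def deforestation_features(loss, year, row, col, recent_years=3):
    """Build three hypothetical predictors for one pixel in one year.

    loss[y][r][c] is the (made-up) fraction of pixel (r, c) deforested
    during year index y. All names here are illustrative assumptions.
    """
    n_rows, n_cols = len(loss[0]), len(loss[0][0])

    # Cumulative: everything lost in this pixel up to and including `year`.
    cumulative = sum(loss[y][row][col] for y in range(year + 1))

    # Recent: loss in the last `recent_years` years, so that trees cut in
    # December still count toward risk the following January.
    start = max(0, year - recent_years + 1)
    recent = sum(loss[y][row][col] for y in range(start, year + 1))

    # Nearby: average loss this year in the (up to eight) adjacent pixels.
    neighbors = [
        (row + dr, col + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
        and 0 <= row + dr < n_rows
        and 0 <= col + dc < n_cols
    ]
    nearby = (
        sum(loss[year][r][c] for r, c in neighbors) / len(neighbors)
        if neighbors
        else 0.0
    )
    return {"cumulative": cumulative, "recent": recent, "nearby": nearby}


# Three years of made-up loss on a 3x3 grid; the center pixel is (1, 1).
loss = [
    [[0.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.5, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]],
]
features = deforestation_features(loss, year=2, row=1, col=1, recent_years=2)
print(features)
```

The "recent" window is what lets loss from one calendar year still count in the next, which is the December-to-January concern raised above.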
Rosemary Pennington
Ryan, you just said that you think journalists should be mining academic research for stories. What advice would you give to journalists who want to do the kind of work you had to engage in to produce this big story about spillovers?
Ryan McNeill
Well, my personal bias is that I kind of live in both worlds, right? We're looking for quantitative methods, but we're telling journalistic stories. So what I'm frequently looking for is, what data did you use and what methods did you use? Because frequently that will reveal things that we didn't know, or the ability to answer a question that we didn't know about. So frequently I'm looking at data, but, I mean, you can talk to Jim about work he's done within the city of Tuscaloosa, I believe it was, that journalists would be interested in. And so I think it's not just the data part of it. It's also the content, and what questions are trying to be answered. Sometimes that finds a lane for us: there's some question that hasn't been answered that maybe we can answer, which is kind of what we were trying to do here. It's very similar, but just slightly different. The question that we wanted to answer, to tell our story, was slightly different. But all the methodologies and all the ideas for the predictors, all of that was coming from academic research.
James Cochran
And I think this relationship really needs to have two different aspects to it. If you talk about academia and journalism, we do a lot of work in academia and publish a lot of papers that might be of interest if we can break the story down and tell it in a way that the general public would like to hear, instead of in terse academic prose. But also, statisticians as a group tend to be a little shy about working with people in journalism, because they're worried they won't look good or something bad is going to happen. I've been working with Reuters reporters for over a decade now. I started working with them when Joan Biskupic apparently found out I'd done some work on the head trauma research in the NFL, and also on the economic impact of an NFL lockout that, for some reason, went viral. She called me up and wanted to talk to me about some work that she and Janet Roberts and John Schiffman were doing on whether or not a small subset of our population has dominated the Supreme Court and been able to get cases before the Supreme Court. And this was a fascinating problem with a lot of exposure. I turn on the TV one night, my wife and I are watching Washington Week in Review, and the host is asking Joan about this work that I worked on with her. And you think, holy cow, if I get this wrong, everybody looks bad. But you have to trust your skills as a statistician. And the other thing is, as we've progressed or regressed as a culture over the last several decades, there's a lot less trust in journalism. And so journalists who do the kind of work that Ryan's doing, sticking his neck out by actually collecting data and analyzing data, identifying the problems that he wants to report on instead of being assigned to them, that's really risky stuff. I want to make sure that investigative journalists who have that kind of guts have someone there to support them. And generally speaking, when Ryan calls and wants to talk to me about a project, he's pretty darn dead on about how to handle it. And again, you understand these are really messy problems, trying to get the data to line up, like Ryan talked about with the zoonotic transfer data versus the deforestation data. They aren't collected at exactly the same times or over exactly the same geographies. These are all really messy, hard problems to work on, and the only wrong answer is to not help.
John Bailer
You're listening to Stats and Stories. Our guests today are Ryan McNeill and Jim Cochran. You know, as I was reading through the different components, the chapters of this story, there were pieces talking about a new train line in Laos, or continuing lumbering in the Amazon, or the emergence of a disease that was unsuspected in other contexts. How did you choose those particular stories to flesh out the analyses that you had been conducting, and what were some of the things that you had learned?
Ryan McNeill
Wow, so that's a really good question. We knew this was going to be a multi-part series. But when you undertake an endeavor like that, you don't want to tell the same story over and over and over again, right? Even though there's the same kind of spine. And so what we worked to do first was to let the data guide us. We had a pretty good idea, based on the academic literature that existed so far, where our main focus areas were going to be. We spent a lot of time on this; we went to Liberia, we went to Ghana and various other places. But what we tried to do was come up with these very distinct stories. In West Africa, it's very hard to ignore. If you just take a satellite image and look around Ghana and look at the environmental destruction, you can see the very telltale signs of the artisanal gold mining that is responsible, as well as the other things, like cocoa farming. And then in Laos, it was really quite interesting, because it's such an undeveloped country, and now you've put this high-speed rail line right in the middle of it, and made all these changes happen in one of the most forested and diverse places in the world. And then in Brazil, it was really interesting, because we didn't have any events in the Americas, yet our model was still picking up on these specific places within the Amazon. And so we thought it was a really good example of, here's what the future may hold. Because it's apparently quite hard to get into the Amazon and do the kind of research, the kind of sampling of animals, across that very vast region. So there's still a lot we don't know about what's in there and what's happening. And yet we're doing this massive destruction. And then also, of course, Brazil has a lot of air connections in ways that other countries might not. So it enabled us to also tell the story of how these things can spread from just somebody coughing in a remote forest. And it was a risk too, because that's a bit hypothetical, right? Journalists tend to shy away from that kind of speculation. So we wanted to handle that with a lot of care. But that's really what drives it: in the end, even though we had the same narrative spine, we wanted to tell a different story about each place. And I think the factors that are driving these changes, whether it's Australia, Bangladesh, Ghana, Brazil, the forces are all different, but I guess the effect is the same, right?
James Cochran
It's interesting that Ryan brought up the risk that journalists are taking. There have been several projects that he and I have talked about where, after going through the data and him explaining what they're seeing and what they've done, they basically come to the conclusion that there's no story here, or the story is that there's no story here. You have to be prepared for that as a journalist. And Ryan's being evaluated on having successful stories. So you don't always know how something's going to go. For example, one thing that Ryan and I have talked about but haven't really looked at yet is another aspect of this problem: transportation. Thirty years ago, you might not have seen something like Ebola get out of small villages in Western Africa. But as you establish more transportation, it makes it easier for people to get from those villages to the major cities, and then from those major cities, you hop on a plane and eight hours later you're in Schiphol Airport in Amsterdam, and from there you're all over the world. And so by making our transportation network more efficient, we may have also inadvertently made it easier to spread these viruses around the globe.
Ryan McNeill
I think, if you asked me, what was the one predictor that you didn't have that you wish you had, it would be annual road changes. A lot of the research uses roads, but it's a static point in time. What would have been ideal would be to have basically a roads data set for each year, and then you could calculate, well, how far is this pixel from a road, or what's the density of roads, something like that. In hindsight, that would be the thing that doesn't exist that I wish existed, and it probably would improve the model, because roads also drive deforestation as well. The roads create deforestation and change all these other climate predictors. So I wish we could have had that.
Rosemary Pennington
You've both just talked about transportation as being one thing you wish you'd had a better understanding of, or better ways of measuring, going into this. What other questions, I wonder, did you emerge with, having worked on this?
James Cochran
I would say, from my perspective, you think about this as a two-by-two table. You have deforestation in the rows: deforested, not deforested. In the columns, you have transfer or no transfer. You'd like to be able to look at both deforested and not deforested, to get some sort of discrimination about transfer. But it's always hard to decide where to look where your purported cause isn't happening. That's always a difficult thing to do, and you don't want to end up only looking at half of that table and drawing conclusions.
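A small worked example may help show why both rows of that two-by-two table matter. The counts below are invented for illustration and do not come from the Reuters analysis; the function name `compare_rows` and the dictionary layout are likewise assumptions.

```python
def compare_rows(table):
    """Compare transfer rates between the two rows of a 2x2 table.

    table maps each row label to (transfer, no_transfer) pixel counts.
    Returns the rate in each row and their ratio (a relative risk).
    """
    a, b = table["deforested"]
    c, d = table["not deforested"]
    rate_deforested = a / (a + b)
    rate_not_deforested = c / (c + d)
    return rate_deforested, rate_not_deforested, rate_deforested / rate_not_deforested


# Invented counts: rows are deforestation status, columns are (transfer, no transfer).
hypothetical_table = {
    "deforested": (16, 48),
    "not deforested": (4, 60),
}
rate_def, rate_not, ratio = compare_rows(hypothetical_table)
print(rate_def, rate_not, ratio)
```

With only the "deforested" row, the 25 percent rate has nothing to be compared against; the comparison exists only because the second row was also observed.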
John Bailer
How do you evaluate the quality of these complicated predictive models?
Ryan McNeill
Let me take a first step, and then I'll let the applied statistician take over. This was something that was on our minds a lot. There were a couple of events that occurred as we were working through the writing and editing process, and they occurred in areas where our model had found a high probability of spillover. So that was heartening. And then, of course, we spent a lot of time talking about this exact question with people like Jim and others, so maybe it's best if he takes that on.
James Cochran
I think there are a couple of things to keep in mind. One, you have to look at how well the model fits the data that you used to create the model. And obviously there's a little bit of a problem there, because the data are telling you what the model should be, and then you're turning around and applying the model back to the data that told you what the model should be, which leads to what we call overfitting. But these data come to you very slowly over time, and so what we have to do is, over time, see whether or not this model works. Hopefully we're going to find academics and others who will take this work and run with it, and add to it and augment it in the ways that Ryan referred to: adding something about transportation mileage, or kilometers of roads, or airline passenger counts for a year out of a country into other parts of the world, something like that. But it's not something that's going to be easy, and it's something that's going to take a lot of time.
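The gap between fitting the data you built the model on and predicting events that arrive later can be shown with a toy example. This sketch uses a one-nearest-neighbor rule on invented one-dimensional data, chosen only because it memorizes its training set; it is not the model used in the series, and every name and number here is an assumption.

```python
def nearest_label(train, x):
    """Predict the label of the training point closest to x (1-nearest neighbor)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, data):
    """Fraction of points in `data` whose label the memorizing rule gets right."""
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)


# Invented data: label 1 is more likely when the feature exceeds 0.5,
# but two training labels (at 0.3 and 0.7) are noise.
train = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
         (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1)]

# "Later" events that were not available when the rule was built.
later = [(0.15, 0), (0.28, 0), (0.33, 0), (0.45, 0),
         (0.55, 1), (0.68, 1), (0.72, 1), (0.85, 1)]

in_sample = accuracy(train, train)   # scored on the data that built the rule
out_of_sample = accuracy(train, later)
print(in_sample, out_of_sample)
```

The rule looks perfect when scored on its own training data because it has memorized the noise, and it does noticeably worse on the later events. That is the reason for waiting to see how a model performs on new data rather than trusting its in-sample fit.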
John Bailer
I kind of thought that might be your answer, but I think it's a really important part of the story as this is told. This was great. I just want to tell you how impressed I was at the quality of this work. Thank you. I really, I mean, I was looking through it. I thought that the media, that kind of interactivity as the story emerged, was just really, really beautifully well done. And I just, just kudos. Well done.
Ryan McNeill
Well, thank you. I mean, a lot of credit goes to our editors too. Imagine going to your editors and saying, hey, look, we're going to spend a lot of money to travel around the world to tell a story about bats. And they got it, and they supported us. It took a long time by journalistic standards, two and a half years of reporting, but we wanted to get it right. Because it is the story of our time, in many ways, or at least one of the major stories of our time, we felt like we had to do it the right way. The only thing that we didn't get into that I wish we had is one of the ways that Jim is really important: how do we translate these findings into English that readers can understand? That was something we spent, as Jim can attest, a lot of time on, talking about text, talking about specific words. Well, if we use this word, is that okay? Because we have to live in both worlds. It needs to be scientifically and statistically accurate, but then we also have to be able to explain it in a way that an average person can understand. And so with these kinds of modeling exercises, translating it is really where the hard part comes.
James Cochran
I think that's actually kind of educational for me. And when I walk into the classroom, it doesn't hurt that I've had to think through issues like this. I mean, that's what you do when you teach: try to make it understandable for your students. And it's the same thing here. We're trying to make it understandable for Ryan's readers. And if people don't understand it, or people don't believe it, or people don't think about it when they read it, then what's the value of the work?
John Bailer
So did you pilot test some of the language? Did you see how it was received? I mean, how did that play out?
Ryan McNeill
No, but I was just thinking about the amount of time we spent discussing whether we could use the word risk or not. Again, as Jim can attest, we probably went back and forth on this a lot, because we thought, okay, well, what's some economist sitting around gonna think when they see it? But it was something that the public understood. And it was things like that. Most of these discussions were done before publication. Once it's out, that's it.
John Bailer
Well, you know what? We could also include some of this discussion of the communication part in this. I very much appreciate this, as someone who's done risk assessment and risk estimation and dealt with the fact that sometimes people mean hazard when they say risk, or they mean something different. And I was trying to picture how it goes: oh, we used random forests. Wait, what about the Amazon? Is that a random forest? I could easily imagine that taking some of the technical methodology and then explaining it in context can be a real challenge. And I thought you did well. I'm very impressed.
Ryan McNeill
And you know, with these machine learning models, there's a lot of black-boxy nature to it, and that also can make people uncomfortable. And so again, it goes back to, how do you prove it's right? How do you ground-truth something like this?
James Cochran
But there's a lot of technical language that we use, and that is, of course, an issue that we have to kind of step around. But it's surprising how many words in the statistics vocabulary are used in a different way by the general public. Significance, for example. I get really irritated when I edit a journal article and somebody says, we have significant results. And I'll say, I don't see your statistics. I don't see your inference. When you write for me, that's what you're telling me, but when you're writing for the general public, it's different. And when you combine the two, statistics and journalism, all of a sudden you have to worry about both ways that those words are used. If you've worked as a statistician with journalists for a while, you really start to see or anticipate where the landmines are, and you develop ways to step around them. It's more than the technical side. It's the general language side and trying to make it digestible.
John Bailer
So, you know, if you were going to summarize in a small snippet of conclusion, what's the main takeaway from your work about what the world is doing?
Ryan McNeill
It's creating unprecedented risk around the world at a scale that's probably quite new. And on top of that, we're more interconnected than we ever have been before, and the pressures are such that, I mean, I'm not Nostradamus here, but they don't show a lot of signs of relenting. I did another podcast, and they asked if I came away thinking that this was going to get better. And I said no. As long as this damage is continuing, and as long as we're linked more than we've ever been linked before, these are things we're going to have to confront.
James Cochran
I think what I take away from this is that all of this is self-inflicted by our species. Humans are doing this primarily to ourselves. And if you want evidence that there were people that anticipated this, the Obama administration, apparently, early in President Obama's first term, created a pandemic playbook on how they would deal with something like this, because in their words, it was inevitable. They saw this coming. For whatever reason, maybe they didn't see the relationships we did, but they saw this coming. You know, we have set ourselves up for these horrific problems that we're now facing.
Ryan McNeill
And it might be easy, I suppose, for some to react and say, well, this is, I don't know, Ghana's problem, or this is the DRC's problem. But a lot of the forces that are driving these things are coming from us in the global north, so to speak, the wealthier countries: our demand for furniture, as well as China's growing industrial demand. So it's not just a problem for Ghana to solve. It's a problem for the whole world.
James Cochran
And how do you tell Ghana or the DRC or Myanmar or some country like that that they have to stop development after the United States and Western Europe have gotten to where they're comfortable with their economic status and stature? Somehow we all have to work together to find ways to help those countries develop within the context of their own cultures and within the context of their own environments without creating greater problems.
John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Ryan and Jim, thank you so much for joining us today. Thanks.
Ryan McNeill
Thank you, John. Thank you, Rosemary.
James Cochran
Thanks for having us.
John Bailer
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.