How to Identify Russian Bots | Stats + Stories Episode 88 / by Stats Stories


Joshua A. Tucker is Professor of Politics, affiliated Professor of Russian and Slavic Studies, and affiliated Professor of Data Science at New York University. He is the Director of NYU’s Jordan Center for Advanced Study of Russia, a co-Director of the NYU Social Media and Political Participation (SMaPP) laboratory, and a co-author/editor of the award-winning politics and policy blog The Monkey Cage at The Washington Post. He serves on the advisory board of the American National Election Study, the Comparative Study of Electoral Systems, and numerous academic journals, and was the co-founder and co-editor of the Journal of Experimental Political Science.

Full Transcript

(Background music plays)

Rosemary Pennington: Online trolls and bots are not new. As long as there's been an interactive Internet, there have been people spreading misinformation and distrust. However, during the 2016 US presidential election cycle, the work of trolls and bots became hyper visible, as did some countries' support of that work. Politics, bots and data are the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me in the studio are our regular panelists John Bailer, Chair of Miami Statistics, and Richard Campbell of Media, Journalism and Film. Our guest today is Joshua Tucker. Tucker is a professor of politics at New York University, as well as Director of the University's Jordan Center for Advanced Study of Russia and co-director of NYU’s Social Media and Political Participation lab. Tucker has published, with a number of co-authors, research examining issues related to trolling, bots and fake news. He joins us in the studio today after traveling to Miami on a visit sponsored by the Havighurst Center for Russian and Post-Soviet Studies as part of the colloquium series on Russian media strategies at home and abroad. Josh, thank you so much for being here today.

Joshua Tucker: Thank you so much for having me, it's a real pleasure!

Pennington: Before we get started with the meat of this conversation, can you explain exactly what a bot is?

John Bailer: Oh that was my question!

(Collective laughter, voices overlap)

Tucker: So, sure. When we think about actors who post content online whether it be a comment to a newspaper story or a podcast or a Facebook post or a Twitter post, we can think about a sort of continuum of ways in which that content could be generated. And one end of the continuum is a human being, who sits down, has a thought, types something hits enter, and it goes into the ether that way. At the other end is a computer algorithm, that is programming, that is producing content based on an algorithm. And that algorithm could be incredibly simple. It could just say every 15 minutes, say, “Hello world!”, right? That algorithm could be slightly more complex: it could be, every time the headline on The New York Times changes, you know, take the text from the New York Times, post the text and then put hash tag fake news after it and post that. Or, it could be, you know, every time ambassador- former ambassador Michael McFaul texts, you know, tweet at him and post this video. So it could be anything like that and we call that a bot. That's the definition that we use for it. Now, in between that is this kind of interesting category that has been given the label cyborg and a cyborg would be an account that has some human generated content and some content generated algorithmically. Now it's important when thinking about bots to understand that there are lots of legitimate uses for algorithmically generated content, right? So we might think of a helper bot that if, you know, did you mean to type this or something before you hit this? On Twitter, we might think of the National Weather Service, right? Which, you know, is tweeting something about a storm warning or something like that. That could be algorithmically generated. I write for this blog at the Washington Post called the Monkey Cage. We have essentially a Monkey Cage bot, which every time we post a news story, it posts the headline and it posts a link to that headline. 
So there's lots of…the term bot has pejorative connotations, and actually in our research we try to distinguish between that, by having something that we call an official account versus a bot for the purpose of our research studies. But for this sort of generic definition of the human to bot continuum here, it is important to remember, like, a lot of content is produced by algorithms and there might be very legitimate reasons to produce that algorithmic content. There are also illegitimate reasons to do that as well.

Bailer: So how hard is it to detect a bot?

Tucker: Oh it's really hard.

Richard Campbell: And how do you distinguish it from people?

Tucker: Right. So it's incredibly hard, like…so the way we would want to do this, if we were going to start from scratch, right? And design a way to do this with a machine learning algorithm, the same way we would with any machine learning algorithm, if we're going to use a supervised machine learning model, we want ground truth, right? Ground truth would involve: we would want to know that there are accounts out there that are producing content by algorithms. We’d want to have a bunch of human accounts and then we want to train a machine to be able to distinguish the differences between them. The problem is, as scholars, if we're trying to find bots, how do we get to that ground truth and basically, if you think about if you really want to know that the content is a bot, there's only one real way you can definitely know that it's a bot, and that's if you program it yourself, right? So that's the first option you could do. We could, as scholars, go out there and build a whole bunch of bots. The problem would be then, if I built bots and then I built a machine to distinguish bots from humans, I would know what my bots were doing, so maybe we could potentially get around that a little bit and you could build the bots, John and then I could try to detect them and we could build my training model to sort of, based on the bots that you did, and I wouldn't know what rules you were using when you were doing this. The problem is, we're not particularly interested as a political scientist in that exercise, right? What we're interested in is, can we go into, in the particular case of the work we've been doing, can we go into Russian political Twitter and figure out which accounts are bots. So with that in mind, if we really want to know what these bots in the wild are, because if we build them ourselves, we're going to know the algorithm, we're going to know what we built them on. So we want to know…what we're really interested in is, can we detect bots in the wild. 
So there's essentially two ways you can do this. One is you need an informant, right? If you're going to do supervised machine learning, you would need an informant who's going to tell you, no no, I built these bots. These are, you know, I was paid by X to do them and I built them. Or you need to rely on leaked data, right? So you get leaked data, you don't have…which is essentially a passive informant, right? Like you think there's someone out there who tells you about it, but they don’t know they're telling you about it. And there are advantages to that, and there are, you know, landmark studies that have been done using leaked data. So for example, the King, Pan and Roberts incredible study, I know you had Gary on the podcast previously, their unbelievable study that they did about sort of upending everything we thought about these 50-centers in China, which turned out not to be 50-centers and they turned out not to be publishing antagonistic content, they published happy talk, right, like how great it was to be Chinese. That's based on leaked data. So they have real ground truth in the sense that like, they are, you know, very, very sure and they do an incredible job in that paper of trying to dismiss all the potential arguments as to why this could be fake. But leaked data is leaked data at the end of the day, and they were really, really careful about it. So that's one way you can do it. The other…but the problem with leaked data is, you can't do the study until somebody leaks the data. The other problem with leaked data, and I love that article and I find it incredibly convincing that this stuff was not deliberately leaked. But when you're dealing with leaked data, you're trusting someone that the leak…that you know, there is some possibility, if we all rely on leaked data all the time to do these kinds of studies, that someday someone's going to get played by this, right? That's the problem with leaked data.
And so you get the…so there are tradeoffs, right? If there was a silver bullet, everyone would do it. So what we did in our studies is we trained human beings to identify bots. And so the limitation to our study is, if the bots are so good that they cannot be detected by the human eye, our method is not going to be able to detect them. However, what's incredibly valuable about our method, and our goal was to come up with a method that you could deploy at scale, that would be transparent as to how we're coding them as bots, that would be replicable, and that would be retrospective. And so what we did to try to do this is, we took a big collection of data in which we were interested, in this particular case it was Russian political Twitter data, and we developed a coding framework. So we basically had someone from our lab, so Sergey Sanovich, who was one of the co-authors on the paper, just looked at a ton of these accounts. And he got very good at distinguishing which were bots and which were not bots. But of course we can't publish a paper, a scientific paper, that says the way we tell our ground truth is whether Sergey says it’s ground truth.

(Collective laughter, voices overlap)

Tucker: So as much as we would like to have done that…so we trained students and so at the height of this project, we had 50 undergraduates in Moscow at the Higher School of Economics, working as a team, to hand code these bot accounts. And what's nice about the way that we did this is that it is very much in the open science tradition. Our coding is transparent. We trained the students to do this. You could take the coding tomorrow and go get fifty Russians and recode all of our data with exactly the same coding instrument we used and see if you come up with the same topics. And then we had at least 5 different undergraduates looking at each of these accounts. We only entered them into our data set as ground truth if we had an inter-coder reliability of 0.8 or higher. And so that's where we went. Now, does that eliminate subjectivity entirely? No it doesn't, and it does mean that we might be missing something in the sense of, there might be accounts that are so clever we can't tell, or there might be accounts that are ambiguous and we sort of miss them. Those are the trade-offs for doing it. The nice thing about it is though we have our data and what we essentially did was, we built a nice little piece of software, so Denis Stukal, who was one of the co-authors on the paper and the lead author on the technical papers, built this nice little piece of software that takes all the Tweets from the same account in your collection and displays them in a way that looks kind of like a Twitter account, right? Because the other way you would do this is you just click on the Twitter account itself, but that's not replicable because if you click on the account today and John, you click on the account in 3 weeks, you're going to see something different. So and it's not retrospective either, you can't say I really want to see what this looked like in 2014.
But with our method of having a sort of cache data set, that you then display in this sort of quasi Twitter like page that means again that if you get 50 undergraduates who go and code this tomorrow they can see the exact same thing my 50 undergraduates coded and you can check to see if we if we get similar results.
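The coding rule Tucker describes (at least five coders per account, and an account only counts as ground truth when agreement reaches 0.8) can be sketched in a few lines of Python. This is only an illustration: the transcript does not say which reliability statistic the team used, so the share of coders who chose the majority label stands in for it here, and the function and variable names are hypothetical.

```python
from collections import Counter

MIN_CODERS = 5       # "at least 5 different undergraduates" per account
MIN_AGREEMENT = 0.8  # threshold mentioned in the interview

def ground_truth_label(labels):
    """Return the majority label if the account qualifies, else None.

    Agreement is measured as the share of coders who picked the most
    common label. That measure is an assumption for this sketch, not
    necessarily the statistic used in the study.
    """
    if len(labels) < MIN_CODERS:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    agreement = votes / len(labels)
    return label if agreement >= MIN_AGREEMENT else None

def build_training_set(codings):
    """codings maps an account id to the list of labels its coders gave."""
    training = {}
    for account, labels in codings.items():
        label = ground_truth_label(labels)
        if label is not None:
            training[account] = label
    return training
```

Accounts on which coders disagree, or which too few coders saw, simply drop out of the training data, which is the trade-off Tucker acknowledges: ambiguous accounts are missed rather than guessed at.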

Campbell: So jumping from definitions to findings, so what's the most interesting thing you think you've discovered in looking at Russian Twitter bots?

Tucker: So I would say there were three things we discovered that were super interesting. The first was the sheer quantity of tweets in our collection that were produced by accounts that we labeled as bots. To be very clear, this is not a benchmarking of Twitter. We were using a very specific collection of data. We had a series of…we used Twitter's search API to collect the data, which meant we fed the API a whole bunch of keywords, we got all the tweets about those particular keywords. We then only kept in our collection, for the point of view of our analysis, we only kept accounts where at least 75 percent of the tweets were in Russian. So this was Russian political Twitter. We may have missed tweets about politics, we may have gotten other things, but we found, on average, across this now 4 year period that we looked at, that almost 50 percent of the content on any given day was being produced by bots. Now, that's the first thing that was interesting. So that's much, much higher than the baseline level that is reported as being produced by bots on Twitter. The second thing we found that was super interesting was that a lot of these bots were simply tweeting news headlines. Some of them had links but a lot of them didn't. And if you think about it, it makes zero sense if you're trying to fool humans or convince humans about anything to tweet a news headline without tweeting a link to that news headline. So our best guess as to what's going on there, and this is only a guess, is that these are accounts that were trying to fool other algorithms, not human beings, that they were trying to manipulate search engine optimization so that when you would see…so Yandex, which is kind of like a Google of Russia, had for a long time this sort of most popular news stories on the first page.
We think perhaps they were trying to manipulate that, perhaps they were trying to manipulate Google search, so that when things popped up, and if you fast forward, as in your introduction, Rosemary, in talking about the U.S. 2016 elections, one of the things we think that was going on there was surfacing RT, the Russian sort of state propaganda or state news agency that has a strong propaganda function to it, right? We think that this is one of the interesting things that was happening, was that RT is sort of showing up high in search algorithms. If that's the case we may have seen the antecedent of that in Russian domestic politics. So I would say the second big thing we would take away from this is that, our sense is that if we think about bots only as trying to fool humans we may be missing an important part of the story, which is interesting because going back to our original conversation about whether our method can catch increasingly sophisticated actors. Well, increasingly sophisticated actors to fool whom? If it's increasingly sophisticated actors to fool algorithms, that may not be really hard for a student or a human to detect. If it is increasingly sophisticated algorithms to fool humans, that may be tougher for humans to detect. So that was the second thing we found and I started to…it changed the way I thought about bots a bit. The third thing we found that was super interesting was that, we discovered…so we came at this project because we were interested in how authoritarian regimes and competitive authoritarian regimes respond to online opposition, how are they dealing with this kind of new threat that's out there and we've written sort of extensively about this. We can talk more about that topic generally if you're interested in it.
I just you know, we just saw, Richard you saw the talk which started with you know the talk I gave here at Miami, it started with about 15 to 20 minutes on this topic here, but that got us to bots because we had this whole classification of all different things regimes could be doing. And we wanted to study this, how regimes are responding in the aftermath of the Arab Spring and the 2011 protests and after the Russian elections. So that's what we wanted to study, you know we had the whole classification and one of the things we do is, well regimes might try to change the nature of the online conversation a la the King, Pan and Roberts work in China and so we set out to look for actors that might be doing this. We set out to look for bots. We found bots, but the leap at that point, we said oh good! We found the bots, and now we can find the Russian government’s, you know, online strategy for dealing with online opposition. No, it turns out not to be the case because what we quickly discovered is that not all of these bots were pro regime. In fact, it turns out that when we have enough tweets in an account that we could actually estimate the political orientation of that account, it turned out that only slightly more than a quarter of the accounts in our collection did we classify as pro regime bots. Almost 50 percent of the bots were what we would call neutral bots that were sort of tweeting these news headlines and they were indiscriminate, or they didn't seem to have any sort of political message. But the interesting thing is we also found pro opposition bots and we also found essentially, what we call sort of pro Kiev, pro the government in Kiev, sort of pro Ukrainian bots that would be anti-Russia's activity in Ukraine. And when you sum up the combined activity of the pro opposition bots and the pro Kiev bots it's roughly about the same amount of activity, certainly roughly the same number of accounts that we could find as the pro-Kremlin bots. So there's two messages.
I think one message is the substantive message which is that, as always, you know there's a lot going on in the online information ecosystem and simplistic explanations are probably missing sort of more nuanced stories and a lot of what we find going on in the online information space is always cat and mouse games. In retrospect it doesn't seem surprising that if we think the regime would have started using bots, that opponents of the regime would start using bots. The second thing is a methodological point, and it's for people who want to study this sort of thing, which is, if you build bot detection software because you want to find, you want to track the activity of a state actor, you are going to at best have a lot of noise in your data, if you don't actually take the secondary step of trying to code whether or not your bots are actually a pro state actor. At worst you're going to have a lot of bias because you're going to be coding activity that is actually anti-state activity and I think that applies…you want to think about political campaigns, you want to think about any, oh Brexit, we assume all the bot activity is on the anti-Brexit side. I mean I haven't looked at this data, but any study I would do now, that would be the next thing I would do, that would check that. So those are the three big takeaways that were interesting to us.
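The two filtering steps behind the first finding (keep only accounts whose tweets are at least 75 percent Russian, then ask what share of each day's tweets came from accounts labeled as bots) can be sketched as follows. This is a minimal Python illustration, not the lab's code: the tweet fields `account`, `date` and `lang` and both function names are assumptions made for the example.

```python
from collections import defaultdict

def russian_accounts(tweets, min_share=0.75):
    """Return accounts whose share of Russian-language tweets is >= min_share."""
    total = defaultdict(int)
    russian = defaultdict(int)
    for tweet in tweets:
        total[tweet["account"]] += 1
        if tweet["lang"] == "ru":
            russian[tweet["account"]] += 1
    return {acct for acct in total if russian[acct] / total[acct] >= min_share}

def daily_bot_share(tweets, kept_accounts, bot_accounts):
    """For each date, the fraction of kept tweets posted by bot accounts."""
    total = defaultdict(int)
    by_bots = defaultdict(int)
    for tweet in tweets:
        if tweet["account"] not in kept_accounts:
            continue  # account fell below the language threshold
        total[tweet["date"]] += 1
        if tweet["account"] in bot_accounts:
            by_bots[tweet["date"]] += 1
    return {date: by_bots[date] / total[date] for date in total}
```

Tucker's methodological point would add one more step on top of this: splitting `bot_accounts` by estimated political orientation before attributing the activity to any one actor.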

(Background music plays)

Pennington: You're listening to Stats and Stories and today we're talking bots, data and social media with NYU’s Joshua Tucker. He's at Miami University on a visit sponsored by the Havighurst Center for Russian and post-Soviet studies as part of the colloquium series on Russian media strategies at home and abroad.

Bailer: Well you know, just a follow up on one point you raised there, I was very surprised to read about the neutral actors. That was the part when I was looking at that paper that really surprised me, to think about those surfacing. And you know consistent with when you're talking about that bots are often used as amplifiers for a message. You know I can see the pro and the anti bots coming into play, as we are thinking about government actors. But I was trying to think, well what's the role of these neutral actors and from your comment I was wondering if perhaps it's that this search engine optimization component or something related to that. (Voices overlap)

Tucker: So that's the best guess we have for them. I mean and it's possible you know, again, we were trying to be conservative in this. We did not want to make errors of classifying things as something where they weren't. So we were trying to avoid, we were trying to maximize precision at the expense of potentially higher recall. It is possible that one could build…OK, so by the way, classifying the political orientation turns out to be a much, much more difficult task, than classifying whether the account is a bot or not and we had to kind of go back and continually work on getting cleaner coding from students and being you know careful about this and we built, ended up having…we built the classifier that predicts the political orientation includes text. Whereas we try to build a bot or not classifier to not include textual features in the hope that it would transcend national boundaries but of course this is going to involve unigrams and bigrams and all sorts of things like that. So it's already based on just the text that's in there. It was a much more difficult classification process. Now you could imagine building, and again, this was the first time anyone you know, we'd never done anything like this. This is a kind of different approach to dealing with bot classification and so you could imagine a sort of next stage generation 2.0 of this political orientation that takes these neutral accounts and if they have a link, opens up that link, collects the text in the link and actually gets better at saying this and says you know what actually a lot of these neutral accounts, they're not saying anything that's pro regime or anti regime, but they're sharing newspaper stories that are pro regime. We actually, in other work that we're doing on the Russian troll accounts from the 2016 election, are working very intensely with this link data and so that's a secondary stage of analysis. 
So the first answer to your question is, it's possible that some of these, that we're calling neutral accounts, these are accounts sharing news stories. But those news stories have a particular slant to them. So that's one possibility. The second possibility is honestly, that this is just a sort of…these are media sources buying botnets, to you know, make many more copies of their headlines, of their news stories online and I still think that’s Occam’s Razor, that's the simplest thing. There's another…there are other possibilities for what these headlines could be. Like the other thing that I've come up with that you know seems potentially interesting is if you were trying to hide an account from Twitter. Let's imagine you wanted to hold back a whole set of bot accounts to be able to use in a crisis and we do find that the sort of…in our first paper on this, on just the bot detection, there's a sharp upshot in the use of bots immediately after the Crimea conflict began. So imagine you were trying to hold back a set of bots and you know that if you have an account that hasn't tweeted for 2 years and then it tweets 50 times, it's going to get shut down. And you have to build a history of that account doing something. But you know if you have an account that every night at midnight tweets hello world, maybe that’s going to get flagged. If you had to build something that would randomly tweet at different times and randomly produce different texts, saying take the top story from this newspaper, every time the upper right hand story on the front page changes, that's not a bad algorithm, right? Because that's controlled by a human. A human is making the decision when they change the news story. That's different, I mean I'm just making all this up, I'm speculating.

(Collective laughter)

Tucker: There’s no evidence that this is what's happening, but it seems like a kind of cool strategy. And then if you told it to do that like, I don’t know, 2 or 3 times a day, then when it suddenly tweets 25 times one day, then it maybe doesn't look so out of the ordinary. So that's another possibility of what was going on. I'm still an Occam’s Razor kind of person. I still think the most likely outcome is that these are the media companies themselves, are facilitating this. Whether they're paying for it, whether they're doing it. Now to be very clear, these are not…we had this official category in our accounts. You would not have gotten counted as a bot, if you were the official Washington Post account. That would be in our classification that would be an official account.
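The precision and recall trade-off Tucker mentions earlier in this answer can be made concrete with a small Python sketch. The scores, the thresholds and the function names below are invented for illustration; the transcript does not describe how the team's classifier produced or thresholded its scores.

```python
def classify(scores, threshold):
    """Label an account a bot when its score meets the threshold."""
    return [score >= threshold for score in scores]

def precision_recall(truth, predicted):
    """Precision and recall for the positive ("bot") class."""
    pairs = list(zip(truth, predicted))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Hypothetical hand-coded labels (True = bot) and classifier scores.
truth = [True, True, True, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]

lenient = precision_recall(truth, classify(scores, 0.5))
strict = precision_recall(truth, classify(scores, 0.8))
```

With these made-up numbers the stricter threshold flags only the most confident account, so every flagged account really is a bot (higher precision), but two of the three real bots are missed (lower recall). That is the conservative choice Tucker describes: avoid calling something a bot when it is not, and accept that some bots go uncounted.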

Campbell: Can you talk a little bit about fake news and some of your research on the fake news dissemination online on Facebook?

Tucker: Sure, I'll be happy to do that. So we recently came out with a study about a month ago where we were looking at the question of who shared fake news on Facebook. And if you want to think about, just to back up for a second here. There's been a lot of talk about fake news. But fake news is essentially, the study of fake news and politics is essentially four separate questions. They are of course interrelated. The first question is who produces fake news? The second question is, who disseminates fake news or who shares fake news? The third question is, who's exposed to fake news? And the fourth question is, what's the effect of being exposed to fake news? So if you want to get the full picture of sort of fake news in the 2016 election, you want to know answers to all of those questions. I'm going to start by saying we know almost nothing about the last part of that question. So anyone who wants to jump to the effect of fake news on the election was X. There's an awful lot of speculation about that, because we know almost nothing about the exposure to fake news. In fact, one of the more interesting results I've heard from a recent study is that when you prime people to think about fake news, they're not actually any better at identifying fake news, but they're more likely to think real news is fake. So we really don't know much about that last step at this point. We do know quite a bit from 2016 about who was producing fake news, right? There's this sort of Macedonian teenager, these kind of profit maximizers. Interestingly enough and almost frighteningly we think in 2016 it was more economic actors than political actors. Although to expect that that's going to continue in the future is probably folly now that people know you can do this sort of thing. Where our research came in was at that second stage which was who was actually sharing fake news. 
And you know one of the things about social media studies and the SMaPP lab does this all the time, is that the vast majority of research we've done is on Twitter data. And the reason we do research on Twitter data and everyone does research on Twitter data is that well, for the most part Twitter data has been much easier to get access to, in part because you know, the vast majority of tweets are public. People tweet about them being public and Twitter for the most part has been quite…has historically been quite good about making their data available for academic research. It’s gotten a little more complicated recently. But of course we know that...and there are these incredible reports from Pew, if you've seen these Pew social media usage reports, we know most people aren't on only one platform and if they are on only one platform it's Facebook, right? And Facebook is the giant behemoth in the room, so we do all these studies on Twitter. We do think they're important for policy, I mean it's amazing data. We're social scientists, like we can get data on 20 percent of population, that's really amazing, right? But if you can go to Facebook you can get data on 70 percent of the adult population and obviously that's you know that's where a lot of stuff is happening and that's the way the fields developed. And you know we're hoping that in the future, there's going to be more and more access to Facebook data for academic researchers, and we could talk more about that, because that's a whole other story right now, with the sort of push and shove between transparency and privacy concerns. But we were in a very unique position which is that going into the 2016 election we had run a panel survey where we had surveyed a panel of people three times over the course of the election and we asked the people who reported having Facebook accounts if they were OK sharing some of their Facebook data with us. 
So we had permission from them, we told them about it, we gave them you know, multiple opportunities to stop doing this thing and it turns out, much to our surprise, this is a bit experimental, that about half of our respondents who reported having Facebook accounts were willing to share some of their Facebook data with us for the purpose of this academic study. So we had Facebook data and we didn't get a lot of Facebook data from them, but we got what they posted over the course of the election campaign. So one of the challenges with survey data and media consumption, as you guys know well, is that you're always relying on recall. We're asking people questions, right? The beauty of Nielsen is that there's a box on the T.V. So you actually know it was on even if you don't know who was watching it, right? And the problem and the beauty of social media data, which is objective indicators of behavior, is that often you have to estimate covariates, and we can do really fancy things to estimate covariates with social media data, which is really kind of cool. But at the end of the day you're still estimating. It's an extra step of estimation. When you have survey data linked to social media data, you don't have to estimate the covariates because these are people that you already know a lot about from the survey and you don't have to rely on recall. So what we were able to do after the election, that we quickly realized, was we could actually look at who had shared fake news and so that's what we went and did. And this study is essentially, basically looking at the relationship between a number of demographic covariates and people who shared fake news and we very self-consciously didn't code fake news ourselves. We took a number of lists that were out there in the field and we could actually look at who posted this online.
And the sort of takeaway findings from the study were the following: first, we found that in our study, less than 10 percent of our respondents shared any instance of fake news. There were a few people who shared a lot, the next highest number was people who shared one story. So consistent with a number of other studies that have come out recently, we found that sort of this part of the behavior, the sharing of fake news, is a high activity among really, really small numbers of people and nonexistent among over 90 percent of our sample. That was the first thing. The second big finding however that came out of it was that we did find that on average people who were older shared significantly more links to fake news websites than people who were younger and in particular we had this kind of headline finding where the over 65s in our study shared, on average, seven times as many links as the 18 to 29 year olds in the study. So that was the second big finding.

Bailer: So the big question for me, when I hear about this, is…you talked about interventions to increase discernment of such information. So what ideas or what kind of suggestions, recommendations do you think about…I mean even if you don't have a clean-cut answer of what's going to work, what are ideas for making people kind of less prone and less susceptible to the effects of this news?

Tucker: Well, I think the thing that comes out of our study that's important is that if your reaction to fake news is, let's go into high schools and teach digital literacy, right, that's not going to solve the issue, right? And it's a great thing; my kids are learning digital literacy in high school, and that's wonderful for them to do that. The kids who are in school right now are going to consume the vast majority of the information they consume about politics online, so we should teach them digital literacy. But that's not going to deal with this. So if we do think that teaching digital literacy, whatever that is, right, is an important antidote to sharing fake news, to being convinced by fake news and things, we have to think creatively about ways to do this outside of just the schools, right, and in particular we might want to overemphasize this kind of training with people who encountered this technology at a later point in their life, after they had consumed news through other vehicles previously. However, with that being said, I think we need more basic scientific understanding of what is determining why people are sharing fake news. I think the fundamental crux of the matter is, and this is the next study that we want to do, we want to understand whether people are sharing fake news because they think it's true, and therefore if we can get them to understand that it's not true, they won't share it, or if people are sharing fake news because either they don't care if it's true or they know it's fake but they're sharing it (a) out of the hope that it will convince other people or (b) because they're just using it to display identity, right? If you just want to say, oh look! The Pope endorsed Donald Trump, right? 
I know that's ridiculous but I'm going to share that with all my friends so they know how much I like Donald Trump and how much I dislike Hillary Clinton. If we don't get this underlying story straight, if you design interventions to solve a problem that's based on people sharing fake news because they think it's true, right, where you think, oh, if I can just convince them it's false they won't share it, then if people are sharing it and don't care whether it's false or not, those interventions are going to fall flat. Moreover, before we start running out and doing this stuff, I think we really want to understand what the effect on people is of priming them to think about fake news, right, and what's dangerous about the current moment is that there is such a furor, an uproar, about so many things happening online, with fake news being one of them, right, that we are rushing out with potential solutions before we understand the basic process that's underlying here. And so I think this is really a moment where we need basic research, and we need that research because there's intense pressure on policymakers to do something about this. I mean, if you think about this for a second, what you have is lots of people calling on the platforms, right, to immediately get rid of fake news. Sounds great as a normative concept, right, but once you say you have to get rid of fake news, they've got to decide what fake news is, right? I didn't even want to decide what fake news is, as a scholar who is studying it, because I didn't want it to get in the way of my analysis and be a potential bias, and what I was doing is analysis. 
So we are asking, we are essentially talking about potentially outsourcing decisions on what's legitimate news and what's not legitimate news to giant multinational corporations and if I came in and said you know what I think we should legislate that Exxon Mobil, I mean Exxon gets to decide what goes out in the New York Times, everyone would look at me like I was crazy. But if I say Facebook should be legislated to take down things, stories that Facebook determines are false, right, then people are saying, this is what people are calling on them to do. So I think there's a lot at stake here because of the huge pressure on governments, on the platforms themselves and it calls out for more basic research and a lot of it and quickly.

Pennington: That's all the time we have for this episode of Stats and Stories. Josh, thank you so much for being here today.

Tucker: It was my pleasure!

Pennington: Stats and Stories is a partnership between Miami University’s departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, Apple Podcasts or other places where you find podcasts. If you’d like to share your thoughts on the program, send your email to or check us out on and be sure to listen for future editions of Stats and Stories where we discuss the statistics behind the stories and the stories behind the statistics.