Filippo Menczer is a professor of informatics and computer science at Indiana University, Bloomington. He is an ACM Distinguished Scientist and a Senior Research Fellow of The Kinsey Institute. His research focuses on Web and data science, social network analysis, social computation, Web mining, and modeling of complex information networks. His work on the spread of information and misinformation in social media has been covered in many US and international news sources, including The New York Times, Wall Street Journal, Washington Post, NPR, PBS, CNN, BBC, Economist, Guardian, Atlantic, Reuters, Science, and Nature. His recent work includes Hoaxy, a way to "visualize the spread of claims and fact checking."
Full Transcript
Rosemary Pennington: 2016 saw the rise of the term ‘Fake News,’ used to label propaganda, hoaxes and disinformation, but there has been some confusion about what exactly fake news entails. Organizations like ProPublica, which supports and produces in-depth reporting, have launched efforts to help consumers identify fake news. Researchers at Indiana University have also joined the effort, launching Hoaxy. The project tracks the sharing of links to fake news across the internet by collecting public tweets linking to particular organizations or sources of information. Hoaxy’s the focus of this episode of Stats and Stories, where we look at the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a partnership between Miami University’s departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Our regular panelists are Department of Statistics chair John Bailer and Department of Media, Journalism and Film chair Richard Campbell. Today’s guest is Fil Menczer, professor of informatics and computer science at Indiana University, as well as the director of the Center for Complex Networks and Systems Research. He also helped develop Hoaxy at IU. Thanks for being here this afternoon, Fil.
Fil Menczer: Thank you for having me.
Pennington: Just to start off the conversation, how did Hoaxy come about?
Menczer: The research behind it has been going on for several years, we are interested in the spread of information, and misinformation on social media, so we build lots of models and theories about what are the factors that may affect their spread and their virality, and also the competition for our limited attention between fake news or other kinds of misinformation and fact checks, or debunking information. So to verify and validate these models we have to collect data. So we started collecting data about people who share links to either fake news websites or fact checking websites, or hoax conspiracies and so on. So initially it was mainly to sort of….as part of our research … to validate our models, but then we realized that this data might also be useful for others, other researches, some reporters, the general public, and so we started working on a public facing service, which eventually became the Hoaxy website, and also an API, which is a set of tools that people can use to programmatically access the data that we are collecting. So, we designed this site and this set of tools to make it easy for people to study, analyze and visualize how misinformation and fact checks compete and how they spread through the social network from person to person. Who are the main influential spreaders, and what are the temporal dynamics of these events, and so you can observe this directly on our website now.
John Bailer: You use the term ‘model’ and you use the term ‘network,’ so if we could just take a step back and have you talk a little bit about why it is that a social network has implicitly embedded in it the idea of some structure, some data structure. Can you talk a little about that, and then about the types of data that might be collected on that?
Menczer: Very good question, in fact I should also say…you mentioned models, of course, in different disciplines the word ‘model’ means completely different things. So the way that statisticians might use the word model, or mathematicians, or physicists, or biologists are completely different things. When I was talking about models I was referring to agent based models that are basically simple theories of how, in this case, information may spread from person to person, where you imagine that there is a network, a graph, connecting users, social media users, according to some rules and then people follow some simple rules to pay attention to different things that they are exposed to on this network so the structure of the network is very important, who you are connected to determines what you see. So we’ve actually spent quite a bit of effort studying the structure of the networks that are induced by the exchange of information on social media. We have been collecting data from twitter for several years, since 2010. We collect a sample of about 10% of all public tweets, and we have all of this data stored on a cluster and actually available for researchers to use. This is a different project, it’s not Hoaxy it’s something else called The Observatory on Social Media, also known as Truthy. So we can look at the people who are talking about some subject, for example some hashtag on Twitter, let’s say the name of a presidential candidate, for example, who’s talking about Obama or who’s talking about Trump etc. And so you can build a network where the nodes are the accounts, the Twitter accounts and the edges between these accounts represent how information about that topic, let’s say for example a hashtag, spreads from person to person. So for example you may post a comment or a tweet about Obama, I don’t know ‘I like Obama’ or ‘Obama’s a Muslim’ or whatever, and #Obama and then I retweet it. 
Well if I retweet it, it means that that information has passed from you to me, so we can represent that with an edge, with a link connecting the node of your account to the node of my account. Now, do this across millions and millions of tweets and you get these networks, and when you visualize these networks or when you analyze them mathematically, you see that there is a very interesting structure, they’re not random networks. You are much more likely to be connected to other people who are similar to you, that have similar opinions to yours, political tendencies maybe who live in the same geographical area. These are well known things about social networks, there’s a word for it, it’s homophily, right? People like to be connected to other people who are similar to them, and that’s very natural, we’ve observed this in social networks for a century. But, of course in social media like Twitter and Facebook, which are used to spread information and some people use it as their main source of news, this homophily also means that what you see is based on what is shared by your friends and your friends tend to be like you, and so the view that you have of the world, the view that you have of the news, the events and opinions is very very biased. So understanding the structure of these networks, the community structure of these networks, the clusters, some people call it the 'echo chambers,' in which you find yourself, helps us understand how opinions can be affected by the use of these social media, and also in some cases, manipulated by people who abuse social media specifically for this purpose.
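The retweet network described above can be sketched in a few lines of Python. This is only a minimal illustration, not the Observatory on Social Media's actual pipeline: the account names, the `group` labels, and the helper functions `build_network` and `same_group_fraction` are all hypothetical. Each retweet becomes a directed edge from the original poster to the retweeter, and homophily is approximated, very roughly, as the share of edges that connect accounts carrying the same label.

```python
from collections import Counter

# Hypothetical sample data: (original_author, retweeter) pairs for one hashtag.
retweets = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "erin"),
]

# Hypothetical labels (for example, political leaning), used only for illustration.
group = {"alice": "A", "bob": "A", "carol": "A", "dave": "B", "erin": "B"}

def build_network(pairs):
    """Map each directed edge (source, target) to the number of retweets along it."""
    return Counter(pairs)

def same_group_fraction(edges, labels):
    """Fraction of edge weight that connects two accounts with the same label."""
    total = sum(edges.values())
    same = sum(w for (src, dst), w in edges.items() if labels[src] == labels[dst])
    return same / total if total else 0.0

edges = build_network(retweets)
print(same_group_fraction(edges, group))
```

A value close to 1 would suggest that information mostly circulates inside like-minded clusters, which is the "echo chamber" structure discussed in the interview. A careful analysis would compare this fraction against what a random network with the same group sizes would produce.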
Richard Campbell: Can you talk a little bit about, in your own work, how the news media have covered your work, what they get right, what they get wrong, what makes you mad about what they get wrong? We feel like we have strong obligation here to help our journalism students tell stories about numbers, and it’s a challenge for many of them, so maybe talk a little bit about your own experience, since I know you’ve been covered by lots of major national media, talk a little bit about that.
Menczer: Well I have to say, that in the great majority of cases I feel very lucky, I think we’ve had a lot of really good coverage. I would say almost all of the reporters I ever talk to are good and professional and they do their research and they write mostly accurate things, occasionally there is some small inaccuracies but that’s more than normal, so I actually admire journalists and reporters and media professionals a lot. Unfortunately there are also some that, perhaps we shouldn’t even call them journalists or reporters, but there are people in the media world who are not necessarily interested in really sort of talking about facts, but they may propagate stuff with a very very strong bias. So, we have also, unfortunately have to deal with some of those, so those would be the negative experiences. There was specifically a period of time in 2014 prior to the midterm elections, so let’s say between August and November, in which we were targeted by, what I would call, an organized misinformation campaign, that attempted to depict our research in a very bad light and in fact it was mostly based on completely fabricated nonsense. But, it spread like fire and it went viral and it spread in a portion of the social network, of the social media inhabited by people who were predisposed to believe that sort of misinformation, so we experienced firsthand sort of the spread of misinformation about our own research and it was pretty bad, because we were the target of a lot of hate messages and mail and of course it was not nice to see our own research completely misrepresented on TV by one channel in particular, so in that case that has not been pleasant.
Campbell: How did you counter that?
Menczer: We tried different things. So this started with an article in a hyper….what I would call a hyper-biased political blog which took a sentence from an abstract of an NSF grant that we had several years ago, and took that sentence out of context and tried to make it sound like it was something terrible and ominous. We later realized that this particular blog did this systematically, so they basically combed through NSF abstracts, looking for sentences that could be taken out of context to make it sound like it was silly research and then they would write an article where the title was “The Feds spend ‘X’ amount of dollars to do ‘Y’” and the idea was to show that research is biased or research is silly or the Federal government wastes research funds on silly projects. These are all taken out of context of course, but the result is that people who pay attention to these sources get the idea that the federal government is crazy and wasting tax payer’s money and so on. So this particular story the title, because there was a sentence in the abstract of our grant that was talking about broader impact, so things that our work could be applied to in the future and of course we were talking about a system that we wanted to build, which we now have built, it’s called the Observatory on Social Media, to collect data from Twitter and help study how information spreads from person to person, and since we had observed some abuses of social media such as social-bots and astroturfing, in our broader impact statement we said ‘if we understand how these abuses work, then in the future this understanding could lead to tools that will help preserve free speech.’ Because of course these tools can be used to suppress free speech, to make it look like there’s people talking about something where it’s not true or make it look like some people are popular when they’re not, or having automatic trolls that attack and insult people, so these are examples of suppression of speech, but 
this article made it sound like we were specifically targeting conservative speech or conservative users and somehow we had the magic power of suspending their accounts and this was a secret program of the federal government, specifically of the Obama administration, to spy on citizens. Of course this was completely fake and fabricated, but it picked up like wildfire, it was on Fox News, and eventually it escalated all the way to a formal investigation by the chair of the House committee on science and technology. So how do you react to it? Well, we put out information, posted on our research website, on our blog, explaining that this wasn’t true, but it didn’t really matter. Those who were spreading the misinformation were not necessarily interested in accuracy, they just wanted to get people riled up. I mean, within a couple of days this was debunked: there was an article in the Columbia Journalism Review, there was an article in a UK newspaper, we had an interview in the Washington Post, and there were two articles in Science. So it was widely debunked, and it was pretty clear to anybody who wanted to look at it that it wasn’t true. But the ones who were spreading the misinformation, they wouldn’t interview us, they wouldn’t call us, right? So that’s why I say it was really an organized campaign, it wasn’t just a mistake. Also, the stories they were telling about our research kept changing over time. So we would say ‘well it’s not true that, I don’t know, we’re building a secret database for the federal government’ and then two days later there would be a new article that says we were collaborating with the FBI to track people posting memes, image memes, and report them to the government, or something like that, or that we were getting 100 million dollars to do this work for the federal government. They just got crazier and crazier, and so it was hard to keep up. 
Of course the university helped us and we had some statements released about that and several of the professional organizations and scientific organizations for computer science, math etc. published a letter indicating that this was all false, but it just kept going and going until the elections.
Pennington: You’re listening to Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics. The topic today? Fake news and social media. I’m Rosemary Pennington. Joining me are panelists Miami University Statistics department chair John Bailer and Media, Journalism and Film department chair Richard Campbell. Our guest is Fil Menczer, professor of informatics and computer science at Indiana University and co-creator of the Hoaxy project. Fil, maybe you could take a moment and sort of explain what it is that Hoaxy can do and what it can’t do, because it’s linked to this idea of fake news but, from what I understand, you’re not scrubbing the content of tweets to understand if they’re fake, you’re really just looking at how they map across a network?
Menczer: Yes, that’s right. Hoaxy is definitely not a fact checking tool, I want to make that clear. We’re not media specialists, we’re not journalists and we’re not able to look at a claim and say whether it’s true or not. We don’t have the expertise; we could do it by hand but certainly we couldn’t do it for the millions of claims that we track. So we don’t do that. Instead, what we do is we allow you to visualize how claims are spreading from person to person, how they are going viral. So we rely on lists of websites that are generated by some well-respected media sources and fact checking organizations. These include both fact checking sites, places like Snopes and Politifact and Factcheck.org and so on, as well as a bunch of sites that are often indicated by fact checking organizations as posting claims that are not verified or that are fake news, or that are conspiracies or things of that sort. We do go to these websites and read the articles that are posted there, so that you can basically search through them. So that’s the first piece of our tool: it works like a search engine, you put in a few key words and you find all the claims, the articles that are written that match your query. Then you can choose to look at some of them more in-depth, and perhaps you might find some articles from fake news websites and some articles debunking information from fact checking websites, and then you can visualize how those particular articles, or links to those articles, are spreading through Twitter from person to person. So you see a network, and a node represents an account and the size of that node represents how influential that account is in terms of being retweeted or quoted, you know, basically spreading that information, whether it’s a claim, a valid claim or an unverified claim, or fact checking and so on. 
The edges, like I said before, represent people tweeting or retweeting or quoting or mentioning each other and we use two colors to represent whether the links are on websites that are known fact checking sites or whether they are these sites from this list that we got that are often posting claims that get debunked.
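The visual encoding described here (node size for influence, two edge colors for the kind of source) could be computed along the following lines. This is a hedged sketch under stated assumptions, not Hoaxy's implementation or its API: the record format, the account names, the `COLORS` mapping, and the functions `node_sizes` and `edge_colors` are hypothetical, and the interview does not say which two colors the site uses.

```python
from collections import Counter

# Hypothetical records: (spreader, resharer, kind), where kind marks whether the
# shared link points to a listed claim site or to a fact-checking site.
shares = [
    ("newsbot", "u1", "claim"),
    ("newsbot", "u2", "claim"),
    ("checker", "u2", "fact_check"),
    ("u1", "u3", "claim"),
]

# Arbitrary illustrative colors; the real site's palette is not stated in the source.
COLORS = {"claim": "orange", "fact_check": "blue"}

def node_sizes(records, base=1):
    """Size each account by how often its posts were retweeted or quoted."""
    reshared = Counter(src for src, _dst, _kind in records)
    accounts = {acct for src, dst, _kind in records for acct in (src, dst)}
    return {acct: base + reshared[acct] for acct in accounts}

def edge_colors(records):
    """Color each edge by the kind of site the shared link belongs to."""
    return {(src, dst): COLORS[kind] for src, dst, kind in records}

sizes = node_sizes(shares)
colors = edge_colors(shares)
print(sorted(sizes.items()))
print(colors)
```

The resulting dictionaries could be passed to whatever graph-drawing library is at hand. Counting reshares is only one possible proxy for how influential an account is.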
Bailer: With the spread of viruses, when we think about health, we often consider…. Are there inoculations, can a vaccine be developed? Is there an equivalent of trying to develop a digital vaccine to try to help us inhibit the spread of such misinformation viruses?
Menczer: Very interesting question. Just a couple of days ago I read a report, I think it was a press release about an article, which I put on my to-read list. I haven’t read it yet, so I can’t speak about it in detail, but basically it was a study done by some researchers who have tried to think of it as a vaccine. From what I recall about the study, they had a set of people to whom they showed some reliable information and some misinformation, and then they looked at how much the certainty that these people had in trusting the accurate information decreased when they were exposed to misinformation. Then they compared this with another group that was sort of inoculated with a sort of information virus, where before being shown this accurate or inaccurate information they were told ‘sometimes there are articles that are meant to confuse or to post fake information.’ For this second group, according to the study, their trust in the accurate information did not decrease as much when they were exposed to misinformation. So, who knows? Maybe this idea might work. I think that it’s probably just a beginning and more research is needed, but more in general, there is a lot of work that needs to be done to understand the role of fact checking. And there is some contradictory evidence in the literature about fact checking. There is some work that has shown that fact checking can backfire, it can have the opposite effect to what you expect. Not only do people not change their mind, but their previous beliefs based on misinformation may actually be reinforced by being exposed to related information, even if it is debunking earlier misinformation that they have been exposed to. And then there are other studies that show that instead there are cases in which being exposed to fact checking helps move opinions, at least a little bit. 
So, I think this is an open area for research and we would very much like to explore it. In fact we are in an ongoing conversation with some colleagues who work in the cognitive sciences and also, sort of, social psychology, and we would like to work with them to come up with both experiments and models to help understand what are the conditions under which fact checking may work, or may not. Again, there is work, there is literature in this field, but I think that there is still a lot of work to do to really understand what can be done in terms of fact checking. Of course we might think of other ways to mitigate misinformation. Perhaps if you’re not exposed to misinformation at all, that might also be good, but at the same time we don’t want to censor information. Fact checking is one particular approach where we simply say ‘well that’s not true,’ and it’s not clear whether doing that always helps, or maybe there are different ways of doing it. There are articles that suggest that if you just say something is not true, it is counterproductive, whereas if you simply state something that is true that contradicts perhaps previous misinformation that a person had been exposed to, then maybe that might be more effective. But I think we’re very far from having a clear understanding and agreement, or a consensus, of when fact checking works and how it works, because the effects are also very different from person to person. So I think it’s a very open area of research.
Pennington: You’re listening to Stats and Stories and our discussion considers how scientists study the way that misinformation travels across the internet.
Campbell: You’re talking about the fact checking and one of the things that struck me, particularly in this last election, was how often fact checking sites were targeted as biased, you know that this was part of the fake news circulation, the same thing that you experienced. Where disinformation campaigns were aimed at discrediting scientists, legitimate journalists. I mean in this era of alternative facts how do you combat this? When your own work is exposed to this disinformation, or the work, the legitimate work of fact checking journalists is under siege.
Menczer: Yes, it’s a scary situation that we find ourselves in. I don’t have a good answer, I agree that it is extremely concerning. You might say that it’s not surprising: if somebody is trying to manipulate public opinion in some way, whether it is through social bots or fake news websites or in any other way, then of course, when they are being accused of being fake news, why wouldn’t they try and use that against their accusers? And they may be successful at creating confusion in people. This is a well-known technique, it has been used for a long time. You don’t really have to convince people of something fake being true, it’s enough to just create doubt. Because many people, of course, do not have the time and resources and interest in digging through things and spending a lot of time reading and making up their own mind. They just hear things casually here and there, they maybe look at a headline that flows on their iPhone or something that they hear on the radio in the car, and they don’t spend time to actually do the research: is this true or is this not true? So of course people can take advantage of that to create that confusion, and it is very sad because it is creating huge challenges for scientists. Scientists are having a hard time right now communicating their work and possibly the important implications for policy, whether it’s things related to climate or to pollution or to anything you want, because if there is somebody who has an interest that counters the evidence in science, they can just attack science, and we’ve observed that among some politicians unfortunately. And I’m not exactly sure… my hope is that most people do not enjoy being deceived, they don’t know that they are deceived, so we need to figure out ways, effective ways, to communicate and to let people realize when they are being deceived, because probably once they do realize it they will be more careful about what they pay attention to and whom they believe. 
But the technology that we have now, especially social media, were not designed with the goal of helping people protect themselves from misinformation, they were designed to help people get engaged. So that you’re more likely to see a post that you’re likely to interact with or click on or engage with or comment or from somebody that you like or from a close friend or from somebody that you have interacted with before and unfortunately all of that because of the echo chamber phenomenon that I was describing before, this homophily that leads us to being surrounded with people with very similar opinions to ours, all of that is conducive to us being tricked. If I’m a liberal I’m more likely to be duped into believing in news that tells me that some Republican person is evil. And likewise if I’m a conservative I’m more likely to believe some piece of fake news that tells me that some progressive person is evil. So we’re not duped by our adversary, we’re duped by fake news that is believed by our friends. The structure of social media may make things a little bit worse right now and exactly the extent to which that happens is something that we need to study and understand better, but at this point it would seem that our reliance on social media is making us more vulnerable, and therefore we are more confused, we are less informed and therefore people can take advantage of that and manipulate the medium, so it is a worrisome situation right now for sure.
Bailer: You know when you talk about some of these ideas, I’ve read you describe things as the attention economy, in that there’s this competition for our hearts and our minds and our thoughts. I wonder if in that setting, there’s this need to find who’s going to be our trusted curator of information, who are we going to follow? Historically there were very few sources that we might plug into to get news, and we were counting on them to process and curate. So how can that work now, or can it work now? Given this broad social network in which, you’re saying, a vast amount of news is being encountered.
Menczer: You’re asking really good questions. It is certainly true, and has become more true in the last few years, that as we are inundated with more information the scarce good is our attention, because we are not capable of digesting all the stuff that gets posted for example. That is the result of our shift from one to many platforms, or broadcast platforms like traditional media, TV, newspapers and so on, to the web or social networks, where it’s more of a many to many model. It has lowered, hugely, the cost of producing information. There’s a lot of good about that right? Because now anybody has the power to produce and broadcast information and there are a lot of examples where that is a wonderful thing. Imagine for example, people who shoot live videos, so we have witness accounts of situations that until recently we didn’t have. These are uncovering some things of which we were unaware, so I think there are lots of ways in which this is a good thing. However, like any technology it comes with good and bad, it comes with advantages and perils. So the bad here is that as everybody becomes a producer, the consumers of course, our… the amount of attention that we can dedicate to whatever activity, reading news or getting informed etc. well that stays the same and so how do we deal with this huge information flow, some people call it information overload. Social media try to do that for us right? They try to act as filters, and that’s why we see news from our friends. But now we know that that’s a very biased method that exposes us to a very biased set of sources of information, so it comes with its own disadvantages. 
You know, concurrently with the emergence of social media and new technology of course we’ve seen the decline of traditional media, so you know a lot of traditional broadcasters are struggling economically, so people don’t trust them, so at the same time in which people are trusting less sources of information that actually follow traditional ethics in making sure that they don’t spread misinformation, at the same time it is very hard for us to tell the difference between a trusted source and untrusted source. If you see something on your Facebook screen or your Twitter screen, your information goes to the headline or maybe who shared it, not so much who is the source. Now of course these things could change and in fact we are happy to do research and work with technology platforms to help figure out ways that we could help people understand what is the source of a piece of information. But the way things are now, the major mediators of information, or curators or editors as you were saying earlier, are mostly gone, and we don’t really have a good replacement for them. We have popular Twitter accounts and popular Facebook accounts but there are plenty of sources of unreliable information even fake news or misleading information that are extremely popular. Because the popularity on a social platform is not just the result of accuracy but of many other factors, like your own political opinions for example. So this is one of the things we are in fact studying, something interesting or popular, how more people generate information in that area, even if they have nothing new to propose. You know if everybody is talking about some event and you’re a fake news source, then it’s easy for you to generate some fake news on that topic, it will get attention because everybody is interested in that topic right? And then of course you can make money out of ads or whatever it is the reason why you’re doing this. 
So people exploit the attention economy for this kind of manipulation. Our research also is showing the more you’re overwhelmed with information and the more limited attention you have the more likely the system is going to help unreliable or poor quality information go viral. Even assuming that each individual prefers to share reliable information or high quality information, so this is something that our models predict, the interesting part is again, we’re not assuming that people want to share misinformation, everybody is trying to…for example you look at ten things on your feed and then maybe share one. Well, maybe if instead looking at ten you looked at twenty you might have found something that tells you that that thing was fake, or a more reliable source on the same topic. But because you only looked at ten or five or three, you’re more likely to share something that is not accurate.
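The "look at ten, share one" example can be illustrated with a toy simulation. To be clear about its limits: this is not the agent-based model the research group uses, only a sketch under simple assumptions. Each item in a feed gets a hypothetical quality score drawn uniformly between 0 and 1, the reader inspects only the first `attention` items, and always shares the best one seen. The function name `mean_shared_quality` and all parameters are invented for illustration.

```python
import random

def mean_shared_quality(attention, trials=10_000, feed_size=50, seed=0):
    """Average quality of the shared item when only `attention` items are read."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        # Hypothetical feed: each item has a quality score in [0, 1).
        feed = [rng.random() for _ in range(feed_size)]
        # The reader honestly prefers quality, but sees only the top of the feed.
        total += max(feed[:attention])
    return total / trials

for k in (3, 10, 20):
    print(k, round(mean_shared_quality(k), 3))
```

Even though the simulated reader always picks the best item seen, the average quality of what gets shared drops as attention shrinks, which is the qualitative effect described in the interview. The full models mentioned there also involve the network structure and the flow of information between users, which this sketch leaves out.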
Pennington: Thank you for spending time with us today talking about this, I think we could probably talk about this for another half hour if we had the time. Good luck with Hoaxy and Truthy in the future.
Menczer: Thank you
Pennington: That’s all the time we have for this episode of Stats and Stories. It’s a partnership between Miami University’s departments of Statistics and Media, Journalism and Film and the American Statistical Association. Stay tuned and keep following us on Twitter or iTunes. If you’d like to share your thoughts on the program, send your email to email@example.com, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.