Popping Filter Bubbles | Stats + Stories Episode 238 / by Stats Stories

Dr. Francesca Tripodi is a sociologist and media scholar whose research examines the relationship between social media, political partisanship, and democratic participation, revealing how Google and Wikipedia are manipulated for political gains. She is an assistant professor at the UNC School of Information and Library Science (SILS), a senior faculty researcher with the Center for Information, Technology, and Public Life (CITAP) at the University of North Carolina at Chapel Hill, and an affiliate at the Data & Society Research Institute. In 2019, Dr. Tripodi testified before the U.S. Senate Judiciary Committee on her research, explaining how search processes are gamed to maximize exposure and drive ideologically based queries. Her research has been covered by The Washington Post, The New York Times, The New Yorker, The Columbia Journalism Review, Wired, The Guardian, and the Nieman Journalism Lab.

Episode Description

Have you ever wondered why a search engine query for undocumented workers in North Carolina returns links to worker-rights sites, while a search for illegal aliens in North Carolina leads you to immigration-concern sites? Did you know that Wikipedia entries for women are recommended for deletion at a higher rate than entries for men? Those questions are at the heart of this episode of Stats and Stories with guest Dr. Francesca Tripodi.

Full Transcript

John Bailer
Did you ever wonder why a search engine result for undocumented workers in North Carolina provides links to worker-rights sites, while a search for illegal aliens in North Carolina leads you to immigration-concern sites? Do you know that Wikipedia entries for women have a higher rate of recommended deletion than entries for men? Today's episode of Stats and Stories focuses on information literacy, including insights related to parallel internets, algorithmic silencing, and so much more. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me on the panel today is Regina Nuzzo, professor at Gallaudet University and freelance science writer. Rosemary Pennington is away. Our guest today is Francesca Tripodi, professor at the UNC School of Information and Library Science, a senior faculty researcher with the Center for Information, Technology, and Public Life (CITAP) at the University of North Carolina at Chapel Hill, and an affiliate at the Data & Society Research Institute. Tripodi is a sociologist and media scholar whose research examines the relationship between social media, political partisanship, and democratic participation, revealing how Google and Wikipedia are manipulated for political gains. Dr. Tripodi testified before the U.S. Senate Judiciary Committee in 2019 on her research, explaining how search processes are gamed to maximize exposure and drive ideologically based queries. The Washington Post, The New York Times, The New Yorker, The Columbia Journalism Review, Wired, The Guardian, and now Stats and Stories are covering Francesca's work. So, Francesca, welcome. We're so excited to have you join us today.

Francesca Tripodi
Thank you so much for having me. It's really fun to be here today.

John Bailer
Great. Well, to start our conversation, I'm just curious, what was the career path that led you to study how we consume and engage with media?

Francesca Tripodi
That's a great question. I'm a sociologist, so I started looking at social problems and issues of power within society. I also have a background as a communication scholar, and I lived in Los Angeles for a while, so I was really obsessed with the media. And kind of looking at the intersection of these three things is what got me going on my research track. Once upon a time, an advisor stopped me and said, you know, you actually ask sociological questions, and I didn't know what that was. So I went back to school and got a PhD in sociology.

Regina Nuzzo
I'm a little curious about your work with Wikipedia and gender inequality. So you have a paper about that. What kind of big take-homes did you find in that study?

Francesca Tripodi
Absolutely. So the paper title, I think, says it all. It's called Ms. Categorized. And that paper looks at how women who have biographies on Wikipedia, which is already a minority of the biographies available on Wikipedia, are more likely to be flagged as non-notable subjects and nominated for deletion than men who have biographies on Wikipedia. And I looked specifically at the English-language Wikipedia, which I think is really important to note.

Regina Nuzzo
So what happens when they are flagged for deletion? As a woman who has a Wikipedia page, do I need to be concerned about that and go in and defend myself? Well, first of all, what does notability actually even mean? Like, the definition of that?

Francesca Tripodi
That is an excellent question. So Wikipedia notability is the set of criteria used within the community to determine whether or not a topic deserves a Wikipedia page. And for people in particular, notability is determined by whether or not independent outlets, whether that be a newspaper or, for example, an art gallery, have featured information or artwork or content about the subject. So for example, for professors, my research itself does not constitute notability. It's whether or not my research has been the subject of coverage independent of myself. Does that make sense?

John Bailer
So could you just take one step back from this and talk a little bit about how an entry even shows up in Wikipedia in the first place?

Francesca Tripodi
Sure. That's a great question. So Wikipedia is a website that anybody can use, so anybody can put up an article about somebody on Wikipedia.

John Bailer
About themselves, right? You couldn't put that up about yourself?

Francesca Tripodi
I cannot. Technically, you're right, there are issues of conflict of interest and also regarding neutral point of view. And so an article is assessed based on notability and on whether or not it is written from a neutral point of view. And if I'm writing an article about myself, clearly that creates a conflict of interest, in which I can't myself determine that I'm notable; somebody else has to determine that. But anyone can write a page about somebody else on Wikipedia. I would like to also clarify that this is a huge problem on Wikipedia, because PR firms are constantly trying to add people or organizations or actors, for example, onto Wikipedia that are completely irrelevant subjects, and Wikipedia is really trying to differentiate itself from the Yellow Pages, for example. And Wikipedia is also very widely used on search engines like Google. So they want to protect the content that's up on Wikipedia, because they want to be known as an encyclopedia and as a very credible source of information. And so this is part of the story that I'm really glad I get to tell: deletions are important, and establishing notability is important. But unfortunately, because of these widely held gender biases, women tend to be perceived as less notable subjects overall, not necessarily because notability or deletion is, quote unquote, bad.

Regina Nuzzo
Isn't there a bit of a self-fulfilling prophecy here? So if you have a Wikipedia page, then you're more likely to be featured independently, and to meet the qualifications for notability?

Francesca Tripodi
Absolutely. And the self-fulfilling prophecy goes even a little bit deeper than this. So some really great researchers have already shown that the qualifications that establish notability are inherently gender-biased. For example, if you're relying on the news to cover a subject, or you're relying on a book to cover a subject, or an art gallery to cover a subject, there is so much work that shows that men are more likely to be featured. And this also intersects with race: we know that white men in particular are more likely to be featured in these spaces as experts or in art galleries. And so it's most definitely a self-fulfilling prophecy. Unfortunately, my research shows that even when women are able to clear all of these hurdles already in their way, they are still more likely to be seen as less relevant or less notable subjects than their male counterparts. And this is just based on peer review by people who volunteer their time for Wikipedia.

John Bailer
So what kind of data did you collect and analyze as part of the study? How did you accumulate this information? And how did you do this comparison?

Francesca Tripodi
Right, so I'm really excited to be on a podcast sponsored by statistics, because I myself am a sociologist, but my research is highly qualitative. I'm an ethnographer by trade, if you will. And so my research started with ethnographic observations of edit-a-thons, and edit-a-thons are these small meetups of people who get together and try to do things like add women to Wikipedia. And I was going to these edit-a-thons and I saw that people being added during these edit-a-thons were being deleted, sometimes during the edit-a-thon itself. I would do interviews with people who had volunteered all this time to come to the edit-a-thon, only to find out that something they added was deleted after the fact. And the frustration was really palpable amongst these volunteers, because they had put in a great deal of time and energy to try and close this gender gap. And for those who might not be familiar, the gender gap is a very widely documented fact that, of the millions of biographies available on English-language Wikipedia, less than 20% of them are about women. So there are these edit-a-thons that are there to say, hey, let's close this gap, let's get this representation up there. And so during these ethnographic observations, I started documenting what seemed to be a very clear example of gender bias happening kind of in real time. Unfortunately, when I brought these concerns to peer-reviewed journals, or to other Wikipedians, qualitative data has a difficult time being believed, and so many people told me these were anecdotal stories. Or my favorite review, and this was from a top-tier academic publication, told me that maybe these women just weren't notable, or that these editors were just being too sensitive, which of course is very gendered language and made me really angry. And so instead of just scrapping the project, I partnered with an extremely talented data scientist at the UVA Scholars' Lab; I was afforded a dissertation fellowship through the UVA Scholars' Lab. And this amazing data scientist, Eric Rochester, wrote a script for me and we scraped all the articles for deletion over a 10-year period, and then he filtered them down to biographies. And then I partnered with another computer scientist here, actually an undergraduate, who helped me with a script to match the data with Wikidata so that we could look at the pronouns used. That's how we identified gender: based on the pronouns used either within the article or within the discussion of the article. And then I just did basic descriptive statistics to see, well, are these proportions matching up? And then I did chi-square analysis to determine whether the differences in those proportions were statistically significant. So I did a little bit of statistics. I dabbled.
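To make the pronoun-matching step concrete, here is a minimal sketch of that kind of gender identification in Python. It is illustrative only: the study's actual pipeline scraped articles-for-deletion pages and matched entries against Wikidata, and the function, pronoun lists, and sample text below are assumptions made for this example, not taken from those scripts.

```python
# Minimal sketch (not the study's code): infer a biography subject's gender
# from pronoun frequencies in the article or its deletion discussion.
from collections import Counter

FEMININE = {"she", "her", "hers"}
MASCULINE = {"he", "him", "his"}

def infer_gender(text: str) -> str:
    """Return 'woman', 'man', or 'unknown' based on which pronouns dominate."""
    words = Counter(w.strip(".,;:!?\"'").lower() for w in text.split())
    fem = sum(words[w] for w in FEMININE)
    masc = sum(words[w] for w in MASCULINE)
    if fem > masc:
        return "woman"
    if masc > fem:
        return "man"
    return "unknown"

# Example on a made-up article snippet:
print(infer_gender("She published her first monograph in 2004; her work was widely reviewed."))
# -> woman
```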

Regina Nuzzo
Of course, we love this. We love any kind of statistics. I feel like I have to ask a question, and maybe I'll sound like the peer reviewers, you know, trying to shoot down your original paper. But were all of these articles about current, living people, who are recent history? Because unfortunately, we don't have that much information about women across all of history, right? So I was wondering if that might explain the gender difference there.

Francesca Tripodi
Sure. So in terms of what I was looking at, I think it's helpful to explain what I was focusing on specifically. Within articles for deletion, any article can be nominated, and with biographies specifically, the criterion is usually notability. And then I looked at the decision rendered. So after an article is nominated for deletion, it has roughly seven days during which people who edit Wikipedia, and basically anyone, can give their feedback about whether or not the article should be included. And then I looked explicitly at articles that were kept. I classified these as, oops, these shouldn't have been here; these are people who do meet our notability criteria but were miscategorized, right? Miscategorized with a "Ms.", like the title. So these were just miscategorized, or, the very professional term, these are "whoopsies" in the database: oh, they shouldn't have been here, and we need to make sure they stay up on Wikipedia. And then I did an analysis within those keeps to determine, well, are the whoopsies equal? Because you're absolutely right, Regina. There are all these factors that could go into whether or not someone may or may not be notable. The extent of their professional careers is a big one, and living biographies are much more challenging to keep on Wikipedia, because you might not have established your notability yet. But what I wanted to do is control for that as much as possible. So I looked exclusively at the keeps, the keep phenomenon, to ask: are the percentages of whoopsies equal? And, you know, the null hypothesis there was that if there is no gender bias, then the proportion of biographies whoopsied, or accidentally nominated for deletion, should be roughly the same for men and for women. And that's where you're seeing these major discrepancies. And again, I don't think these are people out there saying, oh, I'm gonna go out there and delete all the women. I think, unfortunately, the sociologist in me says, well, there are these studies in controlled experimental settings, like the famous one, right, the resume test, where people get the exact same resume, only with a woman's name versus a man's name, and they're more likely to say, oh, the men are more qualified or more notable or should get the job. And this is kind of replicating those same studies in a non-experimental setting. Does that help answer the question? Because I think that's a really good one.
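For readers who want to see the shape of that comparison, here is a sketch of the chi-square test described above. The counts are hypothetical, chosen only to illustrate the null hypothesis that the "whoopsie" (keep) rate among nominated biographies is the same for women and men; they are not the study's data.

```python
# Sketch of the chi-square comparison with made-up counts (not the study's data).
from scipy.stats import chi2_contingency

# Rows: women, men. Columns: biographies kept ("whoopsies") vs. deleted
# after a deletion nomination.
table = [[180, 420],    # women: kept, deleted
         [300, 1300]]   # men:   kept, deleted

chi2, p, dof, expected = chi2_contingency(table)

keep_rate_women = table[0][0] / sum(table[0])
keep_rate_men = table[1][0] / sum(table[1])
print(f"keep rate (women): {keep_rate_women:.1%}, keep rate (men): {keep_rate_men:.1%}")
print(f"chi-square = {chi2:.1f}, p = {p:.3g}")
```

Under the null hypothesis of no gender bias, the two keep rates should be roughly equal; a small p-value indicates the observed gap is unlikely to be chance alone.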

Regina Nuzzo
It did. That was great. I'm convinced enough. Thank you. Okay.

John Bailer
Well, you're listening to Stats and Stories. Our guest today is sociologist and media scholar Francesca Tripodi. We've had guests in the past on the show who've discussed inoculations or other kinds of interventions for misinformation. And I've been thinking about this issue of parallel internets that emerge as a consequence of searches. In fact, I even did searches like the ones you've suggested, as I mentioned in the intro to this program and as you've described in some of your publications. So I'm just curious if you can first talk a little bit about this bifurcation, or actually it's probably more than just two, these kinds of different paths that searches for similar concepts may lead you to find, and other ways that we might be able to break out a little bit of, I think you've called it, the ideological dialect that might be part of your searching.

Francesca Tripodi
Sure, yeah. So this kind of notion, the fact that we're living on these parallel internets, came out of a report that I did through Data & Society back in 2018, I believe. And what I was doing with this work, again, was ethnographic work inside of two Republican groups, where I came to their monthly meetings, went to various organizational functions with them, and then did interviews. And I started talking to them about: where do you go for information you can trust, and how do you decide what is trustworthy information? And at the time, this was when the NFL protests were really gearing up in response to Black Lives Matter and in response to Colin Kaepernick's protests. And this was also in conjunction with the President. At the time, President Trump had put out a lot of really public statements saying these protests were having a negative impact on NFL ratings, and I think there was one tweet in particular that said, NFL ratings way down, right. And I got to thinking about that. And so I just did what I thought anybody in my study was doing: I Googled it, NFL ratings down. And I got a lot of information confirming this idea that NFL ratings were down because of these protests. And then I thought, well, what if I didn't agree with what he had to say? What if I Googled something different? So I typed the same start, NFL ratings, and I wrote up at the end of it instead of down. And it was an entirely different set of stories. What was interesting about these stories was that they were showing that, yes, NFL ratings are down, but not because of protests; they had been falling over safety concerns, and they had actually been falling for some time. And so by digging through the information returned, I could have walked away with an entirely different position. And so then I started thinking, well, how does this manifest itself in more entrenched concepts? And this is actually a research project I'm starting right now. I talk a little bit about it in my book: the way in which our returns are highly shaped by our ideas. And this was driven by my own queries of saying NFL ratings up, NFL ratings down, or the example that you provided, undocumented residents North Carolina versus illegal aliens North Carolina. And again, time and again, these returns are very different. And it's because that's how the internet works, right? We are obsessed with this idea of these highly personalized returns, and I've had multiple conversations with Google; one of these is actually on a CITAP podcast where we feature executives at Google who go on record saying, you know, some of this stuff is tailored. Obviously, if I'm googling for sushi, they want to return things in my area, or else it's not relevant. But the role that keywords play is a major driver of the kind of information returned, and people who want to exploit this know it. I mean, this is just basic search engine optimization; these are not actually secrets. If I google Doritos, I'm not gonna get returns for Cheetos; I'm gonna get returns for Doritos. But these ideas are actually much more baked into ideological positions, ideological dialects, and so are highly manipulable. And this is a project I'm continuing to work on, starting right now.

Regina Nuzzo
So the thing that struck me when you were talking about this idea of NFL ratings up versus down is just what words are available in the news articles. And I'm wondering, leading from that, whether we shouldn't, when we're doing these searches, be a little like scientists: search for things that we think will support whatever idea we're interested in, but then also search for things that might go against it. Right? Should we be searching for the opposite?

Francesca Tripodi
If your desire is to understand what someone not like you might be reading, I think that is a fantastic solution. You know, and this relates back to information literacy. I think people who are spreading problematic content understand the way the internet works and are tagging their content with very specific keywords and phrases. And so if you see information on something like Facebook or Instagram, and then you go to a search engine and you search for those same phrases that you saw on Facebook or Instagram, you are likely to get the same exact content that you saw on Insta, or any of the social media apps, or Twitter. And that's regardless of whether you use DuckDuckGo, Bing, or Google; it doesn't matter which search engine you're using, because search engines are ultimately programmed to return content based on metadata, which is just a fancy way of saying, you know, computers don't speak the same language that humans do, so you have to tag content in language that computers will understand so that they can return relevant information. But yeah, if you're interested in understanding how others might be looking at a topic or thinking about a topic, consider how your search terms are really driving the information that you're going to get.

John Bailer
Yeah, so all of a sudden, now I find myself wondering if there's a research project here in natural language processing, where if you had given a prompt, some phrase in a keyword search, maybe one of the things returned was, you may also want to try this search. So if you had given this framing, like in the example of illegal aliens, maybe it would suggest undocumented worker, perhaps, or something, where it would actually generate a second or third alternative just to kind of help with that. Because in some sense, I think you use the language of filter bubbles and popping filter bubbles, and I love some of the images you use. And the title of your paper, I'm just thinking, I'm really jealous of someone framing something so beautifully, so well done on that. But I do think that in some ways, what you're suggesting is that one of the ways to combat this potential bias that we bring to the search word phrases that we use is to find some ways to help prompt other investigations.

Francesca Tripodi
Sure, and so search engines could, of course, provide these prompts, although they are unlikely to intervene in anything ideological, because, you know, I think the important thing for us to remember is that these are ad-backed companies that sell our data to make money; they aren't actually invested in making our lives these amazing things. And so, yes, of course, they could provide these, but I think they are less likely to intervene in ideologically driven queries and the ways that these parallel internets form in conjunction with them. You know, Eli Pariser is actually the one who created the concept of a filter bubble, although for him, he talks about how it's really algorithmically driven: because these companies have a vested interest in keeping us on their platform for as long as possible, they aren't going to show us stuff that makes us mad, or that we're disagreeing with, because they want us to stay on them. And what I'm arguing is, yes, but also, the onus is in part on the individual as well, in terms of the way that we start our queries. In my book that's coming out in a couple of months, I talk a lot about just something really simple, like the sky is blue. And so if you Google the sky is blue, you're gonna get blue images. But say I don't think the sky is blue, and you Google the sky is not blue, you get this really fabulous article from NASA that explains how the color of the sky is based on the atmospheric chemicals in the air, and it's going to change regardless of what you're looking at. You know, Google the sky is red, and you get articles confirming that. Google the sky is green, and you get information about how the sky turns green before a tornado. So it's fascinating, because I just think you're right that there are interventions these companies could participate in, if in fact they wanted to rile us up a little bit, or were really invested in this process. But I think more likely the key is for us to just consider a little bit about how those returns are shaped. I also really want to mention, and I don't want to dominate the conversation, but I think the big thing to pay attention to is that these tactics are being used by people who are trying to spread misinformation, and that's where I am seeing some platforms trying to intervene. Right? So when there is this very explicit form of misinformation that is dangerous, you know, I am seeing people within these platforms saying, we don't want to be a conduit for this, we do actually want to create meaningful interventions. And having these kinds of prompts that say, that keyword is super specific, and that keyword is really only being talked about by a small, small portion of all the information out there, hopefully will nudge people in a direction to think, well, why is it that only a small number of people are talking about it?

Regina Nuzzo
My mind is kind of blown by this, and now I want to go redo all my Google searches with the word not in there. What sort of tips would you give someone like me if I wanted to make sure that I'm getting all sides of something, and not just for overtly political things, but for things in general, like you were talking about with the sky is blue? Should I be including not, for example? What are some other tips?

Francesca Tripodi
That's a great question. And I've been very fortunate to work with some extremely talented researchers in this area. So Mike Caulfield, he's at the University of Washington, has created what's called the SIFT methodology for evaluating information online. SIFT stands for: Stop, you know, before you get going on your claims; Investigate the source, trying to understand what the agenda is behind the information that you're looking at; Find trusted coverage, so scan to make sure multiple sources are looking at this topic, and look beyond the first few results; and Trace to the original, so for quotes in media sources, trying to figure out whether they are being clipped out of another story and whether that information is being intentionally framed in ways that are inaccurate. So I would definitely say he's a great person to go to in terms of a wonderful way of thinking through that.

John Bailer
That's great framing. You know, before we come to a close, I wanted to give you a chance to just tell us a little bit about your upcoming book. Is this The Propagandists' Playbook that perhaps we've been alluding to?

Francesca Tripodi
Definitely, yeah. So The Propagandists' Playbook is my book that examines seven tactics being used to manipulate search engine optimization and spread content with the goal of enhancing political participation in a nefarious way. And so I think about the ways that cultural complexities, and the way that different groups engage with and make meaning of information, are specific to different communities. So I think about how conservative groups engage with information literacy differently than progressive groups. And then I look at the ways in which those strategies of media literacy are exploited by political elites and media elites in order to sell a very specific message. And then a lot of it is what we're talking about: how the internet can be seeded with misinformation, and people then directed to it. So I refer to this as the IKEA effect of misinformation. It stems from business professors who looked at the IKEA effect, to say that people become really invested in that low-quality coffee table if they put it together themselves. You can even see this sometimes on Craigslist, right? Someone puts together this very low-quality coffee table and then tries to sell it for $50, because they're like, but I made it, and it's so great. And I'm seeing these same strategies play out in misinformation campaigns, where they activate this idea of, don't trust me, do your own research, but they've seeded the internet with these keywords and these concepts before telling people to do their research. And then people do their own research very mindfully, not just because they're tricked by someone saying it. They're out there really trying to read more about it, but, quite frankly, they just don't understand how their returns are given to them. And we've been taught that these first returns are somehow of higher quality, but they're really just matching those phrases as closely as possible. And so my book kind of digs into that.

John Bailer
Well, that sounds like it's gonna be a great read. So thank you for telling us a bit about it, and thank you for taking the time to join us. I'm afraid that's all the time we have for this episode of Stats and Stories. Francesca, thank you so much for joining us today.

Francesca Tripodi
Thank you for having me. It was really fun.

John Bailer
Stats and Stories is a partnership between Miami University's Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places you can find podcasts. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.