With the ubiquity of technology in our lives have come concerns over privacy, security, and surveillance. These are particularly potent in relation to what's come to be called Big Data. Navigating the complicated terrain is a constant conversation in some sectors of the tech industry, as well as academia. And it's the focus of this episode of Stats and Stories with Christoph Kurz.
Understanding "Civic Statistics" | Stats + Stories Episode 199
Being able to read and write is necessary to be successful in work, at home, and in civic life. Do parallel skills associated with critical reasoning from numbers and data carry similar weight? What do you need to know to be an informed consumer of numeric information, and to use such information? That's the focus of this episode of Stats+Stories with guest Iddo Gal.
Building Back Better | Stats + Stories Episode 195
Over the course of the last year, statistics have framed our lives in very obvious ways. From COVID cases to unemployment rates, stats have helped us understand what’s happening in the wider world. As we contemplate how to “build back better” in the aftermath of the pandemic, official statistics could help guide our way, at least, that’s what the authors of a recent Significance Magazine article think. That’s the focus of this episode of Stats and Stories with guest Paul Allin.
The Probability of the Next Terrorist Attack | Stats + Stories Episode 182
When planning for potential disasters, we often focus on hurricanes that might ravage coastal areas or tornados and droughts that strike rural parts of the Midwest. But researchers are also working to uncover the vulnerabilities faced by urban areas, and that's the focus of this episode of Stats and Stories.
An Anti-Racist Approach to Data Science | Stats + Short Stories Episode 180
Individuals and institutions around the United States are grappling with the history of racism in the country as well as the ways they themselves have contributed to it. Many are working to adopt anti-racist approaches to their work and in their everyday lives. How to be an anti-racist data scientist is the focus of this episode of Stats and Stories with guest Emily Hadley.
Migration Math | Stats + Short Stories Episode 179
Dr. Marie McAuliffe is the head of the Migration Research Division at IOM headquarters in Geneva and Editor of IOM’s flagship World Migration Report. She is an international migration specialist with more than 20 years of experience in migration as a practitioner, program manager, senior official and researcher. Marie has researched, published and edited widely in academic and policy spheres on migration and is on the editorial boards of scientific journals International Migration and Migration Studies, and is an Associate Editor of the Harvard Data Science Review.
Episode Description
As COVID has ravaged the globe, it has overshadowed another ongoing global story: migration. According to new data from the International Organization for Migration, migrants make up 3.5% of the total global population, with the top five countries of origin being India, Mexico, China, Russia and Syria. That information and more can be found in the IOM's 2020 World Migration Report, the focus of this episode of Stats and Stories with guest Marie McAuliffe.
Full Transcript
Rosemary Pennington: As COVID has ravaged the globe, it has overshadowed another ongoing global story: migration. According to new data from the International Organization for Migration, migrants make up 3.5% of the total global population, with the top five countries of origin being India, Mexico, China, Russia and Syria. That information and more can be found in the IOM's 2020 World Migration Report, the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Department of Statistics and Department of Media, Journalism and Film, as well as the American Statistical Association. Joining me are panelists John Bailer, chair of Miami's statistics department, and Richard Campbell, professor emeritus of Media, Journalism and Film. Our guest today is Marie McAuliffe, head of the Migration Research Division at IOM headquarters in Geneva and editor of IOM's flagship World Migration Report. She's an international migration specialist with more than 20 years of experience in migration as a practitioner, program manager, senior official and researcher. McAuliffe has researched, published and edited widely in academic and policy spheres on migration, is on the editorial boards of the scientific journals International Migration and Migration Studies, and is an associate editor of the Harvard Data Science Review. Marie, thank you so much for being here.
Marie McAuliffe: Thank you so much, Rosemary, and it's a real pleasure to be able to talk about migration. It's a compelling topic. I used to work in industrial relations many years ago, and if you wanted to shut down a conversation with colleagues or people you're meeting in airports while you're traveling, you just mentioned trade union regulation. If you want to start a conversation, you just start talking about migration, because it is part of everybody's lives, you know, it's the classic six degrees of separation. So it's a real honor and privilege to be talking to you today.
Pennington: What would you consider the headlines of the 2020 World Migration Report?
McAuliffe: Actually, it was interesting for the 2020 report, and of course the next one that we're currently working on, the 2022 volume that's going to be released later this year, is going to be quite different. But for the 2020 volume, I was answering media inquiries and doing interviews, and a lot of journalists were a little bit puzzled. They were basically saying, so what the report essentially tells us is that it's business as usual and there is no major crisis in international migration. And I would be saying, yes, that's true, the trends are on track, they are what we expect: most people do not migrate across borders during their lives. A very high proportion, 96.5%, stay within the country in which they're born. The challenge is that displacement, especially internal displacement but also cross-border displacement, takes up a lot of resources around the world, and it is very significant from a human security perspective. And so those numbers are often the numbers that we focus on. You know, they're small proportionally, but they're very significant in terms of meaningful changes to people's lives, and often profound tragedy and loss.
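As an aside for readers who want to check the arithmetic in that answer: the 3.5% and 96.5% figures are complements of each other and follow directly from the report's headline counts. Here is a minimal back-of-the-envelope sketch, assuming roughly 272 million international migrants and a world population of about 7.7 billion (the widely cited headline totals from the 2020 report; those specific counts are not quoted in the transcript above).

```python
# Back-of-the-envelope check of the shares quoted in the interview.
# Assumed inputs (headline totals from the 2020 World Migration Report era,
# not stated verbatim in this transcript): ~272 million international
# migrants and ~7.7 billion people worldwide.

international_migrants = 272_000_000
world_population = 7_700_000_000

migrant_share = international_migrants / world_population
stay_at_home_share = 1 - migrant_share

print(f"Living outside their country of birth: {migrant_share:.1%}")    # ~3.5%
print(f"Living in their country of birth:      {stay_at_home_share:.1%}")  # ~96.5%
```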
Campbell: You know, one thing you mentioned that is different about these upcoming reports is that this report is also used as a fact-checking device, that it serves a role in reacting to misinformation, or to untrue stories that are being told about migration. Can you comment a little bit about that?
McAuliffe: Yeah, I mean, unfortunately it's a growing area for our work. We wish that we didn't have to be used as a fact-checking resource, but it's increasingly being used as one, because a lot of the discussion in social media, and in traditional media too, but certainly social media, has amplified and intensified the problems around disinformation globally. Numbers are often a really big focus when it comes to migration, and also displaced populations. So in a number of instances, for the 2020 report and also the 2018 report before it, the focus on migration statistics, on data and key information, has been really useful in dispelling misinformation around the volume and scale of migration, to say: in actual fact, let's try and put this in proportion. You know, the World Migration Report clearly shows us that this displacement event, or the number and proportion of migrants in a particular country, is not huge, it's not overwhelming. We're not talking about a crisis situation; we are talking about a situation that can be managed and is being managed. Most of the time we are seeing pretty balanced reporting in different parts of the world. But of course, when we're talking about interest groups, and sometimes that goes into political scenarios, there can be a lot of misinformation and disinformation out there, and certainly the targeting of particular ethnic minorities is a very considerable problem. That's been amplified, I think, during COVID-19 as well, where we've had, you know, xenophobic racism ignite in different parts of the world. And again, it's completely out of kilter, and a lot of it is emotional, but statistics are used as a bit of a weapon to portray untruths, to make political points, and as a power tool to try and diminish the rights of people in different situations.
Bailer: Can you talk a little bit about some of the challenges that are faced just gathering this data from different parts of the world? I mean, when you look at the impressiveness of this project, and especially the interactive project, I recommend everybody go to it because it's really cool. It's awesome.
McAuliffe: It's really good that we actually got a specialist to do that, and I think he's probably one of the world's best, to be honest, because he gets involved in the editorial content, and we work on a whole range of different types of interactive components. He's not just a technical specialist; he spends a lot of time understanding the data, which is really, really important. It's a very significant project in terms of coordination and collaboration, because, as you mentioned, Richard, we're using statistics from all over the world, and we use them in very different ways, and we are very careful about how we portray them. There's a lot of focus on accuracy, relevance and balance, and on being objective and so forth, but accuracy is probably the key thing when we're talking about statistics. So, for example, as you may know, we're in our 70th anniversary this year at IOM, the International Organization for Migration, and we were set up as an operational agency after World War Two. It used to be a committee that was assisting with, you know, the movement of people displaced by World War Two, and it was only really focused on Europe. We've changed over time, but we have a very strong base in terms of operational, programmatic data collection, not necessarily global statistics. So what we do is try to be very accurate in our global overview, by saying that this information, which looks at resettlement, is programmatic data. It's not a global number on refugee resettlement, but it does give you some insights. Then there's the global data that we use from a number of different UN agencies: there's data that is global from UNHCR, from the United Nations Department of Economic and Social Affairs, of course, the international migrant stock data, and there's the International Labour Organization data on migrant workers, which is global. So we try to make sure that we situate it and represent it accurately, so that when it's used as a tool for fact checking, or for teaching, or by officials when they're briefing their ministers and senior officials, it's accurate and correct. But it is an enormous challenge. And it's an enormous challenge because of migration itself. For a demographer, births and deaths are pretty straightforward compared to events related to migration, because you've got a whole range of movements occurring in real time. There are very few countries, and I think my own country, Australia, is one of the few, that can genuinely collect cross-border movements, and that's got a lot to do with its geography, to be honest, because it's an island, it's isolated, and it has very specific arrangements around cross-border movements. Most countries actually don't know. They can't tell you, at any particular point in time, how many international migrants or travelers or visitors are actually in their countries, which does frighten some people, but that's just the honest truth.
Bailer: I thought it was interesting, earlier, when you were talking about displaced people versus migrants. One thing that came to mind for me is the idea that there's both voluntary and involuntary migration, and that's a really interesting question: being displaced sounds like an involuntary, forced move, versus migration for other reasons. I know one of the things you mentioned is that the report addresses issues of patterns and processes and trends. So now I'm going to ask a multi-part question. Can you talk a little bit about some of the causes, or processes, that lead to migration? And then follow that up with some of the patterns and trends that have been observed?
McAuliffe: Definitely. There's a discussion around drivers; there's a whole body of long-standing research on the gravity model, the push and pull, Ravenstein in the 1880s, moving into the enabling factors, because there's much greater recognition of the self-agency of migrants: that they're not just pushed and they're not just pulled, but they actually have a lot of ability to make decisions and act on those decisions in different situations. So we've seen a lot of migration theory developed over many years. We do tend to talk now about drivers, partly because it's not about problematizing, if I can use that word, migration as a problem; it is just a social and economic phenomenon, not necessarily a problem. And there's much greater recognition of the multiple motivations and multiple factors underpinning migration, as well as a move further and further away from the binary construct of forced versus voluntary migration. For about the last 20 years there's been much more of a discussion around a spectrum of agency. My doctoral research is on this particular topic. It focused on the self-agency of a group of refugees, Hazaras from Afghanistan, Pakistan and Iran, traveling down to Australia. Refugees, I mean, with very high final grant rates under the Refugee Convention, of between 96 and 100% across five program years, so that's about as high as you can get, real genuine refugees. And they would engage in major self-agency and extraordinary resilience to be able to get to Australia by boat, sometimes traveling through up to seven transit countries, many of them traveling by themselves or in groups with other young Hazara males, showing enormous amounts of resilience on very dangerous, at times extremely unsafe journeys. And yet they're refugees. These are not simply people who have been displaced; these are people who have engaged in major migration journeys. So there's a lot more recognition of that spectrum now.
Pennington: You're listening to Stats and Stories, and today we're talking with Marie McAuliffe, the head of the Migration Research Division at the International Organization for Migration and editor of IOM's flagship World Migration Report. Marie, when I was looking at the 2020 report and the interactive visualizations, which, as Richard mentioned, are really lovely and, I think, really helpful for understanding this, there was a section that talks about migration corridors. I was wondering if you could explain what a migration corridor is, and maybe how you've seen them change as you've been involved in this work.
McAuliffe: The migration corridor was a component that we added in the 2018 volume. It's not one that is typically constructed in migration research and analysis and statistics, but it is a cumulative picture of some of the key corridors around the world, and they relate to what the UN describes as an international migrant, which is foreign born. So you can be included in the international migrant stock statistics and be a refugee: you might have been displaced across a border, recognized as a refugee, and then settled, for example. It doesn't go into a policy category; it doesn't go into the type of migrant, whether you're a student, whether you're a migrant worker, whether you're reuniting with your family under a family reunion program or anything like that. It's really just about being foreign born, and it is cumulative. So what we can see, based on the UN DESA international migrant stock data, is where those really big corridors exist over time. In fact, we've just pulled the data for the next volume, and we were looking at it today with my team: the biggest corridor by far, globally, is Mexico to the United States. In the 2020 volume, and you learn from your mistakes, to be honest, we did those migration corridors in the regional chapters for all six UN regions, but so many times I've wanted to actually look at it globally, and we didn't produce it, we didn't put it into the global chapter. So this time we're fixing that, so that we can use it in our work, in our presentations and discussions and so forth. But it was very interesting to see how those corridors have changed, and unfortunately one of the really big corridors, in the top 20, right up the top actually, is Syria to Turkey. Now, I've been working on migration for so long that I recall writing a whole lot of briefings, when I was working in the Australian Government, when Syria was the third largest host country of refugees in the world; it used to host mainly Iraqis. You'd be doing briefings every few months, every quarter, and Syria would always be number three. Syria is now the number one origin country of refugees, of course. And that corridor is so significant: it's in the top 20 of all global corridors in the current data, Syria to Turkey, a huge, huge corridor that has opened up. It's one where only a very few migration specialists and academics at the beginning of the Syrian conflict saw the potential for very significant displacement. Most didn't, but a couple who I work with did actually pick it and say that this was going to be a particularly bad displacement scenario and an ongoing conflict. So that's the sad side of it, and that is a very significant change in what are otherwise quite long-term trends and patterns of migration around the world.
Campbell: This is probably on the good side of the report. You talked about migrant contributions, and this topic of remittances is really fascinating. I didn't know anything about this, and that data is really interesting. How do you even get that data? Let's talk about what remittances are.
McAuliffe: Remittances data are collected by the International Monetary Fund; that's where the primary data set is. They're reported as part of balance of payments data from governments, from central banks. There are two types of international remittances: there are salaries and payments that are made, and then there are personal remittances. Quite often, when we're talking about remittances in an international development context, the narrative is that a migrant worker from a developing country will go and work in a developed country, a wealthy industrialized country, and send back remittances to family members who will be able to pay for food and shelter. That's a particular issue in certain parts of the world, like Central Asia; Tajikistan and other countries are very reliant on international remittances for poverty alleviation, for basic needs, but they also support the education of children as well as broader family members and so forth. But then there are other aspects to it as well, and this is something we pull out in the report as a kind of "did you know?" snapshot. Did you know that Switzerland is one of the largest countries in terms of receiving international remittances, as well as sending them? And Germany also receives enormous amounts of remittances, and so forth. It's really interesting, because it's cross-border workers: people who are actually in Europe, working in one country and living in another. Luxembourg, by share of GDP, is one of the largest in terms of outflows of remittances. So it's not just developing and developed contexts; it's much broader than that. For the last report, I think France was number six, mainly from Switzerland. So France was the number six country for international remittance inflows, and Germany was number nine, and again that's because of Switzerland. The money went out of Switzerland into the bank accounts of residents in France. I'm sitting here in Geneva, and, you know, the hospital system, the medical system, runs on people who reside in France, basically, and it's the same in Zurich, where it's mainly people who live in Germany. So they're cross-border workers. Now, COVID has impacted that very significantly, and the World Bank, which does a lot of work on international remittances in a migration context, projected that there would be a 20% decline in international remittances in 2020 because of COVID. That's obviously related to people losing their jobs during COVID, so migrant workers wouldn't be able to send money back home, but also to people being stranded, people having to engage in return migration because they couldn't stay in the Gulf or various other places. What we have seen is something quite different. We've been tracking this really closely, and we've seen some of the largest monthly remittance inflows to traditional receiving countries on record. Especially mid-year, around July and August, we saw countries posting very substantial increases in remittances. There are a couple of reasons for this, which was a bit of a surprise, so the World Bank has adjusted its projections.
We'll be watching very closely, and we're doing a lot of analysis for the World Migration Report on this particular topic, but the projection for 2020 has been reduced from a 20% decline down to 14%, because what we started to see is some central banks posting very significant increases. Now, why is this occurring? How could this be the case? When you look back at SARS and MERS, there is a particular phenomenon in previous pandemics where, if a country is in crisis, you will see diaspora and migrant workers sending money home if they can, to help families deal with crisis situations. We obviously haven't seen anything on the scale of COVID-19 before. But there are other things occurring. When you trawl through a lot of the investigative journalism, as well as central bank reporting, you see different dynamics. A number of people usually travel back carrying cash: back into Pakistan, and Pakistan has been reporting this through its analysts pretty clearly, back into Nepal, back into India. People carry cash home through informal remittance channels, and those have been quite significant. Now, with the mobility restrictions, we've seen a very significant increase in digitalization and in moving into formal channels. Countries like Bangladesh have been trying to get more people into formal remittance channels for a couple of years, but COVID-19 has really changed the whole dynamic: there is now an absolute necessity to move into formal channels, because otherwise you won't be able to get money home to your family. And of course that's good for governments, because it increases use of the formal channels; you've got a much better system in terms of moving out of informality into formality, and being able to tax people based on their returns and income and so forth. We're also seeing changes in places like the Philippines, for example, which has had a massive increase in online banking, again moving to digitalize. So we're seeing really big changes around digitalization. It's very clear in the remittance situation, but there are a whole range of other aspects of migration and mobility where digitalization is dominating, and not in such a positive way, I would say.
Bailer: Well, let me just personalize this for a second. What do you like best about what you're doing, and how would someone get involved if they wanted to do migration research?
McAuliffe: I really enjoy the analytical aspect, working with a really strong team and also collaborating with researchers from all over the world. For one of the chapters I'm working with one researcher in the US, one in China, and one in Kenya, and we're collaborating on a thematic chapter, which is intensely interesting and really challenges some of the orthodoxy around the narratives that we hear around migration, by looking at the data, not just as a donut chart, but at the analysis of what the data is telling us, and being open to that. A lot of the time in migration research there are particular constructs, and if you are trying to do something a little bit unorthodox, like looking at the migration journeys of refugees, convention refugees, people can feel a bit confronted by what you're trying to do, by what you're trying to really explore and understand: basically, to look at the complexities and to be open to those complexities, to say, well, actually, it's not as simple as some people may portray. These Hazaras are not simply fleeing for their lives; they are engaging in major migration journeys, journeys during which many have lost their lives. So that aspect I find really fascinating and very compelling. And it's also the things that bring us together, that make us realize that I might be an Australian, but I've got so much in common with the migration experiences and journeys of people in Pakistan, in Singapore, in Malaysia, in Latin America, in Venezuela, in Colombia. Colombia's decision on regularization is profound, and I think even some of my colleagues don't really understand that it is an enormous policy change, and an incredibly positive one that will have intergenerational impact. That's something I learned as a visa officer, actually, because I used to run visa programs for the Australian Government overseas. One thing I learned very quickly is that one decision around a visa, whether that was a student visa or a visitor visa, or whether it was to migrate as a spouse or something like that, has impact not just on that person, but on that person's family and on their community as well. You learn that as a visa officer, but then you magnify it and you scale it up: you can quickly see that 1.7 million Venezuelans in Colombia having the opportunity to be regularized is going to change people's lives for decades to come. It's probably one of the most positive policy decisions I've seen in a long, long time, and we don't celebrate it enough; we don't recognize it for what it is. And that's an area where we could be doing further research, where we can actually try and quantify what that means for quality of life, but also for quantity as well, across time. I don't think that's an area that people have really looked at in a policy context; it does tend to be here and now, it does tend to be fairly narrowly thought through, and, you know, having worked for a government, you are required to sign up to, basically, the national interest.
So one of the real challenges, and one of the things that I really like about working in the UN and for IOM, is that you can rise above that singular focus on national interest and look at the broad mutual interest on a very large scale.
Pennington: Well, that's all the time we have for this episode of Stats and Stories. Marie, thank you so much for being here today.
McAuliffe: Thank you very much indeed, Rosemary, I've really enjoyed it, and to John and Richard, thanks so much.
Pennington: Stats and Stories is a partnership between Miami University's Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places you can find podcasts. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.
The views expressed in this podcast are those of the interviewee and do not necessarily reflect those of the IOM or its member states.
Everything Makes Sense with Statistics, Right? | Stats + Stories Episode 176
Tim Harford is an economist, journalist and broadcaster. He is author of "Messy", and the million-selling "The Undercover Economist". His newest book “The Data Detective” was released in the U.S. and Canada earlier this month. Harford is a senior columnist at the Financial Times, and the presenter of Radio 4's "More or Less", the iTunes-topping series "Fifty Things That Made the Modern Economy", and the new podcast "Cautionary Tales". Tim has spoken at TED, PopTech and the Sydney Opera House. He is an associate member of Nuffield College, Oxford and an honorary fellow of the Royal Statistical Society. Tim was made an OBE for services to improving economic understanding in the New Year honors of 2019.
Love, Sex and the Pandemic | Stats + Stories Episode 175
Debby Herbenick is a sex educator, sex advice columnist, author, research scientist, children's book author, blogger, television personality, professor, and human sexuality expert in the media. Dr. Herbenick is a professor at the Indiana University School of Public Health and was lead investigator of the National Survey of Sexual Health and Behavior.
The Recent (Regrettable) Rise of Race Science | Stats + Stories Episode 173
Angela Saini is a science journalist, author and broadcaster. She presents radio and television programmes for the BBC, and her writing has appeared across the world, including in New Scientist, Prospect, The Sunday Times, Wired, and National Geographic. In 2020 Angela was named one of the world's top 50 thinkers by Prospect magazine, and in 2018 she was voted one of the most respected journalists in the UK. Her latest book, Superior: The Return of Race Science, was published in May 2019 and was a finalist for the LA Times Book Prize and the Foyles Book of the Year.
Episode Description
Race science – the belief that there are inherent biological differences between human races – has been "repeatedly debunked," in the words of the Guardian, and yet, like a pseudo-scientific hydra, it raises its head every so often. The recent return of scientific racism is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics, with guest Angela Saini.
Full Transcript
Rosemary Pennington: Race science, the belief that scientific study will uncover inherent biological differences between human races, has been "repeatedly debunked," in the words of the Guardian, and yet, like a pseudo-scientific hydra, it raises its head every so often. What's also known as scientific racism has framed studies of human intelligence and attractiveness, and most recently emerged in conversations around genetics. The resurgence of scientific racism is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Department of Statistics and Department of Media, Journalism and Film, as well as the American Statistical Association. Joining me are our regular panelists John Bailer, chair of Miami's statistics department, and Richard Campbell, professor emeritus of Media, Journalism and Film. Our guest today is Angela Saini. Saini is a science journalist, author and broadcaster. She presents radio and TV programs for the BBC, and her writing has appeared in such publications as New Scientist, Prospect, The Sunday Times, Wired and National Geographic. In 2020 Saini was named one of the world's top 50 thinkers by Prospect magazine, and in 2018 she was voted one of the most respected journalists in the UK. Her book Superior: The Return of Race Science was published in May 2019 and was a finalist for the LA Times Book Prize and the Foyles Book of the Year. Angela, thank you so much for joining us today.
Angela Saini: It's my pleasure. Thanks for having me.
Pennington: I was wondering if we could start our conversation with you describing kind of what historic race science was or is and how that compares to sort of its modern iteration.
Saini: Well, I think a lot of people imagine the racial categories that we use now around skin color to have been around forever. But of course they haven't been; they were inventions, and the time that they were invented was around the time of the Enlightenment, when scientists and naturalists in Europe were looking at the natural world and thinking about how to classify it. And as well as classifying animals and plants, they also thought about classifying us. They thought, you know, this cultural diversity that we see all around the world, all these differences that we see, maybe they rise to the level of different breeds or different species of human being. And that's where the idea of race, as we use it now, came from. That's not to say that people didn't think about human difference before; of course they must have. But these racial categories, black, white, yellow, red, you know, these very broad racial categories, that's around the time that they were invented. But we know now, and it's always been true, that there are no natural dividing lines within the human species: we are one human species, and we are very homogeneous as a species, more homogeneous than any other primate; chimpanzees have more genetic diversity than humans do. And so, given that there are no natural dividing lines between us, any attempt at categorization is by its nature likely to be arbitrary; it can't be anything else. It has to depend on what's important to whoever is doing the categorizing. And the fact that they landed on skin color is as arbitrary as anything else, because at the time there were lots of different ways of categorizing people: there were some people who thought there were a few races and people who thought there were thousands of races. Traditionally the word race hadn't been used very much, but the way it had been used prior to that was to refer to a family or tribe. And if you're using it by that definition, which in some ways is a more coherent definition, because at least within a family you have some genetic similarity, more than you do at a continental level, then there can be millions of races, you know, logically, by that standard.
But it was skin color that became popular, and that European and American scientists ran with for hundreds of years, and it was given meaning really because it became one of the ingrained assumptions that formed the science of human difference. There were lots of assumptions at the time, including, for instance, that women were not the intellectual equals of men, which is why women in Europe were excluded from many universities and certainly all the scientific academies of Europe from the Enlightenment onwards: because we were just seen to be two separate categories, and women were kind of an intellectually separate category. So these assumptions, as arbitrary and as political and unscientific as they were, came to form the basis, like I said, of the science of human difference. And that continued for hundreds of years, in fact well into the 20th century; there are still many people who think in these terms now. And that's all that science is, you know, there really isn't anything else.
Richard Campbell: Could you talk a little bit about the notion of essentialism, because I think some of our listeners probably don't know what that is, and also about how some of these studies, even more recently, got past the early editorial stage? Because, as you pointed out, the starting assumption is that populations are essentially different, and that doesn't seem to get interrogated at the beginning of some of this work.
Saini: Well, essentialism really cuts to the heart of this, because it says that there are biological qualities that certain groups or certain populations have that other populations don't have, and historically people have tried to make inferences based on their assumptions around these essentialisms. So, for example, that the Western world is as economically prosperous as it was for a couple of hundred years, at the time these ideas were being developed, because of some essential quality that white Europeans have that other people in the world don't have. Which is a very ahistorical way of thinking, too, because, as we know, if you look through the course of human history, Europe's dominance, for as long as it lasted, is just one part of human history; other cultures and other civilizations have risen and fallen, and Western European civilization will go the same way, you know, that's how history works. But what essentialism does is try to explain society, and what we can see out there in the world, through nature, and say that this isn't historical, this isn't political, this isn't social, this has nothing to do with how we live or how we choose to treat each other; this is because of some qualities that we have within ourselves. And it's an argument that remains powerful to this day. There are many, particularly on the right, and by this I mean the far right, the alt-right, who want to be able to make these claims, because if they can, then we don't need to do anything about inequality as we see it in the world, whether that's gender inequality or racial inequality, or even class inequality. There are still attempts to reintroduce class into this equation as it existed in the early 20th century: a lot of the British eugenics and race science movement, for example, was actually about class, and there were attempts to state that poor people were genetically, biologically inferior to richer people, and that's why you had generational poverty. And there are some people even trying to revive that now, in the 21st century, believe it or not.
John Bailer: You know, when I was reading your book, one thing that really struck me was the issue of the cultural and political context research is done in, and how that shapes and frames the way we've looked at problems. This seems to echo throughout the history of this investigation. Can you talk a little bit about that?
Saini: Yeah, absolutely. I studied engineering, and I was certainly trained within a system that taught me that what we do when we do science or engineering or mathematics or whatever is objective, that we sit apart from society, we are above politics. And the problem with that is that we forget that much of the science, including those very early assumptions that I mentioned earlier, was very much rooted in the politics of the time, informed by the politics of the time. And because they weren't interrogated enough, because of that politics, that's why mistakes continued for so long. And this is how mistakes happen: sometimes fallacious orthodoxies can build within the sciences for a very long period of time, because nobody questions these basic assumptions, because they assume that everybody who's doing this is perfectly objective, so there can't be any problems here. And that is something I think we need to challenge. I trust the scientific method; I really do think it is one of the best ways we have of understanding the universe and understanding ourselves as humans. But it's limited by the fact that we are human and that we have these biases and prejudices. We are informed by the world around us, and that shapes the questions that we ask, the limits to what we can imagine. You know, for example, it's only relatively recently that scientists have started challenging the idea that there is a gender binary, to think outside those boxes, and that's because it literally wasn't within the purview of their imagination that there could be anything else out there. Society, in that sense, has challenged it: everyday people, with their discomfort with these gender categories and how they feel about these things, have challenged that politically, that has entered into the sciences, and then scientists start asking these new questions. So we have to accept that. And if we can accept it and understand it, and engage with the fact that science sits within society, that it's embedded within cultures, then I think we can get closer to objectivity, because then we can understand exactly what it is that we're looking at.
Bailer: Just as a quick follow-up: I remember years ago, when I was reading Stephen Jay Gould, some of the work that he had written, that was when I had the epiphany about the cultural context in which science is done. And I found myself thinking, oh my, how is the world in which I live now, and the culture in which I live now, shaping the way that I ask questions, or how I look at problems, or how I think about interpreting results and analyses? That's an important and challenging consideration as we do our work.
Campbell: I was going to follow up on that. What about the politicization of science during the COVID crisis? I mean, this has really been a remarkable thing. Is this a new phenomenon, or is it a new stage? Just the mask-wearing thing here in the States, you know, divided politically, is sort of incredible to me, and the stories that are emerging of people dying in the Midwest who refuse to admit they have COVID, insisting they have something else. Is this a phenomenon that you've seen before in studying the history of this?
Saini: It's always been there; there have always been people like this. I think there are lots of different things happening this year. One of them is, as you say, conspiracy theories, and these kinds of pseudo-scientific conspiracy theories can be quite elaborate, and especially popular because they spread so easily on social media; you know, this phenomenon of misinformation and disinformation that gets spread so quickly through things like WhatsApp and Facebook and Twitter.
And it's because we consume things so fast and we don't always have time to challenge them that it's very easy for these incorrect memes to proliferate. It is something I'm working on: I set up a group last year, which now sits under the Royal Institution here in London, one of the oldest scientific societies in the world. We are a group of journalists, policymakers, social media experts, counterterrorism experts, academics, a very broad range of people, all interested in this problem of pseudoscience in whichever way it manifests. And what you quickly start to realize when you look at this is that these people, whichever conspiracy theory they adhere to, whether it's an anti-vax one, or flat earth, or a climate change denial conspiracy theory, or whatever it is, don't have anything in common demographically. They come from all kinds of walks of life, all ages, everything. But what they do tend to have in common is a mistrust of authority. That is the common thread you see. And actually that's understandable, because very often our authority figures are not always trustworthy, and especially these days, when we have all these populist leaders around the world who are willing to lie, sometimes outright, to their citizens, it's very easy to build a mistrust of authority and to buy into certain conspiracy theories. And that is why very well educated, very skeptical people are sometimes the most vulnerable to this, because what they're really doing is questioning what they're seeing to such an extent that they question everything, even the fundamental basics. And that is the point at which we need to engage with these topics. This isn't always about ignorance. I looked at anti-vaxxers particularly for a documentary last year; these are often very well educated, middle-class people who are very well clued up on the facts, but what they're choosing to do is dismiss a certain set of facts and choose another set of facts that fits with their fears or their worldview. And the conspiracy theorists, the ones who spread this kind of misinformation and disinformation, who do that for lots of different reasons, including sometimes state actors (there are Russian bots, you know, spreading this kind of stuff around), what they try and do is play on those fears. So, for example, the legitimate fear of a parent that their child might be hurt if they're given this medicine or this vaccine: they draw you into that rabbit hole of false facts, sometimes seeded with accurate material, for example real but marginal examples of vaccine injury, and then use that to build a case that seeds doubt in your mind. I think the psychology of this is very complex, and with the internet and the dynamics around the internet it becomes even more difficult. But it's a phenomenon that has always existed; there have always been doubts around these things, and often what's happened, for example with vaccine doubts, is that a big pandemic like this will happen, everyone feels that they need to take the vaccine, and then the doubt kind of subsides.
A little bit. And shocking though it is, and unfortunate that it happens that way, people have to die in order to be confronted with the devastating reality of the importance of these things. But that's often how these things happen.
Pennington: You're listening to Stats and Stories, and today we're talking with science journalist Angela Saini. Angela, you write about the work of Karen Fields and the idea of racecraft, and I'm trying to pull it up on my phone because I love this line: thinking about race in relation to witchcraft, and race being a construct, you write that race is as biologically real as witches on broomsticks. I love that line, but I also think back to Richard's earlier question about editors letting these things through. You also write about a blog post a man wrote about the supposed lack of intelligence and attractiveness of black women, and about the scientific papers that get through. And I wonder, and it's sort of the inverse of what Richard just asked, because the people pushing some of these views are credentialed, does that lend a sort of truth and vigor to this idea of there being, what is it, "human biodiversity"? That was the term one of the people you talked to used, right? Do the credentials behind some of these people reify the idea that race is somehow real?
Saini: Yeah, and it's a real problem. I think it's a difficult one to tackle, because the nature of academia is that it is a broad church, and in some ways it needs to be a broad church in order to maintain academic freedom. And I value that; I do think that's important.
But at the same time, what we get as a result is people, and we've always had these people, they've existed right from the beginning, who hold very marginal political views and who then turn to science to justify those political views. So many of the people, for example, that I write about in Superior, or who I interviewed for Superior about what some have termed scientifically racist or pseudo-scientific positions or papers they've written, are not geneticists. In fact, none of them are geneticists; most of them tend to be psychologists, political scientists, you know, people outside the disciplines where the real biology of human difference is done.
And very often, when you scratch beneath the surface, and this is something I've tried to do very hard in my work, not just interviewing people who are critical of race science but understanding those who adhere to these racist theories, or what have been termed racist theories, why they do it, why they're so attached to these ideas, what comes out is a kind of political underbelly. So, you know, they're anti-immigration, or they're anti racial mixing, or they feel that there should be some form of segregation between people, that equal opportunities are a bad idea, that affirmative action is a waste of time. That's often what lies beneath all of this, and what they're really doing is using the science as a tool to justify these political beliefs, and sometimes they go through quite unbelievable intellectual contortions to be able to do that. Because the evidence really doesn't support the idea, number one, that race is real, or that there are these deep psychological differences between us, but they won't let it go, and what they cling to increasingly is the possibility that one day evidence will come along to prove them right. And, you know, you could say that about pretty much any area of science, because we don't know everything; we're never going to know everything, especially because human nature is not just some simple biological substrate. Who we are is heavily influenced by our environment and culture; our biology is affected by our environment and culture, how we develop, our brains, everything. Because all these things are so intertwined, you cannot extricate them; there is no separate nature and nurture, they're all intertwined with each other. We are always changing, and so you can never get a full grip on who we are as human beings; you can never say definitively what human nature is. And that's really the territory they occupy now: that uncertainty. And I guess they will occupy it forever, as long as they hold these political beliefs; that's the space that they'll live in. The thing we have to challenge is not just the science, or all the pseudoscience, that they're peddling; we have to really understand why they so desperately want it to be true.
Bailer: You talk a lot about where this work appears, and it reinforced for me the idea of identifying funding sources, as well as identifying the outlets for this work. Just because something has appeared, or just because it's been supported, doesn't necessarily mean there isn't an agenda that goes with it. Can you describe a little of how you dug into that, and how do we inoculate ourselves against these kinds of impacts?
Saini: Well, within scientific publishing there is a wide range of quality. There are some journals that are right at the bottom end, like the Mankind Quarterly. This was a pseudo-scientific journal set up after the Second World War by race scientists, including one Nazi race scientist who carried out experiments on the body parts of Holocaust victims, some of them children. He and others, scattered all over the world, I should say, and not confined to any one region, set up this publication, which is still being published; you can still read it today. In fact, I interviewed the person who was then the editor of the Mankind Quarterly when I was writing Superior. So, on the margins of scientific publishing, there are people trying to keep these ideas alive in those circles. Very often they're writing for each other, so they cite each other and write for each other; they're not generally cited in mainstream academic journals. But some of them also have a presence in mainstream academic journals. One thing I learned in 2018, during my research, was that two of the editors of the Mankind Quarterly were sitting on the editorial board of the journal Intelligence, a major journal in the field of intelligence research, which is itself a very fraught field: it has its roots in eugenics, a very dark history it hasn't completely let go of, unfortunately, even to this day, and there are still figures within it who are considered racist by others in academia and who have been denied platforms or access to conferences because of their views. But anyway, these two people were on the board of this journal, and Elsevier, which is a major publishing group, has very strict rules around who can sit on editorial boards.
When I asked them why they allowed these two people, who had very weak academic credentials, to be sitting there (one of them in fact had only a kind of honorary position with an Irish university, which has now been rescinded, so he has no academic affiliation anymore), and I asked, why do you have these people on your journal board when you have certain standards that you're meant to uphold, they entirely washed their hands of it and said it's not for us, it's for the editor-in-chief. The editor-in-chief told me it was a matter of academic freedom, that this was about having a plurality of views within the journal. Which is worrying, because the journal itself has published a number of articles over recent years by people who have links to the alt-right and white supremacists, who have strong connections with the Mankind Quarterly and have edited or written for it. And he just refused to do anything. But by the end of 2018, when I went back to check the editorial board as I was updating my references, I noticed that those two people had been quietly removed. So I feel that maybe, because I wrote an article at the time, there was some pressure within the editorial board to clean up their act a little bit. But the point I'm trying to make is that these are not isolated instances; there are other problems within other journals. If anyone goes to the brilliant website Retraction Watch, you can see how common this actually is: baseless pseudoscience has had to be retracted as recently as this year. In fact, one paper published earlier this year was retracted from a journal after criticisms of how politically motivated it seemed to be.
And then the authors themselves admitted that their data was shoddy and that they should retract.
So you really have to ask yourself, you know, are we upholding the standards that we need in academic publishing? And this isn't just a matter for academia anymore; this is a matter for all of us, because the public has access to these papers now because of the internet. And if we can't trust what we're reading, if these kinds of retractions are going to continue, and if we're going to get dodgy people sitting on the boards of journals writing papers, then it's going to erode trust in science even further.
And it's going to damage the reputation of science and make it much harder, I think, for good scientists to do good work. But there are people, I mean, I know because I work with journal editors and journal groups, there are people trying to tighten those standards, not just around quality but also around ethics, looking at the repercussions of their work.
Campbell: Angela, how much of that is a problem? I think this was from your piece in Nature, where you say scientists rarely interrogate the histories of even their own disciplines.
How much of what you just talked about is because of that, because the scientists themselves aren't even aware of the long trajectory of history? And I think you write elsewhere, John probably won't like this, about how humanities professors and the humanities have provided a stronger critique here than science itself. I think this is changing, I think there's more attention being paid to history, but talk a little bit about this failure of science to interrogate its own history.
Saini: Well, the humanities also have their own problems. It's within the social sciences that you often see the best critiques of the sciences, I think. And one of the problems we have is that scientists very rarely engage with that body of knowledge. So, for example, when it comes to medicine and race, health and race, there is actually a huge, wonderful body of literature within the social sciences looking at the effects of racism and discrimination on health, on the body, mentally, on all of these things. And yet, in the COVID-19 pandemic this year, I saw a number of high-profile physicians and medical researchers looking to genetics to explain the racial disparities that we were seeing. Immediately, you know, by March or April, as soon as it was clear that Black and Asian people in certain countries were dying at higher rates than others, they jumped straight to genetics. If they were aware of that body of literature that shows the effect of racism and discrimination, structurally, on how we live and how people are treated, and not just that, also around class and all these different factors, I mean, a lot of this is tied to socioeconomic status, and a lot of that work is done within the social sciences, then I don't think we would be jumping to those kinds of essentialist conclusions or assumptions immediately. And so we do need, I think, more dialogue, and more humility sometimes among scientists: it's not just hard science that contains all the data you need. There is data out there in the world that is actually equally, and sometimes even more, important when we're talking about certain things.
And that failure to understand not just that body of social and cultural knowledge but also history, I think, is why a lot of mainstream scientists fall into these traps, why they make these mistakes. And I know this from my own experience, because, as I said, I studied engineering. I was very poorly exposed to the social sciences when I was at university. But as an adult, after I left, I was working at the BBC, and in my spare time I started doing a degree at King's College London, which is just here in London, in their Department of War Studies. This was an interdisciplinary science and security course, taught mainly by social scientists but also by a few people with experience in the sciences and engineering. And for the first time, I learned about the construction of knowledge, feminist critiques of knowledge, Foucault, all these things that I had never been taught before and suddenly got an introduction to, and also the history of science and technology: how ideas develop, the cultures they develop in, and how that shapes how we think about them. And it completely changed the way I think about ideas. I really very firmly believe, and in fact I've been advocating this all year in every university talk I've given, that we should integrate the history of science and humanities teaching into science teaching more. I really strongly believe that every time you learn a scientific concept, in whichever discipline it is, you should know the background to it.
Pennington: Well, that's all the time we have for this episode of Stats and Stories. Angela, thank you so much for being here today.
Saini: Thank you for having me.
Pennington: Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program, send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.
Making Decisions During the Pandemic | Stats + Stories Episode 172 /
Risk is a tricky thing. We like to think we understand it, but when it gets down to brass tacks it can be harder to wrap your brain around things like acceptable or unacceptable risk. How do you define it? How do people understand it? The COVID-19 pandemic has only highlighted the trouble we sometimes have understanding risk. Communicating risk is the focus of this episode of Stats and Stories with guest Baruch Fischhoff.
Read MoreHow We Understand Uncertainty | Stats + Stories Episode 168 /
Communicating risk is difficult at any time, but during a pandemic, communicating risk well can be what keeps a disease from spreading, as one public health official has put it, like wildfire. During the COVID-19 pandemic, experts, journalists, and elected officials have all been working to find the most effective way to communicate risk to the public. Helping people understand their risks of infection, or of infecting others, can be the thing that gets them to follow mask mandates or other public health advisories. Effectively communicating risk in COVID-19 is the focus of this episode of Stats and Stories with guests Alexandra Freeman and Claudia Schneider.
Read MoreStatisticians for Society | Stats + Stories Episode 167 /
Mastrodomenico is a fellow of the Royal Statistical Society as well as owner and founder of his statistical consulting company, Global Sports Statistics. He has been the Chair of the RSS’s Statisticians for Society initiative since its inception in 2017, and is also an RSS Statistical Ambassador, which involves regular work with the media in assisting with their reporting of statistical issues.
Episode Description
Data can be powerful and persuasive rhetorical tools for nonprofits as they explain the work they do and ask for monetary support from various entities, but not all nonprofits can afford to hire a statistician to crunch numbers for them. An organization in the UK is working to meet the statistical needs of nonprofits and is the focus of this episode of Stats and Stories with guest Robert Mastrodomenico.
+Full Transcript
Rosemary Pennington: Data can be powerful and persuasive rhetorical tools for nonprofits as they explain the work they do and ask for monetary support from various entities, but not all nonprofits can afford to hire a statistician to crunch numbers for them. An organization in the UK is working to meet the statistical needs of nonprofits and is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Department of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me are our regular panelists John Bailer, Chair of Miami's Statistics Department, and Richard Campbell, former chair of Media, Journalism and Film. Our guest today is Robert Mastrodomenico. Mastrodomenico is a Fellow of the Royal Statistical Society, as well as owner and founder of Global Sports Statistics. He's also been the chair of the RSS's Statisticians for Society initiative since its inception in 2017. Mastrodomenico is also an RSS Statistical Ambassador, which involves regular work with the media, assisting with their reporting of statistical issues. Robert, thank you so much for joining us today.
Robert Mastrodomenico: Thank you for having me.
Pennington: Could you just describe how Statisticians for Society started?
Mastrodomenico: It's a funny thing. I've been with the RSS doing volunteer work for about 10 years, I think. I'd just been a trustee for, I think, five years, and I was going to have a little bit of a break, and someone who I knew and had worked with said, we want to set up an initiative to work with charities, would you mind helping us out and picking some people to be involved? And so I just did it as a kind of small favor, to help get it started, and the more I learned about what they wanted to do, the more I thought, well, actually, I should probably stay on. The idea was to leverage the RSS membership, all the fellows, to try and connect up charities with statisticians, and essentially be that middle piece, to facilitate that. And the idea blossomed from there, and we ran with it. We started out, we got some people involved from the society, volunteers like myself, along with staff members, and the concept of Statisticians for Society was born.
Bailer: So can you talk a little bit about some of the groups that you've worked with? You know, I'm dying to ask also about how you implemented it. I'm also dying to ask if there was a Statisticians Against Society, but I think we could postpone that to later. But in terms of the groups, let's just start with this: what are some of the organizations that you all have been involved in helping?
Mastrodomenico: So the approach we initially took was, you know, we want to help charities who maybe don't have the resources to help themselves, so we set a limit on charity turnover, what you're allowed to be making to sign up for this initiative. So we're aiming at smaller charities, and we still do now. The initial thing was, we just put the word out: were there charities who wanted to do this? And we just used the contacts that we had at the society, so a lot of the staff members who help administer this used contacts we had from various charities to see, can we get the word out, and would people do this? And it was an unknown as well; we came into this trying to do something, not knowing if it would succeed or how popular it would be. I think initially there was one instance where a charity signed up, and they were doing an interview on Radio 4 and spoke about this, that's BBC Radio 4, for you guys in America. And from there we just started getting lots and lots of interest, and in the UK there are a lot of groups and events you can go to to promote yourself within the charity sector. So, bit by bit, we started doing more, and then we started getting traction with charities signing up, wanting to do work, and the variety was astounding. You just have lots of different charities, all wanting some kind of statistical help, or thinking they needed some kind of statistical help, and it was that side that got the thing going and helped us form what the initiative was. Because when we started off, you have this big idea: you want to help charities, sounds great. And then you start doing it, and you're like, well, how do we actually do this? That's where we learned over time, and over this period we've refined the approach and learned how we can help those charities to help themselves, and help the volunteers. It's been a learning process for myself, for the staff and volunteers who've been involved on the administration side of it, and for the volunteers and charities who are actually doing it. We've all learned, and we've got to some point today where I think we have something that really works.

Bailer: So, a quick follow-up. Sorry, sorry, Richard, I beat you to the punch here. But who was the charity that gave you the shout-out on Radio 4, and what did you do for them?

Mastrodomenico: It's going to be really bad if I say I can't remember. Personally, I've probably scoped about 30 or 40 charities, so it might be easier if I explain the process, because then you'll realize why. I can remember the people who were on a project, and I can place the people who've done it if I meet them, but the charities themselves, we deal with lots of them. So the approach that generally happens is: we got the word out about this initiative, a charity will send in an application, we think we need some help, and that will go directly to the RSS, where it gets dealt with by a couple of people at the RSS who are working on this, and they do great work.
They'll take the initial form that the charity sends in, and that will get fed to somebody who's on the scoping committee. My role when I first got involved was to set up this committee: a group of statisticians, and they're all way more eminent than me; you know, we've got former RSS presidents, very eminent people still active in that kind of work now, so we've got some great people. What will happen is, one of those people will pick up the query and we'll have a call with the charity. I've done quite a few, because initially, when we were looking to get this going, I was around, and yeah, I actively gave my time to do this. So you speak to the charity; you have a conversation just to get an idea of what they want. Because, as statisticians, when you work in a similar way all the time, it's really easy to know what you want; you get in a groove of what you're doing. A charity might have two or three people, and they don't always know what they want. They know they might have data. Some of them do know what they want, in that they need to produce reports or produce stuff for reasons; others say, we've got surveys, we've got this data, what can we do, how can we get insight? So stage one is that scoping, and I think I've scoped probably 30 or 40 different charities, so quite a lot of variety. With that, you put it together into some kind of form, which is then used to try and recruit a volunteer to do the work. This is where we make use of the network: we'll send out a request to the membership to see who wants to do it, and a lot of the time we'll get a lot of people who show interest, and they'll send a CV covering their suitability. Certain projects will suit certain skill sets: if you've got a medical-based project, we're looking for somebody with a specific skill set; if we've got survey analysis, we're looking for those people. We try to facilitate that and make it as easy as possible, so when you apply, you know what you're getting into. From there, the individual who's chosen will meet with the charity; they'll come up with an idea, formulate it a bit more if need be, formalize the work, and go and do the work. A review panel will then come back and check it, just to confirm everything met what was agreed at the start, and then you get to a point where you've got a happy charity, hopefully, you've got an extra piece of work, and we've formed a relationship. So, going back to your original question: I do lots of those, and then when somebody says the name, you're like, oh yeah, I remember the charity when they come up to speak, but it's very easy to forget the names because there are so many of them.

Bailer: And that's wonderful.
Richard Campbell: I thought it was interesting, Rob, looking at the website, that the charities understand that a lot of times, to get good donations, they need to have data and statistics. Looking at some of your projects, that came up over and over, this recognition that donors are going to want numbers, they're going to want to see data. Maybe you could talk about a specific project you've worked on. I was kind of interested in the Consortium for Street Children, and how you count street children, how that worked. I don't know if you can speak to that or not.

Mastrodomenico: So, you know, I think that was one of ours, and in terms of my role, I'm usually involved at the start, where I speak with the charity, but I'm pretty sure I was involved in this in some way, because I do remember this being brought up at some of our meetings, in terms of how it was done and the approaches around it. I won't take the credit for doing the work, though; there will have been a volunteer within the society who's taken the time, and given a lot of time, to do it.
Mastrodomenico: From my point of view, and the scoping committee's, our role is just to flesh out a bit of that project. And going back to your earlier point about charities becoming a bit more savvy about needing data and how they report it: it really impresses me with the charities that come to us, especially when you see how small some of them are. They're trying to do really good work on limited budgets, and they know they need this; they don't know specifically what, but there's enough knowledge there that this is something they want. And my role in a lot of the help process is to help them help themselves, to get the idea out, because they might come to us with something that they want, and actually you can tell initially that it might not be possible, or that it's not really what they want, and the more you speak to them, as in any consultancy-type environment, the more you get out of them, and they kind of do the work for you; you're just asking the right questions, getting them to think in that way. And as part of the initiative, one of the things we'll be looking at rolling out next year is some online resources for charities that allow them to help themselves and make those decisions. Because what we're not trying to do is teach society statistics; I think that's a really important part, that the skill of being a good statistician is something where you still need a statistician, and we can give you that. But what we can do is help you figure out some of the bits before you need us, or need the statistician, to help you make better decisions and understand what you've got and maybe what you can do. So that's what we're working on at the moment and looking to roll out next year, which we're quite excited about. And that kind of resource is more focused toward the charities, because obviously there are lots of statistical teachings and resources in the RSS as a whole, but this is something really focused for them that helps them get engaged with it. Not that they're not already; I think we've already touched on charities knowing that they need to do it, but it's letting them empower themselves a bit more and really take advantage of what they've got.
Pennington: You know, I know that the RSS has some really great resources out there; there are resources for journalists, resources for teachers. I'm curious, when you're talking about this future work, these resources for charities, can you give an example or two of what would be part of those resources?
Mastrodomenico: Yeah, definitely. As chair of this initiative I'm involved at all levels now; I've infiltrated every level of Statisticians for Society, and I'm very much hands-on with this part. We have another subcommittee, we do have many committees doing this, of individuals, again volunteers who give up their time, and we're just trying to put together resources. A lot of this will be online, and a lot of it will be around case studies, maybe videos on using data, collecting data, or things that you need to think about before you get to the point of needing us. Because what we've found, from experience, and we're over 50 projects in, the number changes all the time, I can barely keep up with it, is that some charities actually come to us too late. If they'd known earlier that they needed us, they could have taken steps to either get us involved earlier or have done things a little bit differently. An example is surveys: we had one charity, who will remain nameless, and they were very, very nice, and they did a survey. They came to us having already designed the survey, and I'm pretty sure they were already implementing it, and we were like, we can help you, but ideally you should have come to us before that. But they didn't know, and it's, how can we help get that knowledge out? It's using examples from charities that we've worked with, positive case studies, people talking about their experiences, and covering, you know, not statistical techniques, but the understandings you need to be more savvy with. We're just trying to help you understand what you've got, not necessarily show you how to do a t-test; if you have ten p-values, we don't want you to be doing that. We just want you to know how you should be dealing with collecting and storing data, all those bits you can do yourself, and help the charity become a bit more savvy with that.

Pennington: You're listening to Stats and Stories. Our guest today is Robert Mastrodomenico, Chair of the RSS's Statisticians for Society initiative. Now, this work is done pro bono by the statisticians, which I think is really interesting, and I saw, when I was reading up on the project's website, the comparison to the fact that lawyers do pro bono work, and why that's something that maybe statisticians should consider doing too. Are there many projects like yours where statisticians are doing work pro bono? And I wonder what the reaction has been in the statistician world to what the project is doing, and whether there's been more of an effort to create these kinds of things since your initiative began.

Mastrodomenico: Yeah, I mean, I wouldn't say we are the originators of doing this. Obviously, for statisticians there wasn't anything like this before, but the OR Society have their own similar initiative, and there are other initiatives around which are similar.
We're trying to fill our space and do our part, take care of the statistics side of it, but there are other similar initiatives that we can point people to: if we get a query, for example, that's more in the economics space, we can pass you on to somebody who could help with that, or if we think it's more operational research, we'll pass you there. And we work together; all we really want to do is help the charities help themselves, so there's no need for us to do things that we think someone else can do better. The people at the RSS have been really good at connecting with other groups, so we've got really good networks out there, and the charity sector within the UK has a lot of initiatives, people you can speak to, events where we present in various places to get out what we're doing. There are lots of people trying to help, and it's that bit of trying to get yourself out to the charities that is the hardest bit, the marketing, because if you're a small charity, how can we make ourselves known to you? And I think that's the general thing: we're always trying to be more forward-facing, but in the right way. Just promoting yourself on Twitter maybe doesn't help; a charity isn't scouting for volunteers by reading Twitter all the time, so we need to get to them. We're looking at how we can take advantage of the fact that the RSS has different local groups throughout the UK, concentrations of fellows all around the country, so we want to take advantage of that and spread the word as much as possible.
Campbell: Rob, can you give us an idea of how many statisticians are actually involved in this project? You said they're from all over the country.
Mastrodomenico: Yeah, it's our membership. The membership of the RSS is obviously nationwide, and we have international members too; we have members in the US, just as the ASA will have members who share those kinds of memberships, and we have members in Europe, Africa, all around. But the structure of the RSS, historically, is that you have specialist sections, groups of volunteers who run events for, say, sports, for example, and I'm Vice Chair of the sports section, but we also have local groups who run events in their areas, essentially RSS events for people in the West Midlands, or Scotland, Glasgow, Edinburgh. So we've got concentrations all around, and it's about how we can connect all the dots to make sure that at each level we're getting the word spread. You can do it through promotional materials or going to conferences, which we do lots of, we'll speak at various conferences, but we also need to take advantage of the fact that we're nationwide. The initiative itself is open to anybody who's a member of the society who signs up to the mailing list to receive requests. So when we get a request in from a charity and it goes through the scoping process, I scoped it, for example, and we've got something that we want to get a volunteer for, then as long as you're a fellow or member of the RSS and you've subscribed to the mailing list, you sign up at rss.org.uk, you can receive messages saying, we've got a project, here's the project, do you want to apply? And you can apply that way. So it's open, but you have to be part of the RSS; you need to be a member, it's not open to just anybody. That's probably the one block for people, but actually there are a lot of statisticians involved; I think we're at about four or five hundred people who have signed up to this. And a lot of it is just us making the society members aware, because at any one time we might have 8,000 members in the society, I don't know what the current number is, but we're doing our work to make it more known to the members, because in my head everybody should be signed up, really, if they can; you don't necessarily need to do anything, but the more people we can get it out to, the better. This initiative has grown from the pilot scheme it was set up as to something where we've now got funding for the next five years. So that's also part of our growth: not just to get it out to more charities, but to get it seen by members within the society, because, probably like the ASA, there are lots and lots of things going on, so it's not necessarily about cutting through the noise, but just making sure we can get it in front of all the members so people are seeing what's there. It may not be for every member of the society, because people join for different reasons, but I think it's something that a lot of people would be very interested in, at least knowing about and seeing what comes through.
Mastrodomenico: Because what we hope is, when we get somebody who wants to do the work, we'll match them up with a charity they have a genuine interest in. And we've seen that a lot, people signing up because it's something they're passionate about, an area that they want to work in, and that makes it all the better, because we can cultivate those long-term relationships. If you think of us as the people who are setting up the work, then once the work is done, we don't want you to just stop. We don't want the statistician and the charity to never speak again. We want you to have those long-term relationships, and we're starting to see that: with more and more completed projects, we do a review call, and it's great to hear, oh, we're going to do this next, we're going to do that next. And it's like, you've made a bit of a difference, which is really cool.

Bailer: That's really neat. I love the fact that you've got this coordinated effort. There have been efforts like this; for example, the ASA had the Statistics in the Community, or StatCom, effort that started at Purdue, and it was basically student volunteers providing, quote, pro bono statistical consulting to local nonprofit, governmental, and community service organizations. But what you have is a centralized, coordinated effort. I like the idea that you have this clearinghouse, that you're centralized; it's more of a coordinated as opposed to a distributed response, which I think may be pretty effective, because a small charity may know about the RSS but may not know what a statistician is or where to find one locally. I'll tell you one thing I really liked about the way that you all report your case studies: you partition them into the request, the approach, the result, and the impact and benefits. All of a sudden I thought, hmm, maybe I'm going to change the way I think about my data practicum classes and some of the ways that we might talk to our classes about how they would build up their results. I thought that was really nicely framed. So can you talk a little bit about how you populate those components as part of your interactions with the charities?

Mastrodomenico: So, I'm not going to take credit for all the case studies and how they are; that's Amir Akhrif from the RSS, who deserves a big shout-out. They've been involved all along the way, and they're not statisticians by trade, but they help all of us volunteers; they keep us on the right path and help us do what we need to do. And that level of coordination is key, so those guys are key, and they've got their shout-out on this. But we report along the way, and one of the things we've learned since we started is how to keep track of things, because, like in any business, if you've got lots and lots of projects, if this is like a consultancy, I think of it as we've got lots of projects going and some centralized people within the RSS who are trying to manage all of that. We need that level of accountability. Not necessarily because things go wrong, but because we need to know when things slow down, and we've learned that from experience. Sometimes, when you're relying on volunteers and charities, what's important to people changes. You can have something like a global pandemic, and priorities may change; we've seen that with certain charities who may have signed up to do something, but it isn't really top of their priority list right now. So it's that approach of just logging all the steps, and as we've grown, from a scoping committee we've added a review committee, and we've added certain meetings along the way that we've learned from experience. And the kind of thing I talked about earlier, where we have an end-of-project meeting where we really get to understand what happened and what the results were, all of that allows us to come up with this approach where we can really show what happened, step by step. And we do talks where we get volunteers to talk about their experiences and how it was for them.
We also do that for charities: how was it for you? And it's different for everybody. It's different for a scoper, a reviewer, somebody who's done the work, and a charity who was involved in it. And with the help of the centralized RSS staff and the volunteers who are scoping and reviewing, we've formulated a neat way of working, and we seem to have it down now in a way that allows us to work with lots of charities simultaneously and provide the level of support and service that they need on both sides. Because it's as much about us supporting the statistician as it is the charity. Everybody's involved in this, and the statisticians are giving away their time for free; they're doing this work, and we need to support them and allow them to work as well as possible with the charity. So through experience, and I'm not going to say we had this down perfectly straight away, we learn by doing, and every time we do a project, we learn how to do it better. And getting lots and lots of opinions from people who are passionate about this, and even charities feeding back to us, has allowed us to create something that we hope can grow and work.

Bailer: Have you heard of any new kinds of projects that have emerged as a consequence of these COVID-19 days in which we live? Has this motivated specific kinds of questions from some charities?

Mastrodomenico: I'd say, how has it been for us? We're feeling the effects of it in terms of the number of charities who are coming to us. Because, as expected, with the furlough scheme and everything we've had in the UK, people have kind of battened down the hatches and are just trying to get through it. I did an event with our society and a few other charities where we were talking about the effects of COVID and trying to look at how data can be used and what people would want to do with it, and we've seen our numbers go down over this period.
Mastrodomenico: Luckily, the way it works is we have a kind of lag effect, in that we're still fairly busy with projects ongoing, but there will come a time when these projects may dry up. What will probably happen then is that this will open up opportunities for charities that maybe weren't eligible before, because of turnover and that kind of effect. So what we're trying to do is prepare ourselves for what's going to come out of this, where people will need to do things, and we fully expect that charities will need to be using data, and the COVID effects may be things they will need to start showing as and when they do end-of-year reports or look at the effects of it. One of the things we have seen is that a lot of charities, UK-wide, have really increased their digital capacity, and that's probably one of the biggest things that's happened nationwide during this. That makes things a lot easier: we're now speaking with charities who maybe before didn't have online facilities for video conferences and meetings, or for storing their data in certain ways. Everyone in the UK has adjusted to this kind of remote working, and I think that's going to have a positive net effect on how charities work, because we had seen before that the way charities stored and used data maybe wasn't optimal, but they seem to be upskilling themselves. Obviously, from the small sample sizes we're dealing with, I'm not going to make huge inferences about that.
Mastrodomenico: But the hope is that out of this we can help more charities when they need that help, and we're ready, we're waiting; you know, we will be there for them when they want it, so hopefully we can help as many charities as possible.
Pennington: That's all the time we have for this episode of Stats and Stories. Robert, thank you so much for being here today.
Mastrodomenico: Thanks for having me.
Pennington: Stats and Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net and be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.
The Women of Hull House | Stats + Short Stories Episode 165 /
Of all the places to look for statistics, who'd have thought a settlement house would be where you would find insight into data about their communities? That's the focus of this episode of Stats+Short Stories with guest Sharon Lohr.
Read MoreCrime Statistics | Stats + Stories Episode 158 /
If you’ve been following the news much then you may have noticed reporters beginning to explore how COVID is impacting crime rates around the country. Police commissioners are even appearing on newscasts trying to explain how various COVID measures may have changed the kinds of crimes they’re seeing in their cities. One of the problems becomes tying those changes directly to COVID and of course, a long-standing issue when it comes to crime rates is understanding how we measure crime in the first place. Measuring crime is the focus of this episode of Stats and Stories with guest Sharon Lohr.
Read MoreBig Data and Big Laughs | Stats + Stories Episode 157 /
Harkness writes and presents BBC Radio 4 documentaries including the series FutureProofing and How To Disagree, and Are You A Numbers Person? for BBC World Service. She formed the UK’s first comedy science double-act with neuroscientist Dr. Helen Pilcher, and has performed scientific and mathematical comedy from Adelaide (Australia) to Pittsburgh PA with partners including Stand Up Mathematician Matt Parker and Socrates the rat.
Her latest solo show, Take A Risk, hit the 2019 Edinburgh Festival Fringe with randomized audience participation and an electric shock machine. A fellow of the Royal Statistical Society, she’s a founder member of their Special Interest Group on Data Ethics. Timandra’s book Big Data: does size matter? was published by Bloomsbury Sigma in 2016.
Episode Description
Statistics is generally a field not known for its humor, at least to the broad public. Which is a shame because humor is a way to make complicated subjects – like statistics or big data – accessible to general audiences. The intersection of humor and stats is the focus of this episode of Stats and Stories with guest Timandra Harkness, coming to you from the annual meeting of the Royal Statistical Society with guest host Brian Tarran.
+Full Transcript
Rosemary Pennington: Statistics is generally a field not known for its humor, at least to the broad public, although I will say John Bailer has been an exception in my life.
John Bailer: That’s because you laugh at me.
Pennington: It’s a shame though because humor is a way to make complicated subjects like statistics or big data accessible to general audiences. The intersection of humor and stats is a focus of this episode of Stats and Stories coming to you from the annual meeting of the Royal Statistical Society. I’m Rosemary Pennington. Stats and Stories is a production of Miami University’s Departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me as panelists are John Bailer, chair of Miami Statistics Department, and Brian Tarran, editor of Significance Magazine. Our guest is writer, comedian, and presenter Timandra Harkness. Harkness writes and presents BBC Radio for documentaries including the series Future-Proofing and How to Disagree and Are You a Numbers Person for BBC World Service. I, frankly, am not. She formed the UK’s first comedy science double-act with scientist Dr. Helen Pilcher and has performed scientific and mathematical comedy from Australia to Pennsylvania with partners including stand-up mathematician Matt Parker and Socrates the Rat. Her latest solo show Take a Risk hit the 2019 Edenborough fringe with randomized audience participation and an electric shock machine. A fellow of the Royal Statistical Society, she is a founding member of their special interest group on data ethics. Timandra’s book Big Data Does Size Matter? was published by Bloomsbury Sigma in 2016. Timandra thank you so much for being here today.
Harkness: It's a pleasure.
Pennington: I am just going to ask what I think is the obvious question: how does a comedian come to take on technology and math and science as the focus of her work?
Harkness: That’s a relief because I thought you were going to ask about the electric shock therapy.
[Laughter]
Pennington: I do want to know about that though.
John Bailer: My question, Timandra, I’m going to ask that next.
Harkness: I may be the only fellow of the Royal Statistical Society who likes firing electric shock machines. Well, interestingly, there are a lot of people now who use comedy as a way of getting across their particular subject, whether it’s science or math or something else, and I came in the other way, in the other direction. I was already a professional stand-up comedian, and so was Helen Pilcher, although she had a day job at the time, and we met at a meeting at the Royal Society on stem cells, because I was trying to write something about it. We bumped into each other in the coffee room, and I was really surprised, because I’d only ever seen her in rooms above pubs making jokes about beer bellies, and there she was looking smart with a badge on. So I sidled over and went, what are you doing here? And she said, I’m a stem cell scientist, that’s my day job, what are you doing here? And so we went, oh, we should do some comedy about science. Because we were both getting really bored with the things that comedy was always about. It was always about the differences between men and women, about drugs, about sex, about alcohol, and we just wanted to do some comedy about something more interesting. Although, ironically, when I look back at the things I’ve done comedy about, I have now actually done comedy about the differences between men and women, and sex, and drugs, but from a scientific and mathematical point of view. So, really, for me, and then I went on to do a degree in mathematics and statistics, but for me, it was comedy that reignited my curiosity about science and mathematics and statistics. So it’s more the other way around for me. It’s less why do you use comedy to talk about mathematics, and more how did you end up in mathematics having started out in comedy?
Bailer: You know I think there’s an element of you have to change arts before you change heads and that the comedy is opening up to message. It’s engaging and getting excitement and interest. And if you can get the interest, then the messaging can also be connected to it.
Harkness: Yes, all of that is true, and I think a lot of people do use it for that, but absolutely, genuinely, for me it was the other way around. I like doing comedy because I like making people think. That’s absolutely true; I always have. I’ve always been more interested in the kind of comedy where people laugh and then go, oh, that’s interesting, why did I laugh at that? Because it opens people’s minds up a bit. It catches them unaware, and also it is enjoyable, which is always a plus. And then it was my curiosity about science and mathematics that I came to in that direction, and then I thought, well, if I find it interesting, why wouldn’t anybody else find it interesting? And it does make a change from talking about the same old, same old thing. Because this was back in 2000 or 2001. So now there are a lot of good people doing good comedy about science, statistics, and mathematics; at the time we genuinely were the first two people in the UK, and I think there were a couple of guys in Australia doing it.
[Laughter]
Harkness: The electric shock machine, I first got it when I did a show at Brave Science Agenda, costarring Socrates the Rat. His job was to be male and a rat, and one of the differences that psychologists find, on average, between men and women is in taking risks. And I wanted something that would let me demonstrate this very graphically to the audience, preferably with audience participation. So I asked a psychologist friend, is there a civilian version of the equipment that you use that I could buy, to do, you know, harmless pain on an audience member? And he said, this is great timing, I’m about to relocate to Singapore. I have an electric shock machine; I don’t want to take it with me; it’s yours. And so he gave me this laboratory machine with all the safety instructions, it’s got a seven-page risk assessment and everything, and I would invite people in the audience, in the show about sex differences, to get up and basically gamble. Take a 50-50 bet, and if they lost the bet, I’d get to give them an electric shock, and if they won the bet, they’d get to give me an electric shock, and I gave them some money. And I have to say, whoever was flipping the coins on that, who was another audience member, let’s just say I looked back at the end of the tour and I was well down on money and electric shocks, so I don’t think there was fair action going on there. And then when I went to do a show about risk, this was my obvious thing, and again, basically, I used it for gambling, to let people in the audience think about their own decision-making around risk. And your previous guest, Tim Harford, has probably looked at this: it’s never a purely mathematical calculation; there are always psychological elements. It’s never just about going, on average I will win if I do this. Because you might say, well, I’m prepared to take quite a large risk of a very small electric shock, but I’m not prepared to take even a very small risk of a very large electric shock, because there’s a kind of maximum amount of pain that I’m prepared to risk. So I would always get people randomly selected from the audience and offer them a chance to do this gamble about whether to get an electric shock or not, as a way of saying that whenever we make these decisions, it’s not just about whether you can do arithmetic in your head. It’s always in the context of much wider decisions that we make.
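[A small aside for readers: Harkness's point that a maximum tolerable loss can override expected value is easy to put in numbers. The sketch below is purely illustrative; the pain scale and the probabilities are invented, not from her show.]

# Two bets with the same expected "pain" but very different worst cases.
# Invented numbers, for illustration only.

def expected_pain(prob, shock):
    # Expected value: probability of losing times size of the shock.
    return prob * shock

bets = {
    "A: big chance, small shock": (0.5, 2),
    "B: tiny chance, huge shock": (0.01, 100),
}

for name, (prob, shock) in bets.items():
    print(name, "expected pain:", expected_pain(prob, shock), "worst case:", shock)

# Both bets have an expected pain of 1.0, yet most people accept A and
# refuse B, because B's worst case exceeds the pain they are willing to risk.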
Pennington: I love that you phrased it harmless pain. I would not do that because any pain to me feels harmful.
Harkness: Well, there is actually, I had to get people to sign a consent form, because for people with a pacemaker, for example, it’s very dangerous, and for certain other medical conditions too. Also, it really ups the ante on stage when some audience member volunteer is having to read a consent form. It ups the fear level, which makes the whole thing more dramatic. And it also gives them a little point where they could elegantly back out; you know, if they’re having second thoughts, they can say, oh well, no, I’ve got a medical condition, so I can’t do this.
Bailer: You know, in reading through your Big Data book, I really liked the historical tour of thinking about data and society and statistics, and also about computing and how that emerged. And then you have this organizing statement of data, where you touch on these different components. Would you summarize, for folks who haven’t read it, how you’ve organized your thinking about big data?
Harkness: Oh, my backronym.
Bailer: Your backronym?
Harkness: Backronym, yes. I thought everybody knew this word, backronym, which is where you want an acronym, so you want a word that spells out your ideas, but then you reverse-engineer it to get the word that you wanted. So I felt I would do this so that I could get data, D-A-T-A. Now, obviously, big data is partly big; there is a lot of it, that is part of it. But I thought, it’s not just that there is more of it than there used to be, it’s also these other things, and I did manage to get the big D-A-T-A. So the D is diverse, or dimensions if you want to get a bit technical: the idea that you can have different types of data, and when you combine them you get a multidimensional picture, whether it’s of an individual or of something that you’re studying. The example came from a brain scientist called Professor Paul Matthews, who said, if you have lots of brain scans, for example, that’s just large data. Big data is when you combine the brain scans, the patient records, the postcodes where the patients have lived, the weather records for those postcodes, and then you put them all together and ask a different question from the one the data was collected for. In this case, he wanted to know how many hours of sunshine the patients had had, and did that correlate with the progression of their illness? So that’s the D: diverse. A is automatic, because so many things we do now just automatically generate data, so it’s almost collected by default. T is for time, because things are pretty much collected in real time, so it lends itself really well to making a time series, and you can project that into the future and see how things are going to change. And then the other A is for AI, artificial intelligence, because the processes used to analyze the data very much are what you might call artificial intelligence. I don’t want to make claims that it’s thinking, but there’s an element of processing you haven’t specified step by step: instead of saying to the computer, follow every step in this program, you say, I want you to separate these into sick and healthy, I’m going to give you one dataset that’s presorted, and I’m going to let you work out the rules that you need to follow to sort the rest of the data. So that’s diverse, automatic, time, and AI.
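[What Harkness describes at the end there, handing the computer a presorted dataset and letting it work out the sorting rules itself, is supervised classification. Below is a minimal sketch; the measurements, the labels, and the choice of scikit-learn's decision tree are illustrative assumptions, not anything from the episode.]

# Presorted (labeled) examples: two made-up measurements per patient.
from sklearn.tree import DecisionTreeClassifier

X_train = [[37.0, 55], [39.5, 90], [36.8, 60], [40.1, 95], [36.9, 58], [39.0, 88]]
y_train = ["healthy", "sick", "healthy", "sick", "healthy", "sick"]

# The model infers its own sorting rules from the labeled examples,
# rather than following rules we wrote out step by step.
model = DecisionTreeClassifier().fit(X_train, y_train)

# It then sorts the rest of the data, which nobody has labeled.
print(model.predict([[38.9, 85], [36.7, 57]]))  # expected: ['sick' 'healthy']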
Tarran: On the subject of algorithms and AI, I guess you could say a lot. I think I saw, was it a tweet, where you said you’d become an overnight expert in algorithms?
Harkness: Well I think it’s really quite well really, this is a classic case where
Harkness: We take the grades that the teachers have given often based on previous exams that kid has taken at least as your starting point, but they didn’t do that. they went what’s really important to us is that the overall pattern of the grades will closely resemble the previous three years. So what we’re going to do is for each school we will take the results for the previous three years and we’ll get an average of those and we’ll say okay well those are the grades that your school is going to get this year; this pattern. You know, so many As, so many Bs, and so on. And then oh okay how are we going to decide which kid gets which one well we’ll get the teachers to rank them in order from best to worst and then we’ve already got this select box of grades that we’ve decided your school is getting and we’ll give them out in that order from top to bottom, and that was what they did. The only role that the kids' exam results that the actual kids getting this result and to the algorithm, the only previous exam results played was as a whole class they would say if they had done spectacularly better or worse than previous year then we’ll adjust it upwards or downwards, or if they were in a very small subject group at which point say we’ve got ten or eight kids in a class, yeah okay it’s probably an old fad and just allocate for previous years. So, in that case, we will take them into account. But I just thought it was a, an astonishing decision, and b, also horrible typical in fact of the way a lot of algorithms work that make decisions about us, that then really that minimally based on anything we do or are or have done very largely based on what the population of people who are deemed to be like us have done in the past.
Harkness: Well, yes and no. I mean, I think that we are a bit more aware of these things, but yes, it is a bit astonishing to see that the whole juggernaut, if you like, rumbles on the same way. In fact, that’s the thing that I’m interested in now, to look at it and say, rather than just being surprised, let’s ask why.

Human beings are the ones who built this stuff; human beings decide what data to collect; this is all human beings doing this. The question really is: what is it about us? What is it about human beings, here and now, at this point in history, that makes us so very keen to hand over decisions to algorithms, no matter how many times we see how flawed, how biased, and how incomplete they can be? Where does this urge to hand over human judgment and decision-making to an algorithm come from?
Pennington: You’re listening to Stats and Stories, recording at the annual meeting of the Royal Statistical Society. Our guest is writer, comedian, and presenter Timandra Harkness. You’ve just talked about how we need to step back a little bit from our trust in algorithms. I guess the question I wanted to pose is why you felt compelled to write about big data in the first place. There are a lot of people writing about and publishing on big data. What was it that made you feel like you had to write that book?
Harkness: It actually started a few years before, with me getting into statistics, and I doubt that Brian remembers this, but the first thing I ever wrote for Significance magazine was an article called Seduced by Stats, question mark, which was probably around the time that my partner and I were doing the show The Maths of Death at the Fringe. And it was because I was confused. You know, I really like math; that’s why I went back and studied it again. I’ve always liked it. But I’ve always realized that this is a kind of minority sport, really; most people don’t like mathematics, and they’d be very happy to never have to look at it again. And yet those same people were getting really excited about statistics. They were getting really excited about infographic displays in newspapers, what your previous guest Tim Harford was talking about, and I thought, well, this is odd, because I like statistics, I’m quite excited about what you can do with them, but I know for a fact that all of you people really hate mathematics, so why are you getting so excited about some graphs? It’s as if you think it’s some magical oracle of objective truth, that in a difficult time when nobody really knows what’s going on, you can at least look at the numbers, and the numbers will appear in shining light and tell you what to do. And then, as things evolved, I started to see people talk about big data in the same way, and I was thinking, well, again, on the mathematical side, this is really exciting. Can you really do all this stuff just by collecting loads of data and applying mathematical processes to it? Because that’s really exciting; if you can do all the things you’re claiming, this could really transform things. But on the other hand, is this those same people who got really excited about infographics in newspapers now getting really excited about big data because it’s big and shiny and they don’t understand it, so maybe it’s really clever? And in fact, I talked to an American scholar called Christine Rosen, who was looking at it, and I said to her, you know, have you got a definition of big data, because this was when I was making a programme about it, and she said, yes, it’s an oracle. People look at it; they think it’s going to give them all the answers. So it was that, really. Maybe part of it was my mathematical interest, me going, look, isn’t this clever, you get all this data and you do this to it and it tells you this thing that you never knew before, and I do still find that really exciting. But then the other bit of me was me as a citizen, if you like, going, why are we so convinced that all these quite difficult, messy, complicated human problems can be solved if you just collect enough data and put it in a big enough computer?
Pennington: I'm going to pull in a question from the audience, and this is a reminder: if you have questions for Timandra, we will try to get to them throughout the rest of the show and certainly at the end. Someone just posted the question, whose decisions do you think are more biased, algorithms' or people's? And it felt like a nice question to scoop in here.
Harkness: That's a brilliant question. That is a big question. I mean, I think it partly underlies the appeal of algorithms: you think, well, I know that I'm biased, I'm full of all these shortcuts and loyalties and emotions, so maybe an algorithm could step back from that and be more objective. Well, I think there are two things at play. One of them is that algorithms are made and designed by people. They are as flawed and imperfect as the people who build them. The advantage they have, if you like, is that by building an algorithm you have to build assumptions into it, and that does help you be more aware of what the assumptions are that you're building in. But you can't have a fair algorithm in an unfair world. For example, to go back to the A-level results algorithm: the truth is that in a normal year, when the kids took exams, a lot of them would find that their exam results were lower than the teachers had predicted. So this does tell us something about the unfairness of the school system, probably. But in a normal year the kids sit the exams themselves, so at least they get to affect their own outcome, and this year they didn't. So you can't actually have an algorithm that is going to dish out a completely fair result, because the world is not fair. What you can do is say, okay, we need to be explicit about what kind of fairness we're trying to achieve. Are we trying to achieve everybody going in on an equal basis, in which case we know that what comes out will be unfair because it's not a fair world? Or are we going to say we want things to come out looking fair by some other measure, in which case maybe we have to adjust and not treat people fairly on the way in? I mean, Ofqual were big on the defensive. They said, well, we have tested our algorithm by all these measures: are poor children going to be disadvantaged? No. Are boys or girls going to be disadvantaged? No. Are all these different ethnic groups going to be disadvantaged? No. All these subpopulations are going to come out in roughly the same state as they would if they had taken the exams. So on a population level, they said, we've been totally fair; look, every subpopulation has been treated fairly. But that's not the same as every individual being treated fairly. So I think it can be good that building an algorithm makes you decide what fairness looks like, what kind of fairness you want, and, by the way, it reveals to us what other unfairnesses there are in the world. But there's also this slightly underlying assumption that people are basically all biased and prejudiced and awful, and I think you have to remember the difference between algorithms and people: a person can reflect on themselves and go, oh, I just caught myself assuming all boys were like this, and even if the data says that on average more boys are like this, I shouldn't assume that of every boy I meet, and therefore I'm going to change my attitude in future and deliberately try not to think that. Whereas an algorithm has no moral sense. It isn't going to go, this is going to be wrong. An algorithm is going to do exactly what you programmed it to do.
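To make the population-versus-individual distinction concrete, here is a minimal sketch in Python with invented numbers; it is not Ofqual's actual procedure, just an illustration that an adjustment can leave every group's average untouched while still moving individual results:

```python
# Toy illustration: "fair on average" is not the same as fair per person.
# All figures are invented; this is not the real A-level algorithm.
teacher_grades = {"school_A": [70, 80, 90], "school_B": [55, 65, 75]}

def moderate(grades):
    """Compress marks toward the school mean, preserving that mean."""
    mean = sum(grades) / len(grades)
    return [mean + 0.5 * (g - mean) for g in grades]

for school, grades in teacher_grades.items():
    adjusted = moderate(grades)
    print(school,
          "mean before:", sum(grades) / len(grades),
          "mean after:", sum(adjusted) / len(adjusted),
          "individual shifts:", [a - g for g, a in zip(grades, adjusted)])
# Each school's mean is unchanged, so a population-level audit looks
# "fair", yet individual pupils gained or lost five marks each.
```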
Bailer: So, one thing about the algorithms I wonder about: I love the way you phrased this issue of turning over human judgments to algorithms. But I also wonder if it's how people sell algorithms and the results of algorithms, that perhaps they sell them as if they have a level of precision that they really don't have. That they oversell the precision and undersell the uncertainty and variability that are baked into this.
Harkness: Yeah, I think that's a point very well made. There's a very basic thing: especially if you're a corporate entity designing an algorithm, you go, hey, our magic algorithm will help you do this, and you've just given me two decimal places, so basically you're making this up and I'm not going to take you seriously at all. And this is a problem: sometimes you want to question how they got those outcomes, and especially if they're private companies, they say, we can't tell you, it's a secret, it's our commercial secret. But the other thing is that uncertainty question, which I think is a much bigger question. I think we look to things that have numbers attached for certainty. That's one of the great deep appeals at the moment of statistics and data and numbers: the world is very uncertain and unpredictable, it feels risky, even though it's actually safer than any other period of history, still, even in spite of the pandemic. But because it's hard to make sense of, because it's a world that's changing socially and politically as well as everything else, I think people feel very insecure. They feel fearful about the future, and they hope that numbers and data will give them something very definite. So you may know that the future is going to be awful, but at least you'll know it's going to be awful with mathematical precision. Whereas of course all statisticians know that approximately 95% of your job, give or take two or three percent either side, is actually just quantifying uncertainty: saying, well, we think it's probably within this range, but the more you narrow that range down, the less certain you can be about it. You could easily look at the whole of Britain and go, well, we're certain London is in there somewhere, but the smaller the area you pull out, the less sure you can be that it's London. So I think being more upfront about uncertainty would really help in a lot of cases. We all need to learn to accept risk, not just in the sense of going out on your bicycle and getting into a terrible accident like poor Tim did, but risk in the sense that you don't know what the future is going to be, and sometimes you don't even know things about the present. We don't really know how many people have coronavirus. We can make estimates by various methods, we can have various figures and go, okay, these differ, but they give us a ballpark figure. But we don't know, and we probably never will. What we have to do is become better at making decisions while accepting that we don't know things for certain, and all we can really do is get an idea of roughly what something is and how uncertain that is.
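The trade-off Harkness describes, that a narrower range buys you less certainty, is the usual interval-width arithmetic. A minimal sketch in Python, assuming a simple normal approximation for a prevalence estimate and entirely invented survey numbers:

```python
import math

# Invented survey: 120 positives out of 2,400 people sampled.
positives, n = 120, 2400
p_hat = positives / n                            # point estimate: 5%
se = math.sqrt(p_hat * (1 - p_hat) / n)          # standard error

# Higher confidence forces a wider interval, and vice versa.
for conf, z in [(0.80, 1.282), (0.95, 1.960), (0.99, 2.576)]:
    lo, hi = p_hat - z * se, p_hat + z * se
    print(f"{conf:.0%} interval: {lo:.2%} to {hi:.2%} (width {hi - lo:.2%})")
# To quote a narrower range than the 95% interval gives you, you must
# drop to a lower confidence level: Britain versus London, as above.
```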
Pennington: So, we have Tim Harford still with us and I believe he has a question for you.
Tim Harford: So, Timandra, you've raised a couple of times the puzzle that we put so much trust in algorithms, and I wanted to ask you about that a little bit more. The A-level predictions thing is a really stark example. This is a situation where, if you put it plainly, the government said: we are going to cancel your exams, it's not safe, and then a computer will give you the grade that you would have got if you had sat the exams. Which, of course, when you put it like that, can't possibly be true. How is it that they managed to fool themselves into thinking it might be true? How did the rest of us nod and accept it, like, oh yeah, I suppose that'll do? And is there anything we can do to have a more realistic view of what algorithms can and can't achieve? Because they've got their place, of course.
Harkness: Well, exactly. I slightly hope that the fact we had teenagers out on the streets with signs saying things we probably can't say on the podcast, but which were very rude about what the algorithm does, will actually sink in, and people will go, yeah, a lot of this is just hyperbole. How could an algorithm ever possibly know that? I do think it's less a sign of how powerful data is and more a sign of how much we lack in the human fields of politics, economics, even philosophy. We do have a government in the UK at the moment that is quite technocratic. Dominic Cummings, one of the chief advisors, is really, really keen on data, prediction, algorithms, and getting more people into government who understand data, which at one level would be great. It would be great if more of them understood stats and data. But there is a slight air of, well, you just get enough clever people and enough data and that will give us all the answers, and I rather want to say: you are the government, you should have ideas. You should have policies; you should have principles. You should have a vision of where you want to take this country, and that's what's going to get us through. Data and algorithms, however good they are, can only be a means to help us get there. They can possibly give us a better idea of where we are and a better idea of the outcomes if we do different things, but they don't get to tell us what we should do. I think it's the lack of direction, the lack of vision, and the lack of self-confidence that leads us to put far too much confidence in algorithms.
Pennington: That feels tied into a question we got from an audience member, who asks whether we're spending enough time scrutinizing the questions we're trying to have big data answer for us.
Harkness: No, I don't think we are, and that's a really good question, exactly for that reason. I think if you formulate the question right, then finding the answer is often the easier part, and if you ask a lot of statisticians, their job is to go in early and help people formulate the right question in the right way. Even though I'm more a writer than a statistician, I always say that if I can ask the right questions I consider it a job well done, rather than giving the answers; that's for somebody else.
Pennington: So, we have two more comments that came through. One is just from someone who said they attended your show last year, very entertaining and instructive, but they did not volunteer for the shockwave. And related to that, someone is asking if you would mind telling your favorite statistical joke.
Harkness: Well, they might have heard it before, because it is my favorite and I do tell it all the time. Why should you never tell a statistician they're average? Because it's mean.
Pennington: That sounds like one that John Bailer might have actually said.
Bailer: I have to tell you, Timandra, my family thought it was an impossibility that there could be someone who has humor and statistics as part of their life. But I have a worse one. It may be a bit UK specific; I don't know if this will make sense to an American audience. What is a statistician's favorite sandwich filling? Correlation chicken.
Harkness: See, I don't know if you have coronation chicken over there. American listeners are going, huh?
Pennington: I am familiar with it, I do. Before we go: I was stalking you this morning as I did my preparations, and I saw that you tweeted that you have a new piece out in Significance, and I figured Brian would like me to ask about it. It's about John Graunt; I don't know if I'm pronouncing that right.

Harkness: Yeah, I'm a bit obsessed with this superhero of stats. Why? Well, because he was born 400 years ago this year, as I discovered writing this piece for Significance. He makes a tiny appearance in my book, because I try to get the statistical ideas across in the book by telling the story of the person who first thought of them, because then they make more sense. And he lived through the English Civil War, fought in the English Civil War on Parliament's side, lived through waves of plague because he was in London. He was a founder member of the Royal Society in spite of being just a humble haberdasher. But he wrote this one book about the bills of mortality, which were the death records of what people had died of, and in this book he invented all these concepts, which he needed to try to get information out of the data, basically raw data covering fifty or sixty years of mortality. He went through it and said, well, you can see these patterns; if you do this, you can see that pattern. He came up, for example, with the idea of excess deaths. He said, well, we'd call this year a plague year because there are this many plague deaths listed. But hang on: if we look at deaths from other causes in the years before and after, they were about seven or eight thousand, and in this year it's 18,000. So where did these 10,000 other deaths come from? There must have been more plague deaths than were written down as plague. So many ideas; he didn't have the language for them, but he basically invented a lot of statistical ideas, and yet there's not a statue, there's not even a little plaque to say where he lived.
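Graunt's excess-deaths argument is simple arithmetic: compare a crisis year's total against a baseline built from surrounding years. A minimal sketch in Python using the rounded figures quoted above (the baseline values are invented stand-ins for the neighbouring years):

```python
# Deaths from all causes in surrounding non-plague years (invented values
# in the 7,000-8,000 range Harkness quotes).
baseline_years = [7200, 7800, 8000, 7500, 7600]
baseline = sum(baseline_years) / len(baseline_years)

plague_year_total = 18000    # all recorded deaths in the plague year

excess = plague_year_total - baseline
print(f"baseline: {baseline:.0f} deaths/year, excess: {excess:.0f}")
# The ~10,000 excess deaths exceed what the bills of mortality listed as
# "plague", so Graunt inferred the plague counts were an undercount.
```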
Bailer: There should be.
Harkness: There should be. I'm going to start campaigning; there are a few fans, actually. I might just start a fan club. He's got no statue, John Graunt. And then he lost everything in the Great Fire of London, and then he was persecuted because he had converted to Roman Catholicism, which at the time was very unpopular, and he basically died in poverty aged only 53. His life is a roller coaster, and he invented all these statistical ideas. There should be a Hollywood movie about it. If there are any Hollywood producers listening, write to me.
Bailer: That’s our biggest listener audience segment, Timandra, that’s clearly who we’re appealing to in this series.
Harkness: Absolutely. George Clooney could totally play him.
Pennington: Just as we're wrapping up, I'm going to launch the question John normally asks: what advice would you give to statisticians who want to, maybe not shock people in an audience, but communicate with a broad public? What advice would you give them as they think about how to present their research or connect with audiences outside the statistical community?
Harkness: Basically, you've got to start where those people are, and I think this is true whether you're trying to do comedy or radio or write books or whatever. Start with where those people are, and listen to them more than you talk to them. Think about what they're concerned with. Have a look at their newspapers to see what the stories are and what the adverts are for. Those are the things those people are interested in; start from there and go to where they are. Look for things that will arouse their emotions, and that takes us right back to where Tim Harford started: it's the feelings that will grab them and make them care. If you can't make them feel something about what you want to talk about, then why would they give you any attention at all?
Pennington: Oh, that's great. Thank you so much for being here today. That's all the time we have for this episode, Timandra.
Harkness: It’s been an absolute pleasure.
Pennington: We’d also like to thank the Royal Statistical Society for allowing us to record two programs as part of their annual meeting. Stats and Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net and be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.
How to Understand the World Better With Statistics | Stats + Stories Episode 156 /
Tim is an economist, journalist and broadcaster. He is author of "How To Make the World Add Up", "Messy", and the million-selling "The Undercover Economist". Tim is a senior columnist at the Financial Times, and the presenter of Radio 4's "More or Less", the iTunes-topping series "Fifty Things That Made the Modern Economy", and the new podcast "Cautionary Tales". Tim has spoken at TED, PopTech and the Sydney Opera House. He is an associate member of Nuffield College, Oxford and an honorary fellow of the Royal Statistical Society. Tim was made an OBE for services to improving economic understanding in the New Year honors of 2019.
The State of Human Rights in the Pandemic | Stats + Stories Episode 151 /
Megan Price is the Executive Director of the Human Rights Data Analysis Group, Price designs strategies and methods for statistical analysis of human rights data for projects in a variety of locations including Guatemala, Colombia, and Syria. Her work in Guatemala includes serving as the lead statistician on a project in which she analyzed documents from the National Police Archive; she has also contributed analyses submitted as evidence in two court cases in Guatemala. Her work in Syria includes serving as the lead statistician and author on three reports, commissioned by the Office of the United Nations High Commissioner of Human Rights (OHCHR), on documented deaths in that country. @StatMegan
Maria Gargiulo is a statistician at the Human Rights Data Analysis Group. She has conducted field research on intimate partner violence in Nicaragua and was a Civic Digital Fellow at the United States Census Bureau. She holds a B.S. in statistics and data science and Spanish literature from Yale University. She is also an avid tea drinker. You can find her on Twitter @thegargiulian.
Episode Description
Almost every day we seem to get new data about the COVID crisis. Whether it’s infection rates, death rates, testing rates, false-negative rates, there’s a lot of information to cull through. Making sense of COVID data is the focus of this episode of Stats and Stories with Megan Price and Maria Gargiulo.
+Timestamps
2:55 What’s the reaction been?
11:10 How important is the information in supporting these decisions?
14:30 What stories are we missing?
18:14 Schools and Covid.
23:30 How to Make Sense of all of the COVID data.
+Full Transcript
Rosemary Pennington: Almost every day we seem to get new data about the COVID crisis. Whether it’s infection rates, death rates, testing rates, false-negative rates, there’s a lot of information to cull through. Making sense of COVID data is the focus of this episode of Stats and Stories where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a production of Miami University’s Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me are regular panelists John Bailer, Chair of Miami’s Statistics Department and Richard Campbell, former Chair of Media, Journalism and Film. Our guests today are Maria Gargiulo and Megan Price of the Human Rights Data Analysis Group, or HRDAG. Price is the Executive Director where she’s worked on projects related to human rights issues in Guatemala, Colombia, and Syria. Gargiulo is a statistician with HRDAG and was also a data science fellow at the US Census Bureau. They’re here today to talk about some of the group’s work on the COVID crisis. Maria and Megan, thank you so much for being here.
Megan Price: Thank you for having us.
Maria Gargiulo: Yeah, thank you.
Pennington: Megan, I’m going to start with a question for you. So, HRDAG describes itself as quote -a non-profit, non-partisan organization that applies rigorous science to the analysis of human rights violations around the world- end quote. You’ve been publishing a bit about COVID including some pieces in Significance Magazine, how do you situate the work on COVID within the human rights framework that your group, you know, is sitting in?
Price: Yeah, that’s a great question, thank you. Well, everything that we do stems from the Universal Declaration of Human Rights. That’s the starting point for all of our thinking about our work and we’re also just humans. And so, when this crisis started, of course, understanding it and trying to just get some handle on how to even go about making decisions about how to live our lives was at the forefront of all of our minds. And through our work, we’ve had so much experience as what we think of as science communicators, thinking about how to explain really complicated, emotionally-fraught ideas to folks who may not have much or any grounding in statistics or data analysis or science work. And so, we really felt like that was not only a role that we could step into but also something that could help us as a team to focus on something that felt urgent and useful.
John Bailer: So, what have been some of the reactions you've had to these columns? I mean, you've been writing a number of these explanatory pieces to try to convey and communicate some of these issues that are emerging with the pandemic. Have you gotten any feedback?
Price: We have, and I have to say this is a little bit biased because it was one of my friends, but my favorite reaction so far has been to a column we wrote in a literary magazine called Granta, which is perhaps not a common outlet for statisticians, about essentially what role stats plays in interpreting screening tests and how you know what your personal screening test means. And one of my particularly math-phobic friends reached out and said, I actually understood that, thank you. And that is just the most gratifying feedback we can get.
Richard Campbell: So, can you talk a little bit about the undercounting of COVID infections and what some of the obstacles are in getting good data in your work?
Price: Sure. I'll start with your second question, which is getting good data in our non-COVID work. Our non-COVID work is focused on human rights violations, as our name implies, and specifically on types of violence. And there are a whole variety of reasons why that might not be fully documented. Some of them are pretty benign: the violence wasn't witnessed, or the individuals who are doing the best they can to document and describe that violence just didn't have the resources that week, didn't have enough people on the ground. And then other times it's pretty intentional; a lot of violence is hidden and very deliberately kept from the public eye. And I would say that a variety of those same things are happening in our attempts to understand COVID-related deaths. There are certainly a lot of incentives not to categorize something as a COVID death, or to choose different metrics in terms of positive rates of tests or numbers of tests or who gets tested, and those incentives are not always going to lead to the most complete and best data collection, unfortunately. But then again, there are also just lots of perfectly benign reasons. In New York at the peak of the outbreak there, everyone was just overwhelmed, and the idea of writing everything down certainly came far lower on the list of priorities than helping everyone you could help. And so that's where I think statisticians can come in and say, look, you don't have to write everything down; we can use the tools in our toolkit to fill in those gaps.
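Price doesn't name the toolkit here, but in HRDAG's published human rights work the standard gap-filling tool is multiple systems estimation; the two-list Lincoln-Petersen estimator below is its simplest form, a minimal sketch with invented counts rather than anything from an actual HRDAG analysis:

```python
# Two independent sources each document some of the same deaths.
list_a = 300     # deaths documented by source A
list_b = 200     # deaths documented by source B
overlap = 60     # deaths appearing on both lists

# Lincoln-Petersen: if capture by B is independent of capture by A, the
# overlap rate among A's records estimates B's overall coverage, which
# lets us scale up to the unseen total.
estimated_total = list_a * list_b / overlap
documented = list_a + list_b - overlap
print(f"documented: {documented}, estimated total: {estimated_total:.0f}")
print(f"estimated deaths on neither list: {estimated_total - documented:.0f}")
# 300 * 200 / 60 = 1,000 estimated deaths, of which 560 were never
# written down by either source.
```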
Bailer: So, you write in one of the essays your group published that science starts with theories and stories about how the world works. Now, this is a really hard story to tell; people may have last thought about theory as something they heard about with the scientific method when they were at school, and haven't really thought about it since. What are some of the challenges, and some of the potential solutions, when trying to communicate these more complex stories, whether they are SIR models or some of the nuance of fitting them, to an audience that may not think a lot about theories and background?
Price: Maria, can I put you on the spot? Do you want to take that one?
Gargiulo: Yeah, sure. So, when I think about theories personally, the thing I really like to figure out is how I would test whether a theory holds in a given situation. And I think in communicating science, giving people things to look for is really helpful. I think a lot about the piece you mentioned, which our Director of Research, Patrick Ball, wrote, where he provides a list of things you might look out for. For example, when we're testing a theory, a rigorous theory is really careful about the types of assumptions it makes. So, in order to come to the conclusion we made about the way the world works, what are the things we assumed? And once someone delineates those really clearly, it's a lot easier to say, oh, I think those assumptions are reasonable, I can hold on to the thread here, that makes sense; or, I don't think that's true. And if that's not true, you have a way to start thinking about, well, if that's not true, what other things might not hold? So, I guess trying to communicate the ideas that let people test the theory for themselves, even in an informal way, I think that's really important for things like this.
Campbell: So, this morning, speaking of stories, there's a story on the front page of the Dayton Daily News about how area residents could be part of a virus study. And through this podcast and talking to scientists and statisticians, I'm just confounded by the fact that we haven't done more random studies of COVID, both at the regional level and at the national level; Ohio is only now going to do a study of 1,200 randomly selected participants. What's the problem here? I mean, we've talked to statisticians who have said this should have been going on much earlier, and we'd have a much better idea of who's infected and who's not. I'd like both of you to talk about this.
Gargiulo: I can start. I think for me, and part of this is that I don't fully understand the resource constraints right now, but I think a lot of this is resource constraints. It's a lot easier to say, oh, we have these 20 people in the hospital right now, we can test them, we can talk to them, we can do these things, rather than thinking about what a representative sample looks like and finding that representative sample within the community. Do we actually want it to be fully representative in the usual sense? Do we want to oversample certain groups or undersample other groups? So, I just think it's harder. Convenience samples are nice because they're convenient. Random samples are hard because they need to be really carefully constructed, and under constrained resources it's not clear to me how feasible that is.
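The bias Gargiulo is pointing at is easy to see in a small simulation, a minimal sketch in Python with invented prevalence figures (hospitalized, symptomatic people stand in for the convenience sample):

```python
import random

random.seed(1)

# Invented population of 100,000 people with 2% true infection rate.
population = [1] * 2000 + [0] * 98000     # 1 = infected
random.shuffle(population)

# A random sample of 1,200, like the Ohio study mentioned above.
random_sample = random.sample(population, 1200)

# A convenience sample of hospital patients, who are mostly symptomatic
# and so mostly infected (invented 70% rate).
convenience_sample = [1] * 840 + [0] * 360

print("true prevalence:       ", sum(population) / len(population))
print("random-sample estimate:", sum(random_sample) / len(random_sample))
print("convenience estimate:  ", sum(convenience_sample) / len(convenience_sample))
# The random sample lands near 2%; the convenience sample says 70%,
# telling you about hospital patients rather than the population.
```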
Price: Yeah, I'm mostly going to second everything Maria just said. I mean, I think much like the prioritizing that happened around New York, do we just try to get everyone we can to the hospital, or do we keep perfect records? One of the things that I think is hardest about this moment in time is that everything needs massive resources, and figuring out how to allocate those, how to balance the really urgent today priorities while also recognizing that we need to make some decisions with a long-term vision that our future selves will be grateful for; I'm certainly grateful that it's not my job to make those kinds of decisions. I also come from a public health background where there are lots of situations in which you can't do a randomized controlled trial for ethical and logistical reasons, and I think there's a certain amount of that at play here too. Because of the way the United States is set up, something that public health has done for years and years is to identify the natural experiments that happen because different regions make different decisions and take different actions. So, personally, I think it's as important and as valuable to identify those comparisons that are more readily available. I mean, certainly, let's also do randomized studies and get those organized, but I think both of those things happening at once is the way to go.
Bailer: So, this part of the conversation makes me think a lot about the value of information. In some ways, what we're saying is that we're taking these samples of convenience, looking at individuals who are probably symptomatic and of gravest concern, but they're only telling us whether symptomatic people have the disease, as opposed to what's going on in the population. And so I think it's a hard question: you talk about decisions, and what's the value of the information you gain from knowing more about what's going on in the population rather than in some small symptomatic subset of it? I agree completely that resources have to be allocated in a way that has a triage component, to solve this problem in a sensible order, but how important is it to have information that's unbiased and meaningful for supporting these decisions?
Price: I mean, yes.
[Laughter]
Price: And you know, but again, I think that's where, as statisticians, we should always recognize when our data are incomplete and biased, but we also shouldn't just throw up our hands and say, well, then we can't use that data. We should recognize when a particular class of methods is appropriate to adjust for those things or account for them in some way. And coming back to natural experiments, we do have a couple of really, I hesitate to use the word interesting in this setting, but really interesting things that have happened, specifically on cruise ships, which are closed populations where they were able to collect data about every single person. Again, the population on a cruise ship isn't going to represent general populations anywhere, but it gives us a chance to say, okay, if we test every single person, what's the difference we're seeing between symptomatic and asymptomatic? And I know here in San Francisco they did a very similar thing at a micro level: they picked a four-block radius in one of the neighborhoods in San Francisco and said, we're just going to test everybody in this four-block radius. So, I think there are also opportunities to do that kind of hyper-localized thing to start to learn more information.
Campbell: What did that yield, that four-block study? That sounds interesting.
Price: Oh man, so this was a UCSF study, in partnership with another organization whose name I'm not going to be able to come up with, but what they found was the kind of racial disparity that we're now seeing at large, especially in the latest New York Times data. This four-block neighborhood was in the Mission. I can't remember the exact figure now, but I want to say maybe five percent of the Hispanic residents were positive. Not necessarily symptomatic, not necessarily [inaudible], but they gave everyone a diagnostic test and they were positive. They literally could not find a single Caucasian member of that neighborhood who tested positive.
Pennington: Wow. That's incredible. You're listening to Stats and Stories, and today we are talking with Maria Gargiulo and Megan Price of the Human Rights Data Analysis Group. We see a lot of coverage in news media of infection rates, of death rates, of hospitalizations. Given the work you have been doing at HRDAG on this issue, are there stories in the data that are under-reported, that you think people should be paying more attention to?
Price: That's a great question. Um. Hmm. To be honest, I can't really think of one, because the one that has been pressing on my mind the most has been the racial and ethnic disparities, and I think we are starting to see more attention being paid to that, so I'm grateful to see it coming to light. You know, I think as with anything else that's really scary, we're seeing a lot of stories about how bad things can be, but I'm also really hesitant to say, hey, we should tell more stories about people who have recovered and are fine, because we need people to take action to protect their communities. So no, actually, on balance I kind of think that most of the stories are out there. I don't know, Maria, what do you think?
Gargiulo: So, a story I would like to hear more about, in a non-U.S. context, is what the intersection of COVID and conflict, or COVID and displacement, is going to be. I'm thinking, for example, COVID arrives at a refugee camp: what happens? And that is terrifying, because the only conclusion I come to in my head is that the results are going to be grim, but what does that look like? Do people leave the camp? Do people stay in the camp and get sick? And how does humanitarian aid react to that? I have no idea. So that's not so relevant in the U.S. context, but as we consider COVID as a global pandemic, I think that's something I will be watching and really hoping goes better than I'm expecting it to go.
Pennington: Do you know of any work that's looking at infection rates along class lines? Because I would imagine that there could be particular breakdowns along class in some places. It's not something I can remember having seen; like you've pointed out, Megan, the reporting on race has just started emerging in a lot of the coverage, but I can't remember seeing much about class. I've seen the geographic breakdown, rural versus urban, but not this issue of whether poor communities are being impacted more or less, so I just wanted to ask that question.
Gargiulo: Yeah, not that I'm aware of. In fact, earlier in the pandemic, which is such a weird way to describe things, because as much as we're all in this time dilation it honestly hasn't been that long, I did see some comparisons of risk and infection rate by occupation, which is a bit of a proxy for that, and I haven't seen much follow-up on it. So, I think that's another thing that deserves more attention.
Bailer: And it seems like some of the exposures related to occupation may also play out in terms of living conditions. In the U.S., I guess something like 40% of the fatalities are in nursing homes, and as you look at other environments, it tends to be where people are living in group-housing environments; if you live in a high-density area as well as go out and work, it seems like that just kind of explodes it. So that runs a little counter to my earlier comment about who we're studying and how. In some ways, if we're looking at the people who are going to be most dramatically impacted, then you might want to be targeting what we're doing. I thought I saw that there was some recent work starting to come out related to the COVID impact in Central and South America, but I won't swear to it; I'd have to dig that up, so I'm not sure.
Price: Yeah, there has been, and I guess that's sort of the coda to my earlier comment: what stories are getting told is highly correlated with what media source you're consuming. Because we have a lot of projects and partners and collaborations in Central and South America, I have a lot of sources with information on that part of the world, and so yes, there is a fair amount of coverage coming out about how the infection rates are unfolding there. But I'm not seeing that in perhaps more conventional mainstream U.S. media.
Campbell: One of the things that relates to the class problem Rosemary brought up is that there's a lot of discussion now of whether we should send our kids back to school, and part of it is that wealthier school districts are in better shape to do this than poorer school districts. I guess my question is, whether you have children or not, what should we do? What's the best advice? Or is it all just a regional or local problem?
[Laughter]
Price: So, I have two kids. My daughters are 14 months and three and a half years old, and they're at daycare right now, and I am both really happy about that and really scared about that. Also, my husband is a public-school teacher, so schools and kids and what to do are all we think about right now. And you know, it's interesting: I think that operationally it has to be regional, because it's going to be so contingent on the situation on the ground. But on the other hand, a top-down national framework could set threshold guidelines: you can only even consider opening up the schools if your case count per capita is X; to safely have in-person learning you need Y dollars per student; we're going to provide these grants that will cover PPE and sanitation services. That kind of thing can sit in a bigger framing. But yeah, just to answer the question as a statistician: I have no idea.
Bailer: Well, you're telling us something, because you're both working from home now. So there's clearly a policy decision that you're making at a very local level about what you can do to prevent potential infection within your community, within your workforce. Maria, did you want to add to that?
Gargiulo: Really just to reiterate what Megan said. I've honestly spent little time thinking about this, but one, I have no idea statistically speaking, and two, the whole idea of either all schools opening or no schools opening, that's not it for me. I think these decisions really need to be made in the communities, because if something goes wrong, it's those same communities that are going to be affected. So, it's not just about whether the kids are in school, but if the kids are in school and something goes wrong, what are the potential repercussions? While it would be great to have some national guidelines to help school districts out, at the end of the day the national government isn't suffering if something bad happens; the community is suffering, so they need to make that decision.
Bailer: But those communities that need to make decisions, just getting back to what you've been producing and some of the things you've been writing about, need good data; they need good information. So now, Maria, I'm going to make you the superintendent of our local school district.
Gargiulo: Excellent.
Bailer: Congratulations and condolences, by the way, because you're the one who has to make a decision about how many kids can come back to school, how they should be spaced in their classrooms, all of these things. And by the way, you've got ten parents on the line waiting to talk to you about why they need their kids back in school. So how does science help? How can science and the study of the data associated with this pandemic be communicated to help these local decision-makers, whom you've appropriately mentioned, make the calls they need to make?
Gargiulo: Yeah, so I think that if I were the superintendent in charge of this, I'd want to talk to different people. I'd want to talk to these parents on the phone, and I'd want to talk to my teachers, the people on the ground, because part of this is not necessarily about the science; it's also, do you feel safe going to work? How do the kids feel about going to school? I'd love to talk to some of them and figure out, if you had the opportunity to go back to school, would you feel safe doing that? Or would you just sit in class being really anxious all the time, thinking, today is the day I'm going to get sick, or I'm going to get one of my classmates or my teacher sick? So, I'd want to start with conversations there, and then I'd start asking questions like: how much money do I actually have for personal protective equipment? Do I have backup plans for when and if things go wrong? What do those look like? What are the effects of starting a school year in person and then sending kids home? This is a different kind of data collection that isn't necessarily biological data about the virus. Do students have internet at home? These are other types of data collection we need to do. So, virus biology and everything we know about the spread of the epidemic help us make decisions like, okay, we can only have 15 students in the classroom, so maybe 50% full, we'll call that. But there are also these other types of data collection that need to happen that really have nothing to do with the spread of the virus and everything to do with the social dynamics of what's happening, from all kinds of angles. So I'd really want to get more data sources involved, even though it would complicate things.
Bailer: Well, you know, if this stat thing doesn’t work out I think there might be a superintendent gig in your future.
[Laughter]
Gargiulo: My retirement job.
Bailer: That's a well-thought-out response.
Pennington: I'm going to swoop in with a final question and steal it from John. People, I think, are overwhelmed with data related to this because it's coming out every day. Given the work you've been doing, what advice would you have for our listeners about how to wade through the data and make sense of it in their own lives?
Price: You want to go first Maria, or do you want me to?
Gargiulo: No, you go first.
Price: So, what I personally have been doing is keeping a really strict news and data consumption diet and staying focused hyper-locally. And it's hard, because my phone at any moment wants to tell me headlines about a spike in cases in the state of California, but the state of California is really big, and in my city there is an increase in cases but it's not quite as scary. So I work really hard to contextualize those big stories with the hyperlocal data, and I do think that most cities and counties I have looked at have been doing a really good job of being transparent and saying, look, this is what we know and this is how we know it. But that said, I am a statistician, and so I find data very comforting. If that is not the place you're coming from, then even that can feel really overwhelming, because these hyperlocal dashboards still contain a lot of information and they get updated every day. In that case, what I would really recommend is to identify one or two sources you absolutely trust who are filtering and contextualizing that information for you. That may be a news source, that may be a friend, that may be an expert on Twitter. It can be hard to vet those sources and to really know that you're getting reliable information that way, but if you personally don't have the comfort level to deal with the raw data coming at you, that would be my recommendation.
Gargiulo: Yeah, I'll just second everything Megan said. I, in particular, don't look at the data every day. Call me crazy, but I do read a lot of epidemiologists on Twitter, and it's really nice; I get really good synthesis, and they also sometimes write about the new studies that are coming out. I could sit down and read those studies, and I might understand bits and pieces of them, but for me it's nice to have these data contextualized: what are the advances we're making, where are we making progress, where are we really struggling right now? And getting that from someone who is an expert not only in that field, but also from the lots of folks on Twitter being really thoughtful about science communication, that's where I've been doing a lot of my learning, and I think that's helped me find the signal in the noise and get at what I want to understand, which is mainly: what does the general trajectory look like? And Megan is right about those hyperlocal news sources. That's really helpful to me, especially because I have not really been leaving my house, so the most relevant thing for me is that hyperlocal geography. But then also understanding the trajectory we're on in terms of scientific methods, so balancing research with what's actually happening is what I look for, and I just try to read experts on that.
Pennington: Well, Megan and Maria, thank you so much for being here today.
Megan and Maria: Thank you guys so much.
Pennington: That’s all the time we have for this episode of Stats and Stories. Stats and Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net and be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.
Risk Assessment Biases | Stats + Stories Episode 147 /
Tarak Shah is a data scientist at HRDAG, where he cleans and processes data and fits models in order to understand evidence of human rights abuses.
Prior to his position at HRDAG, he was the Assistant Director of Prospect Analysis at University of California, Berkeley, in the University Development and Alumni Relations, where he developed tools and analytics to support major gift fundraising.
Episode Description
Protestors have taken to streets across the U.S. this summer in order to fight back against what they see as an unjust criminal justice system, one that treats People of Color in prejudicial and violent ways. The concern over racial bias in policing has long been a concern of activists, but there's an increasing focus on other ways racial bias might influence decisions made in America's courts and police stations. The statistics related to race and the criminal justice system are the focus of this episode of Stats and Stories.
+Timestamps
What spurred this research? (1:33)
What is a risk assessment model? (2:12)
What are these tools supposed to do? (4:00)
What is fairness? (5:18)
What did you learn? (10:12)
What is the takeaway for the layperson? (15:20)
What’re some parallels to this work? (19:35)
How do you make this interesting? (22:07)
What’s the flow of your work, for reproducibility? (25:00)
+Full Transcript
Rosemary Pennington: Protesters have taken to streets across the U.S. this summer in order to fight back against what they see as an unjust criminal justice system, one that treats people of color in prejudicial and violent ways. The concern over racial bias in policing has long been something activists were thinking about, but there's an increasing focus on the other ways racial bias might influence decisions made in America's courts and police stations. The statistics related to race in the criminal justice system are the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. Joining me are regular panelists John Bailer, Chair of Miami's Statistics Department, and Richard Campbell, former Chair of Media, Journalism and Film. Our guest today is Tarak Shah. Shah is a data scientist at the Human Rights Data Analysis Group, or HRDAG, where he cleans, processes, and builds models from data in order to understand the evidence of human rights abuses. He was the co-author of a report released last fall that examined whether a particular risk assessment model reinforces racial inequalities in the criminal justice system. Tarak, thank you so much for being here today.
Tarak Shah: Thank you for having me.
Pennington: Could you explain what spurred this particular bit of research into this risk assessment model and what your report found?
Shah: Sure. So, there’s been interest in these pretrial risk assessment models in particular for a little while now. Partly because of how much public opposition has grown to the money bail system. And because of that these other risk assessment tools have been proposed as more objective or more neutral alternatives to decisions by judges which may be considered biased. And this kind of fits into that atmosphere.
John Bailer: So, can you talk- just to take a step back, just to help fill in the gaps for people that are new to this? I mean- this idea of what happens in a pretrial process. And then as- sort of to jump off of that, what does a risk assessment tool do in the context of this pretrial process?
Shah: Yeah. Excellent question. So, in general, when a person is arrested, a court must decide, and depending on what state you live in, they'll have either one or two days to make this decision, whether you can go home while you await the beginning of your trial, or whether they need to take some kind of action, whether that's detention or some kind of supervisory condition, in order to ensure that you will appear for your court date and/or that you will not be a danger to your community before the trial happens. So, as those decisions are being made, judges have historically relied on bail to make sure that people appear for their court dates. There's increasing recognition that that disproportionately harms people who are poor, and so there's been interest in alternatives, but the basic decision that either a judge or some kind of decision-making system is required to make has to do with usually one or both of those two elements I mentioned: whether a person is going to be a danger to their community, or whether they are going to flee the jurisdiction and escape accountability. So that's the decision in front of us, historically made by judges. More and more, judges are getting information from these risk assessment tools as additional input into that decision.
Pennington: So, are these tools like a technology that they’re relying on to help them understand what possible behaviors of particular defendants might be?
Shah: Yeah, exactly. These are tools that take in data about characteristics of the arrested person, things like their age and sex and other demographic information, as well as things like their arrest history or other encounters with the court system. And I mentioned those two high-level principles, danger to the community and risk of flight. In practice, those are fuzzy concepts that need to be made concrete when we're talking about actual measurements. The way danger to the community gets measured is that those who develop these risk assessment tools look at rearrests: was a person who was awaiting trial rearrested before their trial concluded? Sometimes that's narrowed down somewhat, so maybe in a given jurisdiction they'll only look at felony rearrests or violent rearrests, and there's all sorts of logic that goes into what counts in each of these categories. Similarly, flight risk is also a little bit fuzzy, and so what we can measure is failure to appear for a court date.
Richard Campbell: I was interested that you said fuzzy just there, and you talk about a definition of fairness, which is not something I hear statisticians talking about very much, having done a hundred and fifty shows. No offense, John. But what is the definition of fairness in risk assessment modeling?
Shah: Yeah, that is an excellent question, and in fact there are multiple definitions of fairness in this context, and I will give a couple of examples. Going back to what these models look like: they are very often logistic regressions or some other kind of predictive model that will classify people into yes, they will re-offend, or no, they will not, or something like that. And within that context, there are basic notions that anybody without a statistics background can pick up, things like demographic parity: do black individuals who appear before the court have similar decisions made about them as white individuals? In practice, we tend to rely on somewhat more complicated definitions of fairness. Let me give you an example. In addition to demographic parity, there are things like equal false-positive rates. The argument here is that the biggest cost of one of these risk assessment decisions is when somebody has to be incarcerated or otherwise supervised as a result of the score. A false positive here is somebody who is determined to be high-risk by the tool, but who in fact would not have gone on to re-offend or miss their court date if they had been left to go home. So that's one example. Another example is equal calibration across race groups. That means: suppose your logistic regression puts out the number 0.47, say a 47% likelihood that you're going to re-offend. Consider white people who get that score versus black people who get that score. Among everybody who got a 0.47 in the white group, did about 47% of them re-offend, and do we see similar numbers for the black group? So we have equal calibration and equal false-positive rates, and we also have a similar notion of equal false-negative rates: among people who did go on to re-offend or miss their court date, how often were they actually labeled high risk or low risk, and are those rates equal across race groups or other protected characteristics? One challenge is just what I mentioned, that there are multiple definitions and no official correct one. In addition, for the examples I happened to give, there is an important result in fairness, which is that under most realistic circumstances they are mutually incompatible: you can't meet all three of them at the same time. Which I think makes sense from a non-statistical perspective; people have different notions of what fairness means. But it does make it challenging to talk about fairness in these contexts. And I just want to add: everything I've been talking about is fairness within the system defined by the model, taking the measured outcomes and the data inputs as a given. A separate level of analysis here is whether or not there is bias in the data itself, whether these measures are fair measures of the thing we're interested in measuring. That goes back to what I said before: we have these notions of danger to the community or flight risk, but in practice, when we're building a regression model, we need measures, and what we have is re-arrest or failure to appear for court, and there are often problems with both of those measures.
A lot of people fail to appear not because they fled the jurisdiction but because they forgot, or the court date got changed, or they moved, so they never got the postcard that was sent to them. And similarly with re-arrest, the assumption, if you're using that data, is that an arrest is an unbiased measure of criminality or dangerousness, and there's a lot of evidence that that's not the case.
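Because these definitions stack up quickly in conversation, here is a minimal sketch, in Python with synthetic data, of the three criteria Shah names: demographic parity, equal false-positive/false-negative rates, and equal calibration. All names, cutoffs, and data below are illustrative, not the actual New York tool.

```python
import numpy as np

def fairness_metrics(y_true, y_hat, group):
    """Per-group rates for a binary risk label.

    y_true: 1 if the person actually re-offended / missed court, else 0.
    y_hat:  1 if the tool labeled the person high-risk, else 0.
    group:  group membership for each person (e.g., race).
    """
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_hat[m]
        out[g] = {
            # Demographic parity: how often this group is labeled high-risk.
            "high_risk_rate": yp.mean(),
            # False-positive rate: labeled high-risk among those who
            # did NOT go on to re-offend.
            "fpr": yp[yt == 0].mean(),
            # False-negative rate: labeled low-risk among those who
            # DID go on to re-offend.
            "fnr": (1 - yp)[yt == 1].mean(),
        }
    return out

def calibration_by_group(y_true, score, group, bins=10):
    """Equal calibration: within each score bin (e.g., people scored ~0.47),
    the observed re-offense rate should match across groups."""
    edges = np.linspace(0, 1, bins + 1)
    out = {}
    for g in np.unique(group):
        m = group == g
        idx = np.clip(np.digitize(score[m], edges) - 1, 0, bins - 1)
        out[g] = [y_true[m][idx == b].mean() if (idx == b).any() else None
                  for b in range(bins)]
    return out

# Tiny synthetic demonstration (illustrative only).
rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                  # two synthetic groups
score = rng.random(n)                          # model's predicted risk
y_true = (rng.random(n) < score).astype(int)   # outcome follows the score
y_hat = (score > 0.5).astype(int)              # high-risk label at a 0.5 cutoff
print(fairness_metrics(y_true, y_hat, group))
```

The impossibility result Shah refers to says that, outside of degenerate cases such as identical base rates across groups, no labeling rule can equalize calibration, false-positive rates, and false-negative rates across groups at the same time.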
Bailer: So, what did you learn? Let's get back to the punchline of the work that you've done. You've helped us frame what a risk assessment model is and how it's being used in the context of establishing or evaluating fairness.
Shah: So, in this particular research, we were looking at a tool used in New York City to determine eligibility for a supervised release program. This is an alternative to being detained while you await your trial. The idea was that individuals who get a low-risk score would become eligible for this supervised release program, whereas those who received a high score would not, and the alternative is that you are detained. We looked at some of the different fairness measures that I mentioned, such as false-positive rates and accuracy across race groups, and the particular model met some of those but not others. It had much higher false-positive rates for black and Hispanic people than it did for white people, and also, in terms of demographic parity, it was much more likely to give black individuals a higher risk score than white individuals.

But one deeper thing that came out of that was a couple of things we noticed about the data and the process used to build the model itself. In terms of the data, we had that question about whether felony re-arrest- the outcome variable that the developers were modeling in their regression- is a fair measure of dangerousness. And that was an important question for us because, when we looked into it a little bit, the training data for this model had all been collected during the height of New York's Stop and Frisk program. Sometime after that data was collected, New York courts themselves had determined that this was an unconstitutional program because it was disproportionately applied against black and Hispanic people; something like 87% of people who were stopped under Stop and Frisk were black or Hispanic. Unfortunately for us, the arrest data that we got- the training data for the model- did not contain information about whether each arrest was a result of Stop and Frisk or anything else. However, we were able to make some inference about how many of these arrest outcomes could have been affected by the Stop and Frisk program. We looked back at what the most common arrests resulting from a Stop and Frisk stop were, and they were either drug-related or weapons-related. So then we went back to our data and found that just under 40% of the arrests in our outcome data were either drug-related or weapons-possession-related. We can't say that all of those were Stop and Frisk, but we can guess that a good number of them were, because this was during that program.

We also found- so one of the things that we had to do when we were writing this report was to attempt to recreate the scoring model that's used. We read the paper, which goes through the logistic regression, the variables, and so forth, and we had the same training data as the original developers did, so we tried to replicate their model from the beginning. And we ran into some challenges. We used the same train and test split as the developers did, and we got the same coefficients for each of those different pieces of the data, and that all made sense. But then, when we looked at the final model that New York was using, the point scores that they gave for each characteristic did not match up with the coefficients that we found in the regression, and in fact appeared to be picked and chosen from the different splits of data.
So there were some coefficients that were found from fitting the model to the training data, some coefficients that came from the test data, and some that came from fitting the model to the entire data set. I don't think we would have known that if we had not tried to replicate the model to begin with. Once we saw that, we contacted the developers and tried to get more information about what was going on, and the best that we can find out is that there was a decision process by a committee that was like, oh, this looks good, this doesn't look good, and they kind of worked their way to the scores. And I point that out because that's a little bit separate from the types of fairness that we were talking about, but I think these tools often get packaged as an objective or neutral thing, and here we see an illustration that that packaging is really hiding a lot of political decisions that are going on.
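To make the replication exercise concrete, here is a rough sketch of the comparison Shah describes: fit a logistic regression on one split of the data and check whether published point scores are proportional to the fitted coefficients. The data, features, split, and point values below are hypothetical stand-ins; the actual tool's inputs are not spelled out in this conversation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix X and outcome y (felony re-arrest: 1/0);
# in the real analysis these came from the developers' training data.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 4)).astype(float)  # e.g., binary history flags
y = (rng.random(5000) < 0.2).astype(int)

# Use the same train/test split the original developers reported.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression().fit(X_train, y_train)

# Published point scores for each characteristic (hypothetical values).
published_points = np.array([2, 1, 3, 1])

# If the points were derived from this one fitted model, each point value
# should be roughly proportional to the corresponding coefficient.
ratios = published_points / model.coef_[0]
print("coefficients:        ", model.coef_[0].round(3))
print("points / coefficient:", ratios.round(2))
# Widely varying ratios would suggest the points were not taken from a
# single fitted model -- the discrepancy Shah describes finding.
```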
Pennington: You're listening to Stats and Stories, and today we are talking to Tarak Shah of the Human Rights Data Analysis Group. So, I'm going to go back to that sort of thing you were saying there at the end. Why should someone who is not a statistician, someone who is not engaged in activism around criminal justice- what is the takeaway for the layperson, as far as it relates to this particular report? Why should my mother or my best friend or my colleagues here care about what you found in this report?
Shah: Great- that's a good question. I would maybe think about that in a couple of ways. One thing, going back to what I was just saying: often data-informed tools are presented as a more objective alternative to some other procedure, and I think it's important for all of us, regardless of where we're working, to be somewhat critical of that, because often that's just a way of sneaking in whatever biases we already had through this kind of packaging. And I think that applies not just to incarceration decisions but to any kind of automated decision-making system. I also think- maybe a lot of us are concerned about policing and incarceration right now, so as somebody who is concerned about that, especially with pretrial incarceration- and these are people who are presumed to be innocent by the legal system- I think worrying about how those people are treated and how decisions about their liberty are made is an important thing on its own. In particular, I mentioned that risk assessment tools are often positioned as neutral or objective alternatives to judges' decisions, which are known to be biased- and there is a lot of evidence that they are- so just understand that we don't get to wash our hands of the bias by putting it through these systems. And maybe, hopefully, knowing that would lead people to think a little bit more- so I don't know if this is helpful or not, but there are two ways to look at this. One is: are the decisions fair in terms of equal treatment under the law for different race groups? Another level of wondering about this is whether there's a larger issue at play. Sometimes I feel like the effect of these risk assessment tools is to push through an idea that there's a fairer way to incarcerate people who are presumed to be innocent. So I hope that seeing that the biases we see everywhere else in our society also sneak their way into the models, and into the data that we use to make these data-based tools, will force people to think about that larger picture, and about what it actually means to fairly incarcerate innocent people.
Pennington: And it seems like it's in line with a lot of research around technology that has shown we have believed these technologies might, as you suggested, give us the get-out-of-jail-free card, to use that unfortunate phrase, when it comes to this issue of bias. Oh, we'll allow the AI or the technology to handle everything because it's not biased- and what we're increasingly coming to realize, whether in search engines or video games where you have certain avatars, is that the bias is built in, because it's not been challenged outside of technology. It would seem like your report is suggesting that when it comes to this very important issue of incarceration of people before they are ever actually in court- before they go to trial- that bias has also been dealt into that technology. So it feels like it's within the framework of our larger understanding of the way AI has maybe not been as critically engaged with as other technologies, because we see them as these arbiters of truth in a way that perhaps they're not.
Shah: Yeah, I think that's right. The data that we have are generated by existing social processes, and the existing social processes that we have are not at all free from racial bias. So that works its way into the data and, like you say, into all these different AI systems- and in this case, into systems making very high-stakes decisions about people's freedom. So yeah, I think that's definitely an issue.
Bailer: So, what if I- I'm going to hire you now to build a risk assessment tool for me. As you think about this- and there are cases where we see this in the banking industry, where certain variables are just not acceptable for use as inputs to prediction models- are there parallels to this? I mean, if you were thinking about this as saying, I would like to try to build as fair a risk assessment tool as possible, what would be some of the steps that you would think about and that you would need to consider in doing so?
Shah: So there have been people who've defined fairness as the absence of certain predictors, in kind of that way- I'm less familiar with the banking industry, but I think they just don't have race as a variable, and that's how they deal with it. I assume this is true in banking; it's definitely true in criminal justice data that there are lots of things that correlate with race, and so you can't really remove race from your predictors. You can remove that one column, but you can't remove it from the zip code or from previous arrest history and things like that. So I would start with that: it's not enough to close your eyes and hope that you don't see race, because it's there. And then I think it would be important to have, like we talked about, these technical definitions of fairness within the model and the predictions it makes- are they treating people equally across races? As I mentioned, there are multiple different definitions, and some of them might be mutually incompatible, so it's important, if you are going to go down this route, to pick one that makes sense for your application. And I don't have the absolute answer to this, but one way to think about it is how the cost of incorrect predictions is distributed across races, and whether that cost and that burden are shared equally or not. So in the case of- again, I don't want to come down on the side of one version or the other, but in the case of these pretrial risk assessment tools, that seems to suggest something like false-positive rates as a more important thing to look at, because when you're considered high-risk, that's when you're really paying the high cost of the predictions.
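One way to see why dropping the race column isn't enough, as Shah argues: if the remaining features carry racial information, a model can recover race from them. A minimal sketch with synthetic data- the feature names and correlation strengths are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 5000
race = rng.integers(0, 2, n)  # synthetic protected attribute (0/1)

# Features correlated with race by construction -- stand-ins for things
# like zip code or prior arrest counts, which Shah notes carry racial
# information because of where and how enforcement happens.
zip_feature = race + rng.normal(0, 0.8, n)
prior_arrests = rng.poisson(1 + 2 * race)

X = np.column_stack([zip_feature, prior_arrests])

# If race were truly absent from the data, this accuracy would sit near 0.5.
acc = cross_val_score(LogisticRegression(), X, race, cv=5).mean()
print(f"accuracy predicting race from 'race-blind' features: {acc:.2f}")
```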
Richard Campbell: Some of your findings are really important. Even the phrase risk assessment modeling- I'm imagining- how do you get a journalist interested in that, and in what the implications of it are? Because I think these kinds of findings are really important for people to understand, especially now. Have you seen any good coverage of your findings, and how do you interest a journalist in this? The way that I would go about it, of course, is to find somebody who was really treated unfairly in the system and tell that story. But then how do you make what you found in the data understandable to a journalist, who could then communicate it to the public?
Shah: One thing before I go into my answer- this is an interesting question, because the field I'm talking about, evaluating fairness in terms of these technical definitions and so forth- I don't know if this is where the field started, but definitely my introduction, and a lot of people's introduction to it, was actually through journalism. It was a story in ProPublica about the COMPAS risk assessment tool that set off a lot of this study. So in some sense, in that case, the journalists were ahead of me; I was learning from them. But as you pointed out, the language itself- risk assessment- either sounds boring or technical, but it's also a useful entry point, because I think it's a term that frames the decisions being made, or the scope of what decisions can be made. When we talk about pretrial decision making, if we frame the decision in terms of risk assessment, then all the person in front of you is, is a possible risk. And there's been some pushback against that in various places. So, a different way to think about what I mean when I say risk assessment frames the decision in a very particular way: an alternative might be something like a needs assessment. Like I mentioned before, one of the reasons a lot of people miss their court dates is not because they got on a private jet and went to the Bahamas; it's because they don't have a home address, so they're not able to receive updates about when their court date is. They couldn't get childcare, or time off work, or they forgot. All these things are preventable in other ways, but if you only see the decision as a risk, you're only thinking of it in terms of: regardless of what the reason is the person didn't show up, the risk is the only thing I can worry about. So opening up that decision- not allowing it to necessarily be framed in terms of risk from the get-go- is a useful entry point, I think.
Bailer: You know, that's almost impossible- well, the question might be short, but the answer might be long; I don't know the risk of that response. There's one more thing I want to ask about, and I'm going to sneak it in here anyway and just make Rosemary mad at me, because she can't hit me- we're doing this all remotely.
Pennington: Right, I would never anyway.
Bailer: No, that's true. So, there's a theme in what you talked about here, and that's the transparency of the research that came out- the importance of being able to reproduce what was done. It sounds like you had to do some forensic analysis to figure out what occurred in this previous work, and that suggests you value this in your own current work. So I was wondering, in deference to my friend Rosemary here, if you could give us a quick summary of the workflow that helps ensure your work is reproducible and has accountability baked into it, so that others could follow it.
Shah: Thank you for asking that question- it's important to lots of people right now; lots of scientists are worried about reproducibility. I think it's particularly important to those of us who work in human rights because, well, you want to get it right first of all, and also you want it to stand up to scrutiny, because you may have results that don't make people happy, and so you need to be prepared for all kinds of scrutiny and attacks. One way to do that is to be very confident in the work that you've done. I've found that having specific ways of working and specific structures for how I manage a project is one of the best ways I can guarantee those sorts of results. In particular, the way we do things at HRDAG is we use this system called principled data processing, where we very explicitly set up our pipeline so that we work on individual tasks. For instance, importing data from an Excel file is one task, standardizing code values within the columns is its own task, and reproducing a regression model that we found in a paper is its own task. Being able to do those things distinctly means you're not worried about whether the entire thing is reproducible or correct, but rather: is this link in the chain strong enough? And then you move on to the next piece. That's helped us a lot. We manage everything technically with Makefiles, so I had to learn a little bit of computer engineering as part of this job. But I think it starts with that idea of breaking things up into tasks that can be tested individually, so that you have a little bit more confidence once the project starts to get bigger and more unwieldy- you're not worried about the code values when you're working on the model, because you've already tested those in an earlier step.
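As a toy illustration of the task-per-step structure Shah describes- each pipeline step small enough to test on its own- here is a sketch in Python. The task names, file path, and column name are hypothetical, and chaining the tasks in one script is only for illustration; in the Makefile-driven workflow Shah mentions, each task would be its own separately runnable unit.

```python
import pandas as pd

# Task 1: import. Read the raw file and do nothing else, so this step
# can be checked (and re-run) on its own.
def task_import(path: str) -> pd.DataFrame:
    df = pd.read_excel(path)  # e.g., a raw arrest spreadsheet (hypothetical)
    assert len(df) > 0, "import produced an empty table"
    return df

# Task 2: standardize code values in one column, and test only that.
def task_standardize(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["charge_code"] = df["charge_code"].astype(str).str.strip().str.upper()
    assert (df["charge_code"] != "").all(), "unstandardized codes remain"
    return df

# Task 3: model. By this point the code values are already verified, so
# this task trusts them -- each link in the chain is tested exactly once.
def task_model(df: pd.DataFrame) -> None:
    ...  # fit / replicate the regression here

if __name__ == "__main__":
    clean = task_standardize(task_import("input/arrests.xlsx"))
    task_model(clean)
```

The design point is the one Shah makes: when a later step fails, you know the earlier links in the chain have already been verified, so debugging stays local as the project grows.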
Bailer: Thank you.
Pennington: Well, that is all the time that we have for this episode. Tarak thank you so much for being here today.
Shah: Thank you.
Pennington: Stats and Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net and be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.
Statistical Summer Transportation Safety | Stats + Short Stories Episode 146 /
Joel B. Greenhouse, Ph.D., is Professor of Statistics at Carnegie Mellon University, and Adjunct Professor of Psychiatry and Epidemiology at the University of Pittsburgh. He is an elected Fellow of the American Statistical Association, the American Association for the Advancement of Science, and an elected Member of the International Statistical Institute.
Pets During Quarantine | Stats + Stories Episode 145 /
Allen McConnell is University Distinguished Professor and Chair of the Department of Psychology at Miami University. His research examines how relationships with family and pets affect health and well-being, how people decode others’ nonverbal displays, and how self-nature representations influence pro-environmental action with this work supported over the years by National Institutes of Health (NICHD and NIMH) and National Science Foundation grants.