Megan Price is the Executive Director of the Human Rights Data Analysis Group. Price designs strategies and methods for statistical analysis of human rights data for projects in a variety of locations including Guatemala, Colombia, and Syria. Her work in Guatemala includes serving as the lead statistician on a project in which she analyzed documents from the National Police Archive; she has also contributed analyses submitted as evidence in two court cases in Guatemala. Her work in Syria includes serving as the lead statistician and author on three reports, commissioned by the Office of the United Nations High Commissioner for Human Rights (OHCHR), on documented deaths in that country. @StatMegan

Maria Gargiulo is a statistician at the Human Rights Data Analysis Group. She has conducted field research on intimate partner violence in Nicaragua and was a Civic Digital Fellow at the United States Census Bureau. She holds a B.S. in statistics and data science and Spanish literature from Yale University. She is also an avid tea drinker. You can find her on Twitter @thegargiulian.

Episode Description

Almost every day we seem to get new data about the COVID crisis. Whether it’s infection rates, death rates, testing rates, false-negative rates, there’s a lot of information to cull through. Making sense of COVID data is the focus of this episode of Stats and Stories with Megan Price and Maria Gargiulo.

Timestamps

2:55 What’s the reaction been?

11:10 How important is the information in supporting these decisions?

14:30 What stories are we missing?

18:14 Schools and Covid.

23:30 How to Make Sense of all of the COVID data.

Full Transcript

Rosemary Pennington: Almost every day we seem to get new data about the COVID crisis. Whether it’s infection rates, death rates, testing rates, false-negative rates, there’s a lot of information to cull through. Making sense of COVID data is the focus of this episode of Stats and Stories where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a production of Miami University’s Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me are regular panelists John Bailer, Chair of Miami’s Statistics Department and Richard Campbell, former Chair of Media, Journalism and Film. Our guests today are Maria Gargiulo and Megan Price of the Human Rights Data Analysis Group, or HRDAG. Price is the Executive Director where she’s worked on projects related to human rights issues in Guatemala, Colombia, and Syria. Gargiulo is a statistician with HRDAG and was also a data science fellow at the US Census Bureau. They’re here today to talk about some of the group’s work on the COVID crisis. Maria and Megan, thank you so much for being here.

Megan Price: Thank you for having us.

Maria Gargiulo: Yeah, thank you.

Pennington: Megan, I’m going to start with a question for you. So, HRDAG describes itself as quote -a non-profit, non-partisan organization that applies rigorous science to the analysis of human rights violations around the world- end quote. You’ve been publishing a bit about COVID including some pieces in Significance Magazine, how do you situate the work on COVID within the human rights framework that your group, you know, is sitting in?

Price: Yeah, that’s a great question, thank you. Well, everything that we do stems from the Universal Declaration of Human Rights. That’s the starting point for all of our thinking about our work and we’re also just humans. And so, when this crisis started, of course, understanding it and trying to just get some handle on how to even go about making decisions about how to live our lives was at the forefront of all of our minds. And through our work, we’ve had so much experience as what we think of as science communicators, thinking about how to explain really complicated, emotionally-fraught ideas to folks who may not have much or any grounding in statistics or data analysis or science work. And so, we really felt like that was not only a role that we could step into but also something that could help us as a team to focus on something that felt urgent and useful.

John Bailer: So, what’s been some of the reactions that you’ve had to these columns? I mean, you’ve been writing a number of these explanatory pieces to try to convey and communicate some of these issues that are emerging with the pandemic. Do you have any feedback?

Price: We have, and I have to say this is a little bit biased because it was one of my friends, but my favorite reaction has so far been to a column we wrote in a literary magazine called Granta, which is perhaps not a common outlet for statisticians, about essentially what role does stats play in interpreting screening tests and how do you know what your personal screening test means? And one of my particularly math-phobic friends reached out and said I actually understood that, thank you. And that is just the most gratifying feedback we can get.

Richard Campbell: So, can you talk a little bit about the undercounting of COVID infections and what some of the obstacles are in getting good data in your work?

Price: Sure, I think I might start that- I’ll start with your second question which is getting good data in our non-COVID work. Our non-COVID work is focused on human rights violations as our name implies, and specifically on types of violence. And there are a whole variety of reasons why that might not be fully documented. And some of them are pretty benign, some of them are just the violence wasn’t witnessed or the individuals who are doing the best they can to document and describe that violence just didn’t have the resources that week, didn’t have enough people on the ground. And then other times they’re pretty intentional; a lot of violence is hidden and very intentionally kept from the public eye, and so I would say that a variety of those same things are happening in our attempts to understand COVID-related deaths. There are certainly a lot of incentives to not categorize something as a COVID death or to choose different metrics in terms of positive rates of tests or numbers of tests or who gets tested, and those incentives are not always going to lead to the most complete and the best data collection, unfortunately. But then again there are also just lots of perfectly benign reasons. In New York at the peak of the outbreak there, everyone was just overwhelmed, and the idea of writing everything down, you know, certainly came far lower on the list of priorities than helping everyone you could help. And so that’s, I think where statisticians can come in and say look, you don’t have to write everything down, we can use the tools in our toolkit to fill in those gaps.

Bailer: So, you write in one of the essays that your group wrote that science starts with theories and stories about how the world works. Now, does the idea of trying to- you know this is a really hard story to tell- that people, you know, they may have last thought about theory as something they heard about in the scientific method when they were at school and didn’t really think a lot about since then. What are some of the challenges and some of the potential solutions when trying to communicate these more complex stories? Whether they are SIR models and some of the nuance of finding them to an audience that may not think a lot about theories and background?

Price: Maria, can I put you on the spot? Do you want to take that one?

Gargiulo: Yeah, sure. So, when I think about theories personally, the thing I really like to try and figure out is how do I test if I think a theory holds in this situation. And I think in communicating science, giving people things to look for is really helpful so I think a lot about- I think the piece you mentioned the Director of Research, Patrick Ball wrote and he kind of provides a list of like things you might look out for, so for example when we’re testing a theory, a rigorous theory is really careful about the types of assumptions it makes. So, in order to come to our conclusion that we made about the way the world works, what are the things we assumed? And once someone kind of delineates those really clearly it’s a lot easier to say oh I think those assumptions are reasonable. I can kind of hold on to the thread here, that makes sense, or I don’t think that’s true. And if that’s not true you might have a way to start thinking about oh, if that’s not true, what other things might not hold? So, I guess trying to communicate the ideas that let people test the theory for themselves, even if that’s an informal way, I think that’s really important for things like this.

Campbell: So, this morning, speaking of stories, there’s a story on the front page of the Dayton Daily News about area residents could be part of a virus study. And through this podcast and talking to scientists and statisticians I’m just confounded by the fact that we haven’t done more random studies of COVID. And I’m wondering both at the regional level and at the national level; and that Ohio just now is going to do a random study of 1200 randomly selected participants. What’s the problem here? I mean we’ve talked to statisticians who have said this should have been going on much earlier and we’d have a much better idea of who’s infected and who’s not. And I’d like both of you to talk about this.

Gargiulo: I can start. I think for me, and part of this is I don’t actually understand, to the full extent, resource constraints right now, but I think a lot of this is resource constraints. It’s a lot easier, I think, to say oh we have these 20 people in the hospital right now, we can test them, we can talk to them, we can do these things. Rather than okay, you know thinking about what does a representative sample look like and finding that representative sample within the community. Do we actually want it to be fully representative in that normal sense? Do we want to oversample certain groups or undersample other groups? So, I just think it’s harder. It’s- you know, convenient samples are nice because they’re convenient. Random samples are hard because they need to be really carefully constructed and under constrained resources, it’s not clear to me how feasible that is or how hard or easy it is.

Price: Yeah I’m mostly going to second everything Maria just said. I mean I think much like kind of prioritizing that happened around New York around do we just try to get everyone we can to the hospital? Or do we keep perfect records? I mean one of the things that I think is hardest about this moment in time is that just everything needs massive resources and figuring out how to allocate those and how to balance the really urgent today priorities, while also like recognizing that we need to make some long term- we need to make some decisions with a long term vision that you know our future selves will be grateful for, and I’m certainly grateful that it’s not my job to make those kinds of decisions. And I think also coming from- I have a public health background where, you know, there are lots of situations where you can’t do a randomized control trial for ethical and logistical reasons and I think there’s a certain amount of that at play here, too, and I think that because of the way the United States is set up- you know, something that public health has done for years and years is to identify these natural experiments that happen because different regions make different decisions and take different actions, and so, personally, I think that it’s as important and as valuable to identify those comparisons that are more readily available as it is. I mean I certainly- let’s also do randomized trials and let’s get those organized, but I think that both of those things happening at once is the way to go.

Bailer: So, this part of the conversation makes me think a lot about the value of information. You know, so, in some way what we’re saying is that we’re taking these samples of convenience we’re looking at individuals who are probably symptomatic and that are showing- that are of gravest concern, but they’re telling us about, you know, are people that are symptomatic, are the disease, do they have the disease as opposed to knowing what’s going on in the population? And so I think it’s a hard question, you know what’s- you talk about decisions and what’s the value of the information that you gain from knowing more about what’s going on in the population than knowing about what’s going on in some small symptomatic subset of the population, and I agree completely about the, you know, that resources have to be allocated in a way that – there’s a triage component to this, to solve this problem in a sensible order, but if we’re – how important is it to have the information that’s unbiased and kind of meaningful for supporting these decisions? That’s-

Price: I mean, yes.

[Laughter]

Price: And you know, but again I think that that’s where, you know, as statisticians I mean we should always recognize when our data are incomplete and biased but we also shouldn’t just sort of throw up our hands and say well, then we can’t use that data. There- we should recognize when a particular class of methods is appropriate to either adjust for those things or to account for them in some way. And I think also you know kind of coming back to natural experiments you know we do have a couple of really-I hesitate to use the word interesting in this setting, but really interesting things that have happened, specifically on cruise ships, which is a closed population and where they were able to collect data about every single person and so that again like the population on a cruise ship isn’t going to represent general populations anywhere but it gives us a chance to say okay if we test every single person, what’s the difference that we’re seeing between symptomatic and asymptomatic and I know here in San Francisco they did a very similar thing just at a microlevel they picked like a four-block radius in one of the neighborhoods in San Francisco and said we’re just going to test everybody in this four-block radius. And so, I think there’s also opportunities to do that kind of hyper-localized thing to start to learn more information.

Campbell: You know what that- what did that yield? That four-block study that was interesting?

Price: Oh man, that yield- so this was a UCSF study, in partnership with another organization that I’m not going to be able to come up with, but what they found was the kind of racial disparity that we’re now seeing at large, especially in the latest New York Times data. So, in this four-block radius, this four-block neighborhood; it was in the Mission. And I can’t remember now but I want to say like maybe five percent of the Hispanic residents were positive. Not necessarily symptomatic, not necessarily [inaudible] but they gave everyone a diagnostic test and they were positive. They literally could not find a single Caucasian member of that neighborhood that tested positive.

Pennington: Wow. That’s incredible. You’re listening to Stats and Stories and today we are talking with Maria Gargiulo and Megan Price of the Human Rights Data Analysis Group. We see a lot of coverage in news media of infection rates, of death rates, of hospitalizations. Given the work that you have been doing at HRDAG on this issue, are there stories in the data that are under-reported that you think people should be paying more attention to?

Price: That’s a great question. Um. Hmm. To be honest I can’t really think of one because the one that has been pressing on my mind the most has been the racial and ethnic disparities and I think that we are starting to see more attention being paid to that so I’m grateful to see that coming to light. You know I think as with anything else that’s really scary, we’re seeing a lot of stories about how bad things can be, but I’m also really hesitant to say hey we should tell more stories about people who are recovered and are fine because we need people to take action to protect their community. So no, actually on balance I kind of think that most of the stories are out there. I don’t know, Maria, what do you think?

Gargiulo: So, a story I would like to hear more about in a non-U.S. context is what the intersection of say COVID and conflict or COVID and displacement is going to be. So, I’m thinking for example COVID arrives at a refugee camp, you know what happens? And that is terrifying because I think the only conclusion that I come to in my head is the results are going to be grim, but what does that like- what happens? Do people leave the camp? Do people stay in the camp and get sick? So that’s a space I’m watching to just see what happens and also how does humanitarian aid react to that? I have no idea. So we don’t- you know so that’s not so relevant in the U.S. context, but you know as we consider COVID as a global pandemic I think that’s something I will be watching and really hoping goes better than I’m expecting it to go.

Pennington: Do you know of any work that’s looking at infection rates along class lines? Because I would imagine that there could be particular breakdowns along with class in some places. And it’s not something that I can remember having seen like you’ve pointed out Megan, I think the reporting on race has just sort of started emerging in a lot of the coverage but I can’t remember seeing much about class. I’ve seen it about the geographic breakdown like rural versus urban, but then this issue of are poor communities being impacted more or less or anything like that, so I just wanted to ask that question.

Gargiulo: Yeah, not that I’m aware of and in fact, earlier in the pandemic, which I mean is such a weird way to describe things because as much as we’re all in this time dilation, you know it honestly hasn’t been that long, but earlier I did see some comparisons of occupation, of risk and infection rate by occupation, which is a bit of a proxy for that, and I haven’t seen much follow-up on that. So, I think that that is another thing that deserves more attention.

Bailer: And it seems like some of the things related to- some of the exposures related to occupation may also play out in terms of living conditions. So if you’re- the concern I guess, in the U.S. it’s something like 40% of the fatalities are in nursing homes, you know and as you look in other environments it tends to be where people are living in more group housed environments and if you live in a high-density area as well as go out and work it seems like that just kind of explodes it. So that runs a little bit counter to my earlier comment about who we’re studying and how. And in some ways, if we’re looking at the people who are going to be most dramatically impacted then you might want to be targeting what we’re doing. I thought that I saw that there was some recent work that’s starting to come out related to the COVID impact in Central and South America, and I won’t swear to it; I’d have to dig that up too, so I’m not sure.

Price: Yeah, there has been and so I guess that’s sort of the coda, to my comment to- you know, what stories are getting told is highly correlated with what media source you’re consuming and so, yeah. Because we have a lot of projects and partners and collaborations in Central and South America, I have a lot of sources who have information on that part of the world and so yes there is a fair amount of coverage coming about how the infection rates are unfolding there. But yeah I’m not seeing that in perhaps more conventional mainstream US media.

Campbell: One of the things that relates to the sort of class problem that Rosemary brought up is there’s a lot of discussions now should we send our kids back to school and part of it is that wealthier school districts are in better shape to do this than poorer school districts and I guess my question is if you have children or if you don’t have children, I mean what should we do? What’s the best advice? Or is it all sort of just a regional or local problem?

[Laughter]

Price: So, I have two kids. My daughters are 14 months and 3 and a half years old and they’re at daycare right now, and I kind of am both like really happy about that and really scared about that. And also, my husband is a public-school teacher so schools and kids and what to do is like all we think about right now. And you know, it’s interesting I think that operationally it has to be regional because it’s going to be so contingent upon just what the situation is on the ground but on the other hand, you know a top-down national you know like threshold guidelines; you can only even consider opening up the schools if your case count per capita is X. You know to safely have in-person learning you need Y dollars per student. We’re going to provide these grants that are going to cover you know PPE and sanitation services. I mean that kind of thing can be in a bigger framing, but yeah I mean just to kind of answer the question as a statistician, I have no idea.

Bailer: Well, you’re telling us something because you’re both working from home now. So, there’s clearly a policy decision that you’re making at a very local level about kind of what can we do to prevent potential infection within our community, within our workforce. Maria, did you want to add to that too?

Gargiulo: I mean really just to reiterate what Megan said and I really have spent no time thinking about this but I think like you know one I have no idea like statistically speaking and two though I think like the whole idea of like either all schools opening or no schools opening, that’s not it for me. Like I think these decisions really, they need to be made in the communities because if something goes wrong it’s those same communities that are going to be affected. So, it’s not just about are the kids in school but if the kids are in school and something goes wrong what are the potential repercussions? And I don’t think while we might have really great- it would be great to have some national guidelines to help school districts out at the end of the day the national government isn’t suffering if something bad happens, the community is suffering; they need to make that decision.

Bailer: But those communities that need to make decisions, just getting back to what you’ve been producing, and some of the things that you’ve been writing about are that they need good data; they need good information. And in some ways you know, you- if you’re a- so now, Maria I’m going to make you the superintendent of our local school district.

Gargiulo: Excellent.

Bailer: Congratulations and condolences, by the way, because you’re the one that has to make a decision about how many kids can come back to school. How should they be spaced in their classrooms, how should- you know, all of these things? And by the way you’ve got ten parents on the line waiting to talk to you about why they need their kids back in school. I mean, so how does science help, you know, how does science and the study of some of the data that’s associated with this pandemic- how can that be communicated to help these local decision-makers that you’ve appropriately mentioned to make the calls that they need to make?

Gargiulo: Yeah, so I think that if I were the superintendent in charge of this I’d want to talk to different people. So, I’d want to talk to these parents on the phone, I’d want to talk to my teachers. Do they feel like, you know, part of this is not necessarily about the science, the ground troops it’s also like do you feel safe going to work? How do the kids feel about going to school? I’d love to talk to some of them and figure out you know if you had the opportunity to go back to school would you feel safe doing that? Or would you just sit in class being really anxious all the time you know thinking today is the day I’m going to get sick or I’m going to get one of my classmates sick or my teacher? So, I’d want to start with conversations there and then I’d start asking questions like how much money do I actually have for personal protective equipment? Do I have backup plans for when and if things go wrong? What do those look like? What are the effects of starting a school year in person and then sending kids home? This is I think a different kind of data collection that isn’t necessarily like you know biological data about the virus. You know like do students have the internet at home, right? These are other types of data collections we need to do. So virus biology and you know everything we know about the spread about the epidemic I think helps us make decisions about okay we can only have you know 15 students in the classroom so maybe 50% full, we’ll call that, but also then there are these other types of data collection that need to happen that really has nothing to do with the spread of the virus and everything to do with you know the upside kind of social dynamics of what’s happening to all kinds of angles, so I really want to get more data sources involved even though it would complicate things.

Bailer: Well, you know, if this stat thing doesn’t work out I think there might be a superintendent gig in your future.

[Laughter]

Gargiulo: My retirement job.

Bailer: That’s a well thought out response.

Pennington: I’m going to swoop in with a final question and steal it from John, you know, people I think are overwhelmed with data related to this you know because it’s coming out every day. Given the work that you’ve been doing what advice would you have for our listeners about you know how to wade through the data and how to make sense of it in their own lives?

Price: You want to go first Maria, or do you want me to?

Gargiulo: No, you go first.

Price: So, you know what I personally have been doing has been to have a really strict news and data consumption diet and to really stay focused hyper-locally. And it’s hard because my phone at any moment wants to tell me about these headlines about how there’s a spike in cases in the state of California, but the state of California is really big and in my city, there is an increase in cases but it’s not quite as scary and so working really hard to contextualize those big stories with the hyperlocal data and I do think that that’s actually something that most cities and counties that I have looked at have been doing a really good job of being transparent and saying look this is what we know and this is how we know it. But that said, I am a statistician and so I find data very comforting. And I think that if that is not the place you’re coming from then even that can still feel really overwhelming because these hyperlocal dashboards do still contain a lot of information and they get updated every day and so you know in that case what I would really recommend is to identify one or two sources who you absolutely trust who are filtering and contextualizing that information for you, and that may be a news source, that may be a friend, that may be an expert on Twitter. It can be hard to vet those sources and to really know that you’re getting really reliable information that way but I think that if you personally don’t have kind of the comfort to deal with that raw data that’s coming at you that would be my recommendation.

Gargiulo: Yeah, I’ll just kind of second everything that Megan just said. I, in particular, don’t look at the data every day. Call me crazy but I do read a lot of epidemiologists on Twitter and you know it’s really nice I get really good synthesis and for me they also sometimes kind of write about kind of the new studies that are coming out and I could sit down and read those studies and you know I might understand bits and pieces of them but for me, it’s nice to have these data contextualized with like what are the advances we’re making, where are we making progress, where are we really struggling right now? And getting that from someone who is an expert not only in that field but there are lots of folks on Twitter being really thoughtful about science communication, that’s where I’ve been doing a lot of my learning and I think that’s just helped me kind of you know to find the signal in the noise and get out at least what I want to understand which is mainly like what does the general trajectory look like? And Megan is right with these hyperlocal news sources. Like that’s really helpful to me especially because I have not been really leaving my house so really the most relevant thing for me is that hyperlocal geography but then also understanding like here’s the trajectory we’re going on in terms of scientific methods, so balancing both like research like with what’s actually happening is what I look for and I just try and read experts on that.

Pennington: Well, Megan and Maria, thank you so much for being here today.

Megan and Maria: Thank you guys so much.

Pennington: That’s all the time we have for this episode of Stats and Stories. Stats and Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net and be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.