Days of R COVID Lives | Stats + Stories Episode 366 by Stats Stories

Early in the COVID pandemic, as we figured out how to live our lives solely at home, news coverage began to fill with stories about COVID’s spread and reproduction rates. Soon, social media were filled with amateur epidemiologists trying to make sense of those rates, and sometimes making a mess of it. A series of articles in Significance examined the discourse around reproduction rates during COVID, and that discourse is the focus of this episode of Stats and Stories with guest Gavin Freeguard.


Statistical Literacy | Stats + Stories Episode 364 by Stats Stories

Every year, statistics classes are filled with math-averse students who white-knuckle it to the end of the semester in the hopes of getting a passing grade, and of forgetting about math and statistics for a little while. But what if it didn’t have to be that way? What if, instead of white-knuckling it, students were actually excited about the subject, or at the very least not terrified of it? Two professors have been developing strategies to help students get over their fear of “sadistics,” and that’s the focus of this special two-part episode of Stats and Stories.


Explaining Science | Stats + Stories Episode 363 by Stats Stories

Ionica Smeets is chair of the science communication and society research group at Leiden University. She’s also chair of the board of the National Centre of Expertise on Science and Society of the Netherlands. Her research lies in the gap between experts and the public when it comes to science communication, with special interest in the problems that occur when those groups communicate and what scientists can do about those problems. Smeets is the author of a number of journal articles on this topic and engaged in science communication for the public when she worked on a Dutch TV show about math. She’s also the co-creator of a children’s book called Maths and Life.

Episode Description

In a commencement speech in 2016, Atul Gawande told the crowd that science is a, "commitment to a systematic way of thinking, an allegiance to a way of building knowledge and explaining the universe through testing and factual observation." In the last ten years that understanding of science has become muddied for the public. Social media has helped fuel the rise of conspiracy theories built upon so-called alternative facts as people claiming to be experts spout anti-science ideas. Communicating scientific ideas was already difficult, but it’s become even more difficult in this environment. Science communication is the focus of this episode of Stats and Stories, with guest Ionica Smeets.

Full Transcript

Rosemary Pennington
In a commencement speech in 2016, Atul Gawande told the crowd that science is a, quote, commitment to a systematic way of thinking, an allegiance to a way of building knowledge and explaining the universe through testing and factual observation, end quote. In the last 10 years, that understanding of science has become muddied for the public. Social media has helped fuel the rise of conspiracy theories built upon so-called alternative facts, as people claiming to be experts spout anti-science ideas. Communicating scientific ideas was already difficult, Carl Sagan notwithstanding, but it's become even more difficult in this environment. Science communication is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of the American Statistical Association in partnership with Miami University's departments of statistics and media, journalism and film. Joining me, as always, is our regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Ionica Smeets, chair of the science communication and society research group at Leiden University. She's also chair of the board of the National Centre of Expertise on Science and Society of the Netherlands. Her research lies in the gap between experts and the public when it comes to science communication, with special interest in the problems that occur when those groups communicate, and what scientists can do about those problems. Smeets is the author of a number of journal articles on this topic, and engaged in science communication for the public when she worked on a Dutch TV show about math. She's also the co-creator of a children's book called Maths and Life. Ionica, thank you so much for joining us today.

Ionica Smeets
Thank you so much for having me.

Rosemary Pennington
How did science communication become sort of your lane?

Ionica Smeets
It's actually more or less by accident. In high school I was really doubting what to study. I was thinking maybe journalism or theater, but I was also really good at mathematics. And then I figured, maybe I should study mathematics and do the other things as side jobs. Then somehow, after my studies, I decided to do a PhD in mathematics. I remember when I had the job interview, they asked me, what do you want to be after your PhD, a mathematician or a journalist? And I said, journalist, and they still hired me. I really felt I'd ruined it, but in the end they said, wow, it would be nice to have a journalist in this country who has been trained well in mathematics. So I did my PhD in number theory. Then I worked for a few years as an independent journalist, and then I got so annoyed about what was going wrong in science communication, and I felt a major part of what needed fixing was not on the journalists' side but on the academic side. So I started thinking about going back, and 10 years ago I came back as a professor at Leiden to do this.

Rosemary Pennington
So I'm just curious, what kinds of things were you seeing as a journalist that made you say, no, I must go and become an academic and try to fix things?

Ionica Smeets
Well, one of the things that was decisive was a study done in the UK about how media exaggerate results in health news. You probably know these kinds of studies, but what this one found was that the major exaggerations came from the press releases from the institutes, and not from the journalists. So I thought that was interesting. I also see, and I'm not sure if this is the same in the US, but in Europe we usually have to spend a percentage of each research proposal on connecting to society. And then you see people who are really good at research in astronomy or some other field, but they need to spend 10% of their huge budgets on science communication, and they have no idea how. And then they make a terrible website, or an app that costs so much money and is terrible, and I thought, we need to change something.

John Bailer
What a great backstory to how you got to what you're doing. This is really, really neat. Looking at your material and some of the research that you've done, I thought it was really fascinating to see ideas from randomized clinical trials coming in, that you wanted to evaluate these factors in a systematic and very careful way. Can you talk through one of your favorite examples of where you've used ideas from clinical trial design to evaluate effective science communication?

Ionica Smeets
Yeah. So first of all, when Rosemary read the introduction, I thought it was very funny that we have all these ideals for how science should be based on evidence and facts, but then when we do science communication, it's quite often based on intuition, even for very good scientists. I really think we should have more evidence-based science communication, and one way to do that is setting up randomized controlled trials. And I think one of my favorite examples is one thing we haven't published yet; we're still writing it up. It was part of a really big Norwegian project about communication during COVID. And I know you also had Jo Røislien on, who talked about it a few episodes back. Do you know the episode number by heart?

John Bailer
It's some number less than 355. I think it's bigger than 300 too, but that's about as good as I can get.

Ionica Smeets
So we did a really big project with a lot of different trials, where we tried making different videos about the pandemic and then measured how they worked. One thing that's important: whether science communication works depends on your goals. Sometimes you want to inform people, sometimes you want to build trust, sometimes you want to spring them into action. So we had different videos and then measured what they did. But my favorite part was another part of this study. Before we were going to send out over 100,000 emails to participants, asking, would you like to participate in our study and fill out the survey, the person organizing it asked Jo, the Norwegian guy who's very famous in Norway, could you make a nice video to go with this invitation? Then people can see you, the researcher, and they will be more excited to participate in your study. And then we were discussing this, and I said, does it work that way? And Jo said, yeah, I don't know, it sounds plausible. This is something you're asked to do quite a lot as a researcher, right? Make a video. It's a lot of work if you want to make a good one. So we set up a small randomized controlled trial. Well, small in the sense that it was a very small project, but big because we had 100,000 participants. Yeah, it was very cool. Jo actually made two videos. One as a typical scientist, just sitting in front of his webcam with crappy sound, telling how important the research was. And one where, because he's also a TV host, he used hair spray, good lighting, a good mic, and gave a very motivational, very passionate, very narrative speech; I think one of the favorite variables in this project was the hair spray. And then there was a third version where people just didn't get a video at all. And guess what? Not getting a video was the most effective in getting people to fill out the survey.

Rosemary Pennington
Oh, that's really interesting.

John Bailer
That is so cool. I love it when your intuition fails, when you think that maybe it'll go one way and you're completely surprised by the result. That's awesome. So you're writing that up now? It's going to be thrown into the literature soon?

Ionica Smeets
Yeah, I hope so. It always takes a while, as you know, but I thought it would also be nice to share something on the podcast that isn't in the literature yet.
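A design like the one Ionica describes, with three arms (no video, a plain webcam video, a polished video) and survey completion as the outcome, is typically analyzed by comparing completion proportions across arms. As a rough illustrative sketch, here is a hand-rolled Pearson chi-squared test of homogeneity; the counts below are made up for illustration and are not the Norwegian study's data.

```python
def chi_squared_proportions(successes, totals):
    """Pearson chi-squared statistic for comparing k proportions.

    successes[i] surveys completed out of totals[i] invitations in arm i.
    Returns (statistic, degrees of freedom).
    """
    pooled = sum(successes) / sum(totals)  # completion rate if all arms were equal
    stat = 0.0
    for s, n in zip(successes, totals):
        exp_s = n * pooled          # expected completions in this arm
        exp_f = n * (1 - pooled)    # expected non-completions
        stat += (s - exp_s) ** 2 / exp_s + ((n - s) - exp_f) ** 2 / exp_f
    return stat, len(totals) - 1

# Hypothetical counts: no video, plain webcam video, polished video
stat, df = chi_squared_proportions([400, 300, 300], [1000, 1000, 1000])
print(round(stat, 1), df)  # 30.0 2
```

With real data you would normally reach for a library routine such as scipy.stats.chi2_contingency rather than computing the statistic by hand; the point here is only the shape of the comparison.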

John Bailer
So one of the studies that we took a look at is near and dear to my heart, as someone who taught a data visualization class and spent a lot of time talking about good practice in that space and about effective communication. You were involved in collaborating on a project debunking misleading bar graphs. Why did you start there? What was it about bar graphs that piqued your interest? And how did you go about investigating it?

Ionica Smeets
Yeah, so this is one of my favorite projects of the past few years. When I was a freelance journalist I gave a lot of public talks; next to TV and writing, that's the thing I did most. And one thing I really liked to talk about was misleading graphs. I have a collection of dozens of misleading graphs, and wherever you go, people bring you new examples. So this has always been my hobby. I'd also been working with a journalist who does a lot of fact-checking, and for fact-checking there are all these guidelines. It's rather obvious, but when fact-checking started, quite often the headline would repeat the misleading claim, with the correction in the text. We know by testing, though you could also have thought of it by thinking, but maybe your intuition was wrong, that you should instead put the corrected claim in the headline, then explain who said something misleading, then explain why it's misleading, and end with the truth again. So we were wondering about graphs: how does that work there? When you see a misleading graph, should you just publish a correction? Should you still show the original? What would be good ways to correct it? So we did a really nice project with the fact-checker from journalism studies, with someone with a background in rhetoric, and with a colleague from the statistics department in psychology. And we took bar graphs, basically because it's the most used kind of graph, and also the one where there's consensus that you need to start your y-axis at zero. So what we did was take different graphs with real data from the World Health Organization and make misleading versions of them. And then the question was, what do you measure? When is a graph misleading? This is a very tough thing.


John Bailer
Oh no, I think you're right. In some sense, you think back on some of the stuff you might read from Tufte, like lie factors. People might be looking at absolute differences but are really shown multiples, or some ratio comparison that exaggerates a difference. So that's the misleading part for me. Gosh, I didn't realize I was going to be tested today. But you're absolutely right; I think about relative differences and how a graph might convey an exaggeration of an effect that isn't really present.
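Tufte's lie factor, the size of the effect shown in the graphic divided by the size of the effect in the data, can be made concrete for exactly the truncated-axis bar charts discussed here. A small sketch; the height values in the example are hypothetical.

```python
def lie_factor(v1, v2, baseline):
    """Tufte-style lie factor for two bars drawn on an axis starting at `baseline`.

    Effect shown in the graphic: relative difference of the drawn bar heights
    (each bar is drawn with height value - baseline).
    Effect in the data: relative difference of the actual values.
    """
    shown_effect = (v2 - v1) / (v1 - baseline)
    data_effect = (v2 - v1) / v1
    return shown_effect / data_effect

# Heights of 170 cm vs 180 cm drawn on an axis that starts at 160 cm:
print(round(lie_factor(170, 180, 160), 6))  # 17.0 -- a ~6% difference looks like 100%
```

An honest axis (baseline of zero) gives a lie factor of exactly 1, which is one way to operationalize the "start bars at zero" consensus Ionica mentions.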

Ionica Smeets
Yeah, that's right. Maybe it's good to give an example. One of the classic graphs you see popping up every time shows people's heights. There's this graph from a patient organization where you see the Dutch, who are the tallest, and we always like this example: the people from Indonesia don't even come up to our hips in the graph, because the axis hasn't started at zero; it starts somewhere halfway. So you see these differences in height being blown up, and most people in the field agree that you shouldn't do this. But then the question is, what do you measure? Are you going to ask whether people remember the values correctly? Because we know that even if you have seen a graph that is misleading in this way and you remember the values correctly, since usually the numbers are still readable, people still make different decisions than if they had seen a fair graph. So we asked people, how bad do you think this is; pretty much, that's what we asked. And then we showed them either misleading graphs or a correction, and after that we showed them other misleading graphs to see if they had learned something from having seen a correction. And then we did the same thing a bit later. You would love to do this months later, but when you set up an experiment, one week later is already nice. What we saw was that the most effective approach was showing the misleading graph with a corrected version next to it, and that also helped people be less misled in the future, though the effect is much smaller after a week. And one thing that I really like is that we also put it into practice: we have started the Graph Police.

John Bailer
Do you have a website for the Graph Police? How does it work?

Ionica Smeets
Yeah, but it's in Dutch. Science communication is so cultural and so national. So we do it in Dutch, with examples from Dutch media and politicians, because we also want to give feedback to the makers, so we always call the ones who made the graph. And it's very interesting how people respond. What we found so far is that research institutes who make something like that are always very interested in correcting it as soon as possible. So we had our national statistics bureau, which had a misleading graph, and they said, oh, we're really struggling, because we're trying to make our graphs more attractive, but we also still want to be correct. That was really nice. But we also had an international one from Reuters, the press agency, and they told us that we have too much time.

Rosemary Pennington
You're listening to Stats and Stories, and we're talking about science communication with Leiden University's Ionica Smeets.

John Bailer
Yeah, maybe we should start the Graph Police in English over here, and then you can lead a Dutch Stats and Stories; that would be the next sort of international trade agreement. I was wondering if we could change gears a little bit. This is all pretty serious stuff, and I want to look at the comics with you. Both Rosemary and I were just really tickled by this comic.

Rosemary Pennington
Yes. So I am curious, how did Maths and Life come about?

Ionica Smeets
Yeah. So it's a children's book for kids who are about 10, in primary school. And it started because there's a children's book author, Edward van de Vendel. He's a very prolific author, he has written over 100 books, and he's very good at collaborating, and we became friends somehow. I always dreamed when I was a kid that I would be a children's book author, so I tried to hang out in their surroundings. At the time, when I was still a journalist, I'd given a talk for kids who were around 11 or 12 and answered their maths questions. It was very close to Edward's home, and we had lunch afterwards, and I told him about my talk, and he said, I've always wanted to write a children's book about mathematics, because it's such an important topic. He'd also been a primary school teacher, and he used to be really bad at math when he was a kid, and he said, that's why I wanted to teach it so well, to not give children the fear I had. And he said, I've been wanting to make a book like that, but I can't do the maths part. So we decided to do it together. I also really like the tagline of your podcast, because it's really a story, and I strongly believe in stories. So it's a story about a class, about this one kid who's always fun but also a bit loud, and he has to stay after school. He has two teachers, one male, one female, and they sort of forget that he is still there, and when they talk to each other they complain about how dull the maths book is and that they just hate it. Then this kid is like, yeah, wait a minute, if you hate it, what are we doing? And then the whole class starts rebelling. And I think the thing that really clinches the deal is that there's this quiet kid who never says anything. He stands up and he says, okay, teacher, I understand maths is important, but what I've been wondering about is, what has all of this to do with my life?
And then the teachers make a deal with the kids: they will do a project called Maths and Life, and every week they will take one question asked by the kids that has something to do with maths. The entire book alternates between a story about a child, written by Edward, and a comic strip that is the maths lesson, which I wrote and which the illustrator, Floor, drew, and then there's a little extra. And what I really like is that there are all these overarching storylines. So it's about children falling in love, worrying about parents getting divorced, worrying about the climate, about refugees, all coming from very different families, and you understand their lives, so when you see a kid's question, you think, yeah, that's a question this kid would have.

John Bailer
Yeah. I found this to be just a beautiful book. I really love it. My mom was a third grade teacher for all of our lives, so I sort of grew up with her always trying to communicate and connect. And I certainly love how the contract is there with the kids' signatures, that they're all signed up for this, that they've all enlisted in it, and the graphics are beautiful throughout.

Ionica Smeets
Yeah, and you know, I actually don't think that maths books are that bad, but for the story we had to make it that way. But I also made sure that the teachers say, at the end of the year we'll test you, and if you haven't learned enough, you'll get extra maths lessons, and otherwise you'll get a party with mathematics. So there's a bit of that, but yeah, the rest is mainly a story.

John Bailer
So, you know, unfortunately we could only get one chapter in English. Ionica, you left us hanging here. But we'll wait for the rest to come out in the future. I did like that we got to be introduced to a couple of the questions that the students had posed: one, can you fill a bathtub with your own tears, and the other a comparison of taking the train versus the airplane. So can you talk about how you helped deconstruct these questions, in collaboration with the illustrator?

Ionica Smeets
Yeah, and also how we got to the questions. So we actually got questions from real kids, but we also made sure that all the maths that kids around 11 need to know is in there somewhere. But the can-you-fill-a-bathtub-with-your-own-tears question, Edward asked this when he was a kid, and his teacher told him, that's not mathematics.

John Bailer
Oh no, oh no!

Ionica Smeets
Yeah. So what we do in the book, we talk about, how much do you cry? And then, of course, the boys are like, crying, that's for girls. And indeed we do see that after a certain age, which is quite young, men cry much less than women. So then we start calculating: okay, if you cry this many times, and you can calculate how big one tear is, then you come to a certain amount. But then, how does that compare to a bathtub? So we start thinking about milk cartons, like, how many is it? And in the end, it turns out that with your own tears, even as a woman who cries a lot, you still don't get to a quarter of your bathtub. So then three friends think about joining up and adding all their tears together. And one part I really like, which you couldn't see because it hasn't been translated, where we really worked well with the illustrator, is where we did a version of Nim, the classic game where you take away objects and the one who takes the last one loses. And we talked a long time about which objects to use. I knew a British mathematician who did something like this with candies in schools, with something that tasted terrible as the last one. And then the illustrator said, we should do Brussels sprouts, which is like the Dutch horror vegetable; it makes you very motivated not to have to take the last one. So it's 20 chocolates and one Brussels sprout, and you take one, two or three at a time. And he said it's also nice to draw, because they're all round, so it looks very nice, and the chocolates can be very colorful, and then there's this slightly bigger green one. And then, what was really fun, I did a book tour, and I played this game with dozens of kids, with a real Brussels sprout.

John Bailer
Did you ever get the Brussels sprout?

Ionica Smeets
Well, there's a winning strategy. So of course I didn't. No mercy for the children.

John Bailer
Note to self: don't play games with Brussels sprouts with Ionica.
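For readers curious about the strategy Ionica is keeping quiet about: the chocolates-and-sprout game is a misère subtraction game with 21 objects, where you take one to three per turn and whoever takes the last object (the sprout) loses. Positions whose size is 1 modulo 4 are lost for the player about to move, so the winning play is always to leave your opponent such a position. A small sketch (the function and variable names are mine):

```python
def best_move(pile):
    """How many objects to take (take 1-3 per turn, taker of the last one loses).

    Positions with pile % 4 == 1 are lost for the player to move; from any other
    position, taking (pile - 1) % 4 objects hands the opponent such a position.
    """
    r = (pile - 1) % 4
    return r if r > 0 else 1  # lost position: take 1 and hope for a mistake

def winner_with_perfect_play(pile):
    """Both players use best_move; returns 1 or 2 (whoever takes the last loses)."""
    player = 1
    while pile > 0:
        pile -= best_move(pile)
        if pile == 0:
            return 2 if player == 1 else 1  # this player took the sprout and lost
        player = 2 if player == 1 else 1

# 20 chocolates + 1 sprout = 21 objects, and 21 % 4 == 1, so whoever moves
# FIRST loses against perfect play:
print(winner_with_perfect_play(21))  # 2
```

So under these rules the second player can always force the first to eat the sprout, which may explain why Ionica never lost.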

Rosemary Pennington
So, having worked on this book, did it open up your eyes to other kinds of creative possibilities for engaging in science communication? I loved this chapter. I was someone who had terrible math classes, sorry, teachers, but I think if they'd been presented in this way, where it felt connected to my life and was illustrated and didn't feel so serious, my relationship to math could have been very different. So I'm just curious, as you've worked on this, have you been thinking about other creative or interesting ways you might engage in this work?

Ionica Smeets
Yeah, so we really wanted to make a book for children who love reading but hate mathematics. One thing I discovered is that the book also works the other way around: kids who like mathematics are like, oh, finally there's a children's book for me, and then they end up reading. I thought that was very surprising. But I also learned a lot from how Edward and Floor worked, which was very different from me. In science communication, you know, we've been talking about evidence, so we do tests. Pretty much all the lessons that are in the book I've tested with real kids, to see, does this explanation work, and also what the kids say in these lessons when they don't understand or when they make a joke; I think 80% of that comes from real kids I did it with. But when I talked to the author, he was like, I'm not going to test stories with kids; you don't do that. What he did do was check with very specific kids for certain expertise. So there's a kid in the book from Suriname, which is a former Dutch colony, and she has a favorite vegetable. He went to a kid he knew from Suriname and asked, what would you mention there? And there was this vegetable that no ethnically Dutch person knows, but all kids from Suriname know. And the artist really took care that the details are spot on; he drew full-page images of all the kids in their rooms. So there's a girl who moved to the Netherlands from Afghanistan after the war, and she wears a very nice dress, and she's very interested in time travel, as you can imagine. And then he checked, and someone said, no, this dress is something that's worn in Iran, not Afghanistan, so you should have a different dress. And when we went to schools and libraries, you really see the eyes of kids light up when you get the details about their lives right. So I learned a lot from them about that, and also about the little jokes.
I mean, there are so many jokes in the images that I still see popping up. We have one chapter, I think the mathematically toughest one, about scaling, about why ants are so strong: the square-cube law. If you make something twice as big, then the contents grow faster than the surface. So we have that with ants, who are very strong, and then we have the teachers lift an elephant. But at some point, I think after I'd seen the book many times, I noticed that the elephant took one of the teacher's glasses as a joke, and I hadn't noticed it before; on the next page it's still walking along, like ants do.
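The square-cube law behind that chapter is worth a worked example: scaling a body by a linear factor k multiplies surface area by k squared and volume (and hence weight) by k cubed, so the surface-to-volume ratio shrinks as bodies get bigger, which is why an ant's muscle cross-section is huge relative to its weight. A quick numeric sketch:

```python
def scaled(surface, volume, k):
    """Scale a body by linear factor k: area goes as k**2, volume as k**3."""
    return surface * k**2, volume * k**3

# Double the size of a unit cube (6 units of surface, 1 unit of volume):
s, v = scaled(6, 1, 2)
print(s, v)               # 24 8 -- the contents grow faster than the surface
print((6 / 1) / (s / v))  # 2.0 -- surface-to-volume ratio halves when size doubles
```

Since muscle strength scales roughly with cross-sectional area (k squared) while weight scales with volume (k cubed), strength relative to weight goes down by a factor of k as a creature scales up, the point the ants-versus-elephant comic is making.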

John Bailer
Yeah, I love the idea of Easter eggs being hidden in books; that has great appeal to me. And I think it's really cool that you field-tested the stories in this book to see how they'd be received, and that you aspired to great precision in representing the diversity of cultures that are part of it. That's all a testimony to your care as a science communicator, so kudos to you.

Ionica Smeets
Yeah, I would say one thing that we did even more, and here the writer and artist really thought I was crazy: we also did a survey of the kids who came to our book tour, because I didn't want it to be only kids with parents who both went to university and who are already very excited about mathematics. So we surveyed them. It was a child survey with smileys, just asking them, do you know a lot about science? Do you know scientists in person? Do you talk about science a lot? And even though we really tried to go to places where different groups of kids could come, we found that they had very high science capital. So then we planned some visits to schools in neighborhoods where we thought, okay, this is where the other kids are.

John Bailer
Wow. And did you see a difference in terms of how they responded to the book?

Ionica Smeets
No, actually not. And I think we see that quite a lot, right? If you organize something, the people who come are not necessarily representative of the entire population, but if you then go to the other places, the response is very similar. And what I really like, and what I also like about talking about maths in a bit of a different way, is that it's not only the kids who are usually the best at mathematics who enjoy these things.

John Bailer
So what's the next book?

Ionica Smeets
I don't know, actually. Well, actually I do know, but I only really thought of it yesterday. It's not going to be a children's book. Have you ever heard of the Oulipo? No? It's a French movement that combines literature and mathematics; I'm sure you'll love it. They have all these rules for literature. For instance, you write an entire novel without using the letter E, or you tell the same story 99 times in 99 different styles. And I'm going to, well, should I tell this? I don't know. I'm going to do something like that with a friend who is a mathematician and a comedian, for adults, and we're going to do something in their tradition, with very strict rules for ourselves about how to write. I'm not going to say more.

John Bailer
So you have to promise to come back after that's done.

Ionica Smeets
Yes, I will. I would love that.

Rosemary Pennington
Stats and Stories is a partnership between the American Statistical Association and Miami University's departments of statistics and media, journalism and film. You can follow us on Spotify, Apple Podcasts or other places where you find podcasts. If you'd like to share your thoughts on the program, send your email to statsstories@amstat.org or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.

Counting on Official Statistics | Stats+Stories Episode 360 by Stats Stories

Erica Groshen is a senior economics advisor at the Cornell University School of Industrial and Labor Relations and research fellow at the Upjohn Institute for Employment Research. From 2013 to 2017 she served as the 14th commissioner of the US Bureau of Labor Statistics, the principal federal agency responsible for measuring labor market activity, working conditions and inflation. She's an expert on official statistics, authoring an article in 2021 pondering their future.

Episode Description

When people think of public goods, they most likely think of things like parks or schools. But official statistics are also a kind of public good. They help us understand things like housing prices, the costs of goods and the spread of disease. However, this data infrastructure is under threat around the world. The work of official statisticians and the obstacles they face is the focus of this episode of Stats and Stories with guest Erica Groshen.

Full Transcript

Rosemary Pennington
When people think of public goods, they most likely think of things like parks or schools. But official statistics are also a kind of public good. They help us understand things like housing prices, the costs of goods and the spread of disease, but this data infrastructure is under threat around the world. The work of official statisticians and the obstacles they face is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of the American Statistical Association in partnership with Miami University's departments of statistics and media, journalism and film. Joining me, as always, is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Erica Groshen. She's senior economics advisor at the Cornell University School of Industrial and Labor Relations and research fellow at the Upjohn Institute for Employment Research. From 2013 to 2017 she served as the 14th commissioner of the US Bureau of Labor Statistics, the principal federal agency responsible for measuring labor market activity, working conditions and inflation. She's an expert on official statistics, authoring an article in 2021 pondering their future. Erica, thank you so much for joining us today.

Groshen
Thank you for having me.

John Bailer
So Erica, it's a delight to have you on here, and it was fun to review some of the work you've done. You've been involved in such important work over the years in official statistics, and we're at a time when there are concerns; you voiced many of them in that 2021 article. In particular, I know you think there are some risks that exist now for official statistics and the statistical system in general. Can you talk a little bit about what those risks are?

Groshen
Sure. Let me start with a very long-standing risk to the statistical system, which is that, right now, the statistical agencies rely very heavily on surveys. They developed the whole concept of the scientific survey, and it's really the basis of an awful lot of our official statistics. But survey response rates are falling all over the country for everything, and so at this point the statistical agencies need to move on to taking more advantage of the other kind of data that's growing really rapidly, which is the digitization of almost everything in our economy. That requires a whole new set of skills and techniques and products. So it's a great opportunity born out of a challenge, but it requires resources that just haven't been forthcoming.

John Bailer
You know, I think when you've talked about this, there are other risks beyond that. So one is the old-school tools that were used for this, and I know you've described this in other contexts as moving from Stats 2.0 to Stats 3.0 in terms of the operation of these organizations. What are some of the others? I thought you had a top-five list that included some others. Could you go through that with us, please?

Groshen
Right. So there are inadequate resources. The statistical agencies have been badly funded of late, and that makes it very hard to keep doing what they're doing, let alone modernize what they're doing. Then there is the loss of expertise as senior staff leave, particularly if we get a big exodus of staff; in the current situation, with the current administration, there's a big emphasis on cutting staff, so I'm very worried about that. I am also worried about protecting the independence of the statistical agencies from interference. This interference could be because of hubris, or it could be because of ill intent, either financial gain or political manipulation, and I'm very worried about that eroding trust in the statistical system. I worry also about the loss of confidentiality of the data that the statistical agencies collect, because we are not going to get good information from respondents if the agencies can't hold to the protection of confidentiality that's guaranteed by law, and the agencies have a remarkable record of protecting that confidentiality. And then finally, I worry about users losing access to data, if that's suppressed.

Rosemary Pennington
Some of this feels related to the work of the system itself. And I wonder: there has been such a distrust of data in the public, and I think we've talked about that a lot on the podcast in various contexts. It's a global problem; we've seen it in the States, but we've also seen it in other parts of the world. How much does this loss of confidence, or this mistrust of data, impact the risks you're identifying? Because I would imagine that if the public were more supportive, or understood the work better, perhaps these risks wouldn't be so potent. But maybe that's me being too optimistic.

Groshen
Yeah, there's a feedback loop here, right? All of these things that I've listed would undermine trust, and then lack of trust is a contributor to facing these risks that we're talking about. So you get this downward spiral, and in a larger sense, that's what I'm worried about: attacks on the credibility, attacks on the independence, inadequate funding, loss of confidentiality. All of those things further undermine trust, and then the loss of trust leads to, basically, the loss of these statistical agencies and their products. To go back to this idea of official statistics being infrastructure, being public goods: would you want to drive on a bridge that you didn't trust? Not really, right? Then you might as well not have the bridge. And if people don't trust the statistics, and they don't trust that the information they give to the statistical agencies will be used only for statistical purposes and not shared otherwise, then nobody's going to want to fund it, and nobody's going to want to use it, and you just get a downward spiral. So in the larger sense, I'm worried about the downward spiral that comes from undermining all of the foundations of our statistical system.

John Bailer
You know, a natural follow-up for me is that you talked about products, so I think it's important to give a couple of examples of what those products might be, because those products help answer the question of why people should care about this. The person on the street may not appreciate this, but when I've looked at some of the products that BLS has put out, like the Occupational Outlook Handbook, it's hard to imagine someone who is about to enter the workforce not wanting to look at that. So can you talk about some of these products, and why they are important to have available for the larger community?

Groshen
Yeah, so in a general sense, we believe that we're going to get the best outcome for society if we allow people, as much as possible, to make decisions for themselves, and that's true to the extent that they make good decisions. They can only make good decisions if they have good evidence, good data, to base them on. So these statistics contribute to the underpinning of our whole strategy for our society. To give you examples: the Occupational Outlook Handbook. Now, we know BLS produces all of these very famous statistics, like the unemployment rate and jobs growth and inflation, and I'll get to those. But actually, the most heavily used set of web pages on the BLS website is the Occupational Outlook Handbook, because job seekers at all levels, and career counselors, and employers who are looking to hire people want to know what occupations are paid, what qualifications go into them, and where they're located geographically. They look there, right? And that includes prisoners who are about to be released. Everybody has equal access to this information; you don't have to be rich to get it. It's right there for everybody. So that's an example that is very broadly used. At the other end of the spectrum are the national economic indicators that are relied upon for monetary policy. The Federal Reserve System has a dual mandate, which is stable prices and maximum sustainable employment. Well, BLS produces the inflation rate and the jobs numbers, both the unemployment rate and job growth. The Fed looks at many other things, but its two main targets are encapsulated in those numbers, and from their monetary policy decisions we get increases or decreases in interest rates, we get efforts to fight recessions, and efforts to cool down the economy to fight inflation.
So those are important, but the inflation rates are used for so much more than that. For example, the federal government adjusts Social Security benefits using the CPI. If the BLS makes a one-tenth of one percent mistake on the inflation rate, then the federal government will overpay or underpay beneficiaries by a billion dollars. So, you know, it makes a difference. And I could go on.
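That back-of-the-envelope claim can be checked in a couple of lines. The annual benefit total below is an assumed round figure for illustration only, not an official number:

```python
# Rough cost of a CPI measurement error in inflation-indexed benefits.
# ASSUMPTION: ~$1 trillion per year in total benefits (illustrative round number).
annual_benefits = 1.0e12
cpi_error = 0.001  # a one-tenth of one percent error in the measured inflation rate

misallocated = annual_benefits * cpi_error
print(f"${misallocated / 1e9:.0f} billion per year over- or under-paid")  # → $1 billion per year over- or under-paid
```

At roughly a trillion dollars of indexed payments, even a tiny relative error in the index translates into a billion-dollar absolute error, which is the point Groshen is making.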

Rosemary Pennington
You mentioned, I think, the loss of expertise and inadequate resources as potential risks. Just so people can understand what that might mean, what does it take to produce one of these indicators that people rely on daily?

Groshen
Oh, gee. Well, one of the interesting things that I learned when I was commissioner was just how different each one of these programs is. So let me talk about the Current Population Survey, which is a household survey of 60,000 households every month. At the very beginning of the process, there is the BLS and Census Bureau work on choosing a sample: out of all of the households in the country, which 60,000 are going to be chosen? That works from a register that the Census Bureau has of all of the households. And then there's a decision about how to do a stratified random sample, which means choosing a sample so that you don't accidentally leave out any important groups. You want to make sure that, even though it's random, you have enough homes with veterans in them, or in this or that state geographically. Once the stratified random sample is chosen, there are the people who reach out to all of those respondents and ask them to participate. The survey itself is the product of a lot of research: which questions do we want to ask? How are we going to combine them to make the measures that we want to put out? The enumerators ask the questions. Right now, the CPS is collected mostly by phone, but the first contacts are by mail or in person, and then people are asked for their phone number, so there's a phone conversation every month with all of the participants. Now, those participants are doing a voluntary public service. They don't have to say yes, so they're an important contributor too. They recognize how important this is, and they contribute their information, which is hugely important to all of the rest of us. Then you have the programmers who have programmed all the processing of the information that's collected.
This goes to analysts who put it together into the release, and you've got the IT specialists who put it up on the website. At the very tail end, you have the commissioner, who looks at the report and says, oh, that looks fine, why don't you change this little bit of language on the release? That was my only role there. Anyway, it is a highly orchestrated, almost factory-like process that happens every month, involving people who know exactly what they're doing, where a lot of thought has gone into every single step.
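The stratified sampling step Groshen describes can be sketched in a few lines of Python. This is a toy illustration of the general idea, not the Census Bureau's actual CPS design; the strata, sample sizes, and household data below are all made up.

```python
import random

def stratified_sample(frame, strata_key, n_per_stratum, seed=0):
    """Draw a fixed number of units at random from each stratum,
    so that no important group is accidentally left out."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:                       # partition the frame into strata
        strata.setdefault(strata_key(unit), []).append(unit)
    sample = []
    for units in strata.values():            # simple random sample within each stratum
        sample.extend(rng.sample(units, min(n_per_stratum, len(units))))
    return sample

# Toy frame: 1,000 households tagged by state and veteran status (made-up data).
frame = [{"id": i, "state": "OH" if i % 2 else "KY", "veteran": i % 5 == 0}
         for i in range(1000)]
sample = stratified_sample(frame, lambda h: (h["state"], h["veteran"]), 25)
print(len(sample))  # 4 strata x 25 households = 100
```

A purely random draw of 100 households could, by bad luck, include very few veteran households; stratifying first guarantees each group is represented, which is exactly the property Groshen highlights.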

Rosemary Pennington
It definitely sounds like if one piece is missing, it will all fall apart.

Groshen
You need expertise of all kinds. You need expert enumerators to talk to and recruit the participants. You need good programmers. You need survey psychologists and methodologists to design the questions and to test them before they're used. You need the statisticians to design how you do the survey and then how you interpret the results. Yes, it's quite a process.

John Bailer
Yeah, oh,

Rosemary Pennington
Sorry. You're listening to Stats and Stories, and we're talking about official statistics with Erica Groshen.

John Bailer
So I'm a huge fan of the Current Population Survey. We used it extensively on a number of projects when we were looking at occupational fatal injury rates and patterns across different categories. Ultimately, we were looking for information that was being generated by statistical agencies, and it gave us some of the best employment numbers we could get, numbers that would cover the whole year and were generated year upon year, unlike the decennial census. It was just this incredible resource, and the quality of the information was a critical part of that story. So I worry a lot about these types of products becoming less reliable and not having that same kind of punch to them. You know, I think it might help people to realize that doing official statistics isn't a new thing, right? There is an ancient history to this. Can you put on your historian hat for a minute, Erica, and comment on what some of the early motivations and inspirations for official statistics were?

Groshen
Well, the earliest ones I know about are in the Bible. God commands Moses to count all of the Israelites, right? And why do they do that? Well, one part was for military purposes: how many people do we have available to defend ourselves or to attack? And also for taxation purposes, to support government. And then, the thing about statistics is, in economics jargon, we call it an experience good: once you have it, you find many, many uses for it. So, for instance, I've talked about how the Federal Reserve uses employment and inflation information for monetary policy. Well, the reason the BLS started collecting that was to facilitate bargaining between employers and employees in the time of nascent unions. But having that information allowed us to move off of the gold standard to a better basis for monetary policy. You have to have an anchor, and the anchor becomes the dual mandate, which is measured by statistics. So it's almost a precursor to modern monetary policy, too. And you can see it throughout history: information gathering has been really key to governance of any sort.

John Bailer
Yeah, I remember reading a historian who commented that one of the first motivations for the use of statistics was the transition from a hunter-gatherer society to a fixed society. There was this compilation of resources and products, and then ultimately of what was being transferred and exchanged, and being able to account for that. So for people who think, gee, why are we doing official statistics? Well, it's not new. We're doing this because it's part of what came along with civilization; we're just doing it in a more sophisticated and more targeted way.

Groshen
I mean, one of the other things that you see at the very dawn of civilization is standard weights and measures, because then you can make trades. You can trust somebody you don't know very well if you both agree on what a pound is, and you know it's a pound of this or that. And these days, statistics are even more important, because one of the main things that we trade, much more effectively than we were ever able to trade before, is information. And statistics are a very important way in which we trade information.

Chart Spark | Stats + Stories Episode 359 by Stats Stories

Being able to create compelling data visualizations is an expectation of a diverse array of fields, from sports to journalism to education. But learning how to create charts that spark joy can be difficult if you're not confident in your abilities. A recent book is designed to help people become more comfortable creating compelling charts, and it's the focus of this episode of Stats and Stories with guest Alli Torban.

Read More

Randomized Response Polling | Stats + Short Stories Episode 341 by Stats Stories

Dr. James Hanley is a professor of biostatistics in the Faculty of Medicine at McGill University. His work has received several awards including the Statistical Society of Canada Award for Impact of Applied and Collaborative Work and the Canadian Society of Epidemiology and Biostatistics: Lifetime Achievement Award.

+Full Transcript

————————

John Bailer
Did you ever think that you could know something about a population based on measurements that you didn't know were correct for any individual, or what it even meant for an individual in the population? That's something that's available through a method called randomized response. Well, not only could you ask questions about health care or health considerations and sensitive health questions, which was the motivation for it when it was developed many decades ago. There's a recent paper in the Journal of Statistics and Data Science Education on investigating sensitive issues in class through randomized response polling. And we're delighted to have James Hanley joining us to talk a little bit about this project. James, welcome back.

James Hanley
Thank you very, very much.

John Bailer
Yeah, so, so randomized response in classroom settings. Can you just give a quick summary of what the randomized response method is for our audience?

James Hanley
Yes, the idea is that I'm facing you. You're answering a survey, and I would like to know whether you've cheated on exams or not? Well, not you, but the class.

John Bailer
Me? Never, James. Me? Never. No, no.

James Hanley
But what about your taxes? Or, you know, what about something else, like whether you gave a book back to the library? For a group, you can work out what proportion of them have or have not, with a certain plus or minus on it, by giving each person one of two questions to answer. The questions could be the flip of each other, or one could be an irrelevant question, like: was your mother born in April? That's one version. Or: did you cheat on your taxes? So when I hear the answer yes from you, I don't know which question you answered. Is it about your mother, or is it about cheating? The receiver can't interpret an individual answer. But when you put all the answers from the class together, the aggregate is a mixture of the two types of answers. And if we know the mixing, which is what the probability of answering one question or the other gives us, we can deconstruct it and separate out, at an average level, what's going on. That's the basic idea. It's very clever. It hasn't worked very well, though, in sociology and in surveys. I remember doing a seminar on this around the time I graduated in 1973. The problem is that the general public doesn't understand it. They think you're cheating somehow, or recording it, or have a camera; that there's some way to tell. So I think it only works for a fairly sophisticated public, but university students should be able to get it. It's tricky, though. We were motivated by it because I was so annoyed that McGill wouldn't let us ask our students whether they had been vaccinated against covid or not. It was a huge political war at our university. I was talking to my co-author, and I said, I am really steamed, and I've actually written up a way of doing it again and adapting it.
And Christian Jenna, my first co-author, said, oh my goodness; he had written a popular article for a journal about it, but without a real example. No, we've got to do this in class for real. But the younger teachers at McGill didn't want to do it. They were afraid that the university would come down on them for breaking privacy laws, because in Quebec your medical record is private, and vaccination is part of your medical record. In your country, you had no problem; most American universities had no problem asking about and insisting on vaccination. We were not allowed to, and it caused major trouble. And I sent the article to the provost the other day. I said, look, you know, out of necessity come methods. So we adapted it.
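The unmixing Hanley describes can be simulated directly. With probability p a respondent answers the sensitive question truthfully; otherwise they answer an innocuous question whose "yes" rate q is known (a birthday in April is roughly 1/12). The observed share of yeses is then p·π + (1−p)·q, which can be solved for the sensitive prevalence π. A minimal sketch with made-up numbers; the design in Hanley's paper may differ in its details:

```python
import random

def randomized_response(true_status, p=0.7, q=1/12, seed=1):
    """Collect one answer per respondent. With probability p they truthfully
    answer the sensitive question; otherwise they answer an innocuous question
    that is 'yes' with probability q. No single answer identifies anyone."""
    rng = random.Random(seed)
    return [truth if rng.random() < p else (rng.random() < q)
            for truth in true_status]

def estimate_prevalence(answers, p=0.7, q=1/12):
    """Unmix the aggregate: observed yes-rate = p*pi + (1-p)*q, solved for pi."""
    yes_rate = sum(answers) / len(answers)
    return (yes_rate - (1 - p) * q) / p

# Simulated class where 20% truly cheated (made-up data).
rng = random.Random(2)
truth = [rng.random() < 0.2 for _ in range(5000)]
answers = randomized_response(truth)
print(round(estimate_prevalence(answers), 3))  # close to the true 0.20
```

Repeating the survey and averaging the answers, the twist Hanley mentions later, narrows the margin of error, since the variance of the estimate shrinks as the number of collected answers grows.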

Rosemary Pennington
You stole my question from me because I was about to ask you what spurred this particular–

James Hanley
Don't get me started. We're still upset at the university. In Quebec, your vaccination status is private; Ontario and every other province, with a different kind of law or way of looking at civil liberties, had it the opposite way. If you weren't vaccinated, they didn't let you into class, and that's it. And the American reviewers of our paper had a tough time understanding why we couldn't ask. So we had a lot of trouble, and we didn't get the paper accepted right away. It was all about covid in the first version, and then we needed revisions, and we were all so busy that we didn't get to them. We revised the article two years later, and by that time the whole story was stale. That's when we had to broaden it so that it could apply to cheating or whatever, but the original impetus was covid. And in the article, I say that in my own small class of 10 or 12, we repeated it. The one new twist we have is that you can repeat the survey with people, ask them several times, and average the answers, and that's what gets you a narrower margin of error. In fact, one of the reviewers said that if I asked you often enough, I should be able to figure out, even for you individually, whether you had cheated or not, because the two mixtures will eventually diverge; you'll see one or the other. But that was going too far.

John Bailer
Well, I'm afraid that's all the time we have for this rather short but very interesting episode of Stats and Short Stories. James, thank you so much for joining us.

James Hanley
It was a pleasure.

John Bailer
Stats and Stories is a partnership between Miami University's departments of statistics and media, journalism and film and the American Statistical Association. You can follow us on Twitter, Apple Podcasts or other places where you can find podcasts. If you'd like to share your thoughts on our program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.

————————

The Nation's Data at Risk | Stats + Stories Episode 339 by Stats Stories

The democratic engine of the United States relies on accurate and reliable data to function. A year-long study of the 13 federal agencies involved in U.S. data collection, including the Census Bureau, Bureau of Labor Statistics, and the National Center for Education Statistics, suggests that the nation's statistics are at risk. The study was produced by the American Statistical Association in partnership with George Mason University and supported by the Sloan Foundation, and it is the focus of this episode of Stats+Stories.

Read More

Getting Into Music Statistics | Stats + Short Stories Episode 330 by Stats Stories

Dr. Kobi Abayomi is the head of science for Gumball Demand Acceleration, a software service company for digital media. Dr. Abayomi was the first and founding Senior Vice President of Data Science at Warner Music Group. He has led data science groups at Barnes & Noble Education and WarnerMedia. As a consultant, he has worked with the United Nations Development Programme, the World Bank, the Innocence Project and the New York City Department of Education. He also serves on the Data Science Advisory Council at Seton Hall University, where he holds an appointment in the mathematics and computer science department. Kobi, thank you so much for being here today.

Episode Description

We've always said on this show that data science is a gateway to other fields. From climate change to medical research, knowledge around numbers can be useful in just about every aspect of life. That's why we've brought back Kobi Abayomi to talk about his journey using data to get into the music industry on this episode of Stats+Short Stories.

+Full Transcript

Making Ethical Decisions Is Hard | Stats + Stories Episode 321 by Stats Stories

 

Stephanie Shipp is a research professor at the Biocomplexity Institute, University of Virginia. She co-founded and led the Social and Decision Analytics Division in 2013, starting at Virginia Tech and moving to the University of Virginia in 2018. Dr. Shipp’s work spans topics related to using all data to advance policy, the science of data science, community analytics, and innovation. She leads and engages in local, state, and federal projects to assess data quality and the ethical use of new and traditional data sources. She is leading the development of the Curated Data Enterprise (CDE) that aligns with the Census Bureau’s modernization and transformation and their Statistical Products First approach.

Donna LaLonde is the Associate Executive Director of the American Statistical Association (ASA) where she works with talented colleagues to advance the vision and mission of the ASA. Prior to joining the ASA in 2015, she was a faculty member at Washburn University where she enjoyed teaching and learning with colleagues and students; she also served in various administrative positions including interim chair of the Education Department and Associate Vice President for Academic Affairs. At the ASA, she supports activities associated with presidential initiatives, accreditation, education, and professional development. She also is a cohost of the Practical Significance podcast which John and Rosemary appeared on last year.

Episode Description

What fundamental values should data scientists and statisticians bring to their work? What principles should guide their work? What does right and wrong mean in the context of an analysis? That's the topic of today's Stats and Stories episode with guests Stephanie Shipp and Donna LaLonde.

+Full Transcript

John Bailer
What fundamental values should data scientists and statisticians bring to their work? What principles should guide the work of data scientists and statisticians? What does right and wrong mean in the context of an analysis? Today's Stats and Stories episode will be a conversation about ethics and data science. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Rosemary Pennington is away. Our guests today are Dr. Stephanie Shipp and Donna LaLonde. Shipp is a research professor at the Biocomplexity Institute at the University of Virginia and a member of the American Statistical Association's Committee on Professional Ethics, the Symposium on Data Science and Statistics committee, and the Professional Issues and Visibility Council. LaLonde is the Associate Executive Director of the American Statistical Association, where she supports activities associated with presidential initiatives, accreditation, education, and professional development. She's also a co-host of the Practical Significance podcast. Stephanie and Donna, thank you so much for being here today.

Stephanie Shipp
Well, thank you for having us. I'm delighted to be here.

Donna LaLonde
Thanks, John. It's always fun to have a conversation on Stats and Stories.

John Bailer
Oh, boy, I love that. I love getting that love from another podcaster. So thank you so much.

Donna LaLonde
Absolutely.

John Bailer
So your recent Chance article had a title ending in an exclamation mark: Making Ethical Decisions Is Hard! I'd like to start our conversation by unpacking that title a little, by having you describe an example or two where data scientists encounter decisions that need to be informed by ethics.

Stephanie Shipp
I might start with that, because I'm the one that's always saying making ethical decisions is hard, and Donna seized on that and said, that will be the title of our article for Chance. And I'm like, okay, that's great. So I don't have examples, but I want to start by saying that I'm always on the hunt for tools to incorporate ethical thinking into our work. And I find that conversations about ethics, especially with my staff, who are primarily young, a lot of postdocs and assistant research professors and students, often go flat. When we try to have conversations about our projects in the context of ethics, their reaction is, well, I'm ethical, do you think I'm not ethical? Or, we only use publicly available data, so what's the big deal? And so we do a lot of things like the traditional implicit bias training, and that's helpful, but it's more individually focused. It does translate to projects, because implicit bias is one of the areas to look at in the ethics of projects, but it's not the entire answer. The focus of my work throughout my career has always been on how we benefit society. And thanks to Donna, if you noticed, I'm participating in three ASA activities; I didn't actually realize that until they were listed, and I'm like, that's why I'm always so busy. Okay, I digress. One of the first activities I got involved in, because I asked Donna whether I could join, was the Committee on Professional Ethics. There was a spot at that time, because it's a committee of nine members, although they do have a lot of friends. And I was fortunate to join in the year that they were revising the ASA guidelines, which have to be revised every five years. I got to watch with awe as a subgroup met every two weeks and talked about how they would broaden those guidelines to incorporate data science and statistical practice across disciplines.
At about the same time, I was invited to be part of the Academic Data Science Alliance as they were coming up with their own guidelines. And the group decided we had enough guidelines: the ASA's are good, the ones for computing scientists are good. So why don't we create a tool instead? Which, to me, was like, this is great. And then I also became very involved in their work focused on societal benefit. So that's not really answering what ethical dilemmas I've faced in my career, but it's why I find making ethical decisions hard, and what I've set out to do to maybe make it easier, not only for me but for others as well.

John Bailer
So Donna, you want to jump in with some sort of your sense of some cases or places where data scientists encounter decisions that need to be informed by ethics?

Donna LaLonde
Yeah, actually, we probably could have just titled the article Making Decisions Is Hard. I think one of the reasons I was so excited to see the work of the Academic Data Science Alliance is that I thought their focus on case studies aligned really nicely with the ethical guidelines for professional practice that the Committee on Professional Ethics had been involved in revising, and that the ASA board then approved. And I think the reason that making ethical decisions is hard, or maybe the two top reasons in my way of thinking: one is that there's often a power differential, and it's really hard to navigate that power differential in your day-to-day work, right? If you're a junior investigator and there's a more senior investigator, it can be difficult, not to say that all conversation is too difficult, but it can be difficult to navigate a concern or a potential place for disagreement about what's the best practice. And that's part of where I think the melding of case studies and the ethical guidelines is really powerful, because it lets you practice before you're actually confronted with having to deal with a potential issue. The other issue I became more aware of, as I was sitting in on the deliberations of the Committee on Professional Ethics, is that there are a lot of stakeholders, and all of those stakeholders bring different perspectives and have different contexts. So navigating that landscape, which is really complicated, also takes practice. So, not specific examples of ethical decision making being hard, but the bigger picture, which I think the ADSA tool and the ethical guidelines help support.

John Bailer
You know, one of the things that I find interesting about discussions of professional ethics, ethics and data analysis, is that it's something that has evolved over time, you know, that you have this history. And you mentioned that in your article as well, going back to the late 1940s. So I was wondering if you could give a little bit of a sort of a lead into what was some of the history of research ethics, that then led to kind of this latest layering of considering data science issues?

Stephanie Shipp
I started with the Belmont Commission, mainly because that is the foundation for the IRB, the Institutional Review Board processes; at least in the social sciences, we have to file an IRB protocol for every project that we undertake. Amazingly, there are a lot of disciplines that don't have to do that, although at UVA that's somewhat different. But the Belmont Commission started because of the ethical failures of researchers, primarily in the United States, that were coming to the surface. Perhaps the most famous is the Tuskegee syphilis study, conducted over a period of more than 40 years, in which African American men were subjected to a study watching the progression of syphilis, even after penicillin had been discovered. And they were not told about the treatment, violating every ethical principle by today's standards. Because of that, I sort of wanted to say, okay, how far back does this go? It's not that ethical discussions haven't gone on for a long time, but the first written code that I could find was the Nuremberg Code, which was a result of the atrocities of World War Two. It had 10 ethical principles, and they were really clearly written, but ten is a lot to remember. And so 30 years later, when the Belmont Commission formed around 1979, I think they realized that, and they came up with three principles. Respect for persons, which means you must be able to volunteer for the study, and you must be able to withdraw from the study. And that goes to the point that Donna made about power differentials: you know, if there's somebody in authority telling you you have to be part of that study, you may feel you have no choice, but that's not true. Then beneficence, understanding the risks and benefits of the study; you have to weigh that with doing no harm and maximizing the benefits over possible harms. And then justice, deciding on the risks and benefits so the research is distributed fairly.
I think these are really important. But I also think their language is a bit hard to deal with sometimes, hard to, you know, wrap your arms around, and that's why I would advocate that you do need new tools and new ways of thinking. So that's a little bit of the history, but I think Donna's perspective was also really insightful when we looked at that and how we might be expanding our look at what the Menlo Report did as well.

John Bailer
So Donna, did the ASA have sort of guidelines for professional ethics?

Donna LaLonde
It was informed by some of these discussions and the Menlo Report. Well, actually, the most recent revision was approved prior to the work that Stephanie, Wendy Martinez, and I have been doing, and we've since been joined by an ethicist colleague of Stephanie's. Stephanie mentioned she was on the Committee on Professional Ethics at the time the working group was working on the revisions, and so it certainly acknowledged the existence of the Menlo Report, and obviously the Belmont Report. I'm excited about the opportunity and feel it's really critical that the ASA play a role moving forward. Now we're talking about artificial intelligence technologies, and how those technologies are going to impact science, but also society. I read, and I think I'll get this close to correct, if not a direct quote, that Tim Berners-Lee has said recently that in 30 years we'll all have a personal AI assistant, and it's up to us to work to make sure that it's the kind of assistant that we want. And I think that's a really important conversation that needs to be informed by the American Statistical Association; obviously the ADSA group is really important as well, and the Association for Computing Machinery. It has to be collaborative, because data science and AI are collaborative, but we have to be focused on it, right? And so I'm kind of excited that we might be able to use this Chance article as a jumping off point to figure out how to move that conversation forward and how to build some consensus. I'll just share one other reading. I don't know if you all know it, because I've just started reading the book, The Worlds I See by Fei-Fei Li, who I guess now is being called the godmother of artificial intelligence, right? But anyway, in one of the chapters of the book, she says something like, we're moving into a world where AI goes from being in vitro to being in vivo, and I thought that was spot on.
And we have to be paying attention.

John Bailer
Well, you're listening to Stats and Stories. Our guests today are Stephanie Shipp and Donna LaLonde. Ethical uses of data have been legislated in parts of the world, including through the European Union's General Data Protection Regulation. Are similar laws starting to emerge in the United States?

Donna LaLonde
Well, I'm not an expert on the laws, I would say similar conversations are happening. And I know that NIST, the National Institute for Standards, is leading the way by having framework conversations. Obviously, the White House issued a memo on artificial intelligence. So I don't, I'm not aware of laws. But I think certainly we're talking about how AI needs to be legislated.

John Bailer
So my question in part was sort of thinking about what some of these rules of practice are. In your article, you talk about the importance of ethical decisions throughout the entire investigative process. And one aspect of that was data security and how you deal with the data, and how this is a matter of trust. And that immediately got me thinking about things like the GDPR rules that were really codifying and enforcing this idea. So that was an example of saying, look, there are certain informed uses of your data. So this is tying into some of those issues you mentioned about informed consent, risks, benefits, and otherwise. Can you talk about some of the other components of an analysis where ethical decisions come into play? I mean, you know, Stephanie, you kind of hinted at it when you were talking about this idea of implicit bias that might be part of an analysis. Maybe you could expand on that a little bit for us.

Stephanie Shipp
Sure. I'll go back to your GDPR question for a second. I mean, that's primarily on the commercial side, making sure that companies aren't misusing the data in ways that could cause unintended consequences. Claire McKay Bowen has written a book, Protecting Your Privacy in a Data-Driven World, and I highly recommend it, and maybe highly recommend her; maybe she's already been on Stats and Stories. She would be the expert to talk about that specific legislation. But definitely, in terms of implicit bias, that's probably one of the hardest parts. Because we all think we're ethical. We all think we're very objective when we're doing our work, primarily as statisticians or economists or anyone in a quantitative field. I think addressing it takes constant conversations and training. And I'll just give a really simple example from work that we were doing a few years ago, where we were bringing data and science to inform, promote, and support economic mobility in rural areas. It was a three-state project; we were working with colleagues in Virginia, Iowa, and Oregon. And one of the professors, and this is what I find with ethics: when you see solutions, they're deceptively simple and elegant, but thinking of them ahead of time is not always so easy. Anyway, this professor and his students were just starting out working on a project in rural areas. So he used a Mentimeter. It's a tool that collects data, or answers, from a team or a group anonymously, and then provides some analysis. In this case, he did a word cloud. He just asked them a really simple question: what is life in rural America like? And these students, you know, quickly started putting in a lot of words and keywords and their thoughts. But when the word cloud showed up, they immediately recognized their implicit bias.
There were a lot of positives or neutrals: they talked about rural areas being quiet, hardworking, healthy, small towns, crops, farming. But there were also a lot of negatives: uneducated, ignorant, isolated, forgotten, not optimal. They then went into their project working in a rural area with their eyes wide open. They now understood: when I'm looking at the research questions, are we asking about problems that were mutually identified with the community? They could now ask, am I being biased? When they were looking at the data sources they were using: will these data sources have unintended consequences? What about my analysis? What are the results? Will they harm a particular group, maybe to the benefit of another group? So I thought that was just a very simple but excellent way to teach implicit bias, specifically in the context of a research project. And that got me excited.

John Bailer
So when you think about the kind of workflow in a data analysis project, there's also the analysis that occurs, there's modeling, there's prediction, and there are ethical issues even in how you train a model, how you build a model to make predictions for other cases. Could you talk a little bit about how that might play out in terms of an ethical concern?

Donna LaLonde
Well, I'll just jump in and say, I think we've started to appropriately pay more attention to vulnerable populations, right? And so if the data set isn't reflective of the population, then the model is going to be flawed. And I think, you know, we're all probably familiar with some of the concerns about facial recognition, right, where white faces are more likely to be recognized than the faces of people of color. So I think it starts with the data that's being collected. Then there's also, I think, what we talk about as models being black boxes. Do we really understand what the model is doing? Or do we just sort of trust it? And I think that many in our community are moving us to be more aware that we need to have interpretable machine learning, right? We need to understand what the model is doing, because otherwise we're likely to make flawed decisions. And I guess, John, I'll just say one thing: I think I left the T out of NIST. So I want to make sure I give a shout out to the National Institute of Standards and Technology, right.

John Bailer
Nailed that answer to a tee. Perfect. Yeah. So it's interesting, when I was looking at some of the discussion in the paper, you talked about the idea that some of these frameworks, like the ADSA Ethos, use different lenses to think about, you know, the work that's being done. Could you give a couple of examples of such lenses and why they're important?

Stephanie Shipp
I'm happy to jump in on that one. So I think in their case studies, they gave good examples, and one of the simplest ones, which they say was the simplest way to tell the story or get people thinking about this, was using cell phone data to conduct a census, and they just focused on the life cycle stage of data discovery. And of course, data discovery led them to ask: using cell phone data, what are the kinds of questions you might ask? It would be like, what was the motivation of the company for sharing their data? Are they sharing a complete set of data? What are the challenges with the data? Are they willing to be forthright about that? Or is it, again, a black box? And if it's a black box, maybe you can validate those data using other data sources. It's really going through the whole lifecycle and asking those questions, and seeing how important problem identification is, first, to identifying the data sources that are relevant. And then really questioning: How were the data born? What's the motivation for providing them? What's missing in those data? And what kind of biases might be implicit in the data as well? And then again, always the ultimate question: how might this harm one group at the risk of benefiting another? With cell phone data, in some countries that may be all they have; they may not have the resources to conduct a census. But then how might you validate that if you are using it? So it's always weighing the pros and cons, the limitations and the caveats, with the benefits.

John Bailer
You know, it's interesting, as you're talking about some of these applications, in certain places you can't get other kinds of data; they're not even available. And I know that these existing datasets are becoming more and more important to our friends in the official statistics community, just because they're a great supplement to the data sources they can find. But I'm curious about this idea of provenance of data, just sort of knowing where it comes from. And that's also something that makes me think a lot about the models that are being used, whether they're the generative AI models or others that are being used for prediction. A lot of times, in the good examples that you've given, people have provided a lot of detail about where their data comes from and their analyses, and they share the models on GitHub or some other repo. It's almost this kind of let the light shine in, and you can see what I've done. So is this a sea change in terms of how people are being asked to think when they're doing an analysis, thinking, when I publish my results, I'm also publishing everything that goes into it?

Donna LaLonde
So, I hope so, John. I think, and I hope, that we, the members of the American Statistical Association, are leading the way on that, which obviously builds on lots of great work around reproducibility and replicability. But I wanted to come back to your data provenance question and bring in another group of folks that I think we explicitly want to acknowledge, who need to be a part of the ethical decision making education process, and that is students and teachers. Not just at the undergraduate level, not just graduate students, but K-12. And I think a lot about this, because I don't know if we are doing a sufficient job of describing the data provenance of the secondary data sources that teachers might bring into their classrooms. And I think that's on us, right? So, with the work that we are doing at the research level, where we're asking researchers to make their code available, make their data available, I think we need to be thinking about how we're describing the data sets that might be part of an educational experience, so that students are practiced in recognizing the provenance and the ethical concerns that could arise. So I wanted to make that explicit. And I think that's the kind of nice complement that the ethical guidelines and the ADSA Ethos project bring to mind for us, right? Because the lenses are really interesting in terms of a socio-technical view, and the guidelines are really focused on you as the individual statistical practitioner. And I think if you take those two together, we actually have a powerful way in which to both educate and make sure that, in practice, researchers and data scientists and statisticians and computer scientists are behaving ethically.

John Bailer
You know, I'm really glad that you all have done this type of work, so I sort of, you know, raise my cup of water to you in salute, because I think it's so important to have these. When I taught data practicum classes, I would use these as an early assignment for the students to start thinking: you're using data from someone, you have a responsibility to treat that with respect. We also used to bring people in to do the IRB training with these classes, just to get them thinking about it. But I really love this idea of how do we push a conversation of where does data come from, and what is your responsibility to handle it appropriately, not just thinking that you can mechanically process it. I'm curious now, just as we're sneaking up on a close here: what do you see as some of the future issues or challenges thinking about ethics and the practice of data science and statistics?

Stephanie Shipp
I think we've already discussed some of them with AI and how we go forward. Donna, Wendy, the other co-author on the paper, and I have been talking about whether there needs to be a Menlo Commission, version 2.0. And Donna brought up education at a young age. I remember when my daughters were learning statistics in first and second grade, I was so excited. But now, how do you incorporate: okay, where did the data come from, and what are the ethical dimensions of this? You need, of course, to make those words a little easier to look at. I also think what I learned the most from this article was the benefit of looking across disciplines. I have a colleague who likes to say statistics is the quintessential transdisciplinary science. And in this article, we brought together science and technology studies, through these four lenses, through the ADSA tool. I learned a lot from that. Again, a lot of the language around ethics, though, I think is very hard to grapple with, and I wish there were a way to simplify that language; but understanding the concepts is also important. We also looked at the computer and IT world through the Menlo Report. But it's also just beginning, looking at these from a cross-disciplinary perspective, which is what statistics does, but encouraging even more of that, because of how much we learned just in doing this article and looking across disciplines. And then finally, just one last point: when I gave my very first talk on statistics, and I think now, in hindsight, how bold I was, not being an expert, and still not an expert in this field, somebody from industry stood up and said, how do we bring this to industry? And she meant it. But I don't think industry always feels that way. So how do we bring in these ethical dimensions of using data, which is part of the premise of the GDPR, and the teeth behind it?

John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Stephanie and Donna, thank you so much for joining us today.

Stephanie Shipp
Thank you.

Donna LaLonde
Yep, thank you for having us.

John Bailer
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


The Art of Writing for Data Science | Stats + Stories Episode 320 by Stats Stories

Sara Stoudt is an applied statistician at Bucknell University with research interests in ecology and the communication of statistics. Follow her on Twitter (@sastoudt) and check out her recent book with Deborah Nolan, Communicating with Data: The Art of Writing for Data Science.

Episode Description

Communicating clearly about data can be difficult but it’s also crucial if you want audiences to understand your work. Whether it’s through writing or speaking telling a compelling story about data can make it less abstract. That’s the focus of this episode of Stats+Stories with guest Sara Stoudt. 

+Full Transcript

Rosemary Pennington
Communicating clearly about data can be difficult. But it's also crucial if you want audiences to understand your work. Whether it's through writing or speaking, telling a compelling story about data can make it less abstract. Communicating with data is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me as always is regular panelist, John Bailer, emeritus professor of statistics at Miami University. Our guest today is Sara Stoudt. Stoudt is an applied statistician and assistant professor of mathematics at Bucknell University with research interests in ecology and the communication of statistics. She's the author, with Deborah Nolan, of the book Communicating with Data: The Art of Writing for Data Science. Sara, thank you so much for joining us today.

Sara Stoudt
Yeah, no problem. Thanks for having me.

Rosemary Pennington
You have been doing a lot of work around data communication, about writing about data, why did communicating data become this passion of yours?

Sara Stoudt
Yeah, it started sort of serendipitously, in that Deb Nolan, when I was in grad school, was thinking about teaching this class for undergrads, and reached out to me about maybe helping out. And at that point, I hadn't really thought of myself as a writer; like, how do I claim that title? But through working on that class, and then writing the book after that, we both sort of had to grapple with: yes, we're statisticians, but we do a lot of communicating, and at some point we have to claim that title of writer as well. And so I think by starting with that process and really working through the book, I got more into it and thought about how I might apply it to my teaching more and how I might apply it to my own work, and it sort of snowballed from there.

John Bailer
So now, I gotta ask you, are you a better writer now?

Sara Stoudt
Maybe I think I'm a better writer now. I think that I think more about my writing now than maybe I did before. I don't know if that helps or hurts, but I think that I pay more attention to it. And when I'm doing other things, I'm thinking more about reading, like when I'm reading just for fun. Now, I'm like, in my head about that a little bit. And I think that's a good thing.

John Bailer
No, I agree completely. I really love seeing the diversity of ways that you approach ideas in writing, ranging from a Significance article on reading to write, to another Significance piece: can TV make you a better stats communicator? So I'd like to explore those, maybe in reverse order, because I think the one about the TV shorts, these sort of small episodes, offers a model. Can you give a reason why you were inspired to connect what was going on in these small, episodic television shows to what that might teach us about writing?

Sara Stoudt
Yeah, I think for me, I was writing a lot of talks. And I was thinking about, like, zooming out: how am I writing this talk? Because you give a talk for lots of different audiences, and the job talk is maybe a little bit more formal, but more recently I've been doing more talks for broader audiences, and I had to mix up my approach. And it's also just the 20 minute versus the 40 minute versus the five minute talk; all those take different structures, and I was trying to think about that. But at the same time, it was right after the depths of the pandemic, and I had just watched a lot of TV, frankly, rewatching a lot of my old favorite shows, but from the beginning, and really paying attention to the pilot: how much actually has to get done in the pilot to set things up. And you don't appreciate, until you know what the story actually is, how much effort went into that. So I was thinking all about that, and I was thinking, oh, this is sort of related to how you do a talk: you know the whole storyline, so how do you set it up when you only have so many minutes to get the point across? So part of it was me justifying watching so much TV. Another part of it was just: how do you write a good talk? I think it's sort of elusive, and doing it for different audiences and different time slots, having a good structure, I think, can go a long way. And that was the motivation for that piece.

Rosemary Pennington
As you've been doing this work on sort of communicating data broadly, have you noticed things that are particular hiccups for you and how have you sort of worked around them?

Sara Stoudt
That's a great question. Yes, I have many hiccups. I think that sometimes, and you might see it today, like I can tend to monologue and in my head, I'm like, yes, this all is gelling. But because I have all of this extra context, I forget that the connections are not necessarily being made by the audience. Right? It's sort of like the stream of consciousness makes sense for me, but not for everyone. And I think that gets back to the planning. And I think a lot of the work I've done recently is the planning of writing. Because you have to take that step back. And I think we can just sort of forget to do that, because we're pressed for time we're reading that talk on a plane, you know, you just don't have that sort of zoom out, like, “what am I saying” moment? So I think that gets, like planning the talk. The reading too, right? Just slowing down, I think, is my biggest hiccup. I'm sort of like, Oh, I gotta do this, I gotta do this. But if I take the time to breathe, and zoom out, like, what am I saying? What is the goal? What's the best way to do this, even starting with pictures? Sometimes I just start the talks with like, all of the plots, or the little doodles that tell the story. I think that has helped me a lot too, because I think I can just sort of jump in too quickly, and then get in the weeds. So I've been trying to pull myself out of that.

John Bailer
Yeah, I recognize that same temptation. And, you know, when I've done this kind of writing, I think a lot about having to pull out and think big picture. One thing that really struck me when I was reviewing some of your slides from the storyboarding talk you gave is that, as part of the process of writing, the punch line gets organized in the form of a narrative. And one of the things this podcast has taught me is a lot more about the narrative that goes along with an analysis, or with any kind of work that you're doing in research. So can you talk a little bit about the insights you've gained about structure from the idea of storyboarding?

Sara Stoudt
Yeah, I think the main thing is that when we do statistical work, we're so proud of all the stuff we did. We're like, I did this, I did this, and I did this fancy thing. But ultimately, that's not what the reader cares about; they want to know what you found. So there's this temptation: you want to show what you did, but that's only ancillary to what you actually are trying to say, which is the findings. And trying to do that gets back to the taking a breath: you have to switch gears from doing the stuff to saying, what is the big picture? And so I think the storyboarding helps you shift gears. It's like, don't talk about what plots you made or what analysis you did; what are the common themes? What did you find? How does this connect to a bigger picture? It also makes you sort of kill your darlings: you can't put every plot in a paper or a talk, so you have all these things and you have to whittle them down. The storyboarding is iterative, and it's really tactile; there are, like, no numbers involved, maybe some plots, and you're rearranging. So I think it helps you shift that gear. And I do this all the time now, to write a talk or write a paper; I'm a very tactile writer. And I think doing that activity with students has really helped us all shift gears, right? Fewer reports have, like, "I made a histogram of this, I ran a regression," and more just, "this is skewed left, which means this, and the regression tells me this." I think helping us get towards that sort of language is what motivated the storyboard and why I keep using it.

Rosemary Pennington
In my past life, when I was a journalist, I did science and medical reporting towards the end of my time, and I loved it. I absolutely loved it. But it was always a little tough to sometimes get scientists to talk to me, because they were always so scared that their work would be misconstrued. Or they were concerned, and I had more than one say it, that, you know, five minutes is not enough time to communicate whatever it is. So I guess: what advice would you have for statisticians or scientists or anyone who has data they want to communicate, around the fear that they're not going to have enough time to tell it clearly? Or that they're not going to do their work justice if they have to make it very simple, or turn it into a narrative?

Sara Stoudt
Yeah, I definitely feel that tension. Statisticians are so annoying that way, right? I think it comes down to the level of detail. It's like, maybe we don't want to talk about that one regression result in five minutes because there's nuance, but that regression result means something in context, and you want people to know about that thing. So, not to sound like a broken record, but I think it comes down to the zoom in and out thing. I think you can zoom out in five minutes: what's the impact of your work? Let's not try to explain the details of how you got there in that form of communication, perhaps. But I think it's hard, because that's not the part we get the most practice with. We're in the weeds most of the time, and so trying to navigate that is challenging. But I feel that tension too. Sometimes I'm like, oh, I don't really want to explain what I'm doing right here until it's perfect. But then how are you going to get your work out there? So it's a balance, but maybe focus on the impact first, and try to get away from the things you feel most worried about the precision of.

John Bailer
You know, what you just said really resonates: this idea of what do you spend most of your time doing, what is the focus of your effort. One of our former colleagues, Richard Campbell, was fond of saying that people are the best writers they'll ever be when they're just getting out of composition after their first year at the university, because they don't write a lot after that. And, you know, the idea is that you become a better writer by writing, and having some structure, I think, really catalyzes that in a great way. So I find the challenge is trying to help get people out of the fully technical focus, and then expanding it to think about, okay, how do you take the technical out to the broader community? So what are things you've been doing to help the students you work with, and the communities you interact with, to do that?

Sara Stoudt
Yeah, I think one thing is just the fact that if you think about the structure of a typical assignment, you do a final project, you turn it in, and then that's it. Right? You don't get the chance to iterate, and that's where you start to get at the: what is this really saying? So what we've done at Bucknell is add the iterative process into the project more. We actually teach a writing-intensive designated intro stat course, and that comes along with having to do revision throughout the semester. Students get tons of feedback from peers and from the instructor, and they rewrite different parts that come together as a full report. They just get to spend more time noodling on it, for lack of a better word. I do think we still need to push more on zooming out: what's the big picture? Because we spend a lot of time on the preciseness of how they're talking about the results, what that significance level means, that kind of thing, in that kind of class. But I think just building in time to revise before the final deadline goes a long way. It's hard because it takes a lot of feedback time in the semester, which is challenging to do quickly, especially at scale. But I think you have to show students that revision is part of the process, and to do that, they have to revise the final project, which means pushing back deadlines so that you have time for it. The context part is important too, and I actually want to do more with that, because I don't think I'm doing a great job of pulling that out. I feel that tension: content seems king, but as they keep progressing as statisticians, we should be thinking more about those conclusion sections and trying to workshop those more than the results sections, which is what we ended up having to focus on, at least in that class.

Rosemary Pennington
You're listening to Stats and Stories, and today we're talking with Bucknell University's Sara Stoudt about communicating with data. Sara, I'm going to take this question slightly sideways; I ask this as a former journalism professor. Revise, yes, we had those kids revise until the end of the semester. But I wonder what advice you would have for a working journalist who is trying to report on data. Most of us are generalists, and many of us are not comfortable with numbers and stats. That's a stereotype that lingers because there's some truth in it. We want to communicate this clearly, because we think it's important to our audiences. So, given what you've been doing, what advice would you have for journalists when it comes to reporting on stories that involve data, whether it's complicated or not?

Sara Stoudt
I think one thing is: have a buddy. Statisticians, we're friendly. Find someone you work well with and workshop it that way, because I have collaborators who just help me write better in general, and I think journalists can have that too. I would love to see more cross-pollination there, because statisticians want to be able to write better for broader audiences, too. So that seems like a win-win. There are some common statistical things that everybody is fussy about, and it's worth doing a little reading up on those. I'm not saying do more work, because I know journalists are busy and doing important things, but maybe a little community that talks about some of those big-ticket items: how to report on a p-value, how to report on a confidence interval. It's dry, but that's the stuff that gets you, so maybe do it in a more community setting. Maybe I should start getting a group of statisticians and journalists together to do that. Because as teachers, we face that too: how many ways can I explain this? It's still confusing. So it's good for us all to practice, I think. But I don't have any magic solutions. You never know, I guess.

John Bailer
Yeah. So, before we started this podcast, I team-taught a class with a journalist, Richard Campbell, quite a while ago, and it was interesting to think about how different his style of writing was from what I was thinking about and had done professionally. There was a sharpness and focus to what he brought to writing that I found myself surprised by, not in a bad way; it was just such a different style. There were multiple epiphanies for me about how often, in my own writing, I wasn't getting to the point as quickly as I could have; I was spending so much time talking about process that I wasn't getting to the punch line with the emphasis it really deserved. So for me, as a statistician, the exposure of working with journalism colleagues has helped me become a much better writer and communicator, because of trying to think: gosh, if I tried to do what they're doing, what does that mean in terms of how I produce a written or oral product? It sounds like you've learned a lot going through these processes, and also from the examples you found, whether from pilots of a television show or from other models. I know you're sort of thinking that a question will eventually emerge, and I'm wondering, yes, I always wonder, and that's always the problem here.
So I would like to get back to this idea of the pacing and timing of a story as it parallels, say, a Big Bang Theory episode. I love the idea of these parallels: early on introducing characters and context, then introducing some conflict, some resolution to that conflict, and a punch line at the very end. So could you talk us through those parallels, starting out where the characters meet, and what that means in terms of statistics, and then going through the rest, please?

Sara Stoudt
Yeah. So if you're giving a talk about your own work, you know everything, but people literally don't come in with any context, and they have to care about it by the end of your talk, because you want them to follow up; you're not going to tell them everything. Same with the pilot: you have this 20-minute period to hook them and have them come back, and you have to set up everything. They don't know anything about the characters or the setting, or what the show is going to be about, so you have to cover a lot of ground. If you think about how you want to present your work, people have to understand why you're doing the work, because that's part of getting them there. Why is your work hard? Why is it a big deal that you're doing it, and how does it connect to what other people might be doing? In the first talk people hear from you, it doesn't even matter how you're doing the thing. They just need to know why you're doing the thing, and what makes it interesting or hard enough to be worth doing, because they'll follow up and read the paper after that if they care. Same with the pilot: they'll keep watching the show once they're brought in. So I think you have to strip it way back to when you started the project. Why did you pick it as an interesting problem? Who brought you the context? If you're a statistician working in an applied field, you also have the challenge of talking about the context of the work. I work in ecology, so if I'm presenting at a stats conference, there's some baseline ecology I also have to cover in that talk. So you can imagine, okay, maybe the ecology terms are like the characters; you've got to learn what they're about. You have to learn what the major conflict is. There's an ecological conflict: why do I care from that point of view?
But then there's a statistical conflict: why is this a stats problem that's hard? And you go from there. But once I've described all that, do you have, like, 20 minutes?

John Bailer
No, that helps a lot. The idea of the images ties back to some of your storyboarding. I love the idea of putting all of your plots on some display, moving them around, maybe connecting them in terms of the story you want to tell, and axing out the ones that aren't effective. When I taught visualization or other kinds of data practicum classes, I would often say you'll make more than ten times the number of figures you'll ever include in a report, just because you're trying to find the right way to tell the story. And ultimately, I often found that if I could generate the figure that spoke to me, I could write the text that would describe it to others. So do you find the visualizations relevant and important as input and inspiration for the text that you produce?

Sara Stoudt
Yeah. Actually, I've been doing a lot of this not even on the computer, but sketching: what is the graph I want that will show me what I need? Or, what do I expect this to look like if what I'm thinking is true? And then trying to make that graph. Because I think sometimes when I'm just making graphs on the fly, I'm making ones that are easy for me to code but are not necessarily the right graphs. So I've been doing a lot of that kind of doodling, and I think it has helped. Especially if you're thinking about the right conceptual diagram for explaining your work, that's also something I need to draw first, because I'm not great with the shapes in Google Slides or whatever. But I think it has really helped me solidify the story, because sometimes if I'm just looking at a bunch of scatter plots and histograms, it's hard to really see what's going on. So I think about the maybe less traditional visualization that would really consolidate everything, and then try to think: is this a plot I can actually make?

Rosemary Pennington
So you've been doing this work for a while now: you've done work around how to present, and the storyboarding, and you have the book. What's next for you when it comes to stats communication? What do you want to be working on next?

Sara Stoudt
Yeah, I think for me personally, I'm thinking a lot about creative writing that's related to stats and data. So, thinking about either data or statistics concepts as constraints for something: could you write a poem that's constrained in a way that's informed by data? Or could you write short stories or speculative fiction that have these sort of data-y concepts? I think there's all this sci-fi now that has to do with climate change, or the rise of machine learning and the ethics of those things. I think we could also write more stats-focused fiction, not just for the sake of writing it; I could see it being a useful teaching tool. I'm personally just trying to break this false binary of: you're a quantitative person, or you're a creative type. So I'm really interested in trying to fuse those. Can we do more artsy things with data? That's what I'm thinking a lot about. I don't know if that's necessarily going to end up being my professional take on communication, but I'm really trying to do that for myself. When I started down this road, I didn't really claim ownership of the title "writer." Now that I feel like I can say that, the next hurdle is: are you a creative writer? Can I write more than just nonfiction? So we'll see where that goes.

Rosemary Pennington
Well, thank you so much for being here today, Sara. That's all the time we have for this episode. It's been great talking with you.

Sara Stoudt
Yeah, thanks for having me again.

Rosemary Pennington
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Data Visualization Contest Winner | Stats + Stories Episode 300 by Stats Stories

Nicole Mark is a visual learner and communicator who found her passion in the field of data visualization. She started out making maps of imaginary worlds and cataloging her volumes of The Baby-Sitters Club on her family's original Apple Macintosh. Now, she analyzes and visualizes data in Tableau and with code, always on a Mac! She writes about dataviz, life with ADHD, and the modern workplace in her blog, SELECT * FROM data. Nicole co-leads Women in Dataviz and the Healthcare Tableau User Group. She’s working on her master’s in data science at the University of Colorado, Boulder. Check out her Tableau site.

Episode Description

After producing hundreds of episodes, we have lots of data lying around. We made that data available to you, asking you to crunch the numbers for a contest telling the story of our podcast. The winner of that contest, Nicole Mark, joins us today on Stats+Stories.




Viral Statistical Capacity Building | Stats + Stories Episode 293 (Live From the WSC) by Stats Stories

Matthew Shearing is a private sector consultant working globally in partnership with the public, private and not-for-profit sectors on improving official statistics and other data systems, Monitoring and Evaluation, and embedding official statistics standards in wider international development.

David Stern is a Mathematical Scientist and Educator. He is a former lecturer in the School of Mathematics, Statistics and Actuarial Sciences at Maseno University in Kenya and a founding board member of African Maths Initiative (AMI).


Survey Statistics: Where is it Heading? | Stats + Short Stories Episode 292 (Live From the WSC) by Stats Stories

Natalie Shlomo has been Professor of Social Statistics since joining the faculty in September 2012, and was head of the Department of Social Statistics (2014-2017). Her research interests are in topics related to survey statistics and survey methodology. She is the UK principal investigator for several collaborative grants from the 7th Framework Programme and H2020 of the European Union, all involving research on improving survey statistics and dissemination. She was the principal investigator for the ESRC grant on theoretical sample designs for a new UK birth cohort and co-investigator for the NCRM grant focusing on non-response in biosocial research. She was also principal investigator for the Leverhulme Trust International Network Grant on Bayesian Adaptive Survey Designs. She is an elected member of the International Statistical Institute and a fellow of the Royal Statistical Society. She is an elected council member (to 2021) and Vice-President (to 2019) of the International Statistical Institute, and serves on the editorial boards of several journals as well as national and international advisory boards.


Are We Trustworthy? | Stats + Stories Episode 290 by Stats Stories

Communicating facts about science well is an art, especially if you are trying to reach an audience outside your area of expertise. A statistician in Norway, however, is convinced that how you say something is just as important as what you say when it comes to science communication. That topic is the focus of this episode of Stats+Stories with guest Jo Røislien.


C.R. Rao: A Statistics Legend by Stats Stories

The International Prize in Statistics is one of the most prestigious prizes in the field. Awarded every two years at the ISI World Statistics Congress, it is designed to recognize a single statistician or a team of statisticians for a significant body of work. This year's winner is C.R. Rao, professor emeritus at Pennsylvania State University and research professor at the University at Buffalo. Rao has made, and been honored for, a number of contributions to the statistical world over his more than 75-year career. That's the focus of this episode of Stats and Stories, with our guests Sreenivas Rao Jammalamadaka and Krishna Kumar.


Judging Words by the Company They Keep | Stats + Stories Episode 269 by Stats Stories

The close reading of texts is a methodology that's often used in humanities disciplines, as scholars seek to understand what meanings and ideas a text is designed to communicate. While such close readings have historically been done sans technology, the use of computational methods in textual analysis is a growing area of inquiry. It's also the focus of this episode of Stats and Stories with guest Collin Jennings.


Rewards Points vs. Privacy | Stats + Short Stories Episode 262 by Stats Stories

Everyone can relate to being in a rush and needing to get just one last item from the store. However, upon reaching the checkout line and hearing the all-too-familiar refrain of "can I get your loyalty card or phone number," you may wonder why this information is so important to a store. The annoyance, and the potential ramifications, of giving up your data so freely is the focus of this episode of Stats+Stories with guest Claire McKay Bowen.


Talking to a Statistical Knight | Stats + Short Stories Episode 259 by Stats Stories

Sir Bernard Silverman is an eminent British Statistician whose career has spanned academia, central government, and public office. He was President of the Royal Statistical Society in 2010 before stepping down to become Chief Scientific Adviser to the Home Office until 2017. Since 2018, Sir Bernard has been a part-time Professor of Modern Slavery Statistics at the University of Nottingham and also has a portfolio of roles in Government, as chair of the Geospatial Commission, the Technology Advisory Panel to the Investigatory Powers Commissioner, and the Methodological Assurance Panel for the Census.  He was awarded a knighthood in 2018 for public service and services to science. 

Episode Description

Sir Bernard Silverman is an eminent British Statistician whose career has spanned academia, central government, and public office. He will discuss his wide-ranging career in statistics with Professor Denise Lievesley, herself a distinguished British social statistician.




A Shared Passion for Math and Statistics | Stats + Short Stories Episode 257 by Stats Stories

At Stats and Stories, we love to have statisticians and journalists tell stories of their careers and give advice to inspire younger professionals and the next generation about what they can do with the power of data. However, we have yet to have a couple join us to talk about their careers and how statistics in Brazil has progressed over the past 30 years. That's the focus of this episode of Stats and Stories with guests Pedro and Denise Silva.
