Counting on Official Statistics | Stats+Stories Episode 360 by Stats Stories

Erica Groshen is a senior economics advisor at the Cornell University School of Industrial and Labor Relations and research fellow at the Upjohn Institute for Employment Research. From 2013 to 2017 she served as the 14th commissioner of the US Bureau of Labor Statistics, the principal federal agency responsible for measuring labor market activity, working conditions and inflation. She's an expert on official statistics, authoring an article in 2021 pondering their future.

Episode Description

When people think of public goods, they most likely think of things like parks or schools. But official statistics are also a kind of public good. They help us understand things like housing prices, the costs of goods and the spread of disease. However, this data infrastructure is under threat around the world. The work of official statisticians and the obstacles they face is the focus of this episode of Stats and Stories with guest Erica Groshen.

+Full Transcript

Coming Soon

Chart Spark | Stats + Stories Episode 359 by Stats Stories

Being able to create compelling data visualizations is an expectation of a diverse array of fields, from sports to journalism to education. But learning how to create charts that spark joy can be difficult if you're not confident in your abilities. A recent book is designed to help people become more comfortable creating compelling charts, and it's the focus of this episode of Stats and Stories with guest Alli Torban.

Read More

Randomized Response Polling | Stats + Short Stories Episode 341 by Stats Stories

Dr. James Hanley is a professor of biostatistics in the Faculty of Medicine at McGill University. His work has received several awards, including the Statistical Society of Canada Award for Impact of Applied and Collaborative Work and the Canadian Society of Epidemiology and Biostatistics Lifetime Achievement Award.

+Full Transcript

————————

John Bailer
Did you ever think that you could know something about a population based on measurements that you didn't know were correct for any individual, or even know what they meant for an individual in the population? That's something that's available through a method called randomized response. You can use it to ask about health care and sensitive health questions, which was the motivation when it was developed many decades ago. There's a recent paper in the Journal of Statistics and Data Science Education on investigating sensitive issues in class through randomized response polling, and we're delighted to have James Hanley joining us to talk a little bit about this project. James, welcome back.

James Hanley
Thank you very, very much.

John Bailer
Yeah, so, so randomized response in classroom settings. Can you just give a quick summary of what the randomized response method is for our audience?

James Hanley
Yes, the idea is that I'm facing you. You're answering a survey, and I would like to know whether you've cheated on exams or not. Well, not you, but the class.

John Bailer
Me? Never James. Me, never, no, no.

James Hanley
But what about your taxes? Or, you know, what about something else, or I didn't give a book back to the library, or whatever? For a group, you can work out what proportion of them have or not, with a certain plus and minus on it, by giving each person one of two questions to answer. They could be the flip of each other, or one could be an irrelevant question, like: Was your mother born in April? That's another version of it. Or: Did you cheat on your taxes? Those are two versions. So when I hear the answer yes from you, I don't know what it means. Are you talking about your mother? Are you admitting to cheating? The receiver can't interpret it. But when you put all the answers from the class together, they should come to a certain aggregate, and the aggregate is now a mixture of the two types of answers. So it's a mix. And if we know the mixing, which is what the probability of answering one way or the other gives us, we can then deconstruct it and separate out, at an average level, what's going on. So that's the basic idea of it. Yeah, it's very clever. It hasn't worked very well, though, in sociology and in surveys. I remember a seminar on this; they gave a talk about it when I graduated in 1973. The problem is that the general public doesn't understand it. They think you're cheating somehow, or recording it, or have a camera; that there's some way to do it. So I think it only works for a fairly sophisticated public. University students should be able to get it, but it's tricky. It's tricky. We were motivated because I was so annoyed that McGill wouldn't let us ask our students whether they had been vaccinated against COVID or not. It was a huge political war at our university. I was talking to my co-author, and I said, I am really steamed, and I've actually written up this way of doing it again and adapting it. And Christian Jenna, my first co-author, said, oh my goodness; he had written a popular article for a journal about it, but without a real example, and said, no, we've got to do this in class for real. Yeah. But the younger teachers at McGill didn't want to do it. They were afraid that the university would come down on them for breaking privacy laws, because in Quebec your medical record is private, and vaccination is part of your medical record. In your country, you had no problem: most American universities had no problem asking and insisting on vaccination. We were not allowed to, and it caused major trouble. And I sent the article to the provost the other day. I said, look, you know, out of necessity come methods, so we adapted it.
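
To make that "deconstruct the mixture" step concrete, here is a minimal sketch in Python (an illustration, not the episode's or the paper's exact design). It assumes the unrelated-question version Hanley describes: each respondent answers the sensitive question with probability p, and otherwise answers an innocuous question with a known yes-rate q, such as "Was your mother born in April?" The observed yes-rate is then a mixture that can be inverted to recover the class-level proportion.

```python
import random

def estimate_sensitive_proportion(yes_answers, p, q):
    """Recover the class-level rate of the sensitive trait.

    Each respondent answered the sensitive question with probability p and an
    innocuous question with known yes-rate q otherwise, so
        observed = p * pi + (1 - p) * q   =>   pi = (observed - (1 - p) * q) / p
    """
    observed = sum(yes_answers) / len(yes_answers)
    return (observed - (1 - p) * q) / p

# Simulate a class of 200 students, 30% of whom have the sensitive trait.
random.seed(1)
true_pi, p, q = 0.30, 0.75, 1 / 12   # q: chance a mother was born in April
answers = []
for _ in range(200):
    has_trait = random.random() < true_pi
    mother_born_april = random.random() < q
    if random.random() < p:          # the randomizing device picks the sensitive question
        answers.append(has_trait)
    else:                            # ... or the innocuous one
        answers.append(mother_born_april)

print(estimate_sensitive_proportion(answers, p, q))  # near 0.30, give or take the margin of error
```

No single "yes" reveals which question was answered, but the aggregate yes-rate, combined with the known mixing probability p, pins down the group-level proportion.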

Rosemary Pennington
You stole my question from me because I was about to ask you what spurred this particular–

James Hanley
Don't get me started. We're still upset at the university. In Quebec, your vaccination status is private. In Ontario and every other province, with a different kind of law, or way of, you know, handling civil liberties, they had it the opposite way: if you weren't vaccinated, they wouldn't let you into class, that's it. And the American reviewers of our paper had a tough time understanding why. Why? Why can't you ask? So we had a lot of trouble, and we didn't get it accepted right away. It was all about COVID in the first version, and then we didn't get accepted right away. We needed revisions, and we were all so busy, we didn't get to it. We revised the article two years later, and by that time the whole story was stale. So that's when we had to broaden it so that it could go to cheating or whatever. But the original impetus was that, and I say this in the article: in my own little class of 10 or 12, we repeated it. The one new twist we have is that you can repeat the survey with people, ask them several times, and average the answers, and that's what gets you a narrower margin of error. And in fact, one of the reviewers said, if I asked you often enough, I should be able to figure out, even for you, whether you were cheating or not, because the two mixtures will kind of diverge; you'll see one or the other eventually. But that was going too far.
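
The "ask several times and average" twist can also be seen in a short simulation; again, this is a hedged illustration of the general idea rather than the paper's exact design. Reusing the setup above, each student is polled repeatedly with the randomizing device, their answers are averaged before aggregating, and the spread of the class-level estimate narrows as the number of repeats grows.

```python
import random
import statistics

def run_survey(n_students, repeats, true_pi=0.30, p=0.75, q=1 / 12):
    """One randomized-response survey in which each student is asked `repeats` times."""
    student_means = []
    for _ in range(n_students):
        has_trait = random.random() < true_pi
        mother_born_april = random.random() < q
        answers = []
        for _ in range(repeats):
            if random.random() < p:      # device picks the sensitive question
                answers.append(has_trait)
            else:                        # ... or the innocuous one
                answers.append(mother_born_april)
        student_means.append(sum(answers) / repeats)
    observed = sum(student_means) / n_students
    return (observed - (1 - p) * q) / p  # back out the class-level proportion

# Spread of the estimate for a class of 30, across many simulated surveys,
# as each student answers more times; the spread (margin of error) shrinks.
random.seed(2)
for repeats in (1, 3, 10):
    estimates = [run_survey(30, repeats) for _ in range(2000)]
    print(repeats, round(statistics.stdev(estimates), 3))
```

It also hints at the reviewer's caution: with enough repeats, an individual's own average of answers starts to separate toward one of the two recognizable mixtures, which is the privacy leak Hanley notes would be going too far.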

John Bailer
Well, I'm afraid that's all the time we have for this rather short but very interesting episode of Stats and Short Stories. James, thank you so much for joining us.

James Hanley
It was a pleasure.

John Bailer
Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, Apple Podcasts or other places where you can find podcasts. If you'd like to share your thoughts on our program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.

————————

The Nation's Data at Risk | Stats + Stories Episode 339 by Stats Stories

The democratic engine of the United States relies on accurate and reliable data to function. A year-long study of the 13 federal agencies involved in U.S. data collection – including the Census Bureau, Bureau of Labor Statistics, and the National Center for Education Statistics – suggests that the nation’s statistics are at risk. The study was produced by the American Statistical Association in partnership with George Mason University and supported by the Sloan Foundation, and it is the focus of this episode of Stats+Stories.

Read More

Getting Into Music Statistics | Stats + Short Stories Episode 330 by Stats Stories

Dr. Kobi Abayomi is the head of science for Gumball Demand Acceleration, a software service company for digital media. Dr. Abayomi was the first and founding Senior Vice President of Data Science at Warner Music Group. He has led data science groups at Barnes & Noble Education and Warner Media. As a consultant, he has worked with the United Nations Development Programme, the World Bank, the Innocence Project, and the New York City Department of Education. He also serves on the Data Science Advisory Council at Seton Hall University, where he holds an appointment in the mathematics and computer science department. Kobi, thank you so much for being here today.

Episode Description

We’ve always said on this show that data science is a gateway to other fields. From climate change to medical research, knowledge around numbers can be useful in just about every aspect of life. This is why we’ve brought back Kobi Abayomi to talk about his journey using data to get into the music industry on this episode of Stats+Short Stories.

+Full Transcript

Coming Next Week


Making Ethical Decisions Is Hard | Stats + Stories Episode 321 by Stats Stories

 

Stephanie Shipp is a research professor at the Biocomplexity Institute, University of Virginia. She co-founded and led the Social and Decision Analytics Division in 2013, starting at Virginia Tech and moving to the University of Virginia in 2018. Dr. Shipp’s work spans topics related to using all data to advance policy, the science of data science, community analytics, and innovation. She leads and engages in local, state, and federal projects to assess data quality and the ethical use of new and traditional data sources. She is leading the development of the Curated Data Enterprise (CDE) that aligns with the Census Bureau’s modernization and transformation and their Statistical Products First approach.

Donna LaLonde is the Associate Executive Director of the American Statistical Association (ASA), where she works with talented colleagues to advance the vision and mission of the ASA. Prior to joining the ASA in 2015, she was a faculty member at Washburn University, where she enjoyed teaching and learning with colleagues and students; she also served in various administrative positions, including interim chair of the Education Department and Associate Vice President for Academic Affairs. At the ASA, she supports activities associated with presidential initiatives, accreditation, education, and professional development. She is also a co-host of the Practical Significance podcast, which John and Rosemary appeared on last year.

Episode Description

What fundamental values should data scientists and statisticians bring to their work? What principles should guide the work of data scientists and statisticians? What does right and wrong mean in the context of an analysis? That’s the topic of today's Stats and Stories episode with guests Stephanie Shipp and Donna LaLonde.

+Full Transcript

John Bailer
What fundamental values should data scientists and statisticians bring to their work? What principles should guide the work of data scientists and statisticians? What does right and wrong mean in the context of an analysis? Today's Stats and Stories episode will be a conversation about ethics and data science. I'm John Bailer. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Rosemary Pennington is away. Our guests today are Dr. Stephanie Shipp and Donna LaLonde. Shipp is a research professor at the Biocomplexity Institute at the University of Virginia and a member of the American Statistical Association's Committee on Professional Ethics, the Symposium on Data Science and Statistics committee, and the Professional Issues and Visibility Council. LaLonde is the Associate Executive Director of the American Statistical Association, where she supports activities associated with presidential initiatives, accreditation, education, and professional development. She's also a co-host of the Practical Significance podcast. Stephanie and Donna, thank you so much for being here today.

Stephanie Shipp
Well, thank you for having us. I'm delighted to be here.

Donna LaLonde
Thanks, John. It's always fun to have a conversation on Stats and Stories.

John Bailer
Oh, boy, I love that. I love getting that love from another podcaster. So thank you so much.

Donna LaLonde
Absolutely.

John Bailer
So your recent Chance article has a title ending in an exclamation mark: Making Ethical Decisions is Hard! Well, I'd like to start our conversation with a little bit of unpacking of that title by having you describe an example or two where data scientists encounter decisions that need to be informed by ethics.

Stephanie Shipp
I might start with that, because I'm the one that's always saying making ethical decisions is hard. And Donna seized on that and said, that will be the title of our article for Chance. And I'm like, okay, that's great. So I don't have examples, but I want to just start by saying that I'm always on the hunt for tools to incorporate ethical thinking into our work. And I find that conversations about ethics, especially with my staff, who are mostly young, a lot of postdocs and assistant research professors and students, often go flat. So when we try to have conversations about our projects in the context of ethics, their reaction is, well, I'm ethical, do you think I'm not ethical? Or, we only use publicly available data, so what's the big deal? And so we do a lot of things like the traditional implicit bias training, and that's helpful, but that's actually more individually focused. It does translate to projects, because implicit bias is one of the areas of looking at ethics in projects, but it's not the entire answer. And so the focus of my work throughout my career has always been on: how do we benefit society? And thanks to Donna, if you notice that I'm participating in three ASA activities, I didn't actually realize that until they were listed, and I'm like, that's why I'm always so busy. Okay, I digress. One of the first activities that I got involved in came about because I asked Donna if I could join the Committee on Professional Ethics. There was a spot at that time, because it's a committee of nine members, although they do have a lot of friends. And I was fortunate to join in the year that they were revising the ASA ethical guidelines, which they have to do every five years. And I got to watch with awe as a subgroup met every two weeks and talked about how they would broaden those guidelines to incorporate data science and statistical practice across disciplines. At about the same time, I was invited to be part of the Academic Data Science Alliance, and they were coming up with their own guidelines. And the group decided we had enough guidelines: the ASA's are good, the ones for computing scientists are good. So why don't we create a tool? And to me, I was like, this is great. And then I also became very involved in the part of their work focused on societal benefit. So that's not really answering the ethical dilemmas I faced in my career, but sort of why I find making ethical decisions hard and what I've set out to try to do to maybe make it easier, not only for me but for others as well.

John Bailer
So Donna, do you want to jump in with your sense of some cases or places where data scientists encounter decisions that need to be informed by ethics?

Donna LaLonde
Yeah, actually, we probably could have just titled the article Making Decisions is Hard. And I think that one of the reasons I was so excited to see the ADSA work, the Academic Data Science Alliance, is that I thought their focus on case studies aligned really nicely with the ethical guidelines for professional practice that the Committee on Professional Ethics had been involved in revising, and then obviously the ASA board approved. And I think the reason that making ethical decisions is hard, or maybe the two top reasons in my way of thinking: one is that there's often a power differential. And it's really hard to navigate that power differential just in your day-to-day work, right? If you're a junior investigator and there's a more senior investigator, it can be difficult, not to say that all of the conversation is too difficult, but it can be difficult to navigate a concern or a potential place for disagreement about what's the best practice. And so that's a part of where, I think, the melding of case studies and the ethical guidelines is really powerful, because it lets you practice before you're actually confronted with having to deal with a potential issue. I think the other issue that I became more aware of, as I was sitting in on the deliberations of the Committee on Professional Ethics, is that there are a lot of stakeholders, and all of those stakeholders bring different perspectives and have different contexts. And so just navigating that landscape, which is really complicated, also takes practice. So not specific examples of ethical decision making being hard, but sort of the bigger picture, which I think the ADSA tool and the ethical guidelines help support.

John Bailer
You know, one of the things that I find interesting about discussions of professional ethics, and of ethics in data analysis, is that it's something that has evolved over time; you have this history. And you mentioned that in your article as well, going back to the late 1940s. So I was wondering if you could give a little bit of a lead-in to some of the history of research ethics that then led to this latest layering of considering data science issues.

Stephanie Shipp
I started with the Belmont Commission, mostly because that is the foundation for the IRB, the Institutional Review Board process; at least in the social sciences, we have to file an IRB protocol for every project that we undertake. Amazingly, there are a lot of disciplines that don't have to do that, although at UVA that's somewhat different. But the Belmont Commission started because of the ethical failures of researchers, primarily in the United States, that were coming to the surface. Perhaps the most famous is the Tuskegee syphilis study, conducted over a period of more than 40 years, in which African American men were subjected to a study watching the progression of syphilis, even after penicillin had been discovered. And they were not told about the treatment, violating every ethical principle by today's standards. Because of that, I sort of wanted to say, okay, how far back does this go? It's not that ethical discussions haven't been going on for a long time, but the first written code that I could find was the Nuremberg Code, which was a result of the atrocities of World War Two. It had 10 ethical principles, and they were really clearly written, but ten is a lot to remember. And so 30 years later, when the Belmont Commission formed, around 1979, I think they realized that, and they came up with the three principles. Respect for persons, which means you must be able to volunteer for the study and you must be able to withdraw from the study; and that goes to the point that Donna made about the power differentials. You know, if there's somebody in authority telling you that you have to be part of that study, you may feel you have no choice, but that's not true. Then beneficence: understanding the risks and benefits of the study, and weighing that with doing no harm and maximizing the benefits over possible harms. And then justice: deciding on the risks and benefits so that the research is distributed fairly. I think these are really important, but I also think their language is a bit hard to, you know, wrap your arms around sometimes, and that's why I would advocate that you do need new tools and new ways of thinking. So that's a little bit of the history, but I think Donna's perspective was also really insightful when we looked at that and how we might be expanding our look at what the Menlo Report did as well.

John Bailer
So Donna, did the ASA have sort of guidelines for professional ethics?

Donna LaLonde
It was informed by some of these discussions of the Menlo Report. Well, actually, the most recent revision was approved prior to the work that Stephanie, Wendy Martinez, and I have been doing, and we've since been joined by an ethicist colleague of Stephanie's. Although, as Stephanie mentioned, she was on the Committee on Professional Ethics at the time that the working group was working on the revisions, and they certainly acknowledged the existence of the Menlo Report, and obviously the Belmont Report. I think I'm excited about the opportunity, and feel it's really critical, that the ASA play a role moving forward. Now we're talking about artificial intelligence technologies and how those technologies are going to impact science, but also society. I read, and I think I'll get this close to correct, if not a direct quote, that Tim Berners-Lee has said recently that in 30 years we'll all have a personal AI assistant, and it's up to us to work to make sure that it's the kind of assistant we want. And I think that's a really important conversation that needs to be informed by the American Statistical Association; obviously the ADSA group is really important as well, and the Association for Computing Machinery. It has to be collaborative, because data science and AI are collaborative, but we have to be focused on it, right? And so I'm kind of excited that we might be able to use this Chance article as a jumping-off point to figure out how to move that conversation forward and how to build some consensus. I'll just share one other reading. I don't know if you all have seen it, because I've just started reading the book, The Worlds I See by Fei-Fei Li, who I guess is now being called the godmother of artificial intelligence. But anyway, in one of the chapters of the book, she says something like, we're moving from AI being in vitro to AI being in vivo, and I thought that was spot on. And we have to be paying attention.

John Bailer
Well, you're listening to Stats and Stories. Our guests today are Stephanie Shipp and Donna LaLonde. Ethical uses of data have been legislated in parts of the world, including through the European Union's General Data Protection Regulation. Are similar laws starting to emerge in the United States?

Donna LaLonde
Well, I'm not an expert on the laws. I would say similar conversations are happening. And I know that NIST, the National Institute for Standards, is leading the way by having framework conversations. Obviously, the White House issued a memo on artificial intelligence. So I'm not aware of laws, but I think certainly we're talking about how AI needs to be legislated.

John Bailer
So my question in part was sort of thinking about what some of these rules of practice are. In your article, you talk about the importance of ethical decisions throughout the entire process, this whole investigative process. And one aspect of that was data security and, you know, how you deal with the data, and how this is a matter of trust. And that immediately got me thinking about things like the GDPR rules that were really codifying and enforcing this idea. So that was an example of saying, look, there are certain informed uses of your data. So this ties into some of those issues that you mentioned about informed consent, risks, benefits, and otherwise. Can you talk about some of the other components of an analysis where ethical decisions come into play? I mean, you know, Stephanie, you kind of hinted at it when you were talking about this idea of implicit bias that might be part of an analysis. Maybe you could expand on that a little bit for us.

Stephanie Shipp
Sure. I'll go back to your GDPR question for a second. I mean, that's primarily on the commercial side, making sure that companies aren't misusing the data in ways that could cause unintended consequences. Claire McKay Bowen has written a book, Protecting Your Privacy in a Data-Driven World, and I highly recommend it, and maybe highly recommend her; maybe she's already been on Stats and Stories. She would be the expert to talk about that specific legislation. But definitely, in terms of implicit bias, that's probably one of the hardest parts. Because we all think we're ethical. We all think we're very objective when we're doing our work, primarily as statisticians or economists or anyone in a quantitative field. I think it comes down to constant conversations and training. And I'll just give a really simple example from work we were doing a few years ago, where we were bringing data and science to inform or promote or support economic mobility in rural areas. It was a three-state project; we were working with colleagues in Virginia, Iowa, and Oregon. And one of the professors, and this is what I find with ethics, when you see solutions they're deceptively simple and elegant, but thinking of them ahead of time is not always so easy. Anyway, this professor's students were just starting out working on a project in rural areas. So he used a Mentimeter. It's a tool that collects data, or answers, from a team or a group anonymously, and then it provides some analysis; in this case, he did a word cloud. He just asked them a really simple question: what is life in rural America like? And these students quickly started putting in a lot of words and keywords and their thoughts. But when the word cloud showed up, they immediately recognized their implicit bias. There were a lot of positives or neutrals: they talked about rural areas being quiet, hardworking, healthy, small towns, crops or farming. But there were also a lot of negatives: uneducated, ignorant, isolated, forgotten, not optimal. Well, they then went into their project working in a rural area with their eyes wide open. They now understood: when I'm looking at the research questions, which we're going to be asking about problems mutually identified with the community, am I being biased? When they were looking at the data sources they were using: will these data sources have unintended consequences? What about my analysis? What about the results? Will they harm a particular group, maybe to the benefit of another group? So I thought that was a very simple but excellent way to teach implicit bias specifically in the context of a research project. And that got me excited.

John Bailer
So when you think about the kind of workflow in a data analysis project, there's also the analysis that occurs, there's modeling, there's prediction, and you mentioned some of the ethical issues even in how you train a model, how you build a model to make predictions for other cases. Could you talk a little bit about how that might play out in terms of an ethical concern?

Donna LaLonde
Well, I'll just jump in and say, I think we have started, appropriately, to pay more attention to vulnerable populations, right? And so if the data set isn't reflective of the population, then the model is going to be flawed. And I think, you know, we're all probably familiar with some of the concerns about facial recognition, right, where white faces are more likely to be recognized than the faces of people of color. So I think it starts with the data that's being collected. Then there's also what we talked about: models being black boxes, right? Do we really understand what the model is doing, or do we just sort of trust it? And I think that many in our community are moving us to be more aware that we need to have interpretable machine learning, right; we need to understand what the model is doing, because otherwise we're likely to make flawed decisions. And I guess, John, I'll just say one thing: I think I left the T out of NIST. So I want to make sure I give a shout out to the National Institute of Standards and Technology, right.

John Bailer
Nailed that answer to a tee. Perfect. Yeah. So it's interesting, when I was looking at some of your discussion in the paper, you talked about the idea that some of these frameworks, like the ADSA Ethos, talk about different lenses to think about, you know, work that's being done. Could you give a couple of examples of such lenses and why they're important?

Stephanie Shipp
I'm happy to jump in on that one. So I think in their case studies they gave good examples, and one of the simplest ones, and they say it was the simplest way to get the sort of story, or get people thinking about this, was using cell phone data to conduct a census, and they just focused on the life cycle stage of data discovery. And of course, data discovery led them to say: use cell phone data. So what are the kinds of questions you might ask? Things like: What was the motivation of the company for sharing their data? Are they sharing a complete set of data? What are the challenges with the data? Are they willing to be forthright about that, or is it, again, a black box? And if it's a black box, maybe you can validate those data using other data sources. But really going through that whole life cycle and asking those questions, and seeing how important the problem identification is, first, to identifying those data sources that are relevant. And then really questioning: How are the data born? What's the motivation for providing them? What's missing in those data? And what kind of biases might be implicit in the data as well? And then again, always the ultimate question: how might this harm one group at the risk of benefiting another? The cell phone data, in some countries, may be all they have; they may not have the resources to conduct a census. But then how might you validate that, if you are using it? So it's always weighing the pros and cons, the limitations and the caveats, against the benefits.

John Bailer
You know, it's interesting, as you're talking about some of these applications: in certain places you can't get other kinds of data; they're not even available. And I know that existing datasets are becoming more and more important to our friends in the official statistics community, just because they're a great supplement to the data sources they can find. But I'm curious about this idea of provenance of data, just knowing where it comes from. And that's also something that makes me think a lot about the models that are being used, whether they're the generative AI models or others that are being used for prediction. A lot of times, in the good examples that you've given, people have provided a lot of detail about where their data comes from and their analyses, and they share the models on GitHub or some other repo. It's almost this kind of let-the-light-shine-in, and you can see what I've done. So is this a sea change in terms of how people are being asked to think when they're doing an analysis, thinking that when I publish my results, I'm also publishing everything that goes into it?

Donna LaLonde
So, I hope so, John. I think, and I hope, that we, the members of the American Statistical Association, are leading the way on that, which obviously builds on lots of great work around reproducibility and replicability. But I wanted to come back to your data provenance question and bring in another group of folks that I think we explicitly want to acknowledge, who need to be a part of the ethical decision-making education process, and that is students and teachers. And I mean students and teachers not just at the undergraduate level, not just graduate students, but K-12. And I think a lot about this, because I don't know if we are doing a sufficient job of describing the data provenance of the secondary data sources that teachers might bring into their classrooms. And I think that's on us, right? So alongside the work we are doing at the research level, where we're asking researchers to make their code available and make their data available, I think we need to be thinking about how we're describing the data sets that might be part of an educational experience, so that students are practiced in recognizing the provenance and the ethical concerns that could arise. So I wanted to make that explicit. And I think that's the kind of nice complement that the ethical guidelines and the ADSA Ethos project bring to mind for us, right? Because the lenses are really interesting in terms of a socio-technical view, and then the guidelines are really focused on you as the individual statistical practitioner. And I think you take those two together, and we actually have a powerful way in which to both educate and make sure that, in practice, researchers and data scientists and statisticians and computer scientists are behaving ethically.

John Bailer
You know, I'm really glad that you all have done this type of work, so I sort of, you know, raise my cup of water to you in salute, because I think it's so important to have these. When I taught data practicum classes, I would use these as an early assignment for the students to start thinking: you're using data from someone, and you have a responsibility to treat that with respect. We also used to bring people in to do the IRB training with these classes, just to get them thinking about it. But I really love this idea of how we push a conversation of where does data come from, and what is your responsibility to handle it appropriately, not just thinking that you can mechanically process it. I'm curious now, as we're sneaking up on a close here: what do you see as some of the future issues or challenges in thinking about ethics and the practice of data science and statistics?

Stephanie Shipp
I think we've already discussed some of them with AI, and how we go forward. Donna, Wendy, the other co-author on the paper, and I have been talking about whether there needs to be a Menlo Commission, version 2.0. And Donna brought up education at a young age. I remember when my daughters were learning statistics in first and second grade, I was so excited. But now, how do you incorporate: okay, where did the data come from, and what are the ethical dimensions of this? You need, of course, to make those words a little easier to look at. I also think, from this article, what I learned the most was the benefit of looking across disciplines. I have a colleague who likes to say statistics is the quintessential transdisciplinary science. And in this article, we brought together science and technology studies through these four lenses, through the ADSA tool. I learned a lot from that. Again, a lot of the language around ethics, though, I think is very hard to grapple with, and I wish there were a way to simplify that language; but once you understand the concepts, that's also important. We also looked at the computer and IT world through the Menlo Report. But this is also just the beginning of looking at these issues from a cross-disciplinary perspective, which is what statistics does, and we should encourage even more of that, given how much we learned just in doing this article and looking across disciplines. And then finally, just one last point: when I gave my very first talk on this, and I think now, in hindsight, how bold I was, not being an expert, and still not an expert, in this field, somebody from industry stood up and said, how do we bring this to industry? And she meant it. But I don't think industry always feels that way. So how do we bring in these ethical dimensions of using data, which is part of the premise of the GDPR, and, behind that, the teeth of it?

John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Stephanie and Donna, thank you so much for joining us today.

Stephanie Shipp
Thank you.

Donna LaLonde
Yep, thank you for having us.

John Bailer
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


The Art of Writing for Data Science | Stats + Stories Episode 320 by Stats Stories

Sara Stoudt is an applied statistician at Bucknell University with research interests in ecology and the communication of statistics. Follow her on Twitter (@sastoudt) and check out her recent book with Deborah Nolan, Communicating with Data: The Art of Writing for Data Science.

Episode Description

Communicating clearly about data can be difficult, but it’s also crucial if you want audiences to understand your work. Whether it’s through writing or speaking, telling a compelling story about data can make it less abstract. That’s the focus of this episode of Stats+Stories with guest Sara Stoudt.

+Full Transcript

Rosemary Pennington
Communicating clearly about data can be difficult. But it's also crucial if you want audiences to understand your work. Whether it's through writing or speaking, telling a compelling story about data can make it less abstract. Communicating with data is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me as always is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Sara Stoudt. Stoudt is an applied statistician and Assistant Professor of Mathematics at Bucknell University with research interests in ecology and the communication of statistics. She's the author, with Deborah Nolan, of the book Communicating with Data: The Art of Writing for Data Science. Sara, thank you so much for joining us today.

Sara Stoudt
Yeah, no problem. Thanks for having me.

Rosemary Pennington
You have been doing a lot of work around data communication, about writing about data, why did communicating data become this passion of yours?

Sara Stoudt
Yeah, it started sort of serendipitously, in that Deb Nolan, when I was in grad school, was thinking about teaching this class for undergrads and reached out to me about maybe helping out. And at that point I hadn't really thought of myself as a writer; like, how do I claim that title? But through working on that class, and then writing the book after that, we both sort of had to grapple with: yes, we're statisticians, but we do a lot of communicating, and at some point we have to claim that title of writer as well. And so I think by starting with that process and really working through the book, I maybe got more into it and thought about how I might apply it to my teaching more, and how I might apply it to my own work, and it sort of snowballed from there.

John Bailer
So now, I gotta ask you, are you a better writer now?

Sara Stoudt
Maybe I think I'm a better writer now. I think that I think more about my writing now than maybe I did before. I don't know if that helps or hurts, but I think that I pay more attention to it. And when I'm doing other things, I'm thinking more about reading, like when I'm reading just for fun. Now, I'm like, in my head about that a little bit. And I think that's a good thing.

John Bailer
No, I agree completely. I really love seeing the diversity of different ways that you approach ideas in writing, ranging from a Significance article on reading to write, to another Significance piece, can TV make you a better stats communicator? So I'd like to explore those, maybe in reverse order, because I think the one about the TV shorts, these sort of small episodes, is an interesting model. Can you give a reason why you were inspired to connect to what was going on in these small, episodic television shows, and what that might teach us about writing?

Sara Stoudt
Yeah, I think for me, I was writing a lot of talks. And I was thinking about, like, zooming out: how am I writing this talk? Because you give a talk for lots of different audiences, and the job talk is maybe a little bit more formal, but more recently I've been doing more talks for broader audiences, and I had to mix up my approach. And I think it's also just the 20-minute versus the 40-minute versus the five-minute talk; all those things take different structures. And I was trying to think about that. But at the same time, it was right after the depths of the pandemic, and I had just watched a lot of TV, frankly, rewatching a lot of my old favorite shows, but from the beginning, and really paying attention to the pilot: how much has to actually get done in the pilot to set things up. And you don't appreciate, until you know what the story actually is, how much effort went into that. So I was thinking all about that, and I was thinking, oh, this is sort of related to how you do a talk: you know the whole storyline, how do you set it up, and you only have so many minutes to get the point across. And so part of it was me justifying watching so much TV. Another part of it was just: how do you write a good talk? I think it's sort of elusive, and doing it for different audiences and different time slots, having a good sort of structure, I think, can go a long way. And that was the motivation for that piece.

Rosemary Pennington
As you've been doing this work on sort of communicating data broadly, have you noticed things that are particular hiccups for you and how have you sort of worked around them?

Sara Stoudt
That's a great question. Yes, I have many hiccups. I think that sometimes, and you might see it today, I can tend to monologue, and in my head I'm like, yes, this all is gelling; but because I have all of this extra context, I forget that the connections are not necessarily being made by the audience, right? It's sort of like the stream of consciousness makes sense for me, but not for everyone. And I think that gets back to the planning, and a lot of the work I've done recently is on the planning of writing. Because you have to take that step back, and I think we can just sort of forget to do that, because we're pressed for time, we're reading that talk on a plane, you know; you just don't have that sort of zoom-out, "what am I saying" moment. So I think that goes into planning the talk, and the reading too, right? Just slowing down, I think, is my biggest hiccup. I'm sort of like, oh, I gotta do this, I gotta do this. But if I take the time to breathe and zoom out: what am I saying? What is the goal? What's the best way to do this? Even starting with pictures; sometimes I just start the talks with all of the plots, or the little doodles that tell the story. I think that has helped me a lot too, because I think I can just sort of jump in too quickly and then get in the weeds. So I've been trying to pull myself out of that.

John Bailer
Yeah, I recognize that same temptation. And, you know, when I've done this kind of writing, I think a lot about having to pull out and think big picture. And one thing that really struck me when I was reviewing some of your slides from this storyboarding talk that you did, as part of this process of writing, is that the punch line gets organized in the form of a narrative. And one of the things this podcast has taught me is a lot more about thinking about the narrative that goes along with an analysis, or with any kind of work that you're doing in research. So can you talk a little bit about the kinds of insights about structure that you've gained from the idea of storyboarding?

Sara Stoudt
Yeah, I think the main thing is that when we do statistical work, we're so proud of all the stuff we did. We're like, I did this, I did this, and I did this fancy thing. But ultimately, that's not what the reader cares about; they want to know what you found. So there's this temptation: you want to show what you did, but that's only ancillary to what you actually are trying to say, which is the findings. And so, and this gets back to taking a breath, you have to switch gears from doing the stuff to saying: what is the big picture? And I think the storyboarding helps you shift gears. It's like: don't talk about what plots you made or what analysis you did; what are the common themes? What did you find? How does this connect to a bigger picture? And it also makes you sort of kill your darlings; you can't put every plot in a paper or a talk. So you have all these things, and you have to whittle them down. So the storyboarding is iterative, it's really tactile; there are, like, no numbers involved, maybe some plots, and you're rearranging them. So I think it helps you shift that gear. And I do this all the time; to write a talk or write a paper now, I'm a very tactile writer. And I think doing that activity with students has really helped us all shift gears: fewer reports that say, I made a histogram of this, I ran a regression, and more that say, this is skewed left, which means this, and the regression tells me this. Helping us get toward that sort of language is what motivated the storyboard and why I keep using it.

Rosemary Pennington
In my past life, when I was a journalist, I did science and medical reporting toward the end of my time, and I loved it, I absolutely loved it. But it was always a little tough to get scientists to talk to me, because they were always so scared that their work would be misconstrued. Or they were concerned, and I had more than one say it, that, like, you know, five minutes is not enough time to communicate whatever it is. So I guess, what advice would you have for statisticians or scientists or anyone who has data they want to communicate, around the fear that they're not going to have enough time to tell it clearly? Or that if they tell it, they're not going to do their work justice if they have to make it very simple, or turn it into a narrative?

Sara Stoudt
Yeah, I definitely feel that tension. Statisticians are so annoying that way, right? Yeah, you could say Sara just reinforced that belief. I think it comes down to the level of detail. Maybe we don't want to talk about that one regression result in five minutes because there's nuance, but that regression result means something in context, and you want people to know about that thing. So, not to sound like a broken record, but I think it comes down to the zoom-in-and-out thing: I think you can zoom out in five minutes. What's the impact of your work? Let's not try to explain the details of how you got there in that form of communication, perhaps. But I think it's hard, because that's not the part we get the most practice with; we're in the weeds most of the time. So trying to navigate that is challenging. But I feel that tension too. Sometimes I'm like, oh, I don't really want to explain what I'm doing right here until it's perfect. But then how are you going to get your work out there? So it's a balance, but maybe focus on the impact first, and try to step away from the things whose precision you feel most worried about.

John Bailer
You know, what you just said really resonates: this idea of what do you spend most of your time doing, what is the focus of your effort. One of our former colleagues, Richard Campbell, was fond of saying that people are the best writers they'll ever be when they're just getting out of composition after their first year at the university, because they don't write a lot more after that. And, you know, the idea is that you become a better writer by writing, and having some structure, I think, really catalyzes that in a great way. So I find that the challenge is trying to help get people out of the fully technical focus, and then expanding it to think about, okay, how do you take it from the technical out to the broader community? So what are some things that you've been doing to help the students you work with, and the communities you interact with, do that?

Sara Stoudt
Yeah, I think one thing is just the fact that, if you think about the structure of a typical assignment, it's like: you do a final project, you turn it in, and then that's it, right? You don't get the chance to iterate, and that's where you start to get at the "what is this really saying?" So what we've done at Bucknell is add the iterative process into the project more. We actually teach a writing-intensive designated intro stat course, and that means it comes along with having to do revision throughout the semester, and they get tons of feedback from peers and from the instructor, and they rewrite different parts that come together as a full report. So they just get to spend more time noodling on it, for lack of a better word. I do think we still need to push more on zooming out, what's the big picture, because I think we spend a lot of time on the preciseness of how they're talking about the results: what does that significance level mean, that kind of thing, in that kind of class. But I think just building in time to revise before the final deadline goes a long way. It's hard, because it does take a lot of feedback time in the semester, which is challenging to do quickly, especially at scale. But I think you have to show students that revision is part of the process, and to do that, they have to revise the final project, which means pushing back deadlines so that you have time for it. The context part is important too, and I actually want to do more with that, because I think I'm not doing a great job of pulling that out. I feel that tension; it's something like, in that setting, content seems king. But thinking about how to do that as they keep progressing as statisticians means thinking more about those conclusion sections and trying to workshop those more than the results sections, which is what we ended up having to focus on, at least in that class.

Rosemary Pennington
You're listening to Stats and Stories, and today we're talking with Bucknell University's Sara Stoudt about communicating with data. Sara, I'm going to take this question slightly sideways, speaking as a former journalism professor: revise, yes, like we had those kids revise till the end of the semester. But I wonder what advice you would have for a working journalist who maybe is trying to report on data. You know, most of us are generalists; many of us are not comfortable with numbers and stats. I mean, that is a stereotype that lingers because there's some truth in it. So I wonder, since we want to communicate this clearly, because we think it's important to our audiences, what advice, given what you've been doing, would you have for journalists when it comes to reporting on stories that involve data, whether it's complicated or not?

Sara Stoudt
I think one thing is, like, have a buddy. Statisticians, we're friendly. If you find someone that you work well with, workshop it that way, because I have collaborators who just help me write better in general, and I think journalists can have that too. And I would love to see more cross-pollination with that, because, yeah, statisticians want to be able to write for broader audiences better, too. So that seems like a win-win. I think there are some common statistical things that everybody is fussy about, and it's worth doing a little reading up on those. I'm not saying do more work, because I know journalists are busy and doing important things, but maybe, like, a little community that talks about some of those big-ticket items: you know, how to report on a p-value, how to report on a confidence interval. It's dry, but that's the stuff that gets you, and maybe do it in a more community setting. Maybe I should start getting a group of statisticians and journalists together to do that, because as teachers we face that too. It's like, how many ways can I explain this? It's still confusing. So it's good for us all to practice, I think. But I don't have any magic solutions. You never know, I guess.

John Bailer
Yeah. So, before we started the podcast, I team-taught a class with a journalist, Richard Campbell; this was quite a while ago. And it was interesting to me how different the style of writing he was talking about was from what I was thinking about and had done professionally. There was a sharpness and focus to what he would bring to writing that I found myself being surprised by, not in a bad way; it was just such a different style. And I was realizing there were these multiple epiphanies for me about how often, in my own writing, I wasn't getting to the point as quickly as I could have, and I was spending so much time talking about process but maybe not getting to the punch line with the kind of emphasis it really deserved. So I think the exposure, for me as a statistician, of working with journalism colleagues has helped me become a much better writer and communicator, because it made me think: well, gosh, if I tried to do what they're doing, what does that mean in terms of how I produce a written or oral product? So it sounds like you've learned a lot as you went through these processes, but also from these examples that you found, whether it was from pilots of a television show or from other models. I know that you're sort of hoping a question will eventually emerge, and I'm wondering; yes, I always wonder, and that's always the problem here. So I would like to get back to this idea of the pacing and timing of a story as it sort of parallels, you know, a Big Bang Theory episode. I love this idea of thinking of these parallels: early on introducing the characters and introducing context, then introducing some conflict, some resolution to the conflict, and a punch line at the very end. So could you give us a little talk-through of the parallels between, you know, starting out where the characters meet, and what that means in terms of statistics, and then going through the rest, please?

Sara Stoudt
Yeah, so if you're giving a talk about your own work, you know everything, but people don't come in with any context, and they have to care about it by the end of your talk, because you want them to follow up; you're not going to tell them everything. Same with a pilot: you have this 20-minute period to hook them and have them come back, and you have to set up everything. They don't know anything about the characters or the setting, or what the show is going to be about, so you have to cover a lot of ground. If you think about how you want to present your work, people have to understand why you're doing the work, because that's part of getting them there. Why is your work hard? Why is it a big deal that you're doing it, and how does it connect to what other people might be doing? So actually, in the first talk that people hear from you, it doesn't even matter how you're doing the thing. They just need to know why you're doing it, and what makes it interesting or hard enough to be worth doing, because they'll follow up and read the paper after that if they care. Same with a pilot: they'll keep watching the show once they're brought in. So I think you have to strip it way back to when you started the project. Why did you pick it as an interesting problem? Who brought you the context? If you're a statistician working in an applied field, you also have the challenge of talking about the context of the work. I work in ecology, so if I'm presenting at a stats conference, there's some baseline ecology I also have to cover in that talk. You can imagine that maybe the ecology terms are like the characters; you've got to learn what they're about. You have to learn what the major conflict is: there's an ecological conflict, why do I care from that point of view, and then there's a statistical conflict, why is this a stats problem that's hard? And I go from there. But if I've described all that, then do you have, like, 20 minutes?

John Bailer
No, that helps a lot. The idea of the images ties back to some of your storyboarding. I love the idea of putting all of your plots on some display, moving them around, maybe connecting them in terms of the story that you want to tell, and axing out the ones that aren't effective. When I taught visualization or other kinds of data practicum classes, I would often say you'll make more than ten times the number of figures you'll ever include in a report you issue, just because you're trying to find the right way to tell the story. And ultimately, for me, I often found that if I could generate the figure that spoke to me, I could write the text that would describe it to others. So do you find relevance and importance, for you, in doing the visualizations as input and inspiration for the text you would produce?

Sara Stoudt
Yeah. And actually, I've been doing a lot of things not even on the computer, but sketching: what is the graph I want that will show me what I need? Or what do I expect this to look like if what I'm thinking is true? And then trying to make that graph. Because I think sometimes when I'm just making graphs on the fly, I'm making ones that are easy for me to code but are not necessarily the right graphs. So I've been doing a lot of that sort of doodling, and I think that has helped, especially if you're thinking about the right conceptual diagram for explaining your work; that is also something I need to draw first, because I'm not great with the shapes on Google Slides or whatever. But I think it really helps me solidify the story, because sometimes if I'm just looking at a bunch of scatter plots and histograms, it's hard to really see what's going on. So I think about the maybe less traditional visualization that would really consolidate everything, and then about whether that's a plot I can actually make.

Rosemary Pennington
So you've been doing this work for a while now. You've done work around how to present, and the storyboarding, and you have the book. What's next for you when it comes to stats communication? What do you want to be working on next?

Sara Stoudt
Yeah, I think for me personally, I'm thinking a lot about creative writing that's related to stats and data. So thinking about either data or statistics concepts as constraints for something: could you write a poem that's constrained in a way that's informed by data? Or could you write short stories or speculative fiction that have these sort of data-y concepts? There's all this sci-fi now that has to do with climate change, or the rise of machine learning and the ethics of those things. I think we could also write more stats-focused fiction, not just for the sake of writing it, but because I could see it being a useful teaching tool. I think I'm personally just trying to break this false binary of: you're a quantitative person, or you're a creative type. So I'm really interested in trying to fuse those: can we do more artsy things with data? That's what I'm thinking a lot about. I don't know if that's necessarily going to end up being my professional take on communication, but I'm really trying to do that for myself. When I started down this road, I didn't really claim ownership of the title "writer," and now that I feel like I can say that, the next hurdle is: am I a creative writer? Can I write more than just nonfiction? So we'll see where that goes.

Rosemary Pennington
Well, thank you so much for being here today, Sara. That's all the time we have for this episode. It's been great talking with you.

Sara Stoudt
Yeah, thanks for having me again.

Rosemary Pennington
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Data Visualization Contest Winner | Stats + Stories Episode 300 by Stats Stories

Nicole Mark is a visual learner and communicator who found her passion in the field of data visualization. She started out making maps of imaginary worlds and cataloging her volumes of The Baby-Sitters Club on her family's original Apple Macintosh. Now, she analyzes and visualizes data in Tableau and with code, always on a Mac! She writes about dataviz, life with ADHD, and the modern workplace in her blog, SELECT * FROM data. Nicole co-leads Women in Dataviz and the Healthcare Tableau User Group. She’s working on her master’s in data science at the University of Colorado, Boulder. Check out her Tableau site.

Episode Description

After producing hundreds of episodes, we have lots of data lying around, data we made available to you, asking you to crunch the numbers for a contest telling the story of our podcast. The winner of that contest, Nicole Mark, joins us today on Stats+Stories.

+Full Transcript

Coming Soon

Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Viral Statistical Capacity Building | Stats + Stories Episode 293 (Live From the WSC) by Stats Stories

Matthew Shearing is a private sector consultant working globally in partnership with the public, private and not-for-profit sectors on improving official statistics and other data systems, Monitoring and Evaluation, and embedding official statistics standards in wider international development.

David Stern is a Mathematical Scientist and Educator. He is a former lecturer in the School of Mathematics, Statistics and Actuarial Sciences at Maseno University in Kenya and a founding board member of African Maths Initiative (AMI).

Read More

Survey Statistics: Where is it Heading? | Stats + Short Stories Episode 292 (Live From the WSC) by Stats Stories

Natalie Shlomo has been a Professor of Social Statistics since joining the faculty in September 2012, and was head of the Department of Social Statistics (2014-2017). Her research interests are in topics related to survey statistics and survey methodology. She is the UK principal investigator for several collaborative grants from the European Union's 7th Framework Programme and H2020, all involving research on improving survey statistics and dissemination. She was the principal investigator for the ESRC grant on theoretical sample designs for a new UK birth cohort and co-investigator for the NCRM grant focusing on non-response in biosocial research. She was also principal investigator for the Leverhulme Trust International Network Grant on Bayesian Adaptive Survey Designs. She is an elected member of the International Statistical Institute and a fellow of the Royal Statistical Society. She is an elected council member (to 2021) and Vice-President (to 2019) of the International Statistical Institute, and she serves on the editorial boards of several journals as well as national and international advisory boards.

Read More

Are We Trustworthy? | Stats + Stories Episode 290 by Stats Stories

Communicating facts about science well is an art, especially if you are trying to reach an audience outside your area of expertise. A statistician in Norway, however, is convinced that how you say something is just as important as what you say when it comes to science communication. That topic is the focus of this episode of Stats+Stories with guest Jo Røislien.

Read More

C.R. Rao: A Statistics Legend by Stats Stories

The International Prize in Statistics is one of the most prestigious prizes in the field. Awarded every two years at the ISI World Statistics Congress, it’s designed to recognize a single statistician or a team of statisticians for a significant body of work. This year’s winner is C.R. Rao, professor emeritus at Pennsylvania State University and Research Professor at the University at Buffalo. Rao has made, and been honored for, a number of contributions to the statistical world over his more than 75-year career. That’s the focus of this episode of Stats and Stories, with our guests Sreenivas Rao Jammalamadaka and Krishna Kumar.

Read More

Judging Words by the Company They Keep | Stats + Stories Episode 269 by Stats Stories

The close reading of texts is a methodology that's often used in humanities disciplines, as scholars seek to understand what meanings and ideas a text is designed to communicate. While such close readings have historically been done sans technology, the use of computational methods in textual analysis is a growing area of inquiry. It's also the focus of this episode of Stats and Stories with guest Collin Jennings.

Read More

Rewards Points vs. Privacy | Stats + Short Stories Episode 262 by Stats Stories

Everyone can relate to being in a rush and needing to get just one last item from the store. However, upon reaching the checkout line and hearing the all-too-familiar refrain of "Can I get your loyalty card or phone number?" you may wonder why this information is so important to the store. The annoyance and potential ramifications of giving up your data so freely are the focus of this episode of Stats+Stories with guest Claire McKay Bowen.

Read More

Talking to a Statistical Knight | Stats + Short Stories Episode 259 by Stats Stories

Sir Bernard Silverman is an eminent British Statistician whose career has spanned academia, central government, and public office. He was President of the Royal Statistical Society in 2010 before stepping down to become Chief Scientific Adviser to the Home Office until 2017. Since 2018, Sir Bernard has been a part-time Professor of Modern Slavery Statistics at the University of Nottingham and also has a portfolio of roles in Government, as chair of the Geospatial Commission, the Technology Advisory Panel to the Investigatory Powers Commissioner, and the Methodological Assurance Panel for the Census.  He was awarded a knighthood in 2018 for public service and services to science. 

Episode Description

Sir Bernard Silverman is an eminent British Statistician whose career has spanned academia, central government, and public office. He will discuss his wide-ranging career in statistics with Professor Denise Lievesley, herself a distinguished British social statistician.

+Full Transcript

Coming Soon

Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


A Shared Passion for Math and Statistics | Stats + Short Stories Episode 257 by Stats Stories

At Stats and Stories, we love to have statisticians and journalists tell stories of their careers and give advice to inspire younger professionals and the next generation about what they can do with the power of data. Until now, however, we had yet to have a couple join us to talk about their careers and how statistics in Brazil has progressed over the past 30 years. That's the focus of this episode of Stats and Stories with guests Pedro and Denise Silva.

Read More

The Career of the Chief Demographer of the U.S. Census | Stats + Stories Episode 251 by Stats Stories

Hogan is the former chief demographer of the U.S. Census Bureau and studied at Princeton’s Office of Population Research, and its School of Public Affairs. He then spent two years teaching at the University of Dar es Salaam and working on the Tanzanian census. He joined the Census in 1979. He worked on household surveys, business surveys, and the population census. He led the statistical design of the 2000 Census. He served as an expert witness in Utah v Evans, in which the Supreme Court considered the use of imputation in the Census 2000. He served as Associate Director for Demographic Programs and later as Census Bureau’s Chief Demographer. He taught as an Adjunct Professor at the Department of Statistics of George Washington University. He is an Honorary Fellow of the American Statistical Association. He was awarded the 2018 Jeanne E. Griffith Mentoring Award. He retired from federal service in 2018.

Episode Description

Demographers study the way populations change. The things they might focus on include births and deaths, living conditions, and age distributions. In the United States, population changes are tracked nationally by the Census Bureau. A conversation with Howard Hogan, the retired chief demographer of the U.S. Census Bureau, is the focus of this episode of Stats+Stories.

+Full Transcript

Rosemary Pennington
Demographers study the way populations change. The things they might focus on include births and deaths, living conditions, and age distributions. In the United States, population change is tracked nationally by the Census Bureau. A conversation with Howard Hogan, retired chief demographer of the US Census Bureau, is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today, as I mentioned, is Howard Hogan, retired chief demographer at the United States Census Bureau. He joined the Census Bureau in 1979, where he worked on household surveys, business surveys and the population census, and where he led the statistical design of the 2000 census. Hogan also served as an expert witness in Utah v. Evans, in which the Supreme Court considered the use of imputation in the 2000 census. He taught as an adjunct professor in the Department of Statistics at George Washington University and is an Honorary Fellow of the American Statistical Association. Chance recently featured an interview with Hogan, and we're happy to have him joining us here on Stats and Stories today. Thanks so much for joining us today, Howard.

Howard Hogan
Pleased to be here.

Rosemary Pennington
Could you just describe what the job of the chief demographer of the census is?

Howard Hogan
Yeah, it's a senior research position where basically you get to research anything you want and work with whoever wants to work with you. It was a chance, in pretty much the last few years of my career, to mentor the younger generation and teach them what I'd learned about census taking and survey taking and demography over my career. So it was a chance to collaborate with lots of people. It's one of the best jobs the federal government has to offer, maybe the best.

John Bailer
How did you get there? You know, can you give us sort of a synopsis of your career journey that led to this one position?

Howard Hogan
Yeah, well, after I got my degree in demography, I spent two years in East Africa working on surveys and on the Tanzanian census. Then, after a brief time at the University of North Carolina at Chapel Hill, I was hired by the Census Bureau, as part of the 1980 census, to use demographic methods to measure the undercount, and specifically to see if there was a way to determine the uncertainty, some sort of range, however you want to say it, around those demographic measures of the undercount. As part of the 1980 census, the Census Bureau was sued by Detroit and a number of other cities about the undercount, and I got involved in that litigation at a very low level, writing some responses to their queries and whatever else. Soon after that, I was put in charge of determining a method to measure the undercount for the 1990 census. I spent the entire 1980s perfecting, well, perfecting is maybe a bit strong, but certainly improving, the methods that were being used, and essentially the methods we developed in the 1980s, with improvements obviously, are the methods they've been using to this day. Then after 1990 I moved over to the economic area and worked on economic censuses and surveys for a while, and then moved back to the decennial area, where I was, as mentioned in the introduction, the head of statistical design, not the operational design, the statistical design, for the 2000 census. Then it was back over to the economic area for a few years, and then back as the Associate Director for Demographic Programs, which is in charge of all the household surveys, including the unemployment survey and the crime survey and all of those, as well as the demographic estimates and projections. And then after a few years of that, I was offered the chance to get out of management, out of supervising and doing budgets, and take a senior research position as chief demographer. I was very happy to take that, and that's where I finished my career.

Rosemary Pennington
Howard, you mentioned the undercount a few times when you were talking about your work with the census, and I wonder if you could talk about what the undercount is and why it's something we have to pay so much attention to?

Howard Hogan
Well, the census is used for a number of things. The three most important are dividing the 435 congressmen among the states, which is called apportionment; being used by the states to draw their congressional districts within the states, which is called redistricting; and being used by the federal government to distribute billions and billions of dollars. So when cities, mainly cities, determined that they might have been undercounted, they felt they were being cheated out of both representation and money. In the 70s, as you may know, a lot of big central cities lost population, and so suddenly the undercount became very important to them, because they were losing and they didn't want to lose more than they actually had. In addition, by then one person, one vote was well established, and the cities and states realized the importance of having a good census to support that one person, one vote. So between the two, it became politically really important. The flip side of that is that demographers, starting with Ansley Coale but also Jay Siegel at the Census Bureau, had been developing methods to measure the undercount. Before that, there were some sketchy ideas about how many people might have been missed, and some of the census directors pretty much implied, we, I still use "we" a lot, we hardly miss anybody; it might be a few hermits out living in the desert, but really we get everybody. And demographers pretty much proved that that was not true, using demographic methods: as you know, births minus deaths plus immigration minus emigration tells you how many people should be counted. In addition, statisticians, led by Eli Marks and others, began to use follow-on surveys, what's called a post-enumeration survey, where the Census Bureau goes out and does a second survey and then does a one-to-one match to see how many people in that second survey were also counted in the census. So the proportion of people in the second survey who were missed by the first survey gives you a measure of the undercount. Then to measure the net undercount, of course, you have to go back to the census and verify how many of those records refer to a unique person who should have been counted and had been counted only once. So we've had these two parallel methods of measuring the undercount for many years, and I have had one foot in each camp.
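
A minimal sketch of the dual-system (capture-recapture) logic behind the post-enumeration survey Hogan describes, in Python. The function name and all the counts below are made up purely for illustration, and the textbook calculation shown here omits the many refinements in the Census Bureau's production estimator (post-stratification, erroneous enumerations, movers, and so on):

```python
# Dual-system / capture-recapture sketch for a post-enumeration survey (PES).
# All numbers are hypothetical; this is the textbook form of the estimate only.

def dual_system_estimate(census_correct: float, pes_total: float, matched: float) -> float:
    """Estimate the true population size.

    census_correct: census records verified as correct, unique enumerations
    pes_total:      people found by the independent post-enumeration survey
    matched:        PES people who were also found in the census
    """
    # Share of the PES sample that the census managed to capture.
    match_rate = matched / pes_total
    # Scale the verified census count up by the inverse of that capture rate.
    return census_correct / match_rate

if __name__ == "__main__":
    n_hat = dual_system_estimate(census_correct=9_500, pes_total=1_000, matched=940)
    undercount = 1 - 9_500 / n_hat
    print(f"Estimated population: {n_hat:.0f}")        # about 10,106
    print(f"Estimated net undercount: {undercount:.1%}")  # about 6.0%
```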

John Bailer
It seems like such a natural and intuitive way to quantify when that occurs, using this post-enumeration survey. But if it was consistent, I mean, if you had the same degree of undercounting in urban versus rural areas, then maybe it's not an issue. One of the key issues was this differential, right?

Howard Hogan
That's absolutely correct. If it was a uniform undercount, then, with a few exceptions, almost all the uses are proportional, so it would make no difference. But historically, census after census, African Americans are undercounted much more than white Americans. Back in the 70s and 80s, the undercount of black adult males was really high, I mean over 10%, as I recall. Then once we were able to measure the undercount of Hispanics, it was also high. Measuring the undercount of Native Americans is kind of difficult for a number of technical reasons, but that seems to be high as well. Renters are much more likely to be missed than people who live in owned homes. The flip side of that, of course, is that college students tend to be disproportionately counted twice: once at college, where they should have been counted, and once at home, where their parents are paying the tuition.

John Bailer
You know, as you were talking about your trajectory there, and in your last piece, you mentioned that you loved this work as chief demographer. What were some of the coolest projects you had a chance to work on in that last position?

Howard Hogan
Well, I think the most fun project was a few years ago. There were all these news articles about how this one lady, born on January 1, 1946, was the first baby boomer to qualify for Social Security, and that seemed wrong to me. So I gathered the time series for right after World War Two, and I collaborated with Bill Bell, who is just a fantastically good time series analyst; between the two of us, he did the hard mathematical work. We showed pretty precisely when the baby boom actually began: it began in July 1946, and we did this all statistically. It turns out, and I love this, that that's almost exactly nine months after the end of the war. We didn't know that's how it would turn out, but that's how it turned out.
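
A toy illustration of the kind of change-point question behind that result: given a monthly series, find the month where the level shifts upward. The simulated numbers and the simple least-squares break search below are assumptions for illustration only, not the time-series methodology Hogan and Bell actually used:

```python
# Toy change-point search on a made-up monthly series.
import numpy as np

def find_level_shift(series: np.ndarray) -> int:
    """Return the index at which splitting the series into two constant
    means minimizes the total squared error."""
    best_idx, best_sse = 1, np.inf
    for k in range(1, len(series)):
        left, right = series[:k], series[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_idx, best_sse = k, sse
    return best_idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 18 months at a lower birth level followed by 18 months at a higher one.
    births = np.concatenate([rng.normal(230, 5, 18), rng.normal(280, 5, 18)])
    print("Estimated break at month index:", find_level_shift(births))  # expect 18
```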

Rosemary Pennington
What propelled your interest in demography?

Howard Hogan
I had the really fantastic opportunity to study for a year at the University of Stockholm, thanks to the Rotary Club. At that time I was interested in regional planning and city planning, and I realized that if you're going to do any regional or city planning, you need to know some demography; it's pretty simple. So when I went to Princeton University, to what was then called the Woodrow Wilson School, I was doing city and regional planning, but I immediately decided, well, there's a course in demography, I might as well take that. And after one semester I had fallen in love with demography and fallen out of love with city planning. So I continued on that track. And by the way, did you know that Miami University was the home of demography in the United States? I mean Miami University in Oxford, Ohio; it had the first program.

John Bailer
Yeah, the Wellington lecture is still hosted annually. So yes, I did know that; I have some affiliation with the folks in the Scripps Gerontology Center now. But that's a pretty cool story, and I'm delighted that you know it too. That's neat.

Howard Hogan
Well, one of my professors graduated from there; he actually taught there. So he told me–

John Bailer
You know, the different responsibilities you had within the census really highlight something we've talked about before on previous episodes with some of your census colleagues: the diversity of products that the Census Bureau produces. I think there's a sense of the decennial census being so well known, and of its impact as well, but some of these other surveys that are routinely and regularly conducted, it's not clear they're well appreciated. Can you give a summary of some of the types of things the Census Bureau is contributing?

Howard Hogan
Absolutely. Well, we've mentioned the every-ten-years decennial census, and then we have a number of household surveys. The biggest right now is the American Community Survey, where we're out literally almost every day of the year collecting data on veteran status, fertility, unemployment, housing, immigration status, mobility, almost every demographic topic you can think of. It's such a large survey that we're able to publish pretty local data. The other very famous survey is the CPS, the Current Population Survey, the collaboration with the Bureau of Labor Statistics where we measure unemployment: we go out every month and collect data on unemployment, and together with BLS we publish it. A lot of these surveys are done in collaboration with other federal agencies; they fund them and we do them, the crime survey, the Consumer Expenditure Survey, a number of those. Then, as long as we're still on surveys, over on the economic side we do one of my favorite surveys, which is the monthly retail trade survey. That's literally the very first indicator of how the economy is doing: within 10 working days after the end of the month, we publish economic data about what the economy is doing. I got to work on that survey, and it was a lot of fun. The data were released on the dot, I mean, we checked against the national clock, at 8:30 on the day of release, and the stock market and the bond market and the futures market would react within nanoseconds to our results. So it was fun to work on a survey where you knew right away everybody was eager to see what you had to say. Additionally, there are a number of economic surveys, and we do the population estimates, where we fill in the population between decennial censuses, again working with our state partners to get the data, and a lot of federal programs are actually based off of the pop estimates program. And as a demographer, one of the fun things to do is the population projections program, where we project what the population will look like. That's where you read, because the press loves to report on it, that at such and such a date the United States will become a majority-minority country and this is going to be a great turning point. Those news articles are all based on our projections, and that's a lot of fun to do as a demographer.

Rosemary Pennington
You're listening to Stats and Stories, and today we're talking to Howard Hogan, retired chief demographer of the United States Census Bureau. One of the things I saw a lot leading up to the 2020 census was discussion about mistrust of the census, about concern over whether people would participate, and what that means for the undercount and so many other things that rely on census data. I wonder what your take is on the mistrust of the census and what can be done to help alleviate it?

Howard Hogan
Yes, the mistrust is there, and it was especially prevalent in 2020; voluntary participation in the census has been declining. The Census Bureau has an almost perfect record of protecting the confidentiality of its data. People have to go back to what happened in 1942, during World War Two, to find anything to point to to say the Census Bureau cannot be trusted. Over the last 70-plus years we've virtually never made a mistake, and the Census Bureau staff is fanatical about protecting confidentiality. How to get that message across to the American public is a challenge, because in the final analysis we're part of the federal government, and if people don't trust the federal government, don't trust the promises of the federal government, they don't say, well, but the Census Bureau is completely different, even though in many ways it is completely different. It's a matter of outreach. I can't remember the exact numbers, but the way the census has been done since 1970 is that we mail out a questionnaire and ask people to respond voluntarily before we go knocking on doors; by 2020 a lot of that was done on the internet. But the voluntary initial participation has been falling decade by decade, as it has in almost all household surveys, so it's a general societal change. I wish I had a good solution. We partner with community organizations, tribes, cities, and states, we partner with everybody we can partner with, to spread the word that we're the good guys and you can trust us. And I want to be clear, I'm retired, so when I say "we," it's my love for the place; I am no longer speaking for the Commerce Department, the Census Bureau, or in any official capacity.

John Bailer
Well, it's hard to imagine something more important than things like apportionment and redistricting and distributing so much money to these various districts. So it's important work, it's important that it be heard, and people clearly care about it. You've mentioned, over the course of both your article and your earlier comments, the 1980 census lawsuit, the 1990 census lawsuit, the 2000 lawsuit, and there were different aspects to them: early on you mentioned the undercount as a trigger for these kinds of concerns and legal actions, but later there were aspects of sampling and imputation. It was really fascinating, as someone who was teaching statistics at that time, to be tracking this and talking about it in classes. Could you give a little bit of a summary of some of the issues that surfaced in the 1990 and 2000 censuses?

Howard Hogan
Certainly, there are two sorts of issues. One is strictly statistical: how precisely can one measure it? Of course any survey, including a post-enumeration survey, has survey errors in it itself, and whether those survey errors would swamp what you're trying to measure is a statistical issue that was greatly discussed among statisticians. In addition, there are legal, constitutional issues. One is that the Constitution talks about an actual enumeration and counting the whole number of people. My personal understanding is that when they said actual enumeration, they meant something that's not a political deal; that first apportionment was a political deal. But the courts and the Supreme Court have interpreted both of those pretty much to say you have to come pretty close to a nose count, and I'll come back to that in a moment. Then, when sampling was introduced in the census, the first census to use sampling being 1940, the Congress added a provision in the law saying it cannot be used for apportionment. Now, I think what they were thinking is that you can't just count every other county or every other household, but that's what the law says. So there were a number of lawsuits. The final one, when it reached the Supreme Court, basically said we cannot use the post-enumeration survey, which is based on sampling, to correct the apportionment. They didn't really address whether it can be used for other purposes, but the politics were such that every attempt the Census Bureau made to use the post-enumeration survey for other purposes got shut down for political and/or statistical reasons. Anyway, the Court was pretty clear that sampling could not be used for apportionment; those cases were really nailed down around 1990. Then, around the 2000 census, let me step back a second. The way apportionment is done is a really cool algorithm, where the 435 congressmen are sequentially distributed to the 50 states. And in 2000, the 435th congressman went to North Carolina; had there been a 436th, it would have gone to Utah. Utah was not happy. They pursued it, saying, well, if you'd counted our overseas missionaries, clearly we would have had enough to beat out North Carolina. The head of the census, Jay Waite, was actually a fairly senior member of that church, and he said, you guys probably counted those children at home anyway. But anyway, they lost that one pretty quickly in court. So then they came back and said, well, the Census Bureau has used whole-person imputation, which is, strictly speaking, count imputation, to come up with the numbers. Count imputation is when you knock on a door, nobody answers, and the interviewer, after trying the best she can, can't determine whether it's occupied or not, or, if it's occupied, how many people live there. So there are sort of three levels of count imputation: they can't even resolve the address, which happens sometimes, or they're not sure whether it's a living quarters or a business; they can't determine whether it's occupied or not; or they're pretty sure it's occupied but can't determine the number of people. In each of those cases, since 1960, the Census Bureau has imputed the number of people living there.
Well, Utah's argument was basically that this is sampling: it doesn't really matter whether you count 1% and infer 99, or count 99 and infer one; mathematically, you're inferring the whole from the part, and therefore it's sampling. The Census Bureau said, well, no, it's not sampling, for a number of very technical reasons. And this case went to the Supreme Court. This is really one of the high points of my career, because I was the chief witness for the Census Bureau, and I got to actually sit in the Supreme Court when it was argued and work with the Solicitor General of the United States and a former solicitor general to prepare the case. It's a case that's now taught in law school, when you take the course in argumentation, because what we came up with was just brilliant. I wish I could say I came up with it all by myself, but I didn't. Here's how we sold it. If you went into the Supreme Court's library and you wanted to know how many books there were, and you counted the books on every other shelf and multiplied by two, your honors, that would be sampling. But what would you do if you were going down the shelves, volume 11, volume 12, and there's a gap, then volume 14, volume 15, volume 16? You would infer that volume 13 was probably checked out, and you would count it. That, your honors, is imputation. And we won the case, just barely; it was a very fractured decision. But looking back, if we had lost that case, if the courts had said that even this very small use of statistics, and it really is small, around 1%, is not allowed, then that would have ruled out the use of statistical inference for almost anything in the census. It really would have handcuffed us, including for the use of administrative records in 2020. So it was a very important victory, in that it allows at least some wiggle room to use statistical inference to come up with the counts.
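
The "really cool algorithm" Hogan mentions for distributing the 435 seats is the method of equal proportions (Huntington-Hill), in which each successive seat goes to the state with the highest priority value. Below is a small sketch of that sequential assignment; the state names and populations are made up, and real apportionment uses the official census apportionment counts:

```python
# Sketch of the method of equal proportions (Huntington-Hill) with toy data.
import heapq
from math import sqrt
from typing import Dict

def apportion(populations: Dict[str, int], seats: int = 435) -> Dict[str, int]:
    """Sequentially assign seats by priority value; assumes seats >= number of states."""
    # Every state is guaranteed one seat to start.
    alloc = {state: 1 for state in populations}
    # Priority for a state's (n+1)-th seat is population / sqrt(n * (n + 1)).
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)      # state with the highest priority
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return alloc

if __name__ == "__main__":
    toy = {"A": 6_000_000, "B": 3_000_000, "C": 1_000_000}
    print(apportion(toy, seats=10))  # {'A': 6, 'B': 3, 'C': 1}
```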

John Bailer
You know, we called our podcast Stats and Stories, and it was purposeful, in part because the statistics behind the stories and the stories behind the statistics were what we were hoping to explore. I'm delighted that you brought that up; in fact, my next question was going to be about that. I love that you said two things, well, you said lots of things I've really enjoyed, but you talked about how, if no one's criticizing you, it means you're not working on anything important. That was one of the quotes from your Chance piece, and certainly the work the Census Bureau has been doing, and the attention it's gotten from many different constituencies, is a clear demonstration of that. You also said that some advice you would give is: no data without a story. That very much resonated with something that was key for us, so I want to thank you for bringing it out and reinforcing it.

Howard Hogan
Yeah, it's what I try to teach, and it's not always easy to come up with a story. But when I met with the Solicitor General, General Olson, that was literally his first question. We sat down at the table, and his first question was: how can we put this in the form of a story or an analogy? And he was a pretty good lawyer.

Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. Howard, thank you so much for joining us today.

Howard Hogan
My pleasure.

Rosemary Pennington
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Conducting a National Survey | Stats + Stories Episode 249 by Stats Stories

What is the nutritional status of children in your town? How many tourism and hospitality companies are in your community? Answering these questions at a small scale seems like a challenge. However, imagine scaling this to a country with thousands of municipalities, 26 states and a federal district. Answering these questions with survey methods is the focus of this episode of Stats+Stories with guests Pedro and Denise Silva.

Read More