Comedy, Art, and ... Statistics? | Stats + Stories: Episode 370 by Stats Stories

Dr. Matthews is Associate Professor of Statistics and Director of the Center for Data Science and Consulting at Loyola University. He also is a data artist who developed and promoted the Data Art Show, which debuted at the 2016 Joint Statistical Meetings. He performs with the Uncontrolled Variables comedy troupe at the Lincoln Lodge in Chicago and you can see his data art, links to his comedy performance, and much more at his website, Stats in the Wild.

Episode Description

A statistician walks into a bar, and a comedy and art show begins. Creative work for scholars can extend beyond novel research and application. In today's episode of Stats and Stories, we see how the intersection of statistics and art, as well as the intersection of statistics and comedy, is realized, with guest Dr. Greg Matthews.

Full Transcript

John Bailer
A statistician walks into a bar, and a comedy and art show begins. Creative work for scholars can extend outside of novel research and application. In today's episode of Stats and Stories, we see how the intersection between interest in statistics and art, as well as the intersection of statistics and comedy, is realized. I'm John Bailer. Stats and Stories is a production of the American Statistical Association as well as Miami University's departments of statistics and media, journalism and film. I'm joined in the studio by Rosemary Pennington from the department of media, journalism and film. Our guest today is Dr. Greg Matthews, Associate Professor of Statistics and Director of the Center for Data Science and Consulting at Loyola University. He is also a data artist who developed and promoted the Data Art Show, which first appeared at the ASA Joint Statistical Meetings in 2016, and he performs with the Uncontrolled Variables comedy troupe. You can see his data art, links to his comedy performances, and much more at his website, statsinthewild.com. Greg, thank you so much for being with us today.

Greg Matthews
Oh, it's great to be here. I love talking about statistics and art and comedy.

John Bailer
Well, thank goodness, because we weren't prepared for anything else. Perfect. I have to thank you for giving us so many options of where and how to begin this episode. I was just spinning my wheels for a while, but I think I'll start with art. So how about just a basic idea of how you would differentiate data art from, say, data visualization?

Greg Matthews
That's a spectacular question. So I have this sort of corny answer for the difference between data visualization and data art. I give a data art talk, and this is what I always say the difference is: data visualization answers a question; data art asks a question. Pretty deep, right? And the idea is that, I mean, there's certainly overlap, but in my mind, data visualization is essentially functional. You're trying to answer a question, you're trying to summarize data, you're trying to convey information to a viewer. Data visualizations can become art when presented in the correct context, but data art doesn't always have to be good data visualization, right? The goals are different. Like, you know, a pie chart's not a very good data visualization. And there are rules to good data visualization, just like there are rules to good art, and you can sort of break those rules of data visualization when you're making data art. But I do see a lot of overlap in these two ideas, visualization and art, and I think, depending on the context, they could be one and the same, depending on how you present it to a viewer.

Rosemary Pennington
I was looking around on your website at some of your art, and it struck me that if I did not know that you were doing this based off of data, I would have had no idea, right? It just looks like really interesting abstract art sometimes. How did you get started creating this kind of art?

Greg Matthews
Well, so it's my wife. My wife gets all the credit. So my wife went to art school, and we started dating, we got married, and over the course of, you know, our time together, I basically got an art school education, and she sort of got me really interested in art. And then I started thinking, oh, I can make art with my computer, and I can make data art. And so that's what got me hooked. And then I just started doing it, and I really like doing it. And to your other point, about how you didn't know that it was data art: my goal, when I make data art, is to make something that will stand on its own. You can look at it visually, and it's interesting by itself, and then when you learn that there's data behind it, it becomes even more interesting. That's my goal. That's what I view as success in something that I've created artistically. It doesn't always work, but that's my goal when I'm making it: it's more interesting when you realize there's data behind it.

John Bailer
Oh, I agree. And that's really cool, just to see this process. Can you talk a little bit about the process that you follow when you're producing a piece of data art? Looking through some of the collections that you have online, there's some analysis and rendering that seems like it's hiding behind the scenes. So could you talk a little bit about that?

Greg Matthews
Do you know which pieces you're specifically talking about?

John Bailer
So how about, no, just all of them. We have about five minutes. Go through it. Ready, start. How about Celebrity and Gun? Okay, so a couple of recent ones.

Greg Matthews
Yeah, so these are from the Google Image Search series. What I've been trying to do more recently, you know, the last five or six years, is find data sets that are more interesting, and I'm interested in exploring data sets that are personal. So I've done work with my Fitbit data and things like that. But with the Google image search data, the way these are created is I do a Google image search on a word I pick. So the word was gun, or what was the other one you mentioned? Celebrity. So I google celebrity, or gun, or any of these other words, and Google will give you back a set of images. And the images that it gives you back are what Google thinks you think those words mean at that particular time, right? If you google that word at a different time, you're going to get different results than I'll get. So there's something very personal about what is returned from Google. It's about the time and the place and your personal search history. So it's what Google thinks I think this word means at that time. So I'm taking these images and I'm trying to put them into a composite image. Now, there are certain ways you could do this. You could take the pixels and you could, like, average them, but this gives you what I think are, not boring images, but images that are not as exciting visually as they could be. And there's actually an artist at the University of Chicago who did this 20 years ago, taking images and making composites. His name is Jason Salavon, and he has a lot of really interesting work. So what I was trying to do is make these composites. They're actually made using CART models, where I'm building a CART model to try to predict the color of each pixel.
Now, you could do this very well. You could build a CART model that predicts the average color of the pixel very accurately. But it's not that interesting visually. So I actually want to produce a CART model that is doing worse than it could, because it creates more interesting visual images. By creating a bad statistical model, I think you get more interesting visual images. And there's this idea in art where you need to understand the rules before you can break them, right? So I think there's an analogy here, where I have to understand what a good CART model looks like so that I can break the rules to make art. And that kind of connection really brings me a little bit of joy.
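
For readers who want to see that idea in code, here is a minimal Python sketch. It is not Matthews's code (he works in R, and the transcript does not say which predictors or settings he uses). It assumes tiny made-up grayscale images, uses pixel coordinates as the only predictors, and makes the tree "worse" simply by capping its depth.

```python
def average_images(images):
    """Per-pixel mean of equally sized grayscale images (lists of rows)."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / len(images)
             for c in range(cols)] for r in range(rows)]

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(points, depth, max_depth):
    """Fit a regression tree to a list of ((row, col), value) pairs."""
    values = [v for _, v in points]
    node = {"value": sum(values) / len(values)}
    if depth >= max_depth or len(set(values)) == 1:
        return node  # leaf: predict the mean of this region
    best = None
    for axis in (0, 1):  # 0 = split on row, 1 = split on column
        for threshold in sorted({p[axis] for p, _ in points})[1:]:
            left = [pt for pt in points if pt[0][axis] < threshold]
            right = [pt for pt in points if pt[0][axis] >= threshold]
            cost = sse([v for _, v in left]) + sse([v for _, v in right])
            if best is None or cost < best[0]:
                best = (cost, axis, threshold, left, right)
    if best is None:
        return node
    _, axis, threshold, left, right = best
    node.update(axis=axis, threshold=threshold,
                left=fit_tree(left, depth + 1, max_depth),
                right=fit_tree(right, depth + 1, max_depth))
    return node

def predict(node, point):
    """Walk down the tree until a leaf is reached."""
    while "axis" in node:
        side = "left" if point[node["axis"]] < node["threshold"] else "right"
        node = node[side]
    return node["value"]

def render(avg, max_depth):
    """Redraw the averaged image using a tree of limited depth."""
    rows, cols = len(avg), len(avg[0])
    points = [((r, c), avg[r][c]) for r in range(rows) for c in range(cols)]
    tree = fit_tree(points, 0, max_depth)
    return [[predict(tree, (r, c)) for c in range(cols)] for r in range(rows)]

# Two hypothetical 4x4 grayscale "search results" (made-up values).
image_a = [[0, 0, 100, 100] for _ in range(4)]
image_b = [[0, 0, 100, 100], [0, 0, 100, 100],
           [50, 50, 200, 200], [50, 50, 200, 200]]

average = average_images([image_a, image_b])
blocky = render(average, max_depth=1)    # deliberately underfit
detailed = render(average, max_depth=2)  # recovers the average exactly
```

With a depth of 1 the tree can only cut the picture once, so the output is two flat bands; a deeper tree reproduces the plain pixel average. The depth cap is just one assumed way of "doing worse than you could".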

Rosemary Pennington
I wonder what your process is like and how you developed it, right? Because, as John pointed out, there's gotta be layers to this, and there's lots of analysis going on behind the scenes that we don't see. But when you are trying to figure out what you want to create, how are you identifying the data and the process that you're going to use? And what is your process for getting to an image that you are happy with?

Greg Matthews
Well, so I'll usually start by picking a data set that I think is interesting. So I had the Google Images, or Fitbit. I'm looking around my office for what else I was using. I have some work that was based on GPS coordinates. Your phone is tracking you constantly, right? So I have these images that are, I know it's a podcast, you can't see any of this stuff, but they're just single days of my life based on where my phone tracked me, right? And that's really interesting personal data. And I think good art asks a question, and one of the questions that I want to ask is, you know, are we okay with all this data being collected about us? Right? There's just constant data being collected. So if I find a data set I think is interesting, I'll start with that, and then I just start writing code in R. I do all this stuff in R. I'll just start with, you know, basic data visualizations. I'll start plotting pixels, and I'll see, all right, what if I write code like this? What does that look like? And then it's just, you know, you try 1,000 things, you find the one that you like, and then that's what you see, right? It's very much like doing statistical research. You fail 10,000 times, and then what succeeds you write up in a paper. And so people only see the successes, but there's a lot of trial and error. You try something and you get excited about it, and then you look at the image, and you go, I don't really like that. So it's just a lot of experimentation, trying something and seeing if it works, and if it doesn't work, you try again.

Rosemary Pennington
You said you got into art via your wife. What does your wife think of your data art?

Greg Matthews
She thinks I'm a paradigm-shifting, generational genius. She likes it. She thinks it's really good. We have a relationship where I can make something and she can genuinely say that's good or that's bad, and here's why I don't like it. One of the things she's taught me is, when she was in art school, they would do these critiques, and you had to say, I like something or I don't like something, and then you had to explain why. Her doing that with me, and then her getting me to do that, has been really helpful in thinking about art in a different way for me. You know, I think a lot of people think about art as paintings from the 1500s depicting, like, biblical scenes. That's what art is for a lot of people, like in a museum. But art is so much broader than that, and I had no idea about this until I met my wife and we started talking about art a lot, right? But she will give me feedback, and she'll say, I like this, or this is working. This is what's good about it, and this is what's bad about it, and you could try to do something better here. So she helps me with that kind of stuff.

John Bailer
Yeah, I looked through some of your pictures. Some of them, like Beach, for example, seemed like they were really cut from kind of an impressionist cloth, or sometimes the Scottish Colourist cloth, with those lovely colors. Recently, a lot of your pieces seem to be prompted by image searches. How has your data art changed over time? Where did you start, and what are some of the paths towards where you've come to now?

Greg Matthews
So when I started making art, I would make a distinction between computer art and data art. And when I started, I was making a lot more computer art. What I mean by that is I was generating things randomly. I did a lot of games of chance. So I was messing around with things like dominoes, Powerball, craps. I have Keno in here. And I was using those as a process for generating images randomly, using, you know, R's pseudorandom number generator to generate images, and then picking the random numbers that I liked. But that's not really data art, because there's no data behind it. I'm just generating the images. I did this for years at the beginning, and at some point I said, you know, I would like there to be something more behind this than completely randomly generated numbers, even though I think a lot of this stuff is very visually interesting. But as I said before, once you find out there's data behind it, it adds a layer to it. And so I started shifting towards using data as the primary source of the art, right? And I like to pick data sets that have some kind of meaning beyond just, you know, here's a random data set from Kaggle, right? So I like to use a lot of data that's generated by myself, because I think if other people are seeing it, they can potentially think about, what data is being collected about me? Are we okay with that? These are big questions that I don't think people think about at all. They just use their phone all the time. I mean, if you really think about it, you know that there's a ton being collected on you, but I think we just don't think about it. It's just part of life every day, right? But I've also worked with data like the census data about wealth and race, right?
I had a series called American Money, and it looked at how money is distributed within zip codes, and then the demographic makeup of those zip codes. And you can see big differences between these things, but it also stands alone as interesting images. And then when you see what's behind it, it's even more interesting. So I like to pick data sets that are meaningful in some way, and they sort of ask bigger questions. It's not always successful, but that's what making art is. It's trial and error.

John Bailer
You're listening to Stats and Stories, and we're talking to Greg Matthews about data art and comedy. I think it's about time for us to shift to maybe some Uncontrolled Variables. All right, spectacular. So what is Uncontrolled Variables, and how did you first get involved?

Greg Matthews
So Uncontrolled Variables is a science and comedy show. The way it works is we get some Chicago-area comedians, and we get Chicago-area scientists, and we do a show together. The basic premise of the show is a comedian comes on, and they do a regular set of comedy. And then we give them the scientist's slides, and they present the slides, never having seen those slides before. And hilarity ensues, as you know. And then we bring the real scientist up, and they present the slides again, and they address all the things that the comedian screwed up, which is everything. So the audience gets comedy, and they're actually getting to see a real scientist, who gets to talk about the work that they're doing. My involvement in the show is I help produce it, I help find the scientists, but I also do what we call a guest lecture. I take the topic of the show, and I do a completely absurd data analysis related to the topic of the show. We actually had a show last night. We do it once a month, and we just happen to be talking the day after the show. Last night's theme was environmental science. So I did a data analysis on enteric fermentation, which is livestock flatulence, and about how methane is released into the air by livestock. And I killed.

Rosemary Pennington
You know, we talk a lot on Stats and Stories about how we're living in this environment where there does seem to be a bit of distrust around science and expertise and facts. And I wonder, as you are working to produce these shows, who you're imagining your audience is for this, and how you're imagining this interfacing with this larger cultural distrust of science.

Greg Matthews
So, I mean, I know exactly who the audience is. The audience is a lot of graduate students in STEM, and there are a lot of professors who show up. Whatever scientist we bring in, they'll bring all their colleagues, they'll bring the people in their labs. So it's sort of like a comedy show cheat code, where we are guaranteed at least, you know, 15 people to show up, because the scientists will bring a bunch of people, because it's their one time to be on stage. That's sort of who the audience is. It's not like random Chicagoans, generally, though there are people who just show up and go, I want to go see a show tonight, and they end up at the show not really knowing what it is. And they seem to like it. Your other question, though, about the culture, I think is really interesting. I hadn't thought about this, like, ever. Okay, so there's a woman who is currently filming a documentary about our show, and I did an interview with her for the documentary, I don't know, three weeks ago, and she asked sort of the same question. She goes, when you do this show, is there a goal of trying to reach a bigger audience, or is there a serious goal of trying to communicate science to the general population? And when I first got involved in the show, and I didn't start the show, it was, no, it's just a fun science and comedy show. But since, I don't know if you know what happened last November, but since then, I do feel like the show has sort of taken on a little bit more of a serious side. There's a little bit of resistance in doing the show, right?
So we just did a show on environmental science, and we talked about greenhouse gases and global warming, and I think the fact that we're just talking about this in this environment, it's a little bit resistance-y. It feels different than it did, you know, a year ago or two years ago. And we're leaning into this. We're going to do a show on the biology of gender in June. We're bringing in a biologist who is going to talk about, you know, gender. He studies lizards, so he's going to talk about the biology of lizards. But doing those shows a year ago would be very different than doing those shows now. And I hadn't really thought about this, but I do think there's something a little bit more serious now about even just talking about this stuff, right? And so I feel a different sort of, I don't know if responsibility is the right word, but it does feel different to me in a way. Does that make sense?

John Bailer
Yeah, absolutely. So as you think about next month and preparing for this, you're going to be giving a guest lecture again, yes? So when you start thinking about the topic going in, how do you start finding that hook, that connection that you really want to build on? I mean, the cow flatulence, that seems like a really good call for an environmental science piece. So where are you going to start in this process for next month?

Greg Matthews
So the process is, I panic for a week trying to figure out what I'm going to do. When I settle on that, then I do some data analysis. I see what the results are. I go, that's funny, that's funny, that's funny. And then I put them in slides, and then I add jokes in at the very, very end, right? If you can get a decent analysis, the jokes write themselves. It's easy to make fun of science. I think the hard part is finding what you're going to talk about. So what I do is I go to Kaggle data sets, and I'll just search environmental science or gender or whatever, and I'll see what data sets come up, because I just want to see what's out there. And there's this process of, you know, what is even available for me to look at, and then I go through, I don't know, 10 or 15 data sets just to see if there's anything interesting in them. And I'm always worried there's not going to be a light bulb moment, but there always has been. There always seems to be something that's like, oh, that's the right thing to do, and then we go from there. But the important thing is just picking something, so I have enough time to do it in the month. So at this point, the day after the last show, I have no idea what's going to happen next month. I'll spend Monday and Tuesday night next week looking through data sets for next month.

Rosemary Pennington
I wonder what you've learned about communicating science in different environments from working on this.

Greg Matthews
So I think, in the totality of it: I did improv, I took improv classes, I performed improv, and this is stand-up comedy. And through all of that, I think it's changed the way I teach in class, you know, from simple things like where you stand when you're facing an audience, to not being boring, right? If you just stand up there and talk about statistics, I know you and I don't think this is true, but some people think statistics is dry. And if you can take a dry subject and teach someone that, but also keep their interest with things that are at least funny to some people, it creates a good environment for learning. So I'll give you an example. When I would teach STAT 203, which is Introduction to Statistics, instead of writing out examples, what I would do is say, all right, someone shout out what you want to do an example about. And they'll be like, all right, horse racing. And I'll sit there, and I'll make up an example. We're going to do a hypothesis testing example, and I will make it about two horses, or two different groups of horses. These horses run some time, and this other horse runs some time, but it eats a special kind of oat. And then you get to make up a little story, and you're doing a little bit of improv, and that gets them interested, because they get to choose what we're doing the example about. It's all the same hypothesis test behind the scenes, but that kind of stuff really works, right? Nineteen-year-olds aren't all that interested in hypothesis testing in general. But if you can get them to, you know, be involved in any way, it's really helpful educationally, I think.
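
As an illustration of the kind of classroom example described above, here is a short Python sketch of a two-group hypothesis test. The race times are invented for illustration, just as in the improvised classroom story, and the exact permutation test is a choice made for this sketch because it needs no distribution tables; the transcript does not say which test the class actually uses.

```python
from itertools import combinations

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(treated, control):
    """One-sided exact permutation test.

    Null hypothesis: the special oats make no difference.
    Alternative: horses on the special oats run faster (smaller times).
    """
    pooled = treated + control
    observed = mean(treated) - mean(control)
    n, k = len(pooled), len(treated)
    total = 0
    at_least_as_extreme = 0
    # Enumerate every way of relabeling k of the n horses as "treated".
    for idx in combinations(range(n), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point round-off.
        if mean(group_a) - mean(group_b) <= observed + 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total

# Hypothetical race times in seconds (made up for this example).
oat_times = [57.2, 57.4, 58.0, 58.1]
regular_times = [58.6, 58.9, 59.0, 59.3]

p_value = permutation_p_value(oat_times, regular_times)
print(f"one-sided p-value: {p_value:.4f}")
```

Because every oat-fed horse in this made-up data beats every other horse, only one of the 70 possible relabelings is as extreme as what was observed, so the p-value is 1/70. Swapping in a different story (dogs, pizza, anything the class shouts out) changes the labels and nothing else.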

John Bailer
Yeah, that kind of audience involvement sounds like an awesome strategy. I often thought, when I was teaching stat classes, that I was glad that the expectations were low coming in for many, you know, because if you're teaching certain classes, the expectations are coming in sky high. If you're teaching a geography of wines, there's a very different expectation than if you're teaching hypothesis testing and inference. But that's an opportunity. It can be a blessing. So I really like that. And I'm curious: you've mentioned how the comedy and the improv have impacted your thinking as a presenter and as a teacher. How about your consulting? Has this work in comedy and in data art changed some of the ways you interact with clients in a consulting setting?

Greg Matthews
So, I mean, I don't know if it actually impacts that, but I will say that I've learned things through making data art and from doing analysis for the comedy show that I've actually used in consulting projects or in my research, right? I haven't used this yet, but the most recent example of this is last night. I did some change point analysis. Well, I didn't do it last night, but in the show last night, I presented some change point analysis using the PELT algorithm. I had never used this before, and I would never have come across it other than that I wanted to use it for a comedy show. So I studied the PELT algorithm for detecting change points because of a comedy show. That's incredible, right? I also feel much more comfortable working with image data. So when you do convolutional neural networks, or you're doing image classification, the reason I know how to do any of that stuff is because of the data art and working with images as a hobby. And so I've actually learned quite a bit that helps me professionally from doing these hobbies, right, from doing the comedy shows or from making this data art, because it's all coding, right? And so, you know, that also brings me joy. I can justify doing it because it's professional work, right?
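
PELT (Pruned Exact Linear Time) finds the set of change points that minimizes a total segment cost plus a penalty for each change, and prunes candidate positions that can never be optimal again. The Python sketch below shows the idea under stated assumptions: a squared-error cost for changes in the mean, a hand-picked penalty, and a made-up signal. It is not the analysis from the show, and the function name `pelt` is only a label for this sketch.

```python
def pelt(data, penalty):
    """Return the indices at which a new segment starts (mean-shift model)."""
    n = len(data)
    # Prefix sums give any segment's squared-error cost in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, y in enumerate(data):
        s1[i + 1] = s1[i] + y
        s2[i + 1] = s2[i] + y * y

    def cost(a, b):
        """Sum of squared deviations from the mean of data[a:b]."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of data[:t]
    best[0] = -penalty      # so the first segment carries no penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment of data[:t]
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning step: drop starts that can no longer lead to an optimum.
        candidates = [s for s in candidates
                      if best[s] + cost(s, t) <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    changes = []
    t = n
    while last[t] > 0:
        t = last[t]
        changes.append(t)
    return sorted(changes)

# Made-up signal: the mean jumps from 0 to 5 at index 10.
signal = [0.0] * 10 + [5.0] * 10
change_points = pelt(signal, penalty=1.0)
print(change_points)
```

A larger penalty yields fewer change points and a smaller one yields more, so in practice the penalty has to be chosen with some care. R users would more likely reach for an existing package than hand-roll this.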

Rosemary Pennington
I wonder, as you've been preparing these guest lectures, was there ever a topic that you really struggled to get written in a way that you felt was funny and compelling, and what helped you get through that?

Greg Matthews
Last June. So June is Pride Month. Last June, we did LGBTQ health. And I am a straight white man, and I had to be very careful about the way I wrote that. But I think I did a very good job. I did a statistical analysis of legislation across the states, though Oklahoma has a lot of these bills that are trying to, you know, legislate LGBTQ topics. I just presented other people's proposed legislation on this, but that was a difficult topic because of the sensitivity around it. It was a fun challenge, and I had to do it in a way that was aware of who I am and the topic that I was talking about. But I would say that was the most challenging topic, because of what it was.

John Bailer
So what kind of recommendations might you have for people who are interested in getting involved in data art or in comedy?

Greg Matthews
So with a lot of things, like, if you want to be a lawyer, you've got to go to law school, and you've got to pass the bar, and then someone calls you a lawyer. If you want to be a professional athlete, you've got to, you know, make a team and sign a contract. The bar to becoming an artist is you just saying, I make art now, right? There's no entry, there are no gatekeepers. You can just make art, right? And it's the same thing with comedy. Comedy is a little harder, because you've got to get people to show up to your shows, but you can make art right now, right? There's literally no bar to it. If you want to be an artist, all you have to do is say, I'm an artist now, I make art. Just go do it. And I think people are afraid to fail, and they need to stop this. You're gonna fail a lot. Every time you see someone who did something really successful, all you're seeing is their successes, right? Whenever you see someone do an hour-long comedy special, that took them months to write, and they failed the whole time before that. What you're seeing is the final, finished product. Same thing with an artist. They screwed up that piece 1,000 times before they got the final thing. So don't be afraid to fail, and just go do it. It doesn't have to be art or comedy. This applies to, like, everything. Whatever it is that you want to go try, just go do it, right? Just try stuff. It's okay.

John Bailer
That's good advice. That is good advice. And, you know, we like to end with good advice, so I'm afraid that's all the time we have for this episode of Stats and Stories. Greg, thank you so much for joining us today. Thank you. It was an absolute pleasure. Yeah, thank you so much. Stats and Stories is a partnership between the American Statistical Association and Miami University's departments of statistics and media, journalism and film. You can listen to us on Spotify, Apple Podcasts, or other places where you find podcasts. If you'd like to share your thoughts on our program,


Music Streaming Statistics | Stats + Stories Episode 354 by Stats Stories

Chris Dalla Riva is an analyst for the music streaming service Audiomack by day while spending his nights writing and recording music and writing about music for his newsletter Can’t Get Much Higher.

Check out the Full Article in Significance Magazine

Episode Description

Artists of today are still making albums; however, with so much emphasis being put on streaming charts, how many of today's album streams are being made up by a few hit tracks? That distinction is the focus of today's episode of Stats and Stories with guest Chris Dalla Riva.

Full Transcript

Coming Soon


The Statistical Kings of Comedy | Stats + Stories Episode 348 by Stats Stories

Sachin Date works for VitalEdge Technologies and has, over his career, worked in two research labs, three software companies including two product companies, and in a classroom. He has built and delivered all kinds of software including massively distributed discrete-time simulations, data science stacks, a new programming language, and dozens of mobile apps, including the world’s first Napster app for Blackberries. Along the way, Sachin taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch.

Check out the full article in Significance Magazine.

Episode Description

A journalist, statistician and sound engineer walk into a bar. Well, actually, to a studio to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? That’s the topic of this week’s episode of Stats+Stories with guest Sachin Date.

+Full Transcript

John Bailer
A journalist, statistician and sound engineer walk into a bar…well, actually, to a studio, to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? Who may have influenced Seinfeld or David? How would you know? Stay tuned, and you will get your question answered on this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me is regular panelist Rosemary Pennington, chair of the department of media, journalism and film at Miami University. Our guest today is Sachin Date. Date works for VitalEdge Technologies. His career has included work in two research labs, three software companies, including two product companies, and in a classroom. He has built and delivered all kinds of software, including massively distributed discrete-time simulations, data science stacks, a new programming language and dozens of mobile apps, including the world's first Napster app for Blackberries. I remember Blackberries and Napster too. Beyond that, he has also taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch. Date’s recent Significance article on how Shakespeare influenced Seinfeld provides the background for our conversation today. Thank you so much for joining us today.

Sachin Date
Thank you for having me, John.

John Bailer
So what is it? What inspired you to embark on this project? Right?

Sachin Date
So I didn't actually start with the intention of establishing the patterns of influence between specific comedians and their influences. What really happened was, I was browsing through the Wikipedia pages of some of the comedians I follow, and I quickly discovered that a lot of these pages have material on them that seems to indicate that the comedian was heavily influenced by other comedians, and sometimes not necessarily other comedians, but also writers and a lot of other, you know, kinds of people, like family members and friends and so forth. So I clicked on the links of some of these influences, particularly the influences of influences that came from other comedians, and I discovered that the Wikipedia pages of those influencers also contained information about whom they influenced. So I clicked on those links. And then I kind of kept on going back in time, until I ran into Wikipedia pages of writers in the 18th century, 17th century, 16th century. At one point, I opened a Wikipedia page of Shakespeare, William Shakespeare, and I realized that I had actually basically followed the links through from someone who is alive today in the 21st century, and then kind of transported myself back in time all the way to William Shakespeare. So that made me wonder, well, how common is this pattern? Are there other comedians who also have influence data listed on their Wikipedia pages? So I kind of started clicking around, and I discovered that a lot of comedians actually have this kind of data on their Wikipedia pages. Additionally, the Wikipedia pages of very influential comedians like Richard Pryor, for example, or George Carlin, have legacy sections on them which contain information about whom they have influenced. That's kind of part of their legacy. So there's those backlinks also to be followed. So I figured, well, let me actually see if I can do a systematic study of this topic.
But when I started doing that, I realized that, well, the number of comedians involved is very big. Wikipedia itself has about, I think, 50 to 100 different categories devoted to comedy. So I figured, well, let me, let me, kind of just put a circle around my research. I'll focus only on the comedians who are contemporarily the most popular comedians in America today, and then I'll start tracing the links back from that set of comedians. And let me see how far back in time and how widespread those things kind of get. And that's kind of, you know, what motivated the research on that topic.

Rosemary Pennington
How did you determine who were the top comedians working today?

Sachin Date
Yeah, so I was interested in a way of finding that information. What I thought I would do, and it actually worked remarkably well, was that I ran a couple of, well, actually, I ran three pretty straightforward Google searches. So the search text basically went: most popular American comedians in 202X, where that X was either one or two or three. So basically, the most popular American comedians in 2021, 2022, 2023. I figured, well, the last three years could be considered as kind of the window for the most popular contemporary comedians. So sure enough, Google showed a lot of search results. So I tweaked those results by setting the time frame filter to include only the results that were published in the October through December timeframe. So as soon as I did that, that brought forth results that were really more focused toward the end-of-the-year rankings and lists and ratings that were available on the internet. And then I started going through those results, and sure enough, there was a large amount of diversity in there. So for each one of those three years, 2021, 2022, 2023, what I did was I essentially identified about 10 different types of sources, and I tried to keep those sources as different from each other as possible, just to kind of, you know, reduce the bias and improve the diversity in the data. So that gave me essentially a mass of comedians to work with, and then I merged that data, and then kind of arrived at the list of what I consider to be the most popular contemporary American stand ups.

John Bailer
So let's name some names. So who are some of the comedians that you ended up including kind of from this, this three year window?

Sachin Date
Well, there was Jerry Seinfeld, of course, and then there was Hasan Minhaj. Well, let's see. There was John Mulaney and Taylor Tomlinson, Dave Chappelle. A lot of the same, you know, same set of people started repeating in those names and those things. So one thing that kind of was common amongst them was that a lot of them were very active in stand up comedy. I mean, not just now, but I mean just, you know, three years ago, four years ago, 10 years ago. So they've been doing stand up for a long period of time.

John Bailer
So how many different comedians did you identify in this collection? I mean, once you filtered it based on you said that they were American comedians in this time window that were identified in October through December of these three years. So what was the total number of comedians that you included to start building this connection of influence?

Sachin Date
So the three searches that I ran produced several hundred different comedians, and then I kind of filtered out all the ones that were not US persons, because my focus was only on American stand ups. Then I also filtered out comedians who did not have Wikipedia pages, because my study was really kind of just focused on data that came from Wikipedia. I also filtered out comedians who had not really performed any kind of stand up or improv or sketch comedy. So once all those filters were applied, I narrowed the space down to about 175 to 200 comedians. So that was kind of the social network of comedians that I started with. Now, this was the set of the most popular contemporary comedians as of the end of 2023. Now, of course, a lot of those Wikipedia pages did not have the influence data on them. In fact, I think for over 100 of those 175 or so comedians, there was no good data available on Wikipedia on who influenced them. So those were really isolated nodes in the network, and then for the balance set of comedians who had that data, I kind of followed the links back in time and also across in space to build a social network. So in the end, I basically ended up with about 64 to 70 comedians who had a lot of influence data associated with them, and then the social network was kind of based off of that set. The overall network, once you kind of factored in all the influences on those comedians, ran up to about 250 to 260 nodes and around 700 links of influence.
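
The filtering and network-building steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` record, its field names, and the placeholder people (`P1`, `X`, and so on) are hypothetical and are not data or code from the study.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    # Hypothetical record for one comedian found in the search results.
    name: str
    is_american: bool
    has_wikipedia_page: bool
    performs_live_comedy: bool  # stand up, improv, or sketch
    influences: list = field(default_factory=list)  # names listed as influences

# Placeholder data, invented purely for illustration.
candidates = [
    Candidate("P1", True, True, True, ["X", "Y"]),
    Candidate("P2", True, True, True, []),       # page exists, no influence data
    Candidate("P3", False, True, True, ["X"]),   # dropped: not American
    Candidate("P4", True, False, True, ["Z"]),   # dropped: no Wikipedia page
]

# Apply the three filters mentioned in the interview.
kept = [
    c for c in candidates
    if c.is_american and c.has_wikipedia_page and c.performs_live_comedy
]

# Directed edges point from influencer to influenced comedian.
edges = {(influencer, c.name) for c in kept for influencer in c.influences}
nodes = {c.name for c in kept} | {influencer for influencer, _ in edges}

# Kept comedians with no listed influences have no incoming edges in this sketch.
isolated = [c.name for c in kept if not c.influences]

print(len(nodes), len(edges), isolated)
```

Keeping the edges as (influencer, influenced) pairs preserves the direction of influence, which matters later when chains are traced back in time.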

Rosemary Pennington
What concerns did you have about using Wikipedia data?

Sachin Date
Right. Yeah, so Wikipedia, on one hand, most of the data that's mentioned on Wikipedia is referenced very nicely. So that's kind of one advantage you get from using Wikipedia data, that you can follow through the reference links and just kind of verify that the influence that is mentioned on the page actually does ring true, that the text talking about the influence describes a valid influence and links through to some article somewhere that mentions how the comedian actually was influenced by someone else. On the other hand, with Wikipedia, there is really no way for you to know the current strength of the influence, so you're forced to consider that influence as a binary variable, so either the influence is there or the influence is not there. But in reality, of course, influence is much more complex than that. Someone could be influenced by someone else very heavily in the past, but not really so much anymore. And that character of the influence isn't really brought out very well. Actually, it's not brought out at all in most cases on Wikipedia. So that's another problem. Well, it's really not so much a problem with Wikipedia as much as it is with the nature of the influence itself. I mean, it's an inherently qualitative measure. And in fact, one of the goals of the study was to kind of try to work around the qualitative nature of the influence. But yeah, back to your question about the limitations of Wikipedia data. So there was that, that the nature of the influence was entirely binary. You either assume that the influence was there or it was not there, depending on what was mentioned in the page. The other aspect of information on Wikipedia is that you have to be very careful to interpret the text, the sentence, the context around the influence very carefully. So I mean, in fact, I'll give you a couple of examples.
In one instance, I think this was on David Letterman's page, where he talks about how Norm Macdonald has been one of the greatest comedians that he has run into, but that kind of text is really more in the context of Letterman considering Norm Macdonald as really a great comedian, not so much an influence. So you have to be careful about treating the text around words such as great comedian or my hero, or anything like that, as an influence. So, you know, there's a lot of subjectivity involved over there.

John Bailer
You're listening to Stats and Stories. Our guest today is Sachin Date. So you've talked a lot about this idea of an influence network. So help the audience picture this. You have a cloud out there, and each comedian is some, I don't know, some unique cloud itself that's connected potentially to others, and there are edges that connect them. Those nodes are comedians. The edges are there if one influences the other. There's direction here if one is influencing the other. So you've built this from the data. What kind of influences or influencers surprised you most after having built this network out?

Sachin Date
Well, okay, so let me kind of give you some examples here. So one of the interesting findings was that people such as Charlie Chaplin and Stan Laurel and Oliver Hardy of the Laurel and Hardy fame, they, all three of them, in fact, individually seem to either directly or indirectly influence almost a third of the contemporarily most popular American stand ups who had influences listed on Wikipedia. So I kind of found that to be quite interesting. What that also pointed to was that a lot of the influence was coming from people who were not really stand ups in the currently understood definition of that term. A lot of the influences or influencers were writers, comedic writers, or stage performers, or people like Charlie Chaplin, who were clearly not stand ups, not stage performers as such, but very accomplished comic actors and directors and producers. So that was one interesting thing I found. Another thing worth mentioning has to do with the data about the birth dates of the influenced comedians and their influencers. So as I was kind of tracing out this network, one of the things that I was doing was also capturing the dates of birth of the comedians and their influencers. And what I found was an overwhelming volume, actually almost 100% of the volume, I think, like more than 95%, 95-point-something percent of direct influence volume came from individuals who were at most two generations older than the influenced comedian, and more than half of the direct influence volume on the contemporary most popular American stand ups came from people within the same generation. So it just kind of seemed like, I would say, an overwhelming majority of American stand ups are drawing their influence from people who are kind of roughly their age, or not really very much older than them.
Now if you also factor in the indirect influences, meaning, let's say comedian A was influenced by comedian B and comedian B was influenced by comedian C, so comedian C indirectly influences comedian A. So I guess that was kind of one of the fundamental assumptions of the paper over there. The birth year to birth year time spans naturally swept across a pretty vast period of time, and that period of time was like, truly vast. I mean, it was 10 years to more than 400 years, with a median time span of like around three years. So overall, what it was pointing to was that, well, first of all, there was a very strong pattern of influences, like an 80-20 pattern, where a large fraction of the influence was coming from a very small fraction of influencers. And then if you combine that with the vast span of birth year to birth year time spans, if you kind of put those two things together, the conclusion to draw from that was that most of the contemporarily most popular American stand ups drew their inspiration from a small set of influencers who were themselves spread across multiple centuries. So that was kind of an interesting thing, an interesting conclusion that I drew.
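
The idea of indirect influence described above (A was influenced by B, B by C, so C indirectly influences A) amounts to following "influenced by" links transitively. The sketch below shows one way this could be computed, together with birth-year-to-birth-year spans. The three placeholder people and their birth years are hypothetical and are not taken from the study.

```python
from collections import deque
from statistics import median

# Hypothetical toy data: influenced_by[x] lists the direct influences of x.
influenced_by = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}
# Invented birth years, for illustration only.
birth_year = {"A": 1985, "B": 1950, "C": 1900}

def all_influencers(person, graph):
    """Return every direct or indirect influencer of `person` (breadth-first)."""
    seen = set()
    queue = deque(graph.get(person, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, []))
    seen.discard(person)  # guard against cycles that lead back to `person`
    return seen

def birth_year_spans(person, graph, years):
    """Sorted gaps in birth year between `person` and each of their influencers."""
    return sorted(years[person] - years[i] for i in all_influencers(person, graph))

influencers = all_influencers("A", influenced_by)
spans = birth_year_spans("A", influenced_by, birth_year)
print(sorted(influencers))    # direct and indirect influencers of A
print(spans, median(spans))   # birth-year gaps and their median
```

With only direct links the span for A would be a single 35-year gap; following the chain one more step adds the 85-year gap to C, which is how spans can stretch across centuries once long chains are included.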

Rosemary Pennington
I'm looking at your visualizations of the influence chains from William Shakespeare to first Jerry Seinfeld and then to Larry David. And the thing that I was struck by looking at these is that the chain of influence to Larry David seems a little more direct than it seems to have been to Jerry Seinfeld. And I wonder, you know, what do you make of that, given that Seinfeld and Larry David are so, you know, tightly connected as far as comedians and producers. But also, were there chains that influence that you found particularly interesting as you were combing through this what must have been a vast bunch of data?

Sachin Date
That's right. So there's definitely a very large diversity in the structure of the influence chains. Now one thing to kind of keep in mind over there is that the data definitely has some degree of what we could consider as some form of, you know, non-response bias, and that's because a large number of comedians simply don't have influence data mentioned about them on their Wikipedia pages. So that's going to generate some kind of a bias, which is kind of similar to the sort of bias that one encounters on surveys, where people simply don't respond to the survey. So there's a missing data bias associated with that kind of missing data. So there could very well be influences which are not represented accurately enough by the graphs that you see in the paper. And that's almost certainly because the data for them is simply not available. But at the same time, there is still, I think, enough data on Wikipedia to draw the conclusion that the influence networks of a lot of these comedians have a lot of diversity in them. Now going back to your question about some kind of interesting features about these graphs. Well, one of the things that I noticed fairly consistently was that Woody Allen seemed to be performing the role of what you might consider as a router of influence. So his position in the influence networks was such that he seemed to be routing over influences from what were essentially writers in the 1800s, 1700s, 1600s all the way back to William Shakespeare, over to the set of modern day American stand ups. So on one side of the graph there were a bunch of writers and humorists and playwrights, and on the other side of the graph were people who were largely American stand up comedians, with the node representing Woody Allen kind of sitting in between. So I found that interesting in the way that, you know, this pattern repeated so often.
The other thing, one other kind of interesting feature I ran into was just the lengths of some of these influence chains. So for instance, I observed like 20 long, really long chains of influence. And they were about, I think, 12 to 15 influences in each chain. And then, as you kind of go back in time, starting with present day comedians like Hasan Minhaj or Michelle Wolf or Taylor Tomlinson, if you kind of trace back the chains from comedians such as those, you slowly start hitting nodes that represent comedians of the American vaudeville era of the early 1900s to late 1800s, and then before that come the nodes that represent comic writers like James Joyce or Ken Jeong, and then you keep following through on those chains until you kind of finally reach people like William Shakespeare in one instance, and then in another instance, Miguel de Cervantes, the creator of Don Quixote. So that's more than 400 years ago. So that's like more than four centuries of influence carrying over from, well, Cervantes, all the way to the 21st century comedians.
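
Tracing a chain of influence back in time, as described above, can be treated as a path search in the directed network. The sketch below finds one shortest chain using breadth-first search. The node names and links are hypothetical placeholders chosen to mirror the "writers on one side, stand ups on the other, a router in between" shape. It is not the method or the data used in the article.

```python
from collections import deque

# Hypothetical toy network: each key was influenced by the names in its list.
influenced_by = {
    "modern_standup": ["router"],
    "router": ["writer_1800s"],
    "writer_1800s": ["playwright_1600s"],
    "playwright_1600s": [],
}

def influence_chain(start, target, graph):
    """Shortest chain from `start` back to `target`, or None if unreachable."""
    parent = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to `start`, then reverse.
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for influencer in graph.get(node, []):
            if influencer not in parent:
                parent[influencer] = node
                queue.append(influencer)
    return None

chain = influence_chain("modern_standup", "playwright_1600s", influenced_by)
print(" <- ".join(chain))
```

In a sketch like this, a "router" would show up as a node that appears in the middle of many such chains; counting how often each intermediate node occurs across all chains is one simple way to look for that pattern.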

John Bailer
So what's next for you? I mean, you know, you've looked at this kind of connection here, of comedians, you mentioned some gaps that were in the Wikipedia study. And I think even in your article, you mentioned Lenny Bruce, not being within this influence graph. Do you have any thoughts of back filling some information that you thought were gaps, or are there sort of next projects that would be associated with these types of investigations?

Sachin Date
So with Lenny Bruce, one of the things I noticed was that a few previous studies on scholarly influence in general, not necessarily influences on comedians, but scholarly influence in general, those studies did mention Lenny Bruce. Though Lenny Bruce didn't really appear to be one of the major influences over there, the moment you kind of look at Lenny Bruce's influence in the context of comedy, he kind of bubbles up to the top very quickly in terms of influence. The interesting thing about that is that there's simply, you know, not a whole lot of data available about some of these comedians, and in some cases, there's a lot of data available about others. So it's quite possible that Lenny's position in the influence structure is very heavily dependent on simply the availability of data associated with the comedian. Now, well, in terms of future work, one of the things I'd like to do is to essentially look at the influence structures of individual comedians and comic actors. So I mentioned Woody Allen. Woody Allen turned out to be a router of influences from writers to stand up comedians. So I'd like to inspect the influence structures around other famous personalities in this space to see if they are also routing over influences in a particular manner, from their influencers to the people who they influence. And then the other kind of natural extension to this study is to go beyond the contemporary, most popular American stand ups, which is what the focus of this study was, and then study all American stand ups, or maybe all comedians who have performed stand up of some kind all over the world, and then inspect the influence structures associated with that much, you know, much more comprehensive set of comedians. So one of those things I've already done. There is a paper out recently from me, where I've extended this study out to include basically all American stand ups, and then studied the influence structures on that body of comedians.
And one of the things I found was that a lot of the results of this paper in Significance actually carried through very nicely in that bigger body of American stand ups as well.

John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Sachin, thank you so much for joining us today.

Rosemary Pennington
Yeah. Thank you for being here.

Sachin Date
Thank you for having me.

John Bailer
Stats and Stories is a partnership between Miami University's departments of statistics and media, journalism and film and the American Statistical Association. You can listen to us on Spotify, SoundCloud, Apple Podcasts, or other places. You can find podcasts and follow us on LinkedIn and Twitter. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Math and Music | Stats + Stories Episode 317 by Stats Stories

Long after Harry Nilsson said, “one is the loneliest number,” and after Bob Seger sang about feeling like a number, music streaming services are using data to help us discover new music that connects to our frequent plays and preferences. Dr. Kobi Abayomi helps break that all down in this episode of Stats+Stories.

Read More

Careers in Rom Coms | Stats + Stories Episode 264 by Stats Stories

Romantic comedies are rife with plucky heroines. Small bookstore owners are being pushed out by big corporations, runaway brides, and Perpetual bridesmaids. But where are the scientists, microbiologists and engineers, and statisticians? One researcher went looking for them, which is the focus of this episode of Stats+Stories with guest Veronica Carlan. 

Read More

Predicting the Weather with Pietro the Weather Tortoise | Stats + Stories Episode 225 by Stats Stories

Meteorologists go to school to be able to predict the weather accurately, but for some people, weather prediction is a hobby. Maybe they have a trick knee that hurts when it rains or perhaps they know when a storm is coming by how the birds at their feeders are behaving. Some lucky folks have pets that can help them figure out what the weather is going to do and that’s the focus of this episode of Stats and Stories with guest Connor Jackson.

Read More

The Best Friend on Friends | Stats + Stories Episode 220 by Stats Stories

Since the 1990s, people have been trying to figure out who’s the best friend. Is it Chandler because of his dry wit? Phoebe because of her unabashed enthusiasm? Joey because of his loyalty? Well, leave it to statistics to give us a firm answer. Who’s the best friend from the show Friends is the focus of this episode of Stats and Stories with guest Mathias Basner.

Read More

A Not So Standard Podcast | Stats + Stories Episode 212 by Stats Stories

Our lives are increasingly shaped by statistics and data. However, they remain concepts that can be difficult for broad audiences to understand. A number of outlets, including this one, have sprung up to help make them more accessible. Today another one, the “Not So Standard Deviations” podcast is the focus of this episode of Stats+Stories with guests Hilary Parker and Roger D. Peng.

Read More

#MemeMedianMode Contest Winner! | Stats + Stories Episode 200 by Stats Stories

At Stats+Stories we're lucky to have listeners who put up with John's bad jokes and our general shenanigans. In fact, you've listened to 199 discussions of the statistics behind the stories and the stories behind the statistics. To mark our 100th episode we asked you to submit statistical headlines and a haiku won. For 200 we took to Twitter using the #MemeMedianMode hashtag and this time those that rose to the top were actually memes. Today we're talking to the creators of our top two.

Nynke Krol (@krol_nynke) is a statistician at Statistics Netherlands who submitted a stats meme that caused both John and Rosemary to actually laugh out loud when they saw her take on data normality.

Eric Daza (@ericjdaza) is a data scientist statistician who focuses on digital health. He submitted several memes to our meme, median, mode contest, including one that made me flash back to my first graduate class in research methods, on causation/correlation.

Read More

The "Key" to a Successful Kickstarter | Stats + Stories Episode 197 by Stats Stories

About 20 years ago, most people would have been unfamiliar with the term crowdfunding. Now, when it comes to the arts, you can crowdfund anything from comic books to Mystery Science Theater 3000 to musical compositions. What it takes to successfully crowdfund a rock project is the focus of this episode of Stats and Stories with guests Moinak Bhaduri, Dominique Haughton and Piaomu Liu.

Read More

The Stats of the Decade | Stats + Stories Episode 120 by Stats Stories

Iain Wilton directs the Royal Statistical Society’s policy, public affairs and external relations work. His team’s responsibilities include the production of the RSS member newsletter, Significance magazine and the RSS’s policy briefing papers for MPs and peers. Iain’s team also organises the All-Party Parliamentary Group on Statistics as well as the RSS Statistical Ambassador network and the annual Statistical Excellence Awards. Iain has a doctorate from Queen Mary, University of London and has previously worked for the BBC, the Cabinet Office and the University of Essex. He has also written a biography of the sportsman, writer and politician CB Fry.

Read More

What Do Seinfeld, The Tonight Show And Stats+Stories Have In Common? | Stats + Stories Episode 7 (REPOST) by Stats Stories

Rick Ludwin was hired by NBC Entertainment in 1979 and made director of variety shows there in 1980. He then became vice president for specials and variety programs in 1983; senior VP for specials, variety programs and late-night in 1989; and executive VP for NBC’s late-night and prime time series in 2005. In its 57 years, The Tonight Show has had five permanent hosts, and Rick has been the boss of three of them. His late-night division at NBC developed the hit comedy Seinfeld. Rick, a 1970 Miami University grad, joined the Stats+Stories regulars to discuss the use and impact of ratings on television programming.

Read More

How Esports Stats are Tracked | Stats and Stories at JSM by Stats Stories

Brian McDonald is currently the Director of Sports Analytics in the Stats & Information Group at ESPN. He was previously the Director of Hockey Analytics with the Florida Panthers Hockey Club, an Associate Professor in the Department of Mathematical Sciences at West Point, an Adjunct Professor in the Department of Management Science at the University of Miami, and an Adjunct Professor in Sports Analytics in the College of Business at Florida Atlantic University. He received a Bachelor of Science in Electrical Engineering from Lafayette College, Easton, PA, and a Master of Arts and a Ph.D. in Mathematics from Johns Hopkins University, Baltimore, MD.

Read More

Using the Stats to Improve Your League of Legends Game | Stats and Stories at JSM by Stats Stories

Michael Schuckers is the Charles A. Dana Professor of Statistics at St. Lawrence University in Canton, NY. An applied statistician, he has received funding from the US National Science Foundation, the US Department of Defense and the US Department of Homeland Security. He is the author of over three dozen publications including Computational Methods for Biometric Authentication (Springer, 2010). Additionally, Schuckers has done work in sports analytics, particularly ice hockey, including consulting with an MLB team and an NHL team. For his work in this area, he was named an American Statistical Association Section on Statistics in Sports "Significant Contributor".

Read More

The Statistics of the Year | Stats + Stories Episode 76 by Stats Stories

David Spiegelhalter is Winton Professor for the Public Understanding of Risk in the Statistical Laboratory at the University of Cambridge, Chair of the Winton Centre for Risk and Evidence Communication, and President of the Royal Statistical Society.

+ Full Transcript

(Background music plays)

Rosemary Pennington: As 2018 winds down, everyone from social media users to mainstream media outlets are releasing their lists of top albums, top books or top films of the year. Earlier this month the Royal Statistical Society got in on the action by announcing its statistics of the year. That's the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me in the studio are regular panelist John Bailer, Chair of Miami Statistics department and Richard Campbell of Media, Journalism and Film. Our guest today is David Spiegelhalter, I should say maybe Sir David Spiegelhalter, Chair of the Winton Center at the University of Cambridge. He's also the president of the Royal Statistical Society or RSS, which as I said just announced its choices for statistic of the year and I want to point out that he's the first three-time guest on Stats and Stories…

John Bailer: That should have been statistic of the year!

(Collective laughter)

(Voices overlap)

Pennington: David thank you so much for being here today.

David Spiegelhalter: A great pleasure to be back again!

Pennington: Why choose a statistic of the year in the first place?

Spiegelhalter: Well you know we are statisticians, we think statistics are immensely important and we launched this last year as an experiment just to see if it would catch on and we were amazed at the interest in it. We’re in print, the popular radio programs and we don't just do a statistic of the year, we got 10 of them and people loved the variety and the choice so we thought we’d do it again!

Bailer: What was the criteria that you used? I mean you said you had hundreds of submissions so how do you…

(Voices overlap)

Spiegelhalter: We got hundreds of submissions. The first criteria was that it was faintly true.

(Collective laughter)

Richard Campbell: How many did you get rid of?

Spiegelhalter: Some of the entries were the old joke, you know, 95 percent of all statistics are made up. We expect those, but unfortunately that's actually one of the truest statistics judging by the entries we got, because they come in and they sound very impressive but then you start doing the fact checking and so many of them just don't stand up. I suppose this is not news to anybody. There's a lot of fake news around the world, a lot of false claims being made, a lot of them statistical, and we ended up getting sent these. So we've had to do some serious filtering to try to get things that we actually think are fairly accurate.

Bailer: So after you kind of filtered out the fake, how did you pick among the real?

Spiegelhalter: Oh, very difficult. We've got a good panel, we've got journalists, we've got all sorts of official statisticians, you know, with some difficulty. We wanted a variety. We didn’t want them all gloomy. You know you could pick 10 gloomy statistics. We don't want to reinforce the impression that statisticians are all just such miserable people. And we wanted ones that covered, also, some of the stories that, you know, we know have been going on throughout the year. I should say the one thing that we haven't got is a Brexit statistic, but that's you know our own local problem that we're having to deal with.

Pennington: I remember that it’s so local. Do you remember that?

(Collective laughter)

Bailer: So I guess there are…how many did you pick? You picked 2 winners and how many runner ups?

Spiegelhalter: Yep, we've got two winners and then 8 runners up, highly commended statistics.

Bailer: So one of the things that I was curious about is, you know, there's lots of ways to report a statistic. And so I’m going to let you talk about some of the ones that you picked, I'm curious about the winners. I think…we know the people are just sitting on the edge of their seat, waiting to hear this result. So after you talk about the winners, I'd be curious for you to comment a little bit about why this representation versus some other representation of the story was compelling to you.

Spiegelhalter: Exactly. I mean it's terribly important because I know, we all know that we can make any number big or small, depending on how we frame it, what comparisons we make, what units we use, and so you know we would try to frame them I think in a way that is most realistic. So when we do the winner, we actually reported it in multiple frames in order to get a more balanced feeling about it.

Campbell: When you did this last year for the first time, I know the statistic, the international statistic I think, or was it the American statistic or U.S.?

Spiegelhalter: It was the international.

Campbell: The international one, that was in the Huffington Post and Kim Kardashian picked it up. How much news generation came out of that?

Spiegelhalter: You know we got a lot of coverage. That was about essentially the last 10 years. I think our main statistic was the number of U.S. citizens killed by lawn mowers over the years. That of course was just a hook to try to draw people in, just to compare with the number of people that are being killed by, you know, immigrant jihadist terrorists, which is on average 2 a year, compared with the number, for example, killed by fellow Americans.

Campbell: Yes, and that was 11000.

(Voices overlap)

Spiegelhalter: So these are very stark figures, and we received some criticism about that, you know, and I can see why, because it suggests, well, that's the future risk. We didn't mean that; it's the past rates, what has happened. These are the statistics of last year, they are not predictions about what's going to happen next year.

Pennington: So what are your statistics this year for the international side of things and I know you also identify U.K. one as well. So what are the winners?

Spiegelhalter: OK, the international one is a slightly negative one, it was more than negative. So it's 90.5 percent. And that's the proportion of plastic waste that has never been recycled. We also frame it to say, well, 9.5 percent has been recycled, but still not a very large number given, you know, you're talking about 6000 million metric tons of plastic that’s actually not in use anymore, that has been got rid of. And so you know that means that only about 10 percent has been recycled, and out of the rest of it, about 12 percent is being incinerated and the rest is just lying around at landfills or will be dumped in the environment. And you know I'm sure that in the States, certainly in the U.K., plastics have received a lot of attention this year… Blue Planet, these pictures, whales and fish and things like that with all this plastic in them, and this has become a very strong story. And then this was a really strong study done from the University of California, you know, published in Science Advances. They made this assessment of the amount of plastic that was not being recycled.

Campbell: So I'm a general listener and say, I am watching cable news in America and I see the statistics come on and I'm saying, OK. How do they know 90.5 percent of the plastic waste has never been recycled? So I'm putting you on the spot here. So how would we respond to that? Because we get a lot of that, you know, people not believing in statistics and certainly not willing to do the work to find out where that information came from.

Spiegelhalter: Actually it was reported in a UN paper, in a report but it comes from a published paper in Science Advances from 2017 and kind of….Oh interesting! So they got plastic production data. They can get that from industrial production system statistics and then they can look at product lifetime distributions from eight different industrial use sectors. So by breaking it up into the different sectors, packaging and so on and then they have got data on how long within each sector plastic is in use, and then by knowing about the productions they can work out how much plastic is out there. So that's how they work out that you know only 30 percent of plastic ever produced is currently in use. That means 70 percent has gone and then…I'm just trying to work through how did they get at the amount that’s being recycled and they know from other sources…then they look at the recycling rates broken down around the world, from Europe and China. And in the United States plastic recycling has remained steady at 9 percent since 2012. So essentially I can stop and do this again. It's really cool. So they build a big model. First of all the model for plastic production, looking at industrial data. Then a model for how long plastic is in use. That enables them to estimate how much plastic is actually in use at the moment, which is you know, at least 30 percent of what's being produced. And then by looking at incineration and recycling data from different countries, they can work out how much is being recycled out of everything that's being produced and is not in use anymore.
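The chain of steps described here (production data, then product lifetimes, then the fate of whatever has been discarded) can be illustrated with a very small sketch. This is not the published model: it uses a single sector instead of eight, and every input number, function name and share below is a made-up assumption chosen only to show how the pieces fit together.

```python
# Toy sketch of a production / lifetime / fate calculation.
# All inputs are made-up illustrative numbers, not data from the study.

def in_use_and_discarded(production, discard_share_by_age):
    """Split cumulative production into tons still in use and tons discarded.

    production[t] is the tonnage made in year t.
    discard_share_by_age[k] is the share of a year's output that is thrown
    away k years after it was made (the shares are assumed to sum to 1).
    """
    last_year = len(production) - 1
    discarded = 0.0
    for year, tons in enumerate(production):
        age = last_year - year
        # Share of this year's output already thrown away by the last year.
        share_gone = sum(discard_share_by_age[: age + 1])
        discarded += tons * share_gone
    return sum(production) - discarded, discarded


def split_by_fate(discarded, recycled_share, incinerated_share):
    """Divide discarded tonnage into recycled, incinerated and everything else."""
    recycled = discarded * recycled_share
    incinerated = discarded * incinerated_share
    return {
        "recycled": recycled,
        "incinerated": incinerated,
        "landfill_or_environment": discarded - recycled - incinerated,
    }


# Hypothetical three-year production history and lifetime distribution.
production = [100.0, 120.0, 150.0]
discard_share_by_age = [0.5, 0.25, 0.25]

in_use, discarded = in_use_and_discarded(production, discard_share_by_age)
fates = split_by_fate(discarded, recycled_share=0.10, incinerated_share=0.12)

print(f"in use: {in_use}, discarded: {discarded}")
print(fates)
```

The point of the sketch is only that the headline percentage is the output of several estimated components multiplied together, which is why the uncertainty in each component matters.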

Bailer: So a natural question is, you just described the models, that's estimating a lot of components. And you know, none of these things are known, and so there's uncertainty associated with all of this and you know what would you say when people say well, by reporting a single number that perhaps this is conveying an overly strong sense of precision?

Spiegelhalter: And I would completely agree. It would be much better to give a range of these numbers at a minimum. Actually I believe giving ranges would make it more trustworthy, and I'd be happier having a range than a single number. I mean one can qualify it by saying around or an estimate and so on. So they’ve got a relative measure of plus or minus about 6 to 7 percent, which isn't too bad. So that would only take it, if the total is 10 percent, you know you might say the total is between 8 and 12 percent for something that’s being recycled.

Bailer: OK. I just think that is such an important point. All the time we see the kind of headline statistics, there's always in my mind kind of two things that come: one is, you know, how well do they know this number, and then even when you have some of these other components like the 6,300 million metric tons, do people have a sense of how much that represents?

Spiegelhalter: Yeah, these are just big numbers. What does it mean? And that's why people will be so much more influenced by seeing a picture of a turtle, you know, with his head through a piece of plastic or something like that; that's what drives the emotional reaction to these things. You know, what does that 6,300 million metric tons mean? It is extremely difficult to judge. I mean one way of course is to do it per head of population: across the people in the world, that’s a ton each, that’s enormous. So I think there is a problem with all these big numbers. It is amazing, it is almost exactly a ton each of plastic for each person that is no longer in use. Wow! That is more impressive than the 6000 million metric tons, which I haven't got a clue what that means!
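The per-head reframing mentioned here is a single division. A minimal sketch, using the rounded 6000 million metric tons quoted in the conversation and assuming a world population of roughly 7,600 million (an outside assumption for 2018, not a figure given in the episode):

```python
def tons_per_person(total_million_tons, population_millions):
    """Convert a world total into metric tons per person."""
    return total_million_tons / population_millions


PLASTIC_WASTE_MILLION_TONS = 6000   # rounded figure quoted in the conversation
WORLD_POPULATION_MILLIONS = 7600    # assumption: approximate 2018 world population

per_head = tons_per_person(PLASTIC_WASTE_MILLION_TONS, WORLD_POPULATION_MILLIONS)
print(f"{per_head:.2f} metric tons of discarded plastic per person")
```

Under those assumptions the answer lands a little under one ton per person, the same order of magnitude as the "ton each" in the conversation.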

Pennington: You're listening to Stats and Stories and today we're talking about the statistics of the year according to the R.S.S. with society president David Spiegelhalter. I'm going to ask you to talk now about the U.K. stat of the year, because I think it's interesting that both of these statistics of the year are somehow related to environmental concerns.

Spiegelhalter: That was a deliberate choice, and we’ve also chosen one negative and one positive there. The U.K. one is a positive environmental one: 28.7 percent is the figure, and that's the peak percentage of all electricity produced in the U.K. that was solar power, on the 30th of June. So that means that, amazingly for this country, solar power was the biggest producer of electricity. Briefly, extremely briefly, and that number is exact. That is a true statistic. But of course it was only brief, but it's a staggering change from, you know, when it was so low nobody thought about it 10 years ago in this country.

Bailer: So could you give us the list of kind of the highly commended statistics international?

Spiegelhalter: Yes. We've got some very positive and negative ones. The positive one is that in spite of all the stories, you know, that we hear about the decline of living standards in the West, worldwide the percentage of the population that is considered to be living in absolute poverty has more than halved since 2008, that’s in the last 10 years. It has gone down from 18 percent to essentially 9 percent, and it is a quite extraordinary benefit that has happened to people. And this isn't a story that makes the international news, that far fewer people are living in absolute poverty than 10 years ago.

Bailer: And then, just as a…well before we go to the other ones, I had a question for you in terms of reporting this. When I saw it, I was wondering if 50 percent reduction in absolute poverty would be a more impressive statistic to me than 9.5 percent...

Pennington: Yeah, maybe.

Spiegelhalter: No exactly. We chose deliberately to use the percentage point reduction. Then we can say it’s halved, essentially, but in this case we would have a bigger emotional hit to say poverty has halved in the last ten years. But we want to do this statistic, which is the percentage point reduction. We could frame this and give it a stronger emotional hit, but we chose not to.

Bailer: You are a risk difference guy, rather than a risk ratio guy here.

Spiegelhalter: Yeah exactly, I believe in absolute risks, absolute proportions. We know that relative risks, relative changes, can be a highly manipulative way in which to communicate changes over time.
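The two framings under discussion, a percentage point difference versus a relative change, can be put side by side. A minimal sketch using the rounded 18 percent and 9 percent quoted earlier in the conversation; the function names are just illustrative:

```python
def absolute_change(before_pct, after_pct):
    """Change in percentage points (the 'risk difference' framing)."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change as a fraction of the starting value (the 'risk ratio' framing)."""
    return (after_pct - before_pct) / before_pct


before_pct, after_pct = 18.0, 9.0  # rounded poverty shares quoted in the conversation

points = absolute_change(before_pct, after_pct)
fraction = relative_change(before_pct, after_pct)

print(f"absolute change: {points:+.1f} percentage points")
print(f"relative change: {fraction:+.0%}")
```

Both numbers describe the same data; the relative version sounds larger, which is the emotional hit the panel chose not to lead with.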

Campbell: With the statistics of the year, how much of what's behind the final decisions is what's going to attract a news story? We need to get people interested in and learning about statistics. What's going to get the New York Times to cover this, what's going to get the British press to cover this?

Spiegelhalter: Yes. There is a trade-off there. We can't just have a whole lot of negative stories and they can’t be too dull. We want them interesting, but at the same time they can’t all be about celebrities or whatever. Last year's was quite a nice mix. We couldn't find one quite like that this time. We want good news stories but we also want ones that are just important, and frankly ones that have a story that’s not generally being told, rather than just the celebrity stories. The story about poverty being halved in the last 10 years, nobody's written a story about that this year. That’s not in our news.

Bailer: So you had 3 more that were in your highly commended group. So you want to just run through them real quickly and then we can…

Spiegelhalter: Yeah. Well the second one, I think this is terribly important. Amazingly, from November 2017 to October 2018, the number of measles cases in Europe was 64,946, nearly 65000 measles cases, and 2 years ago it was 4000 cases.

Bailer: Oh my goodness!

Pennington: Wow!

Spiegelhalter: Isn’t that staggering? That’s a 15-fold rise in 2 years. This is really terrifying. This is very serious indeed and we know why: because of all the stories about vaccines giving kids autism, in spite of that being disproved. And in Britain we've recovered from that story, largely because we've exported Andrew Wakefield to the States.
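The fold rise is simply the ratio of the two case counts. A quick sketch with the figures as spoken; note that the earlier count is the rounded "4000" from the conversation, so the ratio is only approximate:

```python
def fold_change(later, earlier):
    """How many times larger the later count is than the earlier one."""
    return later / earlier


CASES_NOV2017_OCT2018 = 64_946  # figure quoted in the conversation
CASES_TWO_YEARS_EARLIER = 4_000  # rounded figure quoted in the conversation

rise = fold_change(CASES_NOV2017_OCT2018, CASES_TWO_YEARS_EARLIER)
print(f"roughly a {rise:.0f}-fold rise")
```

With the rounded denominator the ratio comes out at about 16, consistent with the "15 fold" order of magnitude quoted in the conversation.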

(Collective laughter)

(Voices overlap)

Spiegelhalter: But the number of anti-vaccine websites and the fact that this has become politically acceptable, for example in Italy, major parties are arguing against vaccination. This is very dangerous, and you know the kids will die, and you know this is a really bad story.

Bailer: And so then the next one related to the Russian men.

Spiegelhalter: Yeah this is really extraordinary. This year Russia raised the retirement age for men from 60 to 65. Unfortunately for Russian men, 65 is their current life expectancy. It's only just above that, so it's estimated that 4 in 10 Russian men, 40 percent, will actually die before they get to that pensionable age, which is quite troubling compared with, say, the US, you know, where 80 percent of men will reach their retirement age, and in the U.K. 87 percent of men will live past 65. I’m 65, I’m just taking my pension. I’m a lucky one of those 87 percent.

Bailer: I really like that part of when you're reporting out the idea of putting that context. You know when people think about that, when you first report that 40 percent, which is that? Is that big or is it little? Then they are given that other example with the U.K. and US, I find that a really nice part of contextualizing the story.

Spiegelhalter: Yeah, so it's still, in the U.S., 20 percent of men, 21 percent, who won't do it. So you know it's about half the figure. In the U.K. about 13 percent won’t make the retirement age. So you know it is bad, but in Russia then that's 3 times that, right? Which is very high. You need the international context with that data.
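The comparison being made here can be worked through from the survival figures quoted earlier (4 in 10 Russian men not reaching pension age, 80 percent of US men and 87 percent of UK men reaching it). A small sketch, with the dictionary and variable names chosen purely for illustration:

```python
# Percent of men expected to reach pension age, as quoted in the conversation.
reach_pension_age_pct = {"Russia": 60, "US": 80, "UK": 87}

# Complement: percent expected to die before reaching pension age.
miss_pension_age_pct = {
    country: 100 - pct for country, pct in reach_pension_age_pct.items()
}

# How many times higher the Russian figure is than each comparison country.
russia_vs = {
    country: miss_pension_age_pct["Russia"] / miss_pension_age_pct[country]
    for country in ("US", "UK")
}

for country, ratio in russia_vs.items():
    print(f"Russia vs {country}: {ratio:.1f} times as many men miss pension age")
```

On these rounded figures the Russian share is twice the US share and about three times the UK share, which is the international context the 40 percent needs.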

Bailer: And how about your last one?

Pennington: Kardashian, I guess.

(Collective laughter)

Spiegelhalter: This was a bit of a celebrity one. 1.3 billion, this is extraordinary. The amount wiped off Snapchat’s value within a day of one Kylie Jenner tweet. So this is a bit of a flagrant appeal to populism. You know, just a brief tweet that she made in February 2018: “so does anyone else not open Snapchat or is it just me?” Oh yeah. 367,000 likes! I mean, it is extraordinary. I mean there are other things that were changing about Snapchat also. Again we've got to be careful with drawing, you know, a causal pattern with certain decisions; we know we can't draw a straight causal pattern. But this is too good a story to miss.

Pennington: Yeah. I had a question about the U.K. statistics and maybe we can talk about some of the other highly commended ones but I wanted to ask about Jaffa cake.

Spiegelhalter: Oh yes.

(Voices overlap)

Pennington: …to explain what exactly they are and why this is a noteworthy statistic for people who are not in the UK.

Spiegelhalter: They are kind of a form of biscuit, but they had to go to court to claim they are a cake, because they didn't have to pay VAT on them if they were called a cake. It is a type of biscuit with a soft bottom but a chocolate top, with a bit of sort of orangey, you know, jammy stuff inside as well. I love them! They are a real sugar rush. I love them, I have to keep them out of my way. They normally sell them in smallish boxes, but at Christmas they release what used to be called a yard of Jaffa Cakes, which was 36 inches long, an old yard.

Pennington: A lot of sugar!

Spiegelhalter: Last year that contained 48 Jaffa Cakes. Well, now it only contains 40 Jaffa Cakes. The cakes generally are of the same size, you just get fewer of them in your box, and actually the box has shrunk and they couldn't fit it into a yard anymore. So now they have to call it a sort of Christmas cracker of Jaffa Cakes, and you know what this is? The end of Jaffa Cakes. They are incredibly…they sell billions of these things. I know I love them. But some say…but this is just the one example of, you know, the shrinking size of products, and you could say this is a good thing. It could be a great thing if people didn't eat so many Jaffa Cakes. You know, Mars Bars and other things will go smaller, this is a very good thing. Portion control is incredibly important, it’d be wonderful if people didn’t eat so much. But the price has gone down.

Bailer: I love the way you describe it as shrink inflation too!

Spiegelhalter: Shrinkflation, yeah! And Toblerone got a lot of interest last year as well when they reduced the size of the chocolate but not prices. So this is not a matter of perhaps global importance but some people notice this kind of thing, and again it made a good news story, where they got a lot of coverage.

Bailer: Were you surprised at the one report about the amount of shopping that was in store versus online?

Spiegelhalter: Yeah, this is the issue where you have to decide, what about the framing of this? Do you frame it as saying that 18 percent of all shopping is now online, you know, the big one in five spending online…or do you frame it as 82 percent of shopping is still done in the shop, rather than online? Do you do a positive or a negative frame? Because I have seen this story reported in both ways. Actually for us, we found it quite surprising, given, you know, the huge publicity around the rise of online shopping and the closure of so many shops in the high street now; I thought there was going to be a big effect of this. I'm surprised it was at 82 percent. But then again of course you've got food, you’ve got a lot of stuff that’s not done online as well. But still, you know, 82 percent is still done by people walking into a shop and paying.

Campbell: How does that compare with the U.S.? It would be interesting to see that!

Spiegelhalter: I don’t know what that is in the US.

Campbell: It seems very high.

Pennington: That does seem high.

Campbell: Here in the US, everybody is using Amazon here you know.

Spiegelhalter: Yeah, well people use that here as well, you know. It was a huge amount as well so…I don't know the U.S. figure, I’m about to find that out.

Bailer: You know, another one of the commended stats related to the trains running on time, and you know, all of us do travel, and it is about rail travel, but I was wondering how rail travel in Great Britain compared to that in Europe, or how it might compare to air travel…I was thinking about some of this contextualizing and framing here too.

Spiegelhalter: Yeah, we really should. Again I think that's a very good point that we need to look at, because the reason why that story's in here is that we had an utter disaster this year with regards to trains. They introduced a new timetable that wasn’t planned properly, huge numbers of cancellations, absolute chaos, and there were strikes as well. So I mean this 86 percent of trains running on time is terrible, because they know this must be above 95 percent of the time, that’s what they claim to be able to do, and that's where you can start getting compensation as well. They paid out a fortune in compensation. I was travelling on trains in the summer and you know they were just announcing on every train to tell you how to claim compensation. I was making the claim even before the train got to my destination. I had my online compensation claim that I submitted, so it was absolutely shambolic. So this is far worse than it generally is, it’s the worst for nearly 15 years in this country, it has been quite recently late. But I don't know the international comparisons. That’s something I should find out. But actually it was so noticeable this year, the whole system really fell apart in the summer.

Pennington: We are getting ready to wrap up, but before we go I do want to ask you about this: the first listed commended statistic for the U.K., about female executives of 250 companies.

Spiegelhalter: Yeah, so that's the figure 6.5 percent, which is 6.4 percent. Sorry, the figure is 6.4 percent, which is the percentage of female executive directors within 250 companies, especially the big companies in the U.K. And the gender pay gap has been a massive issue in this country, because in this country, for the first time, larger and medium-sized employers have by law had to report gender pay gaps. Unfortunately those are just reported as what women get paid compared with what men get paid. And we were going to use those figures but actually they’re not…they can be very misleading because they include many women who are in part-time work, and they are not adjusted for the kind of work. So what we wanted to do is to pick a job in which, you know, everyone is roughly comparable, and then look at what the percentage of females is, and it's extraordinarily low. And it doesn't seem to be getting any better. I mean it changed, it went from 38 to 30 in a year. I don't think that’s really statistically significantly different, but it's certainly no indication of things getting better.

Pennington: So that’s all the time we have for this episode of Stats and Stories. David, thank you so much for being here, it has been a really interesting conversation today.

Bailer: Always a pleasure David, I still think three should have been on there, number of times David Spiegelhalter has been on Stats and Stories.

Spiegelhalter: It’s going to be okay, I seem to be fumbling around as you see! You can tell it's the first interview I've done on this, I'm going to do a bit more preparation, some background on them. Yeah but that was very helpful to me in fact!

Campbell: Good!

Pennington: Stats and Stories is a partnership between Miami University’s departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. You can follow us on Twitter, Apple Podcasts or other places you can find podcasts. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu. You can check us out at Statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Better Bayes Winner Revealed | Stats and Stories Episode 73 by Stats Stories

Stephen T. Ziliak is Professor of Economics at Roosevelt University and Conjoint Professor of Business and Law at the University of Newcastle-Australia.  A major contributor to the American Statistical Association “Statement on Statistical Significance and P-values” (2016) he is probably best known for his book (with Deirdre N. McCloskey) on The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives (2008), showing the damage done by a culture of mindless significance testing, the history of wrong turns, and the benefits which could be enjoyed by returning to Bayesian and Guinnessometric roots.
