Music Streaming Statistics | Stats + Stories Episode 354 by Stats Stories

Chris Dalla Riva is an analyst for the music streaming service Audiomack by day while spending his nights writing and recording music and writing about music for his newsletter Can’t Get Much Higher.

Check out the Full Article in Significance Magazine

Episode Description

Artists today are still making albums. However, with so much emphasis being put on streaming charts, how many of today's album streams are made up by a few hit tracks? That question is the focus of today's episode of Stats and Stories with guest Chris Dalla Riva.

+Full Transcript

Coming Soon


The Statistical Kings of Comedy | Stats + Stories Episode 348 by Stats Stories

Sachin Date works for VitalEdge Technologies and has, over his career, worked in two research labs, three software companies including two product companies, and in a classroom. He has built and delivered all kinds of software including massively distributed discrete-time simulations, data science stacks, a new programming language, and dozens of mobile apps, including the world’s first Napster app for Blackberries. Along the way, Sachin taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch.

Episode Description

A journalist, a statistician and a sound engineer walk into a bar. Well, actually, into a studio to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? That’s the topic of this week’s episode of Stats+Stories with guest Sachin Date.

+Full Transcript

John Bailer
A journalist, statistician and sound engineer walk into a bar…well, actually, to a studio, to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? Who may have influenced Seinfeld or David? How would you know? Stay tuned, and you will get your question answered on this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me is regular panelist Rosemary Pennington, chair of the department of media, journalism and film at Miami University. Our guest today is Sachin Date. Date works for VitalEdge Technologies. His career has included work in two research labs, three software companies, including two product companies, and in a classroom. He has built and delivered all kinds of software, including massively distributed discrete-time simulations, data science stacks, a new programming language and dozens of mobile apps, including the world's first Napster app for Blackberries. I remember Blackberries, and Napster too. He has also taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch. Date’s recent Significance article on whether Shakespeare influenced Seinfeld provides the background for our conversation today. Thank you so much for joining us today.

Sachin Date
Thank you for having me, John.

John Bailer
So what inspired you to embark on this project?

Sachin Date
So I didn't actually start with the intention of establishing the patterns of influence between specific comedians and their influences. What really happened was, I was browsing through the Wikipedia pages of some of the comedians I follow, and I quickly discovered that a lot of these pages have material on them that seems to indicate that the comedian was heavily influenced by other comedians, and sometimes not necessarily other comedians, but also writers and a lot of other, you know, kinds of people, like family members and friends and so forth. So I clicked on the links of some of these influences, particularly the influences that came from other comedians, and I discovered that the Wikipedia pages of those influencers also contained information about whom they influenced. So I clicked on those links. And then I kind of kept on going back in time, until I ran into Wikipedia pages of writers in the 18th century, 17th century, 16th century. At one point, I opened the Wikipedia page of William Shakespeare, and I realized that I had basically followed the links through from someone who is alive today in the 21st century, and then kind of transported myself back in time all the way to William Shakespeare. So that made me wonder, well, how common is this pattern? Are there other comedians who also have influence data listed on their Wikipedia pages? So I kind of started clicking around, and I discovered that a lot of comedians actually have this kind of data on their Wikipedia pages. Additionally, the Wikipedia pages of very influential comedians like Richard Pryor, for example, or George Carlin, have legacy sections on them which contain information about whom they have influenced. That's kind of part of their legacy. So there are those backlinks also to be followed. So I figured, well, let me actually see if I can do a systematic study of this topic. 
But when I started doing that, I realized that, well, the number of comedians involved is very big. Wikipedia itself has about, I think, 50 to 100 different categories devoted to comedy. So I figured, well, let me, let me, kind of just put a circle around my research. I'll focus only on the comedians who are contemporarily the most popular comedians in America today, and then I'll start tracing the links back from that set of comedians. And let me see how far back in time and how widespread those things kind of get. And that's kind of, you know, what motivated the research on that topic.

Rosemary Pennington
How did you determine who were the top comedians working today?

Sachin Date
Yeah, so I was interested in a way of finding that information. What I thought I would do, and it actually worked remarkably well, was that I ran a couple of, well, actually, I ran three pretty straightforward Google searches. So the search text basically went: most popular American comedians in 202X, where that X was either one or two or three. So basically, the most popular American comedians in 2021, 2022, 2023. I figured, well, the last three years could be considered as kind of the window for the most popular contemporary comedians. So sure enough, Google showed a lot of search results. So I tweaked those results by setting the time frame filter to include only the results that were published in the October through December timeframe. So as soon as I did that, that brought forth results that were really more focused toward the end-of-the-year rankings and lists and ratings that were available on the internet. And then I started going through those results, and sure enough, there was a large amount of diversity in there. So for each one of those three years, 2021, 2022, 2023, what I did was I essentially identified about 10 different types of sources, and I tried to keep those sources as different from each other as possible, just to kind of, you know, reduce the bias and improve the diversity in the data. So that gave me essentially a mass of comedians to work with, and then I merged that data, and then kind of arrived at the list of what I consider to be the most popular contemporary American stand ups.

John Bailer
So let's name some names. So who are some of the comedians that you ended up including kind of from this, this three year window?

Sachin Date
Well, there was Jerry Seinfeld, of course, and then there was Hasan Minhaj. Well, let's see. There was John Mulaney and Taylor Tomlinson, Dave Chappelle. A lot of the same, you know, same set of people started repeating in those lists. So one thing that kind of was common amongst them was that a lot of them were very active in stand up comedy. I mean, not just now, but I mean just, you know, three years ago, four years ago, 10 years ago. So they've been doing stand up for a long period of time.

John Bailer
So how many different comedians did you identify in this collection? I mean, once you filtered it based on you said that they were American comedians in this time window that were identified in October through December of these three years. So what was the total number of comedians that you included to start building this connection of influence?

Sachin Date
So the three searches that I ran produced several hundred different comedians, and then I kind of filtered out all the ones that were not US persons, because my focus was only on American stand ups. Then I also filtered out comedians who did not have Wikipedia pages, because my study was really kind of just focused on data that came from Wikipedia. I also filtered out comedians who had not really performed any kind of stand up or improv or sketch comedy. So once all those filters were applied, I narrowed the space down to about 175 to 200 comedians. So, that was kind of the social network of comedians that I started with. Now, this was the set of the most popular contemporary comedians as of the end of 2023. Now, of course, a lot of those Wikipedia pages did not have the influence data on them. In fact, I think for over 100 of those 175 or so comedians, there was no good data available on Wikipedia on who influenced them. So those were really isolated nodes in the network, and then for the balance set of comedians who had that data, I kind of followed the links back in time and also across in space to build a social network. So in the end, I basically ended up with about 64 to 70 comedians who had a lot of influence data associated with them, and then the social network was kind of based off of that set. The overall network, once you kind of factored in all the influences on those comedians, ran up to about 250 to 260 nodes and around 700 links of influence.
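Editor's note: the merging and filtering steps described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the `Candidate` record, its field names, and the placeholder entries ("Comedian A" and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one name found in the search results."""
    name: str
    is_american: bool
    has_wikipedia_page: bool
    does_live_comedy: bool  # stand up, improv, or sketch comedy


def merge_sources(*source_lists):
    """Merge several ranked lists, keeping each name only once."""
    merged = {}
    for source in source_lists:
        for candidate in source:
            merged.setdefault(candidate.name, candidate)
    return list(merged.values())


def apply_filters(candidates):
    """Keep only candidates that pass all three filters from the interview."""
    return [
        c for c in candidates
        if c.is_american and c.has_wikipedia_page and c.does_live_comedy
    ]


# Placeholder data standing in for the 2021, 2022 and 2023 search results.
list_2021 = [Candidate("Comedian A", True, True, True),
             Candidate("Comedian B", False, True, True)]
list_2022 = [Candidate("Comedian A", True, True, True),
             Candidate("Comedian C", True, False, True)]
list_2023 = [Candidate("Comedian D", True, True, False),
             Candidate("Comedian E", True, True, True)]

merged = merge_sources(list_2021, list_2022, list_2023)
shortlist = apply_filters(merged)
print([c.name for c in shortlist])
```

In this toy data, five distinct names survive the merge and two of them pass all three filters.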

Rosemary Pennington
What concerns did you have about using Wikipedia data?

Sachin Date
Right. Yeah, so Wikipedia, on one hand, most of the data that's mentioned on Wikipedia is referenced very nicely. So that's kind of one advantage you get from using Wikipedia data, that you can follow through the reference links and just kind of verify that the influence that is mentioned on the page actually does ring true. The text talking about the influence is actually a valid influence, and it kind of links through to some article somewhere that mentions how the comedian actually was influenced by someone else. On the other hand, with Wikipedia, there is really no way for you to know the current strength of the influence, so you're forced to consider that influence as a binary variable, so either the influence is there or the influence is not there. But in reality, of course, influence is much more complex than that. Someone could be influenced by someone else very heavily in the past, but not really so much anymore. And that character of the influence isn't really brought out very well. Actually, it's not brought out at all in most cases on Wikipedia. So that's another problem. Well, it's really not so much a problem with Wikipedia as much as it is with the nature of the influence itself. I mean, it's an inherently qualitative measure. And in fact, one of the goals of the study was to kind of try to work around the qualitative nature of the influence. But yeah, back to your question about the limitations of Wikipedia data. So there was that, that the nature of the influence was entirely binary. You either assume that the influence was there or it was not there, depending on what was mentioned in the page. The other aspect of information on Wikipedia is that you have to be very careful to interpret the text, the sentence, the context around the influence. So I mean, in fact, I'll give you an example. 
In one instance, I think this was on David Letterman's page, where he talks about how Norm Macdonald has been one of the greatest comedians that he has run into, but that kind of a text is really more in the context of Letterman considering Norm Macdonald as really a great comedian, not so much an influence. So you have to be careful about treating the text around words such as great comedian or my hero, or anything like that. So, you know, there's a lot of subjectivity involved over there.

John Bailer
You're listening to Stats and Stories. Our guest today is Sachin Date. So you've talked a lot about this idea of an influence network. So help the audience picture this. You have a cloud out there, and each comedian is some, I don't know, some unique node itself that's connected potentially to others, and there are those edges that connect them. Those nodes are comedians. The edges are there if one influences the other. There's direction here if one is influencing the other. So you've built this from the data. What kind of influences or influencers surprised you most after having built this network out?

Sachin Date
Well, okay, so let me kind of give you some examples here. So one of the interesting findings was that people such as Charlie Chaplin and Stan Laurel and Oliver Hardy of the Laurel and Hardy fame, all three of them, in fact, individually seem to either directly or indirectly influence almost a third of the contemporarily most popular American stand ups who had influences listed on Wikipedia. So I kind of found that to be quite interesting. What that also pointed to was that a lot of the influence was coming from people who were not really stand ups in the currently understood definition of that term. A lot of the influencers were writers, comedic writers, or stage performers, or people like Charlie Chaplin, who were clearly not stand ups, not stage performers as such, but very accomplished comic actors and directors and producers. So that was one interesting thing I found. Another thing worth mentioning has to do with the data about the birth dates of the influenced comedians and their influencers. So as I was kind of tracing out this network, one of the things that I was doing was also capturing the dates of birth of the comedians and their influencers. And what I found was that an overwhelming volume, actually almost 100% of the volume, I think, like more than 95%, 95-point-something percent of direct influence volume came from individuals who were at most two generations older than the influenced comedian, and more than half of the direct influence volume on the contemporary most popular American stand ups came from people within the same generation. So it just kind of seemed like an overwhelming majority of American stand ups are drawing their influence from people who are kind of roughly their age, or not really very much older than them. 
Now if you also factor in the indirect influences, meaning, let's say comedian A was influenced by comedian B and comedian B was influenced by comedian C, so comedian C indirectly influences comedian A. That was kind of one of the fundamental assumptions of the paper. There, the birth year to birth year time spans naturally swept across a pretty vast period of time, and that period of time was, like, truly vast. I mean, it was 10 years to more than 400 years, with a median time span of like around three years. So overall, what it was pointing to was that, well, first of all, there was a very strong pattern of influences, like an 80-20 pattern, where a large fraction of the influence was coming from a very small fraction of influencers. And then if you combine that with the vast span of birth year to birth year time spans, if you kind of put those two things together, the conclusion to draw from that was that most of the contemporarily most popular American stand ups drew their inspiration from a small set of influencers who were themselves spread across multiple centuries. So that was kind of an interesting conclusion that I drew.
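Editor's note: the idea of direct and indirect influence described here maps onto reachability in a directed graph. The following is a minimal Python sketch, assuming edges point from influencer to influenced. The node labels, birth years, and the `reached_by` helper are hypothetical placeholders, not data or code from the article.

```python
from collections import deque

# Toy directed graph: each key influences the people in its list.
# C influenced B, B influenced A, so C indirectly influences A.
influences = {
    "C": ["B"],
    "B": ["A"],
    "D": ["A", "E"],
}

# Made-up birth years, used only to illustrate the time-span calculation.
birth_year = {"A": 1985, "B": 1950, "C": 1890, "D": 1980, "E": 1990}


def reached_by(source, graph):
    """Return everyone the source influences, directly or indirectly."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for influenced in graph.get(node, []):
            if influenced not in seen:
                seen.add(influenced)
                queue.append(influenced)
    seen.discard(source)  # guard against cycles leading back to the source
    return seen


# Share of a (hypothetical) contemporary set reached by one influencer.
contemporary = {"A", "E"}
reach_of_c = reached_by("C", influences)
share = len(reach_of_c & contemporary) / len(contemporary)

# Birth year to birth year span for each person C reaches.
spans = {name: birth_year[name] - birth_year["C"] for name in reach_of_c}

print(sorted(reach_of_c), share, spans)
```

Treating every listed influence as a plain edge mirrors the binary view of influence discussed earlier in the interview. Weighted edges would be one way to relax that, if strength data were available.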

Rosemary Pennington
I'm looking at your visualizations of the influence chains from William Shakespeare to first Jerry Seinfeld and then to Larry David. And the thing that I was struck by looking at these is that the chain of influence to Larry David seems a little more direct than it seems to have been to Jerry Seinfeld. And I wonder, you know, what do you make of that, given that Seinfeld and Larry David are so, you know, tightly connected as far as comedians and producers. But also, were there chains of influence that you found particularly interesting as you were combing through what must have been a vast bunch of data?

Sachin Date
That's right. So there's definitely a very large diversity in the structure of the influence chains. Now one thing to kind of keep in mind over there is that the data definitely has some degree of what we could consider as some form of, you know, non-response bias, and that's because a large number of comedians simply don't have influence data mentioned about them on their Wikipedia pages. So, that's going to generate some kind of a bias, which is kind of similar to the sort of bias that one encounters on surveys, where people simply don't respond to the survey. So there's a missing data bias associated with that kind of missing data. So there could very well be influences which are not represented accurately enough by the graphs that you see in the paper. And that's almost certainly because the data for them is simply not available. But at the same time, there is still, I think, enough data on Wikipedia to draw the conclusion that the influence networks of a lot of these comedians have a lot of diversity in them. Now going back to your question about some kind of interesting features about these graphs. Well, one of the things that I noticed fairly consistently was that Woody Allen seemed to be performing the role of what you might consider as a router of influence. So his position in the influence networks was such that he seemed to be routing over influences from what were essentially writers in the 1800s, 1700s, 1600s all the way back to William Shakespeare, over to the set of modern day American stand ups. So on one side of the graph there were a bunch of writers and humorists and playwrights, and on the other side of the graph were people who were largely American stand up comedians, with the node representing Woody Allen kind of sitting in between. So I found that interesting in the way that, you know, this pattern repeated so often. 
The other thing, one other kind of interesting feature I ran into was just the lengths of some of these influence chains. So for instance, I observed like 20 really long chains of influence. And they were about, I think, 12 to 15 influences in each chain. And then, as you kind of go back in time, starting with present day comedians like Hasan Minhaj or Michelle Wolf or Taylor Tomlinson, if you kind of trace back the chains from comedians such as those, you slowly start hitting nodes that represent comedians of the American vaudeville era of the early 1900s to late 1800s, and then before that come the nodes that represent comic writers like James Joyce or Ken Jeong, and then you keep following through on those chains until you kind of finally reach people like William Shakespeare in one instance, and then in another instance, Miguel de Cervantes, the creator of Don Quixote. So that's more than 400 years ago. So that's like more than four centuries of influence carrying over from, well, Cervantes, all the way to the 21st century comedians.
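Editor's note: tracing a chain backward from a present day comedian, as described here, can be sketched as a recursive search for the longest path of influences. This is an illustrative Python sketch only. It assumes the graph has no cycles, and the node labels and the `longest_chain` helper are hypothetical rather than taken from the article.

```python
# Toy graph: each key lists the people who directly influenced them.
influenced_by = {
    "modern_standup": ["mid_century_comic"],
    "mid_century_comic": ["vaudeville_act", "writer_1800s"],
    "vaudeville_act": ["writer_1800s"],
    "writer_1800s": ["playwright_1600s"],
    "playwright_1600s": [],
}


def longest_chain(person, graph):
    """Return the longest chain from person back to an earliest influence.

    Assumes the graph is acyclic; a cycle would recurse without end.
    """
    best = [person]
    for earlier in graph.get(person, []):
        candidate = [person] + longest_chain(earlier, graph)
        if len(candidate) > len(best):
            best = candidate
    return best


chain = longest_chain("modern_standup", influenced_by)
print(" <- ".join(chain))
print("links in chain:", len(chain) - 1)
```

For a network of a few hundred nodes this plain recursion is likely adequate. Caching the result for each node would be a natural refinement on larger graphs.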

John Bailer
So what's next for you? I mean, you know, you've looked at this kind of connection here, of comedians, you mentioned some gaps that were in the Wikipedia study. And I think even in your article, you mentioned Lenny Bruce, not being within this influence graph. Do you have any thoughts of back filling some information that you thought were gaps, or are there sort of next projects that would be associated with these types of investigations?

Sachin Date
So with Lenny Bruce, one of the things I noticed was that a few previous studies on scholarly influence in general, not necessarily influences on comedians specifically, but scholarly influence in general, those studies did mention Lenny Bruce. Though Lenny Bruce didn't really appear to be one of the major influences over there, the moment you kind of look at Lenny Bruce's influence in the context of comedy, it kind of bubbles up to the top very quickly in terms of influence. The interesting thing about that is that there's simply, you know, not a whole lot of data available about some of these comedians, and in some cases, there's a lot of data available about others. So it's quite possible that Lenny's position in the influence structure is very heavily determined simply by the availability of data associated with the comedian. Now, well, in terms of future work, one of the things I'd like to do is to essentially look at the influence structures of individual comedians and comic actors. So I mentioned Woody Allen. Woody Allen turned out to be a router of influences from writers to stand up comedians. So I'd like to inspect the influence structures around other famous personalities in this space to see if they are also routing over influences in a particular manner, from their influencers to the people who they influence. And then the other kind of natural extension to this study is to go beyond the contemporary, most popular American stand ups, which is what the focus of this study was, and then study all American stand ups, or maybe all comedians who have performed stand up of some kind all over the world, and then inspect the influence structures associated with that, you know, much more comprehensive set of comedians. So one of those things I've already done. There is a paper out recently from me, where I've extended this study out to include basically all American stand ups, and then studied the influence structures on that body of comedians. 
And one of the things I found was that a lot of the results of this paper in Significance actually carried through very nicely in that bigger body of American stand ups as well.

John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Sachin, thank you so much for joining us today.

Rosemary Pennington
Yeah. Thank you for being here.

Sachin Date
Thank you for having me.

John Bailer
Stats and Stories is a partnership between Miami University's departments of statistics and media, journalism and film and the American Statistical Association. You can listen to us on Spotify, SoundCloud, Apple Podcasts, or other places. You can find podcasts and follow us on LinkedIn and Twitter. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Math and Music | Stats + Stories Episode 317 by Stats Stories

Long after Harry Nilsson said, “one is the loneliest number,” and after Bob Seger sang about feeling like a number, music streaming services are using data to help us discover new music that connects to our frequent plays and preferences. Dr. Kobi Abayomi helps break that all down in this episode of Stats+Stories.

Read More

Careers in Rom Coms | Stats + Stories Episode 264 by Stats Stories

Romantic comedies are rife with plucky heroines: small bookstore owners being pushed out by big corporations, runaway brides, and perpetual bridesmaids. But where are the scientists, microbiologists, engineers, and statisticians? One researcher went looking for them, which is the focus of this episode of Stats+Stories with guest Veronica Carlan.

Read More

Predicting the Weather with Pietro the Weather Tortoise | Stats + Stories Episode 225 by Stats Stories

Meteorologists go to school to be able to predict the weather accurately, but for some people, weather prediction is a hobby. Maybe they have a trick knee that hurts when it rains or perhaps they know when a storm is coming by how the birds at their feeders are behaving. Some lucky folks have pets that can help them figure out what the weather is going to do and that’s the focus of this episode of Stats and Stories with guest Connor Jackson.

Read More

The Best Friend on Friends | Stats + Stories Episode 220 by Stats Stories

Since the 1990s, people have been trying to figure out who’s the best friend. Is it Chandler because of his dry wit? Phoebe because of her unabashed enthusiasm? Joey because of his loyalty? Well, leave it to statistics to give us a firm answer. Who’s the best friend from the show Friends is the focus of this episode of Stats and Stories with guest Mathias Basner.

Read More

A Not So Standard Podcast | Stats + Stories Episode 212 by Stats Stories

Our lives are increasingly shaped by statistics and data. However, they remain concepts that can be difficult for broad audiences to understand. A number of outlets, including this one, have sprung up to help make them more accessible. Today another one, the “Not So Standard Deviations” podcast, is the focus of this episode of Stats+Stories with guests Hilary Parker and Roger D. Peng.

Read More

#MemeMedianMode Contest Winner! | Stats + Stories Episode 200 by Stats Stories

At Stats+Stories we're lucky to have listeners who put up with John's bad jokes and our general shenanigans. In fact, you've listened to 199 discussions of the statistics behind the stories and the stories behind the statistics. To mark our 100th episode we asked you to submit statistical headlines and a haiku won. For 200 we took to Twitter using the #MemeMedianMode hashtag and this time those that rose to the top were actually memes. Today we're talking to the creators of our top two.

Nynke Krol (@krol_nynke) is a statistician at Statistics Netherlands who submitted a stats meme that caused both John and Rosemary to actually laugh out loud when they saw her take on data normality.

Eric Daza (@ericjdaza) is a data scientist and statistician who focuses on digital health. He submitted several memes to our #MemeMedianMode contest, including one that made me flash back to my first graduate class in research methods, on causation/correlation.

Read More

The "Key" to a Successful Kickstarter | Stats + Stories Episode 197 by Stats Stories

About 20 years ago, most people would have been unfamiliar with the term crowdfunding. Now, when it comes to the arts, you can crowdfund anything from comic books to Mystery Science Theater 3000 to musical compositions. What it takes to successfully crowdfund a rock project is the focus of this episode of Stats and Stories with guests Moinak Bhaduri, Dominique Haughton and Piaomu Liu.

Read More

The Stats of the Decade | Stats + Stories Episode 120 by Stats Stories

Iain Wilton directs the Royal Statistical Society’s policy, public affairs and external relations work. His team’s responsibilities include the production of the RSS member newsletter, Significance magazine and the RSS’s policy briefing papers for MPs and peers. Iain’s team also organises the All-Party Parliamentary Group on Statistics as well as the RSS Statistical Ambassador network and the annual Statistical Excellence Awards. Iain has a doctorate from Queen Mary, University of London and has previously worked for the BBC, the Cabinet Office and the University of Essex. He has also written a biography of the sportsman, writer and politician CB Fry.

Read More

What Do Seinfeld, The Tonight Show And Stats+Stories Have In Common? | Stats + Stories Episode 7 (REPOST) by Stats Stories

Rick Ludwin was hired by NBC Entertainment in 1979 and made director of variety shows there in 1980. He then became vice president for specials and variety programs in 1983; senior VP for specials, variety programs and late-night in 1989; and executive VP for NBC’s late-night and prime time series in 2005. In its 57 years, The Tonight Show has had five permanent hosts, and Rick has been the boss of three of them. His late-night division at NBC developed the hit comedy Seinfeld. Rick, a 1970 Miami University grad, joined the Stats+Stories regulars to discuss the use and impact of ratings on television programming

Read More

How Esports Stats are Tracked | Stats and Stories at JSM by Stats Stories

Brian McDonald is currently the Director of Sports Analytics in the Stats & Information Group at ESPN. He was previously the Director of Hockey Analytics with the Florida Panthers Hockey Club, an Associate Professor in the Department of Mathematical Sciences at West Point, an Adjunct Professor in the Department of Management Science at the University of Miami, and an Adjunct Professor in Sports Analytics in the College of Business at Florida Atlantic University. He received a Bachelor of Science in Electrical Engineering from Lafayette College, Easton, PA, and a Master of Arts and a Ph.D. in Mathematics from Johns Hopkins University, Baltimore, MD.

Read More

Using the Stats to Improve Your League of Legends Game | Stats and Stories at JSM by Stats Stories

Michael Schuckers is the Charles A. Dana Professor of Statistics at St. Lawrence University in Canton, NY. An applied statistician, he has received funding from the US National Science Foundation, the US Department of Defense and the US Department of Homeland Security. He is the author of over three dozen publications including Computational Methods for Biometric Authentication (Springer, 2010). Additionally, Schuckers has done work in sports analytics, particularly ice hockey, including consulting with an MLB team and an NHL team. For his work in this area, he was named a "Significant Contributor" by the American Statistical Association's Section on Statistics in Sports.

Read More

The Statistics of the Year | Stats + Stories Episode 76 by Stats Stories

David Spiegelhalter is Winton Professor for the Public Understanding of Risk in the Statistical Laboratory at the University of Cambridge, Chair of the Winton Centre for Risk and Evidence Communication, and President of the Royal Statistical Society.

+ Full Transcript

(Background music plays)

Rosemary Pennington: As 2018 winds down, everyone from social media users to mainstream media outlets is releasing their lists of top albums, top books or top films of the year. Earlier this month the Royal Statistical Society got in on the action by announcing its statistics of the year. That's the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me in the studio are regular panelists John Bailer, Chair of Miami's Statistics department, and Richard Campbell of Media, Journalism and Film. Our guest today is David Spiegelhalter, I should say maybe Sir David Spiegelhalter, Chair of the Winton Centre at the University of Cambridge. He's also the president of the Royal Statistical Society or RSS, which as I said just announced its choices for statistic of the year, and I want to point out that he's the first three-time guest on Stats and Stories…

John Bailer: That should have been statistic of the year!

(Collective laughter)

(Voices overlap)

Pennington: David thank you so much for being here today.

David Spiegelhalter: A great pleasure to be back again!

Pennington: Why choose a statistic of the year in the first place?

Spiegelhalter: Well you know we are statisticians, we think statistics are immensely important and we launched this last year as an experiment just to see if it would catch on and we were amazed at the interest in it. We were in print, on the popular radio programs, and we don't just do a statistic of the year, we've got 10 of them and people loved the variety and the choice so we thought we’d do it again!

Bailer: What was the criteria that you used? I mean you said you had hundreds of submissions so how do you…

(Voices overlap)

Spiegelhalter: We got hundreds of submissions. The first criteria was that it was faintly true.

(Collective laughter)

Richard Campbell: How many did you get rid of?

Spiegelhalter: Some of the entries were the old joke, you know, 95 percent of all statistics are made up. We expect those, but unfortunately that's actually one of the truest statistics judging by the entries we got, because they come in and they sound very impressive but then you start doing the fact checking and so many of them just don't stand up. I suppose this is not news to anybody. There's a lot of fake news around the world, a lot of false claims being made, a lot of them statistical and we ended up getting sent these. So we've had to do some serious filtering to try to get things that we actually think are fairly accurate.

Bailer: So after you kind of filtered out the fake, how did you pick among the real?

Spiegelhalter: Oh, very difficult. We've got a good panel, we've got journalists, we've got all sorts of official statisticians, you know, with some difficulty. We wanted a variety. We didn’t want them all gloomy. You know you could pick 10 gloomy statistics. We don't want to reinforce the impression that statisticians are all just such miserable people. And we wanted ones that were covered also, so some of the stories that you know, we know have been going on throughout the year. I should say the one thing that we haven't got is a Brexit statistic, but that's you know our own local problem that we're having to deal with.

Pennington: I remember that it’s so local. Do you remember that?

(Collective laughter)

Bailer: So I guess there are…how many did you pick? You picked 2 winners and how many runners-up?

Spiegelhalter: Yep, we've got two winners and then 8 runners up, highly commended statistics.

Bailer: So one of the things that I was curious about is, you know, there's lots of ways to report a statistic. And so I’m going to let you talk about some of the ones that you picked, I'm curious about the winners. I think…we know the people are just sitting on the edge of their seat, waiting to hear this result. So after you talk about the winners, I'd be curious for you to comment a little bit about why this representation versus some other representation of the story was compelling to you.

Spiegelhalter: Exactly. I mean it's terribly important because I know, we all know that we can make any number big or small, depending on how we frame it, what comparisons we make, what units we use and so you know we would try to frame them I think in a way that is most realistic. So when we do the winner, we actually reported it in multiple frames in order to get a more balanced feeling about it.

Campbell: When you did this last year for the first time, I know the statistic, the international statistic I think, or was it the American statistic or U.S.?

Spiegelhalter: It was the international.

Campbell: The international one, that was in the Huffington Post and Kim Kardashian picked it up. How much news generation came out of that?

Spiegelhalter: You know we got a lot of coverage that was about essentially over the last 10 years. I think our main statistic was the number of U.S. citizens killed by lawn mowers over the years. That of course was just a hook to try to draw people in, just compare the number of people that are being killed by you know immigrant jihadist terrorists which is on an average of 2 a year, compared with the number for example killed by fellow Americans.

Campbell: Yes, and that was 11000.

(Voices overlap)

Spiegelhalter: So these are very stark figures, and we received some criticism about that, you know and I can see why, because it suggests well, that's the future risk. We didn't mean that, it's the past rates, what has happened. These are the statistics of last year, they are not predictions about what's going to happen next year.

Pennington: So what are your statistics this year for the international side of things and I know you also identify U.K. one as well. So what are the winners?

Spiegelhalter: OK the International one is a slightly negative one, it was more than negative. So it's 90.5 percent. And that's the proportion of plastic waste that has never been recycled. We also frame it to say well 9.5 percent has been recycled but still not a very large number given you know you're talking about you know 6000 million metric tons of plastic that’s actually not in use anymore, that has been got rid of. And so you know that means that only 10 percent has been recycled and out of the rest of it, about 12 percent is being incinerated and the rest is just lying around at landfills or will be dumped in the environment and you know I'm sure that in the States, certainly in the U.K., plastics have received a lot of attention this year….Blue Planet, these pictures, whales and fish and things like that with all this plastic in them, and this has become a very strong story. And then this was a really strong study done from the University of California, you know published in Science Advances. They made this assessment of the amount of plastic that was not being recycled.

Campbell: So I'm a general listener and say, I am watching cable news in America and I see the statistics come on and I'm saying, OK. How do they know 9.5 percent of the plastic waste has never been recycled? So I'm putting you on the spot here. So how would we respond to that? Because we get a lot of that, you know, people not believing in statistics and certainly not willing to do the work to find out where that information came from.

Spiegelhalter: Actually it was reported in a UN paper, in a report but it comes from a published paper in Science Advances from 2017 and kind of….Oh interesting! So they got plastic production data. They can get that from industrial production system statistics and then they can look at product lifetime distributions from eight different industrial use sectors. So by breaking it up into the different sectors, packaging and so on and then they have got data on how long within each sector plastic is in use, and then by knowing about the productions they can work out how much plastic is out there. So that's how they work out that you know only 30 percent of plastic ever produced is currently in use. That means 70 percent has gone and then…I'm just trying to work through how did they get at the amount that’s being recycled and they know from other sources…then they look at the recycling rates broken down around the world, from Europe and China. And in the United States plastic recycling has remained steady at 9 percent since 2012. So essentially I can stop and do this again. It's really cool. So they build a big model. First of all the model for plastic production, looking at industrial data. Then a model for how long plastic is in use. That enables them to estimate how much plastic is actually in use at the moment, which is you know, at least 30 percent of what's being produced. And then by looking at incineration and recycling data from different countries, they can work out how much is being recycled out of everything that's being produced and is not in use anymore.
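
The modelling steps described in this answer can be sketched as a toy stock-and-flow calculation. This is only an illustration of the general idea, not the study's method: the sector names, production figures, lifetimes and recycling rate below are made-up placeholders, and a single fixed lifetime per sector stands in for the lifetime distributions mentioned above.

```python
# Toy stock-and-flow sketch: production -> in use -> discarded -> recycled.
# Every number here is a hypothetical placeholder, NOT data from the study.

# Hypothetical annual production (million tonnes) per sector, years 0..9.
production = {
    "packaging": [10.0] * 10,
    "construction": [5.0] * 10,
}

# Hypothetical fixed lifetime in years per sector (a simplification of
# the per-sector lifetime distributions described in the conversation).
lifetime_years = {"packaging": 1, "construction": 35}

# Hypothetical share of discarded plastic that has been recycled.
recycling_rate = 0.10


def plastic_balance(production, lifetime_years, recycling_rate, current_year):
    """Return (produced, in_use, discarded, recycled) up to current_year."""
    produced = 0.0
    in_use = 0.0
    for sector, series in production.items():
        for year, amount in enumerate(series[: current_year + 1]):
            produced += amount
            # A batch stays in use until its sector lifetime has elapsed.
            if current_year - year < lifetime_years[sector]:
                in_use += amount
    discarded = produced - in_use
    return produced, in_use, discarded, discarded * recycling_rate


produced, in_use, discarded, recycled = plastic_balance(
    production, lifetime_years, recycling_rate, current_year=9
)
print(f"share still in use: {in_use / produced:.0%}")
print(f"recycled share of discarded plastic: {recycled / discarded:.0%}")
```

Splitting production by sector matters because short-lived uses such as packaging become waste almost immediately, while long-lived uses stay in the "in use" stock for decades.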

Bailer: So a natural question is, you just described the models, that's estimating a lot of components. And you know, none of these things are known, and so there's uncertainty associated with all of this and you know what would you say when people say well, by reporting a single number that perhaps this is conveying an overly strong sense of precision?

Spiegelhalter: And I would completely agree. And it would be much better to give a range for these numbers at a minimum. Actually I believe giving ranges would make it more trustworthy, and I'd be happier having a range than a single number. I mean one can qualify it by saying around or an estimate and so on. So they’ve got a relative measure of plus or minus about 6 to 7 percent, which isn't too bad. So that would only take it, if the total is 10 percent, you know you might say the total is between 8 and 12 percent for something that’s being recycled.

Bailer: OK. I just think math is such an important point. All the time we see the kind of headline statistics, there's always in my mind, kind of two things that come - one is, you know, how well do they know this number, and then even when you have some of these other components like the 6,300 million metric tons, do people have a sense of how much that represents?

Spiegelhalter: Yeah, these are just big numbers. What does it mean? And that's why people will be so much more influenced by seeing a picture of a turtle you know with his head through a piece of plastic or something like that, what drives the emotional reaction to these things. You know what does that 6,300 million metric tons mean? It is extremely difficult to judge. I mean one way of course is to do it per head of population: for the billions of people in the world, that’s a ton each, that’s enormous. So I think there is a problem with all these big numbers. It is amazing, it is almost exactly a ton each of plastic for each person that is no longer in use. Wow! That figure is more impressive than the 6000 million metric tons which I haven't got a clue what that means!
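
The per-head framing is a one-line division. A minimal sketch, using the rounded "6000 million metric tons" quoted in the conversation and assuming a world population of roughly 7.6 billion in 2018 (the population figure is an assumption added here, not a number from the interview):

```python
# Discarded plastic as quoted in the conversation (rounded figure).
discarded_plastic_tonnes = 6_000e6

# ASSUMPTION: approximate 2018 world population, not from the interview.
world_population = 7.6e9

tonnes_per_person = discarded_plastic_tonnes / world_population
print(f"about {tonnes_per_person:.1f} tonnes of discarded plastic per person")
```

Under that assumption the result is a bit under one tonne per person, which is the scale the "ton each" remark points at.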

Pennington: You're listening to Stats and Stories and today we're talking about the statistics of the year according to the R.S.S. with society president David Spiegelhalter. I'm going to ask you to talk now about the U.K. stat of the year, because I think it's interesting that both of these statistics of the year are somehow related to environmental concerns.


Spiegelhalter: That was a deliberate choice and we’ve also chosen one negative and one positive there. The U.K. one is a positive environmental one: 28.7 percent is the figure, and that's the peak percentage of all electricity produced in the U.K. that was solar power, on the 30th of June. So that means that amazingly for this certain country, solar power was the biggest producer of electricity. Briefly, extremely briefly, and that number is exact. That is a true statistic. But of course it was only brief, but it's a staggering change from you know, when it was so low, nobody thought about it 10 years ago in this country.

Bailer: So could you give us the list of kind of the highly commended statistics international?

Spiegelhalter: Yes. We've got some very positive and negative ones. The positive one is that in spite of all the stories you know that we hear about the decline of living standards in the West, worldwide the percentage of the population that is considered to be living in absolute poverty has more than halved since 2008, that’s in the last 10 years. It has gone down from 18 percent to essentially 9 percent and it is a quite extraordinary benefit that this happened to people. And this isn't a story that makes the international news, that far fewer people are living in absolute poverty than 10 years ago.

Bailer: And then, just as a…well before we go to the other ones, I had a question for you in terms of reporting this. When I saw it, I was wondering if 50 percent reduction in absolute poverty would be a more impressive statistic to me than 9.5 percent...

Pennington: Yeah, maybe.

Spiegelhalter: No exactly. We chose deliberately to use the percentage point reduction. Then we can say it’s halved, essentially, but in this case we would have a bigger emotional hit to say poverty has halved in the last ten years. But we want to do this statistic, which is the percentage point reduction. We could frame this and give it a stronger emotional hit, but we chose not to.

Bailer: You are a risk difference guy rather than a risk ratio guy here.

Spiegelhalter: Yeah exactly I believe in absolute risks, absolute proportions. We know that relative risk, relative changes can be highly manipulative. The way in which to communicate changes over time.
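
The difference between the two framings can be made concrete with the poverty figures quoted above (18 percent falling to roughly 9 percent). A minimal sketch; the function name is illustrative, and the second call uses a made-up pair of numbers to show how a relative change can sound dramatic when the absolute change is small:

```python
def describe_change(before_pct, after_pct):
    """Return (percentage-point change, relative change) between two rates."""
    point_change = after_pct - before_pct
    relative_change = point_change / before_pct
    return point_change, relative_change


# Poverty figures as quoted in the conversation.
points, relative = describe_change(18.0, 9.0)
print(f"{points:+.1f} percentage points, {relative:+.0%} relative")

# Hypothetical example: a rate going from 1 percent to 2 percent.
small_points, small_relative = describe_change(1.0, 2.0)
print(f"{small_points:+.1f} percentage points, {small_relative:+.0%} relative")
```

The same data supports "down 9 points" and "halved"; the hypothetical second case is "up 1 point" or "doubled", which is why the absolute figure is the less manipulable one to lead with.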

Campbell: With the statistics of the year, how much of what's behind the final decisions is what's going to attract a news story? We need to get people interested in and learning about statistics. What's going to get the New York Times to cover this, what's going to get the British press to cover this?

Spiegelhalter: Yes. There is a trade-off there. We can't just have a whole lot of negative stories and they can’t be too dull. We want them interesting, but at the same time they can’t all be about celebrities or whatever. Last year's was quite a nice mix. We couldn't find quite like that this time. We want good news stories but we also want ones that are just important and frankly ones that have a story that’s not generally being told, rather than just the celebrity stories. The stories about poverty being halved in the last 10 years, nobody's written a story about that this year. That’s not in our news.

Bailer: So you had 3 more that were in your highly commended group. So you want to just run through them real quickly and then we can…

Spiegelhalter: Yeah. Well the second one, I think this is terribly important. Amazingly from November 2017 to October 2018, the number of measles cases in Europe was 64,946, nearly 65,000 measles cases, and 2 years ago it was 4,000 cases.

Bailer: Oh my goodness!

Pennington: Wow!

Spiegelhalter: Isn’t that staggering? That’s a 15-fold rise in 2 years. This is really terrifying. This is very serious indeed and we know why, because of all the stories about vaccines giving kids autism, in spite of being disproved, and in Britain we've recovered from that story, largely because we've exported Andrew Wakefield to the States.
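
A fold rise is simply the ratio of the later count to the earlier one. A minimal sketch using the two figures quoted in the conversation; since the earlier count is given only as a rounded "4000", the result is approximate, and the quoted "15 fold" would be consistent with a somewhat larger unrounded baseline:

```python
cases_recent = 64_946   # November 2017 to October 2018, as quoted
cases_earlier = 4_000   # rounded figure quoted for two years earlier

fold_rise = cases_recent / cases_earlier
print(f"roughly a {fold_rise:.0f}-fold rise")
```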

(Collective laughter)

(Voices overlap)

Spiegelhalter: But the number of anti-vaccine websites and the fact that this has become politically acceptable, for example in Italy, major parties are arguing against vaccination. This is very dangerous, and you know the kids will die, and you know this is a really bad story.

Bailer: And so then the next one related to the Russian men.

Spiegelhalter: Yeah this is really extraordinary. This year Russia raised the retirement age for men from 60 to 65. Unfortunately for Russian men, 65 is their current life expectancy. It's only just above that, so it's estimated that 4 in 10 Russian men, 40 percent, will actually die before they get to that pensionable age, which is quite troubling compared with say the US, you know, where 80 percent of men will reach their retirement age, and in the U.K. 87 percent of men will live past 65. I’m 65, I’m just taking my pensions. I’m a lucky one of those 87 percent.

Bailer: I really like that part of when you're reporting out the idea of putting that context. You know when people think about that, when you first report that 40 percent, which is that? Is that big or is it little? Then they are given that other example with the U.K. and US, I find that a really nice part of contextualizing the story.

Spiegelhalter: Yeah, so still in the U.S. 20 percent of men, 21 percent, won't do it. So you know it's about half the figure. In the U.K. about 13 percent won’t make the retirement age. So you know it is bad, but in Russia that's 3 times that, right? Which is very high. You need the international context with that data.

Bailer: And how about your last one?

Pennington: Kardashian, I guess.

(Collective laughter)

Spiegelhalter: This was a bit of a celebrity. 1.3 billion, this is extraordinary. The amount wiped off Snapchat’s value within a day of one Kylie Jenner tweet. So this is a bit of a flagrant appeal to populism. You know just a brief tweet that she made in February 2018: "so does anyone else not open Snapchat or is it just me?" Oh yeah. 367,000 likes! I mean, it is extraordinary. I mean there are other things that were changing about Snapchat also. Again we've got to be careful with drawing you know a causal pattern, we know we can't draw a straight causal pattern. But this is too good a story to miss.

Pennington: Yeah. I had a question about the U.K. statistics and maybe we can talk about some of the other highly commended ones but I wanted to ask about Jaffa cake.

Spiegelhalter: Oh yes.

(Voices overlap)

Pennington: …to explain what exactly they are and why this is a noteworthy statistic for people who are not in the UK.

Spiegelhalter: They are kind of a form of biscuit, but they had to go to court to claim they are a cake, because they didn't have to pay VAT on them if they were called a cake. It is a type of biscuit with a soft bottom but a chocolate top with a bit of sort of orangey you know jammy stuff inside as well. I love them! They are a real sugar rush. I love them, I have to keep them out of my way. They normally sell them in smallish boxes but at Christmas they release what used to be called a yard of Jaffa cakes which was 36 inches long, an old yard.

Pennington: A lot of sugar!

Spiegelhalter: Last year that contained 48 Jaffa cakes. Well, now it only contains 40 Jaffa Cakes, the cakes generally are of the same size, you just get less of them in your box, and actually the boxes shrunk and they couldn't fit it into a yard anymore. So now they have to call it a sort of Christmas cracker or Jaffa Cakes, and you know what this is? The end of Jaffa Cakes. They are incredibly…they sell billions of these things. I know I love them. But some say…but this is just the one example of you know the shrinking size of products, that you could say this is a good thing. It could be a great thing if people didn't eat so many Jaffa cakes. You know Mars Bar and other things will go smaller, this is a very good thing. Portion control is incredibly important, it’d be wonderful if people didn’t eat so much. But the price has gone down.
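
The size of the shrink, and what it implies for the price per cake, is quick arithmetic. A minimal sketch using the 48 and 40 quoted above; no prices are given in the conversation, so the price ratios passed in below are hypothetical scenarios rather than reported figures:

```python
def count_reduction(count_before, count_after):
    """Fractional reduction in the number of items per box."""
    return (count_before - count_after) / count_before


def per_item_price_change(count_before, count_after, box_price_ratio):
    """Relative change in price per item.

    box_price_ratio is new box price / old box price (hypothetical input).
    """
    return box_price_ratio * count_before / count_after - 1


reduction = count_reduction(48, 40)
print(f"{reduction:.1%} fewer cakes per box")

# Scenario 1 (hypothetical): the box price is unchanged.
same_price = per_item_price_change(48, 40, box_price_ratio=1.0)
# Scenario 2 (hypothetical): the box price falls in step with the count.
matched_cut = per_item_price_change(48, 40, box_price_ratio=40 / 48)
print(f"price per cake: {same_price:+.0%} if the box price is unchanged, "
      f"{matched_cut:+.0%} if it falls by the same proportion")
```

In other words, the box price would have to fall by the same one-sixth for the price per cake to stay level; any smaller cut is still a per-cake increase.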

Bailer: I love the way you describe it as shrink inflation too!

Spiegelhalter: Shrinkflation, yeah! And Toblerone got a lot of interest last year as well when they reduced the size of the chocolate but not prices. So this is not a matter of perhaps global importance but some people notice this kind of thing, and again it made a good news story, where they got a lot of coverage.

Bailer: Were you surprised by the one report about the amount of shopping that was in store versus online?

Spiegelhalter: Yeah, this is the issue where you have to decide, what about the framing of this? Do you frame it as saying that 18 percent of all shopping is now online, you know the big one in five spending online…or do you frame it as 82 percent of shopping is still done in the shop, rather than online. Do you do a positive or a negative frame? Because I have seen this story reported in both ways. Actually for us, we found it quite surprising that given you know the huge publicity around the rise of online shopping, the closure of so many shops in the high street now, I thought there was going to be a big effect of this. I'm surprised it was at 82 percent. But then again of course you've got food, you’ve got a lot of stuff that’s not done online as well. But still you know 82 percent is still done by people walking to a shop and paying.

Campbell: How does that compare with the U.S.? It would be interesting to see that!

Spiegelhalter: I don’t know what that is in the US.

Campbell: It seems very high.

Pennington: That does seem high.

Campbell: Here in the US, everybody is using Amazon here you know.

Spiegelhalter: Yeah, well people use that here as well, you know. It was a huge amount as well so…I don't know the U.S. figure, I’m about to find that out.

Bailer: You know, the other one of the stories that the commended stats related to, the trains running on time and you know, we all thought well, all of us do travel and you know, it is about rail travel, but I was wondering how the rail travel in Great Britain compared to that in Europe, or how it might compare to air travel…I was thinking about some of this contextualizing and framing this too.

Spiegelhalter: Yeah, we really should. Again I think that's a very good point that we need to look at, because the reason why that story's in here is that we had an utter disaster this year with regards to trains. They introduced a new timetable that wasn’t planned properly, huge numbers of cancellations, absolute chaos, and there were strikes as well. So I mean this 86 percent of trains running on time is terrible because they know this must be above 95 percent of the time, that’s what they claim to be able to do, and that's where you can start getting compensation as well. They paid out a fortune in compensation. I was travelling on trains in the summer and you know they were just announcing on every train to tell you how to claim compensation. I was making the claim even before the train came in, got to my destination. I had my online compensation claim that I submitted, so it was absolutely shambolic. So this is far worse than it generally is, it’s the worst for nearly 15 years in this country, it has been quite recently late. But I don't know the international comparisons. That’s something I should find out. But actually it was so noticeable this year, the whole system really fell apart in the summer.

Pennington: We are starting to be getting ready to wrap up, but I do, before we go want to ask you about this. The first listed commended statistic for the U.K. about female executives of 250 companies.

Spiegelhalter: Yeah, so that's the figure 6.5 percent, which is 6.4 percent. Sorry, the figure is 6.4 percent, which is the percentage of female executive directors within 250 companies, especially the big companies in the U.K. And the gender pay gap has been a massive issue in this country, because in this country, for the first time by law, larger and medium-sized employers have to report gender pay gaps. Unfortunately those are just reported as what women get paid compared with what men get paid. And we were going to use those figures but actually they’re not…they can be very misleading because it includes many women who are in part time work, they are not adjusted for the kind of work. So what we want to do is to pick a job in which you know everyone is roughly comparable and then look at what’s the percentage of female and it's extraordinarily low. And it doesn't seem to be getting any better. I mean it changed, it went from 38 to 30 in a year. I don't think that’s really statistically significantly different, but it's certainly no indication of things getting bigger.

Pennington: So that’s all the time we have for this episode of Stats and Stories. David, thank you so much for being here, it has been a really interesting conversation today.

Bailer: Always a pleasure David, I still think three should have been on there, number of times David Spiegelhalter has been on Stats and Stories.

Spiegelhalter: It’s going to be okay, I seem to be fumbling around as you see! You can tell it's the first interview I've done on this, I'm going to do a bit more preparation, some background on them. Yeah but that was very helpful to me in fact!

Campbell: Good!

Pennington: Stats and Stories is a partnership between Miami University’s departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. You can follow us on Twitter, Apple Podcasts or other places you can find podcasts. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu. You can check us out at Statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.


Better Bayes Winner Revealed | Stats and Stories Episode 73 by Stats Stories

Stephen T. Ziliak is Professor of Economics at Roosevelt University and Conjoint Professor of Business and Law at the University of Newcastle-Australia.  A major contributor to the American Statistical Association “Statement on Statistical Significance and P-values” (2016) he is probably best known for his book (with Deirdre N. McCloskey) on The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives (2008), showing the damage done by a culture of mindless significance testing, the history of wrong turns, and the benefits which could be enjoyed by returning to Bayesian and Guinnessometric roots.


The Fab Formula | Stats and Stories Episode 68 by Stats Stories

Mark Glickman, a Fellow of the American Statistical Association, is Senior Lecturer on Statistics at Harvard University, and Senior Statistician at the Center for Healthcare Organization and Implementation Research, a VA Center of Innovation.  He is well-known for his work in games and sports, having created the Glicko and Glicko-2 rating systems that are widely used in online gaming.  Mark co-organizes the biannual New England Symposium on Statistics in Sports, has been Editor-in-Chief of the Journal of Quantitative Analysis in Sports, and has been the chair of the US Chess Ratings Committee since 1992.  More recently, Mark has embarked on projects in music analytics.  His work on authorship attribution of Lennon-McCartney songs has received widespread media coverage.
