Taylor Swift and Markov Chains | Stats + Stories Episode 390 / July 9, 2026 by Stats Stories

Marina Ferrari de Aquino Klemm is an associate curator at the Auckland War Memorial Museum. She can be found listening to Taylor Swift while she's on marine expeditions throughout the South Pacific, cataloging museum specimens, or analyzing all sorts of biodiversity data.

Charlotte M. Jones-Todd is a senior lecturer at the University of Auckland, New Zealand. She spends her time teaching statistics and coding, fixing bugs, pandering to her pets, and baking bread.

Episode Description

Taylor Swift is an entertainment juggernaut. She's become one of the best-selling musical artists of all time, with her Eras Tour grossing more than $2 billion. During that tour, Swift surprised her audience each night with costume changes to mark different eras of her career. Now, a couple of researchers have figured out how to predict what costumes Swift would wear, when she would wear them, and whether the colors of those outfits were related to the sentiment of the songs being performed. Predicting the colors of Swift's Eras Tour is the focus of this episode of Stats and Stories with guests Marina Ferrari de Aquino Klemm and Charlotte M. Jones-Todd.

Full Transcript

John Bailer

Taylor Swift is an entertainment juggernaut. She's become one of the best-selling musical artists of all time, with her Eras Tour grossing more than $2 billion. During that tour, Swift surprised her audience each night with costume changes to mark different eras of her career.

Now, a couple of researchers have figured out how to predict what costumes Swift would wear, when she would wear them, and whether the colors of those outfits were related to the sentiment of the songs being performed. Predicting the colors of Swift's Eras Tour is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.

I'm John Bailer. Stats and Stories is a production of the American Statistical Association in partnership with Miami University's Departments of Statistics and Media, Journalism, and Film.

Rosemary Pennington is away. We have two guests joining us on today's show. Marina Ferrari de Aquino Klemm is an associate curator at the Auckland War Memorial Museum. She can be found listening to Taylor Swift while on marine expeditions throughout the South Pacific, cataloging museum specimens, or analyzing all sorts of biodiversity data.

Charlotte M. Jones-Todd is a senior lecturer at the University of Auckland, New Zealand. She spends her time teaching statistics and coding, fixing bugs, pandering to her pets, and baking bread.

Marina and Charlotte are the authors of a recent Significance article, "We Were in Screaming Color: Colors of the Taylor Swift Universe—A Statistical Story."

Thank you both for being here. When I'm thinking about Markov chains, correspondence analysis, and tests of independence with categorical data, I immediately think about Taylor Swift.

Charlotte M. Jones-Todd

Right? There's no other thing to think about.

John Bailer

Yes, I mean, where else would you start?

I must start by finding out what led you to think about the colors of outfits and questions related to this performer's wearing of them and when she wore them.

Charlotte M. Jones-Todd

I'm going to pass this over to Marina because she's the brains behind all of this.

Marina Ferrari de Aquino Klemm

Well, Taylor Swift is my favorite thing to talk about, so I'm so happy you're asking me this question.

I've always been a fan of Taylor Swift since I was maybe 13 years old. She always put hints into her songs and lyric books. Back in 2008 or 2009, she would put capitalized letters within her lyrics, and they would form a secret message that fans would try to decipher by circling them.

That was me in Brazil. I didn't have access to the lyric books because she wasn't very popular there then, so I would download them, print them out, and circle the letters myself so I could have the experience.

That started way back. Taylor Swift fans, or Swifties, have a reputation for trying to find hidden messages everywhere. That's really how she raised us. She created this detective-board mentality, with lines connecting clues and hints she dropped in music videos and other places.

I thought it was natural that she would do that on the Eras Tour, too—that she would be putting hints into different aspects of the show. I was particularly interested in the colors of the outfits for the surprise song set.

This mammoth tour had 149 concerts over several years. She always performed the same songs in the same order, and the one thing that was truly special at every concert was the surprise song set. It was an acoustic segment featuring songs that weren't part of the regular set list. She would perform two songs selected for that show—one on guitar and one on piano—and she would wear a different-colored outfit each time.

That was the thing everyone talked about. People would say, "Oh my God, I went to night two in Melbourne, and she wore this dress and sang these songs and this mashup."

So, I started wondering how she chose the songs. I couldn't find any obvious link between the songs she selected, but the dresses were much easier to sample and analyze because there were only 12 dresses throughout the entire concert series.

That's how the quest began.

Charlotte M. Jones-Todd

Whenever you think something's random, humans love to ask, "Is there a pattern?" We think there's a pattern, so let's see if we can quantify it.

John Bailer

Yes, so you capture this in the Significance article, "We Were in Screaming Color: The Colors of Taylor Swift's Universe—A Statistical Story."

The first part of the article looks at the outfit worn during the surprise song set on a given night and whether it was associated at all with the outfit worn during the surprise song set the previous night. My question is, you're using tools like Markov chains to do this. Can you give a nice, lay explanation of what a Markov chain would consider in this situation?

Charlotte M. Jones-Todd

Oh, yes. Way to put us on the spot.

I like to think about it in terms of imagining Swift trying to choose her outfits. Imagine there's a nice wardrobe with a big rail where everything is beautifully organized. The second option is that you've just got a pile of clothes on the floor, which is probably more like how I get dressed in the morning. You're just grabbing whatever is clean.

The difference is this: Are you going to the organized rail of clothes and picking out your blue dress, then saying, "Okay, I wore my blue dress yesterday, so today I'll wear my purple dress"? Or are you doing something more random?

What we're good at quantifying, and what computers are really good at quantifying, is what happens at random. What happens when we throw all our clothes on the floor and just pick things up without any thought about what we're doing?

The idea of these Markov chains is a fancy statistical way of asking whether things are following an order. Have we nicely hung up all our clothes and are we selecting them in sequence, making the process a bit more deterministic? Or are we just picking things up at random?

By comparing those two possibilities, you can get a feeling for whether Taylor Swift was an organized outfit planner or not.
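
The comparison described above can be sketched in a few lines of Python. This is only an illustration: the color sequence is invented rather than taken from the tour data, and the function name `transition_probabilities` is hypothetical. It counts how often one color follows another and turns those counts into first-order transition probabilities, that is, the chance of tonight's color given last night's.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(tonight's color | last night's color) from an ordered sequence."""
    counts = defaultdict(Counter)
    # Pair each night with the night that follows it.
    for previous, current in zip(sequence, sequence[1:]):
        counts[previous][current] += 1
    return {
        previous: {color: n / sum(following.values()) for color, n in following.items()}
        for previous, following in counts.items()
    }

# Hypothetical sequence of surprise-set dress colors (NOT the real tour data).
nights = ["blue", "pink", "green", "blue", "pink", "green", "blue", "pink"]

probs = transition_probabilities(nights)
for previous, following in probs.items():
    print(previous, "->", following)
```

In this toy "tidy rail" sequence, every color is always followed by the same next color, so each transition probability is 1. If the dresses were picked off the floor at random, each row would instead sit close to the overall share of each color, and knowing last night's dress would tell you little about tonight's.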

John Bailer

Okay, so if you knew what she was wearing last night, did it give you any information about what she might be wearing tonight during the surprise song set? What did you find?

Marina Ferrari de Aquino Klemm

It didn't matter—at least not all the time.

We separated the tour into three legs. There was the first leg, which included the United States, Oceania, and South America; the middle leg, which was Europe; and the last leg, when she came back to North America and performed in the United States and Canada.

We found that for the first and last legs, the outfit choices appeared random. We couldn't find any clear order in how she chose the dresses. But for the European leg, there was a clear sequence of colors that seemed to be intentionally chosen.

We don't really know why. Maybe she found a better dry cleaner in Europe. Or perhaps it was because the European leg was when she introduced another part of the tour featuring The Tortured Poets Department album. It was a whole new set of songs and a new section of the show.

Maybe because she debuted new songs, she also introduced a new pattern in the dress colors. Everything seemed so planned, and she had a good break between tour legs. I wonder whether that was the point when the team decided, "Okay, we're going to plan even the colors of the dresses." I believe that's what her team said.

Charlotte M. Jones-Todd

I think she literally was like, "Yes, we're going to do a first-order Markov chain, and that's how I'm going to wear my dresses."

John Bailer

I would love to have validation of this. I want to hear from her. I want to hear from her people. Have her people call our people just to let us know. This was great fun.

How did you get the data for this? Where did you find all this information?

Marina Ferrari de Aquino Klemm

Dedication.

When the Eras Tour started, I had just finished my PhD and was looking for a job, so it was kind of a perfect storm of free time. It's easy to make an Excel spreadsheet, and that's how I started tracking what color dress she wore and what songs she sang. It just kind of went from there.

John Bailer

Well, it certainly generates more hypotheses. You start thinking about what's different about that middle leg of the tour. You hinted at some of the things that might have changed and why the pattern appeared there but not during the first or last part of the tour.

Do you have any other thoughts about what is different?

Marina Ferrari de Aquino Klemm

She was also recording another album—her last album—during that time. She was doing a lot of things. So maybe a different person was choosing her outfits because she was busy recording, finishing concerts, and still going to the studio.

It's a three-and-a-half-hour concert, and she was still recording a whole new set of songs and flying to Sweden all the time. So maybe that's what happened. Maybe there was another person taking care of the dresses for her.

Charlotte M. Jones-Todd

I think she just had a nice, tidy wardrobe. The Europeans obviously have good wardrobes, right? A proper closet.

John Bailer

I found myself thinking, "Gosh, I wonder how temperature and humidity affect this." Was it the weather during the performance? What other covariates might be involved?

This is the pathology of thinking about things analytically. You start wondering what else you might introduce that could drive the probabilities of wearing a particular color given the color worn the night before.

But we don't need to go there. You have done a lot of work already, and I'm not trying to assign extra projects.

You didn't stop with one question, though. You weren't going to be done with colors after just one bite at that apple. You also started thinking about color and sentiment.

Can you talk a little bit about that? You're looking at lyrics, so how do you think about colors, how people feel, and how sentiment is measured?

Charlotte M. Jones-Todd

Yes, we had a good chat about this. Marina had the idea and said, "I think there's a link between the tone of the song and the colors that are mentioned." I'm sure Marina will talk a little more about that.

I should say that I made Marina listen to all the songs again and act as the expert, coding whether there was a positive or negative sentiment based on her knowledge as a fan.

You can automate these things. You could do the boring thing and run the lyrics through a dictionary that assigns positive, negative, or neutral sentiment. But there are so many layers to these songs. Who's better than an absolute expert—in the form of Marina—to say, "I'm going to listen to this and code it all myself"?

Marina Ferrari de Aquino Klemm

Yes, because I came to Charlotte and said there was an issue. I was trying to get sentiment scores for each song, but it gave us funny results.

For example, "Shake It Off," which is one of her most positive and empowering songs, was getting a very negative score because she says, "The haters gonna hate, hate, hate, hate, hate, hate, hate," so many times. Every time the word "hate" appeared, it lowered the score of the song.

I was telling Charlotte, "How am I going to classify songs as positive, negative, or neutral if this is what happens when we use dictionary values?"

Then Charlotte said, "You could ask an expert. You're the expert."

And I was like, "Oh my God, of course I am." I've been a fan for most of my life, so I could do the arduous task of listening to every single song again. Oh my gosh, it was terrible!

Charlotte M. Jones-Todd

Yes, an arduous task. I made you do that.

Marina Ferrari de Aquino Klemm

Terrible.

John Bailer

You also mentioned in your paper that you had some automation of this process, where you would look at the lyrics, maybe a line at a time, and somehow aggregate that information over the entire song.

Marina Ferrari de Aquino Klemm

That's right, yes. Each song would have an average score based on all its lines. Each line had a separate score, and then we calculated the average for the song. That's right.
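
A minimal sketch of that line-by-line averaging, and of why a repeated word can drag a song's score down, might look like the following. The four-word dictionary and the function names are hypothetical stand-ins; this is not how sentimentr or syuzhet work internally, only a toy version of dictionary scoring.

```python
# Tiny hypothetical sentiment dictionary, for illustration only.
LEXICON = {"hate": -1.0, "love": 1.0, "happy": 1.0, "sad": -1.0}

def line_score(line):
    """Mean dictionary value of the words in one line (0.0 if the line is empty)."""
    words = [w.strip(".,!?\"'").lower() for w in line.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    # Words missing from the dictionary count as neutral (0.0).
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words)

def song_score(lines):
    """Average of the per-line scores across the whole song."""
    scores = [line_score(line) for line in lines]
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative lines only, not a full lyric sheet.
upbeat_but_repetitive = [
    "The haters gonna hate, hate, hate",
    "I shake it off",
]
print(song_score(upbeat_but_repetitive))  # -0.25
```

Every repeat of "hate" counts against the line, so an empowering song still averages out negative. That is the kind of result that led to an expert fan coding the sentiment by hand.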

John Bailer

And then did you compare that type of rating to your general, aggregate sense of what was going on with the song?

Marina Ferrari de Aquino Klemm

I did. I used two different methods. I think one was a package in R called syuzhet, and the other was sentimentr. I ended up using sentimentr because it was more like how I would have rated the songs myself.

It didn't make it into the paper because there was already too much to cover, but yes, I did compare them.

John Bailer

All right, you're listening to Stats and Stories, and we're talking about Taylor Swift, colors, outfits, songs, and sentiment with Marina Ferrari de Aquino Klemm and Charlotte Jones-Todd.

I must ask: What did you learn? You did all this analysis, processed the colors mentioned in the songs, and examined the associated sentiment.

There was also a second component to the analysis—the color of the outfit being worn and the songs that were performed. Did you find any connection between the outfit color and the sentiment of the songs?

Marina Ferrari de Aquino Klemm

Yes, we did.

It wasn't statistically significant in the traditional sense, but we found a negative correlation. What that means is that she would wear one of her happier-colored dresses and then sing one of her saddest songs, and vice versa. It added another element of surprise to the performance.

We found that reds and blues were the saddest colors mentioned in her lyrics, while more colorful shades, such as yellows and greens, were associated with happier themes. Based on that, we expected that when she wore the green or yellow dresses—or the dresses that mixed multiple colors—she would be singing happier songs.

Instead, we found the opposite. When she wore those dresses, she tended to sing sadder songs. And when she wore the pink or blue dresses, which corresponded to colors that were associated with sadness in her lyrics, she tended to sing happier songs.

Charlotte M. Jones-Todd

There was a little bit of statistics involved in getting there.

We were talking earlier about the sentiment associated with the colors Taylor Swift mentions in her songs. After Marina went through and coded whether those references were positive or negative, we could essentially attribute a general sentiment to each color. For example, some colors were more often mentioned when she was talking about happy or fun things.

Finding that she then wore those colors while singing some of her sadder songs was really interesting. I thought that was quite cool.

I've heard people talk about how the tone of a song can sound happy even when the lyrics are very sad. That's probably going beyond my knowledge of music, but it was interesting to see something similar reflected in her use of color as well.

John Bailer

Yes, it's always interesting to think that there might be this sort of mixed message, with the outfit running counter to the sentiment of a song. That was a neat observation. It would be fun to see whether that continues.

Charlotte M. Jones-Todd

You were talking the other day, Marina, about how the way she mentions color is changing as well, right?

Marina Ferrari de Aquino Klemm

Yes. I can give some examples of how she has used colors in her lyrics throughout the years, if that's helpful.

She has a very famous album called Red. Throughout that album, she mentions red many, many times. In the title track, "Red," she sings, "Losing him was blue like I'd never known, missing him was dark gray all alone, but loving him was red, burning red."

Just think about that phrase: "burning red."

Then, seven years later, on the album Lover, she has a song called "Daylight," where she says, "I once believed love would be black and white, but it's golden. I once believed love would be burning red, but it's golden."

To me, it almost feels like she's name-dropping a person. In 2012, she was saying, "Love is burning red." Then in 2019, she's saying, "I used to think love was burning red, but it's golden." It's as though she's saying, "My new muse is golden."

Throughout Lover, she also mentions blue many times in relation to love. For example: "I'm with you even if it makes me blue," "My heart's been borrowed and yours has been blue," "I blew things out of proportion, now you're blue," and "It's blue, the feeling I've got."

Those are references from four different songs. This new muse is associated with blue and golden. The Lover album has this feeling of falling in love, but also that it hurts. There's a sense of incompatibility or uncertainty, which may explain all the blue references. Blue ended up being one of her saddest colors.

Then, on her newest album—which came out just as our paper was about to be published, and we were able to sneak in a final sentence about it—she has a song called "Honey," which is about her now-fiancé, Travis Kelce.

In that song, she says, "You redefined all of my blues when you say 'Honey.'" She connects blue to the color of his eyes. So now the question becomes: Does blue mean something different? Has it become a positive color because he redefined all her blues?

If blue is now associated with the color of his eyes, maybe it's no longer a sad color. Maybe it's a happy one.

John Bailer

Well, context is important when trying to do an analysis like this.

There is clearly deep knowledge here, and deep research. This is very impressive. I love the different dimensions you've explored as part of this project.

You also included a component where you asked whether you could predict the sentiment of a song by looking at the color of the dress being worn. Can you talk a little bit about how you approached that?

Marina Ferrari de Aquino Klemm

For that analysis, we automated the sentiment scoring of each song using the sentimentr package in R. That gave us an average sentiment score for each song.

Then came the extra work—the unfortunate extra work I had to do—which was to go through every single mention of a color in the lyrics. I think there were about 180 individual color references. I scored each one myself as positive, negative, or neutral.

For example, in the lyric, "You paint me a blue sky, then go back and turn it to rain," the phrase "blue sky" is positive. It's a happy image, even though it later turns into rain. So, I coded blue in that context as positive.

For a neutral example, there's the lyric, "Drowning in the Blue Nile, he sent me downtown lights." Blue Nile is the name of a band, so it's neither good nor bad—it's simply descriptive.

For a negative example, there's the lyric, "It's blue, the feeling I've got," from "Cruel Summer." That's associated with the anxiety and uncertainty of falling in love, so I coded blue as negative in that context.

So, I went through the task of rating all 180 color mentions and determined whether each reference was positive, negative, or neutral.

Charlotte, do you want to talk about how we compared them?

Charlotte M. Jones-Todd

Yes. Once we had those scores, we looked at what sentiment was generally associated with each group of colors.

We wanted to see whether the average sentiment connected to a color in the lyrics was related to Taylor wearing that color later. But, for me, what was more interesting than predicting what she might wear next was looking at the overall distribution of color mentions.

Were certain colors mainly associated with positive sentiment or negative sentiment? As Marina mentioned, many colors had different meanings in different contexts. We looked at how often each color was coded as positive, negative, or neutral and asked whether those frequencies were different from what we might expect by chance.

The statistical term would be something similar to a chi-square test. We went a bit beyond that, but the basic idea was to compare what we observed against a null hypothesis—the "boring hypothesis," if you like—where nothing particularly interesting is happening.

Under that hypothesis, we might expect positive, negative, and neutral references to occur in roughly equal proportions. Using statistical methods, we could assess whether certain colors were more likely to be associated with positive or negative sentiment.

From there, we created what the technical description would call a reduced-dimensional space. But the simpler explanation is that we examined the associations among colors based on whether they tended to be positive or negative.

That allowed us to see groupings of colors. For example, blues and reds tended to cluster together because they were more often associated with negative sentiment. Meanwhile, greens and multicolored references tended to be associated with more positive sentiment.

For me, it was interesting not just to say, "Blue is more likely to be negative," but to look at how all the colors related to one another and fit together in this broader visualization. That color plot was one of my favorite parts.
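
To make the "boring hypothesis" comparison concrete, here is a hedged sketch of a Pearson chi-square statistic for a table of color-by-sentiment counts. The counts are invented, not the paper's data, and the function name is hypothetical. The sketch also assumes the null hypothesis is that sentiment is independent of color, so the expected counts come from the row and column totals. The conversation does not confirm that this is the exact null the authors used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if color and sentiment were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts (NOT the paper's data): rows are colors,
# columns are negative / neutral / positive mentions.
colors = ["blue", "red", "green"]
counts = [
    [20, 5, 5],
    [15, 5, 10],
    [5, 5, 20],
]
stat, dof = chi_square_statistic(counts)
print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

For these made-up counts the statistic is 18.75 on 4 degrees of freedom, well above the usual 5% critical value of about 9.49, so a table like this would suggest that some colors lean positive and others negative. The grouping of colors in a reduced-dimensional space is a further step that this sketch does not attempt.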

John Bailer

I thought it was really neat that you also provided tools for fans to explore the data themselves.

You set up resources on GitHub and also created a Shiny app so people could build their own playlists. Can you tell us a little about those resources and how fans can use them?

Marina Ferrari de Aquino Klemm

Oh, I loved hunting for Easter eggs throughout her lyric books and music videos.

I thought a fan like me reading this article might not necessarily stay for the statistics, but they might stay for the Easter eggs. Throughout the article, there are lots of lyrics and references that you might not notice if you're not a fan. Every paragraph has at least one mention of a song that you might not immediately recognize.

I also had this huge metadata spreadsheet and thought, "Who's going to download a spreadsheet?" It's a bit boring to look at. I mean, I like it, but other things are more relatable.

So, I created a Shiny app that lets people build playlists. I have lots of different data fields, such as song sentiment, who fans think the muse is, or what color dress she was wearing during the Eras Tour. I turned those into a few dropdown menus so users can create playlists based on different criteria.

For example, you can generate a playlist of all the songs she performed while wearing a pink dress, all the songs about heartbreak, or all the songs about betrayal. Then you get a list that you can add to your own playlist.

Charlotte M. Jones-Todd

I think it was based on the kinds of connections fans make—those massive boards with strings connecting different concepts and clues that we've talked about.

It was good fun to give people tools they could use. They can download the data and explore it in whatever software they like, or they can use the more user-friendly tools Marina created.

I've actually used some of the examples in my classes. I teach a biostatistics class, which is obviously very closely related to Taylor Swift.

My students have been able to play around with the data and learn a little about Markov chains and hypothesis testing. So, it's out there for people to use in whatever way they would like.

John Bailer

I'm curious—have you heard from any Swifties about the Significance article?

Marina Ferrari de Aquino Klemm

I have heard from another Brazilian Swiftie because we're big fans. We're professional fans.

She also published a paper about using Taylor Swift music videos to teach botany, so we are definitely professional fans.

If you go to any band's social media page, there's always someone saying, "Come to Brazil." We are professional fans.

I posted the article on Reddit, and there were a lot of comments. Many people who teach statistics were happy to see it, and they were happy to see that it was coming from a fan. I was using the same terms fans use, including the names fans give to the dresses. On Reddit, people were really excited that something like this had come out of the Fandom (a subreddit).

John Bailer

Okay, so what's next? What's going to follow this work?

Charlotte M. Jones-Todd

We've got to wait for at least a few more albums, right? Taylor Swift needs to step up her game a bit.

Marina Ferrari de Aquino Klemm

I'm very curious about the colors and whether their meanings continue to change. Who knows what's going to happen now?

John Bailer

Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Marina and Charlotte, thank you so much for being here.

Marina Ferrari de Aquino Klemm

Thank you.

Charlotte M. Jones-Todd

Thank you very much.

John Bailer

Stats and Stories is a partnership between the American Statistical Association and Miami University's Departments of Statistics and Media, Journalism, and Film.

You can follow us on Spotify, Apple Podcasts, or wherever you find podcasts. If you'd like to share your thoughts about the program, send an email to statsandstories@amstat.org or visit statsandstories.net.

Be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.

Hit Songs by the Numbers & What They Reveal About Us | Stats + Stories Episode 379 / January 8, 2026 by Stats Stories

The Billboard Hot 100 has been ranking the week's most popular music since 1958. The first song to top the chart was Ricky Nelson's "Poor Little Fool." The most recent song to do so is Taylor Swift's "The Fate of Ophelia." A lot has changed in the music industry between those two songs, not only in the types of songs that top the charts, but also in how they're promoted and how they're determined. A new book explores the statistics behind the Hot 100, and it's the focus of this episode of Stats + Stories with guest Chris Dalla Riva.

Dr. Greg Matthews is Associate Professor of Statistics and Director of the Center for Data Science and Consulting at Loyola University. He is also a data artist who developed and promoted the Data Art Show, which debuted at the 2016 Joint Statistical Meetings. He performs with the Uncontrolled Variables comedy troupe at the Lincoln Lodge in Chicago, and you can see his data art, links to his comedy performances, and much more at his website, Stats in the Wild.

Episode Description

A statistician walks into a bar, and a comedy and art show begins. Creative work for scholars can extend beyond novel research and application. In today's episode of Stats and Stories, we see how the intersection of statistics and art, as well as the intersection of statistics and comedy, is realized, with Dr. Greg Matthews.

Full Transcript

John Bailer
A statistician walks into a bar, and a comedy and art show begins. Creative work for scholars can extend outside of novel research and application. In today's episode of Stats and Stories, we see how the intersection of statistics and art, as well as the intersection of statistics and comedy, is realized. I'm John Bailer. Stats and Stories is a production of the American Statistical Association as well as Miami University's Departments of Statistics and Media, Journalism, and Film. I'm joined in the studio by Rosemary Pennington from the Department of Media, Journalism, and Film. Our guest today is Dr. Greg Matthews, Associate Professor of Statistics and Director of the Center for Data Science and Consulting at Loyola University. He is also a data artist who developed and promoted the Data Art Show, which first appeared at the ASA Joint Statistical Meetings in 2016, and he performs with the Uncontrolled Variables comedy troupe. You can see his data art, links to his comedy performances, and much more at his website, statsinthewild.com.

Greg, thank you so much for being with us today.

Greg Matthews
Oh, it's great to be here. I love talking about statistics and art and comedy.

John Bailer
Well, thank goodness, because we weren't prepared for anything else. Perfect. I have to thank you for giving us so many options of where and how to begin this episode. I was just spinning my wheels for a while, but I think I'll start with art. How would you differentiate data art from, say, data visualization?

Greg Matthews
That's a spectacular question. I have this sort of corny answer for the difference between data visualization and data art. I give a data art talk, and this is what I always say the difference is: data visualization answers a question; data art asks a question. Pretty deep, right? There's certainly overlap, but in my mind, data visualization is essentially functional. You're trying to answer a question, you're trying to summarize data, you're trying to convey information to a viewer. Data visualizations can become art when presented in the correct context, but they don't always have to be good data visualizations, right? The goals are different. A pie chart, for example, is not a very good data visualization. There are rules to good data visualization, just like there are rules to good art, and you can break those rules of data visualization when you're making data art. But I do see a lot of overlap between these two ideas, visualization and art, and I think, depending on the context and how you present it to a viewer, they could be one and the same.

Rosemary Pennington
I was looking around on your website at some of your art, and it struck me that if I did not know you were doing this based on data, I would have had no idea. It just looks like really interesting abstract art sometimes. How did you get started creating this kind of art?

Greg Matthews
Well, it's my wife. My wife gets all the credit. My wife went to art school, and we started dating, we got married, and over the course of our time together, I basically got an art school education. She got me really interested in art. Then I started thinking, "Oh, I can make art with my computer, and I can make data art." That's what got me hooked. Then I just started doing it, and I really like doing it. To your other point, about how you didn't know that it was data art: my goal when I make data art is to make something that will stand on its own. You can look at it visually, and it's interesting by itself, and then when you learn that there's data behind it, it becomes even more interesting. That's what I view as success in something I've created artistically. It doesn't always work, but that's my goal when I'm making it. It's more interesting when you realize there's data behind it.

John Bailer
Oh, I agree. That's really cool, just to see this process. Can you talk a little bit about the process you follow when you're producing a piece of data art? Looking through some of the collections you have online, there's some analysis and rendering that seems to be hiding behind the scenes. Could you talk a little bit about that?

Greg Matthews
Do you know which pieces you're specifically talking about?

John Bailer
How about... no, just all of them. We have about five minutes. Go through it. Ready? Start. How about Celebrity and Gun? A couple of recent ones.

Greg Matthews
Yeah, so these are from the Google Image Search series. What I've been trying to do more recently, in the last five or six years, is find data sets that are more interesting, and I'm interested in exploring data sets that are personal. I've done work with my Fitbit data and things like that. With the Google image search data, the way these are created is I do a Google image search on a word I pick. So the word was gun, or, what was the other one you mentioned? Celebrity. So I google celebrity, or gun, or any of these other words, and Google gives you back a set of images. Those images are what Google thinks you think those words mean at that particular time, right? If you Google that word at a different time, you're going to get different results than I'll get. So there's something very personal about what is returned from Google. It's about the time and the place and your personal search history. It's what Google thinks I think this word means at that time. So I'm taking these images and trying to put them into a composite image. Now, there are certain ways you could do this. You could take the pixels and average them, but this gives you what I think are, well, not boring images, but they're not as exciting visually as they could be. And there's actually an artist at the University of Chicago who did this 20 years ago, taking images and making composites. His name is Jason Salavon, and he has a lot of really interesting work. What I was trying to do is make these composites a different way. They're actually made using CART models, where I'm building a CART model to try to predict the color of each pixel. Now, you could do this very well.
You could build a very accurate CART model to predict the average color of the pixel. But it's not that interesting visually. So I actually want to produce a CART model that is doing worse than it could, because it creates more interesting visual images. By creating a bad statistical model, I think you get more interesting visual images. And there's this idea in art where you need to understand the rules before you can break them, right? I think there's an analogy here: I have to understand what a good CART model looks like so that I can break the rules to make art. And that kind of connection really brings me a little bit of joy.
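As a rough illustration of the idea Greg describes, here is a minimal Python sketch of a deliberately underfit CART-style regression tree that predicts a pixel's value from its (x, y) position. It is not his code: he works in R and the conversation does not spell out his implementation. The tiny grayscale image, the function names, and the `max_depth` knob are all assumptions made for illustration, and a real piece would use full-color composites rather than one small grid.

```python
# Sketch only: a hand-rolled regression tree over pixel coordinates.
# The image, function names, and depth limit are illustrative assumptions.

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(points, depth, max_depth):
    """points is a list of (x, y, value); returns a nested dict tree."""
    values = [v for _, _, v in points]
    leaf = {"leaf": True, "value": sum(values) / len(values)}
    if depth >= max_depth or len(points) < 2:
        return leaf
    best = None
    for axis in (0, 1):  # 0 splits on x, 1 splits on y
        # Skip the smallest coordinate so the left side is never empty.
        for threshold in sorted({p[axis] for p in points})[1:]:
            left = [p for p in points if p[axis] < threshold]
            right = [p for p in points if p[axis] >= threshold]
            score = sse([p[2] for p in left]) + sse([p[2] for p in right])
            if best is None or score < best[0]:
                best = (score, axis, threshold, left, right)
    if best is None or best[0] >= sse(values):
        return leaf  # no split improves the fit
    _, axis, threshold, left, right = best
    return {
        "leaf": False,
        "axis": axis,
        "threshold": threshold,
        "left": fit_tree(left, depth + 1, max_depth),
        "right": fit_tree(right, depth + 1, max_depth),
    }

def predict(tree, x, y):
    """Walk the tree down to a leaf and return its mean value."""
    while not tree["leaf"]:
        coord = (x, y)[tree["axis"]]
        tree = tree["left"] if coord < tree["threshold"] else tree["right"]
    return tree["value"]

def render(image, max_depth):
    """Fit a tree to the image and redraw every pixel from its prediction."""
    points = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
    tree = fit_tree(points, 0, max_depth)
    return [[predict(tree, x, y) for x in range(len(row))]
            for y, row in enumerate(image)]

# Made-up 4x4 grayscale "composite" (values 0-255).
image = [
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]

blocky = render(image, max_depth=1)    # "bad" model: two flat blocks
faithful = render(image, max_depth=2)  # deeper model: reproduces the grid
```

With a depth of 1 the tree can only cut the picture once, so the result is two flat blocks; letting it grow deeper recovers the original. The shallow, "worse" fit is the kind of model-breaking the passage above is pointing at.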

Rosemary Pennington
I wonder what your process is like and how you developed it, right? Because, as John pointed out, there have got to be layers to this, and there's lots of analysis going on behind the scenes that we don't see. When you are trying to figure out what you want to create, how are you identifying the data and the process that you're going to use? And what is your process for getting to an image that you are happy with?

Greg Matthews
Well, so I'll usually start by picking a data set that I think is interesting. So I had Google Images, or Fitbit. I'm looking around my office to see what else I was using. I have some work that was based on GPS coordinates. Your phone is tracking you constantly, right? So I have these images (I know it's a podcast, you can't see any of this stuff) that are just single days of my life based on where my phone tracked me. That's really interesting personal data. I think good art asks a question, and one of the questions that I want to ask is, are we okay with all this data being collected about us? There's just constant data being collected. So if I find a data set I think is interesting, I'll start with that, and then I just start writing code in R. I do all this stuff in R. I'll start with basic data visualizations. I'll start plotting pixels, and I'll see: all right, what if I write code like this? What does that look like? And then you try 1,000 things, you find the one that you like, and that's what you see, right? It's very much like doing statistical research. You fail 10,000 times, and then what succeeds you write in a paper. People only see the successes, but there's a lot of trial and error. You try something and you get excited about it, and then you look at the image, and you go, I don't really like that. So it's just a lot of experimentation: trying something and seeing if it works, and if it doesn't work, you try again.

Rosemary Pennington
You said you got into art via your wife. What does your wife think of your data art?

Greg Matthews
She thinks I'm a paradigm-shifting, generational genius. She likes it. She thinks it's really good. We have a relationship where I can make something and she can genuinely say that's good or that's bad, and here's why I don't like it. One of the things she's taught me is that when she was in art school, they would do these critiques, and you had to say, I like something or I don't like something, and then you had to explain why. Her doing that with me, and then her getting me to do that, has been really helpful in thinking about art in a different way. I think a lot of people think about art as paintings from the 1500s depicting biblical scenes. That's what art is for a lot of people, like in a museum. But art is so much broader than that, and I had no idea about this until I met my wife and we started talking about art a lot. She will give me feedback, and she'll say, I like this, or this is working, this is what's good about it, this is what's bad about it, and you could try to do something better here. So she helps me with that kind of stuff.

John Bailer
Yeah, I looked through some of your pictures. Some of them, like Beach, for example, seemed like they were really cut from this impressionist cloth, or, you know, the Scottish Colourist cloth, with that lovely image, those lovely colors. It seems like recently a lot of these are prompted by image searches. How has your data art changed over time? Where did you start, and what are some of the paths toward where you've come to now?

Greg Matthews
So when I started making art, I make a distinction between computer art and data art, and when I started, I was making a lot more computer art. What I mean by that is I was generating things randomly. I did a lot of games of chance. I was messing around with things like dominoes, Powerball, craps. I have keno in here. I was using those as a process for generating images randomly, using R's pseudorandom number generator to generate images, and then picking the random numbers that I liked. But that's not really data art, because there's no data behind it. I'm just generating the images. I did this for years at the beginning, and at some point I said, you know, I would like there to be something more behind this than completely randomly generated numbers, even though I think a lot of this stuff is very visually interesting. As I said before, once you find out there's data behind it, it adds a layer to it. So I started shifting toward using data as the primary source of the art, right? And I like to pick data sets that have some kind of meaning beyond just, you know, here's a random data set from Kaggle. I like to use a lot of data that's generated by myself, because I think if other people are seeing it, they can potentially think: what data is being collected about me? Are we okay with that? These are big questions that I don't think people think about at all. They just use their phone all the time. If you really think about it, you know that there's a ton being collected on you, but I think we just don't think about it. It's just part of life every day, right? But I've also worked with data like the census data about wealth and race.
I had a series called American Money, and it looked at how money is distributed within zip codes, and then the demographic makeup of those zip codes. You can see big differences between these things, but it also stands alone as interesting images. Then when you see what's behind it, it's even more interesting. So I like to pick data sets that are meaningful in some way and that ask bigger questions. It's not always successful, but that's what making art is. It's trial and error. Mm. Yeah.

John Bailer
So you're listening to Stats and Stories, and we're talking to Greg Matthews about data art and comedy. I think it's about time for us to shift to maybe some Uncontrolled Variables. All right, spectacular. So what is Uncontrolled Variables, and how did you first get involved?

Greg Matthews
So Uncontrolled Variables is a science and comedy show. The way it works is we get some Chicago-area comedians and Chicago-area scientists, and we do a show together. The basic premise is a comedian comes on and does a regular set of comedy. Then we give them the scientist's slides, and they present the slides, never having seen them before, and hilarity ensues, as you know. Then we bring the real scientist up, and they present the slides again and address all the things that the comedian screwed up, which is everything. So the audience gets comedy, and they're actually getting to see a real scientist talk about the work that they're doing. My involvement in the show is I help produce it and I help find the scientists, but I also do what we call a guest lecture: I take the topic of the show, and I do a completely absurd data analysis related to it. We actually had a show last night. We do it once a month, and we just happen to be talking the day after the show. Last night's theme was environmental science, so I did a data analysis on enteric fermentation, which is livestock flatulence, and about how methane is released into the air by livestock. And I killed.

Rosemary Pennington
You know, we talk a lot on Stats and Stories about how we're living in this environment where there does seem to be a bit of distrust around science and expertise and facts. I wonder, as you are working to produce these shows, who you're imagining your audience is, and how you're imagining this interfacing with this larger cultural distrust of science.

Greg Matthews
So, I mean, I know exactly who the audience is. The audience is a lot of graduate students in STEM, and there are a lot of professors who show up. Whatever scientist we bring in, they'll bring all their colleagues, they'll bring the people in their labs. So it's sort of like a comedy show cheat code, where we are guaranteed at least 15 people to show up, because the scientists will bring a bunch of people, because it's their one time to be on stage. That's sort of who the audience is. It's not random Chicagoans, generally, though there are people who just show up. They'll go, I want to go see a show tonight, and they end up at the show not really knowing what it is, and they seem to like it. Your other question, though, about the culture, I think is really interesting. I hadn't thought about this for, like, ever. So there's a woman who is currently filming a documentary about our show, and I did an interview with her for the documentary, I don't know, three weeks ago, and she asked sort of the same question. She goes, what do you think about when you do this show? Is there a goal of trying to reach a bigger audience, or is there a serious goal of trying to communicate science to the general population? When I first got involved in the show (I didn't start the show), it was, no, it's just a fun science and comedy show. But, I don't know if you know what happened last November, but since then, the show has sort of taken on a little bit more of a serious side. There's a little bit of resistance in doing the show, right?
So we just did a show on environmental science, and we talked about greenhouse gases and global warming, and I think the fact that we're just talking about this in this environment is a little bit resistance-y. It feels different than it did a year ago or two years ago, and we're leaning into this. We're going to do a show on the biology of gender in June. Oh, right, yeah, we're bringing in a biologist who is going to talk about gender from a biological perspective. He studies lizards. He's going to talk about the biology of lizards. But doing those shows a year ago would be very different than doing those shows now. I hadn't really thought about this, but I do think there's something a little bit more serious now about even just talking about this stuff, right? So I feel a different sort of, I don't know if responsibility is the right word, but it does feel different to me in a way. Does that make sense?

John Bailer
Absolutely. So as you think about next month and preparing for this, you're going to be giving a guest lecture again, yes? When you start thinking about the topic going in, how do you start finding that hook, that connection, that you really want to build on? I mean, the cow flatulence, that seems like a really good call for an environmental science piece. So where are you going to start now in this process for next month?

Greg Matthews
So the process is, I panic for a week trying to figure out what I'm going to do. When I settle on that, then I do some data analysis. I see what the results are. I go, that's funny, that's funny, that's funny. Then I put them in slides, and I add jokes in at the very, very end, right? If you can get a decent analysis, the jokes write themselves. It's easy to make fun of science. I think the hard part is finding what you're going to talk about. So what I do is I go to Kaggle data sets, and I'll just search environmental science or gender or whatever, and I'll see what data sets come up, because I just want to see what's out there. There's this process of, you know, what is even available for me to look at, and then I go through, I don't know, 10 or 15 data sets just to see if there's anything interesting in them. I'm always worried there's not going to be a light bulb moment, but there always has been. There always seems to be something that's like, oh, that's the right thing to do, and then we go from there. But the important thing is just picking something, so I have enough time to do it in the month. At this point, the day after the last show, I have no idea what's going to happen next month. I'll spend Monday and Tuesday night next week looking through data sets for next month.

Rosemary Pennington
I wonder what you've learned about communicating science in different environments from working on this.

Greg Matthews
So I think in the totality of it: I did improv, I took improv classes, I performed improv, and this is stand-up comedy. Through all of that, I think it's changed the way I teach in class, from simple things like where you stand when you're facing an audience, to not being boring, right? If you just stand up there and talk about statistics... I know you and I don't think this is true, but some people think statistics is dry. If you can take a dry subject and teach someone that, but also keep their interest with things that are at least funny to some people, it creates a good environment for learning. I'll give you an example. When I would teach Stat 203, which is Introduction to Statistics, instead of writing out examples, I'll be like, all right, someone shout out, what do you want to do an example about? And they will be like, all right, horse racing. And I'll sit there and make up an example. We're going to do a hypothesis testing example, and I will make it about two horses, or two different groups of horses. These horses run something, and these other horses run something, but they eat a special kind of oat. You get to make up a little story, and you're doing a little bit of improv, and that gets them interested, because they get to choose what we're doing the example about. It's all the same hypothesis test behind the scenes, but that kind of stuff really works, right? Nineteen-year-olds aren't all that interested in hypothesis testing in general, but if you can get them to be involved in any way, it's really helpful educationally, I think.
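To make the classroom example concrete, here is a minimal sketch of a two-group hypothesis test in Python. Greg does not say which test he uses in class; a permutation test on the difference in mean race times is swapped in here because it needs only the standard library. The race times, group labels, and function name are all invented for illustration.

```python
# Sketch only: a two-sided permutation test for a difference in means.
# The data and names below are made up; no real horses were timed.
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Return (observed mean difference, approximate two-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical race times in seconds.
special_oats = [60.1, 59.8, 60.4, 59.9, 60.2, 60.0]
regular_oats = [61.0, 61.3, 60.9, 61.2, 61.1, 61.4]

difference, p_value = permutation_test(special_oats, regular_oats)
```

The story changes with whatever the class shouts out, but the machinery stays the same: state a null hypothesis (the oats make no difference), compute a statistic, and ask how often chance alone would produce something at least as extreme.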

John Bailer
Yeah, that kind of audience involvement sounds like an awesome strategy. I often thought, when I was teaching stat classes, that I was glad that the expectations were low coming in for many, you know? If you're teaching certain classes, the expectations are coming in sky high. If you're teaching a geography of wines, there's a very different expectation than if you're teaching hypothesis testing and inference. But that's an opportunity. It can be a blessing. So I really like that. And I'm curious: you've mentioned how the comedy and the improvisation have impacted your thinking about this as a presenter and as a teacher. How about your consulting? Has this work in comedy and in data art changed some of the ways you interact with clients in a consulting setting?

Greg Matthews
So I don't know if it actually impacts that, but I will say that I've learned things through making data art and from doing analysis for the comedy show that I've actually used in consulting projects or in my research, right? I haven't used this yet, but the most recent example is from last night. I did some change point analysis. Well, I didn't do it last night, but in the show last night, I presented some change point analysis using the PELT algorithm. I had never used this before, and I would never have come across it other than that I wanted to use it for a comedy show. So I studied the PELT algorithm for detecting change points because of a comedy show. That's incredible, right? I also feel much more comfortable working with image data. When you do convolutional neural networks, or you're doing image classification, the reason I know how to do any of that stuff is because of the data art and working with images as a hobby. So I've actually learned quite a bit that helps me professionally from doing these hobbies, from doing the comedy shows or from making this data art, because it's all coding, right? And, you know, that also brings me joy. I can justify doing it because it's professional work, right?
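For listeners curious about what PELT does, here is a minimal Python sketch of the idea: dynamic programming over possible last change points, with pruning of candidates that can no longer be optimal. It assumes changes in the mean only, uses a squared-error segment cost and a hand-picked penalty, and runs on a made-up signal. It is not the analysis from the show, and the function and variable names are illustrative.

```python
# Sketch only: PELT (Pruned Exact Linear Time) for changes in mean.
# Cost of a segment = sum of squared deviations from the segment mean.

def pelt(signal, penalty):
    """Return the sorted indices where a new segment starts."""
    n = len(signal)
    # Prefix sums let us compute any segment cost in constant time.
    s1, s2 = [0.0], [0.0]
    for v in signal:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(start, end):
        """Squared-error cost of signal[start:end]."""
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty       # so the first segment is not charged a penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in signal[:t]
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(best[tau] + cost(tau, t) + penalty, tau) for tau in candidates]
        best[t], last[t] = min(scored)
        # Pruning step: drop any tau that can never be optimal again.
        candidates = [tau for value, tau in scored if value - penalty <= best[t]]
        candidates.append(t)

    # Walk backward through the stored segment starts.
    changepoints = []
    t = n
    while last[t] > 0:
        t = last[t]
        changepoints.append(t)
    return sorted(changepoints)

# Made-up signal: the mean jumps from 0 to 10 at index 10.
signal = [0.0] * 10 + [10.0] * 10
found = pelt(signal, penalty=5.0)
```

The penalty controls how readily the method adds change points: a larger value demands a bigger drop in cost before a new segment is worth it. In practice one would reach for an established package rather than a hand-rolled version like this.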

Rosemary Pennington
I wonder, as you've been preparing these guest lectures, was there ever a topic that you really struggled to get written in a way that you felt was funny and compelling, and what helped you get through that?

Greg Matthews
Last June. So June is Pride Month. Last June, we did LGBTQ health. I am a straight white man, and I had to be very careful in the way I wrote that. But I think I did a very good job. I did a statistical analysis of legislation across the states that is trying to legislate LGBTQ topics, and Oklahoma has a lot of it. I just presented other people's work, or other people's proposed legislation on this, but that was a difficult topic because of the sensitivity around it. It was a fun challenge, and I had to do it in a way that was aware of who I am and the topic that I was talking about. But I would say that was the most challenging topic because of what it was.

John Bailer
So what kind of recommendations might you have for people who are interested in getting involved in data art or in comedy?

Greg Matthews
So with a lot of things, like if you want to be a lawyer, you've got to go to law school, and you've got to pass the bar, and then someone calls you a lawyer. If you want to be a professional athlete, you've got to make a team and sign a contract. The bar to becoming an artist is you just saying, I make art now, right? There's no entry, there are no gatekeepers. You can just make art. And it's the same thing with comedy. Comedy is a little harder, because you've got to get people to show up to your shows, but you can make art right now. There's literally no bar to it. If you want to be an artist, all you have to do is say, I'm an artist now, I make art. Just go do it. I think people are afraid to fail, and they need to stop this. You're going to fail a lot. Every time you see someone who did something really successful, all you're seeing is their successes, right? Whenever you see someone do an hour-long comedy special, that took them months to write, and they failed the whole time before that. What you're seeing is the final, finished product. Same thing with an artist. They screwed up that piece 1,000 times before they got the final thing. So don't be afraid to fail, and just go do it. It doesn't have to be art or comedy. This applies to everything. Whatever it is that you want to go try, just go do it, right? Just try stuff. It's okay.

John Bailer
That's good advice. That is good advice. And, you know, we like to end with good advice, so I'm afraid that's all the time we have for this episode of Stats and Stories. Greg, thank you so much for joining us today. Thank you. It was an absolute pleasure. Yeah, thank you so much. Stats and Stories is a partnership between Miami University... Whoops, I'm going to do that again. Stats and Stories is a partnership between the American Statistical Association and Miami University's departments of statistics and media, journalism and film. You can listen to us on Spotify, Apple Podcasts, or other places where you find podcasts. If you'd like to share your thoughts on our program,

Music Streaming Statistics | Stats + Stories Episode 354 / December 19, 2024 by Stats Stories

Chris Dalla Riva is an analyst for the music streaming service Audiomack by day while spending his nights writing and recording music and writing about music for his newsletter Can’t Get Much Higher.

Check out the Full Article in Significance Magazine

Episode Description

Artists of today are still making albums. However, with so much emphasis being put on streaming charts, how many of today's album streams are being made up by a few hit tracks? That distinction is the focus of today's episode of Stats and Stories with guest Chris Dalla Riva.

Full Transcript

Coming Soon

The Statistical Kings of Comedy | Stats + Stories Episode 348 / October 24, 2024 by Stats Stories

Sachin Date works for VitalEdge Technologies and has, over his career, worked in two research labs, three software companies including two product companies, and in a classroom. He has built and delivered all kinds of software including massively distributed discrete-time simulations, data science stacks, a new programming language, and dozens of mobile apps, including the world’s first Napster app for Blackberries. Along the way, Sachin taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch.

Check out the full article in Significance Magazine.

Episode Description

A journalist, statistician and sound engineer walk into a bar. Well, actually, to a studio to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? That's the topic of this week's episode of Stats+Stories with guest Sachin Date.

Full Transcript

John Bailer
A journalist, statistician and sound engineer walk into a bar... well, actually, to a studio, to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? Who may have influenced Seinfeld or David? How would you know? Stay tuned, and you will get your question answered on this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me is regular panelist Rosemary Pennington, chair of the department of media, journalism and film at Miami University. Our guest today is Sachin Date. Date works for VitalEdge Technologies. His career has included work in two research labs, three software companies, including two product companies, and in a classroom. He has built and delivered all kinds of software, including massively distributed discrete-time simulations, data science stacks, a new programming language, and dozens of mobile apps, including the world's first Napster app for Blackberries. I remember Blackberries, and Napster too. He has also taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch. Date's recent Significance article on how Shakespeare influenced Seinfeld provides the background for our conversation today. Thank you so much for joining us today.

Sachin Date
Thank you for having me, John.

John Bailer
So what inspired you to embark on this project?

Sachin Date
So I didn't actually start with the intention of establishing the patterns of influence between specific comedians and their influences. What really happened was, I was browsing through the Wikipedia pages of some of the comedians I follow, and I quickly discovered that a lot of these pages have material on them that seems to indicate that the comedian was heavily influenced by other comedians, and sometimes not necessarily other comedians, but also writers and a lot of other kinds of people, like family members and friends and so forth. So I clicked on the links of some of these influences, particularly the influences that were other comedians, and I discovered that the Wikipedia pages of those influencers also contained information about who had influenced them. So I clicked on those links. And then I kept on going back in time, until I ran into Wikipedia pages of writers in the 18th century, 17th century, 16th century. At one point, I opened the Wikipedia page of William Shakespeare, and I realized that I had basically followed the links from someone who is alive today in the 21st century and transported myself back in time all the way to William Shakespeare. That made me wonder, well, how common is this pattern? Are there other comedians who also have influence data listed on their Wikipedia pages? So I started clicking around, and I discovered that a lot of comedians actually have this kind of data on their Wikipedia pages. Additionally, the Wikipedia pages of very influential comedians, like Richard Pryor, for example, or George Carlin, have legacy sections on them which contain information about whom they have influenced. That's part of their legacy. So there are those backlinks also to be followed. So I figured, well, let me see if I can do a systematic study of this topic.
But when I started doing that, I realized that the number of comedians involved is very big. Wikipedia itself has, I think, about 50 to 100 different categories devoted to comedy. So I figured, well, let me put a circle around my research. I'll focus only on the comedians who are currently the most popular in America, and then I'll start tracing the links back from that set of comedians, and see how far back in time and how widespread those links get. And that's what motivated the research on that topic.
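The link-following Sachin describes can be sketched as a breadth-first walk over an "influenced by" graph. The Python below is an assumption-laden illustration, not his method: the graph is a small hand-written dictionary with placeholder names rather than anything scraped from Wikipedia, and the function name is made up.

```python
# Sketch only: tracing influences backward with breadth-first search.
# The graph uses placeholder names; it asserts nothing about real people.
from collections import deque

def trace_influences(influenced_by, start):
    """Return {person: number of influence links back from start}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for influence in influenced_by.get(person, []):
            if influence not in depth:  # visit each person only once
                depth[influence] = depth[person] + 1
                queue.append(influence)
    return depth

# Hypothetical influence lists, keyed by the person who was influenced.
influenced_by = {
    "comedian_today": ["comedian_1980s", "comedian_1970s"],
    "comedian_1980s": ["comedian_1970s", "writer_1800s"],
    "comedian_1970s": ["writer_1800s"],
    "writer_1800s": ["playwright_1600s"],
}

steps_back = trace_influences(influenced_by, "comedian_today")
```

Starting from a contemporary comedian, the walk records how many links separate them from each earlier figure. Starting it from a whole list of popular comedians instead of one would give the kind of bounded study described above.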

Rosemary Pennington
How did you determine who were the top comedians working today?

Sachin Date
Yeah, so I was interested in a way of finding that information. What I thought I would do, and it actually worked remarkably well, was run three pretty straightforward Google searches. The search text basically went: most popular American comedians in 202X, where that X was either one or two or three. So basically, the most popular American comedians in 2021, 2022, 2023. I figured, well, the last three years could be considered as kind of the window for the most popular contemporary comedians. So sure enough, Google showed a lot of search results. So I tweaked those results by setting the time frame filter to include only the results that were published in the October through December timeframe. As soon as I did that, that brought forth results that were really more focused toward the end-of-the-year rankings and ratings that were available on the internet. And then I started going through those results, and sure enough, there was a large amount of diversity in there. So for each one of those three years, 2021, 2022, 2023, what I did was I essentially identified about 10 different types of sources, and I tried to keep those sources as different from each other as possible, just to kind of, you know, reduce the bias and improve the diversity in the data. So that gave me essentially a mass of comedians to work with, and then I merged that data, and then kind of arrived at the list of what I consider to be the most popular contemporary American stand ups.

John Bailer
So let's name some names. Who are some of the comedians that you ended up including from this three-year window?

Sachin Date
Well, there was Jerry Seinfeld, of course, and then there was Hasan Minhaj. Well, let's see. There was John Mulaney and Taylor Tomlinson, Dave Chappelle. A lot of the same set of people kept repeating across those lists. So one thing that was common amongst them was that a lot of them were very active in stand up comedy. I mean, not just now, but also three years ago, four years ago, 10 years ago. So they've been doing stand up for a long period of time.

John Bailer
So how many different comedians did you identify in this collection? I mean, once you filtered it, you said that these were American comedians identified in sources from October through December of these three years. So what was the total number of comedians that you included to start building this network of influence?

Sachin Date
So the three searches that I ran produced several hundred different comedians. I filtered out all the ones that were not US persons, because my focus was only on American stand ups. Then I also filtered out comedians who did not have Wikipedia pages, because my study was really just focused on data that came from Wikipedia. I also filtered out comedians who had not really performed any kind of stand up or improv or sketch comedy. So once all those filters were applied, I narrowed the space down to about 175 to 200 comedians. So that was kind of the social network of comedians that I started with. Now, this was the set of the most popular contemporary comedians as of the end of 2023. Of course, a lot of those Wikipedia pages did not have the influence data on them. In fact, I think for over 100 of those 175 or so comedians, there was no good data available on Wikipedia on who influenced them. So those were really isolated nodes in the network, and for the remaining set of comedians who had that data, I kind of followed the links back in time and also across in space to build a social network. So in the end, I basically ended up with about 64 to 70 comedians who had a lot of influence data associated with them, and the social network was based off of that set. The overall network, once you factored in all the influences on those comedians, ran up to about 250 to 260 nodes and around 700 edges of influence.

Rosemary Pennington
What concerns did you have about using Wikipedia data?

Sachin Date
Right. Yeah, so with Wikipedia, on one hand, most of the data that's mentioned on Wikipedia is referenced very nicely. So that's one advantage you get from using Wikipedia data: you can follow through the reference links and verify that the influence that is mentioned on the page actually does ring true. The text talking about the influence describes a valid influence, and it links through to some article somewhere that mentions how the comedian actually was influenced by someone else. On the other hand, with Wikipedia, there is really no way for you to know the current strength of the influence, so you're forced to consider that influence as a binary variable: either the influence is there or the influence is not there. But in reality, of course, influence is much more complex than that. Someone could have been influenced by someone else very heavily in the past, but not really so much anymore. And that character of the influence isn't really brought out very well. Actually, it's not brought out at all in most cases on Wikipedia. So that's another problem. Well, it's really not so much a problem with Wikipedia as much as it is with the nature of the influence itself. I mean, it's an inherently qualitative measure. And in fact, one of the goals of the study was to try to work around the qualitative nature of the influence. But yeah, back to your question about the limitations of Wikipedia data. So there was that, that the nature of the influence was entirely binary. You either assume that the influence was there or it was not there, depending on what was mentioned in the page. The other aspect of information on Wikipedia is that you have to interpret the text, the sentence, the context around the influence very carefully. So, in fact, I'll give you an example.
In one instance, I think this was on David Letterman's page, he talks about how Norm Macdonald has been one of the greatest comedians that he has run into, but that kind of text is really more in the context of Letterman considering Norm Macdonald a really great comedian, not so much an influence. So you have to be careful about interpreting the text around words such as great comedian or my hero, or anything like that. There's a lot of subjectivity involved over there.

John Bailer
You're listening to Stats and Stories. Our guest today is Sachin Date. So you've talked a lot about this idea of an influence network. So help the audience picture this. You have a cloud out there, and each comedian is some, I don't know, some unique node in it that's connected potentially to others, with edges that connect them. The nodes are comedians. The edges show whether one influences the other, and there's direction here, from the influencer to the influenced. So you've built this from the data. What kind of influences or influencers surprised you most after having built this network out?

Sachin Date
Well, okay, so let me kind of give you some examples here. So one of the interesting findings was that people such as Charlie Chaplin and Stan Laurel and Oliver Hardy of the Laurel and Hardy fame, all three of them, in fact, individually seem to either directly or indirectly influence almost a third of the most popular contemporary American stand ups who had influences listed on Wikipedia. So I found that to be quite interesting. What that also pointed to was that a lot of the influence was coming from people who were not really stand ups in the currently understood definition of that term. A lot of the influencers were writers, comedic writers, or stage performers, or people like Charlie Chaplin, who were clearly not stand ups, and not stage performers as such either, but very accomplished comic actors and directors and producers. So that was one interesting thing I found. Another thing worth mentioning has to do with the data about the birth dates of the influenced comedians and their influencers. So as I was tracing out this network, one of the things that I was doing was also capturing the dates of birth of the comedians and their influencers. And what I found was that an overwhelming volume, actually almost 100% of the volume, I think more than 95%, 95-point-something percent of the direct influence volume, came from individuals who were at most two generations older than the influenced comedian, and more than half of the direct influence volume on the most popular contemporary American stand ups came from people within the same generation. So it seemed like an overwhelming majority of American stand ups are drawing their influence from people who are roughly their age, or not really very much older than them.
Now if you also factor in the indirect influences, meaning, let's say comedian A was influenced by comedian B and comedian B was influenced by comedian C, so comedian C indirectly influences comedian A. That was kind of one of the fundamental assumptions of the paper. Then the birth year to birth year time spans naturally swept across a pretty vast period of time, and that period of time was truly vast. I mean, it was 10 years to more than 400 years, with a median time span of like around three years. So overall, what it was pointing to was that, well, first of all, there was a very strong pattern of influences, like an 80-20 pattern, where a large fraction of the influence was coming from a very small fraction of influencers. And then if you combine that with the vast range of birth year to birth year time spans, if you put those two things together, the conclusion to draw from that was that most of the most popular contemporary American stand ups drew their inspiration from a small set of influencers who were themselves spread across multiple centuries. So that was an interesting conclusion that I drew.
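
The indirect-influence idea described above (A was influenced by B, B was influenced by C, so C indirectly influences A) amounts to reachability in a directed graph. Here is a minimal Python sketch of that idea. The `influenced_by` dictionary, the function name, and the placeholder names A, B, and C are assumptions for illustration, not the study's data or code.

```python
from collections import deque

# Hypothetical toy data: influenced_by[x] lists the people who directly influenced x.
influenced_by = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}

def all_influencers(person, influenced_by):
    """Return everyone who directly or indirectly influenced `person`."""
    seen = set()
    queue = deque(influenced_by.get(person, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        # Follow the chain one more step back: the influencers of this influencer.
        queue.extend(influenced_by.get(current, []))
    # With mutual influence the walk can loop back to the start; drop it.
    seen.discard(person)
    return seen

print(sorted(all_influencers("A", influenced_by)))  # ['B', 'C']
```

Treating influence as transitive in this way is itself a modeling choice, which is why the guest calls it a fundamental assumption of the paper.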

Rosemary Pennington
I'm looking at your visualizations of the influence chains from William Shakespeare to first Jerry Seinfeld and then to Larry David. And the thing that I was struck by looking at these is that the chain of influence to Larry David seems a little more direct than it seems to have been to Jerry Seinfeld. And I wonder, you know, what do you make of that, given that Seinfeld and Larry David are so, you know, tightly connected as far as comedians and producers. But also, were there chains of influence that you found particularly interesting as you were combing through what must have been a vast bunch of data?

Sachin Date
That's right. So there's definitely a very large diversity in the structure of the influence chains. Now one thing to keep in mind over there is that the data definitely has some degree of what we could consider as some form of, you know, nonresponse bias, and that's because a large number of comedians simply don't have influence data mentioned about them on their Wikipedia pages. So that's going to generate some kind of a bias, which is similar to the sort of bias that one encounters on surveys, where people simply don't respond to the survey. So there's a bias associated with that kind of missing data. So there could very well be influences which are not represented accurately enough by the graphs that you see in the paper. And that's almost certainly because the data for them is simply not available. But at the same time, there is still, I think, enough data on Wikipedia to draw the conclusion that the influence networks of a lot of these comedians have a lot of diversity in them. Now going back to your question about some interesting features of these graphs. Well, one of the things that I noticed fairly consistently was that Woody Allen seemed to be performing the role of what you might consider as a router of influence. So his position in the influence networks was such that he seemed to be routing over influences from what were essentially writers in the 1800s, 1700s, 1600s, all the way back to William Shakespeare, over to the set of modern day American stand ups. So on one side of the graph there were a bunch of writers and humorists and playwrights, and on the other side of the graph were people who were largely American stand up comedians, with the node representing Woody Allen sitting in between. So I found it interesting that this pattern repeated so often.
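
One way to make the "router of influence" idea concrete is to count how many writer-to-stand-up pairs are connected only through a given node. The Python sketch below does that on a toy graph. The graph, the node names, and the `routed_pairs` measure are assumptions for illustration; the interview does not say how the paper quantifies routing.

```python
from collections import deque

# Hypothetical toy graph: influences[x] lists the people x directly influenced.
influences = {
    "writer_1": ["router"],
    "writer_2": ["router"],
    "router": ["standup_1", "standup_2"],
    "writer_3": ["standup_2"],
    "standup_1": [],
    "standup_2": [],
}

def reachable(start, graph, removed=None):
    """Return the nodes reachable from `start`, never passing through `removed`."""
    if start == removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

def routed_pairs(node, sources, targets, graph):
    """Count (source, target) pairs whose every influence path passes through `node`."""
    count = 0
    for source in sources:
        with_node = reachable(source, graph)
        without_node = reachable(source, graph, removed=node)
        count += sum(1 for t in targets if t in with_node and t not in without_node)
    return count

writers = ["writer_1", "writer_2", "writer_3"]
standups = ["standup_1", "standup_2"]
print(routed_pairs("router", writers, standups, influences))  # 4
```

In this toy graph, four of the writer-to-stand-up connections exist only because of the middle node, which is the kind of position described for Woody Allen.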
The other interesting feature I ran into was just the lengths of some of these influence chains. So for instance, I observed about 20 really long chains of influence. And there were about, I think, 12 to 15 influences in each chain. And then, as you go back in time, starting with present day comedians like Hasan Minhaj or Michelle Wolf or Taylor Tomlinson, if you trace back the chains from comedians such as those, you slowly start hitting nodes that represent comedians of the American vaudeville era of the early 1900s to late 1800s, and then before that come the nodes that represent comic writers like James Joyce or Ken Jeong, and then you keep following through on those chains until you finally reach people like William Shakespeare in one instance, and then in another instance, Miguel de Cervantes, the creator of Don Quixote. So that's more than 400 years ago. So that's more than four centuries of influence carrying over from Miguel de Cervantes all the way to the 21st century comedians.
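
Tracing a chain back from a present-day comedian to a historical figure can be sketched as a breadth-first search over "influenced by" links. The Python example below uses placeholder names and a hypothetical `shortest_chain` helper; it finds a shortest chain, which is an assumption of this sketch, whereas the chains discussed in the interview are among the longest observed.

```python
from collections import deque

# Hypothetical placeholder data: influenced_by[x] lists x's direct influencers.
influenced_by = {
    "modern_standup": ["tv_comic", "vaudeville_act"],
    "tv_comic": ["vaudeville_act"],
    "vaudeville_act": ["comic_writer"],
    "comic_writer": ["playwright"],
    "playwright": [],
}

def shortest_chain(start, ancestor, influenced_by):
    """Return the shortest influence chain from `start` back to `ancestor`, or None."""
    parent = {start: None}  # records how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == ancestor:
            # Walk the parent pointers back to `start` to rebuild the chain.
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for influencer in influenced_by.get(node, []):
            if influencer not in parent:
                parent[influencer] = node
                queue.append(influencer)
    return None

chain = shortest_chain("modern_standup", "playwright", influenced_by)
print(" <- ".join(chain))
```

The printed chain lists the modern comedian first and the earliest influencer last, so each arrow reads as "was influenced by".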

John Bailer
So what's next for you? I mean, you know, you've looked at this kind of connection here, of comedians, and you mentioned some gaps that were in the Wikipedia study. And I think even in your article, you mentioned Lenny Bruce not being within this influence graph. Do you have any thoughts of backfilling information where you saw gaps, or are there next projects that would be associated with these types of investigations?

Sachin Date
So with Lenny Bruce, one of the things I noticed was that a few previous studies on scholarly influence in general, not necessarily artistic influences on comedians, but scholarly influence in general, did mention Lenny Bruce. Though Lenny Bruce didn't really appear to be one of the major influences over there, the moment you look at Lenny Bruce's influence in the context of comedy, he bubbles up to the top very quickly in terms of influence. The interesting thing about that is that there's simply, you know, not a whole lot of data available about some of these comedians, and in some cases, there's a lot of data available about others. So it's quite possible that Lenny's position in the influence structure is very heavily determined by simply the availability of data associated with the comedian. Now, in terms of future work, one of the things I'd like to do is to essentially look at the influence structures of individual comedians and comic actors. So I mentioned Woody Allen. Woody Allen turned out to be a router of influences from writers to stand up comedians. So I'd like to inspect the influence structures around other famous personalities in this space to see if they are also routing over influences in a particular manner, from their influencers to the people whom they influence. And then the other natural extension to this study is to go beyond the most popular contemporary American stand ups, which is what the focus of this study was, and then study all American stand ups, or maybe all comedians who have performed stand up of some kind all over the world, and then inspect the influence structures associated with that much more comprehensive set of comedians. One of those things I've already done: I have a paper out recently where I've extended this study to include basically all American stand ups, and then studied the influence structures on that body of comedians.
And one of the things I found was that a lot of the results of this paper in Significance actually carried through very nicely in that bigger body of American stand ups as well.

John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Sachin, thank you so much for joining us today.

Rosemary Pennington
Yeah. Thank you for being here.

Sachin Date
Thank you for having me.

John Bailer
Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism, and Film and the American Statistical Association. You can listen to us on Spotify, SoundCloud, Apple Podcasts, or other places you can find podcasts, and follow us on LinkedIn and Twitter. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.

Math and Music | Stats + Stories Episode 317 / February 29, 2024 by Stats Stories

Long after Harry Nilsson said, “one is the loneliest number,” and after Bob Seger sang about feeling like a number, music streaming services are using data to help us discover new music that connects to our frequent plays and preferences. Dr. Kobi Abayomi helps break that all down in this episode of Stats+Stories.

The Info Graphic Visionary | Stats + Stories Episode 315 / February 15, 2024 by Stats Stories

Good data visualization can catapult a news story or research article from ho hum to extraordinary. A new book series is exploring the careers of information graphic visionaries. And that's the focus of this episode of Stats+Stories with guest RJ Andrews.

Careers in Rom Coms | Stats + Stories Episode 264 / February 9, 2023 by Stats Stories

Romantic comedies are rife with plucky heroines: small bookstore owners being pushed out by big corporations, runaway brides, and perpetual bridesmaids. But where are the scientists, microbiologists, engineers, and statisticians? One researcher went looking for them, which is the focus of this episode of Stats+Stories with guest Veronica Carlan.

Introducing Our New Guest Host | A Special Stats + Stories Episode / June 16, 2022 by Stats Stories

In this special episode of Stats+Stories we announce our new guest host Dr. Regina Nuzzo, a professor at Gallaudet University and freelance science writer, who will be joining us for the next couple of months. We will also be looking back at some of our favorite interviews from the past 12 months from the likes of...

Predicting the Weather with Pietro the Weather Tortoise | Stats + Stories Episode 225 / March 31, 2022 by Stats Stories

Meteorologists go to school to be able to predict the weather accurately, but for some people, weather prediction is a hobby. Maybe they have a trick knee that hurts when it rains or perhaps they know when a storm is coming by how the birds at their feeders are behaving. Some lucky folks have pets that can help them figure out what the weather is going to do and that’s the focus of this episode of Stats and Stories with guest Connor Jackson.

The Best Friend on Friends | Stats + Stories Episode 220 / February 24, 2022 by Stats Stories

Since the 1990s, people have been trying to figure out who’s the best friend. Is it Chandler because of his dry wit? Phoebe because of her unabashed enthusiasm? Joey because of his loyalty? Well, leave it to statistics to give us a firm answer. Who’s the best friend from the show Friends is the focus of this episode of Stats and Stories with guest Mathias Basner.

A Not So Standard Podcast | Stats + Stories Episode 212 / December 16, 2021 by Stats Stories

Our lives are increasingly shaped by statistics and data. However, they remain concepts that can be difficult for broad audiences to understand. A number of outlets, including this one, have sprung up to help make them more accessible. Today another one, the “Not So Standard Deviations” podcast is the focus of this episode of Stats+Stories with guests Hilary Parker and Roger D. Peng.

#MemeMedianMode Contest Winner! | Stats + Stories Episode 200 / September 16, 2021 by Stats Stories

At Stats+Stories we're lucky to have listeners who put up with John's bad jokes and our general shenanigans. In fact, you've listened to 199 discussions of the statistics behind the stories and the stories behind the statistics. To mark our 100th episode we asked you to submit statistical headlines and a haiku won. For 200 we took to Twitter using the #MemeMedianMode hashtag and this time those that rose to the top were actually memes. Today we're talking to the creators of our top two.

Nynke Krol (@krol_nynke) is a statistician at Statistics Netherlands who submitted a stats meme that caused both John and Rosemary to actually laugh out loud when they saw her take on data normality.

Eric Daza (@ericjdaza) is a data scientist and statistician who focuses on digital health. He submitted several memes to our #MemeMedianMode contest, including one that made me flash back to my first graduate class in research methods, on causation/correlation.

The "Key" to a Successful Kickstarter | Stats + Stories Episode 197 / July 29, 2021 by Stats Stories

About 20 years ago, most people would have been unfamiliar with the term crowdfunding. Now, when it comes to the arts, you can crowdfund anything from comic books to Mystery Science Theater 3000 to musical compositions. What it takes to successfully crowdfund a rock project is the focus of this episode of Stats and Stories with guests Moinak Bhaduri, Dominique Haughton and Piaomu Liu.

The Statistics of the (Stay-at-Home) Year | Stats + Stories Episode 169 / December 31, 2020 by Stats Stories

This has been a year for numbers. COVID stats have been a collective obsession. Vote percentages surprising. Hours spent online ... unending. The Royal Statistical Society has run the numbers and has voted for its Stats of the Year. That’s the focus of this episode of Stats and Stories with guest Jennifer Rogers.

The Stats of the Decade | Stats + Stories Episode 120 / January 2, 2020 by Stats Stories

Iain Wilton directs the Royal Statistical Society’s policy, public affairs and external relations work. His team’s responsibilities include the production of the RSS member newsletter, Significance magazine and the RSS’s policy briefing papers for MPs and peers. Iain’s team also organises the All-Party Parliamentary Group on Statistics as well as the RSS Statistical Ambassador network and the annual Statistical Excellence Awards. Iain has a doctorate from Queen Mary, University of London and has previously worked for the BBC, the Cabinet Office and the University of Essex. He has also written a biography of the sportsman, writer and politician CB Fry.

What Do Seinfeld, The Tonight Show And Stats+Stories Have In Common? | Stats + Stories Episode 7 (REPOST) / December 5, 2019 by Stats Stories

Rick Ludwin was hired by NBC Entertainment in 1979 and made director of variety shows there in 1980. He then became vice president for specials and variety programs in 1983; senior VP for specials, variety programs and late-night in 1989; and executive VP for NBC’s late-night and prime time series in 2005. In its 57 years, The Tonight Show has had five permanent hosts, and Rick has been the boss of three of them. His late-night division at NBC developed the hit comedy Seinfeld. Rick, a 1970 Miami University grad, joined the Stats+Stories regulars to discuss the use and impact of ratings on television programming.

How Esports Stats are Tracked | Stats and Stories at JSM / August 7, 2019 by Stats Stories

Brian McDonald is currently the Director of Sports Analytics in the Stats & Information Group at ESPN. He was previously the Director of Hockey Analytics with the Florida Panthers Hockey Club, an Associate Professor in the Department of Mathematical Sciences at West Point, an Adjunct Professor in the Department of Management Science at the University of Miami, and an Adjunct Professor in Sports Analytics in the College of Business at Florida Atlantic University. He received a Bachelor of Science in Electrical Engineering from Lafayette College, Easton, PA, and a Master of Arts and a Ph.D. in Mathematics from Johns Hopkins University, Baltimore, MD.

Using the Stats to Improve Your League of Legends Game | Stats and Stories at JSM / August 6, 2019 by Stats Stories

Michael Schuckers is the Charles A. Dana Professor of Statistics at St. Lawrence University in Canton, NY. An applied statistician, he has received funding from the US National Science Foundation, the US Department of Defense and the US Department of Homeland Security. He is the author of over three dozen publications including Computational Methods for Biometric Authentication (Springer, 2010). Additionally, Schuckers has done work in sports analytics, particularly ice hockey, including consulting with an MLB team and an NHL team. For his work in this area, he was named an American Statistical Association Section on Statistics in Sports "Significant Contributor".

The History of Stats + Stories | Stats + Stories Episode 100 / June 13, 2019 by Stats Stories

We have reached episode 100 of Stats + Stories and therefore we felt like it would be a good time to have John Bailer, Richard Campbell and Rosemary Pennington sit around and talk about what all has brought us here and what more to expect in the future.

Stats + Stories

A Podcast About The Statistics Behind the Stories and the Stories Behind the Statistics

Hit Songs by the Numbers & What They Reveal About Us | Stats + Stories Episode 379 / January 8, 2026 by Stats Stories

Comedy, Art, and ... Statistics? | Stats + Stories: Episode 370 / August 28, 2025 by Stats Stories

Music Streaming Statistics | Stats + Stories Episode 354 / December 19, 2024 by Stats Stories

The Statistical Kings of Comedy | Stats + Stories Episode 348 / October 24, 2024 by Stats Stories
