Deciphering Dishonest Charts | Stats + Stories Episode 374 / by Stats Stories

Nathan Yau is the author of several books on data visualization, including Visualize This and Data Points. He also runs the FlowingData blog, where he works to make the process of creating data visualizations accessible to a wide audience. He recently published Defense Against Dishonest Charts on his blog, a guide to determining which visualizations to trust.

Episode Description

Data visualizations are everywhere, showing up in social media, in the news, and on company websites. With this onslaught, it can be hard to know what visualizations to trust. Learning how to navigate bad graphs and charts is a focus of this episode of Stats and Stories with guest Nathan Yau.

Timestamps

FlowingData's Origin and Development 2:32

Surprising Insights and Misleading Charts 7:55

Anatomy of a Chart and Common Misleading Techniques 12:25

Strategies for Reading Data and Interactive Charts 16:31

Feedback and Tools for Visualization 23:26


Full Transcript

Rosemary Pennington
Data visualizations are everywhere, showing up in social media, on the news and on company websites. With this onslaught, it can be hard to know what visualizations to trust. Deciphering dishonest charts is a focus of this episode of Stats+Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats+Stories is a production of the American Statistical Association in partnership with Miami University's departments of statistics and media, journalism and film. Joining me, as always, is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Nathan Yau. Yau is the author of two books on data visualization, Visualize This and Data Points. He also runs the FlowingData blog, where he works to make the process of creating data visualizations accessible to a wide audience. He recently published Defense Against Dishonest Charts, a guide to figuring out what visualizations to trust. Nathan, thank you so much for joining us today.

Nathan Yau
Thanks a lot for having me.

Rosemary Pennington
How did FlowingData get started?

Nathan Yau
It was actually a grad school project that I had to put online. And so I bought some hosting for, I don't know, $3 a month, and it came with a free domain. The class was called Data Flows, and dataflows was taken, so I took flowingdata.com. And then from there, it just sort of became a site where I documented things that I learned and thought were interesting, and I just continued it through school. I started posting my own projects as I learned more, and it just kind of grew from there. And then after I graduated, it just kept going. And so now it's kind of what I do.

John Bailer
That's a pretty good gig, to start that in grad school and see the seed grow into this incredible product. Congratulations on that.

Nathan Yau
Thanks. Thanks.

John Bailer
Yeah, you know, I first encountered some of your work not through the website but through your first book, Visualize This. When I was developing a data viz class, your structure and organization proved to be really quite useful, and there are probably students that now curse both you and me because you were assigned reading in this class. So I'm curious what inspired you to write this first book.

Nathan Yau
It stemmed a lot from the blog, because I would make a lot of projects. I still publish a lot of projects on FlowingData. And if one becomes popular, then people wonder how I made it. And so I would have to explain to a lot of people how I made it. I would write tutorials sometimes, but then, as it kept going, with more projects and just more inquiries, the book became sort of a natural extension of the site. And as with the site, I try to make it less technical and try to make it fun and approachable to people who don't necessarily always work with data. And so Visualize This was my attempt to make visualization less stuffy and, I guess, to find a way where people could chart maybe the data in their own lives, and see how the charts are a representation of, you know, what they see in the world, and then kind of explore it from there and answer their own questions.

John Bailer
So one of the things that struck me was that you had a pretty interesting internship when you were a student. So you spent some time in the world of journalism.

Nathan Yau
Yeah, I mean, that was at the New York Times. I was just getting started in visualization, because I had told my advisor, Mark Hansen, that I wanted to go into visualization, but I didn't really know in what regard. And at that point, it was just, I like making charts and publishing them, mostly for reports and analysis, kind of like the John Tukey area, where it's more exploratory data analysis. But then I went to the New York Times, and that completely changed my perspective on what visualization can be used for and who you are communicating to, because you have millions of people instead of, you know, yourself or a handful of people. And so it was about explaining the data in less technical ways. And I mean, that's sort of the basis or core of what FlowingData is.

Rosemary Pennington
Now, how do you decide what you're going to write about for FlowingData?

Nathan Yau
It's very much about my own curiosity. So if you go back to the very beginning, the beginning projects are very much my own explorations of what visualization is, because I don't know what I'm doing. Like, I very clearly don't know what I'm doing, which is a good thing at times, because you don't know what you're restricted by. So you just keep doing whatever you want and just disregarding all the rules. But more recently, it's very much about what's going on in my life. And so you end up with things like relationships, aging, having children, when people have children. And, you know, I'm pretty squarely in middle age now, and so I think about aging a lot, just because my body tells me so.

John Bailer
You know, you also pick a lot of really important data sets. I think things like the American Time Use Survey, that's one that seems to be a real favorite of yours in a lot...

Nathan Yau
It is, yeah, that is my favorite. It is my favorite data set, just because the data is so granular that you can look at one person and you can see what they did during that day. And it's been going on since, I don't remember the beginning year, but it's been going for a while. So you can see, you know, what people were doing then, what they're doing now, and then bring it all together to see what the aggregate is. And it gets really, really detailed, because it's just a journal of people entering things, and then those activities are encoded by someone else for a more analytical approach. But it's fun to look at the anecdotes.

John Bailer
So as you've been doing these explorations as part of FlowingData, what has been the most surprising insight that you've gained from one of these visualizations?

Nathan Yau
Let's see. FlowingData is 17, going on 18 years old now, and so there have been a lot of charts. But the one that jumps out in my mind is when I looked at the how people meet and stay together data set from Stanford University research. I don't remember the department, but they surveyed people on how they meet and how they stay together, and sort of look at their timeline of when they start dating, when they become romantic, when they get engaged, and when they get married and live together. And so I plotted every single one. It's sort of similar to the time use thing, where you could see the individual's timeline. And if you plot all of them, it's sort of, everyone has kind of the standard, more traditional timeline, where, you know, you start dating in your early 20s, and then a few years later, you get married, you live together. But then there's this one outlier that you can see because of the way that I made it, because of, like, the little dots that are moving around, individuals getting together. And so they met 40 years prior, and then they stayed friends for a very, very long time. And then something happens where they become romantic, and then they eventually live together, and then, you know, 40 years later, they finally get married. And so I think that's a very beautiful thing to see in a data set. And I had presented that in a talk, and then this couple came up to me there, and they had a similar timeline, and they were so thrilled that they saw themselves in the data. They came up to me and they were like, that was us, we were those people. So I really like visualizing data in a way that individuals can see themselves in some way. And I mean, the talk was very specifically about seeing yourself in the data, and they came up and saw themselves in the data. And so I felt very satisfied in what it showed.

John Bailer
On FlowingData, you have several of these guides, and one of them is the one that I mentioned in the open to the show, Defense Against Dishonest Charts. And I'm just wondering what propelled you to create that?

Nathan Yau
I guess there are a lot of things happening right now that, as statisticians, we're seeing, you know, data being taken down and charts showing maybe the truth and maybe not the truth. And so it's been going on for a long time, but I think in these recent times, and especially going forward, there are going to be more misleading charts where it's harder to decipher if they are real or not. And that's partly from maybe generative AI type things, and maybe it's from people or organizations that have agendas that they want to push. And I think, as a whole, it's in everyone's best interest to be able to, you know, at least read a chart and figure out what is wrong with it. And in my experience, a lot of people will say a chart is misleading because maybe they don't agree with it or it doesn't match their expectations. But technically, the chart is fine. It's just that the person or the group who made it is showing a different angle, and that's, you know, part of the fun part of data, where you can see many angles. But it can also be used for bad things. And so I tried to make a guide that was very visual, very interactive, and that made it very easy to see how small changes can make a big difference in what the data shows.

John Bailer
You know, before we start diving into some of these misleading ways that people can deal with charts, could you give us a structure here? You do that beautifully in the blog post, about what is the anatomy of a chart, particularly these ideas of encoding and scaling.

Nathan Yau
Yeah, I mean, I guess to understand visualization, people kind of look at the chart as a whole, because in Excel, you just have a spreadsheet, and then it outputs a chart. But I think it's very important to understand the components of it. There are various ways to break it down, but the way that I go about it, for as big an audience as I can get who's going to pay attention to, you know, something about dishonest charts, is I break it up into visual encodings, the things that represent the data, like shapes and lines and geometries, angles. And then I go into the scales, which dictate how those encodings represent the numbers. Like, is it big? Is it small? And those are just the main components that you put together to form different types of charts. Like in a standard bar chart, the visual encoding is going to be length for the bars. The longer the bar, the higher the value. And then there are the scales that dictate what that length of the bar represents. And so once we go into, you know, what those things mean and how things can change, I try to show very generally how those small changes can shift the point of view. And then from there, we go into the different, very specific ways that people shift those things to show the data in the way that they want.
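
The split between an encoding (bar length) and a scale (how a number becomes a length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the guide: the `linear_scale` helper, the two values, and the axis ranges are all hypothetical.

```python
def linear_scale(domain, range_):
    """Return a function that maps a data value in `domain` to a bar length in `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Two made-up values that differ by 4 percent (hypothetical, for illustration only).
values = {"A": 50.0, "B": 52.0}

# Same encoding (bar length in pixels), two different scales.
zero_based = linear_scale(domain=(0, 60), range_=(0, 300))   # axis starts at 0
truncated = linear_scale(domain=(49, 53), range_=(0, 300))   # axis starts at 49


def length_ratio(scale):
    """How many times longer bar B is drawn than bar A under a given scale."""
    return scale(values["B"]) / scale(values["A"])


zero_based_ratio = length_ratio(zero_based)
truncated_ratio = length_ratio(truncated)

print(f"zero-based axis: B is drawn {zero_based_ratio:.2f}x as long as A")
print(f"truncated axis:  B is drawn {truncated_ratio:.2f}x as long as A")
```

The data never changes between the two calls. Only the scale's domain does, which is enough to turn a 4 percent difference into a bar drawn three times as long.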

Rosemary Pennington
You're listening to Stats+Stories, and we're talking with Nathan Yau of FlowingData.

John Bailer
Well, there's certainly a large set of options for being misleading with graphics. And, you know, we want to make sure that people don't think of this as a how-to guide. This is self-defense. So as you go through this, you start with the Damper, and it goes all the way to the Faker in the list of these. What is your favorite, in the sense of most frequently detected, of these misleading variations?

Nathan Yau
Yeah, I have one called the storyteller, which is where you have a single data set, but depending on what section you look at, how much of it you look at, and what scale you look at, it is going to tell very, very different stories. In this example, I use the percentage of NBA shots that are three-pointers over the years. And so if you zoom into, you know, a very small section, you can say that three-pointers are down. But if you keep going, you can see that three-pointers are up, and they only went down because the three-point line was moved, and people just changed their behaviors, but then they adjusted, and then it kind of went up again. And so we see this in, you know, non-basketball contexts all the time, where people have the story first, and then they just kind of, you know, poke at the data to try to get that story. And that's kind of the root of a lot of misleading charts that happen.
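
The storyteller effect can be sketched as choosing a window over one series. The numbers below are made up for illustration and are not real NBA figures; `change_over` is a hypothetical helper, and the years are arbitrary.

```python
# Hypothetical yearly percentages (NOT real NBA data): a long rise with a short dip.
series = {
    2000: 10.0,
    2001: 12.0,
    2002: 14.0,
    2003: 13.0,
    2004: 12.5,
    2005: 15.0,
    2006: 18.0,
    2007: 21.0,
}


def change_over(data, start, end):
    """Change in the value between the first and last year of the chosen window."""
    return data[end] - data[start]


# Zoomed in on the dip, the story is "down".
zoomed_in = change_over(series, 2002, 2004)

# Zoomed out to the full range, the story is "up".
zoomed_out = change_over(series, 2000, 2007)

print(f"2002 to 2004: {zoomed_in:+.1f} points")
print(f"2000 to 2007: {zoomed_out:+.1f} points")
```

Both numbers are computed honestly from the same data. The only choice that differs is the window, which is why checking the full range is a useful first defense.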

Rosemary Pennington
A lot of these charts on this guide are interactive in nature. And I am just sort of wondering, how did you decide what was going to be interactive, and what are you hoping people get out of playing with these charts?

Nathan Yau
Yeah, I guess, because many years ago I had made a guide about misleading charts, and it was static. And since then, I've read other guides that are about misleading charts, and while they're to the point and they explain what it is, it's very hard to see it. I felt like interaction was the key component that was missing in these guides, just because, you know, if you're saying the scale has to be shrunk, and then has to be grown, and then increase this, increase that, things get bigger and smaller, and you don't actually see it, then it's very hard to imagine. So I made it interactive, and I tried to make it an easy interaction, where it was consistent across all the charts and all the types, so that you could see the changes right away.

John Bailer
You know, as you were talking about the storyteller, that is kind of tied to my favorite. If I had my top four in your list, I mean, that storyteller tied to the cherry picker a lot in terms of filtering the data points you show. But I also thought the time gap, base stealer and overbinner are three of the others that kind of surfaced for me, beyond that storyteller and cherry picker. Can you talk about one or two of those and how they play out?

Nathan Yau
Yeah, so the time gap one is sort of my favorite one, because I use baby name data, which might be my second favorite data set behind the time use, just because it's so consistent. And everyone can relate to that right away, because they can look at their own name. I looked at my own name, to show everyone how Nathan has become, you know, a really great name, how it has become super popular. Everybody wants to be Nathan. So it shows counts for 1960, 1970 and 2010, and it shows a very steep increase between 1970 and 2010. But it's missing, you know, all the years between 1960 and 1970, you know, several decades. So as we zoom out and we fill the gaps in time, we can see that, you know, Nathan has increased, but maybe it's also on its way down. And again, I think it's a variant of the cherry picker, in which people are just picking points and then seeing what fills their narrative and just going from there.
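
One way to check for a time gap is to look at the spacing between the years a chart actually shows. The sketch below uses the three years mentioned above; the helper names are hypothetical, and no name counts are included because none are given here.

```python
def year_gaps(years):
    """Return the number of years between each pair of consecutive plotted years."""
    ordered = sorted(years)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def has_uneven_spacing(years):
    """True when the plotted years are not evenly spaced, a hint that years were skipped."""
    return len(set(year_gaps(years))) > 1


# The years shown in the initial view of the chart, as described in the conversation.
shown_years = [1960, 1970, 2010]

gaps = year_gaps(shown_years)
uneven = has_uneven_spacing(shown_years)

print(f"gaps between plotted years: {gaps}")
print(f"uneven spacing: {uneven}")
```

Uneven spacing does not prove a chart is dishonest, but it is a cue to ask what the skipped years would show if they were filled in.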

John Bailer
So one of the things you do within this piece is that you suggest some strategies for reading data. You know, okay, now that you've seen some of these things, here's what I suggest you do. Can you talk through some of those with us?

Nathan Yau
Sure. So, in case it's not obvious, it's very Harry Potter based, because my kids are very into Harry Potter, and I was trying to figure out the framing of how I would talk about misleading charts. I didn't want to just talk about, you know, charts that lie. I wanted to talk about, like, if you see a chart, then what do you do? And so it occurred to me that there were spells and counter spells in Harry Potter. And so in this Defense Against Dishonest Charts, there's something that's misleading, and then there's a counter chart that you can use to kind of explain it. So for example, the cherry picker is a chart that shows the percentage of people who wear glasses given your age. I recently had to get glasses. I had perfect vision for a very long time, and was very proud of that. And then I went to the optometrist, and they told me that in about a year, I would not be able to read closely and I would have to get glasses. So I was looking at that data set a lot. But if you look at the cherry picker in the initial view, it's like 59- to 60-year-olds. It shows a very gentle decline, where fewer 60-year-olds wear glasses than 59-year-olds, where you would expect more people to wear glasses as they get older. So the counter chart to that is to point out that there's a, you know, wider age range than one year, between 59 and 60. So you use the slider, and it shows the full percentage range from age 18 to 85, where you can see, of course, there's a sudden spike in the percentage of people who have to wear glasses, which happens to be around my age. And then it kind of levels out around, you know, the 60s, into the 70s. And if you're wearing glasses, then by, you know, 60, you're probably going to be, I mean, you're going to wear glasses. And then a lot of people, I guess, don't wear glasses. I don't know how that goes. I'll tell you when I get to that age.

Rosemary Pennington
So Visualize This is in its second edition, and you have Data Points. Do you have plans for other books that might be based on the work you've been doing for FlowingData?

Nathan Yau
Not right now. Some people have suggested that I turn something like dishonest charts into a book, which could be a possibility. But for me, most of my work is online. I write a lot of guides and courses online through FlowingData, and so I try to focus on that for now.

John Bailer
Yeah, as I was thinking about that idea you just described, of expanding on this into a book, I think the interactivity and the sliders for exploring the concepts are such a powerful technique for letting the person who's encountering the information interact with it directly. So, you know, as someone who does like books and who's been involved in writing them, I still think that this type of direct connection and manipulation seems to be powerful. What do you think about that in terms of static versus interactive displays when you're working through problems?

Nathan Yau
I always try to go as basic as I can first, and then if it needs interaction as I'm making the basic view, I try to go more advanced. So with this, again, I had made a bunch of static charts at first. I've done various guides where you have a single data set, and I've made many, many charts based on a single data set to show that it has various angles, but I felt like it was missing the interaction. Because when you have the static, it's a view, and then there's a break, like a visual break, and there's another view, but you can't see the transition of how it got there. So with this guide, the interaction was very important to see how you got there. And I've also tried it with animation, which is another way to get from point A to point B. But with animation, a lot of the time people are going to, I guess, be satisfied with what you show them. You know, the animation plays, and they just watch it go, and they might miss something. With the interaction, for me, it's fun to just, like, shift it back and forth and swing it around. And so if you have someone playing with it, then they kind of have a deeper understanding of how it happens.

John Bailer
You know, when we were starting this conversation, one of the things you said about people misreading charts was about your prior beliefs shaping how you consume a graphic. What kind of ideas do you have about trying to address that? You know, if you know that there's sort of this visceral response from someone who may not agree with the conclusions, what are ways that you could anticipate and possibly integrate that into a display?

Nathan Yau
Okay, I'll start with the default approach, which a lot of people like to do. They're worried that they're going to offend a lot of people, or that they're going to show an angle that a lot of people don't agree with, but they know that it's true, so they will compensate by showing as many angles as possible, all the angles. It could be this, it could be that, and they annotate as much as possible, and they go overboard with showing everything, instead of focusing on what they actually wanted to show. And by trying to show everything, they end up showing nothing, because it just ends up as a garble of charts. And so my approach is usually to focus on the question that I'm trying to answer, and I try to make a chart that specifically answers that single question the best that I can. People might not agree with that answer, but I know that what I'm showing is, to the best of my ability, honest to the data. And there are always different ways to see that data, and I always point people to the data source, so that if they disagree or they think it's a different way, they're always able to go to that data set. And I think the weird thing is that most people don't go to the data set. Like, the people who strongly disagree don't look at it themselves. They just want to disagree. And so with FlowingData, just because of, like, a very general audience, there are always going to be people who very vehemently disagree with what you have, no matter how innocent. Like, I'm talking about glasses or something like that. Some people like to disagree about things, and sometimes you just have to ignore it, and other times you can take it in and think of a different angle that you can talk about later, and it becomes kind of a continuous narrative of how you're looking at a data set.

Rosemary Pennington
That's all the time we have for this episode of Stats+Stories. Nathan, thank you so much for being here today.

Nathan Yau
Thanks, thanks a lot.

Rosemary Pennington
Stats+Stories is a partnership between the American Statistical Association and Miami University's departments of statistics and media, journalism and film. You can follow us on Spotify, Apple Podcasts or other places where you find podcasts. If you'd like to share your thoughts on the program, send your email to statsandstories@amstat.org or check us out at statsandstories.net, and be sure to listen for future editions of Stats+Stories, where we discuss the statistics behind the stories and the stories behind the statistics.