Amanda Makulec is the Senior Data Visualization Lead at Excella and holds a Master of Public Health from the Boston University School of Public Health. She worked with data in global health programs for eight years before joining Excella, where she leads teams and develops user-centered data visualization products for federal, non-profit, and private sector clients. Amanda volunteers as the Operations Director for the Data Visualization Society and is a co-organizer for Data Visualization DC. Find her on Twitter at @abmakulec.
Full Transcript
John Bailer: Social distancing, frequent handwashing, remote work and shelter in place are new norms during the COVID-19 pandemic. Ideas that may be ghosts of algebra class past, such as exponential growth and doubling times of confirmed cases, are now subjects of daily discussion. One common feature of daily reports is charts and graphs designed to convey information about this worldwide epidemic and to explain proposed control measures. Understanding and evaluating coronavirus charts is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm John Bailer. Stats and Stories is a production of Miami University’s Departments of Statistics, and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist Richard Campbell, former chair of Media, Journalism, and Film. Rosemary Pennington is away today. Our guest on our first-ever Stats and Stories home edition is Amanda Makulec. Makulec is Senior Data Visualization Lead at Excella and co-organizer of Data Visualization DC. Thank you so much for being here today, Amanda.
Amanda Makulec: Thanks, John.
Bailer: Amanda, I thought your Fast Company article “A complete guide to coronavirus charts: Be informed not terrified” was great. What inspired you to write this article?
Makulec: So, the original article was actually written for data-visualization practitioners and framed quite differently. We saw early on in the coronavirus epidemic that there was a lot of information and data being made publicly available about the new cases that were emerging every day, and in some cases being tracked on an hour-by-hour basis. That was in part because of the work that Johns Hopkins University did to create an early COVID-19 tracking dashboard and make the underlying data publicly accessible through their GitHub repository. Because that data was so available, we saw a lot of people from the data science and data visualization fields jumping in and wanting to analyze and build their own charts and graphs of that data and information. And exploring data can be really valuable for your own understanding, but it's challenging in this kind of global pandemic environment because there's so much subject matter expertise from public health and biostatistics and epidemiology that we really need to understand to actually wrap some context around that data. There's a ton of uncertainty in those numbers. There are calculations that you can run, like case fatality rates in the early days, but really should you, when we have uncertain denominators or uncertain numbers of cases depending on how many people are being tested? And so the original article actually started as, and was published as, “10 Considerations before you publish another chart about COVID-19,” designed for the data visualization community, and was then later adapted into the post that you saw in Fast Company, which reframed some of those same points and the same key learnings, but for a broader audience who are really consuming all of the charts and graphs that are saturating the media right now.
Richard Campbell: Amanda, you were talking there about uncertainty, and one of the things that we know we're all struggling with is this: a lot of us want more certainty at a time when it’s kind of early on in this contagion. A lot of journalists are asking questions that kind of jump out ahead of everything. When are we going to know this? When are we going to know that? Do you have any advice for journalists in wrestling with the whole question of uncertainty? I've learned over time that this is something that statisticians wrestle with all the time. Journalists seem to be a little less patient about this, and we really need that kind of patience now, I think.
Makulec: I think that patience is really important because of the danger of misinformation, right? So as we look at a lot of uncertain numbers, things, again, that are tempered by how many people are being tested and what we know, I think that we have to be really cautious and careful about how we frame information to the public. The thing that jumps right out to me there is what kind of headlines we write about data and information being put into the public domain. So, a quick example that is top of mind right now, for me, is there was a headline in the New York Times around the large share of young people; I think they said 40 percent of hospitalizations are among young people; they cited a CDC report. But when you go back to the CDC report and you look at the actual numbers of cases that were analyzed and then segment down to the number of cases that had relevant data, I'm going to not get the numbers precisely right here, but the ballpark was around 4500 total cases, around 2400 cases with age-related data available, so we're already cutting this down significantly, right? Like we’re already chopping that number almost in half, and then further segmenting down, I think 1500-ish cases were missing hospitalization data, so we're now down to 508 cases that have available data about hospitalization status and information about age. Now, these CDC reports are typically designed for a technical audience tracking updates on a disease, right? I'm very empathetic to the CDC teams. We don't usually have this wide public media attention looking for hour-by-hour updates about the data and information that we know. And so, for journalists, I would say a big word of caution around how you frame and share key takeaways. Nowhere in that entire CDC report did they say the key takeaway was something like 40% of hospitalizations are young people.
In the CDC report, their key takeaways were critical actions people could take from a public health perspective. So, I would ask, can we try to go ahead and amplify those kinds of key takeaways that are being shared by the key agencies responding to this, whether it's the CDC or more local, state, and even county public health departments, rather than looking for and digging for those statistics that make flashy clickbait headlines?
Bailer: You made the point that complex metrics, even when they are methodologically simple, can be a really tough thing to discuss. So, you talked about case data being deceptively complex. Can you talk a little bit about that complexity and what…?
Makulec: Sure, so as a person who has a background both in public health and data, when we look at how we actually track information about an emerging epidemic, the data that we have available about cases is the best we know, right? When you start to look at the modifying words on cases, even though we may refer to it that way, you're looking at things like confirmed cases. And those confirmed cases could be cases that have been laboratory-confirmed by a CDC laboratory. At a lower level, those confirmed cases could include cases that were usually called presumptive positive cases in the CDC reporting, which are cases that were confirmed in a local laboratory, but not validated at a CDC laboratory. You can start to see the complexity as you break down all these different kinds of cases that we have out there. We have the added challenge that without robust widespread testing, we don't have a really clear denominator on how many cases we even have in the US or some other countries. A country that's done a lot more widespread testing, like South Korea, has done a lot more work to actually capture a more accurate denominator and actually have a better understanding of the share of their population who were infected. And so there are a lot of challenges in how we quantify that case information, and we're seeing that now with the public health recommendations being put out there, that if you are not going to have a materially different experience, in terms of the treatment from the medical system, you may not qualify to be tested, even if you show a lot of the symptoms of COVID-19. So, it's going to be hard to actually ever have- until we have some way of quantifying who was infected previously through other kinds of tests, it'll be really difficult to have a really accurate denominator for what the total case volume was.
Campbell: Amanda, you talked in your article that John cited at the beginning from Fast Company, about the story or the person behind each data point. This is one of the considerations that you talked about in that article. Could you talk a little bit more about that?
Makulec: I think that there are two things to consider when we think about the people behind those data and those numbers. And I think that's becoming much more personal to us as we see this epidemic unfold here in the US. I think on one side, we have to think about all the people who are part of these high-risk groups and think about the language that we use as we create our data visualizations, and the text and the words of the content that wraps around them, so that it recognizes that people who are in high-risk groups are people too, and how would they feel reading about or seeing the kind of content we're putting out? Modifying words like "only the blank group are at high risk for this disease," I think, are really othering and don't recognize that this is a shared burden that we all face in terms of how we respond and the actions we each can take to make a difference. I think, also, it's really important to try to better help people understand how the actions of any one individual person can have a huge ripple effect with a disease as infectious as COVID-19, especially when you can be asymptomatic, not know that you’re sick, and still be spreading the virus, which is one of our big challenges, right, with COVID-19: you may not be intentionally spreading the virus, but the choices you're making about going about your everyday life, or going to large mass gatherings when they were still happening, can have a material impact.
And one story, I think, that does an exceptional job of trying to put that person at the center, and individual choices at the center of the COVID-19 story and the data, is a story from Reuters called The Korean Clusters that actually walks through and visualizes the contact tracing that the team in South Korea did to identify all the contacts related to each individual infected patient, or each individual infected person, and found that there was one single patient, patient number 31, who was responsible for spreading the virus in such a way that it was at the root of the single biggest cluster, around one of the churches in South Korea. So that one decision to go to church that one day, and then eat at a hotel buffet later, made that person the index patient, or index case, for a whole cluster of other people getting sick. And I don't think- I don't think that any of the information in that story about patient 31 should be used to point fingers at or try to accuse someone of spreading a disease. I think that's really important as we think about contact tracing and what these index cases look like. I think it's really important to remember that most people are probably doing this, not out of any kind of malice, but out of just not realizing what impact they're having. And so with those kinds of stories and numbers, as you scroll down that Reuters page and you see this case cluster of individual people all clustered together get bigger and bigger and bigger and fill your entire browser, you can't help but see that visualization and think, wow, there really are ripple effects to the choices each of us are making right now.
Bailer: Can you talk a little bit about why that can be such a complex statistic, and maybe a comparison that one should do with caution?
Makulec: So, let's go ahead and unpack those two things a little bit differently. One around diseases, I think, and one around country comparisons and populations, right? So, one of the infographics that we saw go really viral and get adopted and shared widely was the comparison that looked at the total deaths per day for different diseases. Things like malaria, TB, things endemic in different countries and around the world at this point, compared to COVID-19. And when you have such an early-stage epidemic, trying to compare a kind of daily death count is functionally meaningless, and might really understate the severity of the impact that this disease could have. And so, my nervousness is actually even less with the math, and more with what someone who doesn't understand that complexity might see in the graphic. So when you see COVID-19 down there called out at the very bottom, so small, relative to TB and malaria and other things that we don't feel like we're shutting down our society for on a day-to-day basis, it might cause you to think that it doesn't matter if you take some of the public health actions that are being recommended. And instead, we have to think about all the other parts of that disease: the health systems we have that help you treat and manage those diseases, the treatments, the vaccines, the knowledge that we have about the virus or the bacteria that causes these different diseases. All of those things influence what fatalities might look like, or what the deaths might look like. And so, as I look at charts and graphs, I think about, to your point that we talked about earlier, how would someone who's not as ingrained in the data and public health space actually see this information? And what would it cause them to do? So, making those kinds of comparisons between diseases, I think, is really challenging when we have such limited data and information right now.
And there's a lag in terms of the deaths, right? Someone is going to have to go through a sequence of steps and care before they die from COVID-19, so it's a lagging indicator for us when it comes to how we look at this epidemic. I think your second question, about how do we go ahead and benchmark countries against each other, or not, is important because it speaks to the complexity of health systems. As we look at the data that comes out from different countries, one of the best trackers I have seen for comparing countries against each other is John Burn-Murdoch's daily charts that he's doing for the Financial Times, which thankfully are all removed from the paywall at this point, which makes me really happy. I really appreciate the ways in which different media outlets are making content more accessible. What John does really well is that he goes ahead and plots new points for the new cases, and now the new deaths, per day for different countries and looks at their trajectory. He normalizes where they are in the epidemic by aligning them on the days since the 100th case so that we're not comparing China… And what was that?
Makulec: Yeah, and we’re not comparing China, where their epidemic started so much earlier, to the US, where we're starting later, and we can better look at what trajectories we’re on. But I think most importantly, he adds an annotation layer to the actual charts and graphs themselves that helps you understand what actions were taken in different countries. And those text annotations, which he updates, looks for feedback on, and takes input on, help people make sense of what that data means. Why is it that South Korea kind of flattened out? What does it mean when the US is spiking up higher? And at what point do we need to look more granularly than a country? The US is a broad and diverse country and we really have to start looking at what's happening in our individual states and cities to get a better perspective on what's happening within this epidemic.
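The alignment Makulec describes, counting days since each country's 100th case rather than calendar days, can be sketched in a few lines of Python. This is only an illustration of the idea, not the Financial Times code: the function name `align_to_threshold`, the country labels, and the case counts are all made up for the example.

```python
from typing import Dict, List


def align_to_threshold(cumulative: List[int], threshold: int = 100) -> List[int]:
    """Return the series starting at the first day the cumulative count
    reaches the threshold, so index 0 means "day 0 since the 100th case"."""
    for day, count in enumerate(cumulative):
        if count >= threshold:
            return cumulative[day:]
    return []  # the threshold was never reached


# Made-up cumulative case counts by calendar day (illustration only).
cases_by_country: Dict[str, List[int]] = {
    "Country A": [20, 60, 110, 230, 480, 900],
    "Country B": [0, 0, 5, 40, 120, 260],
}

aligned = {
    name: align_to_threshold(series)
    for name, series in cases_by_country.items()
}

for name, series in aligned.items():
    # Every series now starts from the same point in its own epidemic.
    print(name, series)
```

Plotting the aligned series on a shared "days since 100th case" axis is what lets a country whose outbreak started later be compared with one that started earlier.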
Bailer: You're listening to Stats and Stories, and today we're talking with Excella’s Senior Data Visualization Lead, Amanda Makulec. Amanda, you wrote about how visualizations can impact, and in fact, encourage social responsibility. Can you describe an example or two of this with COVID-19?
Makulec: So, one of the most impactful visualizations from COVID-19 seems to be around flattening the curve, and this idea of how do we go ahead and slow the spread of disease so that we don't overwhelm our health system capacity? That looks like moving a very high spike on a curve to something that happens over a longer period but is slower to emerge. I saw a lot of different static graphs and charts that were trying to make sense of and communicate that idea, most of which were really information graphics that were based in kind of a qualitative idea, not quantitative data, which I think is important. It was plotting an idea. The one that really took off, and that I saw get re-shared by people of all different backgrounds and all different areas of expertise, was Harry Stevens’s Washington Post animated explainer about how COVID-19 spreads, and why flattening the curve is so important, and why social distancing is so important. Instead of going ahead and trying to communicate something in a static graphic or an animated GIF, you can see that he actually walks through the stages of showing us what happens in a simulated town, and how many different people could get infected, based on these little moving bubbles on a screen and little moving dots. And he plots the curve depending on different actions taken and different social distancing done, to show how our individual actions can help flatten that curve out. And as I understand it, it is the most viewed piece of journalism on the Washington Post website of all time, which should say something about the ways in which data visualization, even conceptual visualizations, are making concepts and ideas more accessible to people, and helping them understand the role that they play in helping to slow the spread of COVID-19.
Campbell: How hard is it to do those kinds of visualizations, those moving graphics that are so stunning? I mean, I know exactly what you're talking about and it was amazing to watch it. How long does it take? What goes into that? How many resources do you need? And what- we need more of that…
Makulec: So, I think that that's a challenging question because it depends on the scope of the project, right? I think that one of the great opportunities we have is that there are so many open-source libraries and platforms that enable us to do more animations. I mean, even platforms like Tableau have released animation features that allow you to see how dots and how marks move in space. And so, some of those newer products are making this animation feature more accessible to non-coders. I think to develop a lot of the really stunning animated graphics and visualizations we see in, like, the New York Times, in their recent visualization around how COVID-19 is spread, or with the explainer that was produced by Harry, you have to have more in-depth knowledge and a stronger background in more of the code-based data visualization platforms, using tools like D3 and other platforms, to build those kinds of concepts. But the thing I would reinforce is it's not just about knowing how to use the technology. It's about consulting with experts in the field and the space. It's about the person that has that knowledge about how to create that stunning animated visualization partnering their expertise with some public health experts, who understand what those key messages are that we need to get out. And the more that we find ways to collaborate across our different fields and different sectors, I think the more we can make sure that we make information accessible and available through visualization to a broad range of people and audiences.
Bailer: Oh, amen, that’s great. I love the call for this kind of collaborative teamwork. You know, I'd like to follow up on one of the things that you mentioned in your blog post on small design choices, and how much they can impact how you interpret a visualization. I thought that was really cool, and I was wondering if you could just talk a little bit about that with some examples.
Makulec: So, I think the challenge that we face as data-viz developers is that there are so many tools and products that promise to make choices for us, nowadays. There are show-me features and recommendation engines and things, but those small design choices are really what change how someone sees the information we present. So, I think the simplest and kind of biggest example I've seen with COVID-19 is a real overuse of the color red: the color red representing cases, and the ways in which we see red and immediately it causes a sense of something bad happening, panic, fear. Now, I'm not saying that cases of COVID-19 are good, but the way that we interact with and see information is really important, because when we see a lot of red, and we see big red bubbles on a graph that look almost like targets, I mean, that just creates a certain visceral response within us that maybe isn’t the healthiest for being able to objectively interact with and view a visualization. So, I think one of the examples I called out in the article was on maps, for example looking at using a red palette versus a blue palette on a map, and how we're able to focus our attention differently when we're not distracted by this blood-red color that's glaring out at us on the map. I think we also have to think about accessibility, right? Using reds and greens together that don't have enough contrast can make it challenging for someone who is colorblind to actually see what we’re showing in our charts and graphs. And I think that using really purposeful text, and I keep coming back to this, is so important. There are many people who maybe don't quite understand how to read a chart or graph, but they're going to read the words that you use around it.
So be specific in the kinds of information you present and the text that you wrap around it, and add explanatory context and detail. For example, there's a great set of responsible charts and graphs from Datawrapper, out of Germany, that is doing a great job of taking the information out there and creating very usable, intuitive charts and graphs about this epidemic, or this pandemic. And one of the things they do really well is they actually have a reference line on their trend charts that shows, back in February, when China changed their definition of a confirmed case. So, on that line chart, or area chart, you suddenly see in mid-February this spike in new confirmed cases. That would look alarming, right, for anyone looking at a chart or graph of new disease cases that are emerging, but when you then see this reference line that says China changed how they count cases, that gives you some context, because you then say, okay, something changed here. I might not know the details from this reference line, but now I know what to go look for, and I can go find more information, and I can be a good consumer of this chart and go ask, well, what changed? And what changed was they started counting both laboratory-confirmed and clinically diagnosed cases. So now you're including a lot more cases than you had before, and so that explains that spike. So I think the onus is on us as visualization designers to look at what we're plotting, look at the anomalies or the things that might look scary to somebody in this context, and go probe and find out if there is a reason why something happened, and then make that information that we find also more accessible to others through our charts and graphs.
Campbell: Amanda, what about when you're reading, just being a news consumer? I know one of the things I'm always doing, as somebody who's been a journalist and has taught journalism for many years, is asking students and thinking, what's missing from this article? What aren’t they asking? When you look at news, do you think like a data designer, looking at how this story could be improved by data? And also, more generally, what could journalists be doing better right now?
Makulec: So, there's such a myopic focus, I feel like, and maybe it's just in what gets shared and amplified, on counting cases, such a narrow focus on counting cases, and how many cases, and what new cases… And watching that happen is challenging because, A, there's so much uncertainty and nuance in that data, as we talked about, but B, as a general consumer of information in the public, what am I going to do that's materially different when I see that there were five new cases yesterday in a given state or a given city? What am I materially going to do that's different about my behavior, based on that number, that information? And it makes me wonder what other information and stories we could be amplifying further in terms of what is happening right now, who is being impacted, how we support each other. I'd love to see more amplification of that. I'd love to see more reporting of ranges instead of rates. When we report information like case fatality rates, or other data that have a lot of complexity and uncertain denominators right now, when we report those as individual points, or we report them and plot them on a bar chart or a lollipop chart against each other, we're assigning a certain amount of certainty to that data, and we need to be better at reporting that kind of information that has uncertainty as ranges. And so, I really appreciate some of the shifts that I've seen some of the news outlets make; a recent infographic from Fox actually had case fatality rates for different countries as ranges, instead of as individual points.
And I've really appreciated where I've seen pivots from journalists, where they've gotten feedback and they've started actually shifting how they're framing or reporting information, because the more that we plot points on a graph or a chart, the more we assign a certain amount of certainty to that being the number, and we create expectations from people about that number staying the same, when in all actuality that number is going to change with time.
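One simple way to see why a range can be more honest than a single point is to let the uncertain denominator vary. The Python sketch below is a hypothetical illustration, not a method from the episode or from any news outlet: it assumes the true number of cases lies somewhere between the confirmed count and the confirmed count times an assumed undercount factor, and it ignores the lag between diagnosis and death. The function name, the factor, and the numbers are invented for the example.

```python
from typing import Tuple


def fatality_rate_range(
    deaths: int, confirmed: int, max_undercount: float
) -> Tuple[float, float]:
    """Return (low, high) fatality rates, assuming the true case count lies
    between `confirmed` and `confirmed * max_undercount`.

    `max_undercount` is an assumption supplied by the analyst, not data.
    """
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    if max_undercount < 1:
        raise ValueError("max_undercount must be at least 1")
    high = deaths / confirmed                      # every case was confirmed
    low = deaths / (confirmed * max_undercount)    # many cases went uncounted
    return low, high


# Made-up numbers for illustration only.
low, high = fatality_rate_range(deaths=50, confirmed=1000, max_undercount=5.0)
print(f"Fatality rate: {low:.1%} to {high:.1%}")
```

Reporting the pair rather than just `deaths / confirmed` makes the dependence on testing visible: the wider the plausible undercount, the wider the range.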
Bailer: You know, I'm curious what you would recommend students do if they wanted to follow your career path, to go into this kind of visualization.
Campbell: We need to point out, John, that she came from Miami University too.
Bailer: I was just about to throw that as a softball, Richard. This was the, you know, what? Geez. Oh, you’re killing me, Richard, and we're not even in the same studio. So, you know, yeah. Yes, we know you went to an outstanding undergraduate institution, Amanda, and you are welcome just to shout that out. But also, just the idea of getting involved in doing this kind of visualization on important problems, what kind of things might you recommend?
Makulec: I think it's a different world now than it was when I started out. When I started out it was an accidental career almost. Nowadays, it sounds like you guys are actually teaching this at Miami, which brings my heart much joy. But I think critical to being able to work in data visualization is mastering skills from a few different disciplines. If this is what you want to do as a career path, I think you have to understand the stats and the math, obviously, so that you're representing information accurately. You have to learn how to interact with and use and analyze large data sets and be able to scrape them, find them, and shape them. You don't have to be a data engineer. I think that you don't have to be a specialist in everything, but you should be conversant in what's required to get data into a normalized table that you can use for analysis purposes, and I think you have to learn some fundamental principles from the UX and graphic design fields. Functionally, you're trying to take tables of information and transform them into some kind of visual story that someone who is maybe less numerate and/or data-savvy can interact with and understand and glean information from. I think, though, that there are so many divergent careers that involve data visualization, right? You can become a very good data journalist who works in a newsroom. You can become someone who builds business intelligence tools that help Fortune 100 companies make decisions about where they allocate resources. You can do what I did and find a career path in the social sector, where I went on to get a master’s degree in public health after finishing my undergrad; a shout-out to the zoology and sociology departments who prepared me well to go into public health, where I brought together that biology stuff, along with all of the thinking about society and demographics and everything else.
But I went ahead and focused on a specific domain of expertise, and where I think I've really enjoyed doing my work the most is where I've been able to say, you know, I understand you as public health people, and what you're trying to do and what your goals are and what your methods are and what programs look like, and what indicators you care about. I understand those things because I have that subject matter expertise, and I've also worked to master a lot of the knowledge around data visualization best practices, and how we visualize and communicate that data. So, it makes it into something that's a personal passion for me, that I can help to take some of that great knowledge that we're learning from evaluations of public health programs, and from the bigger data sets that are out there, and find ways to communicate that in more accessible ways than what I saw when I first started my career.
Bailer: Well, you know, I'm afraid that's all the time we have for this episode of Stats and Stories. Amanda, you have been an outstanding and delightful guest. Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, check out the handles @Statsandstories, @John_Bailer, and @rompenni, or you can follow us on Apple Podcasts or other places where you find podcasts. To share your thoughts on the program, send your e-mail to Statsandstories@miamioh.edu, or check us out at Statsandstories.net and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.