Rage Against the Machine Learning | Stats + Stories Episode 125 / by Stats Stories

Cynthia Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke University. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. Her degrees are from the University at Buffalo and Princeton University. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award. She has served on committees for INFORMS, the National Academies, the American Statistical Association, DARPA, the NIJ, and AAAI.

Full Transcript

Rosemary Pennington: Machine learning dates back to the mid-20th century; however, the idea that computers can be programmed to learn things on their own has only recently captured the public imagination, perhaps because of the sometimes oddly specific viewing suggestions curated in spaces like Netflix. At the same time, machine learning methods have spread into a variety of academic disciplines, and various government agencies are exploring how machine learning can help them better do their jobs. Machine learning is the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats and Stories is a production of Miami University’s Departments of Statistics and Media, Journalism and Film as well as the American Statistical Association. Joining me in the studio is regular panelist John Bailer, Chair of Miami’s Statistics Department. Richard Campbell of Media, Journalism and Film is away today. Our guest is Duke University’s Cynthia Rudin. Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke, where she directs the Prediction Analysis Lab. Rudin is also an associate director of the Statistical and Applied Mathematical Sciences Institute. Cynthia, thank you so much for being here today.

Cynthia Rudin: My pleasure.

Pennington: How did machine learning become your area of expertise?

Rudin: Well, I loved the idea of being able to predict the future.

[Laughter]

Rudin: I picked up a book from someone, it was Vapnik’s book on statistical learning theory, and I just fell in love with the book, and I said I have to do this. These people can predict the future and I want to do it.

Pennington: What was it about the book that hooked you?

Rudin: The sort of fundamental idea of statistical learning theory, which is that the thing that allows you to predict the future is- you know the fact that when you’re a baby you can learn so quickly, you can learn language so quickly, and the reason for that isn’t because a baby’s brain can learn anything, it’s because a baby’s brain is wired to learn language. And language is the way humans communicate. So, it’s the fact that the baby’s brain is limited that allows the baby to really learn. And it’s the same thing with statistical learning theory. If you limit how the algorithm is allowed to learn, then when the data comes the algorithm can learn better. So that limitation, rather than the expanse of what can be learned, is what attracted me to the area; the fundamentals of statistical learning theory.
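
A toy sketch of the idea Rudin describes here: with only a handful of noisy data points, a restricted model (a low-degree polynomial) tends to generalize better than one that is free to fit anything (a high-degree polynomial). The data, degrees, and seed below are made up for illustration and are not from the episode.

```python
# Illustration only (not from the episode): restricting what a model is allowed
# to learn can improve generalization when data are scarce.
import numpy as np

rng = np.random.default_rng(0)

# A small, noisy training set drawn from a simple underlying trend.
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)

# Held-out points from the same trend, used to measure generalization.
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 7):  # restricted vs. nearly unrestricted hypothesis class
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: held-out mean squared error = {mse:.3f}")
```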

John Bailer: So, for example if we were to take a step back and say what is machine learning? A machine learns and people learn, machines aren’t learning in the same way people are, are they?

[Laughter]

Rudin: So, machine learning is the art of learning from data. So, it’s the idea that data plus prior knowledge allows you to predict the future. And that’s so cool, right? It’s not just data, it’s not just that you feed it a whole bunch of data and it can understand things; it’s data plus knowledge, and you have to put those together to create predictive models. So, for instance, if I somehow know- let’s say that I’m trying to build a predictive model for which manholes are going to explode next year in New York City, right? This is a problem I’ve actually worked on, I actually spent years working on this problem, and we had data from the power grid and our goal was to predict which manholes were going to have an explosion or fire next year. And that data alone was a giant mess. There was no way you could predict- if you just took that data and fed it into some black box and told it to predict manhole events, it just wouldn’t work. And in fact, it didn’t work. It’s crafting the model, forming this predictive model based on this data using our prior knowledge, that somehow was able to help us predict manhole fires and explosions. In some cases that prior knowledge might be very minimal. So you might know, for instance, that the model has to be simple in some way, that the model has to be sparse in the number of factors, and those factors, you know what they are and you have to craft them out of the data, and there’s a skill to doing that, and that’s kind of what data science is about and that’s what machine learning is about.
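
One standard way to make a “sparse in the number of factors” prior concrete is an L1 penalty, which pushes the coefficients of uninformative factors toward zero. This is only a minimal sketch under that assumption, not the model Rudin’s team actually built for ConEdison; the feature names and data below are invented stand-ins.

```python
# Minimal sketch (not the actual ConEdison model): encode the prior knowledge that
# the model should be sparse via an L1-penalized logistic regression.
# Feature names and data are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical hand-crafted features for each manhole.
feature_names = ["num_cables", "max_cable_age_years", "num_past_trouble_tickets",
                 "years_since_inspection", "num_neutral_cables"]
X = rng.normal(size=(500, len(feature_names)))

# Synthetic labels: "events" driven by only two of the five factors.
logits = 1.5 * X[:, 1] + 1.0 * X[:, 2] - 1.0
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# The L1 penalty expresses the sparsity prior: it encourages a model that
# relies on only a few of the factors.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X, y)

for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name:>26s}: {coef:+.2f}")
```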

Bailer: Okay. Can you talk about some of the inputs that you had for this manhole explosion prediction? What were some of the things that were important variables to consider there?

Rudin: So, I was working with ConEdison on this project, as well as Becky Passonneau, and we had accounting data from ConEdison dating all the way back to the 1890s, from when power grids were first introduced to the world. New York City has the oldest power grid in the world. So, we had data from all the way back in the day when the original cables were put into the ground, and some of them are still in the ground. We also had data about the manholes, what type of manhole it was, was it a manhole or a service box? So we knew all the cables that were connecting all these manholes and service boxes and connecting into the buildings. We had information on inspections, and we had information about past events on the power grid; like if there was a fire or explosion or smoking manhole, someone would call up ConEdison and they would start filling out a trouble ticket for the problem while they were directing the action about what to do about it. So, you had this sort of documented and detailed record of all the responses for what was going on. Like did they bring a big truck in to stick a big tube down the manhole to suck up all the gunk so that somebody could get in and fix all the burnouts? Those were the sorts of data that we had.

Pennington: I know one of the criticisms of machine learning has been that it is kind of a black box, and I’ve been at talks where someone will be talking about what can we learn from algorithms or machine learning and then say well, you know, we don’t really know what’s happening because it’s a black box. So I guess I’m wondering from your perspective what kinds of things do you think data scientists can do to make that less opaque for maybe not even just a general audience, maybe for other scholars who are trying to use and make sense of this work as well?

Rudin: Well, the reason that I decided not to work so much on black box models is because of the power grid project I was just talking to you about. So, we were trying to predict fires and explosions and we realized that there was something wrong with our model and we didn’t know what it was. We showed ConEdison something important about the model, which is that it was relying on the number of neutral cables in the manhole. And they said there’s something wrong with your model, we don’t know what it is but there’s something wrong, because those are not predictive. Neutral cables are not predictive; they don’t carry current. So, this shouldn’t be predictive. So we went back, and after a few months we realized that there was something really important about the data that we didn’t understand that was causing a confounding problem in our model. When we fixed it we got a much better model, and ConEdison was able to leverage that knowledge that they gave us essentially by talking with us. And I said that’s it; I’m not doing anything even vaguely high stakes with these black boxes, because I don’t understand what they do and there could be some very serious problems with the models that you don’t notice. And it’s gotten worse. After I stopped working on black box models for high stakes decisions and started working on these interpretable models, a lot of people started working on black box models for high stakes decisions, and bad things happened. For instance, during the California wildfires last year, Google replaced the air quality index from the EPA, this trustworthy, statistician-heavy air quality index, with a black box machine learning model from a company called BreezoMeter. And all of a sudden people from California started writing on social media that there’s something wrong; there’s a layer of ash on my car, why is it saying that it’s safe to go outside? You don’t know how many people these things affect when you do them, and if there’s a mistake in it and it severely impacts someone’s life, that’s bad, right?

Bailer: Absolutely.

Rudin: So yeah, and it’s been going on continuously, even in the criminal justice system, in healthcare. They recently found that there’s a model being used in healthcare that has severe differences between the services being provided to white people and the services being provided to black people. And it was because the model was predicting the wrong thing. It was predicting cost; they were saying, well, we’ll give people extra services if we’re predicting that they will be high cost next year. But the problem is that black people were getting lower cost health care. So even though they were more severely ill, the model predicted that they were not going to need these extra services. So, there have been a lot of problems like this where people didn’t really understand what the machine learning was predicting, or how it was making its predictions, and that led to serious implications for other people. We don’t want that to happen, so we’re trying to design machine learning tools that predict and explain themselves in a way that people can understand.

Bailer: Well it sounds like the inputs for these machine learning algorithms are just critical, and understanding how they’re being incorporated into these predictions just seems like such a dramatically important thing to do- that if you’re just naively building based on a whole collection of variables without insight that you could really go down some bad rabbit holes.

Rudin: Yeah, and this is obvious to a lot of statisticians and a lot of data scientists, but there are many people for whom it is not obvious, and that’s where the problem lies.

Bailer: You’ve given some examples of some fails in machine learning; the example of the wildfire prediction and some of these healthcare examples, are there some success stories that you would highlight as being really impressive?

Rudin: Oh, there’s a lot of really impressive success stories.

Bailer: I was figuring that. This is a lob to the net, Cynthia.

Rudin: Well, I think the fact that we’re able to search the internet so effectively at all is a big success story for search engines in general. And search engines are heavily machine learning- they have other things in them besides machine learning, but they also have machine learning in them; some search engines more than others. Recommender systems, even though you get a little scared that they might be a little too good sometimes, are a good success story for machine learning.

Pennington: Are you talking about like Netflix telling me I should watch a lot of British crime shows because I do in fact watch a lot of British crime shows?

[Laughter]

Rudin: Yeah, there’s machine learning in a lot of places that you wouldn’t expect right now. I think people are trying to automate huge amounts of service-oriented types of things, where you have a limited number of people who can serve other people, so you have to prioritize who you need to serve. So, I don’t think the emergency room- I don’t think they are using machine learning to do what they are doing, but they have critical problems, like how can we free up beds so that we have them for people that need them? These are logistical questions, but they also involve predictions. Can we predict how many people are going to come into the emergency room today? Can we make those beds available? These are going to be problems that arise in the future that are more high stakes. Most of the major successes in machine learning, like Facebook being able to tag images with information, are sort of more low stakes; machine learning is all over Facebook, right? But there’s a distinction between those tasks, which are low stakes decisions, where if you get it wrong it doesn’t really matter that much, and the high stakes decisions, like can we make sure there are free hospital beds? That’s a high stakes decision. Or can we make sure that our loan decisions are done correctly, so that people can get the critical loan that they need to start their life or career?

Pennington: You’re listening to Stats and Stories and our guest today is Cynthia Rudin. Cynthia I’m going to switch gears a little bit and ask you about some of the work you’ve been doing on predictive policing and wondered if you could just first explain for people who may not be aware what predictive policing is.

Rudin: Yeah, so I’m not sure that what I am doing is exactly classified as predictive policing; I think predictive policing is pretty broad. It’s sort of how police can use data analytics to predict what would be effective actions for them to take. So, should they go to a certain area? Is there a certain street corner where there might be a violent crime next week? Should we send somebody there? Or should we send someone to patrol in this area? So, I haven’t worked on that problem, but I worked on a different problem that’s very closely related, which is crime series detection. And I got involved in this when I was faculty at MIT and someone from the Cambridge Police Department, Dan Wagner, came to MIT and asked for some help with a problem. He wanted to try to determine which crimes in the database- which crimes that had already been committed- were committed by the same individual or group of individuals. This is called a crime series. So there might be a group of two or three people that work in one neighborhood and do house breaks, and they have a similar modus operandi for all the crimes they commit, and the question is can we figure out that those crimes were committed by the same people? Because if we can, then we know that information from one crime can be leveraged for another crime. So, for instance, if we figure out who committed this one crime and connect them to this other crime as well, then maybe we can solve the other crime too. The police can’t do much about these problems unless they know they are occurring; if there’s a crime series in the area, unless they know about it there’s not much they can do about it. But if they know that there’s a crime series occurring in that area, there are actions they can take preemptively to go and deal with it. So, we wrote a piece of code together with the police department after they shot down a bunch of our ideas- these guys were brilliant. Working with them was just unbelievable. They shot down a bunch of our ideas and finally we came to an algorithm that really worked. We blind tested it in Cambridge and it really was able to find some crime series that had taken the police a very long time to find, and in fact there were some cases where we were able to connect multiple crime series together that the police thought were separate.
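
For flavor, here is a toy version of the matching problem she describes: score how alike two house breaks are on a few modus operandi attributes and flag very similar pairs as a possible series. This is not the algorithm developed with the Cambridge Police Department; the attributes, weights, and cutoff are all hypothetical.

```python
# Toy sketch of crime series detection: pairs of crimes with a very similar
# modus operandi are flagged as possibly belonging to one series. This is not
# the Cambridge/NYPD algorithm; attributes, weights, and the cutoff are made up.

crimes = [
    {"id": 1, "entry": "rear window", "time_of_day": "afternoon", "neighborhood": "A"},
    {"id": 2, "entry": "rear window", "time_of_day": "afternoon", "neighborhood": "A"},
    {"id": 3, "entry": "front door",  "time_of_day": "night",     "neighborhood": "C"},
]

# Hypothetical weights saying how telling a match on each attribute is.
WEIGHTS = {"entry": 0.5, "time_of_day": 0.3, "neighborhood": 0.2}
THRESHOLD = 0.7  # hypothetical cutoff for flagging a possible series

def similarity(a, b):
    """Weighted share of modus operandi attributes on which two crimes agree."""
    return sum(w for attr, w in WEIGHTS.items() if a[attr] == b[attr])

for i in range(len(crimes)):
    for j in range(i + 1, len(crimes)):
        score = similarity(crimes[i], crimes[j])
        if score >= THRESHOLD:
            print(f"Crimes {crimes[i]['id']} and {crimes[j]['id']} may be a series "
                  f"(similarity {score:.1f})")
```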

Pennington: Oh wow.

Rudin: Like there was one really interesting crime series where the police thought it was two separate crime series, but it turns out that they’d kind of taken a Christmas break in between.

[Laughter]

Pennington: Everyone needs a vacation.

Rudin: Like the last crime was in mid-December and the next one wasn’t until the end of January. And so, we put this code online, and we found out a few years later that New York City had picked it up and they’re using it in their Patternizr algorithm that’s running live at the NYPD.

Bailer: So, if you have these very machine-learning-literate criminals, can they see some of the inputs to this model and start to change their behavior?

Rudin: Oh yeah they can just vary their behavior enough so that we wouldn’t be able to find out it’s a crime series. But then of course they would have to try different modus operandi, right? They would have to enter into the building a different way or go at a different time of day and that could put them at risk, right?

Pennington: Yeah because clearly it’s working for them the way it is now…

Rudin: Exactly, exactly.

Bailer: That’s true, these are the survivors in this process. There’s some filtering if you’re caught.

Rudin: Yeah exactly so they wouldn’t want to vary their modus operandi too much.

Pennington: I do have a question related to predictive policing, and maybe about machine learning more broadly too, because as I’ve been doing some of the reading around this, one of the concerns around using algorithms or AI to help law enforcement, whether it’s at the local level or more broadly, is the concern about re-inscribing stereotypes around communities that have long been surveilled. So, it sounds like what you were doing here is not related to that, but it is a growing concern when it comes to the use of machine learning for these applications. I wondered what your thoughts are on how people who use these tools can work to make sure they’re not reinforcing stereotypes about particular communities.

Rudin: Yeah, so like you said, the crime series prediction project was about specific crimes associated with specific individuals, so that’s not the same thing at all as reinforcing those stereotypes. But these are very serious problems, and there’s not really an easy way around them, because the only data that are collected are from situations where the police actually were. You can’t collect data from situations where the police weren’t. So, it’s really hard to figure out how to effectively handle the situation. I could say more about these problems in the criminal justice system rather than policing, because I have experience in that.

Pennington: Well yeah, and that would be fine, because I was reading something where an expert on this issue of policing was making the point that one of the ways around this problem can be to remember that there are people on all sides of this. So, there’s this algorithm, there’s this tool that is meant to sort of help in this process, but in the middle of all this, sometimes the humans on either side get lost when we think about using these tools, or how to use them, or how to understand their use.

Rudin: The problem is that there’s no good solution to this, right? It’s just a problem, and what would be great is if we could dedicate resources, instead of necessarily making people upset by sending police officers into situations where the police don’t want to be and the community doesn’t want them to be, you know, if we could send social services in to kind of defuse these situations beforehand. But how to handle this problem isn’t really my expertise, so-

Pennington: Right, so what does it look like in the criminal justice system?

Rudin: So the criminal justice system has a slightly different problem, where you have these risk scores that are used to predict future crime, and the risk scores are being used for serious decisions like bail and parole and sentencing, and the question is, well, how much do you rely on these risk scores? Because the risk scores take into account criminal history and age, and both criminal history and age have problems in terms of racial bias. Let’s say there’s a neighborhood where police target young black men, right? In that case, even if you construct a policy that focuses only on age, you’re still disproportionately targeting black people, because the people who were arrested young are black, and they have longer criminal histories simply because they were targeted as younger people. So you have all these risk calculations that depend heavily on age and criminal history, and these risk scores go into these important, life-changing decisions. What you’re ending up with is policing ending up inside these life-changing decisions, because the policing can determine the age at which these people were first arrested, and then the age of first arrest goes into the risk score. So they may be denied bail essentially because of a policing decision. The problem is, if you don’t use statistics here, the problem just gets worse, and we know that. There have been many studies showing that if you don’t involve statistics at all, the problem is worse. If you actually use statistical models for these problems, then you can actually help try to undo that bias, because you know it’s there. It’s not easy to undo it, but you can at least mitigate the problem by saying, okay, we know that there’s this problem, let’s try to figure out how to handle it. So, for instance, you really do want to involve criminal history in these scores, because if you have somebody who has like 30 past crimes, then yeah, you should probably deny them bail; if they’re continually committing crimes they should be denied bail. But on the other hand, if you have someone and this is their first misdemeanor and they’re young, maybe you want to think about diverting them to one of these programs that helps prevent people from committing crimes, rather than sending them immediately through into the criminal justice system, depending on what they’ve done. And so there’s a huge number of programs right now trying to figure these things out, and somehow I ended up in the middle of all this because I work on interpretable machine learning, so- yeah, I’m trying to help with this.

Bailer: Well that’s great. So, what do you think about some of the coverage you see in the press about machine learning algorithms, or particularly in the work that you’ve done on the crime series? What’s been done well and what could be done better?

Rudin: Well, I think the media in some cases is saying wow, machine learning is amazing, it’s solving all the world’s problems, and that’s not really true. And in some cases they’re really against machine learning: these algorithms are causing all these problems. When, in fact, if you read a little closer, you’ll find that what they’re calling an algorithm is a simple formula. Like take one point if your age is below 20, and two points if you have at least one misdemeanor, or something like that. These very simple scoring systems, they’re calling these algorithms. These are not algorithms; they’re simply models. They’re either machine learning models or they’re models that a human created. And then they’re saying we’re blaming the algorithm, it’s the machine’s fault, it’s the algorithm’s fault, when in fact it’s actually just a simple formula. So, I think there’s a lot of craziness going on. The ProPublica story is kind of the- well, that one was particularly bad.
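
The “simple formula” she has in mind is something on this order. The two point items below come straight from her example; the function name and the sample inputs are only illustrative.

```python
# The kind of transparent scoring system Rudin describes: a short formula anyone
# can check by hand, not a black box. The two items come from her example above;
# the function name and sample inputs are only illustrative.

def simple_risk_score(age, num_misdemeanors):
    score = 0
    if age < 20:
        score += 1              # one point if age is below 20
    if num_misdemeanors >= 1:
        score += 2              # two points for at least one misdemeanor
    return score

print(simple_risk_score(age=19, num_misdemeanors=1))   # prints 3
print(simple_risk_score(age=35, num_misdemeanors=0))   # prints 0
```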

Pennington: Which story is this?

Rudin: You know about the ProPublica story right?

Pennington: Which story is this that you’re referring to?

Rudin: I think it’s called Machine Bias, where they accused this black box machine learning model of being racially biased. Did you know about that one?

Bailer: I heard something about it, but give a little more of the background.

Rudin: Well, they were saying that this model that’s used widely across the US justice system is racially biased. The model is called COMPAS, it’s used across many states, and they did a very simple calculation and they had a bunch of examples where blacks with lower criminal history had higher COMPAS scores than whites with higher criminal history, and so they were saying, oh, there are these pairs of people, so it’s racially biased. But the story was all wrong; they completely messed it up. It turns out all the examples they had were typographical errors somewhere in the data set that they were using, and it’s really not clear whether COMPAS is racially biased in the way- it certainly wasn’t racially biased in the way that they said it was. So, the way they said it was racially biased was that even if you take age and criminal history into account, COMPAS still depends on race, and there’s no evidence that that’s true; in fact, we found evidence to the contrary. So, the report really made it look like, even if you take age and criminal history into account, it still depended on race, but I don’t believe that’s true. I mean, age and criminal history are themselves arguable; you can argue that those data themselves have some element of racism in them. But if you take age and criminal history into account, I don’t think COMPAS depends on race.

Pennington: It sounds like one of your criticisms of journalism and reporting is that there’s a lack of understanding of what it is people who do machine learning are doing.

Rudin: Well, for that one it was criminologists. Immediately after the report came out, the criminologists wrote an article saying, what are you guys doing? Did you ask a criminologist? So, we’ve been trying to figure out what was inside COMPAS, trying to detect exactly what its dependence on age is and what its dependence on race is, so that we can at least use transparency to help us figure out what’s going on. Whereas the media- well, I shouldn’t say all the media, it was just this ProPublica group- just came to this conclusion based on bad data science. It was just bad.

Pennington: So, what advice would you give to someone who wants to do the work that you’re doing?

Rudin: You mean the machine learning?

Pennington: Yeah, so someone who may be an undergrad who thinks this looks really interesting and compelling, and wants to get into the work you’re doing now, what would you suggest they explore or do to prepare themselves?

Rudin: I think whatever domain they are interested in, they should become an expert in it. So, if they want to work in criminal justice, they have to read a lot about criminal justice. If they want to work on power grid maintenance and repairs, they should learn a lot about power grid maintenance and repairs; there’s no excuse for putting garbage into a black box. They should know the domain well enough to know that the statistical work they’re doing is sound, and they should also take a whole lot of machine learning classes.

[Laughter]

Pennington: That was the plug I was kind of waiting for you to make. Well, that’s all the time we have for this episode of Stats and Stories. Cynthia, thank you so much for being here today.

Rudin: My pleasure.

Bailer: Thanks Cynthia.

Pennington: Stats and Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program, send your emails to statsandstories@miamioh.edu or check us out at statsandstories.net. Be sure to listen for future editions of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics.