Kerrie Mengersen (@KerrieMengersen) is Distinguished Professor at the Queensland University of Technology in Brisbane, Queensland, Australia, and past-President of the International Society for Bayesian Analysis (ISBA). Her research spans Bayesian statistics, computational statistics, and environmental, genetic and health statistics.
Rosemary Pennington : When it comes to addressing environmental or public health related challenges, the conversation often centers on what we need to do to fix the problem, whether that be a warming planet or dying bees. Less focus is spent on understanding how we know what we know. How do we know the planet is warming? How do we know that certain insecticides are killing bees? The statistics that help us better understand such challenges are the focus of this episode of Stats+Stories. Stats+Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, and The American Statistical Association. I'm Rosemary Pennington. Our regular panelists are Department of Statistics Chair John Bailer and Department of Media, Journalism and Film Chair, Richard Campbell. Today's guest is Kerrie Mengersen. Mengersen is a Distinguished Professor in Statistics at Queensland University of Technology, Brisbane, Australia. She works on statistical methods and computational tools, seeking to apply them to real-world problems in health, environment and industry. Mengersen is also the President of the International Society for Bayesian Analysis. Thank you so much for being here today, Kerrie.
Kerrie Mengersen : You're welcome. It's exciting to be on.
Pennington : Just to get started if we have a lot of different listeners, if you could just take a moment to explain what Bayesian Analysis is, that would be great.
Mengersen : Bayesian Analysis is a type of statistics. It's both a way of thinking about statistics and a way of applying it or analyzing data. There are a couple of things that happen in Bayesian Statistics. First, we try to understand what the underlying parameters or drivers of a model might be. So, if we're trying to describe a system, then we're after good estimates and a good understanding of the underlying dynamics or features or factors that drive that system. And the way that we describe those factors and dynamics is through a statistical model, and we're interested in being able to estimate the parameters that underlie that model. So, what we want to do is not just get point estimates for those parameters; we want to understand the whole distribution of those parameters. And the way that we think about that distribution is that it characterizes the uncertainty, or our state of knowledge, of the parameter that we're interested in. The other thing that we think about is that in doing this we can actually add different types of information to our models. And we do this through prior distributions. So those prior distributions might be uninformative, which means that the data tell the whole story, but they might also be informative, because what we find is that very often when we're looking at a problem we have the data but we also have a lot of other information to make our estimates and our understanding much richer.
John Bailer : So, I'll ask Richard a question. So, Richard what was the most confusing part in thinking about this for you?
Richard Campbell : I think I would like an example. If she could illustrate this with an example.
Bailer : Okay, so pulling this into the realm of environmental applications, could you flesh this out with a simple example? Maybe a simple example with maybe one parameter that's driving the bus here.
Mengersen : So, suppose that we have a diagnostic test, and if you have cancer then the test can be very good at diagnosing that you have cancer. So that's the probability that the test is positive, given that you have cancer. But what happens if you go in to the doctor and they say, well, the test has come back and the test is positive? So now what you really want to know is, what is the probability that you have cancer given that the test is positive? Now that's a very different question. The first one was the diagnostic capacity of the test, which is: what's the probability that the test is positive given that you have cancer? And the other one is: what's the probability of cancer given that the test is positive? So we're turning the question around now, and we really want to know about that probability of cancer. And so, in order to get to that we need to use Bayes' rule to turn that probability statement around. And we need prior information about how rare that cancer is. And that's what we do in Bayesian Statistics. So, if we take an environmental problem, then we could think about what's the probability of a species being present in an area, given the data that I've got? So, another way of thinking about that is, in a classical or frequentist statistical problem what we would say is, what's the probability of our observations or our data set given that the species is present? But what we really want to know is, what's the probability of a species being in this area, or being present, given the data that we've got? So, we're turning the problem around, and that probability of presence we can then model through Bayesian Statistics.
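To put rough numbers on the diagnostic-test example, here is a minimal Python sketch of Bayes' rule. The figures (1% of people have the cancer, the test is positive for 90% of people who have it and for 5% of people who do not) and the function name `posterior_probability` are assumptions made up for illustration; they do not come from the episode.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule.

    prior               -- P(condition), how common the condition is
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive test across both groups of people.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Turn the question around: from P(positive | condition) to P(condition | positive).
    return sensitivity * prior / p_positive


# Hypothetical numbers, for illustration only.
p = posterior_probability(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(cancer | positive test) = {p:.3f}")  # prints 0.154
```

With these made-up inputs, a test that is positive for 90% of people with cancer still leaves only about a 15% chance of cancer after a positive result, because the cancer is rare to begin with. That gap between the two probabilities is the "turning the question around" described above.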
Campbell : I understood that! That's good. Well done.
Bailer : Yeah, I knew we needed to push the pause button there, because I saw Richard going a little green earlier.
Mengersen : How much statistics can we talk about? Like can we talk about theta given x and so on, presuming [inaudible]
Bailer : Probably not. I mean, you and I can, but for this group what we're trying to do is pitch this the way you would if you were going to describe it to the general public. We want to tell the statistics behind the stories and the stories behind the statistics. That's our catch phrase in describing the purpose of the program. So, in some sense the question is, if you've done this complicated model that involves prior specification, where you're specifying the uncertainty associated with parameters that are important for making the model and for making decisions, how ultimately do you take something that's very complex like that and then make it accessible to a larger audience? So, one thing that might be a fun way to begin this is: what's one of the most interesting environmental applications that you've worked on?
Mengersen : Sure. So, well, there's been a number of them and I'm proud of many of them. If we take, for example, a recent one we've been working on, which is hunting jaguars in the Amazon. What we wanted to do there was create a jaguar corridor across the Amazon. Jaguars are a threatened species, and we're interested in being able to create this corridor where they'll have safe movement across the Amazon. The problem is that we don't have a lot of data to tell us about where these jaguars are or how many there are. So, what we're interested in in the model then is, what is the probability of a jaguar living or hunting or moving through a particular area given the data? But the problem is we don't have much data. So, then we have to say, "well, what other information could we incorporate into this model to help us?" So, the Bayesian framework allows us to incorporate that information through prior distributions and through other means of combining data or information in a very principled manner. And so, the kind of information we can use is expert information from local people and also information from experts around the world.
Bailer : That sounds really cool. So, did you define that corridor?
Mengersen : So, what we did was we had to work out how to get expert information from around the world, and it's very difficult to take the experts to the Amazon. So we were thinking about ways, and we've been working on this for quite a while, how we actually get good information from experts. This expert elicitation from people is a statistical problem all of its own: how do we ask questions in a way that people can answer them and then we can add them to our models? So if we think about a little thought experiment, if I had a map in front of me and I asked you at different places how likely is a jaguar to be here, then you might look at that area, and given that you are an expert on jaguars, you might say, "well, there's a 70% chance here, plus or minus", because you're not really sure about that number. And if you said that in a number of different locations, and I know the characteristics of those locations, then I can build a statistical model that will enable me to represent your understanding of where jaguars live based on the features of that landscape. Now that gives me a statistical model of your expert information, and I can add that to my statistical model based on the very little data that I've got, and that makes a very rich model. So, we then had to work out, well, we could use a map and that would be fine, but what if we could actually put the experts into the jungle? Then that would be a lot richer, and probably give us more information. But we cannot take all the experts to the jungle, so we took the jungle to the experts by creating some virtual reality. So, we went into the jungle, we took 360 cameras and worked with the local people. They loved the cameras; they took the cameras to places we couldn't get to deep in the jungle, and then we created virtual reality scenes from those 360 photos and films, and then we were able to present those to experts. And from that we could create better models.
We've been able to use that then to identify areas that are more likely for jaguars to live and then work with the governments in Peru and this is work that is still going on to connect those areas. And that creates the corridor. Part of that work is still in train for the research that we're doing, but it's been completed there and it's been handed over to local conservation organizations to continue the discussions with the governments. But they're very excited about it. They love the virtual reality and they're all on board in the project. So that's very exciting for us.
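The idea of turning an expert's "70%, plus or minus" into a prior and then combining it with a small amount of data can be sketched in a few lines of Python. This is a deliberately simplified, single-site illustration and not the project's actual model, which relates expert opinion to landscape features across many locations. The plus-or-minus of 10 percentage points, the survey counts, and the function names are all hypothetical.

```python
def beta_from_mean_sd(mean, sd):
    """Method-of-moments Beta(a, b) prior matching an elicited mean and sd."""
    common = mean * (1.0 - mean) / sd**2 - 1.0
    if common <= 0:
        raise ValueError("sd is too large for a Beta distribution with this mean")
    return mean * common, (1.0 - mean) * common


def update_with_surveys(a, b, detections, non_detections):
    """Conjugate Beta-Binomial update: add the observed counts to the prior."""
    return a + detections, b + non_detections


# Hypothetical expert statement: "70% chance of a jaguar here, give or take 10%".
a0, b0 = beta_from_mean_sd(mean=0.70, sd=0.10)

# Hypothetical sparse field data: 5 surveys at the site, jaguar detected in 2.
a1, b1 = update_with_surveys(a0, b0, detections=2, non_detections=3)

print(f"prior mean (expert only):   {a0 / (a0 + b0):.2f}")
print(f"data-only estimate:         {2 / 5:.2f}")
print(f"posterior mean (combined):  {a1 / (a1 + b1):.2f}")
```

In this sketch the combined estimate lands between the expert's figure and the raw survey proportion, and it moves further towards the data as more surveys are added. A wider plus-or-minus from the expert would give a weaker prior and let the data dominate sooner.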
Pennington : You're listening to Stats+Stories, where we discuss the statistics behind the stories and the stories behind the statistics. The topic today is understanding the environment through stats. Our special guest is Queensland University of Technology Distinguished Professor of Statistics, Kerrie Mengersen. You were just talking about some very technologically rich and savvy ways of creating this statistical model. How often are you, and I know you have a research group, using these techniques to create statistical models?
Mengersen : Well, we've been using these different kinds of technologies to improve our statistical models in a number of applications, so as well as creating the jaguar corridor we've been using these techniques to better monitor the Great Barrier Reef here in Australia. So, the Great Barrier Reef is one of the world's natural treasures. It's 2100 kilometers long, so you would cover a lot of Europe going from one end of the Reef to the other, north to south. We've done a lot of monitoring of the Reef over the last 20 years, but because it's so big, that monitoring has only happened in a small number of areas, and there's a lot of Reef that we just don't have monitoring on. So how can we get information from those areas to help improve our models of the Reef and Reef health, coral cover, fish bio-diversity, the impact of cyclones and crown of thorns and so on? Well, there's any number of divers out there diving on the Reef. And if we could use the photos that they're taking, perhaps we could get experts to go into those photos and then tell us about the state of the Reef. And we could use that information to improve our models. So, we've been doing that. We've created a virtual Reef, which allows people to go onto that site and geotag their photos to the different areas they've been diving, and then we can get experts and local people to access those photos in 2-D and 3-D and virtual reality. We can go into those and extract information about what they see about coral cover and fish bio-diversity and then improve our statistical models. So that's really exciting. We're also working with different groups that have underwater vehicles that take photos and film underwater, and we can use that information as well.
Bailer : So, it sounds like that would be establishing a baseline for current Reef health, is that right? Is that what it's doing?
Mengersen : That's right and also it's a really dynamic way of modeling because there's people that will be uploading photos all the time and then that model keeps changing. So, if you imagine that you have a map of the Reef and the Reef health and then as people add information or add photos to the virtual Reef then that map changes so you're getting this dynamic updating of the health of the Reef, or of the underlying statistical model as people add that information.
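The "dynamic updating" described here can be illustrated with a small Python sketch in which an estimate of coral cover at one site is revised each time a new photo is scored. It assumes a Normal prior and Normal measurement error with a known variance, which is a convenient simplification rather than the model behind the virtual Reef; the prior, the photo scores, and the name `update_normal` are all hypothetical.

```python
def update_normal(mean, var, obs, obs_var):
    """One conjugate Normal update with known observation variance.

    Precisions (1 / variance) add, and the new mean is a
    precision-weighted average of the old mean and the new observation.
    """
    precision = 1.0 / var + 1.0 / obs_var
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + obs / obs_var)
    return new_mean, new_var


# Hypothetical starting belief about coral cover (%) at one site.
mean, var = 40.0, 100.0
# Hypothetical measurement noise for a single scored photo (variance, in %^2).
obs_var = 25.0

# Hypothetical coral-cover scores from photos uploaded over time.
for photo_score in [32.0, 28.0, 35.0, 30.0]:
    mean, var = update_normal(mean, var, photo_score, obs_var)
    print(f"after score {photo_score:.0f}: estimate {mean:.1f}%, sd {var ** 0.5:.1f}")
```

Each uploaded photo pulls the estimate towards what was observed and shrinks the uncertainty, and processing the photos one at a time gives the same answer as processing them all together. That is why a map built this way can keep changing as contributions arrive.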
Bailer : So how long has this model been running?
Mengersen : So, we commenced the project in 2016. We developed the virtual Reef at the end of 2016 and then last year we've been finishing the statistical modeling that's underneath it.
Campbell : Kerrie, one of the things we ask a lot of our guests is about the way that your work, or scientists' work, or statisticians' work, gets translated to the general public through journalism, through news reports. And you've had to do some of this work, I suppose, where your work gets represented to the general public. Can you talk a little bit about what a journalist might do to improve how they report on the work of statisticians, based on your own experience?
Mengersen : I think it comes both ways. The statistician learns through working with journalists the best ways to be able to tell their story. The journalist then also gets to learn how best to guide the statistician in telling that story. I think that there's a balance between being able to tell the story and being able to still be true to the science or the statistics underpinning that. It's a difficult thing that we have to learn to do. Sometimes the story is about the story, and it's conveying that statistics can work in a range of areas, and that can be exciting, to bring statisticians into that area. For example, we have a lot of young people here just at the moment. We have our vacation research projects, where young statisticians in their first year come to work on a project a lot like our jaguar project but on koalas. So, we want to know how many koalas are in the areas that they're looking at for conservation or development, and they can be tricky to see, so we're creating virtual environments where we can put experts into those environments and then ask how many koalas would live in these areas, and then we can build models that would be able to predict the number of koalas in target areas for councils. So, students in statistics can come in from first year to third year. They not only learn about statistical modeling, but then they go out and they do sampling in the field, they take 360 photos, they build virtual reality, and then they get to interview people about what they see in those photos, and then they get to add that to their models. That's exciting for us to be able to do that. When it comes to telling that story as a media story, it could just be about how we're coming up with better ways to be able to monitor koalas to help councils, but it could also be about how young people can get involved in problems that are really important in our world through statistics.
Or it could be about the statistical models themselves, depending on the audience.
Bailer : I'm sure that your jaguar and Reef analyses have gotten attention in the mass media, what was it that was the focus of these reports? Did they dive in at all to any of the modeling that you did?
Mengersen : In those stories it was more about the stories. The Reef Project. Well, both of them picked up on the different kinds of information we can use to improve our models and our understanding of environmental systems. So, we can use data but we can also use expert information and citizen science. And so, there's a big interest in how we use Citizen Science, and there's a lot of problems with those kinds of data, but there's also a lot of potential. So, if we can learn as statisticians to better use citizen science data, then we have a really rich resource with which to develop our models and better understand our world. One of the projects that we have that has attracted media attention, and where the modeling has been important, is the development of a national cancer atlas. So, in this we've also been using Bayesian Statistical models, because we want to be able to develop good probabilistic estimates of cancer incidence and survival across Australia, and we want to be able to do that at the small area level. So, then we have to be careful that we preserve privacy and have robust estimates. And that requires us to build careful statistical models, and a Bayesian Framework is best for being able to borrow strength from neighboring areas to improve estimates at a small area level, because for each particular area we don't have a lot of data, but we can borrow information from neighboring areas to improve the estimates of each area. And we can also then have estimates not only of incidence and survival but also of the uncertainty around those estimates, and that's important for managers, for ranking those areas, and for understanding the differences between rural and urban areas, which in Australia is a big issue in disease and medicine and in particular cancer outcomes.
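A rough feel for "borrowing strength" can be had from the Python sketch below. It swaps in a much simpler technique than a spatial model: a non-spatial Gamma-Poisson model that pulls each area's relative risk towards the overall average rather than towards its neighbours. It is not a description of how the cancer atlas is built. The prior settings, the area names, the counts, and the function name `smoothed_relative_risk` are hypothetical.

```python
import math


def smoothed_relative_risk(observed, expected, prior_shape=5.0, prior_rate=5.0):
    """Posterior mean and sd of an area's relative risk.

    Assumes observed ~ Poisson(expected * risk) and risk ~ Gamma(shape, rate),
    so the posterior is Gamma(shape + observed, rate + expected).
    With shape == rate the prior mean is 1.0, i.e. the overall average risk.
    """
    shape = prior_shape + observed
    rate = prior_rate + expected
    return shape / rate, math.sqrt(shape) / rate


# Hypothetical areas: both have the same raw ratio (3x the expected cases),
# but the small area's ratio rests on only a handful of cases.
areas = {
    "small rural area": (3, 1.0),      # (observed cases, expected cases)
    "large urban area": (300, 100.0),
}

for name, (observed, expected) in areas.items():
    raw = observed / expected
    mean, sd = smoothed_relative_risk(observed, expected)
    print(f"{name}: raw {raw:.2f}, smoothed {mean:.2f} (sd {sd:.2f})")
```

In this sketch the small area's estimate is pulled strongly towards the average and keeps a wide uncertainty, while the large area's estimate barely moves. Reporting the uncertainty alongside each estimate matters for the same reason mentioned above: ranking areas on raw ratios alone would treat both areas as equally extreme.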
Pennington : You're listening to Stats+Stories and our discussion today focuses on some real-world application of Bayesian analysis. Kerrie, what advice would you give to a young person who is at university and is interested in doing some of the work that you've been talking about. Whether it's the cancer atlas, the koala work, the jaguar work, someone who is interested in Bayesian Analysis, what should they be thinking about as they move through university?
Mengersen : I find that the people who have a quantitative background, particularly a stats background, have a real advantage in whatever area they want to work in. There's such a demand for people with good quantitative skills, and that creates the foundation for going into different areas. So, if they want to work in applied areas, then they have a strong statistical background. We have students coming in from first year to third year, and they're amazing. They're picking up new statistical methods that are required for problems that they haven't seen before, and they access information from the web to learn about those methods, and then they work out the coding and they work out how to apply them, but it's because they have this underlying statistical foundation in their training. And then they have the adventurous spirit, in the statistical sense, in that they're willing to push the boundaries of what they know. So, having that openness in learning new methods and then just going and finding teams. None of this happens in isolation. The projects that I'm talking about require people in computer science, and conservation and public health and visualization, and it's very exciting to work in these teams and with the industry people as well. So, people from the Great Barrier Reef Foundation, people from the Australian Institute of Marine Science… One of the ways that I got to work on interesting projects in the Antarctic was going along to a meeting where there was some research being presented on the Antarctic and saying "I do stats, is that of interest?", and thinking nobody would pick that up, but next thing I know I'm doing some really cool work in the Antarctic. The way that fuel is delivered in the Antarctic is by helicopter, and drums of fuel are being dropped at different sites, and the question is, "what if a drum explodes?" Then you have some area that's very toxic in the soil, the soil's very thin, so how does that affect soil bio-diversity?
I thought, soil, how can soil be interesting? But it turns out it's hugely interesting, and there's a lot of dynamics that happen with all of the little bugs and species in the soil, and I never knew that until I went to that meeting and put up my hand and was willing to work on the project. So, I think it's having the underlying skill set, but then finding an area of interest and then going and putting up your hand and saying, "Hey, can I work with this team?"
Bailer : It's neat to hear you say that. I think one of the joys about being in statistics is being able to play in other people's space and to learn, to continue to learn about the problems that they're working on.
Mengersen : Certainly, but I also think it's important to respect that we're a profession of our own, and so we absolutely have the ability to develop, and we need to develop, the methods that we're using, so it goes hand in hand. By working on an application, you see that the models and the tools that we have are not sufficient for many real-world applications, and we need then to further develop the methods and the theory and the computational tools that we have. And when we develop them, then we can answer more questions, which raises more questions, which means more development of the theory and the method. So, they go hand in hand, and there's a real pipeline behind the real need for good theory and then the translation of that to methods and computational tools and back again.
Bailer : No disagreement here. I'm really intrigued about some of the stuff that you mentioned earlier, the idea of "Citizen Science", and that to me is really awesome, to consider how that might play out in an analysis. I guess one example was your Barrier Reef example, where you were having divers take photographs at certain locations, which were then geocoded in terms of where they were taken, and then having a deeper exploration of what was going on at those different sites. Can you give me a couple of other examples of citizen science that you've been using in analyses?
Mengersen : Well, one example is if you think about birds and mapping of bird species, or even understanding the dynamics of birds' movements, for example. So, there are a lot of people who are interested in bird watching, and there's a lot of data that's been collected in terms of records of where people have seen birds. And we know that there's problems with that, so statisticians say, "you could never believe those records. People only record birds where they are, so if we use those records the birds are only ever going to be in the areas where the people are", and also they might misreport or want to embellish what they've seen. There's a lot of things that could go wrong with those data, and so we could throw them out, but then you think, "well, there's a lot of data there, so maybe there is a real signal in all of that noise". And as statisticians that's our job, to understand the signal in all the noise and also to be able to pull out the stories that the data is telling us. And so, if we can do that with citizen science and come up with ways to address the problems that are in those data, then we've got a really rich source of information that we can use.
Pennington : Well, Kerrie Mengersen, distinguished professor of Statistics at Queensland University of Technology, thank you so much for being here today.
Mengersen : You're very welcome.
Pennington : That's all the time we have for this episode of Stats+Stories. Stats+Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, and The American Statistical Association. You can follow us on Twitter or iTunes. If you'd like to share your thoughts on our program, send your emails to Statsandstories@miamioh.edu, and be sure to listen for future editions of Stats+Stories, where we discuss the statistics behind the stories and the stories behind the statistics.