Xiao-Li Meng, the Whipple V. N. Jones Professor of Statistics, and the Founding Editor-in-Chief of Harvard Data Science Review, is well known for his depth and breadth in research, his innovation and passion in pedagogy, his vision and effectiveness in administration, as well as for his engaging and entertaining style as a speaker and writer. Meng was named the best statistician under the age of 40 by COPSS (Committee of Presidents of Statistical Societies) in 2001, and he is the recipient of numerous awards and honors for his more than 150 publications in at least a dozen theoretical and methodological areas, as well as in areas of pedagogy and professional development.
Full Transcript
John Bailer: Welcome to today’s Stats and Short Stories episode. I’m John Bailer from the Department of Statistics at Miami University, and I’m joined by my colleagues Rosemary Pennington and Richard Campbell from the Department of Media, Journalism and Film. We’re delighted to have joining us today Xiao-Li Meng. Meng is the Whipple V.N. Jones Professor of Statistics at Harvard University and also the Editor-in-Chief of the recently launched Harvard Data Science Review. Xiao-Li, thank you so much for being here.
Xiao-Li Meng: You’re very welcome and it’s really a great pleasure to be here.
Bailer: Oh, indeed, it’s a great pleasure to have you join us. I’d like to talk to you a little bit about data. In particular, you like to talk about data in terms of multiples: multi-resolution, multi-source, and multi-phase. Can you talk a little bit about that?
Meng: Absolutely. These three multiples were formulated when I was asked to write an article for the 50th Anniversary volume on what they called the Past, Present and Future of Statistics. What I had in mind was to motivate future talent to look into statistical problems for which there is absolutely no solution, because they are so hard. I wanted to portray statistics as not just useful, but as a source of incredibly challenging intellectual problems.
These three multi-frameworks really arise from the complex problems out there. Let me start with multi-resolution. The idea here is that you will be analyzing data at different resolutions, and I’m using the term literally, much as you would for a camera: when you look in more detail, that’s a higher resolution, and when you look from 3,000 feet away, that’s a lower resolution. The problem that motivated me to think about this is the whole notion of seeking statistical evidence for individualized medicine, individualized treatment. That, in some sense, is the highest resolution, because we want to make a prediction for each individual. But if you think about it, the data never have that fine a resolution. In fact, you cannot really test the drug on someone before you say it’s effective for that person.
So when I started working on this, I realized that the notion of resolution is absolutely crucial, because typically we stay at some lower resolution and then say that’s good enough for high-resolution prediction. That’s one example; there are many others. Essentially, you don’t have data at the refined level, only at a more aggregated level, but you need to make predictions or inferences at a much more refined level. That’s the way to formulate this problem. There is no clear answer to it, but there are many wrong answers out there.
The multi-source one is really quite easy to describe, because these days, as you know, we have so many different data sets. Some are collected for statistical purposes; others are collected not for statistical analysis but because they are required by law or by other requirements. There are a lot of administrative data, census data, and tax data, and the US government, particularly the Census Bureau, is thinking about how to integrate data from different sources. The challenge there goes beyond the typical statistical way of combining data sets, because these data come with very different quality. The traditional way of combining data is to worry about their size and variance: you weight each source in inverse proportion to its variance. But that is no longer the right thing to do when one data set is actually quite biased because it is not representative, yet it may cover a large percentage of the population, so it does carry some information. So the multi-source problem is about how to combine data from different sources, and there you really have to emphasize data quality, not just data quantity.
The last one, multi-phase, arises because of the way data are handled: typically collected by one team, processed by another, analyzed by a third, and perhaps even interpreted by yet another.
So you really have these phases running through the entire process, from the conception of the idea, to collecting the data, to how the data eventually turn into an action. The emphasis here is not just that there are multiple stages, but that the stages may or may not be compatible with each other, because each stage may not have access to what previous stages have done. They may have only the output. That itself poses some really challenging statistical and mathematical questions because, very briefly, there is no single encompassing model that can incorporate all these phases, even in theory. So it creates what I call “uncongeniality”. It is very interesting mathematically and statistically, and very challenging, but that’s the reality. That’s why I put together these three kinds of multi-problems.
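A minimal sketch, with made-up numbers, of the inverse-variance weighting Meng mentions for multi-source data, and of why it misbehaves when a very large source is biased. The function name `weighted_combine` and every figure below are hypothetical. The second weighting uses mean squared error (variance plus squared bias), which assumes the bias is known; in practice it rarely is, and this is a simplified illustration, not Meng's own method.

```python
from typing import Sequence


def weighted_combine(estimates: Sequence[float], spreads: Sequence[float]) -> float:
    """Combine estimates with weights proportional to 1 / spread.

    Passing variances gives classic inverse-variance weighting; passing
    mean squared errors (variance + bias**2) also penalizes biased sources.
    """
    weights = [1.0 / s for s in spreads]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total


# Hypothetical numbers for illustration only; the true value is 0.50.
survey_est, survey_var = 0.50, 0.0004   # small, representative sample (unbiased)
admin_est, admin_var = 0.56, 0.000001   # huge but unrepresentative source: tiny variance
admin_bias = 0.06                       # assumed known here; rarely known in practice

# Inverse-variance weighting trusts the big source almost completely.
ivw = weighted_combine([survey_est, admin_est], [survey_var, admin_var])

# Weighting by inverse MSE accounts for the big source's bias.
mse_w = weighted_combine([survey_est, admin_est],
                         [survey_var, admin_var + admin_bias ** 2])

print(f"inverse-variance: {ivw:.4f}")
print(f"inverse-MSE:      {mse_w:.4f}")
```

With these numbers, inverse-variance weighting puts over 99% of the weight on the large, biased source, so the combined estimate lands near 0.56; the MSE-based weights pull it back to about 0.506. The point is Meng's: quantity alone (small variance) can mask poor quality (bias).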
Bailer: Listening to you talk about this reminds me of an answer you gave in an interview, I think it was to Statistics Views, about your mentor. I think you were comparing your training in China with your training in the United States, and the very first thing you said was that the most important thing you learned from this mentor was to think intuitively.
Can you talk for just a minute about that and what it means to you?
Meng: Yes, yes. That’s actually important, because when I came to the United States I was thinking mostly mathematically. That’s because I was trained in pure mathematics. Everything starts from a set of assumptions, and then one and only one logical conclusion can follow from them. That’s the beauty of mathematics, but it’s also the restriction of mathematics. Thinking intuitively really involves understanding something without doing mathematics: you have a sense of the general picture, but you also understand that whatever you come up with, there is probably a more nuanced angle to it. That took me a long while, and I am really grateful to all my advisors and professors, as well as my peers, colleagues, and fellow students, who put me in an environment where I was constantly pushed into a corner, thinking, okay, now I don’t know how to think because there is no unique answer. And now I don’t know where to go, because I solved one problem and another one showed up. Eventually, just to make myself feel comfortable, I would say, in a sense: this problem does not have a solution, that is why we need to work on it, and that is the beauty of that kind of problem.
It took me a long while to feel comfortable instead of being agitated by it. [Laughter] Literally. But now I’m not only comfortable, I love it. Anyone who has heard me give a talk has probably heard me tell the audience, look, you should have thought intuitively. But my own intuition has been evolving, and of course, the other great thing about intuition is that developing it is a lifelong task. You would be surprised: something you think is so intuitive, you later realize, oh holy cow, actually is not.
Bailer: Well, Xiao-Li, that’s brilliant. And I hate to say this, but I’m afraid that’s all the time we have for this episode of Stats and Short Stories. Xiao-Li, thank you for being here, and best wishes for the launch of the Harvard Data Science Review. It’s a really interesting and engaging effort, and we’re excited to see what’s produced by it.
Meng: Well, thank you very much for the opportunity, and my wish is that you will do so well with Stats and Short Stories that someday you will create a stat and a long story.
Bailer: We will aspire to that, Xiao-Li. Stats and Stories is a partnership between Miami University’s Departments of Statistics and of Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places you can find podcasts. If you’d like to share your thoughts on our program, send your email to firstname.lastname@example.org, and check us out at statsandstories.net. Be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.