Survey Statistics: Where is it Heading? | Stats + Short Stories Episode 292 (Live From the WSC) / by Stats Stories

Natalie Shlomo is Professor of Social Statistics at the University of Manchester and publishes widely in the area of survey statistics, including small area estimation, adaptive survey designs, non-probability sampling, confidentiality and privacy, data linkage and integration. She has over 75 publications and refereed book chapters and a track record of generating external funding for her research. She is an elected member of the International Statistical Institute (ISI), a fellow of the Royal Statistical Society, a fellow of the Academy of Social Sciences and President 2023-2025 of the International Association of Survey Statisticians.  She also serves on national and international Methodology Advisory Boards at National Statistical Institutes.


John Bailer

There's just one week left to cast your vote for the Stats and Stories 300th Episode data visualization contest. Be sure to check out and interact with all the finalists at statsandstories.net/voting. Well, last month I was lucky to be able to attend the 64th World Statistics Congress hosted by the International Statistical Institute in Ottawa, Canada. While I was there, I was able to talk to a number of amazing statisticians and data scientists for the show, including this week's guest, Natalie Shlomo. Shlomo is Professor of Social Statistics at The University of Manchester and publishes widely in the area of survey statistics, including small area estimation, adaptive survey designs, nonprobability sampling, confidentiality and privacy, data linkage and integration. She is an elected member of the International Statistical Institute (ISI), a fellow of the Royal Statistical Society, a fellow of the Academy of Social Sciences, and President 2023-2025 of the International Association of Survey Statisticians. She also serves on national and international methodology advisory boards at National Statistical Institutes. So, present John turns it over to past John. Take it away. And once again, don't forget to vote in our 300th Episode contest at statsandstories.net/voting. Well, I'm here at the World Statistics Congress, and I'm having the opportunity to talk to friends from around the globe. Today, I'm delighted to have a chance to chat with Natalie Shlomo. She's a professor of social statistics at The University of Manchester. Natalie, welcome.

Natalie Shlomo
And I'm delighted to do this podcast, Stats and Stories.

John Bailer
We could not be more excited. So Natalie, you and I have known each other for quite a while. And you've been a leader in the world of survey statistics. One of the things that I think maybe people don't appreciate is: why is survey statistics such an important activity in our society?

Natalie Shlomo
Well, survey statistics originated back in the 50s and sort of grew out of the National Statistical Institutes. And you know, the idea is to learn about finite populations, right? So we're not in the world of theoretical statistics; this is all about finite populations, inference on parameters in the population. And the idea is to draw a random sample to make inference about the target populations that we're interested in. It evolved through the 50s and got more sophisticated, with different ways of drawing samples. There was, I think, a seminal paper back in the 50s that said we should stratify our samples: not just draw simple random samples, because with stratified sampling we gain efficiencies. The National Statistical Institutes were also concerned about budgets and costs, so we went into cluster sampling and the implications there, how we lose efficiency but how we might gain it back through various ways of estimation and post-stratification. So it really has evolved over the last 50 years. And I just wanted to say that I am now the incoming president of the International Association of Survey Statisticians. We are celebrating the 50-year anniversary of the association, so it's all very exciting.
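Her point about stratification gaining efficiency can be sketched with a small simulation. This is a toy example with a made-up population, not any official survey: with the same sample size of 100, sampling each stratum separately removes the between-stratum part of the variance.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 10,000 units in two strata whose means differ a lot.
stratum_a = [random.gauss(30, 5) for _ in range(8000)]
stratum_b = [random.gauss(60, 5) for _ in range(2000)]
population = stratum_a + stratum_b
share_a = len(stratum_a) / len(population)  # 0.8

def srs_mean(n=100):
    """Estimate the population mean from one simple random sample."""
    return statistics.mean(random.sample(population, n))

def stratified_mean(n=100):
    """Proportionally allocated stratified estimate: sample each stratum
    separately, then combine the stratum means by their population shares."""
    n_a = int(n * share_a)          # 80 units from stratum A
    n_b = n - n_a                   # 20 units from stratum B
    m_a = statistics.mean(random.sample(stratum_a, n_a))
    m_b = statistics.mean(random.sample(stratum_b, n_b))
    return share_a * m_a + (1 - share_a) * m_b

# Repeat both designs many times; the stratified estimator varies far less.
srs_draws = [srs_mean() for _ in range(500)]
strat_draws = [stratified_mean() for _ in range(500)]
print(statistics.stdev(srs_draws), statistics.stdev(strat_draws))
```

Under proportional allocation the stratified design can only match or beat simple random sampling on variance, which is the efficiency gain she describes.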

John Bailer
Oh, happy birthday, IASS. And you're mentioning this idea that may be new to a lot of people that are listening, the idea that there are designs for how you take a representative subset of the population. I think it blows people's minds to think that you can take thousands of observations in a sample to represent millions in a population, and with some precision, under a probability scheme. So tell us about a probability sample. Can you give just a quick intro to what a probability sample is?

Natalie Shlomo
So a probability sample would generally mean that you have a frame, a list. Now in national statistical institutes we often don't have a list of people, but we'll have a list of addresses. But the point is that in a probability sample, every single individual, or whatever enterprise, whatever you're investigating, has a known chance of being selected. So that's the crucial thing: you know that probability, because you design it. So if I have 1,000 in my population and I want a sample of 100, I know that if it's a simple random sample, I'm going to take one out of 10. So those are the designs, and then people talk about the weights.
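Her one-in-ten example can be written out directly; this is a minimal sketch with a made-up frame of 1,000 addresses.

```python
import random

random.seed(1)

# Hypothetical frame: a list of 1,000 addresses, labeled 0..999.
frame = list(range(1000))
N = len(frame)
n = 100

# Simple random sample without replacement: every address has the same
# known inclusion probability n/N, here 1 in 10, fixed by the design.
sample = random.sample(frame, n)
inclusion_prob = n / N               # 0.1
design_weight = 1 / inclusion_prob   # 10.0: each unit "stands for" 10
print(inclusion_prob, design_weight)
```

The key property of a probability sample is exactly this: the inclusion probability is known before any data are collected, because the designer chose it.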

John Bailer
So essentially, every observation that's in your sample represents some number of individuals in the population.

Natalie Shlomo
Exactly. I wanted to point out the seminal paper by Horvitz and Thompson, the Horvitz-Thompson estimator, which basically made exactly the point that you make: every individual should be weighted by their inverse inclusion probability. So in my example, every person would have a weight of 10, because they represent 10 people in the population. That was quite a seminal piece of work in the 1950s, because our samples have become far more complicated and units might have different inclusion probabilities, usually not in the case of a simple random sample, so we would weight accordingly. It's how we derive estimators for population parameters, like averages or totals, by weighting with the inverse of the inclusion probability.
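The Horvitz-Thompson estimator she mentions can be sketched in a few lines. The data here are simulated, and the design is a simple random sample, so every unit shares one inclusion probability; the same formula works when the probabilities differ by unit.

```python
import random

random.seed(7)

# Hypothetical population of 1,000 households with some survey variable,
# say weekly expenditure; the finite-population total is the target.
population = [random.gauss(100, 20) for _ in range(1000)]
true_total = sum(population)

# Simple random sample of n = 100: inclusion probability pi = 1/10 for all.
n = 100
sample = random.sample(population, n)
pi = n / len(population)

# Horvitz-Thompson estimate of the total: sum of y / pi over the sample,
# i.e. each observation weighted by its inverse inclusion probability (10).
ht_total = sum(y / pi for y in sample)
print(true_total, ht_total)
```

The estimator is design-unbiased: averaged over all possible samples the design could produce, it hits the true total exactly.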

John Bailer
So now that we're talking about this 50th anniversary of the IASS, can you give us a couple of examples of the all-stars of sample surveys? You know, what are some of the ones that really were defining in terms of their impact on society?

Natalie Shlomo
I work a lot with the National Statistical Institutes. So the big surveys that every country runs: the Labor Force Survey, the Family Expenditure or budget survey, health surveys. I mean, that's how governments inform policies, by going through their statistical agency and drawing samples to try to learn properties of the population and where to invest resources. And so, you know, my space is sort of that official statistics world, and those are the examples I can give you. But a lot of the academic world is involved in more sophisticated things. I should mention longitudinal surveys, where we repeat observations over time. In the academic world you'll find things like surveys of aging populations, longitudinal surveys, household longitudinal surveys, and that's kind of more in the academic sphere.

John Bailer
So what's an example of one of a recent survey that you've worked on?

Natalie Shlomo
Well, my most recent survey was not a great example of a probability-based survey, because it was a nonprobability survey.

John Bailer
Let's take one from a probability example. And then we'll come back to how surveys have changed over time.

Natalie Shlomo
Exactly. So one of the crucial problems now in survey statistics is dropping response rates. And response rates have been dropping so much that we're really into the space of almost nonprobability. We have control of the design: I know how you're sampled, but I have no control over whether you respond or not. Well, I do, indirectly, if I have good, you know, targeted data collection. We're in this space. Also mixed-mode surveys, trying to save costs: so maybe we'll start with the internet questionnaire, and then we'll move to the phone questionnaire, and then face-to-face if I can't reach you. So there are a lot of attempts to try and deal with this issue of dropping response rates. But sadly, they are dropping, and we're almost reaching the point where you ask: is that actually a probability-based survey? Because you have no control over it.

John Bailer
Yeah, when I was in graduate school, I remember in a survey sampling class, we were doing a survey for the local hospital, and we were using random digit dialing. And it was at a time when, you know, you knew that a certain area code in the United States pretty much was the catchment area for that hospital. And you knew that if you excluded someone, it was because they were a business or something like that, and they shouldn't be there. But now, when I'm talking to any group of people and I ask, okay, where are your phones registered, what's your area code? It doesn't matter. So I just imagine that something that was a go-to approach isn't anymore.

Natalie Shlomo
You know, it's moved to the web surveys, okay; not so much these random digit dialing surveys anymore, I don't know.

John Bailer
Right, right. So that's sort of, you know, that's the old school. So, as you said early on, there was a lot of development in terms of these probability samples. And that was at a time when people were pretty willing to respond if selected. If I said, Natalie, I've selected you for my survey, you'd go, oh, great.

Natalie Shlomo
Especially for a government survey. And I'm not talking about marketing calling, right? We're not there. Right.

John Bailer
So early in the practice, this was something that people were pretty likely to respond to. And as time has gone on, it's like, maybe not. So what's going on now? What are sort of the latest and greatest sampling ideas?

Natalie Shlomo
Well, as I said, there have been a lot of attempts at mixed-mode data collection. Actually, here the president of the ISI, Stephen Penneck, has as his invited keynote speaker Bob Groves, who is a famous survey methodologist as well, and he developed something called adaptive survey design. So now we're getting methodology involved in data collection: let's try optimal strategies. Where should we put our resources? Let's think of some quality indicators that we can optimize, you know, who should we go out and look for. So a lot of attempts to try and deal with dropping response rates, but at the end of the day, they are dropping. And so now it's very interesting you ask, because what is happening in this world of survey statistics is a lot more work on inference from nonprobability samples. So that's where we're going.

John Bailer
So you have to do some adjustments; there has to be some modification of the information that you're getting from those who respond.

Natalie Shlomo
So with nonprobability samples, a good example is web surveys: they're opt-in, so there's selection bias. And the idea is, if it's designed properly, even a nonprobability survey can be designed quite well. The idea is that you can use probability-based surveys to try and adjust for the selection bias, through propensity scores and inverse propensity weighting and things like that. And post-stratification: there's always that need for calibrating to known population totals. That helps.

John Bailer
Yeah, so this idea of calibrating is saying: I'm looking at the distribution of some trait in a sample, and I'm looking at the distribution of that trait in the population. And if there's a real disconnect between the two, are there ways to tune the information, ways of adjusting the estimates?

Natalie Shlomo
We call it model-assisted estimation. So it's basically a generalized regression estimator, but we use the auxiliary information that we know from the population to sort of benchmark the survey weights so that they add up to known population totals. And you're hoping that the auxiliary variables are somehow correlated with your survey variables, or target variables, so that you can actually say something about reducing nonresponse bias.
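A toy version of that benchmarking idea, in its simplest form (post-stratification to one known auxiliary variable), makes the bias reduction concrete. Everything below is made up: nonresponse is tied to a group whose population counts are known, which biases the naive respondent mean, and rescaling the weights within each group to hit those counts removes most of the bias.

```python
import random
import statistics

random.seed(3)

# Hypothetical population: two auxiliary groups with known counts, and a
# survey variable y that is correlated with group membership.
N_a, N_b = 6000, 4000
pop = ([("a", random.gauss(20, 4)) for _ in range(N_a)]
       + [("b", random.gauss(40, 4)) for _ in range(N_b)])
true_mean = statistics.mean(v for _, v in pop)

# Simple random sample of 1,000, but group "b" responds only half the time,
# so the naive respondent mean is pulled toward group "a".
sample = random.sample(pop, 1000)
respondents = [(g, v) for g, v in sample if g == "a" or random.random() < 0.5]
naive_mean = statistics.mean(v for _, v in respondents)

# Calibration (post-stratification): rescale weights within each group so
# the weighted counts add up to the known population totals N_a and N_b.
counts = {"a": sum(1 for g, _ in respondents if g == "a"),
          "b": sum(1 for g, _ in respondents if g == "b")}
weight = {"a": N_a / counts["a"], "b": N_b / counts["b"]}
calibrated_mean = sum(weight[g] * v for g, v in respondents) / (N_a + N_b)
print(true_mean, naive_mean, calibrated_mean)
```

This works because the auxiliary variable is correlated with y; calibrating on a variable unrelated to y would fix the counts but not the bias, which is the point she makes about choosing auxiliary variables.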

John Bailer
So let me ask you to put on your president of the International Association of Survey Statisticians hat and talk about what you see happening in the next couple of years in survey statistics.

Natalie Shlomo
Well, certainly in the realm of nonprobability samples, because even a probability sample, when you're getting down to 40-50% response rates, you can almost call it a nonprobability sample. And I think that there's always going to be a need for probability surveys. But I think that the function of these probability surveys is sort of to correct selection bias for alternative data sources, not just nonprobability samples. There's a huge move now in integrating data sources, say from administrative data, new forms of data; even big data has now come into the realm of survey statistics, using that as auxiliary information to try and bolster our samples. So definitely the idea of nonprobability samples, and we have wonderful survey statisticians working in this space, thinking about inference for nonprobability sampling, integrating data. So that's a really up-and-coming topic. And the survey that I worked on, as you asked: at the University of Manchester we have the Centre on the Dynamics of Ethnicity, a very prominent centre for the study of ethnic inequalities. And they invited me, which is unusual. Yeah, you know, sociologists came to me and said, can you help us with this opt-in web survey? We're investigating ethnic inequalities and COVID experiences. So we designed a nonprobability web survey. And the idea is that they have a lot of connections, so they partnered with all these umbrella organizations, you know, the Muslim and the Jewish organizations, all these organizations who helped us recruit. And through these recruitment drives and these umbrella organizations we were able to recruit, but it's still a nonprobability sample. So the challenge, of course, is that when you're doing this from the design stage, you're able to develop the questionnaire in such a way that this nonprobability sample can actually reference back to a probability sample, because you can design the questions from scratch.
And so we used a combination of a social survey and the Labour Force Survey as a reference sample. So, you know, you sort of stack the two surveys together and estimate a propensity score with a logistic regression, various ways of doing that, and also benchmarking, of course, calibration. And that's how we adjusted for the selection bias. So we just published a big, very prominent survey. And the most interesting thing about it was that it was funded by the UK Economic and Social Research Council, and I think that's the first time that a large-scale nonprobability sample was actually funded by the Research Council. So it made a lot of impact. And my challenge now, I mean, we did the weights, okay, but the challenge, of course, is how do we do confidence intervals?
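A toy version of that stacking-and-propensity step, not the actual Manchester analysis: combine a reference probability sample with the opt-in web sample, estimate each unit's probability of being in the web sample, and weight by a function of the inverse propensity. With a single categorical covariate, a saturated logistic regression reduces to cell proportions, which keeps the sketch dependency-free.

```python
import random
from collections import Counter

random.seed(11)

# Hypothetical scenario: the population is 40% "young", but an opt-in web
# survey recruits 80% "young", and the survey variable y differs by age.
ref_ages = ["young" if random.random() < 0.4 else "old" for _ in range(2000)]

web_ages = ["young" if random.random() < 0.8 else "old" for _ in range(1000)]
web = [(a, random.gauss(30 if a == "young" else 50, 5)) for a in web_ages]

naive_mean = sum(v for _, v in web) / len(web)  # pulled toward the young mean

# Stack the two samples and estimate the propensity of being in the web
# sample per cell: p(age) = n_web(age) / (n_web(age) + n_ref(age)).
n_ref = Counter(ref_ages)
n_web = Counter(a for a, _ in web)
propensity = {a: n_web[a] / (n_web[a] + n_ref[a]) for a in n_ref}

# Pseudo-weights (1 - p) / p make the weighted web cell counts proportional
# to the reference sample's, undoing the over-recruitment of the young.
w = {a: (1 - propensity[a]) / propensity[a] for a in propensity}
adjusted_mean = sum(w[a] * v for a, v in web) / sum(w[a] for a, _ in web)

true_mean = 0.4 * 30 + 0.6 * 50  # 42 by construction
print(naive_mean, adjusted_mean)
```

In the real study the propensity model would have many covariates, fitted by logistic regression on the stacked file, and the weights would then be calibrated to known population totals as she describes.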

John Bailer
So we're looking at the uncertainty

Natalie Shlomo
For nonprobability samples, this is all, you know, new. New ideas.

John Bailer
Yeah. And so I'm trying to picture now someone thinking: wow, when I read a newspaper and I see that a given percentage of the population has some characteristic, with this margin of error or this interval estimate, there's a lot of work behind that. It ranges from forming the survey in the first place, in a way that collects information in a sensible, reliable, valid way, to all the work to even decide how to select the sample, to ultimately, given this information, doing what we can to calibrate it to the population.

Natalie Shlomo
That's what survey statistics is all about. I mean, there are other spaces. A huge number of our members work on something called small area estimation. You know, surveys are typically designed for national levels, or maybe large geographic areas, but how do you get to a small area or domain, where you maybe have one unit sampled or maybe no units sampled? So a huge amount of resources go into something we call small area estimation, where we actually use model-based methods for estimation. So we take a direct estimate, if there is one; we probably have to smooth out the variance a bit if there are only one or two units. But then we combine it, in sort of a composite estimate, with a synthetic estimate, you know, from a regression model, and we combine the two in an optimal way, a sort of James-Stein estimator approach. So that's small area estimation, and another huge amount of work in that area is now looking at things like using big data as auxiliary variables, bringing in all the supplemental information from other sources.
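The composite estimator she sketches comes down to a single shrinkage weight. The numbers below are purely illustrative: a direct estimate is shrunk toward a synthetic, model-based estimate, and the less data the area has (the larger the direct estimate's variance), the harder it is shrunk.

```python
def composite(direct, var_direct, synthetic, var_between):
    """James-Stein-style composite for one small area: gamma weights the
    direct estimate by how reliable it is relative to the between-area
    variability of the model."""
    gamma = var_between / (var_between + var_direct)
    return gamma * direct + (1 - gamma) * synthetic

# Well-sampled area: the direct estimate dominates (gamma = 0.9).
big_area = composite(direct=52.0, var_direct=1.0,
                     synthetic=48.0, var_between=9.0)

# Area with one or two sampled units: the same formula leans on the
# synthetic regression estimate instead (gamma is about 0.08).
small_area = composite(direct=70.0, var_direct=100.0,
                       synthetic=48.0, var_between=9.0)
print(big_area, small_area)
```

This weighting is the core of the area-level models used in small area estimation; in practice the variances themselves must be estimated, often with help from the auxiliary data sources she mentions.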

John Bailer
So to improve what you've learned from a sample that you've selected from a population, to do that right is such an important task. And the work that you and your colleagues do in survey statistics, helps governments make better decisions for the people they serve.

Natalie Shlomo
Yeah, yes. You know, my space, again, is official statistics. So, you know, there is a little bit of a mix between the International Association of Survey Statisticians and official statistics; sometimes people are in both realms. So it's challenging to draw that line. But yeah, we definitely identify ourselves as the statisticians, as the methodologists. You know, how to make surveys better.

John Bailer
Oh, well, this has just been a delight, Natalie, it's so fun to be able to see you in person, and to be able to talk to you about the important work that you and your colleagues do. So thanks for taking the time.

Natalie Shlomo
Thank you so much for inviting me, it was great fun.

John Bailer
Stats and Stories is a partnership between Miami University's Department of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places where you can find podcasts. If you'd like to share your thoughts on our program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.