Paul Scanlon is a survey methodologist and research social scientist in the Questionnaire Design Research Laboratory at the Centers for Disease Control and Prevention's National Center for Health Statistics. His research focuses on attitudes towards privacy and confidentiality in federal surveys and on how questionnaire evaluation methods can be used to validate international health surveys.
Full Transcript
Bob Long: If you want to make people bristle, just mention the words the federal government and privacy. The information from your federal tax return, the questions you answer for the US Census, things that national security officials know about you, you may think the government knows way too much about your personal life, but on the other hand, have you thought about the fact that Google and Amazon also know their fair share about you? I'm Bob Long and we welcome you to another edition of Stats and Stories; it's a program where we look at the statistics behind the stories and the stories behind the statistics. Our focus this time is on questions of privacy and confidentiality. Joining me on Stats and Stories for our discussion on privacy and confidentiality are the man who comes up with these ideas for the shows, Miami University Statistics Department Chair John Bailer and our special guest today, Paul Scanlon. Paul is a survey methodologist and research social scientist at the Centers for Disease Control and Prevention's National Center for Health Statistics. Before we start our discussion with Paul Scanlon, our Stats and Stories reporter Emily Potten went out and did a story for us on how your information is used in the health care field.
Emily Potten: The issues of privacy in health care have only gotten more sensitive as recent reforms push for insurance coverage for all Americans. The struggle to gain important data from patients is a problem for medical practices, but it also stretches into the collection of research data. Tissue, blood, and urine samples, and geographical data are just a few of the things patients and research subjects are reluctant to give without the promise of anonymity or confidentiality. Miami University research fellow Dr. J. Scott Brown says the reasoning and concern for this is understandable.
J. Scott Brown: It would not be that difficult if you were really an unscrupulous person or someone with an agenda to use those data to try and find people with certain characteristics and be able to deductively disclose who they are.
Potten: Brown does gerontology research for Scripps Gerontology Center on Miami University's campus. His daily work consists of several procedures to make sure all confidential information is kept safe.
Brown: It is stored on a portable hard drive in a locked file cabinet that I only have a key to. When I plug it into my computer, I have to actually log into my computer first, logout, unplug the network cable, and then log back in so when I work with that data, there is no internet connection on the machine. Any printouts I produce have to stay here in my office and I have to destroy it by shredding any printouts that I'm going to not use or after I'm done using them.
Potten: This level of security is actually pretty tame. Brown says other researchers have more intensive methods of data security.
Brown: Your data are taken very, very seriously, and it's becoming much more prevalent as privacy concerns increase with, I think with internet access and other aspects of widespread, very quick public access to things, people worry more. You're seeing these kinds of data agreements and data issues throughout a lot of any sort of social science, statistics, anything that you're working with where you're working with secondary data that contain relatively private information.
Potten: The real question is why is security of health data so important? It all boils down to insurance. For patients, the worry of exposing health history and conditions threatens their coverage and may raise their payments. It's for this reason that researchers and medical professionals are charged with a federal offense if confidential information gets into the wrong hands. This charge would include a $10,000 fine and multiple years in prison. The reforms of the affordable health care act aren't making patients any more comfortable in their response to health care questions; however, Brown says most people respond honestly once they are assured their information will be kept secure. For the Miami Public Radio Project, I'm Emily Potten.
Long: I think when we start talking about this particular issue, I had mentioned there at the beginning, so many people think, you know too much about me already, but in your work for the Centers for Disease Control, many people have heard of the CDC, let's talk about the kinds of things that you're doing, of course you are dealing with issues that are private or confidential.
Paul Scanlon: Right, so at the National Center for Health Statistics, we collect a lot of data on public health, on health care and on people's health outcomes, and this is legitimately private information. We are responsible for weighing, finding a balance between asking people for too much and getting the data we need to promote public health in the country and actually across the world as well. So we collect statistics such as how often people visit doctors, what diseases are present in the population, people's nutrition habits, eating habits, stuff like that. The question is why do we need this? We're responsible for the public health, that's one of the big functions of government, to make sure that we can all at least try to be healthy, not necessarily making everybody healthy, but at least giving people the opportunity to be healthy, and one of the key ways of doing that is to let people know what outcomes there are from going to the doctor and what diseases are present, stuff like that. We use surveys to collect data on things like this. We have two major surveys at NCHS that collect this data; one is the NHANES, the National Health and Nutrition Examination Survey and the other is the National Health Interview Survey. And through this we collect all of this various public health data and eventually we release it to the public and policymakers so they can make decisions and make policy and hopefully help the country.
Long: John Bailer, I'll turn to you for a question.
John Bailer: So can you talk just a little bit about, how would NHANES or NHIS be used as sort of a promotion of public health? Can you give an example of the kind of question that might be addressed in NHANES and then ultimately how that question, or answers to that question might translate into something that could be some action that public policy might address?
Scanlon: Sure, and let me just say right up front that I'm talking for myself, and not as somebody for the CDC; these are my personal opinions. I just needed to put that disclaimer out there. So on the NHANES we ask about things like people's eating habits and we can actually look at, you know, what kind of foods they're taking in. So if we have that information, we can look at it across the population. By looking at the eating habits of people in a state or people across the country, we can then tie that to various health outcomes, such as diabetes or heart disease and policy-makers, when they see that link, not policy-makers only at the federal level, but policy-makers at the state and local level, can say, "Hey, we see that people are eating 'X' and the health outcome is 'Y.' Well we don't like that, so let's see if we can make some policies that make it easier for people to not end up with outcome 'Y': diabetes, heart disease, whatever."
Bailer: Just as a quick follow-up, can you give an example of a question you view as a sensitive question to answer on one of these surveys?
Scanlon: That's an interesting question because what people consider to be sensitive varies a lot across the population. Just asking people "Do you drink a lot of soda?" in some people's mind, that's a sensitive question.
Long: That's real sensitive to me.
Scanlon: People just don't like the government knowing that, right?
Scanlon: You know the traditional sensitive questions that we have on surveys like this deal with sexual practices and your diseases. A lot of people don't want to say that they have certain diseases, HIV, even things like diabetes or a heart disease, I mean there's social stigma around a lot of disease, and so by asking that, we are kind of trying to ignore that stigma, so that's kind of getting into the realm of violating privacy, in a way.
Long: I think sometimes when people fill out different surveys, there may be a lot of different questions on there and they might think, well why does the CDC for example need to know the answer to this question? I think one of the misconceptions that might be out there is that somehow, let's say there's certain information that the IRS has or somebody else has about me, and people may think that you have access to all that stuff, is that true, or not?
Scanlon: Right. So it's not true generally. I know with the recent conversation about the NSA, National Security, there is this general thought that government data is shared across all of government, so sitting on my desk at the CDC I have information to your cell phone records, and that's not true. There really are barriers between agencies and functional groups within the federal government. So take something like the IRS form, you send your information in, hopefully, every April. So that form that you send in contains a lot of information that say, the Census, actually collects. It has your age on there. It has the number of people in your economic household, but the Census Bureau doesn't have the ability to just reach over into the IRS's servers and pull that information and put it into their forms. Now that would, in a way, be nice and I think that's a conversation, as a society, we need to have because there's a cost savings there, right? If we can get data, say for the decennial census, and I used to work at Census, but I'm not talking as someone who's doing census policy here. If we don't have to spend the money on sending people to people's houses to collect the data that you didn't send in on the original census run, and instead, we went to your IRS records, or your Social Security records, or your Medicaid records, and got that information, we would save money and it would probably be just as close to accurate, maybe even more accurate.
Long: I want to explore that a little a bit more, because I think that is a really interesting topic, but we want to remind people that you're listening to Stats and Stories, where we talk about the statistics behind the stories and the stories behind the statistics, and we're focusing this time on some issues related to health care and what the government is trying to find out about health care issues and how that impacts, also, privacy and confidentiality. I'm Bob Long. Along with me are our regular panelist Miami Statistics Department Chair John Bailer and our special guest is Paul Scanlon, and he's a survey methodologist, research social scientist at the Centers for Disease Control and Prevention's National Center for Health Statistics. We also thought it would be fun to find out what people on the street know about our topic so we're asking them, "What's the difference between privacy and confidentiality?"
Woman 1: When something is confidential, it's kind of like going to a therapist or going somewhere. That's what I think of. But privacy I feel like is totally different…I don't even know the words to think of.
Woman 2: Confidentiality you think of like your doctor. They can't share information. Privacy, I mean I don't think there's a big difference, but I think privacy is just what you do and don't tell others.
Woman 3: I think confidentiality is more an agreement between like a doctor or something, so you would have to do a waiver for them to share your information. Privacy is information that you want to keep private. Like if you're posting something on say Facebook or anywhere. You'll have like your privacy settings, and it is certain things you wouldn't want getting out.
Man 1: Confidentiality means when you're sharing something between two people that you no longer want it to be shared with others. Privacy is basically keeping your information secret. So if I'm putting my social security online, if I'm using my account numbers and things like that, that's the difference to me.
Long: Well, I want to go back to that subject; we'll ask more about the difference between those two, but before we do, you were talking about an interesting issue, but I don't know, John, if you wanted to follow up on that at all about government agencies, right now not sharing information, but as Paul mentioned, some of the cost savings and things that could go into that.
Bailer: Yeah, I think that there's some questions about how you work in partnership with other organizations in the federal government, not just collect unique information, but to make sure you're collecting the information you intend. To me, it's an interesting question when you look at a survey or look at a questionnaire, what are you trying to measure? How do you know what you're trying to measure? And how do you guarantee or how do you feel confident that you're measuring this with some kind of accuracy? So like the ideas of validity and reliability of things, can you talk just a little bit about that?
Scanlon: Yeah, absolutely. That's actually a lot of my work is on this, on making sure that what we collect, what we actually burden the public with, because when we send someone a survey, it's a burden, even if it's not a hard survey to fill out, we're taking your time, so we want to make sure that we're collecting data that matches what we want, so that we're not wasting your time and wasting taxpayer money. So really, the key way of making sure that the questions we ask on surveys and even on forms, such as IRS forms and other government forms. The key way of testing that validity is to do qualitative research on it, so we do things called cognitive interviews where we will sit down with respondents and actually go through a form and say, okay, answer this question, how old are you? Alright, so what did you mean by that? Why did you say you were 35 years old? You know, these are really simple questions, right?
Bailer: That one seems kind of easy. I feel like you don't need a lot of testing there.
Scanlon: You don't, you know, it's interesting that some of these really easy questions, we're making assumptions about them, but we don't actually, you might be right that they're really easy, but we don't actually know that.
Bailer: What's a harder; give us an example of something that's harder to measure.
Scanlon: "Have you been to the doctor in the last year?" Okay, so I give that question. And this is from an example given by one of the founders of this method, Gordon Willis, who's at the National Cancer Institute. So I ask people, "Have you been to the doctor in the last year?" And they say yes. And I say, what were you thinking about that? And it turns out that they weren't thinking about a dentist. Well maybe we wanted them to think about a dentist, maybe we wanted them to think about all kinds of health care providers. When I said "the last year," well maybe they were thinking about the calendar year or maybe they were thinking about twelve months. So there are these points of variation within a question and it's our job to make sure there isn't that much variation in a question when we're designing it, to make sure we're actually capturing the really tight construct and not wasting your time.
Long: Back to that question, as you were saying about potentially, it could save money if some questions you could find out from going to the IRS or going to another agency, that would be expedient, but we have this public opinion problem. I'm kind of curious if there's any Gallup polls or anything out there that show how people would feel, if for example, you were able to share information because they're so upset, with all these privacy issues right now.
Scanlon: Right. So we have been doing some research on that and we haven't quite finished it, so I don't want to give away the results. But a bunch of the federal statistical agencies, and I guess I should kind of explain that, so there's the Census Bureau which I think a lot of people have heard about, things like the Bureau of Labor Statistics, and they're the ones that release the monthly employment numbers that you get on the first Friday, my agency, the National Center for Health Statistics, there's about fourteen federal statistical agencies, and we all kind of work together. So we started to fund some research on this question itself, and we were doing an attitudinal survey to see what people think about combining data from other sources. You know, our early findings are that there, but our other finding is that we can say certain things, that we can say that we're using the data to do "X," that there will be a public good out of it, or that we will be saving money because of it, or that it will become more accurate, that the data in the end will be more accurate if you let us do that, and that actually ticks up public opinion just a little bit. So I think one of our jobs as a federal statistical system is to figure out what we can tell the public that will make them allow us, that's terrible, but will shift public opinion in a way that we will be able to do this and get some cost savings and get some accuracy improvements like that. That's one part. The other part of this is that we really need to focus in on what particular data we want to grab from other agencies. It can't just be, we're going to get all the data and put it into a pot. That's first of all not fair because of privacy concerns and it's also a waste of our time.
Long: John Bailer, go ahead.
Bailer: I think it's probably appropriate to start thinking about transitioning this to thinking about this privacy and confidentiality, so can you give us some sense, what differentiates between privacy and confidentiality?
Scanlon: So privacy is, if you think about the data collection process, privacy happens before. So privacy is what people are actually willing to give. So it could be a government agency asking, it could be Google asking, it could be a private polling firm. So there are certain pieces of information that I consider private that I'm just not going to give to everybody. So privacy is almost a personal feeling, and there's societal links, we can look across the culture and figure out if there's certain classes of information that are private, that people consider private. It's our job, when we're collecting data, to convince people that we'll be using it in a responsible way so that they will allow us to get some of that private information, and we can't always get all of it. There are absolute limits on what people will give us, but if we get informed consent, we convince people that the work we're doing is important for the public good, we can usually say, we know this information is private, we hope you trust us with it. So part of that though, is confidentiality. Confidentiality is kind of on the back end of data collection, and confidentiality is holding on to the data and doing with it only what we said we were going to do. Okay, so federal data is collected usually under a promise of confidentiality, what that means is when you turn in the census form, under Title 13 of the United States Code, nobody who sees that data at the Census Bureau can give that information to anybody else. So we have, at the Census Bureau, your name, we have your address, we have stuff like that, that can't leave the Census Bureau, and that's under law. And it's actually been upheld in the courts. The other federal statistical agencies operate under a similar thing, it's just Title 42. This is really wonky, they're just two separate statutes enshrining in law that we can't share that data outside of what we said we were going to do.
Long: After I've given it to you, and I've done it in confidence with you, therefore it shouldn't be shared.
Scanlon: Right. Actually that's part of the issue when we're looking at bringing that data from somewhere like IRS because when you send your data to IRS, if you read your 1040 next year, there's actually a line that says the data will stay within the IRS. So it's a matter of informed consent that the Census or NCHS can't just grab that data without going back to you and getting your consent. So there's also an operational issue there.
Long: Great point. You're listening to Stats and Stories and again we're talking about some issues of privacy and confidentiality. I'm Bob Long and with me today our regular panelist Statistics Department Chair John Bailer. Our special guest is Paul Scanlon and again he's a survey methodologist and research social scientist at the Centers for Disease Control and Prevention in their National Center for Health Statistics. And again we'll mention that Paul is here representing himself, not the official positions at the CDC. We'll just throw that in. We also thought it would be interesting, because Paul just raised another issue that we'll address next of whom do you trust more, the Federal government having your private information or how about companies like Google and Amazon?
Woman 1: Personally neither, but if I had to go with one, I guess the government. But I would really prefer neither.
Woman 2: Neither one, but I'd probably just choose the government.
Man 1: I'd probably put my trust more in Amazon and Google because the government has some loopholes. But I use that more. I only deal with the government basically with my info when I get a paycheck or when I file my income tax.
Man 2: You know after the recent Target debacle and things like that, overall I'd say I trust the government more. But I don't know what they're using it for. So that's kind of the dilemma is who's safer? I think the government's probably safer but what information do they have and why do they have it? So I guess there's a lot more question marks there.
Bailer: Have you seen any research on this? Have you seen any studies that show how willing people are to share their private information as privacy is concerned with some vendor, a company, versus with the federal government that might be used for policy or planning at federal and state levels?
Scanlon: Right. So I think there's a few ways of looking for that data. We can actually just look at response rates to surveys between the federal government giving the survey and private firms, and the federal government does have a better response rate, and so, in a way, that's kind of a proxy for saying, maybe they trust us and the federal system a little more. I mean we do have, again, that promise of confidentiality under the law that private firms can't promise. But I've actually done some qualitative research on this. A colleague of mine who is now at the University of Missouri-Kansas City, Michelle Saranova and I, did a research project where we actually gathered focus groups together and talked to people about this exact issue, and I think when people give information, even that they consider private to places like Google and Facebook, their immediate reaction is that they're getting a service, they're getting an immediate service, right? So they might have a privacy concern about something, but they want their Gmail, you know, so there's informed consent there because they consent to use the service, and so it's almost two separate ways of doing it because we're just asking for data and saying, hey, in the long run there's going to be a societal good. When we're looking at a lot of these private corporations, at least some of the big ones that we think about all the time, Google, Facebook, Twitter, there is this immediate "I'm getting something right away." So I think we still need to do more research on that and whether or not people are ignoring the privacy concerns the moment they say, I'm going to sign up for a Gmail account or I'm going to send this Gmail, or they just don't care.
Bailer: You raised an observation that I thought was really cool: the idea that there's a direct service. In some sense, you're being paid for the release of your privacy, and you're saying, in contrast, some of the information that you would be collecting with NHANES or NHIS or other surveys, there's a delay of this. How do you tell that story in such a way that someone would say, "If I'm participating in this health and nutrition survey, there is some benefit." Saying it's societal is different than knowing that I've got my email account. I mean there's that direct benefit. So how do you make that story real? How do you say, if NHANES were to disappear…on one level I was thinking, if NHANES were to disappear, that health and nutritional survey disappeared, what would be the impact on society? So can you talk a little bit about that?
Scanlon: Yeah, so let me actually go to a different survey to explain this. When I was at the Census Bureau, I worked on what's called the American Community Survey which is the new long-form census, so I think people used to get a really long form; I think it was like thirty pages, at every decennial census and now at the Census Bureau they've switched it to a yearly survey, where three and a half million people, I think, get the survey every year, and we're actually able to release annual data. So it's still a really long survey, and it's kind of burdensome, and it collects a lot of information that people kind of find a little off-putting. I mean it asks for the number of toilets you have in your house, you know, it asks for a lot of personal relationship information and stuff like that, and when I worked at the Census on this survey, one of our jobs was to tell that story, to say, if you don't give us this information, what happens then, why should you give us this data, and we would turn to things like: firehouses, schools, roads, so we need this information to do something that will directly impact your life, and yeah, it might not happen as quickly as getting an email, but if we collect this data and we know you're underserved by schools, well then maybe more funding can come into your locality so that you can get a new school or a new firehouse, or we could repair bridges. And so that's the kind of story that we need to tell, and it's on us, and this relates back to the privacy thing. We're telling the story not only to get past that privacy barrier, but even just to get the information in the first place because we're asking them to do this burdensome thing, to fill out a survey for half an hour to an hour.
Long: We're almost running out of time here today, but I wanted to kind of shift, Paul, because with your work with the Centers for Disease Control, you're also doing other kinds of research work and I know one thing that kind of struck me, we are in an era where different states are passing same sex marriage laws and things like that, but somehow the government sometimes doesn't get…the question is how accurate the government information might be on same sex couples, how many are there? And some of that probably relates to privacy issues too, and the way we fill out census data and things like that, is that true?
Scanlon: I think some of it probably has to do with privacy data and privacy, and whether or not people are willing to talk about that on a form. There's actually been a lot of research done about this and I've been involved. It's been an interagency committee with fifteen or so agencies and we've been looking at how to improve these statistics, especially now that the Supreme Court has struck down some key parts of DOMA, that the federal government now recognizes legally performed same sex marriages. We actually do need that data now, and we need valid data. So what we've done is research at the Census Bureau at NCHS, we've been looking at kind of changing the questions we have around household relationships and marital status and cohabitation so that we can more accurately capture whether people are actually in a same sex or opposite sex marriage relationship, or they're in a same sex unmarried relationship, we never had in previous forms whether or not you had a cohabitant. I live with my fiancé, so that wasn't captured before, so now we're going to do that, and that's all kind of tied together in making that data more valid, so now that we do have to deal, more states are passing laws, and now that the federal government has to recognize those marriages, we do need that accurate information, and we're working towards that, and hopefully by the next census we'll be able to provide some good data on that.
Long: John Bailer, time for a final question from you, too.
Bailer: I find this really exciting to think about the purpose and effort that goes into the variables that you measure and some of the critical components that go into these surveys that we see, that I don't think many people appreciate just how much effort goes into making a good survey. We probably all have experience with seeing bad ones. I think it's exciting to see the kind of effort and work that goes into it. I was wondering if you could talk about some of the future work, you had mentioned something in a previous discussion about this idea of a verbal autopsy project that you were going to do. That sounded really fascinating, I was wondering if you could share a little bit.
Scanlon: Sure, so the CDC is responsible for collecting vital statistics and we also help other countries collect those, and by vital statistics, I mean birth and death records, and the causes of death. So a verbal autopsy, and this is work we, at the CDC, are doing along with our partners at the WHO, is a way of collecting cause of death information where there aren't necessarily coroners or medical professionals who can give us that accurate information. What the WHO has come up with, and what we're helping revise, is a verbal questionnaire that we would go to somebody's loved ones or care providers and ask a bunch of medical kinds of questions, "Was this person taking this medicine? Did this person display these symptoms?" and go through this whole questionnaire and at the end, it will hopefully shoot out a cause of death so that we can improve those vital statistics and we're actually going to do some testing on this in Kenya in November, so I'll be heading there with some colleagues and hopefully we will start testing that in Kenya, and maybe other countries and eventually we want to be able to improve the vital statistics across the world.
Long: Paul Scanlon, great information, very interesting, and we really appreciate you being here. Again, Paul is a survey methodologist and research social scientist at the Centers for Disease Control and Prevention's National Center for Health Statistics, again, our pleasure to have you on Stats and Stories. All right, we want to just remind you too, that if you'd like to share some of your thoughts about our program, maybe topics you'd like to hear in the future, send us an email to firstname.lastname@example.org. Be sure to listen for future editions of Stats and Stories, where we'll talk about the stats behind the stories and the stories behind the statistics.