The Urban Data Platform | Stats + Stories Episode 331 / by Stats Stories

Kathy Ensor is a leading national voice in statistics and data science and a recognized expert in the methodological development and application of statistics to advance wisdom, knowledge, and innovation. She is the Noah G. Harding Professor of Statistics at Rice University and director of the Center for Computational Finance and Economic Systems. She served as chair of the Department of Statistics from 1999 through 2013 and is the creator of the Kinder Institute’s Urban Data Platform. Ensor’s research specializes in understanding dependent data and developing computational statistical methods to solve practical problems. Ensor served as the 117th president of the American Statistical Association (ASA), heading the ASA board of directors, and has represented the statistics profession on numerous national boards.


Episode Description

Community leaders regularly make decisions that impact the lives of community members, from where green space will be located to what businesses to approve to what public health interventions to put in place. There’s a growing recognition that such decisions should be informed by data that come from the community itself. Community analytics are the focus of this episode of Stats and Stories with guest Kathy Ensor.

Full Transcript

Rosemary Pennington
Community leaders regularly make decisions that impact the lives of community members, from where green space will be located to what businesses to approve to what public health interventions to put in place. There's a growing recognition that such decisions should be informed by data that come from the community itself. Community analytics are the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Kathy Ensor. Ensor is the Noah G. Harding Professor of Statistics at Rice University and director of the Center for Computational Finance and Economic Systems. Her research specializes in understanding dependent data and developing computational statistical methods to solve practical problems. Ensor served as the 117th president of the American Statistical Association and is a fellow of the ASA and the AAAS. Kathy, thank you so much for joining us today.

Kathy Ensor
My pleasure, I'm happy to be here. Thank you for the invitation.

John Bailer
Kathy, it is just a complete delight to have you join us virtually. It'd be even better if you were here in person, but, you know, we've got to take what we can get. So I'd like to have you start by telling us a little bit of an origin story. You were involved in starting something called the Urban Data Platform, and I was hoping you could give us a little bit of background. How did that start?

Kathy Ensor
Yeah, sure. That is definitely a project of passion, and I committed a significant amount of time to get that up and going. So I'm happy to tell you why and how we got going on it. About 20 years ago, I'd say, I started working in this area that I've been calling either community or urban analytics, where I bring the talents that I've developed to our communities. And that, in itself, brings a lot of data to us. Some of it, like when I work with the city, is confidential data, and that raises high hurdles in terms of how you interact with that data. So we need some infrastructure to do that. For a long time, Rice helped us piece it together. And then, I'll say 2015, I guess, Rice started a new institute, the Kinder Institute for Urban Research. We saw an opportunity to pull that type of data together and to create something new, and that was the initiation of the Urban Data Platform. I was asked to be the founding director of that effort. And today it's an incredibly strong component of the Kinder Institute, which is headed up by Ruth Turley here at Rice.

Rosemary Pennington
So you talked about this idea of community analytics. What sorts of things are you looking at?

Kathy Ensor
So my early work was on air pollution and health. We focused on finding causal relationships: when does air pollution contribute to cardiac arrest, for example? And this was not particulates; this was ozone and NO2. We were also trying to identify the asthma signals for the Houston region based on this. To do that, we had every single EMS call over a 10-year period. Yeah, that's a wow moment. And it was in collaboration with the city of Houston, because Houston was very progressive about wanting to find solutions, not just find problems. So the question was, could we use our talents to help guide solutions? And I'm proud to say that we have an asthma air aware day alert that's been going out for over six years, on a daily basis if there's an issue.

John Bailer
Oh, that's really cool. I love how the goal is stated for your institute: to advance knowledge and information about Houston's people, government and built environment. It's really fascinating that you've had this connection directly with the city as well. So was that there right from the start, that partnership between the efforts to build this data platform and the city officials themselves?

Kathy Ensor
Yeah, I think so. But, you know, we see the Urban Data Platform as transcending the city. It actually serves a 15-county region with Houston in the center. For example, one of the data products that I have contributed to the Urban Data Platform is a 22-year, six-month time-slot land use and land cover satellite product over the 15-county region. So the Urban Data Platform is supposed to serve this region of the country, pulling together all of the really fascinating datasets. Each dataset on the Urban Data Platform actually does get a DOI, so it's registered, it has a permanent home, and it can be referenced by others. And the Kinder Institute has continued to grow this database, and in addition to using it for their own research, it's available for registered users as well.

Rosemary Pennington
You mentioned how the work that you did on air quality sort of produced this Asthma Awareness Day. I wonder, are there other moments from your work, whether with this platform or something else, where this kind of community analytics has led to policymakers picking it up, that you're particularly excited about?

Kathy Ensor
Well, sure. I mean, you know, we all rolled up our sleeves during COVID. I was definitely positioned to help the city of Houston and was happy to do that through COVID. And so there were a couple of things. One, we did, early on, do a seroprevalence study of Houston, when nobody really understood the extent of what was happening within any of the large cities. And I love that project, because it actually drew upon all of my statistics knowledge. I actually got to go back and do some sample design. And it was really a lot. You know, I mean, the times were serious, the project was serious, the endpoint of the project was serious. But I appreciated the opportunity to really roll up my sleeves and use my wide breadth of statistics knowledge. So we actually went into the homes of over 600 people in Houston and asked them to roll up their sleeves and donate blood for a seroprevalence study. And they did it. And the levels in Houston at the time of our study showed that they were a factor of three higher than what was originally understood. And that was huge.

Rosemary Pennington
Just how did that data inform the decisions that were being made by the leaders in the city?

Kathy Ensor
Yeah, so we have a paper on it, and the leaders of the city actually speak to that issue. But just informally: first of all, we were able to do it by neighborhood, so we were able to identify the most vulnerable communities. And the Houston health department was proactive about trying to get the messaging out into those communities about, you know, the levers that we had to protect ourselves. And also, we actually found a huge differential in the city between some neighborhoods and others, and that was really important information as well, just to help the health department help our community. You know, we didn't have very many tools as individuals early on, but the ones that we did have helped save lives. And so the health department was able to do really strong education plans. Houston is very international, so we'll have communities that only speak Vietnamese, or only speak Chinese, or only speak Spanish. For the health department to launch education, they have to make sure that the education materials are speaking to the community that they're trying to reach.

John Bailer
You know, I was really impressed by the breadth of topics that are part of this dataset. I mean, there were something like 25 different tabs' worth of datasets there, with lots of business and jobs and unemployment, lots of real estate, lots of education, lots of environmental components, including things like tree canopy. I'm just curious, how do datasets, data products, get populated? Do you ever have requests, like, can you build a dataset for this? Or what's the inspiration for something being added to this collection?

Kathy Ensor
Well, it is a resource question. In the first six years, our mission was to envision first, build second, and that took half of our six years. Once we had built the infrastructure, then the team, which was really a phenomenal team, I mean, just phenomenal, went out and pulled together as many datasets as we thought that we could curate. We tried to prioritize the ones that we thought would have lasting value. That ideology continues. And as you know, I am a researcher, so I'm obviously still doing research for the city of Houston, or for the greater 15-county region. So I could submit a dataset for publication on the Urban Data Platform. Anybody who studies this region can actually submit a dataset, and we hope that they do. I haven't checked in to see how well that's going over the last year. But, you know, early on we did have users who would contribute their own data.

Rosemary Pennington
You're listening to Stats and Stories, and our guest today is Rice University's Kathy Ensor.

John Bailer
Yeah, it's interesting, you're talking about the curation of some of the datasets because one of the things I noticed is things like the American Community Survey, it looks like you were doing some extracts of it that would be relevant for this 15 county region. So you were looking at some existing sources and saying, Okay, what's the component of that that we really need for our community?

Kathy Ensor
That's right, that's right. And the Kinder Institute designed its own unique population units. And so a lot of it is curated to those population units as well.

John Bailer
So you know, as you've looked at this, what's kind of the most surprising result that you've had from one of these datasets, or that someone has extracted from one of these datasets?

Kathy Ensor
You know, one of the stories I'll tell is kind of a good story, actually. When we started doing the tax data for housing, you know, we're doing the curation, we're doing the geocoding, so we find mistakes. And it was great, because there was a feedback loop with Harris County, which is the primary county that covers Houston: we would tell them what we found, and then they would make corrections. So there was an iteration there. And, you know, government agencies don't always have the time to do the level of curation that data scientists, statisticians and sociologists would prefer. So I liked that story, because it brought a different kind of value, one that we didn't really expect.

John Bailer
You know, it's funny that you mention that. We were involved in projects looking at some of the data that the state of Ohio was producing for COVID, and we had the experience of working with this, with some visualization components in classes, where the dataset changed. It was just kind of a redo. I think it was one of those times when data is being populated into some product and the circumstances are changing rapidly, so it can't always be designed in advance. So I can well imagine that there would be many challenges in trying to build something that's reliable, stable, and can be readily updated.

Kathy Ensor
Right. And the datasets do have revisions, so we built a revisioning process into the system. But, you know, you hope that most of it will be relatively stable. The other aspect of the Urban Data Platform that I think is important to remember is that it is permanent. Rice University has committed to the permanence of this Urban Data Platform. The DOI that is issued costs money, and it costs money to store this data. That's a university commitment to keep that going within the framework of the Kinder Institute for Urban Research. I think that we should be careful not to undervalue that, because that's a huge, huge commitment by a university just to maintain the information that so many of us rely on in answering questions around community and urban analytics. You know, the datasets were originally sitting in so many different places. Most of them are public, but it takes so long to find them. So we see it as a one-stop shop for this kind of information. And the other aspect that shouldn't get lost is that the Urban Data Platform also has that secure computing environment to provide the infrastructure for key questions using private data under our rigid IRB protocols. You know, you have to have a need to be able to do that.

Rosemary Pennington
So Kathy, I'm going to shift just a little bit, because I know you gave a talk at the AAAS in, I think, February, about wastewater monitoring in Houston. I'd love to hear about that project. But the theme of your talk was also sort of how to communicate risk effectively. So I'm just going to ask if you can give us some information about what the wastewater monitoring project was, and why you were tying it to this theme of communicating risk.

Kathy Ensor
Ah, sure. So the wastewater project is huge; it's a bigger conversation in itself. But that data, and a synthesis of that data, sits on the Urban Data Platform. What wastewater epidemiology does is give us the opportunity to understand viruses and activities by humans through our wastewater, in a population surveillance mode. So it's not that we're getting information on individuals; we're getting information on, you know, half a million people at one time. Through COVID, wastewater epidemiology proved to be an incredibly valuable tool for communities, cities, states, nations. Canada is really setting this program up from Indiana across our nation. And we had the opportunity in Houston to build what I would say is one of the strongest programs globally. I don't think I'm bragging to say that the infrastructure in Houston worked well for wastewater epidemiology, and we were able to pull in enough resources to do it like we wanted to. We are still going today. So that's the last three years of my life, moving from the Urban Data Platform to wastewater epidemiology, in service of our community. My job was to build out the whole analytics piece of that wastewater epidemiology program. And as we get the information, it's, you know, how do we share this with the public? And so we did decide, or the city did decide, I have to say. In any of these partnerships with the city, you have to let them drive the messaging, because, you know, I'm not a public health official; I don't know how to effectively talk directly to the public. But we did stand up a public dashboard. Pretty much the whole world did. But it's used on a regular basis. Even I use it. I can go and say, oh, I might not go to this part of the city, because the wastewater levels are really high, which means that there's a high level within the community and my exposure probability is high. So people got used to that. And we have evidence that this information proved very useful to individuals. It definitely was useful to the leadership of the region, and the leadership at the Texas Medical Center as they went through their own planning. But it also was helpful for individuals as they, you know, personally assessed their risk.

John Bailer
So can you talk a little bit about some of the components of this? So I'm sort of picturing you know, you're starting with kind of the individual's contribution to the wastewater stream. But then ultimately, you're having to set up a series of monitors, and some models for surveillance. So if you could just talk about some of the points on that process?

Kathy Ensor
Yeah, so in Houston, we measure 200 locations on a weekly basis, and 39 of those locations are wastewater treatment plants. The wastewater is obtained before it's cleaned, I guess, and then it's taken to the lab. The lab geniuses do their work, and then they send me numbers that represent copies per liter. I also have information on the flow through the wastewater. So think of those 39 treatment plants as my base, which covers the 2.1 million people in Houston. Just think of a spatiotemporal model of 39 observations, highly nonlinear because the levels are going up and down. I'm able to take those and come up with what I call nowcasting of the levels of COVID within communities, and then we can pretty much forecast at least two weeks in advance, so we have good trajectories of what's going to happen. So that's 39 of our roughly 200 places that we sample. But you can go more locally. Think schools, think nursing homes, think jails: these communities can be sampled as well. And that gives you an immediate action point. If you see a COVID flare-up in a nursing home, there is an immediate action point, right? So we use the 39 treatment plants to talk about neighborhoods, but we're also able to sample at other locations, where we're able to speak directly to what might be going on in that smaller community.
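
As a rough illustration of how the two inputs mentioned above (lab concentrations in copies per liter and the flow through each plant) could be combined, here is a minimal Python sketch. It is not the Houston team's model: the `PlantSample` class, the function names, and every number are hypothetical, and multiplying concentration by flow to get a daily load, then dividing by the population served, is only one common way to make plants of different sizes comparable.

```python
from dataclasses import dataclass


@dataclass
class PlantSample:
    """One weekly sample from one treatment plant (all values hypothetical)."""
    plant: str
    copies_per_liter: float      # lab result: viral copies per liter
    flow_liters_per_day: float   # flow through the plant
    population: int              # people served by the plant


def viral_load(sample: PlantSample) -> float:
    """Daily viral load in copies per day: concentration times flow."""
    return sample.copies_per_liter * sample.flow_liters_per_day


def per_capita_load(sample: PlantSample) -> float:
    """Daily load divided by the population the plant serves."""
    return viral_load(sample) / sample.population


def regional_per_capita_load(samples: list[PlantSample]) -> float:
    """Total load across plants divided by total population served."""
    total_load = sum(viral_load(s) for s in samples)
    total_population = sum(s.population for s in samples)
    return total_load / total_population


# Two made-up plants, only to show the arithmetic.
samples = [
    PlantSample("plant_a", 1000.0, 2_000_000.0, 100_000),
    PlantSample("plant_b", 500.0, 1_000_000.0, 50_000),
]
for s in samples:
    print(s.plant, per_capita_load(s))
print("region", regional_per_capita_load(samples))
```

A spatiotemporal nowcast or forecast like the one described in the interview would sit on top of a series of such weekly values; this sketch stops at the normalization step.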

John Bailer
So if you're doing this process, and you're modeling it, and you see a spike at one of these wastewater treatment plants, does that trigger the follow-up, where you say, okay, what are some of the places where the most susceptible might be present? You mentioned that congregate living communities are kind of at risk. So does that then trigger this other cascade of sampling?

Kathy Ensor
Well, not necessarily. It might trigger sampling. Right now, we're moving to our steady-state cost-reduction model, where we're not going to be sampling quite as much. So we are looking at that hierarchical approach: where should we consistently sample? And then if we see a spike, we can go in and sample more intensely. So the samples are taken on Monday, we get the data on Friday, and we have our answers, our analysis, by Friday afternoon. It goes to the city leaders, and then on Tuesday there was always a data-to-action plan. The city of Houston leaders would take this information and act. And that's the important piece. If there's no action on what we're collecting, there's no reason to do it, right? And I think this goes back to the AAAS talk on risk: it wasn't just how to communicate with individuals, but how do we communicate with the city leaders and the leaders of the TMC, in terms of providing them information that is useful? In that first six months, we piloted a lot of different messaging strategies. And then once they said, yes, this works for us, this is the level of information we need, this is something we can act on, we operationalized it, and we produce it every week.

Rosemary Pennington
It's clear that you care very deeply about this work, what drew you to it?

Kathy Ensor
You know, I think, what drew any of us to the COVID table, right? I mean, we were all faced with a world we never saw before, one that we didn't understand, and many, many, many people did their part. And this was a place where I could help. I think that's really it. You know, where could I contribute in a real way? Not in a gosh-I-want-to-talk-about-it-in-my-class way, but where can I help our community mitigate the consequences of the global pandemic? But, you know, I'm not alone, right? So many in our statistics community, and the science community in general, really raised their hand and said, what do you need, what can I do? And that's all it is, and this is where I landed. The opportunity was before me, and the question I had was, do I want to do it or not? And, you know, I did. Because everything was moving so quickly, and we were all sitting in our homes, isolated, it did require Kathy to remember how to program. So to all those tidyverse users out there: I really came out strong and fast. All I can say is I wish I had had a copilot at that time to help me. So, getting the analytics, you know, no one had ever done this; I had no model of what to do. There's four of us that led it. There's the public health official for Houston; he was part of our team. But the operational team was me on the analytic side, and Lauren Stadler, the wastewater epidemiology person; she's an engineer who really understands this. And then Loren Hopkins plays the critical role of the go-between for us and the decision makers. So, so many really serious meetings. For the first year, Baylor College of Medicine was also part of this. And then when we needed not to sample so much, they went off, and they still have a huge effort going on, so there's kind of like two efforts underway. But it was really intense there that first year. And, you know, those 24-hour programming sessions were not rare. And, like, I'm not 20. That's just what I did when I was 20. But yeah, from that point of view, it was kind of fun. And I'm probably nicer to my graduate students as well.

John Bailer
You know, you talked about the results being presented as a data-to-action plan. And, you know, Kathy, you've kind of been living this data analytics, data science, statistics-to-action life. I mean, it's really neat. So what's next? What's the next challenge?

Kathy Ensor
Well, you know, I do like to close things off well. So for the wastewater epidemiology we do, just last week, actually, we finally got the big grant to Rice that will allow us to make that transition to something that is ongoing. And not only us, but the city as well. The money came from CDC, through the city of Houston. So I'm excited that we're going to be able to not let this just be a flash in the pan for COVID, but transition more to being part of the health department's infrastructure. And, you know, once you actually move things into their stationary point, then you begin to look at new opportunities. I am starting to go back to some of the air quality and health questions, and I have strong collaborators who have moved to Chicago, with good connections. So I could see trying to make it broader, beyond Houston, in terms of some of the air quality and health pieces that we've done. But, you know, who knows, right? I haven't given it too much thought, although I will say I am thinking about it.

Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. Kathy, thank you so much for being here today.

John Bailer
Thanks, Kathy.

Rosemary Pennington
Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter @StatsandStories, Apple Podcasts or other places where you find podcasts. If you'd like to share your thoughts about the program, send an email to statsandstories@miamioh.edu, or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.