Writing the Book on R | Stats + Short Stories Episode 222 / by Stats Stories

Roger D. Peng (@rdpeng) is a Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and the Co-Director of the Johns Hopkins Data Science Lab. His research focuses on the development of statistical methods for addressing environmental health problems and he has made major contributions to our understanding of the health effects of indoor and outdoor air pollution.

Episode Description

Impacting statistical and data science communities is an aspiration that many of us share. Outlets for such impact include work environments where we may collaborate with interdisciplinary teams. Other, newer outlets include podcasting and a variety of publishing platforms. Today we will explore the origin story of such a contributor with guest Roger Peng.

Full Transcript

John Bailer
Impacting statistical and data science communities is an aspiration that many of us share. Outlets for such impact include work environments where we may collaborate with interdisciplinary teams. Other, newer outlets include podcasting and a variety of publishing platforms. Today we will explore the origin story of such a contributor. I'm John Bailer. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me as always is regular panelist Rosemary Pennington of Miami's MJF department. Our guest today is Roger Peng. Peng is a professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health and the co-director of the Johns Hopkins Data Science Lab. His research focuses on the development of statistical methods for addressing environmental health problems, and he's made major contributions to our understanding of the health effects of indoor and outdoor air pollution. Peng is the author of the popular book R Programming for Data Science and 10 other books on data science and statistics. He's also a fellow of the American Statistical Association. Roger, thank you for being here.

Roger Peng
Thank you for having me.

John Bailer
You know, Roger, I've got to tell you, I'm a huge fan of the MOOCs and Leanpub. I've recommended to a number of students that they take your MOOCs for R programming, and some of the other courses in that sequence, as background preparation when we didn't have some of that formal coursework in our department. So I'm just curious: how do you go from doing academic biostatistics work to jumping into MOOCs, the massive open online courses, and then publishing via Leanpub? Just let us know how this started.

Roger Peng
Ah, yeah. So I think one of the things that has been really helpful for me is just being in a department that lets you do whatever, you know. It's been very supportive, and I think it's unique in that way. Ever since I got to Johns Hopkins, now over 18 years ago, a bunch of us had been talking about how we should be putting content online, on the web. But the reality was that there was no mechanism to do it. We would have had to build our own content management systems and run our own servers, and it would have been a nightmare, right? So we talked about it for a long time but didn't do anything about it. At some point these platforms started coming online. One of them was Coursera, but there were other ones too, and they came to the university saying, hey, do you have any people who want to do this? It's one of those cases where luck favors the prepared mind. We had been talking about this forever, and here they were with this golden opportunity, exactly what we needed. They had a platform, they had a delivery system, they had a whole thing that we wouldn't have to build ourselves. So we thought, we'll give it a try. We started with a few courses at once in the beginning, and then we launched our whole data science program. I think in that time we really benefited from a kind of first-mover advantage. We had one of the first sequences up there, and there was kind of nothing else out there for a long time. Even in 2013 or so, it wasn't immediately clear where the data science thing was going.
And so I think it helped that data science really grew, and our program was just sitting there waiting for people to take it.

Rosemary Pennington
I know you've written this very popular book about R. I wonder how you approached writing it, because it feels like such a massive thing to get your arms around and figure out what someone needs to know. So I wonder if you could talk about what your approach to writing that book was?

Roger Peng
Well, my approach to writing any book like this is to travel back in time about 10 years and then write the book, because, frankly, it was a lot easier to write that book in 2013 than it would be now. I've been using R since 1997, and it was a much smaller language back then. Even in the early or mid 2000s it was still a much smaller language. I had been teaching it for a long time and had a lot of notes, so that book essentially assembled all my notes together. It really covers the base R structures and foundations. I think one of the nice things about it is that it has a little longevity, because I didn't really cover anything that was cutting edge at the time. When I look back at it, I think almost everything in the book is still correct. But the universe of R has grown so much since then, so it's obviously missing a lot; many of the things I use every day are not in that book. But you have to make choices. I think that's one thing we really emphasize in our program: we have to make choices about what to teach and what not to teach, and the more choices we make for the students, the better it is.

John Bailer
So what kind of biostatistical work is the focus of your research these days?

Roger Peng
So I'm primarily involved in studies that look at the health effects of indoor and outdoor air pollution. Outdoor air pollution is what you would see outside when, you know, Hillary looks out her window and there's a wildfire haze; that's outdoor air pollution. Then there's indoor air pollution. There are a lot of types of pollutants inside the home, whether it's secondhand smoke, allergens, or mold, so there's a complex environment inside people's homes. One of the things that's interesting about that work is that we often do interventions, where we try to modify people's homes and improve them to reduce disease morbidity. And there are a lot of interesting statistical methods associated with these kinds of studies, whether it's causal inference or mediation in the intervention trials, or time series analysis and spatial statistics in the outdoor air pollution work.

John Bailer
Well, thank you so much, Roger, for taking the time to join us on this episode of Stats and Short Stories. It's a pleasure to chat with you. Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.