Historical Data Finding | Stats + Stories Episode 334 / by Stats Stories

Hanley is a professor of biostatistics in the Faculty of Medicine at McGill University. His work has received several awards, including the Statistical Society of Canada Award for Impact of Applied and Collaborative Work and the Canadian Society for Epidemiology and Biostatistics Lifetime Achievement Award.


Episode Description

We leave data behind as we travel across the internet, our preferences and purchases transforming into a veritable goldmine of information for companies hoping to convince us to buy their new product or service. We often imagine this data mining and tracking as an invention of the so-called information age, but Victorians were tracking and mining data too. That's the focus of this episode of Stats and Stories with Dr. James Hanley.

Full Transcript

Rosemary Pennington
We leave data behind as we travel across the internet, our preferences and purchases transforming into a veritable goldmine of information for companies hoping to convince us to buy their new product or service. We often imagine this data mining and tracking as an invention of the so-called information age, but Victorians were tracking and mining data too. That's the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is our regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is James Hanley. Hanley is a just-retired professor of biostatistics in the Faculty of Medicine at McGill University. His research interests include the history of public health, epidemiology, and mathematical statistics. Hanley's work has received several awards, including the Statistical Society of Canada Award for Impact of Applied and Collaborative Work and the Canadian Society for Epidemiology and Biostatistics Lifetime Achievement Award. He's also the co-author, with Elizabeth Turner, of an article in Significance about Victorian data mining. James, thank you so much for joining us today.

James Hanley
I'm delighted to talk to you guys.

Rosemary Pennington
So your article in Significance begins with a discussion of a grocer that sold coffee and started weighing its customers. Why did they start doing that?

James Hanley
In the 21st century we think loyalty programs are a new idea. But like a lot of things, there are no new ideas, just old ones recycled. This was a loyalty program. I'm sure they were very smart people trying to stay alive and keep their business alive.

Rosemary Pennington
So in London, you could come in and get weighed, and that was going to bring you back to the store.

James Hanley
Yes. And as I say at the end of the article, it even worked when I was growing up. I grew up on a small island off the coast of Ireland, about a mile offshore, and we only got to the mainland for the big goods, the dry goods and that sort of thing, twice a year in the summer. There was a man there who sold flour in big bags, and he would put us up on the balance beam as well. He would say to us, "Oh, what is your mother feeding this fellow? He's growing so well." So that was the service, a little side service that cost nothing but made sure my mother didn't go across to the other street, where the other merchant ran his business. So it's not a new idea.

John Bailer
I love this idea that getting your weight was a service provided by people selling goods. It's so easy for us in modern times to step on the scale in the bathroom in the morning and say "oh no," or whatever.

James Hanley
And track it on your iPhone as well. Exactly.

John Bailer
So there's this history of measuring weight that emerged as part of your story, and you've started to talk about where things were. How did things change over time with the expansion, what you call the democratization, of self-tracking of weight?

James Hanley
I remember as a kid in the 1960s and 70s, the first time I took a train, there was a big weighing scale at the train station, with a big circular dial and a hand. You put in a penny, I think, and it showed your weight, and I think for another penny it would print out a little card with a punch hole in it showing where your weight was. Weighing scales came into the home, I think, in the 30s, 40s and 50s, but they were in public places in the late 1800s. And Barry says he was surprised that weighing scales came so late to Ireland, but they were probably 20 or 30 years ahead in the USA. We had weighing scales on our farm, but they were spring scales, and that's how my parents sometimes weighed our siblings. You took the diaper, which was a cloth diaper in those days, stuck a little hook into its four corners, and then weighed the baby to make sure the baby was doing okay. That was growing up in rural Ireland, and I'm sure rural America had exactly the same thing. So farm instruments were very multi-purpose, and it was important to know your weight.

John Bailer
So can you talk a little bit about what data was recorded in the shop, where people were going to buy their coffee and getting weighed?

James Hanley
We didn't actually show their names; I had to cover them up in the article, because there was an agreement about confidentiality: these books were only to be consulted by you yourself. The ledgers were kept behind the counter. All they did was put your name down first and open up a chunk of the page for you. They put the date you came in, and they put your weight in stones and pounds, to the nearest quarter of a pound, because they had the instrument right there: a really fancy instrument, with weights, that was good to a quarter of a pound. We weighed ourselves on it in 2009. And then sometimes there was a remark: heavy boots, overcoat. When they started weighing the women, the women always had excuses for why they were heavier or lighter, something like that. So just that, and then serially, one row after the other, they just kept going. There are one or two customers whose records are a page long, with 70 or 80 entries. So that's how it worked; very, very little was recorded. And the thing that really troubled us is that we didn't have their ages, and neither did Galton. That's why he picked a very public group: members of Parliament in England, members of the House of Lords. They lived nearby in the wintertime when Parliament was in session, so he could get their birthdays from public sources. He needed those to calculate their ages. Otherwise you didn't know their ages; most of them were adults, and if someone kept coming in for 50 years, you could make a rough guess. In the ladies' ledgers, there are a few instances where the ladies brought their daughters or sons with them as well. In one of my graphs, you'll see that I had to guess at what age a daughter was, because she hadn't reached 18 yet. I don't know her age exactly, so I slid her record along until the point where she stopped growing, and placed it that way.
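The ledger entries Hanley describes give weight in stones and pounds, to the nearest quarter pound. As a minimal sketch, assuming the standard imperial conversion of 14 pounds to the stone (the helper names are hypothetical, not from the article), an entry could be converted to pounds or kilograms before analysis like this:

```python
# Hypothetical helpers, not from the article: convert a ledger entry
# recorded in stones and pounds into pounds and kilograms.
POUNDS_PER_STONE = 14        # 1 stone = 14 lb (imperial)
KG_PER_POUND = 0.45359237    # exact definition of the avoirdupois pound


def ledger_to_pounds(stones: int, pounds: float) -> float:
    """Total weight in pounds, rounded to the ledger's quarter-pound precision."""
    total = stones * POUNDS_PER_STONE + pounds
    return round(total * 4) / 4


def ledger_to_kg(stones: int, pounds: float) -> float:
    """Total weight in kilograms."""
    return ledger_to_pounds(stones, pounds) * KG_PER_POUND


# Example: an entry of 11 st 3 1/4 lb
print(ledger_to_pounds(11, 3.25))         # 157.25
print(round(ledger_to_kg(11, 3.25), 2))   # 71.33
```

Rounding to quarter pounds here only mirrors the stated precision of the shop's scale; a real transcription would keep whatever the clerk actually wrote.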

Rosemary Pennington
I was going to ask what made Francis Galton want to study this data. How did he find out about it, and why did he study it?

James Hanley
Oh, Francis Galton measured everything that could move or think or do anything biologically. He was measuring all the time; he would have measuring instruments in his pocket, walking down the street, measuring the beauty of English women. He was more seriously interested in heredity, so he was growing peas early on, and then he tried to get data on children. He was a first cousin of Darwin, and he got turned on to this area when Darwin published his book on evolution; that's when Galton turned to this side. Before that he had been interested in geography and other things, and he had discovered anticyclones in meteorology. He was an amazing polymath. Of course, he had some bad ideas too, on race and eugenics, but we take the good with the bad. So he was just interested in everything. And later on, coming up to the 20th century, the British were fighting overseas. They had been in the Crimea, and at the very end of the century they were in the Boer War, and the British psyche, especially among the thinkers, was starting to lose confidence in itself, to worry that the British race was going downhill. So looking back at it now, I think he was trying to document that the British race was getting soft, that they were losing their mettle. In fact, we say that the analysis is pretty awful. Here he had lovely longitudinal data, but he didn't know what to do with it, and his longitudinal analysis is really bad. His curves are fit by hand, and for the last generation he put a straight line where I'm absolutely sure, if he had followed them properly, it would have curved downwards as well.
In an earlier version of the paper, I wrote about what a curve of weight against age should look like: it should dip down when you come to 60 or 70; you should start going down. Even Shakespeare says that in his seven ages of man: most of the time, people lose weight at the end. But in the third curve, the one for the youngest generation, Galton had it going up, and that gave him a chance to complain that this generation was out of control, basically. So I think it was political as much as anything; he talked about "our race" and so on. But he also collected anthropometric data cross-sectionally, in an anthropometric laboratory he set up at an international exhibition in England in the 1880s, I think. So he was interested in measurement of all kinds. He never calculated a standard deviation. I don't know how well known standard deviations were back then, but he never calculated one; he always used the interquartile range, or half of it. And he had very clever ways of summarizing data as well, which we can come to in a minute.
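Galton summarized spread with quartiles rather than a standard deviation. For normally distributed data the two are linked: half the interquartile range (the semi-interquartile range, once called the probable error) is about 0.6745 standard deviations. The sketch below illustrates that relationship under a normality assumption; it is not Galton's own procedure, and the function names and simulated weights are hypothetical:

```python
import statistics

# Upper quartile of the standard normal, about 0.6745. For normal data the
# semi-interquartile range (half the IQR) equals Z75 * sigma.
Z75 = statistics.NormalDist().inv_cdf(0.75)


def semi_iqr(values):
    """Half the distance between the first and third sample quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


def sd_from_semi_iqr(siqr):
    """Standard deviation implied by a semi-IQR, assuming normality."""
    return siqr / Z75


# Simulated weights (illustrative, not ledger data): mean 150 lb, SD 20 lb.
weights = statistics.NormalDist(mu=150, sigma=20).samples(10_000, seed=1)
print(round(semi_iqr(weights), 1))                    # near 0.6745 * 20, about 13.5
print(round(sd_from_semi_iqr(semi_iqr(weights)), 1))  # near 20
```

The conversion only holds if the data really are close to normal; for skewed data the quartiles and the standard deviation carry different information.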

Rosemary Pennington
You're listening to Stats and Stories, and our guest today is James Hanley.

John Bailer
One of the things you talked about is these trajectories: when Galton could identify an individual, he had their measurements over time. And Galton looked at within-person fluctuation, as well as the intergenerational differences. Could you talk a little bit about how Galton evaluated this within-person fluctuation?

James Hanley
That's really fascinating to me. He wasn't a modeler; although he loved the normal distribution, he never calculated a standard deviation. He worked from the quantiles. But this was a piece of genius to me. He didn't say it this way, but imagine he was telling me: "So you're going home from the pub in Ireland, Hanley, and it's a straight line to your house, but you wander all over the place. If you could measure the length of the track you actually took, and put that as a ratio over the straight-line distance, as the crow flies, that little ratio will tell you how wavy the line is." It was magic to me when I saw it. I couldn't believe it; you can explain this to anybody, using that road. When the editor of Significance said to me, "Jim, you'll have to explain what that means," because Galton's language is convoluted, I said: just go on Google and take two alternative routes from one city to another. There's the highway, which is probably the straight-line route, and then there are the local roads that wave back and forth. If your car recorded both automatically, you could take the ratio of one to the other, and that's a measure of the volatility, the unreliability, of the data. So that's what he did. And he had a man who measured it with a little machine, the same kind of machine with a wheel that you use for measuring distances. He was a genius at instrumentation and measurement. The only trouble I see is that the statistical properties of this estimator would be miserable.
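The ratio Hanley describes, path length over straight-line distance, is easy to compute once a trajectory is treated as a series of points joined by straight segments. A minimal sketch, assuming each customer's record is a list of (time, weight) pairs and using a hypothetical function name; it illustrates the idea and is not Galton's mechanical measurement:

```python
import math


def wiggliness(points):
    """Galton-style ratio: length of the wandering path divided by the
    straight-line ("as the crow flies") distance between its endpoints.
    Equals 1.0 for a straight path and grows as the path wanders."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("endpoints coincide; the ratio is undefined")
    return path / chord


# Hypothetical (year, weight) series for one customer; the ratio depends on
# how years and pounds are scaled against each other.
steady = [(0, 150), (1, 151), (2, 152), (3, 153)]
erratic = [(0, 150), (1, 156), (2, 149), (3, 153)]
print(round(wiggliness(steady), 3))   # 1.0
print(round(wiggliness(erratic), 3))  # 4.072
```

One practical wrinkle: stretching the time axis relative to the weight axis changes the ratio, so comparing people would require fixing a common scale first.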

Rosemary Pennington
Being a journalism professor, I was interested that the New York Times actually wrote an editorial about Galton's research. Why do you think the New York Times picked up on this?

James Hanley
Well, I'd have to go back to what was on their desk. What were the political issues in 1884? I have a feeling, because they mention taxes and Gladstone, that maybe it had to do with the beer people were drinking at the time; Gladstone was trying to put import or export duties on something in England. I suspect it was political, and they used this: they had great fun saying the British were going to go soft and the House of Lords was going to go extinct. I think they were having fun with the British. I don't know what was going on in 1884 between the US and Britain, but they always loved to nudge and needle them a little bit: they think they're great, but they're going under. So I think it was political. There was a lot of tongue in cheek in it, but I'm not enough of a historian of those days to know otherwise. It was fun to see. And I didn't put this in the article, but I can tell you now, because Mr. Berry is gone from the company: when I showed it to Mr. Berry, who I think was the CEO, in 2014 or whatever year it was, he said, "I'm glad to see the British aristocracy, the British peers, have not disappeared." It took Tony Blair and the other guy, Gordon Brown, the finance minister who followed him, to take away the peerages in England. So there's politics everywhere. I left that out of the article because I don't think people would have understood.

Rosemary Pennington
I appreciated that the New York Times editorial even predicted that the House of Lords might, quote, "abolish itself" as its members grew thin and fragile, like the cultured Bostonians. I was like...

James Hanley
Yeah. There was obviously somebody on the desk, you know. What I was interested in is how quickly news traveled across the world at that time. We're writing another historical piece about a famous experiment in Paris, a human medical experiment with a vaccine, and how the news of it traveled to the US. I've tracked that, and it's great fun to see how news traveled back then. Most of the time, once it got to the US, every newspaper from here out to California just repeated it with exactly the same words. So the travel of news in those days is a story by itself.

John Bailer
This is such an amazing story. It's also amazing to me that there was still a Mr. Berry involved in running this company in 2014 for you to talk to.

James Hanley
Oh, they are so proud of that company; it's been there for 300 years. When we photographed the ledgers, he wasn't there, but he said, "I'm going to arrange for you to be brought into the boardroom." They have a small boardroom in the back, with bottles of Cutty Sark up on the wall. Cutty Sark was their first entry into whiskey; they blended it. And he was a genius there as well: when people were coming home from Europe after the war, they gave them all sorts of Cutty Sark to bring home. You have to be ahead of the curve when you're marketing liquor or coffee or any commodity, because some other guy is going to be smarter than you. So he was very proud of all that. He said they've stopped weighing people; if you go now, I don't think you can be weighed, because it's too much of a distraction, and they're not keeping up the records anymore. But he was fun to deal with. And their clients, you see, were next door: the queen, and now the King of England. They have a royal warrant. One of the things I did is put several videos on my website, and the link is at the bottom of the article, so you can actually visit and see the scales yourself. That's key, because you'll get the fun of it, and you can see that the people today are just as driven and proud. Then there's a video showing some of the early ledgers. They keep one or two behind the scenes now, in a locked cabinet, but you can still see them. They're all bound. They bound the ledgers, I guess, after people died; I don't know how they arranged it, since they couldn't know how many times you'd come in. They may have transcribed the numbers again into a different ledger so that they were easy to find. And there's an index to each book, so you can find your name and then go to the page your record is on.
So there's a page or half a page, whatever. But I imagine they waited until the end, when the record was finished.

John Bailer
One of the things I noticed is that you shared an earlier version of your paper that tracked how weight changes across different birth cohorts. Can you give us a quick summary of how weight trajectories have changed, up to the present cohorts?

James Hanley
Well, we're getting older, a lot older, and we're getting bigger, taller as well. That's a bit of a problem, because they didn't measure height, so you're kind of stuck with that; it's hard to compare now, since they did not record it. But we have gotten heavier; there's no doubt about that. You can see it in the earlier version. I'll have to put that on my website, so people can actually look at the American cohorts. I think you're a little ahead of the British there. The other thing that's interesting about weight is that back then, being thin was scary for people. You wouldn't be a great marriage prospect if you were thin; you were a better marriage prospect if you were a little more rounded. In fact, Pearson says that we pick the well-rounded people, and being good and strong was equated with heavy weight back then. If you were thin, it meant you might be dying of tuberculosis or something like that. So even the perception of weight, what counted as the good end of the scale and the poor end, has changed. So it's very different, but over time the landed class in those birth cohorts was holding its own with us now. I mean, look at the portraits of portly gentlemen, as I guess you'd call them.

Rosemary Pennington
This is such an interesting historical story. What do you think those of us who teach stats, or use stats, can learn from this, in our classrooms or in our own work?

James Hanley
I think, first, we should be interested in raw data and in collecting raw data. I don't think we should ever take data from the back of the book. There is no reason in the world now why we can't say to everybody, "Get out your phone and tell me how you were exercising during COVID. How did COVID change you? Mine your phone and pull that data out of there." In teaching, you have to make the data personal to the people, so it is relevant to them. That's number one. The closer the data are to yourself, the more relevant and the more interesting the story is, especially if it's something you don't know the answer to. Data about tossing coins and things like that, where you kind of know the answer already, are not very interesting. Even with data on how deep the oceans are, which is the subject of another article I've done, it's easy to go and get the data now and do it from scratch: to see the problems of getting the data, the problems of recording them, all the practical behind-the-scenes things before you get to do the analysis. So data collection, to me, is primary. The other thing is showing that a lot of our life involves statistics, even if we don't say it; there's statistical reasoning every day of our lives, whenever you think about something causal. That's how to make statistics relevant for young people. I don't have a phone myself, but if I were doing it now, I would be getting students to use phones for everything. I do get them to measure things: when students stand up and give a presentation, I have the rest of the class estimate the presenter's height. So, to me, it's about being practical and bringing the statistics of everyday life into the classroom, and not making it a course in mathematical statistics or mathematics.
To me, that's the biggest problem with the way we're teaching at the moment: we are teaching it as mathematics.

John Bailer
You know, I find it just marvelous that the examples you've just described, and are using in your class, sound like something the Berry brothers might have done. You're embracing anthropometric measurements and using them as your entree.

James Hanley
It was Galton who saw the value of them, you know. The other datasets I love, and sometimes use in class, are two unusual datasets that follow climate change. People are using ice core records and so on to determine what's changing, but the two I love are these. There's a 250-year sequence of data on when the flowers in your garden first come out, the first daffodils and the first everything; there's one from Britain where one family has been recording it on a consistent basis for 250 years. And there's another fantastic one from Alaska. It's a huge database, and it's hard to even work with; you have to sample it to figure it out. It's when the ice breaks up on a certain river in Alaska. People in Alaska didn't have a lot to do in the winter, so they would bet on when the ice was going to break. They put a pole in the middle of the ice on the river and tied a line from the pole to the nearest pub, or bar, and people bet on exactly which minute it would go. That's another story for another day; it's fantastic. The data are so carefully done: they record the exact minute, because that's what people bet on, and everything has to be right and transcribed right. So it's a fantastic dataset. And again, I think they published it in Science: you can see how the winters are changing in Alaska. And this is real data used imaginatively; it's a data mining project. The data were collected for the lottery, well, not the lottery, the competition, and beautifully recorded, except they're all on paper. I had to scan them even to get what I did, out of a PDF or something. So I think there are a lot of teaching implications in using real data in class. I'm working on another one to do with maternal ages.
Your mother's and father's ages when you were born have a lot to do with a lot of things. Mother's and father's heights are what Galton used for his first regression: your mother's and father's heights, and then your own. But there are others as well, to do with your parents' ages when you were born and how many mutations you have now, because the older you are as a parent, the more gene mutations you pass on. So I think we need to really start getting serious about teaching statistics properly.

Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. James, thank you so much for joining us today. Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter @StatsandStories, Apple Podcasts or other places where you find podcasts. If you'd like to share your thoughts about the program, send an email to statsandstories@miamioh.edu, or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.