The Data Economy | Stats + Stories Episode 213 / by Stats Stories

Harkness writes and presents BBC Radio 4 documentaries including the series FutureProofing and How To Disagree, and Are You A Numbers Person? for BBC World Service. She formed the UK’s first comedy science double-act with neuroscientist Dr. Helen Pilcher, and has performed scientific and mathematical comedy from Adelaide (Australia) to Pittsburgh PA with partners including Stand Up Mathematician Matt Parker and Socrates the rat. 

Her latest solo show, Take A Risk, hit the 2019 Edinburgh Festival Fringe with randomized audience participation and an electric shock machine. A fellow of the Royal Statistical Society, she’s a founder member of their Special Interest Group on Data Ethics. Timandra’s book Big Data: does size matter? was published by Bloomsbury Sigma in 2016.


Episode Description

Do you remember the first time you saw a prompt in social media asking about a product you were searching for on some other online platform? How about the first time you received coupons sent from your local grocery that incentivized buying your favorite consumable items? Today’s episode of Stats+Stories focuses on the origin, expansion, and future of the data economy with guest Timandra Harkness and guest host Brian Tarran.

Full Transcript

John Bailer
Do you remember the first time you saw a prompt in social media asking you about a product you were searching for on some other online platform? How about the first time you received a coupon sent from your local grocery that incentivized buying your favorite consumable items? Today's episode of Stats and Stories focuses on the origin, expansion and future of the data economy. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me as a panelist is Brian Tarran, editor of Significance magazine. Rosemary Pennington is away. Our guest is writer, comedian and presenter Timandra Harkness. Harkness writes and presents BBC Radio 4 documentaries and is the author of the book Big Data: Does Size Matter? She is a fellow of the Royal Statistical Society and a founding member of their special interest group on data ethics as well. More importantly, for today's program, she contributed a four-part series in Significance magazine on the data economy. Timandra, thank you so much for being here today.

Timandra Harkness
It's a pleasure.

John Bailer
Oh, it's so good to have you here again. I really enjoyed your series of columns. It was really interesting to see this collaboration between Significance and Impact, the magazine of the Market Research Society. But to start our conversation, I'd like to know: how do you define the data economy?

Timandra Harkness
That was quite a tricky point. Because when you say the history of the data economy, you think, well, that could be almost anything, because once you come up to today, there's very little in the economy that data isn't part of. But I suppose what we had in mind was something a bit more specific, which was the way data works in marketing and market research and distribution and public surveys. Because I think that's got a very specific history that goes back to maybe the mid 19th century, and obviously comes up to the present day, and is only getting more important in the future. So really, data about the public, if you like, and what the public are interested in, what the public want, what the public like and don't like, partly as it applies to marketing products. But there's also a huge crossover with the way that public bodies and governments use data to understand what the public are thinking and wanting and what they might go for or not go for, or even, I mean, lately, how we might behave. Although that's less a part of the series I wrote.

John Bailer
Yeah, I'm sure we'll get back to this issue about describing what you might do versus what you do as part of what's being measured. But I'm curious, and you know, both you and Brian can probably reflect on this: how did Significance end up pairing up with Impact on this project?

Brian Tarran
Well, I mean, that link between the two magazines was quite straightforward. So I was the launch editor of Impact magazine. So I helped to create it when I worked for the Market Research Society, just seven years ago now. And so, you know, I knew the editor there. And I thought, you know, this is something that will be a shared interest to our audiences, market researchers and statisticians and data scientists. And I think for us, the idea came out of one of our editorial board members who had suggested a series looking at companies that work with data in all their different guises. And as we talked further about it, we thought, actually, we could tell a story about the history and, you know, where it's going next, you know, for these businesses. And for me, obviously, when I was at Impact, when I was at the Market Research Society, that's about 15 years ago, I started working there. And I felt like it was a really important moment in time for the market research consumer insight industry, because the internet research business was just starting. I remember going to an early meeting where they talked about how to make internet research acceptable to companies. And then we had social media, and then mobile phones, and so all these new ways of finding out about people and connecting with people. And you know, I hadn't worked in the market research space for seven years, so I actually really liked the idea of the series because I could reconnect with what was going on now. And, you know, through Timandra's writing about it, understand, you know, how it was all connected, how the past sort of led to the present and where we're heading to for the future.

John Bailer
So Timandra, how did you get involved in this project? How did it start?

Timandra Harkness
Well, I have written various pieces for Brian for Significance, ranging from the history of 17th century proto statisticians to interviews with people who are working with data at the moment. But when he suggested it, I really liked it because I'm in the middle of writing my next book, which is really about how things are personalized, and perhaps more importantly, why everything is personalized. So I was already starting to look at that question of what has changed between maybe the early 20th century where, obviously, adverts were kind of targeted according to what publication they were in or who they were trying to reach. But the difference between that, and today when my mobile phone knows kind of where I am, and what my browsing history is, and can target things, as it thinks really specifically to what I want, although maybe not in practice. So I was really interested. And in fact, one of the things I really enjoyed was that crossover between the history of market research and the history of data, and being able to hook up with some of the people in the Market Research Society who kind of lived through those key changes. And were able to say, Oh, well, you know, we used to do this, and then this came along, and we weren't sure how much it would change everything. And then we found that it did actually change everything, or some of them going, well, I could see that this was gonna change everything. But when I first raised it at my company, they were very skeptical, and I had to convince them.

John Bailer
Could you just give us a little bit of the foundation? I mean, you've referred to history, and this kind of coevolution and history of both this interest in examining consumer behavior and insight, as well as the data that's being produced. And I mean, in fact, the first part of the series is subtitled "The birth of consumer insights". So give us a short history lesson.

Timandra Harkness
Well, that goes back really to the 19th century, when obviously, it was all analog, it was all paper and pens. But interestingly, I think the United States really drove that because you had a very widely distributed market and mail order was quite important. And people were perhaps quite geographically spread out. And that had a couple of effects, one of which was that mail order and mail order lists became really important and so valuable, in fact, that lists of potential customers, maybe for your mail order medicines, were so valuable that there was a falling out between two men in Chicago who used to work together and had a mail order business together. And then they fell out. And there was an argument about who got the mailing list and who had stolen data off the mailing list, and one of them ended up shooting the other one dead. That is probably the first lethal crime connected to data theft. So you had this thing where mailing lists are really important. And that was obviously about how do you get hold of not only the names and addresses of people, but they would get people to write in with their symptoms. So you had quite sensitive personal data, you had, like, somebody's name and address and the symptoms they had, and then they would sell these lists on to other suppliers. And you know, there wasn't much data ethics, I think, in those days. But there was also this other thing going on, which became important later from another direction, which was, if you're moving around, people didn't know who you were. And so you go into a store and ask for credit, say, you know, I've got money coming in, can you let me have this stuff on credit, and I'll pay you when I get the money, and nobody knew who you were. So lists of people who were reliable or unreliable also became very valuable for the merchants, the shopkeepers. And so that was quite an early thing. 
Again, this started to happen in the 19th century, and those credit lists grew up and became credit agencies. And then those merged with consumer lists, and became the kind of data brokers we have today, where they have a history of your, your credit behavior, and also have stuff that you're interested in stuff that you've bought, and now they can put it together with your address and how to find you, and so on. And that's why they're so immensely powerful as sources of information today.

Brian Tarran
So these would be the people that essentially try to categorize groups of consumers, right, into certain types and would say, you know, these people would buy product X versus Y. And that's only continued really as we move from, you know, those early days until we get into the social media and the sort of mobile phone era where you can actually categorize people down to even finer detail, right?

Timandra Harkness
That's right. I think very early on it was about very broad categories. It was, okay, on the sales side, you've got maybe people who've suffered these symptoms, or we know these people are of a certain sex and age and they live in a certain area, so we know that they're going to want certain things, and if they've bought farm implements before, they probably want some more farm implements. And on the other hand, really quite a binary kind of scale of, do you trust these people with credit or not? And now, the categories that you put people into can be so small that maybe there's only literally one or two people in a postcode or zip code area who will fall into that category. So when we talk about things being personalized, you're still essentially just giving people a niche, a marketing niche, but the population that they're in can be so small that you can meaningfully say it's personalized. And that obviously brings ethical problems about privacy. But really, if you think about the 19th century, and these lists where you'd written to a magazine asking for advice on your hemorrhoids or something, and somebody has now added you to a list and sold your name and address, on a list marked hemorrhoids, to some company you've never heard of, then that was also arguably not totally respectful of your privacy.

Brian Tarran
You know, when I was reading that first part, the comment about some of these early research pioneers, Archibald Crossley is one that you mentioned, and the invitation to set up a research department. And his response was, you know, what is it? You know, I don't know, either. And I always thought, if you were to take that time capsule from the past to today, you know, now the question is, you know, we want to set up a data science department, you know, what is it? What should be part of it? So I thought that the evolution also of the support systems, to kind of frame the research that's being conducted, as well as the data that were being collected, was a pretty interesting part of the story. Can you talk a little bit about that?

Timandra Harkness
Yes, I think so. I mean, that's the slightly different side, I guess, that comes from the idea that you can scientifically study populations and get an insight into what they think. And that was a really interesting marriage, I think, of commercial forces, but also psychology, in that psychology was becoming a more quantitative discipline. And the behaviorists were going, oh, well, yes, you know, if we give people the right stimuli, then we can predict how they're going to behave, just like rats and pigeons, which is not entirely true, I think. But that was certainly an approach that fed into the early pioneers of doing opinion polls and research. And they also, because they were a bit more statistically rigorous, they were more successful. So you know, you've got the famous story of Gallup himself saying, well, you know, I can beat these ad hoc newspaper surveys, because I'm actually being a bit more methodical and thinking, you're going to reach these people, but you're not going to reach these other people. And therefore, your results will be skewed. But I can allow for that. So I think the role of early quantitative psychology, in advertising essentially, at that point, although also, I think, in public opinion surveys for other reasons, was quite important. And it brought together this idea that you can get insights into people that go beyond what the people themselves consciously know about themselves, with the idea that if you're statistically rigorous, you can get results that will enable you to make predictions in the real world, which, as we know, actually holds true in all sorts of fields, in medicine and weather and all sorts of things. So, the early days of people saying we can bring this insight and use it, it benefited from that hype of, everything today is scientific, everything is modern. Psychology is quite a new science. But you know, it was this. 
It was like the big data of its day, in a way: we can use statistics and psychology and we can tell you what your customers want and what they're going to do. And then the advertising agencies themselves wanted to be able to sell to their potential clients and say, we are much more scientific and methodical than our rivals. We have a research department. Hence that great story, you know: "Set up a research department?" "Of course I would. What is it?" "I don't know either." There was actually a much more recent echo, I think, in one of the later pieces, somebody saying, oh, you know, I got this job at a big company as head of social media research. But none of us really knew what it meant. But it was a kind of a sign that we thought it was going to be important. So in the early days, I think there was an element of snake oil, there was an element of hype or slightly overpromising what you could actually achieve with research. But then, of course, as they went on and started to genuinely compete and develop their methodology, then it did become useful and they did find that they could actually predict, to some extent, what people will do. And if you're in marketing, you don't need to get it right with everybody, you just need to get more numbers right than your rivals do.

John Bailer
It's very interesting to me. I've worked with some folks that were quantitative psychologists and Horne performance testing groups very early in my career. And seeing that community represented there, it was really an interesting phenomenon that this was long before the kind of big data investigations. And so I find that part of it being woven into your story to be fascinating.

Brian Tarran
The thing that I really, really found intriguing was how we move from, I guess, survey-based research, opinion-based research, to much more behavioral-based research, if you like, or research on human behavior: rather than asking people what they thought, actually observing what they did. So, you know, can you talk about that evolution and how that came into the story?

Timandra Harkness
That was, I think, as we started to use technology much more in our everyday lives, both for actually looking for things and buying things directly, but also for all sorts of other things, traveling, looking up information, even communicating with each other, then suddenly there was a lot of data available that you hadn't had to go out and collect yourself. And I think the change there was that people started to realize there was all this data in the wild that wasn't, if you like, skewed by asking people questions. So people weren't aware that they were being observed in many cases, or that they were being asked questions. And so they were acting much more naturally and, if you like, authentically, and so I think a lot of the researchers started to think, we can actually get what people really do rather than what they say they do. When I interviewed somebody from PepsiCo, he said, well, we know you can ask parents about what they put in their kids' lunchbox, and they'll say, oh, you know, I put in vegetables and healthy things. But actually, we asked some families if we could put cameras in their kitchens. And when we did that, we discovered that, yes, the dad was putting this healthy stuff in the kid's lunchbox, and then the kid came home from school and tipped all that out of his lunchbox because he hadn't eaten any of it. And you wouldn't get that from asking because obviously, if you ask someone what they put in a kid's lunchbox, they'd go, oh, it's all very healthy. So that kind of insight was seen as very valuable. But then the flip side of that is, because you haven't collected that data yourself, and you haven't taken the care that you would take statistically if you were doing the survey and making sure that you ask the questions in an unbiased way, and so on, the data isn't as good quality. And so how do you compensate for that? 
And I think for that reason, there was quite a lot of skepticism early on. People said, well, okay, yes, all this stuff is out there. But what can you really deduce from that? And also, is it really that unbiased? Social media data in particular, you think, okay, well, you're not responding to a survey from us. But you are posting things that you know other people are going to see. So, I mean, I don't know if the stuff I post on social media is a highly selective picture of my life. Certainly, it doesn't include all the chaos and so on. So I think people were reasonably skeptical of what you could actually glean from that. But then they realized that if you combine the two, the surveys where you're directly asking the question, and people are consciously responding, giving you their conscious answers, and you can control who they are and how you've chosen them, and so on, and then out there in the wild, people spontaneously tweeting about what they're up to, and what they want other people to know they're up to, then that can give you some pointers about things that maybe you hadn't thought to ask. I think that one of the differences is, somebody, I think it was Ray Poynter, said very well, you know, it's very good at answering questions that you haven't asked, but it's not very good at answering questions that you have asked. So you need to bear in mind the limitations of both of them.

John Bailer
You're listening to Stats and Stories. Our guest today is writer, presenter and comedian Timandra Harkness. In the fourth of your Significance Impact articles you wrote about Tim Berners-Lee's concern that the Internet has become a machine for monetized surveillance rather than an ecosystem of cooperative sharing (I love that, that's brilliant, Timandra) and that a new vision of the web might be emerging. It seems like we're, you know, starting where you've just been commenting about this idea of moving from survey to observation. Then you start talking about fusion of data, to ultimately the present. Can you bring us from that past into where we might be now?

Timandra Harkness
I think where we are now is there is a lot of data out there and the difficulty is how to select it and how to use it for real insight. And the other thing that's changing, I think, is that in the last few years, we're all much more aware of what data is out there about us and how it's being collected and who's using it. And this has changed people's attitudes, not only as individuals, but I think a lot of governments and regulators are starting to say, you can't just essentially stalk people everywhere around the internet and watch everything they do and eavesdrop on their private conversations, as they believe, and then use that to sell them stuff, because it's not right. It's exploitative, and it's unequal, it's an asymmetrical relationship. So there are definitely moves to regulate what can be done, and how much personally identifiable information about each of us can be used and sold and passed on. So one of the big things that's coming out, looking at the future, is giving people much more individual control over individual data about them, either in private pods, or data stores or data trusts, and letting people authorize other people to use their data, for reasons that have to be explicit, and in return for certain benefits. I think there's an interesting comparison to be made maybe with the Tesco Clubcard when that was introduced, and they very explicitly said, we're taking your information and in return, you'll get discounts and specialized offers, and nobody forced you to get a Clubcard. But if you did, then that was the deal, and that was quite explicit, whereas a lot of data collection now is not explicit. So maybe we're seeing a move back a bit towards that. But one interesting thing, I think, is that, in response to that, a lot of the companies that depend on data, people like Google and Apple, are just finding different ways to do very similar things. 
So they're finding ways that they can target advertising to each of us without necessarily knowing who we are, but inferring things from other sources of information, or even letting our technology interact in a way that protects some of our privacy, but hands over useful insights without, again, necessarily saying who we are. So we could get the thing where our browser is essentially haggling with a data broker to say, I'm going to tell you that we, in this browser, are interested in this stuff, and say, you can send us information about this stuff. But we're not going to tell you any more personal details about the person that's using this browser. I think what's interesting about that is it kind of reveals that, in a sense, the companies don't care about us individually, they don't care who we are, as long as they can effectively use insights about what we do, what we like and what we might buy.

Brian Tarran
Were you surprised that that was, you know, where the story ended up? Because you kind of imagined these things in your head when you're commissioning them, right? And I thought the story would be ever more personalization and infringement of privacy. But actually, you know, we've got to the point where we can get all this information about people, but we don't really need it, we don't really need it to do what we want to do. So was that a surprise to you?

Timandra Harkness
It was, in a way, in that I did think that the tech companies might spend more energy trying to get around regulation to continue to know who we personally were, rather than going, that's fine, we don't care who you personally are, as long as we can leverage the information that's there. Although, I mean, in a way, I wasn't surprised, because the whole ethos of it has always been, we just want the information to do what we want to do. And we, you know, we don't really try to get inside your head, as long as you do what we want you to do, or as long as we can predict what you're going to do. But in another way, I think what will be interesting will be to see how much people do go down that route. Given the choice, how many people go, yeah, I want to really hide my personal information from you and only give you things that I foresee myself will benefit me? Because some other research I've been looking at since I wrote the articles suggests that, in fact, people have a much more sophisticated relationship with personalized adverts, that although we kind of hate them, we kind of hate being stalked, we also kind of like them if they pick up on things that we like about ourselves. So if you get an advert that's personalized to you because, I don't know, because you have bunions, then you don't like that. And I think you get to a certain age and they start advertising things like slippers to you. And then the next stage, I was talking to some friends at the weekend, and the next stage is they start advertising incontinence pads, and then funeral plans. And nobody likes that kind of personalization. But if they start advertising things to you saying, well, we can tell that you're a very caring person, or you're a very sophisticated person, so you're getting this advert, well, that's kind of nice, isn't it? Isn't it nice that the algorithm recognizes what a caring person or what a sophisticated person you are? 
And that, I think, is going to be very interesting, that we will actually find out that maybe people don't mind that much, as long as what the algorithm reflects back to them is a nice reflection. Maybe we're heading more for a future that's like The Picture of Dorian Gray, in that even though the tech companies have a picture of us in their attic that's hideous and needs incontinence pads and slippers, what they reflect back to us is a sophisticated, caring, outdoorsy, socially responsible person. And so we continue to give them our data.

John Bailer
I'm sort of picturing this algorithmic affirmation as being the target of the future.

Timandra Harkness
I think there's a lot in that. I mean, that's kind of where my book is going. So that's why I'm continuing to do this research. But I think our relationship with people gathering data about us is very ambivalent. I think, however much we say, oh, it's awful, and what about our privacy, there are aspects of it that we like.

Brian Tarran
Oh, definitely. I mean, this goes back to things like recommender systems, doesn't it, whether it's on Amazon or other retailers, or Netflix, or other streaming services. We like being told, oh, we've seen you like this, you might also like this. It certainly helps speed up the choice process in the sort of quite time-pressured world we live in, doesn't it?

Timandra Harkness
Yes, and when they get it right, I think it can be a good balance between being faced with a ridiculous range of choices and no way to pick between them, and just getting recommended the same things you've already seen. I think the good recommendation systems manage to say, yes, you will like this, and here's something that's a bit different, but we think you might like it, why don't you try it? And then you do discover new things. But the new things are ones that, statistically, you're more likely to like.

Brian Tarran
I find this question of this tension between privacy, disclosure, and the value that you receive for having this information being released. And, you know, there is your point earlier that there's a greater awareness that your transactions are monitored, and somehow that's feeding into the systems. People are more aware, and they may be changing their behaviors somewhat as a consequence, maybe the browser that they're using or that they're looking to use when it's locked down. But there is also what you've just said, which is this value that is received, and also this idea of maybe this affirmation that might come with this. And I think that's gonna continue to evolve. And it seems like it harkens back to the issue of psychologists being involved in this discussion, you know, what helps engage people?

Timandra Harkness
And yes, exactly, I think that's where the qualitative side of psychology also comes in, that good marketers say, but if we present it to people like this, then we're making them feel good. And I mean, we no longer live in an age where you can only afford to buy the stuff you absolutely need to keep you alive. You know, we're not the 19th century farmers looking through the Sears catalog and going, I just want the most efficient axe because that's the only thing that's going to stop us freezing to death in the winter. We spend a relatively small amount of our income now, most of us, on the things that keep us alive. And so we get choices, and we want to buy the things that make us feel good in some way or another. And if the process of the advertising is also contributing to making you feel good, then why wouldn't you buy it?

John Bailer
One question that comes to mind is: how has writing these four articles changed the way you behave? I mean, does this awareness alter your interactions?

Timandra Harkness
In a way. I think I've always been, well, not always, ever since writing the first book, which was all about data, I have been quite careful about what data I give away. And I default to sharing less data, even though a little bit of me has always said, of course, this is rather selfish, you know, if I don't let the navigation system follow my location, for example, I'm not helping to contribute my data to making it more efficient. And in a way, it's made me more thoughtful about how these kinds of systems can be a social benefit, as well as just kind of evil tech behemoths exploiting our data for their own uses, especially when it comes to systems that can let us have more explicit say about how our data is used. I've started to think, well, actually, you know, really, how much does it matter to me if this system knows what items I buy online, if it can actually help things become more efficient? Very often they don't, they are just for somebody to sell me stuff. But in some ways, I think maybe I'm a little bit more relaxed, although that is starting from a position of being ultra, ultra privacy conscious. But perhaps I'm now thinking, well, maybe this isn't the only important thing, actually, my individual privacy as an individual person. Maybe I should think more about what are the broader social uses this stuff is put to. Is it being put to just try and get us all to change our behavior in some way or another? In which case, I should probably zoom out and worry about the broader social uses of data, and a bit less about my individual data being used to sell me slippers.

John Bailer
Well, Timandra, that’s all the time we have for this episode. Thank you for joining us. Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.