From the Royal Statistical Conference | A Stats + Stories Special Episode / by Stats Stories

This episode features a number of interviews from last month’s Royal Statistical Society International Conference. Today’s guests include Iain Flint of G’s Growers, talking about the IceCAM project, which helps to minimise food waste by adapting the growing programmes of iceberg lettuces according to weather predictions. We also have James Tucker, head of the Quality Centre and Methodology Advisory Service at the Office for National Statistics, talking about respondent confidentiality, data privacy and protection, as well as Kevin Johansen from the Expert Group on Sámi Statistics based in Norway, on how the group is developing statistics on the Sámi people and how these statistics can lead to better policymaking.

+ Full Transcript

Rosemary Pennington: Welcome to this special episode of Stats and Stories. This is Rosemary Pennington, and today’s as well as next Thursday’s episodes of Stats and Stories are brought to you in partnership with Significance magazine and the Royal Statistical Society. Each episode will feature a series of interviews with a number of great guests from the recent RSS International Conference in Belfast, Ireland. To start, Sarah McDonnell speaks with engineer Iain Flint about minimizing food waste.

Sarah McDonnell: Hi, this is Sarah McDonnell from the Royal Statistical Society, and I’m at the RSS 2019 conference with Iain Flint from G’s Growers to talk about one of the sessions we’re holding today for the business industry and finance industry in the conference, minimizing food waste by adapting growing programs to the weather. So, hi Iain.

Iain Flint: Hi Sarah.

McDonnell: Thanks for coming to talk to us.

Flint: Oh, it’s a pleasure.

McDonnell: Could you give some background to G’s Growers and what they do and what your role is there?

Flint: Sure, so, G’s Growers are part of G’s Fresh. As a group we are one of the biggest suppliers of fresh produce, so salad crops and other vegetables, in Europe. Specifically, I am part of the innovation team within G’s Growers, which is the cooperative of growers in that group. In our innovation team we specifically look at two areas: we look at precision farming, so that’s trying to farm intelligently using data, as well as sustainability, to try to minimize the impact that we have while growing.

McDonnell: Great. And you talked specifically about the IceCAM project, which is a crop growth model. Could you give a brief explanation of what it is and what it was created for?

Flint: Sure, so in the IceCAM project the “Ice” stands for iceberg, meaning iceberg lettuce, our main crop: Iceberg Crop Adaptive Model. And effectively what we’re trying to do is understand how the crop responds to the weather. So, we operate on two core assumptions: one is that there’s an optimum temperature for growth and a saturation level of light, meaning a level of light above which the crop is not going to grow any faster. This is based on nearly 10 years of research in photosynthesis, so it’s a well-understood response. Based on that assumption, and another one where we assume that it takes a set number of days for the crop to come to maturity under optimum conditions, we then build a model based on observations of planting dates and harvest dates for when we think a crop will be ready to harvest based on the weather. And this allows us to adapt the growing programs that we have built to the weather. When we see it’s going to be warmer or colder, we can adapt what we’re doing to avoid wasting crop by growing things that we’re not going to be able to sell, or not being able to grow enough to meet demand.
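
The two assumptions Flint describes can be turned into a rough sketch. The Python below is not the IceCAM model itself; it is a minimal illustration that assumes a linear temperature response around an optimum, a light response that flattens at a saturation level, and a fixed number of days to maturity under optimum conditions. The constants T_OPT, T_WIDTH, LIGHT_SAT and OPTIMUM_DAYS, and all function names, are hypothetical and are not values from G’s Growers.

```python
from typing import Iterable, Optional, Tuple

# All parameter values below are hypothetical, chosen only for illustration.
T_OPT = 18.0         # assumed optimum temperature for growth (deg C)
T_WIDTH = 12.0       # assumed distance from the optimum at which growth stops (deg C)
LIGHT_SAT = 20.0     # assumed light saturation level (arbitrary daily light units)
OPTIMUM_DAYS = 40.0  # assumed days to maturity under optimum conditions


def temperature_response(temp_c: float) -> float:
    """Return 1.0 at the optimum, falling linearly to 0.0 at T_WIDTH degrees away."""
    return max(0.0, 1.0 - abs(temp_c - T_OPT) / T_WIDTH)


def light_response(light: float) -> float:
    """Rise linearly with light, then stay flat above the saturation level."""
    return min(max(light, 0.0), LIGHT_SAT) / LIGHT_SAT


def days_to_harvest(weather: Iterable[Tuple[float, float]]) -> Optional[int]:
    """Count days until accumulated optimum-day equivalents reach OPTIMUM_DAYS.

    `weather` yields (mean temperature, light) pairs, one per day after planting.
    Returns None if the crop has not matured by the end of the series.
    """
    progress = 0.0
    for day, (temp_c, light) in enumerate(weather, start=1):
        # Each day contributes a fraction (0.0 to 1.0) of an optimum growing day.
        progress += temperature_response(temp_c) * light_response(light)
        if progress >= OPTIMUM_DAYS:
            return day
    return None
```

Under these assumed values, a run of days at the optimum temperature with saturating light matures in exactly OPTIMUM_DAYS days, while cooler or duller days each count as a fraction of an optimum day and push the predicted harvest date later. Feeding in a forecast instead of observed weather gives the kind of "when will it be ready" estimate discussed in the interview.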

McDonnell: And is crop growth modeling commonly used in agriculture?

Flint: So, it is used in agriculture, and it has been for a while in what’s called arable crops. So that’s your cereal crops like wheat, barley, soybeans, potatoes, and that’s primarily for food security reasons. So, in a lot of developing countries the U.N. has tried to push crop modeling to help farmers understand what’s going to happen to their crops in response to the changing climate, or just to seasonal variability that’s normal, to help prevent food shortages. We’re in what’s called horticulture, so that’s vegetable growing, and there it’s really not very common at all. I haven’t actually come across any other businesses that do the kind of work that we’re doing. And for us it’s of interest primarily because we can’t store the produce that we grow. We have to sell it the week that we harvest it, or it will rot. So, we really need to be making the correct sowing and growing decisions every week. And that’s why we have been motivated to pursue this project.

McDonnell: And what sort of things have helped to improve crop growth over time? Is it fine-tuning the model to get better predictions? Or are there other factors involved?

Flint: So, it’s a combination of the two really. We are fine-tuning the model every year. Every year we get more information; we see a different weather pattern in each region, which gives us a different perspective on the data, and that has been slowly improving the accuracy. But, of course, better weather forecasts are going to help us out. We’re talking about making sowing decisions, when we actually put the crop into the greenhouse, 10, 12, 14, 16 weeks before we harvest it, and no weather forecasts go out that far, so we really have imperfect knowledge about what is going to happen, even if our model responds perfectly to the weather information we have. So, we’re constantly on the search for better weather forecasts, and at the conference today I was actually speaking with someone to help me understand what his business does in terms of seasonal weather forecasting. So, we’re constantly on the hunt for better weather data especially.

McDonnell: And you’ve actually been collecting some of your own weather data, you mentioned in your session.

Flint: Yeah, so we actually have 15 weather stations that we manage across the U.K. So in East Anglia, Sussex and Norfolk we have 15 weather stations that every hour are recording the temperature, sunlight, precipitation, humidity, these kinds of variables, so that we can actually calibrate the weather forecasts that we’re getting against the weather observations that we’re making.

McDonnell: So, as well as weather there are other factors involved in your modeling that you mentioned in your session today – things like sales expectations, other things that happen during the process, can you just talk us through a few of those?

Flint: Sure, yeah, so we have a sales program that we get from our customers, so Tesco, Sainsbury’s. We have an ongoing relationship with them about an expectation for what they’re going to need every week. But those sales demands will change with time. Things may happen in the market that will cause them to realize they need to change how much they are selling in a given week, and so we need to try to respond to that, or at least be ready for the fact that that could change. So, we have uncertainty from the weather side, but also from the goal side: what we are actually trying to achieve is a bit of a moving target in terms of sales. We need to be ready to respond to that.

McDonnell: Yeah, and you also mentioned something in your session about the Agrieye project, would you like to say just a few things about that?

Flint: Sure, yeah, so Agrieye is again, like IceCAM, a name we’ve given to our project to help sell it in the business. It’s always helpful to do that and to sell it outside. So Agrieye is our remote sensing project. What we’re doing there is we’re actually using light aircraft in the U.K., as well as drones in the U.K. and Spain, where we grow crops in the wintertime. We’re using these two platforms to take imagery at three-centimeter resolution. So, each pixel is 3x3 centimeters; that is fine enough resolution that we can distinguish between plants in the field. So we take imagery early in the crop’s life, before the leaves are overlapping one another, meaning we can actually distinguish between them, count them in the field through the imagery and actually size them to see where the small ones are, where the large ones are, and gain a huge amount of information about what’s going on in the field that the farmers would never be able to see from the ground while walking through the crop.
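
Counting and sizing plants before their leaves overlap can be pictured as finding separate blobs in an image. The sketch below is only an illustration and not the Agrieye pipeline: it assumes the imagery has already been turned into a grid of 1s (plant) and 0s (soil), which is itself a hypothetical preprocessing step, and it uses a plain flood fill to group touching pixels. The only figure taken from the interview is the 3x3 centimeter pixel size; the function name is invented.

```python
from typing import List

PIXEL_SIZE_CM = 3.0  # from the interview: each pixel covers 3 x 3 centimeters


def plant_areas_cm2(mask: List[List[int]]) -> List[float]:
    """Return the ground area of each connected group of plant pixels.

    `mask` is a grid of 1 (plant) and 0 (soil). Pixels that touch along an
    edge (not a corner) are treated as belonging to the same plant.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas: List[float] = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one plant using an explicit stack.
            stack = [(r, c)]
            seen[r][c] = True
            pixels = 0
            while stack:
                y, x = stack.pop()
                pixels += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            areas.append(pixels * PIXEL_SIZE_CM ** 2)
    return areas
```

The length of the returned list is the plant count, and the individual areas show where the small and large plants are. Once leaves overlap, neighbouring plants merge into one blob, which is why the imagery is taken early in the crop’s life.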

McDonnell: And this will help farmers sort of optimize the amount that they can grow and the size they can grow their crops to?

Flint: Yeah so, the goal really is to try to treat every crop individually. For the last hundred years we’ve been able to grow at larger and larger scales; that’s the innovation in agriculture. That means bigger tractors, bigger machines, everything, bigger irrigation booms, but it doesn’t mean we can ever actually do anything different within that. We’re just doing the same thing on a very large scale. And what information technology now means is that we’re able to start going back to the original way of farming where intuition of the farmer allows them to treat each crop individually. We want to be able to do that at the hundred hectare scale, where we can actually apply a variable rate of nitrogen or other fertilizers or pesticides to each crop, so we’re using just as much as is required, and also improving the uniformity of the crop and therefore the overall yield. It minimizes costs for us, minimizes environmental damage. Overall, it’s a win-win. But it’s a challenge to achieve that.

McDonnell: Sure, and I liked the phrase you used, data driven approach to growing, and that that’s the approach that you’re using within your team in farming.

Flint: Yeah absolutely.

McDonnell: So, can this model be adapted for crops other than leafy vegetables? And if so, how might it differ? You mentioned cereals earlier on.

Flint: Yeah, so it’s based on those two simple assumptions about how photosynthesis responds to temperature and light, and that under optimum conditions there will be a set number of days for it to grow. So, I think those assumptions may be more or less relevant for different crop types, but we’ve extended this model beyond just iceberg lettuce to, obviously, other types of lettuce, since they’re quite similar crops. We’ve extended it out to radishes, to baby leaf crops, which is your spinach and things like that, and similar approaches worked in all those sectors. We don’t grow a lot of cereal grains in our business; it’s sort of a side business. Actually, it’s not associated with G’s Growers, but there’s no reason to think it wouldn’t work if you wanted to have that type of approach. The value in doing what we’re doing is based on the fact that we have to deliver every week what we harvest, so it’s fresh produce. You don’t really have that problem in cereal grains, where you can store things. So, if you have too much this week, not enough the next, that’s not such a problem when you can accumulate supply. For us that’s a huge problem and that’s what’s driving our model development.

McDonnell: Well it sounds like an absolutely fascinating project so thank you ever so much for coming in to talk to us.

Flint: Thank you.

McDonnell: This is Sarah McDonnell from the Royal Statistical Society 2019 conference. Thank you very much.

Pennington: Thank you, Sarah. Next we have the editor of Significance Magazine, Brian Tarran, interviewing James Tucker, head of the Quality Centre and Methodology Advisory Service at the Office for National Statistics, about threats to data privacy.

Brian Tarran: Hi, this is Brian Tarran with Significance Magazine and I’m at the RSS conference in Belfast, and I’m talking now with James Tucker of the ONS. Hi James.

James Tucker: Hi Brian. How are you doing?

Tarran: I’m good, thank you. You’ve just come from a talk about privacy methods, so we’re going to have a little chat about that, and you can explain to listeners what it is you are talking about. But first do you want to give the listeners a little background as to who you are and what you do at the ONS?

Tucker: Yeah, sure. So I’ve been at the ONS now for about eight years, and I’ve worked with other government departments before, and I’ve also been a long-standing friend of the RSS too, so I did a stint on the RSS council and now I’m involved in the editorial board for Significance Magazine. My role at the ONS is improving the quality of statistics, so I work across the whole of the government statistical service, not just the ONS, to improve the quality of everybody’s outputs.

Tarran: So, where does your interest in privacy methods come from?

Tucker: In my area we have this program of reviews called National Statistician’s Quality Reviews. In the past they’ve been quite narrowly focused, a bit of a deep dive into very specific statistics, but we realized there wasn’t a way of looking at the really big issues affecting the statistical system at large. So, we’ve completely revamped these reviews. Our first one was on privacy and confidentiality, which is actually quite a tough one to start out with, but definitely a worthwhile thing to look at.

Tarran: So when we’re talking about privacy in this context, in the ONS context, is it everything from making sure, when people take part in surveys, that the data is protected, all the way up to when data is released, that the individual’s information can’t be identified based on the information that’s released?

Tucker: Yeah, that’s exactly right. So, our main focus really is on the data that we put out there. So what we found is that while there’s a huge amount of new data sets available- and this is really exciting for people working on data, it opens up all sorts of new opportunities to innovate with data and essentially people are like a kid in the candy store with it- but on the flip side it does also open up more ways for malicious people to use that data for their own ends. So, we have to keep pace with all of this and make sure that the methods we use are fit for purpose still.

Tarran: I guess privacy, or respondent protection, identity protection, and that kind of stuff, in the old days might have just been taking the names off of a record or making sure that particular identifiers weren’t included, but now it’s much more difficult than that, right? Because people can put together different data sets to figure out who people are. You might think one data set is sufficiently protected on its own, but if they can pair up unique characteristics across data sets then there’s a possibility of being able to identify someone.

Tucker: Definitely. With a proliferation of different data sources people can, if they want to, look across these and reconstruct an identifier and identify an individual from those. Another emerging issue is the use of social media. People put a lot of information on social media, and although there are privacy settings on most of the platforms, you have to go from the premise that everything you put on social media is public, because as soon as you share something, regardless of your privacy settings, somebody else can just share that and send it out into the public domain. These supplementary pieces of information add another dimension to the complexity of this.
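
The risk Tarran and Tucker describe, pairing up unique characteristics across data sets, can be sketched in a few lines. The Python below is an illustration only and not an ONS method: the field names (postcode_area, birth_year, sex, condition, name), the choice of quasi-identifiers and the function name are all hypothetical. It links a released data set that has no names to a public one that does, but only where the combination of characteristics is unique in both.

```python
from typing import Dict, List, Tuple

# Hypothetical choice of fields that are harmless alone but identifying together.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")

Row = Dict[str, str]
Key = Tuple[str, ...]


def _key(row: Row) -> Key:
    """Build the combined quasi-identifier value for one record."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def _unique_by_key(rows: List[Row]) -> Dict[Key, Row]:
    """Keep only records whose quasi-identifier combination occurs exactly once."""
    groups: Dict[Key, List[Row]] = {}
    for row in rows:
        groups.setdefault(_key(row), []).append(row)
    return {k: v[0] for k, v in groups.items() if len(v) == 1}


def link_records(released: List[Row], public: List[Row]) -> List[Tuple[str, str]]:
    """Return (name, condition) pairs for records that are unique in both data sets.

    `released` has a sensitive "condition" field but no names; `public` has
    names but no sensitive field. A match on a unique key re-identifies a person.
    """
    released_unique = _unique_by_key(released)
    public_unique = _unique_by_key(public)
    return [
        (public_unique[k]["name"], released_unique[k]["condition"])
        for k in released_unique
        if k in public_unique
    ]
```

Records that share their combination of characteristics with at least one other record are not linked by this sketch, which is the intuition behind protecting data by making sure no released combination is unique.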

Tarran: So, what sort of privacy methods are you looking at in particular? I guess there’s a range, so maybe you want to talk listeners through a couple of them maybe.

Tucker: Yes, sure. So, there’s a lot more demand for custom tables. For example, with the 2011 census, the tables were published and all laid out as static tables, but there’s a lot more demand now for table builders and things with which people can produce tables from their own sets of variables. But then that throws up the issue of how you protect tables that you don’t know are being produced. So, there are techniques you can use to add another layer of protection on those. An area that I’m particularly interested in, and there’s a lot of research going on in the ONS and elsewhere, is synthetic data, which isn’t actually a new concept: the idea of using an artificial data set that has the same statistical properties as the real thing. When I say it’s not new, it has really come to the fore since we’ve had the power to produce much larger data sets than we could previously. And in a sense, it offers the ability to circumvent the privacy protection problem by producing an artificial data set that could take the place of the real thing.

Tarran: These aren’t just made up numbers, are they? Is it a case of constructing data from real people, but in a way that you’re swapping data and characteristics and aspects of a real data set to create something new?

Tucker: Yeah, the idea is to understand the key statistical properties of that data set and reproduce them in a way that doesn’t reveal the characteristics of individuals. I mean, it is still a growth area, so there’s a lot of research going on at the moment. A while back we hosted an event at the ONS, and we were really taken aback by the interest in it. We had people from 30 or so organizations from across the country coming to attend. So, it’s far from a niche area, but I think there are still some important questions to be answered about it. On the one hand it does have this potential, but then there’s also a question of how accurately it simulates the real thing. And if you get closer in terms of accuracy, do you then end up introducing privacy disclosure risk into the data, even though it’s artificial?
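
One very simple way to picture synthetic data is to summarise the real records with a few statistics and then draw new records from those summaries. The sketch below is an assumption-laden illustration and not the approach used at the ONS: it treats two numeric variables as jointly normal, keeps only their means, standard deviations and correlation, and samples artificial pairs from that fitted distribution. The function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))


def synthesize(real: Sequence[Tuple[float, float]], n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n artificial (x, y) pairs that mimic the real data's summary statistics.

    Only the means, standard deviations and correlation of `real` are used, so
    no individual real record is copied into the output. Assumes a bivariate
    normal shape, which real survey data will rarely follow exactly.
    """
    rng = random.Random(seed)
    xs = [row[0] for row in real]
    ys = [row[1] for row in real]
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    rho = correlation(xs, ys)
    # Guard against tiny negative values from floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    synthetic = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        # Mixing z1 and z2 gives y the fitted correlation with x.
        y = my + sy * (rho * z1 + residual * z2)
        synthetic.append((x, y))
    return synthetic
```

This shows the trade-off Tucker raises. A model this crude reveals little about any individual but misses anything beyond means, spreads and one correlation; a richer model simulates the real data more accurately and, in doing so, may start to carry more of the disclosure risk.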

Tarran: So, I certainly understand there is a lot of research to be done. This isn’t the kind of thing where you can say oh, we think this works, so let’s just try it and see what happens. So how are you testing these things to make sure that they’re doing what you want them to do before they become part of the way of releasing information?

Tucker: Yeah, that’s an interesting question. That sort of thinking impacts all the privacy methods that we are looking at. So, yes, you are right, you can’t just introduce something and hope it works. So, we do these sort of pilot studies. For example, we have done a pilot study on creating a synthetic version of Labour Force Survey data, which is one of the major data sets that we collect. Also, an area that we’re looking to expand is intruder testing. That’s where you really have to sort of get into the mindset of somebody who wants to crack these data sources, and that’s actually harder than it sounds, because if you bring in people who are kind of friendly intruders trying to get something out of these data sets, they might not have the level of determination and deviousness, perhaps, that the real criminal would have.

Tarran: That’s kind of analogous to when companies do penetration testing, right? You want to simulate an attack and see whether the system stands up to that attack. But I guess, as you say, you can’t throw it out there to the real bad guys, because they might show cleverer or more devious ways of going about it, and then you’re at risk of identifying people.

Tucker: Exactly, yes. There’s a sort of fine line to walk there, but over time we have built up a set of realistic intruder scenarios to really understand how this would happen. But I think the important thing is that the area as a whole isn’t one that stands still, so you can’t just introduce an approach to privacy protection and then leave it. It has to stand the test; it has to keep evolving with the times. And essentially you can end up in a sort of arms race with people trying to break the protection of these things.

Tarran: So, you’ve basically embarked on a research project that will never end?

Tucker: Yeah, I mean obviously protection of data is one of the most important things. It all boils down to the tension between the use of a data set and its protection. So, on one extreme, if you didn’t publish anything then you could relax about the privacy side, but then there would be no statistics to inform policy and, you know, no jobs for the likes of us, and the other extreme is you can throw all the data out there. So, it’s about finding that sweet spot where you have a balanced approach which reflects the risk associated with the data. So, for example, some data is more sensitive than others, and you’d be a lot more stringent with that than perhaps with others.

Tarran: Well, I don’t envy you for having to do that, but good luck with it. Thank you for taking the time to talk to us today.

Tucker: That’s great, thanks Brian.

Pennington: Thank you, Brian. Lastly, we have Mags Wiley from the Royal Statistical Society interviewing Kevin Johansen from the Expert Group on Sami Statistics in Norway about using statistical data to improve welfare policymaking.

Mags Wiley: Hi I’m Mags Wiley and I’m reporting from the RSS annual conference, which this year comes from Belfast. I’m with Kevin Johansen, who is from the Expert Group for Sami Statistics based in Norway, who is delivering a session on statistics to ensure welfare for ethnic minorities. Welcome Kevin.

Kevin Johansen: Thanks.

Wiley: For those not in the know can you tell us who the Sami people are, where they’re based?

Johansen: They are an indigenous people based in Norway, Sweden, Finland and Russia, and we think that there might be something under 100,000 Sami people altogether, but we’re not totally sure about the numbers yet. There were also originally 10 different Sami languages, so there’s not only one of them. Mostly because of assimilation, but also for other reasons, many of those languages are now extinct. In Norway, for example, there are three different Sami languages that are spoken daily. The biggest one has between 20,000 and 35,000 speakers and the smallest one has only 600 people speaking it. So that is the situation.

Wiley: So, tell us a bit about your talk.

Johansen: We’re going to focus on how Sami statistics have developed, because that has happened relatively quickly. When we started with it 20 years ago, we had to start totally from scratch; we hardly had any Sami statistics available. We had to start making them; we had to make phone calls to find out, for example, how many students do you have studying Sami languages? How many people speak a Sami language? We didn’t have any of those numbers at that time, which made it very difficult for politicians to develop policy on how to strengthen and vitalize Sami languages, for example. It was the same case with health issues: how many Sami patients did you have? And did they have any other needs, for example as a minority population in Norway, and so on? So, where we are now: I have kept on with this project for 12 years, and we publish annually a book on Sami statistics and Sami topics, and that has been the Bible for politicians to deal with the numbers on the Sami population.

Wiley: And how did you go about putting together the statistics on the Sami people?

Johansen: That is a very good question, because it was a very hard job. When we started, it was more or less from scratch. We had to find out everything by ourselves. We had to make phone calls to everyone in Norway: we spoke with hospitals, we spoke with schools, we spoke with everybody who in one way or another was in touch with Sami people, and that way we started to gather information and numbers. In the beginning it was maybe not the most accurate, because we didn’t know if we had reached everyone, but now after 12 years we are quite sure that the numbers we present are pretty correct.

Wiley: And did you encounter any challenges along the way?

Johansen: There are many. If I should mention the most important one, it is feelings, because Sami people have been subject to discrimination and assimilation for many years. Because of that, they don’t have much faith in the government or in research and so on. So not all of them wanted to give out that information. They didn’t want to say whether they were Sami or not. I can understand that, because two generations ago, if you happened to be registered as a Sami, that would only have a negative impact on you. So with the older Sami people, we had to work hard to convince them that this was actually for the good of the Sami people. The younger people are much easier; they don’t have the same experience, so it’s much easier to work with them to develop statistics now than it was when we started.

Wiley: And what did you learn about the Sami population?

Johansen: A lot of things. When the majority creates statistics on indigenous people, it’s often very focused on the differences and on what the minorities lack by comparison. That would mean that they often point out that they have less education and lower income and so on. We also found other things that may develop in minorities, and statisticians who represent the majority do not always think about that. For example, families within Sami society can be a big benefit for Sami people in difficult times in life, and so on. So it’s a complicated question to answer, but we found out some good things and some things we have to work on. This month we are publishing an article on sexual abuse in Sami societies, and when we see the situation there, we also see that in Sami society we have enough issues of our own to work on and make better.

Wiley: And that leads me to my next question, how do you think these stats will help policy makers in the future?

Johansen: I think it will help a lot, from the response we get both from politicians in the Sami parliament and definitely from politicians in the Norwegian parliament. This is what they use when they develop policy in the area, so they finally get knowledge on how the situation actually is, which they didn’t have before. When I used to work for the Sami parliament myself back in the day, very often, unfortunately, we had to make guesses on how the situation was. Sometimes those guesses were relatively correct, and sometimes not. Now it’s much easier to make policy that actually works for the benefit of the Sami people.

Wiley: And what are the next steps for the project?

Johansen: We are going to continue publishing statistics on the issues that are important for the Sami people and also for the government, and for the minority, for researchers and so on. The main focus will probably be on education and on health. We see those two topics are very important. But we also focus more now on publishing statistics in Sami languages, to use this to strengthen and vitalize the Sami languages. So, there’s always a question on content but also on which language we publish it in.

Wiley: And how do you think these new statistics will help tackle prejudice in those communities?

Johansen: I hope, and I think, that it’s going to help the situation quite a bit. We recently published an article on discrimination against Sami people in Norway, and the numbers are very clear. There is a nine times higher chance that you’ll be discriminated against in Norway if you are Sami, if you belong to the minority. So very many Sami people have been discriminated against at least once in their life and, in many cases, at least once in the last year, for several reasons. We also see there are very high numbers, unfortunately, for domestic violence, and the level of violence against Sami women is extremely high. So, these are things we are focusing on a lot right now. We almost force the politicians to take a stand and deal with these issues and solve the problem.

Wiley: Thank you very much Kevin.

Johansen: Yes, thank you.

Pennington: And thank you, Mags. Well, that is it for this special episode of Stats and Stories in partnership with Significance Magazine and the Royal Statistical Society. Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on our program send your comments to statsandstories@miamioh.edu or check us out at statsandstories.net. Be sure to listen for future episodes of Stats and Stories, where we discuss the statistics behind the stories, and the stories behind the statistics.