Statistical Anti-Trafficking Efforts | Stats + Stories Episode 328 / by Stats Stories

Nickolas Freeman, Ph.D., is an Associate Professor of Operations Management in the Department of Information Systems, Statistics, and Management Science. Dr. Freeman is also an Associate Editor for INFORMS Journal on Applied Analytics (IJAA). He is an active member of the Institute for Operations Research and the Management Sciences (INFORMS) and earned its Certified Analytics Professional (CAP) certification in 2014. He is also a member of the Production and Operations Management Society (POMS). Dr. Freeman is an active researcher whose interests include healthcare operations management, supply chain risk management, and applied analytics. He has several publications in journals including Manufacturing & Service Operations Management (MSOM), Production & Operations Management (POM), Omega, the European Journal of Operational Research (EJOR), INFORMS Journal on Applied Analytics (formerly Interfaces), and IISE Transactions (formerly IIE Transactions).


Episode Description

More than 27 million individuals are victims of human trafficking globally, according to the US State Department. The 2022 United Nations report on global trafficking suggests that 39 percent of trafficking is associated with sexual exploitation, while also noting that's likely an underestimate. An initiative at the University of Alabama is working to develop methods for finding evidence of trafficking online; that's the focus of this episode of Stats and Stories with guest Nickolas Freeman.

Full Transcript

Rosemary Pennington
More than 27 million individuals are the victims of human trafficking globally. That's according to the US State Department. The 2022 United Nations report on global trafficking suggests that 39% of trafficking is associated with sexual exploitation, while also noting that's likely an underestimate. An initiative at the University of Alabama is working to develop methods for finding evidence of trafficking online. And that's the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Nick Freeman. Freeman is an associate professor of operations management at the University of Alabama and serves as an assistant director of the Institute of Data and Analytics. He and several colleagues at Alabama co-authored an article in Significance on using data to fight sex trafficking. Thank you so much for joining us today.

Nick Freeman
All right. Well, I appreciate it.

Rosemary Pennington
I know you co-lead the Sex Trafficking Analytics for Network Detection and Disruption effort at Alabama. Could you tell us what that is and how it got started?

Nick Freeman
Yeah, so we got started in 2019. The effort is with my colleagues, Dr. Burcu Keskin and Dr. Greg Bott. We were really interested in using our skills in data analytics to do something that had real societal impact. At the time, Dr. Keskin was working on some grants around wildlife trafficking, and she and Dr. Bott connected on a shared interest in turning their attention toward human trafficking, and particularly sex trafficking. And they brought me into that loop. One thing we've tried to be very aware of from the very beginning is making sure that everything we do is not purely academic, right? We really wanted to make sure that we were well aligned with practitioners and that we were doing research and coming up with solutions that would help people in the field. So we had the opportunity early on to make some connections through the Department of Homeland Security. We also have a local task force here in West Alabama. And they really opened our eyes to the way that the internet facilitates sex trafficking in the United States. So that was really how we started. And we've been focusing on that side in particular: online advertisements that are linked to commercial sex trafficking. That work has really ramped up since 2019. We bring down a lot of advertisements from various online providers, many of which have been the subject of news reports and media coverage demonstrating links to sexually exploited individuals. And our work really explores the challenges associated with that data.

John Bailer
So can you talk a little bit about the challenges when you're thinking about trying to quantify sex trafficking?

Nick Freeman
Yeah, so a big challenge... going all the way back to Craigslist in 2010, to Backpage in 2018, online commercial sex advertisements have been well documented as a place where individuals who are trafficking others will create advertisements and post them, and that facilitates the exploitation of that individual. As for the major challenges we see, first off, the data is extremely large; it was shockingly large. Just to give everybody an idea of the numbers we're talking about, we currently bring down anywhere between 100,000 and 120,000 ads per day from about 12 sites. Associated with each one of these advertisements there's text, and there's numeric data, like contact information, which of course we can represent as text. And then there are a lot of images. So we bring down 100,000 to 120,000 ads per day, and each one of those ads can have anywhere between three and 10 images. So there's the work of bringing that data down from different sites, linking it all up, and then turning it into a product that law enforcement and nonprofits can use to inform operations. Another big challenge is noise. There's noise in any data, of course. But in this data source in particular, there are a lot of people posting scam ads. Commercial sex is illegal pretty much everywhere in the United States; soliciting prostitution or prostitution is illegal. These sites are nominally intended for people who are voluntarily posting advertisements for commercial sex, and since that's illegal, what scammers will do is create ads for individuals that aren't real, and the scammers are the ones on the other side of that communication.
When people are trying to purchase sex, there's usually some back and forth. Before any location information is exchanged, the scammers will ask for some type of electronic deposit, maybe through Cash App or some other means, and get that money from the individual. And of course, the communication ceases at that point. So there are a lot of scam ads. Ultimately, a big part of what we do is bringing down this high volume of ad data and trying to link it up so that we can start to understand the spatial and temporal patterns that exist in it, and then figuring out what's signal and what's noise, so that we can hand off to our law enforcement and nonprofit partners a data product that is as close to clean as it can get, representing the commercial sex activity in their particular jurisdiction.

Rosemary Pennington
So when you guys are working with this data... I mean, obviously, people who are not trafficked but who work in the sex industry are using sites like this to advertise their work. Is your initiative doing any work to identify, this is stuff that looks like it could be trafficking, and this is stuff that seems like it's just run-of-the-mill sex ads? Are you guys doing that kind of work, or are you handing that off to law enforcement and allowing them to do it?

Nick Freeman
I think what you've described is kind of the holy grail for us. The reality of it is there is no ground truth, right? So what we've been working diligently toward over the past few years is figuring out a way to take all of this data and get really good at filtering out that noise. Then we hand off products that deliver value to our partners. And what we're really trying to do through our partner network is this: if they do have interactions in the field and there are indicators that, okay, this was actually a situation of exploitation, can we get that back? Then we can go back and relabel the data to start to get a better picture of whether there are indicators in the data suggesting that these are individuals at higher risk for sex trafficking or exploitation. So we're constantly working to get there, and again, that is the ultimate goal. But of course, we're researchers. We understand the data view and how to work with the data, but it's ultimately our partners, the boots on the ground, who are going to be able to get that true indicator, so that hopefully we can go back in, do some relabeling, and start to learn. We're not the first people to work with commercial sex ads as a data source. But at least at this point, nobody's really been able to identify any crystal clear indicators of whether an individual is doing this voluntarily or is being exploited. There are some red flags we might look for, like movement across state lines and between jurisdictions. But ultimately, we've got to rely on our partner network to help us figure that out.

John Bailer
You know, I'm just envisioning this gigantic pool of sites and ads that you're having to process. So I was hoping you could talk a little bit about the workflow. I think you've described at least one part: it sounds like one of the early filters might be a filter for scam ads. So you take those 100,000 to 120,000 ads per day and reduce them to some subset by filtering out the scams. What are some other steps along that process? And ultimately, what's the product of that process?

Nick Freeman
Yeah, so, kind of from end to end. We do a lot of web scraping, right? We've written custom applications that go out and collect the data from these sites. And we're actively monitoring different sources, because this ecosystem is very fluid: new sites emerge, and some sites lose their prevalence over time. So we're constantly trying to keep abreast of the current sites that people are posting on. And that's challenging in and of itself, because it varies by region. There are some sites that are very prominent nationally, but there are also some regional sites that are very prevalent in certain areas. So we bring that data in, and the way we represent the data for analytical work is as a graph. If you think about an ad, there's a piece of text, which might be a post heading; there's a certain number of images that were posted with it; and there are pieces of contact information. We can represent each one of those pieces of information as a node in the network. So we came up with a graph representation. Then, as we build this graph out, we can start to see where certain pieces of data overlap. And on that graph, we can look at structural patterns to help us understand whether certain data points are being used in a way that would indicate an unrealistic use. You can look at how something is being posted and advertised in different locations over time and see things that are just not going to be practical for a real individual.
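As a rough illustration of the graph idea described above, the sketch below links each ad to nodes for its heading text, contact information, and images, then uses a union-find (disjoint-set) structure to group ads that share any node into the same cluster. The input format, the field names "text", "contacts", and "images", and the use of exact matching for every node type are assumptions for illustration; this is not the team's actual pipeline, and in practice image nodes would need some form of near-duplicate matching rather than exact equality.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure that merges any two items placed in the same group."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_ads(ads):
    """Group ads that share any heading, contact, or image node.

    `ads` maps a hypothetical ad ID to a dict with the assumed keys
    "text" (heading string), "contacts" (list of strings), and
    "images" (list of image hashes). Returns sorted lists of ad IDs.
    """
    uf = UnionFind()
    for ad_id, ad in ads.items():
        nodes = [("text", ad["text"])]
        nodes += [("contact", c) for c in ad["contacts"]]
        nodes += [("image", h) for h in ad["images"]]
        # Linking every node to its ad connects all items seen on the same ad.
        for node in nodes:
            uf.union(("ad", ad_id), node)
    clusters = defaultdict(set)
    for ad_id in ads:
        clusters[uf.find(("ad", ad_id))].add(ad_id)
    return sorted(sorted(c) for c in clusters.values())


sample_ads = {
    "a1": {"text": "hi", "contacts": ["555-0101"], "images": ["h1"]},
    "a2": {"text": "hello", "contacts": ["555-0101"], "images": ["h2"]},
    "a3": {"text": "hey", "contacts": ["555-0199"], "images": ["h2"]},
    "a4": {"text": "yo", "contacts": ["555-0123"], "images": ["h9"]},
}
print(link_ads(sample_ads))  # [['a1', 'a2', 'a3'], ['a4']]
```

Here ads a1 and a2 share a phone number and a2 and a3 share an image, so all three fall into one cluster even though a1 and a3 have nothing in common directly.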
So we do a graph representation of that data, and we've got some initial filtering that helps us with the easy cases where this just wouldn't be possible. It's not possible for an individual to be active in three different states, or maybe five different states, every day of the week, right? That's one of those really simple filters that we can do. And then we recently developed a machine learning pipeline to further enhance our ability to predict scams. The way we did this: one of the sites we collect data from actually had a profile ID that we could utilize, so we could tell the profile that a particular ad was posted under. That let us look at the variability in the text those individuals posted and the images they were using. Since we technically had a labeled data set, with the profile ID identifying whether two different ads, or the data associated with them, came from the same person, we could train a classifier, and that's part of the current iteration of the pipeline. But there are a lot of other challenges. The ML part is a recent advance; it's relatively sophisticated, and it's helped us quite a bit. But there are some challenges that might seem a little more mundane, and they're significant in this space. For example, the way most of these boards work is, if you're a provider posting an ad, you first have to select the location that the ad is targeting, right? Say I'm in Tuscaloosa, Alabama, or Birmingham, Alabama. There are going to be individual boards that host ads targeting those cities.
And if you imagine we're collecting data across 15 different sites, they don't name those boards the same way, right? A good example of this is the Dallas-Fort Worth area. Some sites might have separate boards for Dallas and Fort Worth, some might have it all lumped together, and some might have some other designation. So as you're collecting data from these various sites, standardization becomes a huge issue. If people are using different boards, that's going to give you misleading information about movement patterns, especially over space, if you don't handle that standardization piece well. Hopefully that answers some of the question. There's a lot to it, and I feel like I could go on forever.
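The two steps described above, standardizing site-specific board names and applying a simple plausibility filter, might be sketched as follows. The board mapping, site names, tuple layout, and the two-states-per-day threshold are all hypothetical; the transcript only says that being active in three to five states every day is not possible for a real individual.

```python
from collections import defaultdict

# Hypothetical mapping from (site, raw board name) to (canonical market, state).
BOARD_MAP = {
    ("site_a", "dallas"): ("Dallas-Fort Worth", "TX"),
    ("site_a", "fort worth"): ("Dallas-Fort Worth", "TX"),
    ("site_b", "dfw"): ("Dallas-Fort Worth", "TX"),
    ("site_b", "tuscaloosa"): ("Tuscaloosa", "AL"),
    ("site_c", "birmingham, al"): ("Birmingham", "AL"),
    ("site_c", "jackson ms"): ("Jackson", "MS"),
}


def standardize(site, board):
    """Map a site-specific board name to (market, state); None if unmapped."""
    return BOARD_MAP.get((site, board.strip().lower()))


def flag_implausible(ads, max_states_per_day=2):
    """Flag contacts seen in more distinct states on one day than is plausible.

    `ads` is an iterable of (contact, date, site, board) tuples.
    The default threshold is an illustrative assumption, not a tuned value.
    """
    states = defaultdict(set)
    for contact, day, site, board in ads:
        loc = standardize(site, board)
        if loc is None:
            continue  # unmapped boards should be reviewed, not guessed
        states[(contact, day)].add(loc[1])
    return sorted({c for (c, _), s in states.items() if len(s) > max_states_per_day})


sample = [
    ("555-0101", "2023-05-01", "site_a", "Dallas"),
    ("555-0101", "2023-05-01", "site_b", "Tuscaloosa"),
    ("555-0101", "2023-05-01", "site_c", "Jackson MS"),
    ("555-0202", "2023-05-01", "site_a", "Dallas"),
    ("555-0202", "2023-05-01", "site_b", "DFW"),
]
print(flag_implausible(sample))  # ['555-0101']
```

Without the mapping step, "Dallas" and "DFW" would look like two different places, which is exactly the kind of misleading movement signal the standardization is meant to prevent.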

Rosemary Pennington
You're listening to Stats and Stories, and we're talking about work at the University of Alabama to use data to fight sex trafficking with Nick Freeman. You were talking about the partners you work with, nonprofits and, obviously, law enforcement. I wonder what kinds of conversations your team had about how you were going to engage with law enforcement, because sometimes academics don't want to work with police too closely. As you were figuring out how you were going to handle this data, or this work, what kinds of conversations did you have around the ethics of working with law enforcement?

Nick Freeman
Yeah, so really early on, a lot of it was just understanding how the internet was facilitating this, right? The early conversations were really us listening to them and understanding the sources that they monitor, because for us to do any type of analytics, we had to get a good grasp of the data. That was really where we spent a lot of initial effort. And again, we have an active task force focusing on human trafficking here in West Alabama, so we were able to actually go out on a few operations. What we wanted to see was, in the middle of these operations, how they were looking at information available to them through other platforms and using that to inform their decision making. So a lot of our early conversations were just listening, right? Trying to understand, in the middle of operations, as you're going out into the field and trying to do counter-trafficking efforts, what signals and what sources of information are available to you? How do you use those in decision making? What kinds of audibles get called in the field, and how can we better support those decisions? So that's the law enforcement side. We also work with a lot of nonprofits, and of course nonprofits operate in that space a lot differently than law enforcement. In particular, the nonprofits we're working with are focused on outreach. These are individuals in their area who reach out to people posting on these various sites who might be exploited individuals, and they're really just trying to form a relationship and maybe provide personal care items.
And just let them know that if things ever get to a point where they feel they have no way out, they have that one person they can call, and they're also connected to other shelters and whatnot. So they're really just trying to build that relationship. A big piece of what we do for them relates to the fact that a lot of times nonprofits are volunteer-driven, right? If they're going to conduct outreach to people in their area who might be victims of exploitation, they need volunteers who are going to reach out via phone or text and try to set up interactions with these individuals. They might be getting volunteers from a faith-based organization, for example. And the way they would typically run this outreach is to bring in these volunteers, go to the sites, and start looking for numbers. Of course, the content on these sites is very explicit, in both the text and the images. So imagine trying to recruit volunteers for your organization and saying, well, this is how the process goes: we've got to sit here and look through sites full of very discomforting information. And in addition to this, some of the people who volunteer with nonprofits are former victims themselves. So having them go through that process, even though there's a very good goal behind it, carries a risk of retraumatization. So one thing we do is tailor what we provide: the outputs of our data pipeline are different for our two kinds of partners.
For the nonprofits, we can say, hey, within the last 60 or 30 days, this is a list of contact information that we've seen coming through your particular focus area, wherever that is geographically, and they can just go down the list and try to reach out and make contact with these potential individuals. With law enforcement, of course, it might be a little more detailed on the movement over space and time, because sometimes they can use that as an indicator when deciding whether to reach out to a particular individual. So again, a lot of it early on was just listening. We tried to understand the needs of both sides, which are very different, and then saw how we could take the data available on the internet and translate it into data products that would fill some of the gaps we saw with our two partners.
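A nonprofit-facing output like the one described, a list of contact information seen in a focus area within a recent window, could be sketched as below. The record layout, the market names, and the default 30-day window are assumptions; the function simply keeps each contact's most recent sighting inside the window and sorts by recency, so volunteers work from a list rather than browsing the sites themselves.

```python
from datetime import date, timedelta


def outreach_list(ads, focus_markets, today, window_days=30):
    """Return contacts seen in the focus markets within the last `window_days`.

    `ads` is an iterable of (contact, posted_on, market) tuples, where
    `posted_on` is a datetime.date and `market` is an already-standardized
    name (a hypothetical layout). Contacts are ordered newest sighting first.
    """
    cutoff = today - timedelta(days=window_days)
    last_seen = {}
    for contact, posted_on, market in ads:
        if market in focus_markets and cutoff <= posted_on <= today:
            if contact not in last_seen or posted_on > last_seen[contact]:
                last_seen[contact] = posted_on
    # Sort by most recent sighting, breaking ties by contact string.
    return sorted(last_seen, key=lambda c: (-last_seen[c].toordinal(), c))


sample = [
    ("555-0101", date(2023, 6, 29), "Tuscaloosa"),
    ("555-0202", date(2023, 6, 10), "Tuscaloosa"),
    ("555-0303", date(2023, 4, 1), "Tuscaloosa"),
    ("555-0404", date(2023, 6, 28), "Atlanta"),
    ("555-0202", date(2023, 6, 20), "Tuscaloosa"),
]
print(outreach_list(sample, {"Tuscaloosa"}, date(2023, 6, 30)))  # ['555-0101', '555-0202']
```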

John Bailer
Well, this is a remarkable project. One thing people may not really appreciate, if they've never seen a graph representation of data, is what it looks like. Could you talk a little bit more about what's in a node, and how the graph is defined by the relationships among nodes, the edges between them? Just to help paint a picture of what a data point looks like, since that's where you might start further analysis.

Nick Freeman
Yeah, so, thinking about the graph representation and how we work with it. At a very high level, there are three pieces of information we bring down for an ad. We're bringing down text: most of these ads have some type of heading text, so that's just a string representation. They have some type of contact information, and they have images. So imagine we've got an ad with a piece of post text, that heading text, a piece of contact information, and maybe three images. Since we saw all of those pieces of information posted on the same ad, we can come up with a representation where there's a node corresponding to each of the three images, the piece of contact information, and the post heading. Then we can establish a link between all of those nodes, because we saw them together in the same ad. Now, as you can imagine, we're bringing down hundreds of thousands of these ads every day, and each one of those ads can be represented as that little network. Eventually we're going to encounter places where pieces of that information overlap; say we saw the same contact information on two different ads. In the graph representation, that's a single node, and it links those ads up. The key challenge is images. Again, we weren't the first people to start working with sex ad data, but some of the earlier research that preceded us avoided the images, because they're challenging. Working with text or phone numbers is a lot easier; we can just decide whether they're equal. But how do you say that one image is equal to another, especially if they're coming from different sites that might display them at different resolutions?
So we had to come up with a way, because what we found out is that the phone numbers for the people being advertised change quite frequently. With Voice over Internet Protocol, you can change a phone number in a few minutes, and we see that happen. The same individual might be advertised with different phone numbers over time; that's actually quite common. So we knew we had to go the image route. What we did is leverage a class of techniques called perceptual hashing. In cybersecurity, we hear about hashing algorithms like SHA-256 or MD5. Perceptual hashing algorithms are different: you provide an image, and the algorithm comes up with a lower-dimensional representation of it that captures its perceptual information. That allows us to compare images collected across different sites for equality, even if there are slight modifications to the image, like a difference in resolution or slight differences in cropping. So that's been a core algorithmic component of our ability to link the data up, because we need a way to link those images, especially when they're coming from different sites.
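To make the perceptual hashing idea concrete, here is a minimal average-hash (aHash) sketch in plain Python. aHash is one of the simplest perceptual hashes; the transcript does not say which algorithm the team uses, and production work would more likely rely on a library such as the Python `imagehash` package, which offers average, difference, and DCT-based hashes. Representing an image as a list of grayscale rows, and the toy test images, are assumptions made so the example runs without image-decoding dependencies.

```python
def average_hash(pixels, size=8):
    """Compute a simple average hash (aHash) of a grayscale image.

    `pixels` is a list of rows of 0-255 intensities, at least size x size.
    The image is shrunk to size x size by block averaging; each output bit
    records whether that block is brighter than the mean of all blocks.
    """
    h, w = len(pixels), len(pixels[0])
    blocks = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            vals = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            blocks.append(sum(vals) / len(vals))
    mean = sum(blocks) / len(blocks)
    bits = 0
    for v in blocks:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def quadrants(n):
    """Toy n x n image: bright top-right and bottom-left quadrants."""
    return [[255 if (r < n // 2) != (c < n // 2) else 0 for c in range(n)] for r in range(n)]


def top_half(n):
    """Toy n x n image: bright top half."""
    return [[255 if r < n // 2 else 0 for c in range(n)] for r in range(n)]


# The same picture at two resolutions hashes identically; a different picture does not.
print(hamming(average_hash(quadrants(64)), average_hash(quadrants(32))))  # 0
print(hamming(average_hash(quadrants(64)), average_hash(top_half(64))))  # 32
```

A cryptographic hash such as SHA-256 would change completely between the 64-pixel and 32-pixel versions, which is why a perceptual hash plus a small Hamming-distance threshold suits cross-site image matching. Choosing that threshold is a tuning decision not covered in the episode.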

John Bailer
Now, this is amazing. It almost makes me wish I could take a time machine back and start doing this stuff in an early part of my career. This is fascinating. I was listening to you describe some of the challenges, and one of them, these advertisements going across state lines, made me think about jurisdiction. When you're trying to say there's something illegal going on here, not just unethical but illegal, how do you get this broader coordination? I love the idea that you embedded yourself in some of these operations early to try to understand the magnitude of some of the issues in these types of data and the needs of the communities that are trying to investigate them. But how does this expand when you're cutting across jurisdictions and state lines?

Nick Freeman
Yeah, so that's one thing we've really been pursuing lately. Early on, we were very much working with law enforcement in a particular area, and we've worked with law enforcement and nonprofits across the United States. But you're right, there are a lot of differences between jurisdictions, definitely in how states proceed with counter-trafficking, and we've even seen it within a state: if you go to a different county, the way they're handling things is very different. And the key issue, which we see a lot here in West Alabama, is that there are a lot of people who might be moving between here and Tennessee or Mississippi. If we're going to be successful in combating human trafficking, we've got to think about efforts that extend beyond those jurisdictional borders. So what we've been pursuing lately is partnerships with individuals, or nonprofits in this case, that have already established partnerships at the state level. In particular, there's a group we've recently been working with called Allies Against Slavery. They're a Texas-based organization, but they have partnerships with the states of Texas, Louisiana, and Florida, so they've already started to establish some of these bigger partnerships. Our skill set and expertise are largely on the analytical side, working with this data, and just from a resource perspective, we don't have a tremendously large team.
So what we're really trying to do is identify organizations throughout the country that are already engaged with the states and the various jurisdictions, and look for ways we can become a data provider for them, because we feel like they're well equipped, given how they're situated, to do that work. So we're thinking about how we can facilitate their work and then, hopefully, through the partnerships they have in the various states, figure out some pathways for us to sustain what we do, because it is a time- and resource-intensive type of job. But another big goal of ours, in addition to closing that feedback loop, is to demonstrate something. When I look at how human trafficking and counter-trafficking efforts are conducted now, I see a lot of local effort, right? And that's fine; I understand it. If you go to a particular county, like in West Alabama, we have this task force, and it's actually three police agencies working together. But that looks very different if you go to other parts of Alabama, so it's inconsistent. One thing we really hope to do, and this is more on the research angle, and I haven't seen this done, especially for the current online ad ecosystem, is to figure out the geographic connections within the United States. If we're thinking at a national level and looking to invest in efforts to reduce trafficking, which agencies should be working together?
Should it be Tennessee, Mississippi, and Alabama? Should Alabama be more tightly coupled with Georgia? Understanding how people are moving around, across all of the ad sites, is going to help inform those decisions. So that's a big goal of ours as well: to bring national attention to the fact that it's not a local problem. We've all got to work together if we're going to solve this.
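One simple way to start measuring the geographic connections described above is to count, for every pair of states, how many linked entities were advertised in both. The entity IDs and state sets below are hypothetical inputs (for example, clusters of linked ads after board names have been standardized); this is a sketch of the idea, not the team's method, and a real analysis would also need to account for time and for how many ads each state contributes.

```python
from collections import Counter
from itertools import combinations


def state_connections(entity_states):
    """Count how many linked entities were advertised in each pair of states.

    `entity_states` maps a hypothetical entity ID to the set of
    standardized state codes where that entity was advertised.
    Pairs are stored in alphabetical order, e.g. ("AL", "MS").
    """
    pairs = Counter()
    for states in entity_states.values():
        for a, b in combinations(sorted(states), 2):
            pairs[(a, b)] += 1
    return pairs


sample = {
    "e1": {"AL", "MS"},
    "e2": {"AL", "TN", "MS"},
    "e3": {"AL", "GA"},
    "e4": {"TX"},
}
print(state_connections(sample)[("AL", "MS")])  # 2
```

The resulting pair counts can be read as edge weights in a state-level network, which is one way to ask whether Alabama is more tightly coupled with Mississippi and Tennessee or with Georgia.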

Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. Nick, thank you so much for joining us today.

John Bailer
Yeah. Thanks, Nick.

Nick Freeman
I appreciate it.

Rosemary Pennington
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple Podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.