
Human Trafficking dataset creation & analysis

Introduction

The goal of this project is to create a human trafficking dataset from reliable sources such as news articles and government agencies, and to analyse the pain points in this area.

Motivation 

What is human trafficking?

Human trafficking involves recruiting, harbouring or transporting people into a situation of exploitation through the use of violence, deception or coercion, and forcing them to work against their will.

In other words, trafficking is a process of enslaving people, coercing them into a situation with no way out, and exploiting them.

Why is it important?

Did you know that in 2015 alone, human trafficking generated $150 billion, more revenue than Google, Nike, the NFL and Starbucks combined?
Sounds crazy, right? There is more to this story than most people know, which is why the 18th of October is EU Anti-Trafficking Day. According to a September 2017 report from the International Labour Organization (ILO) and the Walk Free Foundation:
  • An estimated 24.9 million victims are trapped in modern-day slavery.
  • Of these, 16 million (64%) were exploited for labor, 4.8 million (19%) were sexually exploited, and 4.1 million (17%) were exploited in state-imposed forced labor.
  • 71% of trafficking victims around the world are women and girls and 29% are men and boys.
  • 15.4 million victims (75%) are aged 18 or older, with the number of children under the age of 18 estimated at 5.5 million (25%).

Why did we choose it as a project?

When we first started looking for a dataset on the internet, we were shocked to see that, despite this being such a large-scale issue, there was almost no public data available. Most of what was available was proprietary and couldn't be accessed by students like us.
That's when we decided to take a first step towards creating a public dataset which, with contributions from more people, could grow into a much bigger dataset for other students and anyone who wants to explore this area.

Work done

Data Collection

  1. We manually went through different news articles and collected the relevant information.
  2. We used https://www.alexa.com/siteinfo to verify the reliability of the sources, keeping only articles from sites with a country ranking <= 100.
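
The reliability check in step 2 can be sketched in Python as follows. This is only a minimal illustration, not the script used in the project: the domains and rank values are made-up placeholders that would be looked up by hand on the Alexa site-info page, and the name `is_reliable` is hypothetical.

```python
# Hypothetical sketch of the source-reliability filter described above.
# Domains and ranks are placeholders, not real Alexa data.
MAX_COUNTRY_RANK = 100  # threshold stated in the post


def is_reliable(country_rank, max_rank=MAX_COUNTRY_RANK):
    """Return True when a site's country rank is known and within the threshold."""
    return country_rank is not None and country_rank <= max_rank


# (domain, country rank) pairs entered by hand; None means no rank was found.
sources = [
    ("example-news-a.com", 12),
    ("example-news-b.com", 100),
    ("example-blog-c.com", 4521),
    ("example-unranked.com", None),
]

# Keep only the domains that pass the rank check.
reliable = [domain for domain, rank in sources if is_reliable(rank)]
print(reliable)
```

Treating an unranked site as unreliable is a design choice of this sketch; the post does not say how such sites were handled.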

Dataset Statistics

  1. It has a total of 302 authentic rows collected from over 2000 news articles.
  2. The data covers the years 2014 to 2019.
  3. There are 10 columns:
    • URL of the article/resource
    • Origin of the crime 
    • Destination of the crime
    • No. of people involved 
    • Ages of the people involved 
    • No. of male victims 
    • No. of female victims
    • Category of the crime 
    • Date of the crime
    • City of the crime
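
One way to picture a row of this dataset is the Python sketch below. The field names are illustrative assumptions that mirror the ten columns listed above; they are not necessarily the headers used in the published CSV, and the chosen types (for example, keeping ages as free text) are guesses.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TraffickingRecord:
    """One dataset row; field names and types are assumptions for illustration."""
    url: str                          # URL of the article/resource
    origin: Optional[str]             # origin of the crime
    destination: Optional[str]        # destination of the crime
    people_involved: Optional[int]    # no. of people involved
    ages: Optional[str]               # ages of the people involved, kept as text here
    male_victims: Optional[int]       # no. of male victims
    female_victims: Optional[int]     # no. of female victims
    category: Optional[str]           # category of the crime
    date: Optional[str]               # date of the crime
    city: Optional[str]               # city of the crime


# Column names derived from the dataclass definition.
COLUMN_NAMES = [f.name for f in fields(TraffickingRecord)]
print(len(COLUMN_NAMES))
```

Every field except the URL is optional in this sketch, since a news article will not always report all of the details.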
 
CTDC Dataset

We have also used the dataset available on https://www.ctdatacollaborative.org/ for our analysis. The Counter Trafficking Data Collaborative (CTDC) is the first global data hub on human trafficking, with data contributed by organizations from around the world.
It has data from 2004-2018. Although it has ~90k cases, most of the data was not fit for analysis as provided, so we had to clean it extensively before using it.
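
The kind of cleaning involved can be sketched as below. This is a simplified illustration under stated assumptions, not the cleaning code from the project: the column names (`year`, `gender`, `age_group`) are hypothetical, the sample rows are invented, and treating an empty cell or `-99` as "missing" is an assumption about how gaps are encoded.

```python
import csv
import io

# Assumption: missing values appear as empty cells or as the marker "-99".
MISSING_MARKERS = {"", "-99"}


def clean_rows(rows, required):
    """Keep only rows where every required column holds a usable value."""
    cleaned = []
    for row in rows:
        values = [(row.get(col) or "").strip() for col in required]
        if all(value not in MISSING_MARKERS for value in values):
            cleaned.append(row)
    return cleaned


# Invented sample in the same spirit as the real data; not actual CTDC rows.
sample_csv = """year,gender,age_group
2016,Female,9-17
2017,-99,30-38
2018,Male,
2018,Male,18-20
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
usable = clean_rows(rows, required=["year", "gender", "age_group"])
print(len(rows), len(usable))
```

Dropping incomplete rows is the bluntest option; for a chart that needs only one column, it would be enough to require that column alone.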

Analysis of our dataset

Victim count at gender level

Insights

  1. Females are the largest group of victims in every country, with the USA and Malaysia topping the list. This is because sex trafficking and prostitution are the major categories of human trafficking.
 

Victim count at source & destination level (where trafficked from & where to)

Insights

  1. The USA seems to be a hotspot for trafficking, both as source and destination, as all the collected US cases are internal to the US.
  2. There was one sting operation in Italy in which about 1200 victims were saved.
  3. There seems to be a high number of people being trafficked from India to other countries.
  4. Countries like Thailand and Indonesia are hotspots more as destinations than as sources. This is because they are top choices for holidays, which creates a bigger market for prostitution in these countries.
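
Counts like the ones behind these insights can be produced with a simple aggregation over the origin and destination columns. The sketch below uses invented placeholder rows, not rows from the dataset, and the variable names are hypothetical.

```python
from collections import Counter

# Placeholder rows: (origin, destination, number of victims). Not real data.
records = [
    ("Country A", "Country A", 5),
    ("Country B", "Country C", 3),
    ("Country A", "Country A", 2),
    ("Country D", "Country C", 4),
]

victims_by_route = Counter()        # victims per (origin, destination) pair
victims_as_source = Counter()       # victims per origin country
victims_as_destination = Counter()  # victims per destination country

for origin, destination, victims in records:
    victims_by_route[(origin, destination)] += victims
    victims_as_source[origin] += victims
    victims_as_destination[destination] += victims

# A country that is both origin and destination shows up as an internal route.
internal_routes = {route: n for route, n in victims_by_route.items() if route[0] == route[1]}
print(victims_by_route.most_common(1), internal_routes)
```

Comparing `victims_as_source` with `victims_as_destination` for the same country is what separates "mostly a destination" from "mostly a source".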
 

Crime category distribution

Insights

  1. As expected, sex trafficking, prostitution and labour seem to be the major categories.
  2. The high contribution of labour is also the reason for the high number of victims aged 30-40. Young people are lured in the name of employment.

Yearly Victim count distribution

Insights

  1. It is very clear that the number of crimes has been increasing yearly, which shows that technology still needs to penetrate this field.

Analysis of the CTDC dataset

Victim count based on age group (2004-2018)

Insights

  1. The age group of the victims was mostly unknown when the cases were reported. However, the analysis also reveals that the 9-17 age group is the most targeted, indicating that children are more susceptible to trafficking.

  2. People aged 30-38 also have a high victim count. This is mostly because of bonded labour.

Yearly victim count based on age group

   

Insights

  1. Women (age >= 18 years) are the most frequent victims of human trafficking, closely followed by girls (age < 18 years).
  2. A close look at the plot reveals that men are trafficked almost as frequently as women; the reason for this is analysed in the following part.
  3. The frequency curve also shows that the rate of the crime has been increasing over the years. This shows the severity and the scale of human trafficking.
 

Reason for human trafficking

Insights

  1. The most common categories of human trafficking are sexual exploitation and labour.
  2. Together these constitute 46.9% of the cases (22.7% sex and labour, 16.7% forced labour and 7.5% sexual exploitation).
  3. This analysis also gives us more insight into why women and men are trafficked in almost similar numbers over the years. The reason is that women are trafficked for sexual exploitation and men are trafficked for bonded and other types of labour.
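
The combined share quoted in point 2 is just the sum of the three category percentages. A quick check in Python, using only the figures stated above (the dictionary name is hypothetical):

```python
# Category shares (in percent) exactly as quoted in the insights above.
category_share = {
    "sex and labour": 22.7,
    "forced labour": 16.7,
    "sexual exploitation": 7.5,
}

# Round to one decimal place to avoid floating-point noise in the sum.
combined = round(sum(category_share.values()), 1)
print(combined)  # 46.9
```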
 

Relationship of victim with accused


Insights

  1. Here we look at the relationship of the victim with the accused to understand how the crime of human trafficking originates and is perpetrated.
  2. From the analysis we can see that 59.4% of the victims were closely related to the accused: for 19.8% the accused was a friend, for 19.8% a family member and for the other 19.8% an intimate partner.
  3. In only 18.4% of the cases was the accused not known to the victim.
  4. For the remaining 22.3% of the cases, the relationship between victim and accused was not recorded.
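
The percentages in this breakdown can be cross-checked with a few lines of Python. The sketch uses only the figures quoted above; the dictionary and variable names are hypothetical.

```python
# Relationship shares (in percent) as quoted in the insights above.
relationship_share = {
    "friend": 19.8,
    "family member": 19.8,
    "intimate partner": 19.8,
    "not known to victim": 18.4,
    "not recorded": 22.3,
}

close_relations = ("friend", "family member", "intimate partner")

# Share of cases where the accused was close to the victim.
close_total = round(sum(relationship_share[k] for k in close_relations), 1)
# Sum over all five groups.
overall = round(sum(relationship_share.values()), 1)
print(close_total, overall)  # 59.4 100.1
```

The overall total comes to 100.1 rather than exactly 100, which is consistent with each share having been rounded to one decimal place.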

Code and dataset are available at https://github.com/starry91/human-trafficking
Please reach out at praveeniitkgp1994@gmail.com with any queries.
