
Human Trafficking dataset creation & analysis

Introduction

The goal of this project is to create a human trafficking dataset from reliable sources, such as news articles and government agencies, and to analyse the pain points in this area.

Motivation 

What is human trafficking?

Human trafficking involves recruiting, harbouring or transporting people into a situation of exploitation through the use of violence, deception or coercion, where they are forced to work against their will.

In other words, trafficking is a process of enslaving people, coercing them into a situation with no way out, and exploiting them.

Why is it important?

Did you know that in 2015 alone, human trafficking generated $150 billion, more revenue than Google, Nike, the NFL and Starbucks combined?!
Sounds crazy, right? There is more to this story than you might know, which is why the EU marks the 18th of October as EU Anti-Trafficking Day. According to a September 2017 report from the International Labour Organization (ILO) and the Walk Free Foundation:
  • An estimated 24.9 million victims are trapped in modern-day slavery.
  • Of these, 16 million (64%) were exploited for labor, 4.8 million (19%) were sexually exploited, and 4.1 million (17%) were exploited in state-imposed forced labor.
  • 71% of trafficking victims around the world are women and girls and 29% are men and boys.
  • 15.4 million victims (75%) are aged 18 or older, with the number of children under the age of 18 estimated at 5.5 million (25%).

Why did we choose it as a project?

When we first started looking for a dataset on the internet, we were shocked to find that, although human trafficking is such a large-scale issue, there was almost no public data available. Most of what was available was proprietary and couldn't be accessed by students like us.
So we decided to take a first step towards creating a public dataset which, with contributions from more people, could grow into a much bigger resource for other students and anyone who wants to explore this area.

Work done

Data Collection

  1. Manually went through different news articles and collected the relevant information.
  2. Used https://www.alexa.com/siteinfo to verify the reliability of the sources, keeping only articles from sites with a country ranking <= 100.
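The source filter in step 2 can be sketched as below. This is only an illustration under stated assumptions: the `country_rank` dictionary is a hypothetical, hand-filled lookup from domain to rank (the ranks here are placeholder values), and `domain_of` / `reliable_articles` are names introduced for this sketch, not functions from the project's code.

```python
from urllib.parse import urlparse

# Threshold from the selection rule above: keep sites ranked <= 100.
MAX_COUNTRY_RANK = 100


def domain_of(url: str) -> str:
    """Return the lower-cased host name of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def reliable_articles(urls, country_rank):
    """Keep URLs whose site has a known country rank <= MAX_COUNTRY_RANK.

    `country_rank` is a hypothetical dict mapping domain -> rank, filled in
    by hand from the ranking site; unknown domains are dropped.
    """
    kept = []
    for url in urls:
        rank = country_rank.get(domain_of(url))
        if rank is not None and rank <= MAX_COUNTRY_RANK:
            kept.append(url)
    return kept


ranks = {"example-news.com": 12, "tiny-blog.net": 4500}  # placeholder values
urls = [
    "https://www.example-news.com/story",
    "http://tiny-blog.net/post",
    "https://unknown.org/page",
]
print(reliable_articles(urls, ranks))  # ['https://www.example-news.com/story']
```

Dropping unknown domains (rather than keeping them) is a conservative choice: an article only enters the dataset if its source has been explicitly checked.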

Dataset Statistics

  1. Has a total of 302 authentic rows, collected from over 2,000 news articles.
  2. Covers the years 2014 to 2019.
  3. There are 10 columns, covering the following:
    • URL of the article/resource
    • Origin of the crime 
    • Destination of the crime
    • No. of people involved 
    • Ages of the people involved 
    • No. of male victims 
    • No. of female victims
    • Category of the crime 
    • Date of the crime
    • City of the crime
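One way to represent a row with these 10 columns is a small typed record. This is a sketch only: the field names and types are assumptions made for illustration and may not match the column headers in the published CSV, and the `gender_counts_consistent` check is a hypothetical sanity test, not a rule stated by the project.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TraffickingRecord:
    """One row of the dataset. Field names are illustrative, not the repo's."""
    url: str                   # URL of the article/resource
    origin: str                # origin of the crime (trafficked from)
    destination: str           # destination of the crime (trafficked to)
    num_people: int            # no. of people involved
    ages: Optional[str]        # ages as reported, e.g. "14-17"; None if absent
    num_male: Optional[int]    # no. of male victims; None if not reported
    num_female: Optional[int]  # no. of female victims; None if not reported
    category: str              # category of the crime, e.g. "Labour"
    crime_date: date           # date of the crime
    city: str                  # city of the crime

    def gender_counts_consistent(self) -> bool:
        """Sanity check: male + female victims must not exceed the total."""
        return (self.num_male or 0) + (self.num_female or 0) <= self.num_people


row = TraffickingRecord(
    url="https://example.com/article",  # placeholder values
    origin="India", destination="India", num_people=5, ages=None,
    num_male=2, num_female=3, category="Labour",
    crime_date=date(2018, 6, 1), city="Delhi",
)
print(row.gender_counts_consistent())  # True
```

Keeping the gender counts optional reflects the fact that news articles do not always break victims down by gender.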
 
CTDC Dataset
We have also used the dataset available on https://www.ctdatacollaborative.org/ for our analysis. The Counter Trafficking Data Collaborative (CTDC) is the first global data hub on human trafficking, with data contributed by organizations from around the world.
It covers the years 2004 to 2018. Although it contains ~90k cases, most of the records were not fit for analysis as-is, so we had to clean the data extensively before using it.
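A typical first cleaning pass normalises missing values and drops rows that lack the fields an analysis needs. The sketch below assumes that missing values appear as empty cells or a sentinel code such as "-99"; that encoding, and the column names `year`, `gender` and `age_group`, are assumptions for illustration and should be checked against the CTDC codebook. It uses only the standard `csv` module.

```python
import csv
import io

# Assumption: missing values appear as empty cells or a sentinel code such
# as "-99"; the exact encoding should be confirmed in the CTDC codebook.
MISSING = {"", "-99", "NA"}


def _clean(value):
    """Strip whitespace and map missing-value markers to None."""
    value = (value or "").strip()
    return None if value in MISSING else value


def clean_rows(csv_text, required):
    """Yield cleaned rows, skipping any row missing a `required` column."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader stores surplus fields under the key None; ignore them.
        cleaned = {k: _clean(v) for k, v in row.items() if k is not None}
        if all(cleaned.get(col) is not None for col in required):
            yield cleaned


# Toy CSV with hypothetical column names.
sample = (
    "year,gender,age_group\n"
    "2015,Female,9-17\n"
    "-99,Male,18-20\n"
    "2016,,30-38\n"
    "2017,Female,-99\n"
)
rows = list(clean_rows(sample, required=["year", "gender"]))
print(len(rows))  # 2
```

Only the columns passed as `required` cause a row to be dropped; other missing values are kept as `None`, so a later step can still count them as "unknown" (for example, an unrecorded age group).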

Analysis of our dataset

Victim count at gender level

Insights

  1. Females are the largest group of victims in every country, with the USA and Malaysia topping the list. This is because sex trafficking and prostitution are the major categories of human trafficking.
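The gender-level counts behind this insight amount to summing male and female victims per country. A minimal sketch, assuming each row is a dict with hypothetical keys `country`, `male` and `female`; the toy rows are illustrative and not taken from the dataset.

```python
from collections import defaultdict


def victims_by_gender(records):
    """Sum male and female victim counts per country.

    `records` are dicts with hypothetical keys 'country', 'male' and
    'female'; a missing or None count is treated as 0.
    """
    totals = defaultdict(lambda: {"male": 0, "female": 0})
    for r in records:
        totals[r["country"]]["male"] += r.get("male") or 0
        totals[r["country"]]["female"] += r.get("female") or 0
    return dict(totals)


toy_gender = [  # illustrative rows, not taken from the dataset
    {"country": "Malaysia", "male": 1, "female": 6},
    {"country": "Malaysia", "male": None, "female": 4},
    {"country": "India", "male": 3, "female": 2},
]
print(victims_by_gender(toy_gender))
# {'Malaysia': {'male': 1, 'female': 10}, 'India': {'male': 3, 'female': 2}}
```

Treating a missing count as 0 slightly undercounts rather than guessing; an alternative is to track unreported cases separately.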
 

Victim count at source & destination level (where from & where to trafficked)

Insights

  1. The USA appears to be a hotspot for trafficking as both source and destination, since all the US cases we collected are internal (victims trafficked within the USA).
  2. One sting operation in Italy rescued about 1,200 victims.
  3. There seems to be a high number of people trafficked from India to other countries.
  4. Countries like Thailand and Indonesia are hotspots more as destinations than as sources. Because they are popular holiday destinations, there is a bigger market for prostitution in these countries.
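A source/destination view like this one can be built by totalling victims per (origin, destination) pair and then, per country, comparing victims trafficked out with victims trafficked in. The sketch below is one possible approach, not the project's code; the row keys and toy values (including the placeholder names "Country X" and "Country Y") are hypothetical. Internal cases, where origin equals destination, count toward both totals, which is how a country such as the USA shows up on both sides.

```python
from collections import Counter


def flow_counts(records):
    """Total victims for each (origin, destination) pair."""
    flows = Counter()
    for r in records:
        flows[(r["origin"], r["destination"])] += r["victims"]
    return flows


def source_vs_destination(flows):
    """Map each country to (victims trafficked from it, victims trafficked to it).

    Internal cases, where origin == destination, count toward both totals.
    """
    outbound, inbound = Counter(), Counter()
    for (origin, destination), n in flows.items():
        outbound[origin] += n
        inbound[destination] += n
    countries = set(outbound) | set(inbound)
    return {c: (outbound[c], inbound[c]) for c in sorted(countries)}


toy_flows = [  # illustrative rows, not taken from the dataset
    {"origin": "India", "destination": "Country X", "victims": 5},
    {"origin": "India", "destination": "Country X", "victims": 2},
    {"origin": "USA", "destination": "USA", "victims": 3},
    {"origin": "Country Y", "destination": "Thailand", "victims": 4},
]
print(source_vs_destination(flow_counts(toy_flows)))
# {'Country X': (0, 7), 'Country Y': (4, 0), 'India': (7, 0), 'Thailand': (0, 4), 'USA': (3, 3)}
```

A country whose inbound total clearly exceeds its outbound total is, in this view, more of a destination than a source.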
 

Crime category distribution

Insights

  1. As expected, sex trafficking, prostitution and labour are the major categories.
  2. The large share of labour cases also explains the high victim count in the 30-40 age group, as young people are lured with promises of employment.
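The category distribution can be computed as the percentage of cases in each category. In this sketch each case counts once, not weighted by its number of victims; that is an assumption, since the chart could also be victim-weighted. The labels are toy values.

```python
from collections import Counter


def category_shares(categories):
    """Percentage of cases in each crime category, largest first.

    Each element of `categories` is one case's category label; shares are
    per case, not weighted by the number of victims.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders by count; ties keep first-seen order.
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}


labels = ["Sex Trafficking", "Labour", "Sex Trafficking", "Prostitution"]
print(category_shares(labels))
# {'Sex Trafficking': 50.0, 'Labour': 25.0, 'Prostitution': 25.0}
```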

Yearly Victim count distribution

Insights

  1. The number of crimes has clearly increased every year, which suggests that technology has yet to make a real impact in this field.
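The yearly trend comes from summing victims per calendar year. A minimal sketch, assuming rows are hypothetical `(date, victims)` pairs with toy values; `is_strictly_increasing` is a helper introduced here to make "increasing every year" checkable.

```python
from collections import Counter
from datetime import date


def victims_per_year(records):
    """Sum victims per calendar year; records are (date, victims) pairs."""
    per_year = Counter()
    for crime_date, victims in records:
        per_year[crime_date.year] += victims
    return dict(sorted(per_year.items()))


def is_strictly_increasing(per_year):
    """True if every year has more victims than the year before it."""
    counts = [per_year[year] for year in sorted(per_year)]
    return all(a < b for a, b in zip(counts, counts[1:]))


toy_years = [  # illustrative rows, not taken from the dataset
    (date(2016, 3, 1), 4),
    (date(2017, 5, 2), 6),
    (date(2017, 9, 9), 1),
    (date(2018, 1, 15), 12),
]
per_year = victims_per_year(toy_years)
print(per_year)                          # {2016: 4, 2017: 7, 2018: 12}
print(is_strictly_increasing(per_year))  # True
```

Note that a year with no recorded cases simply does not appear in the result, so gaps should be checked before reading the trend.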

Analysis of the CTDC dataset

Victim count based on age group (2004-2018)

Insights

  1. For most cases, the victim's age group was not recorded when the case was reported. Among cases where it was recorded, however, the 9-17 age group is the most targeted, indicating that children are more susceptible to trafficking.

  2. People aged 30-38 also have a high victim count, mostly because of bonded labour.
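Counting victims by age group means mapping each age to a bracket and tallying, with missing ages kept as "Unknown" rather than dropped. In this sketch, the 9-17 and 30-38 brackets come from the analysis above; the other bracket edges are assumptions for illustration and should be checked against the CTDC codebook. The ages are toy values.

```python
import bisect
from collections import Counter

# Assumed brackets: 9-17 and 30-38 come from the analysis above; the
# remaining edges are illustrative and should be checked against the
# CTDC codebook before use.
LOWER_BOUNDS = [0, 9, 18, 21, 24, 27, 30, 39, 48]
LABELS = ["0-8", "9-17", "18-20", "21-23", "24-26", "27-29", "30-38", "39-47", "48+"]


def age_group(age):
    """Map an age in years to its bracket, or 'Unknown' if missing/invalid."""
    if age is None or age < 0:
        return "Unknown"
    # bisect_right finds the first lower bound greater than `age`;
    # the bracket we want is the one just before it.
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, age) - 1]


ages = [None, 12, None, 15, 33, None, 17, 36, None]  # toy values
print(Counter(age_group(a) for a in ages).most_common())
# [('Unknown', 4), ('9-17', 3), ('30-38', 2)]
```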

Yearly victim count based on age group

   

Insights

  1. Women (aged 18 or older) are the most frequent victims of human trafficking, closely followed by girls (under 18).
  2. A close look at the plot also shows that men are trafficked nearly as frequently as women; the reason for this is analysed in the following section.
  3. The frequency curve also shows that the rate of the crime has kept increasing over the years, which underlines the severity and scale of human trafficking.
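The women/girls/men/boys split uses the 18-year cut-off given in point 1. A minimal sketch, assuming each row is a hypothetical `(year, gender, age)` tuple with a numeric age; in practice the CTDC data may only provide a broad age bracket, in which case the adult test would need to be adapted. Toy values are illustrative.

```python
from collections import Counter


def victim_group(gender, age):
    """Classify a victim as Women/Girls/Men/Boys using the 18-year cut-off."""
    if gender not in ("Female", "Male") or age is None:
        return "Unknown"
    adult = age >= 18
    if gender == "Female":
        return "Women" if adult else "Girls"
    return "Men" if adult else "Boys"


def yearly_group_counts(rows):
    """Count victims per (year, group); `rows` are (year, gender, age) tuples."""
    return Counter((year, victim_group(gender, age)) for year, gender, age in rows)


toy_rows = [  # illustrative rows, not taken from the dataset
    (2015, "Female", 25),
    (2015, "Female", 16),
    (2015, "Male", 40),
    (2016, "Female", 19),
    (2016, None, 30),
]
counts = yearly_group_counts(toy_rows)
print(counts[(2015, "Women")], counts[(2015, "Girls")], counts[(2016, "Unknown")])  # 1 1 1
```

Plotting one line per group over the years then gives the frequency curves described above.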
 

Reason for human trafficking

Insights

  1. The most common categories of human trafficking are sexual exploitation and labour.
  2. Together they account for 46.9% of the cases (22.7% sexual exploitation and labour, 16.7% forced labour and 7.5% sexual exploitation).
  3. This also helps explain why women and men have been trafficked in similar numbers over the years: women are trafficked mainly for sexual exploitation, while men are trafficked for bonded and other types of labour.
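As a quick check, the combined 46.9% in point 2 is simply the sum of the three category shares:

```latex
\underbrace{22.7\%}_{\text{sex and labour}}
+ \underbrace{16.7\%}_{\text{forced labour}}
+ \underbrace{7.5\%}_{\text{sexual exploitation}}
= 46.9\%
```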
 

Relationship of victim with accused


Insights

  1. Here we look at the relationship between the victim and the accused to understand how human trafficking crimes originate and are carried out.
  2. The analysis shows that 59.4% of the victims were closely connected to the accused: 19.8% were friends, 19.8% were family members and another 19.8% were intimate partners.
  3. In only 18.4% of the cases was the accused not known to the victim.
  4. In the remaining 22.3% of the cases, the relationship between victim and accused was not recorded.
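These shares come from grouping relationship labels into three buckets: known to the victim, not known, and not recorded. A minimal sketch, where the label spellings in `CLOSE` and `STRANGERS` are hypothetical and may differ from the values actually used in the CTDC data. The toy labels are illustrative.

```python
from collections import Counter

# Hypothetical label sets; the actual relationship values in the CTDC
# data may be spelled differently.
CLOSE = {"Friend", "Family", "Intimate partner"}
STRANGERS = {"Stranger"}


def relationship_shares(labels):
    """Percentage of cases in each relationship bucket."""
    buckets = Counter()
    for label in labels:
        if label in CLOSE:
            buckets["Known to victim"] += 1
        elif label in STRANGERS:
            buckets["Not known to victim"] += 1
        else:  # None, empty or any unrecognised value
            buckets["Not recorded"] += 1
    total = sum(buckets.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in buckets.items()}


labels = ["Friend", "Family", None, "Stranger", "Intimate partner", "Friend", None, "Stranger"]
print(relationship_shares(labels))
# {'Known to victim': 50.0, 'Not recorded': 25.0, 'Not known to victim': 25.0}
```

Because each share is rounded independently, the buckets need not add up to exactly 100% (the figures in this section sum to 100.1%).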

The code and dataset are available at https://github.com/starry91/human-trafficking
Please reach out at praveeniitkgp1994@gmail.com with any queries.
