
Crime in India: Past, Present and Social Media

The course Big Data in Policing taught us about the growing importance of social media in estimating the extent of crime that has happened or is happening across India. Thanks to the internet revolution, a large share of people are now present on social media platforms such as Twitter, Facebook, and Instagram.

In this work, our focus is to use social media data to identify crime patterns and to validate the analysis against news articles. This involves analyzing tweets about a given issue along multiple features, such as time, the influence of the account, public reaction, and follow-up actions.

This project presents the crime reported by government agencies (from data.gov.in) and, extending this idea, identifies different crimes reported on social media (specifically on Twitter). This can be used by the police and others to tackle crime without time and information barriers.

Crime in India (till 2016, major broad categories)

  • MURDER
  • RAPE
  • ROBBERY
  • RIOTS
  • ARSON
  • DRUGS
  • NAXALS
  • OTHER IPC CRIMES

The following are the IPC crime statistics across India.

Dataset

  • For this work, we used the Twitter API to collect tweets for our crime analysis.

Samples

22  #DeathWish review: A straight reboot that does...   0
23  #RedSparrow review: despite a brilliant cast, ...   0
24  The Telugu film industry hasn’t had a great ti...   0
29  Praveena Paruchuri is a name unfamiliar to Tel...   0
... ... ...
11976   Sri Lanka's Test captain Dimuth Karunaratne ar...   1
11978   Mumbai businessman tries to buy VVIP mobile nu...   1
11986   Australian Indian charged with Murder of Preg...    1
11987   Convicted Drug Dealer ordered to Pay £19,000 D...   1

Data stats

We manually annotated the collected tweets as crime-reporting or not; the counts are shown in the table below.

Text type                                                     #Tweets
Crime reported Tweets (languages: English, Hindi, Telugu)     3062
Non crime reported Tweets (languages: English, Hindi, Telugu) 13040
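
The counts above show roughly one crime-reporting tweet for every four non-crime tweets, which is the kind of imbalance that the data subsampling step in the process below is meant to address. The following is a minimal sketch of random undersampling, assuming a binary label (1 = crime, 0 = non-crime) and a 1:1 target ratio. The function name `undersample`, the ratio, and the seed are illustrative assumptions; the post does not state how the subsampling was actually done.

```python
import random


def undersample(texts, labels, majority_label=0, ratio=1.0, seed=13):
    """Randomly drop majority-class rows until majority <= ratio * minority.

    Hypothetical helper: the ratio actually used in the project is not stated.
    """
    rng = random.Random(seed)
    pairs = list(zip(texts, labels))
    minority = [p for p in pairs if p[1] != majority_label]
    majority = [p for p in pairs if p[1] == majority_label]

    # Number of majority rows to keep, capped by how many exist.
    keep = min(len(majority), int(ratio * len(minority)))
    combined = minority + rng.sample(majority, keep)
    rng.shuffle(combined)

    return [t for t, _ in combined], [l for _, l in combined]


# Placeholder texts with the same 3062 : 13040 class counts as the table above.
labels = [1] * 3062 + [0] * 13040
texts = [f"tweet {i}" for i in range(len(labels))]

bal_texts, bal_labels = undersample(texts, labels)
print(f"crime share before: {sum(labels) / len(labels):.3f}")
print(f"crime share after:  {sum(bal_labels) / len(bal_labels):.3f}")
```

Undersampling discards data, so it trades a smaller training set for a more balanced one; oversampling the minority class or weighting the loss are alternatives.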

Classification and Named Entity Identification

To build a classification model that predicts whether a given sample is crime-related or not, we followed the process below.

  1. Text preprocessing module
  2. Data subsampling
  3. Classification (fastText)
  4. Transfer learning for text
    • ELMo-based embedding
    • CNN-based classifier
  5. spaCy and NLTK Named Entity identification
  6. Crime type identification
    • Clustering: based on LDA and t-SNE
    • Data annotation:

      text    label
      381 rt @airnewsalerts: #kenya: 14 people were kill...   other
      478 rt @timesofindia: delhi: fire breaks out after...   fire
      451 rt @timesofindia: #newzealandshooting many dea...   shooting
      522 rt @toihyderabad: man’s murder: wife, lover &a...   murder
      88  #justin | palestinian killed by israeli fire i...   fire
      288 j&k: unidentified gunmen suspected to be t...   terrorist_attack
      577 two people in #seattle were killed and two oth...   fire
      141 11 bank employees booked for sexual harassment...   assault
      424 rt @htmumbai: teenager killed in mumbai after ...   other
      264 gurpreet singh: where is the outrage for more ...   other

    • Classification based on step 4: accuracy numbers are reported in the section below
  7. Same Event grouping:
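
Steps 1 and 3 above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing: the cleaning rules (lowercasing, dropping the retweet prefix, URLs, mentions, and the `#` sign) and the helper names `clean_tweet` and `to_fasttext_line` are assumptions, and the example tweet and its URL are made up. The only fixed convention used here is fastText's default `__label__` prefix for supervised training files.

```python
import re

# Assumed cleaning rules; the post does not list the actual ones.
RT_RE = re.compile(r"^rt\s+@\w+:\s*")   # leading "rt @handle: " of a retweet
URL_RE = re.compile(r"https?://\S+")    # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")        # remaining @mentions
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Lowercase a tweet and strip retweet prefix, URLs, mentions and '#'."""
    text = text.lower().strip()
    text = RT_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")          # keep the hashtag word itself
    return SPACE_RE.sub(" ", text).strip()


def to_fasttext_line(text, label):
    """Format one sample as a fastText supervised training line."""
    return f"__label__{label} {clean_tweet(text)}"


# Hypothetical tweet, for illustration only.
sample = "RT @timesofindia: Delhi: Fire breaks out at a market https://t.co/abc123"
print(to_fasttext_line(sample, "crime"))
```

Lines in this format, written one per sample to a text file, are what the `fasttext` Python package expects as input to `fasttext.train_supervised(input=...)`.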

Classification Accuracies

Algorithm                    #Train    #Test    Accuracy    F1-Score
fastText                     10113     1785     0.937       0.836
ELMo + CNN                   10113     1785     0.9407      0.853
ELMo + CNN (sub-sampling)    6452      634      0.954       0.883
ELMo + CNN (crime type)      512       91       0.857       0.878
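
For the crime vs non-crime rows, accuracy is noticeably higher than F1, which is expected when one class dominates. The sketch below shows how the two metrics are computed from predictions. It assumes F1 is measured on the crime class (label 1); the post does not say which averaging was used, and the toy labels are invented purely for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 2 crime tweets out of 10, and the model misses one of them.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In the toy example a single missed crime tweet leaves accuracy at 0.9 but halves recall, so F1 drops to about 0.67. This is why F1 is the more informative number for the minority (crime) class.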

Accuracies

  • crime vs non-crime numbers and crime type numbers

Interesting findings from live model

    Interesting findings from the live model: textual tweets

  • in up woman raped by husband and his brother in-law on wedding night ‘over dowry’ https://t.co/c48snvbers via @toicities

  • @xxxxxx hi anna, day before yesterday 6yr old baby bruatually raped and murdered by one unknown bihari, local polit… https://t.co/gcxuuffpr7

  • @xxxxxx good morning sir .... please respond sir ....march 21st in alwaal 6 yrs baby brutally raped by stupids ...… https://t.co/lnkgamtahy

  • @xxxxxx sir 6 years old small girl raped and murdered brutally. till now what action was taken ? who is that culpri… https://t.co/hfk2xwn6dj

  • @xxxxxx sir please respond on pravalika rape case a 6 year girl raped by 6 members u leaders are busy with your ele… https://t.co/kcnsj5jymx

  • @xxxxxx .... ఎర్రిపుాక కెటిఆర్ ... 6 years old girl had been raped and killed ... what action you have taken ...

  • @xxxxxx ktr anna garu a small school going child is gang raped and been killed in alwal area yesterday,which is rea… https://t.co/ughxmtv4i3

  • @xxxxxx sir please respond on 6 year girl raped n killed by bihar people on holiday sir

  • @xxxxxx the incident happened on holi.little innocent 6 year old girl pravallika from alwal,has been raped and kill… https://t.co/7ct87088pv

  • @xxxxxx @trspartyonline @telanganacmo 6 year old raped and brutally killed in the state and no coverage or justice… https://t.co/pxbsrwqiqs

  • @xx1 @xx2 he raped his tenant small daughter . abuse him like a hell so that he realised how big… https://t.co/mmak1ylhpu
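
Several of the tweets above appear to refer to the same incident, which is what the Same Event grouping step is meant to capture. The post does not describe how the grouping was done, so the following is only a sketch of one simple option: greedy grouping by Jaccard overlap of word sets. The function names, the 0.3 threshold, and the neutral example tweets are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_events(tweets, threshold=0.3):
    """Greedy grouping: a tweet joins the first group whose first tweet
    is similar enough, otherwise it starts a new group."""
    groups = []  # list of (leader_tokens, members)
    for tweet in tweets:
        toks = tokens(tweet)
        for leader_tokens, members in groups:
            if jaccard(toks, leader_tokens) >= threshold:
                members.append(tweet)
                break
        else:
            groups.append((toks, [tweet]))
    return [members for _, members in groups]


# Hypothetical tweets, not taken from the collected data.
tweets = [
    "Fire breaks out at Delhi market, no injuries reported",
    "delhi market fire: no injuries reported so far",
    "Bank robbery in Pune, two suspects arrested",
]
groups = group_events(tweets)
for i, members in enumerate(groups):
    print(i, members)
```

Word overlap is a weak signal for short, noisy, multilingual tweets; combining it with the named entities from step 5 (place, date) or with sentence embeddings would likely group events more reliably.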

Acknowledgments

  • Dr. Ponnurangam Kumaraguru
  • Dr. Manish Srivastava
