
Crime in India: Past, Present and Social Media

The course Big Data in Policing taught us about the growing importance of social media in estimating the extent of crime that has happened or is happening across India. Thanks to the internet revolution, there is a mass presence of people on social media platforms such as Twitter, Facebook, and Instagram.

In this work, our focus is to use social media data to identify crime patterns and to validate the analysis using news articles. This involves analyzing tweets about a given issue along multiple features, such as time, influence of the account, public reaction, and follow-up actions.

This project presents the crime reported by government agencies (from data.gov.in) and, extending this idea, identifies different crimes reported on social media (especially on Twitter). This can be used by the police and other agencies to tackle crime without time and information barriers.

Crime in India (till 2016, major broad categories)

  • MURDER
  • RAPE
  • ROBBERY
  • RIOTS
  • ARSON
  • Drug
  • NAXALS
  • OTHER IPC CRIMES

The following are the IPC crime statistics across India.

Dataset

  • For this work, we used the Twitter API to collect tweets for our crime analysis.

Samples

22  #DeathWish review: A straight reboot that does...   0
23  #RedSparrow review: despite a brilliant cast, ...   0
24  The Telugu film industry hasn’t had a great ti...   0
29  Praveena Paruchuri is a name unfamiliar to Tel...   0
... ... ...
11976   Sri Lanka's Test captain Dimuth Karunaratne ar...   1
11978   Mumbai businessman tries to buy VVIP mobile nu...   1
11986   Australian Indian charged with Murder of Preg...    1
11987   Convicted Drug Dealer ordered to Pay £19,000 D...   1

Data stats

We manually annotated tweets as crime-reporting or not; the counts are shown in the table below.

Text type                                                       #Tweets
Crime reported tweets (languages: English, Hindi, Telugu)       3062
Non-crime reported tweets (languages: English, Hindi, Telugu)   13040

Classification and Named Entity Identification

To build a classification model that can label a given sample as crime-related or non-crime, we followed the process below.

  1. Text preprocessing module
  2. Data subsampling
  3. Classification (fastText)
  4. Transfer learning for text
    • ELMo-based embedding
    • CNN-based classifier
  5. spaCy and NLTK named entity identification
  6. Crime type identification
    • Clustering based on LDA and t-SNE
    • Data annotation

      text    label
      381 rt @airnewsalerts: #kenya: 14 people were kill...   other
      478 rt @timesofindia: delhi: fire breaks out after...   fire
      451 rt @timesofindia: #newzealandshooting many dea...   shooting
      522 rt @toihyderabad: man’s murder: wife, lover &a...   murder
      88  #justin | palestinian killed by israeli fire i...   fire
      288 j&k: unidentified gunmen suspected to be t...   terrorist_attack
      577 two people in #seattle were killed and two oth...   fire
      141 11 bank employees booked for sexual harassment...   assault
      424 rt @htmumbai: teenager killed in mumbai after ...   other
      264 gurpreet singh: where is the outrage for more ...   other

    • Classification based on step 4; accuracy numbers are reported in the section below
  7. Same-event grouping
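
Steps 1 and 2 of the process above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the cleaning rules in `preprocess`, the `subsample` helper, and its `ratio` and `seed` parameters are all assumptions, and the post does not say which subsampling scheme or class ratio was used. Labels follow the sample table (1 = crime, 0 = non-crime).

```python
import random
import re

# Hypothetical cleaning rules; the post does not list the real ones.
RT_RE = re.compile(r"^rt\s+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")
SPACE_RE = re.compile(r"\s+")


def preprocess(text):
    """Lowercase a tweet and strip the retweet prefix, URLs, mentions and '#'."""
    text = text.lower()
    text = RT_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return SPACE_RE.sub(" ", text).strip()


def subsample(samples, ratio=1.0, seed=0):
    """Randomly drop majority-class samples.

    samples: list of (text, label) pairs with label 1 (crime) or 0 (non-crime).
    Keeps at most ratio * len(minority) samples from the majority class.
    """
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = min(len(majority), int(ratio * len(minority)))
    balanced = minority + rng.sample(majority, keep)
    rng.shuffle(balanced)
    return balanced
```

With the default `ratio=1.0` the two classes end up the same size; a larger ratio keeps more of the non-crime tweets.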

Classification Accuracies

Algorithm                    #Train   #Test   Accuracy   F1-score
fastText                     10113    1785    0.937      0.836
ELMo + CNN                   10113    1785    0.9407     0.853
ELMo + CNN (sub-sampling)    6452     634     0.954      0.883
ELMo + CNN (crime type)      512      91      0.857      0.878
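
Because crime tweets are the minority class, accuracy and F1 can differ noticeably, as they do in the table. The sketch below shows how the two metrics are computed for a binary task. It assumes F1 is measured on the crime class (label 1); the post does not state whether the reported F1 is per-class or averaged, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, a classifier that finds only one of two crime tweets among ten samples and makes no other mistakes scores 0.9 accuracy but only about 0.67 F1, since recall on the crime class is 0.5.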

Accuracies

  • crime vs non-crime numbers and crime type numbers

Interesting findings from live model

    Textual tweets from the live model:

  • in up woman raped by husband and his brother in-law on wedding night ‘over dowry’ https://t.co/c48snvbers via @toicities

  • @xxxxxx hi anna, day before yesterday 6yr old baby bruatually raped and murdered by one unknown bihari, local polit… https://t.co/gcxuuffpr7

  • @xxxxxx good morning sir .... please respond sir ....march 21st in alwaal 6 yrs baby brutally raped by stupids ...… https://t.co/lnkgamtahy

  • @xxxxxx sir 6 years old small girl raped and murdered brutally. till now what action was taken ? who is that culpri… https://t.co/hfk2xwn6dj

  • @xxxxxx sir please respond on pravalika rape case a 6 year girl raped by 6 members u leaders are busy with your ele… https://t.co/kcnsj5jymx

  • @xxxxxx .... ఎర్రిపుాక కెటిఆర్ ... 6 years old girl had been raped and killed ... what action you have taken ...

  • @xxxxxx ktr anna garu a small school going child is gang raped and been killed in alwal area yesterday,which is rea… https://t.co/ughxmtv4i3

  • @xxxxxx sir please respond on 6 year girl raped n killed by bihar people on holiday sir

  • @xxxxxx the incident happened on holi.little innocent 6 year old girl pravallika from alwal,has been raped and kill… https://t.co/7ct87088pv

  • @xxxxxx @trspartyonline @telanganacmo 6 year old raped and brutally killed in the state and no coverage or justice… https://t.co/pxbsrwqiqs

  • @xx1 @xx2 he raped his tenant small daughter . abuse him like a hell so that he realised how big… https://t.co/mmak1ylhpu
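
Several of the tweets above appear to describe the same incident, which is the situation the same-event grouping step (step 7) addresses. The post does not describe how that grouping was done, so the following is only one possible sketch: a greedy grouping by Jaccard overlap of word sets. The `tokens`, `jaccard`, and `group_events` names, the stopword list, and the `threshold` value are all assumptions, and the tokenizer only handles Latin-script text.

```python
import re

TOKEN_RE = re.compile(r"[a-z0-9]+")
# Tiny illustrative stopword list, not taken from the project.
STOPWORDS = {"the", "and", "for", "was", "are", "has", "had", "with", "that", "this"}


def tokens(text):
    """Lowercased word set, without stopwords and tokens shorter than 3 chars."""
    return {t for t in TOKEN_RE.findall(text.lower())
            if len(t) > 2 and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_events(tweets, threshold=0.3):
    """Greedily assign each tweet to the most similar existing group."""
    groups = []  # each group: {"tokens": set, "tweets": list}
    for tweet in tweets:
        toks = tokens(tweet)
        best, best_sim = None, threshold
        for group in groups:
            sim = jaccard(toks, group["tokens"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"tokens": set(toks), "tweets": [tweet]})
        else:
            best["tweets"].append(tweet)
            best["tokens"] |= toks
    return [g["tweets"] for g in groups]
```

A lower `threshold` merges more aggressively; named entities from step 5 (place names, dates) could be used in place of raw tokens to make the grouping stricter.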

Acknowledgments

  • Dr. Ponnurangam Kumaraguru
  • Dr. Manish Srivastava
