Real-Time Identification of Actionable Posts on Twitter

Motivation for our Project


India has one of the lowest ratios of police personnel per lakh of population in the world.

This makes policing in India difficult on many fronts. As India is a developing country, it is imperative that its law enforcement arm be as effective as possible in maintaining law and order and keeping crime in check.

For day-to-day concerns such as traffic, law and order issues and public gatherings, however, the police can get information with assistance from the public themselves.



Social media platforms have attracted substantial interest from the police as a way to connect with residents. This has encouraged residents to report day-to-day law and order concerns, such as traffic congestion, missing people and harassment by cops, on these platforms. These are actionable posts. However, such messages get lost in a flood of unnecessary posts such as thank-you notes, good-morning posts and generally known facts. Therein lies the challenge.

What is Our Project

We collected tweets from an official police handle on Twitter (@hydcitypolice) and applied NLP techniques to identify such actionable information in user posts.

Our solution downloads tweets from police handles and, after processing, feeds them to a dashboard (Python/Flask based) that displays them in a categorized format. The categories were decided in consultation with police personnel [1]. As we will see, the category is also used as a feature for identifying such important posts, which is the prime objective of our project.

In the distribution of tweets across all categories, categories such as Appreciation, Suggestion and General Info are Non-Serviceable.

Data Collection


We use the official Twitter API, through the Tweepy library, to fetch posts from verified police handles on Twitter (@hydcitypolice in our case). We extract features such as the date, the full text of the tweet and the tweet_id.
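A minimal sketch of this step, not the project's actual script. It assumes Tweepy's `API.user_timeline` call with `tweet_mode="extended"`; the source does not say which endpoint was used, and newer Tweepy releases name the auth class `OAuth1UserHandler`. The helper names `extract_features` and `fetch_timeline` and the sample tweet are hypothetical, and the fetch itself needs valid API credentials and network access.

```python
from types import SimpleNamespace


def extract_features(status):
    """Keep only the fields used later: date, full text and tweet id."""
    return {
        "date": status.created_at,
        "full_text": status.full_text,
        "tweet_id": status.id,
    }


def fetch_timeline(consumer_key, consumer_secret, access_token, access_secret,
                   handle="hydcitypolice", count=200):
    """Fetch recent posts from a handle's timeline (needs network and valid keys)."""
    import tweepy  # imported lazily so extract_features works without Tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    # tweet_mode="extended" asks for the untruncated text in `full_text`
    statuses = api.user_timeline(screen_name=handle, count=count,
                                 tweet_mode="extended")
    return [extract_features(s) for s in statuses]


# Offline demonstration with a made-up object shaped like a Tweepy status
sample = SimpleNamespace(created_at="2019-04-01 10:15:00",
                         full_text="Signal not working at the junction, please help",
                         id=1112223334445556667)
row = extract_features(sample)
```

Keeping the feature extraction separate from the network call makes it easy to check the row format without touching the API.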

We then manually annotated all tweets into 13 different topics, shown below. This was done following the guidelines in the reference paper by Niharika Sachdeva [2].

We also add a ground truth label column that records whether each tweet is Serviceable (S) or Not Serviceable (NS).



Some Preliminary Analysis

Tweets were analysed for emotional attribute measures to look for a correlation between these measures and police response to the tweets.

Below are a few examples:
  1. "Instagram account Shoppers Kart is doing all sorts of online fraud and cheating innocent people.He makes the Instagram users to pay advnce and blocks them once he receives payment.Plz take a serious action on him sir"
         "disgust": 0.461023, "anger": 0.422057
         Police Response: Yes
  2. "there is patient at my home but how these people making us in trouble at midnight...can i expect action"
         "disgust": 0.32599, "anger": 0.338764
         Police Response: Yes
  3. "Can you take some action against this TV channel for misguiding the civilians with this news. They cant even differenciate @narendramodi official twitter account and spreading fake news."
         "disgust": 0.355119, "anger": 0.224149
         Police Response: No
  4. "One can hide a pain behind a smile .Cops need to eke out that hidden pain to turn a smile into a permanent glee - always"
         "disgust": 0.012144, "anger": 0.092993
         Police Response: No

These examples suggest that there is a threshold in emotional scores above which the police tend to respond to a tweet. We will use these scores as a measure of severity when ranking the serviceable tweets on the dashboard.
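One way such a ranking could look is sketched below, using the disgust and anger scores of the four example tweets. The source does not define how the scores are combined, so the `severity` function (the mean of the two scores) and the short ids are assumptions made only for illustration.

```python
def severity(tweet):
    """Assumed severity measure: mean of the disgust and anger scores."""
    return (tweet.get("disgust", 0.0) + tweet.get("anger", 0.0)) / 2


# Emotion scores of the four example tweets; the ids are made-up short names
tweets = [
    {"id": "quote", "disgust": 0.012144, "anger": 0.092993},
    {"id": "tv_channel", "disgust": 0.355119, "anger": 0.224149},
    {"id": "midnight", "disgust": 0.32599, "anger": 0.338764},
    {"id": "fraud", "disgust": 0.461023, "anger": 0.422057},
]

# Highest severity first, as a dashboard would list them
ranked = sorted(tweets, key=severity, reverse=True)
order = [t["id"] for t in ranked]
```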


How we Do it

Topic Modelling

We use the Bag of Words algorithm to build a lookup table with word tokens as columns and the 13 topics as rows. Each cell value is the frequency of a word in a topic, TF-IDF normalised.

TF-IDF

This weighting is used to compensate for the imbalance in class distribution in the dataset.

Procedure: we tokenize each tweet and, for each token, assign the topic with the maximum weight in the lookup table.
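The lookup table and the per-token allocation can be sketched in plain Python as follows. Several details are assumptions, since the source does not specify them: the smoothed IDF formula, the regex tokenizer, the majority vote used to turn per-token topics into one topic per tweet, and the toy tweets and topic names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_lookup(annotated):
    """annotated: list of (tweet_text, topic). Returns {topic: {token: weight}}."""
    counts = defaultdict(Counter)
    for text, topic in annotated:
        counts[topic].update(tokenize(text))
    n_topics = len(counts)
    # Document frequency: in how many topics each token occurs
    df = Counter(tok for c in counts.values() for tok in c)
    lookup = {}
    for topic, c in counts.items():
        total = sum(c.values())
        # Term frequency within the topic times a smoothed IDF (assumed form)
        lookup[topic] = {
            tok: (n / total) * (math.log((1 + n_topics) / (1 + df[tok])) + 1)
            for tok, n in c.items()
        }
    return lookup


def assign_topic(text, lookup):
    """Give each token its max-weight topic, then take a majority vote."""
    votes = Counter()
    for tok in tokenize(text):
        weights = {t: w[tok] for t, w in lookup.items() if tok in w}
        if weights:
            votes[max(weights, key=weights.get)] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy annotated tweets with two hypothetical topics
annotated = [
    ("heavy traffic jam near the signal", "Traffic"),
    ("traffic signal not working", "Traffic"),
    ("thank you for the quick help", "Appreciation"),
    ("great work thank you sir", "Appreciation"),
]
lookup = build_lookup(annotated)
topic = assign_topic("huge traffic jam at the signal today", lookup)
```

Dividing by the total token count of each topic keeps a topic with many tweets from winning simply because it has more words, which is one way to read the class-imbalance remark above.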

More Features

In addition to the topic feature extracted with the BoW algorithm, we extract emotional features such as disgust, joy, anger and sadness. These scores are obtained with the help of the IBM Watson API. Sentiment scores are also extracted.


Training

Next, we use the annotated data of 500 collected tweets mentioned earlier to train models with different ML algorithms: Logistic Regression, SVM and Multi-Layer Perceptron.

Prediction and Performance

We then predict Serviceable tweets with the trained models and compare the performance of all the above models.
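The training and comparison loop could be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the project's code: the feature matrix is synthetic stand-in data (the annotated tweets are not reproduced here), the three feature columns, the hyperparameters and the choice of F1 as the metric are all assumed, and no scores from the real dataset are implied.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data: columns are [disgust, anger, sentiment]
n = 200
X_s = rng.normal(loc=[0.4, 0.4, -0.5], scale=0.1, size=(n // 2, 3))
X_ns = rng.normal(loc=[0.05, 0.1, 0.5], scale=0.1, size=(n // 2, 3))
X = np.vstack([X_s, X_ns])
y = np.array(["S"] * (n // 2) + ["NS"] * (n // 2))

# Hold out a stratified test split so both labels appear in it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Multi-Layer Perceptron": MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=2000, random_state=0
    ),
}

# Fit each model and record its F1 score on the Serviceable class
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), pos_label="S")
```

In the real pipeline the topic feature from the lookup table would be added as a further (encoded) column next to the emotion and sentiment scores.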

Interactive Dashboard (UI)

Tweets are embedded in the dashboard; clicking a tweet redirects the user to the Twitter page itself.

A drop-down menu offers Serviceable and Non-Serviceable options, each of which contains the relevant subcategories.

The framework is Python/Flask based and renders the JSON fed from the back end. Each tweet is stored in an SQLite database. Each tweet has a remove button that removes it from the dashboard once the user has completed the required task.
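The storage side of the dashboard might look like the sketch below, which uses only the standard-library `sqlite3` and `json` modules. The table schema, the function names and the sample rows are hypothetical; the Flask routes that would call these helpers are left out.

```python
import json
import sqlite3


def init_db(conn):
    """Create the tweets table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id TEXT PRIMARY KEY, full_text TEXT, topic TEXT, label TEXT)"
    )


def save_tweets(conn, tweets):
    # INSERT OR IGNORE keeps a refresh from duplicating already stored tweets
    conn.executemany(
        "INSERT OR IGNORE INTO tweets "
        "VALUES (:tweet_id, :full_text, :topic, :label)",
        tweets,
    )
    conn.commit()


def remove_tweet(conn, tweet_id):
    """Back end of the remove button: delete one tweet by id."""
    conn.execute("DELETE FROM tweets WHERE tweet_id = ?", (tweet_id,))
    conn.commit()


def tweets_as_json(conn, label):
    """Return the tweets with the given label (S or NS) as a JSON string."""
    rows = conn.execute(
        "SELECT tweet_id, full_text, topic FROM tweets "
        "WHERE label = ? ORDER BY tweet_id",
        (label,),
    ).fetchall()
    keys = ("tweet_id", "full_text", "topic")
    return json.dumps([dict(zip(keys, r)) for r in rows])


# In-memory database with two made-up tweets
conn = sqlite3.connect(":memory:")
init_db(conn)
save_tweets(conn, [
    {"tweet_id": "1", "full_text": "Signal not working",
     "topic": "Traffic", "label": "S"},
    {"tweet_id": "2", "full_text": "Good morning",
     "topic": "General Info", "label": "NS"},
])
remove_tweet(conn, "1")
```

Because a deleted row can be fetched again on the next refresh, a real implementation may prefer a "done" flag over a hard delete; the source does not say which approach was taken.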



Each time the user refreshes the page, a script collects tweets using the API and calls the NLP module, which tags each tweet with its topic, predicts whether it is Serviceable or Non-Serviceable, and returns the result in JSON format. We store it in the DB and render it on the dashboard.

References 

[1] Call for Service: Characterizing and Modeling Police Response to Serviceable Requests on Facebook.

[2] Social Media for Safety: Characterizing Online Interactions between Citizens and Police. In Proc. HCI (2015).
