
Real-Time Identification of Actionable Posts on Twitter

Motivation for our Project


India has one of the lowest ratios of police personnel per lakh of population in the world.

This makes policing in India difficult on many fronts. As India is still a developing country, it is imperative that its law enforcement arm be as effective as possible in maintaining law and order and keeping crime in check.

For day-to-day concerns such as traffic, law and order issues, and gatherings, however, the police can get information with help from the public themselves.



Social media platforms have attracted substantial interest from the police as a way to connect with residents. This has encouraged residents to report day-to-day law and order concerns, such as traffic congestion, missing people, and harassment by cops, on these platforms. These are actionable posts. But such messages get lost in a flood of unnecessary posts such as "Thank You" notes, "Good Morning" posts, and generally known facts. Therein lies the challenge.

What is Our Project

We collected tweets from an official police handle on Twitter (@hydcitypolice) and applied NLP techniques to identify actionable information in user posts.

Our solution downloads tweets from police handles and, after processing, feeds them to a dashboard (Python/Flask based) that displays the tweets by category. The categories were decided based on consultations with police personnel [1]. As we will see, the category is also used as a feature for identifying such important posts, which is the prime objective of our project.

The distribution of tweets across all categories shows that categories such as Appreciation, Suggestion, and General Info are Non-Serviceable.

Data Collection


We use the official Twitter API, through Tweepy, to fetch posts from verified police handles on Twitter (@hydcitypolice in our case). We extract fields such as the date, the full tweet text, and the tweet_id.
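As a rough illustration of the extraction step, the sketch below reduces one tweet to the three fields named above. It does not call Tweepy or the network. It assumes each tweet is already available as the JSON dictionary returned by the Twitter v1.1 API (with `id_str`, `created_at`, and, in extended mode, `full_text`). The function name `extract_fields` and the sample tweet are made up for the example.

```python
from datetime import datetime

# Timestamp layout of v1.1 tweet payloads, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_fields(tweet):
    """Reduce one tweet payload (a dict) to tweet_id, date and full_text."""
    # Extended-mode payloads carry "full_text"; otherwise fall back to "text".
    text = tweet.get("full_text") or tweet.get("text", "")
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    return {
        "tweet_id": tweet["id_str"],
        "date": created.date().isoformat(),
        "full_text": text,
    }


# Hypothetical payload, trimmed to the keys used above.
sample = {
    "id_str": "1050118621198921728",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "full_text": "@hydcitypolice heavy traffic jam near the flyover, please help",
}
row = extract_fields(sample)
```

Keeping `tweet_id` as a string avoids precision problems if the rows are later serialised to JSON for the dashboard.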

We then manually annotated all tweets into 13 different topics, which are shown below. This was done following the guidelines in the reference paper by Niharika Sachdeva [2].

We also add a ground-truth label column that records whether each tweet is Serviceable (S) or Not Serviceable (NS).



Some Preliminary Analysis

Tweets were analysed for emotional attribute measures to look for a correlation between these measures and the police response to tweets.

Below are a few examples:
  1. "Instagram account Shoppers Kart is doing all sorts of online fraud and cheating innocent people.He makes the Instagram users to pay advnce and blocks them once he receives payment.Plz take a serious action on him sir"
         "disgust": 0.461023, "anger": 0.422057
         Police Response: Yes
  2. "there is patient at my home but how these people making us in trouble at midnight...can i expect action"
         "disgust": 0.32599, "anger": 0.338764
         Police Response: Yes
  3. "Can you take some action against this TV channel for misguiding the civilians with this news. They cant even differenciate @narendramodi official twitter account and spreading fake news."
         "disgust": 0.355119, "anger": 0.224149
         Police Response: No
  4. "One can hide a pain behind a smile .Cops need to eke out that hidden pain to turn a smile into a permanent glee - always"
         "disgust": 0.012144, "anger": 0.092993
         Police Response: No

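These examples suggest a rough threshold relationship between emotion scores and police response: the two tweets that received a response have the highest anger scores of the four. We will use these scores as a measure of severity when ranking the serviceable tweets on the dashboard.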
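One possible way to turn the emotion scores into a ranking is sketched below. The post does not say how the scores are combined, so the severity rule used here (the larger of the disgust and anger scores) is an assumption, as are the function names and the short `id` labels. The scores themselves are taken from the examples above.

```python
def severity(emotions):
    """Assumed severity rule: the larger of the disgust and anger scores."""
    return max(emotions.get("disgust", 0.0), emotions.get("anger", 0.0))


def rank_by_severity(tweets):
    """Return the tweets ordered from most to least severe."""
    return sorted(tweets, key=lambda t: severity(t["emotions"]), reverse=True)


# Scores copied from the example tweets; the ids are just shorthand labels.
examples = [
    {"id": "smile", "emotions": {"disgust": 0.012144, "anger": 0.092993}},
    {"id": "fraud", "emotions": {"disgust": 0.461023, "anger": 0.422057}},
    {"id": "midnight", "emotions": {"disgust": 0.32599, "anger": 0.338764}},
]
ranked = rank_by_severity(examples)
ranked_ids = [t["id"] for t in ranked]
```

A weighted sum of several emotions would work just as well; the point is only that the dashboard needs a single sortable number per tweet.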


How We Do It

Topic Modelling

We use a Bag of Words (BoW) approach to build a lookup table with word tokens as columns and the 13 topics as rows. Each cell value is the frequency of that word in that topic, TF-IDF normalised.

TF-IDF

This weighting is used to compensate for the imbalance in class distribution in the dataset.

Procedure: We tokenize each tweet and, for each token, assign the topic with the maximum weight in the lookup table.
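The lookup table and the per-token allocation can be sketched in plain Python as follows. This is a minimal illustration rather than the project's actual code. It assumes each topic is treated as one document, that term frequency is the token's share of the topic's tokens, and that a smoothed IDF of log((1 + N) / (1 + df)) + 1 is used. The tokenizer, the function names, and the four training tweets are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_lookup(labelled_tweets):
    """Build {topic: {token: tf-idf weight}}, one 'document' per topic."""
    counts = defaultdict(Counter)
    for text, topic in labelled_tweets:
        counts[topic].update(tokenize(text))
    n_topics = len(counts)
    # Number of topics in which each token occurs at least once.
    doc_freq = Counter(tok for c in counts.values() for tok in c)
    lookup = {}
    for topic, c in counts.items():
        total = sum(c.values())
        lookup[topic] = {
            tok: (n / total) * (math.log((1 + n_topics) / (1 + doc_freq[tok])) + 1)
            for tok, n in c.items()
        }
    return lookup


def topic_for_token(lookup, token):
    """Return the topic with the largest weight for the token, or None if unseen."""
    best = max(lookup, key=lambda t: lookup[t].get(token, 0.0))
    return best if lookup[best].get(token, 0.0) > 0 else None


def tag_tokens(lookup, text):
    """Pair every token of a tweet with its maximum-weight topic."""
    return [(tok, topic_for_token(lookup, tok)) for tok in tokenize(text)]


# Invented training tweets for two topics.
train = [
    ("heavy traffic jam at signal", "Traffic"),
    ("traffic signal not working", "Traffic"),
    ("thank you for the quick help", "Appreciation"),
    ("thank you hyderabad police", "Appreciation"),
]
lookup = build_lookup(train)
tags = tag_tokens(lookup, "traffic jam, thank you")
```

Tokens that never appeared in the annotated data get `None`, so they can be ignored when the per-token topics are later combined into a feature for the tweet.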

More Features

In addition to the topic feature extracted with the BoW algorithm, we extract emotion features such as disgust, joy, anger, and sadness. These scores are obtained with the help of the IBM Watson API. Sentiment scores are extracted as well.


Training

Next, we take the annotated data of 500 collected tweets mentioned earlier and train models using different ML algorithms such as Logistic Regression, SVM, and Multi-Layer Perceptron.
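The post does not show the training code, and in practice an ML library would be the natural choice. To illustrate only the logistic regression variant without any dependencies, here is a small from-scratch sketch trained with batch gradient descent. The two features (a flag for whether the tweet's topic is an actionable one, and an anger score), the four training rows, and the hyperparameters are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row: p(S | x) minus the true label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Return "S" (Serviceable) or "NS" (Not Serviceable)."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "S" if score >= 0.5 else "NS"


# Hypothetical feature rows: [topic_is_actionable, anger]; label 1 means Serviceable.
rows = [[1, 0.42], [1, 0.34], [0, 0.22], [0, 0.09]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
```

With the real data, the feature vector would hold the topic feature together with the emotion and sentiment scores described above, and part of the 500 annotated tweets would be held out for evaluation.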

Prediction and Performance

We then predict Serviceable tweets with the trained models and compare the performance of all the models above.
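The post does not say which performance measures were used. Assuming the usual choice for a two-class problem, the sketch below computes precision, recall, and F1 for the Serviceable class from ground-truth and predicted labels. The label lists are invented for the example.

```python
def precision_recall_f1(y_true, y_pred, positive="S"):
    """Precision, recall and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented labels: ground truth versus one model's predictions.
truth = ["S", "S", "NS", "NS", "S"]
preds = ["S", "NS", "NS", "S", "S"]
precision, recall, f1 = precision_recall_f1(truth, preds)
```

Running the same function on each model's predictions gives directly comparable numbers. Recall on the Serviceable class matters most if the goal is not to miss actionable tweets.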

Interactive Dashboard (UI)

Tweets are embedded in the dashboard; clicking a tweet redirects the user to the tweet on Twitter itself.

A drop-down menu offers Serviceable and Non-Serviceable options, and each of them contains the relevant subcategories.

The framework is Python/Flask based and renders the JSON fed from the back end. Each tweet is stored in an SQLite database. Each tweet has a remove button that takes it off the dashboard once the user has completed the required task.



Each time the user refreshes the page, a script runs that collects tweets using the API and calls the NLP module, which tags each tweet with its topic, predicts whether it is Serviceable or Non-Serviceable, and returns the result in JSON format. We store it in the database and render it on the dashboard.
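The storage side of this refresh cycle might look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table layout, column names, and function names are assumptions, and the Flask routes are left out. One design choice is worth noting: removal is modelled as a `removed` flag rather than a row delete, so a tweet that has been dealt with does not reappear when the next refresh fetches it again.

```python
import json
import sqlite3


def init_db(conn):
    """Create the (assumed) tweets table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id TEXT PRIMARY KEY, full_text TEXT, topic TEXT, label TEXT, "
        "removed INTEGER NOT NULL DEFAULT 0)"
    )


def store_tweets(conn, tagged_json):
    """Insert tagged tweets from the NLP module, skipping ids already stored."""
    rows = [(t["tweet_id"], t["full_text"], t["topic"], t["label"])
            for t in json.loads(tagged_json)]
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (tweet_id, full_text, topic, label) "
        "VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def remove_tweet(conn, tweet_id):
    """Hide a tweet from the dashboard once its task is completed."""
    conn.execute("UPDATE tweets SET removed = 1 WHERE tweet_id = ?", (tweet_id,))
    conn.commit()


def tweets_for_dashboard(conn, label):
    """Return (tweet_id, topic) for visible tweets with the given label."""
    cur = conn.execute(
        "SELECT tweet_id, topic FROM tweets "
        "WHERE label = ? AND removed = 0 ORDER BY tweet_id", (label,))
    return cur.fetchall()


# Simulate two refreshes with the same (invented) NLP output.
conn = sqlite3.connect(":memory:")
init_db(conn)
payload = json.dumps([
    {"tweet_id": "1", "full_text": "traffic jam near the flyover",
     "topic": "Traffic", "label": "S"},
    {"tweet_id": "2", "full_text": "thank you", "topic": "Appreciation",
     "label": "NS"},
])
store_tweets(conn, payload)
remove_tweet(conn, "1")
store_tweets(conn, payload)  # second refresh must not bring tweet 1 back
visible_s = tweets_for_dashboard(conn, "S")
visible_ns = tweets_for_dashboard(conn, "NS")
```

A Flask view would call `tweets_for_dashboard` and return the rows as JSON for the page to render.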

References 

[1] Call for Service: Characterizing and Modeling Police Response to Serviceable Requests on Facebook.

[2] Social Media for Safety: Characterizing Online Interactions between Citizens and Police. In Proc. HCI (2015).
