Real-Time Identification of Actionable Posts on Twitter

Motivation for our Project

India has one of the lowest ratios of police personnel per lakh (100,000) of population in the world.

This makes policing in India difficult on many fronts. As a developing country, India needs its law enforcement arm to be as effective as possible in maintaining law and order and keeping crime in check.

For day-to-day concerns such as traffic, law and order issues, and public gatherings, however, the police can get information with help from the public themselves.

Social media platforms have attracted substantial interest from the police as a way to connect with residents. This has encouraged residents to report day-to-day law and order concerns on these platforms, such as traffic congestion, missing people, and harassment by cops. These are actionable posts. But such messages get lost in a flood of unnecessary posts such as "Thank You" notes, "Good Morning" posts, and generally known facts. Therein lies the challenge.

What is Our Project

We collected tweets from the official police handle on Twitter (@hydcitypolice) and applied NLP techniques to identify actionable information in user posts.

Our solution downloads tweets from police handles, processes them, and feeds them to a dashboard (Python/Flask based) that displays the tweets by category. The categories were decided in consultation with police personnel [1]. As we will see, the category is also used as a feature for identifying such important posts, which is the prime objective of our project.

The distribution above covers all categories. Categories such as Appreciation, Suggestion, and General Info are Non-Serviceable.

Data Collection

We use the official Twitter API through Tweepy to fetch posts from verified police handles on Twitter (@hydcitypolice in our case). We extract fields such as the date, the full tweet text, and tweet_id.
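The collection step can be sketched roughly as follows, assuming Tweepy's v1.1-style `API.user_timeline` call and placeholder credentials. The helper names `build_api`, `status_to_row`, and `fetch_rows` are illustrative and not taken from the project. The sketch reads the handle's own timeline; how the project gathered residents' posts addressed to the handle is not stated here.

```python
from types import SimpleNamespace


def build_api(consumer_key, consumer_secret, access_token, access_secret):
    """Create an authenticated Tweepy client (all credentials are placeholders)."""
    import tweepy  # imported lazily so the helpers below run without it

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return tweepy.API(auth)


def status_to_row(status):
    """Keep only the fields used later: date, full text, and tweet id."""
    return {
        "date": str(status.created_at),
        "full_text": status.full_text,
        "tweet_id": status.id,
    }


def fetch_rows(api, handle="hydcitypolice", limit=200):
    """Fetch recent tweets from the handle's timeline as plain dicts."""
    statuses = api.user_timeline(
        screen_name=handle, count=limit, tweet_mode="extended"
    )
    return [status_to_row(s) for s in statuses]


# Offline demo with stand-in objects shaped like a Tweepy client and Status.
# The tweet content is made up for illustration.
demo_status = SimpleNamespace(
    created_at="2019-03-01 10:15:00", full_text="Traffic jam near the signal", id=1
)
demo_api = SimpleNamespace(user_timeline=lambda **kwargs: [demo_status])
demo_rows = fetch_rows(demo_api)
```

With real credentials, `fetch_rows(build_api(...))` would return the same row shape. `tweet_mode="extended"` is what makes `full_text` available instead of a truncated `text`.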

We then manually annotated all tweets into 13 different topics, displayed below. This was done following the guidelines in the reference paper by Niharika Sachdeva [2].

We also add a ground-truth label column that records whether a tweet is Serviceable (S) or Not Serviceable (NS).

Some Preliminary Analysis

Tweets were analysed for emotional attribute measures to look for a correlation between these measures and police response to tweets.

Below are a few examples:

"Instagram account Shoppers Kart is doing all sorts of online fraud and cheating innocent people.He makes the Instagram users to pay advnce and blocks them once he receives payment.Plz take a serious action on him sir"

"disgust": 0.461023,"anger": 0.422057

Police Response : Yes

"there is patient at my home but how these people making us in trouble at midnight...can i expect action"

"disgust": 0.32599, "anger": 0.338764

Police Response : Yes

"Can you take some action against this TV channel for misguiding the civilians with this news. They cant even differenciate @narendramodi official twitter account and spreading fake news."

"disgust": 0.355119, "anger": 0.224149

Police Response : No

"One can hide a pain behind a smile .Cops need to eke out that hidden pain to turn a smile into a permanent glee - always"

"disgust": 0.012144, "anger": 0.092993,

Police Response : No

These examples suggest a rough threshold: tweets with higher emotion scores received a police response, while tweets with lower scores did not. We will use these scores as a measure of severity when ranking the serviceable tweets on the dashboard.
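A minimal sketch of such a ranking, using the four examples above. The severity formula (the mean of the disgust and anger scores) is an assumption made for illustration, since the exact formula is not given; the short `id` strings are just labels for the four tweets.

```python
def severity(emotions):
    """Assumed severity score: mean of the disgust and anger scores."""
    return (emotions.get("disgust", 0.0) + emotions.get("anger", 0.0)) / 2


def rank_by_severity(tweets):
    """Return tweets sorted from most to least severe."""
    return sorted(tweets, key=lambda t: severity(t["emotions"]), reverse=True)


# Emotion scores copied from the four example tweets above.
examples = [
    {"id": "tv-channel", "emotions": {"disgust": 0.355119, "anger": 0.224149}},
    {"id": "instagram-fraud", "emotions": {"disgust": 0.461023, "anger": 0.422057}},
    {"id": "quote", "emotions": {"disgust": 0.012144, "anger": 0.092993}},
    {"id": "midnight", "emotions": {"disgust": 0.32599, "anger": 0.338764}},
]

ranked_ids = [t["id"] for t in rank_by_severity(examples)]
# ranked_ids == ["instagram-fraud", "midnight", "tv-channel", "quote"]
```

With this particular formula, the two tweets that received a police response land at the top of the list.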

How We Do It

Topic Modelling

We use a bag-of-words approach to build a lookup table with word tokens as columns and the 13 topics as rows. Each cell value is the frequency of that word in that topic, TF-IDF normalised.

TF-IDF

TF-IDF weighting is used to compensate for the imbalance in class distribution in the dataset.

Procedure: We tokenize each tweet and, for each token, assign the topic with the maximum weight in the lookup table.
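The lookup table and the per-token assignment can be sketched as below. This is a simplified illustration under stated assumptions: each topic is treated as one "document" for TF-IDF, the tokenizer is a plain whitespace split, and the tweet-level topic is chosen by majority vote over its tokens (the aggregation rule is not specified above). The topic names and training sentences are made up.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_lookup(annotated):
    """Build {topic: {word: tf-idf weight}} from (text, topic) pairs."""
    counts = defaultdict(Counter)
    for text, topic in annotated:
        counts[topic].update(tokenize(text))

    n_topics = len(counts)
    # Number of topics in which each word occurs at least once.
    topic_freq = Counter(word for c in counts.values() for word in c)

    lookup = {}
    for topic, c in counts.items():
        total = sum(c.values())
        lookup[topic] = {
            word: (n / total) * math.log(n_topics / topic_freq[word])
            for word, n in c.items()
        }
    return lookup


def topic_for_token(lookup, token):
    """Return the topic with the highest weight for a token, or None."""
    best = max(lookup, key=lambda t: lookup[t].get(token, 0.0))
    return best if lookup[best].get(token, 0.0) > 0 else None


def topic_for_tweet(lookup, text):
    """Assumed aggregation: majority vote over the per-token topics."""
    per_token = (topic_for_token(lookup, tok) for tok in tokenize(text))
    votes = Counter(t for t in per_token if t is not None)
    return votes.most_common(1)[0][0] if votes else None


# Tiny made-up training set with illustrative topic names.
annotated = [
    ("heavy traffic jam near signal", "Traffic"),
    ("traffic signal not working", "Traffic"),
    ("thank you police for quick help", "Appreciation"),
    ("great work thank you", "Appreciation"),
    ("my phone was stolen please help", "Theft"),
]
lookup = build_lookup(annotated)
```

Calling `topic_for_tweet(lookup, "traffic jam again")` returns `"Traffic"` here, because "traffic" and "jam" occur only in that topic and the unseen token "again" casts no vote. Words that occur in every topic get weight zero and are ignored.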

More Features

In addition to the topic feature obtained with the bag-of-words approach, we extract emotion features such as disgust, joy, anger, and sadness. These scores are obtained with the IBM Watson API. Sentiment scores are extracted as well.
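One plausible way to combine these features into a single input vector is sketched below. The layout (one-hot topic, then emotion scores, then one sentiment score) and the function name `build_features` are assumptions, and the emotion and sentiment values are passed in as plain numbers rather than fetched from any API.

```python
# Illustrative subset; the project uses 13 topics.
TOPICS = ["Traffic", "Theft", "Appreciation"]
EMOTIONS = ["disgust", "joy", "anger", "sadness"]


def build_features(topic, emotions, sentiment):
    """Return a flat list: one-hot topic + emotion scores + sentiment score."""
    one_hot = [1.0 if topic == t else 0.0 for t in TOPICS]
    emotion_scores = [float(emotions.get(e, 0.0)) for e in EMOTIONS]
    return one_hot + emotion_scores + [float(sentiment)]


# Made-up scores for a single tweet tagged "Theft".
features = build_features("Theft", {"anger": 0.5}, 0.1)
# features == [0.0, 1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.1]
```

Missing emotion scores default to 0.0 so that every tweet yields a vector of the same length.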

Training

Next, we take the 500 annotated tweets described earlier and train models using different ML algorithms such as Logistic Regression, SVM, and Multi-Layer Perceptron.
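To make the training step concrete, here is a dependency-free sketch of logistic regression fitted by batch gradient descent. It is not the project's training code: in practice a library such as scikit-learn would be the usual choice, and the library used is not named above. The four training rows are made-up two-feature examples (disgust, anger), not the 500 annotated tweets.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 'S' (serviceable) or 'NS' (not serviceable)."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "S" if p >= 0.5 else "NS"


# Made-up toy data: [disgust, anger] with 1 = serviceable, 0 = not serviceable.
X = [[0.46, 0.42], [0.33, 0.34], [0.01, 0.09], [0.05, 0.02]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
```

The learning rate and epoch count are arbitrary values that work for this toy data; a real run would use the full feature vectors and hold out part of the annotated tweets for evaluation.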

Prediction and Performance

We then predict serviceable tweets with the trained models and compare the performance of all the above models.
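The comparison could be based on standard classification metrics. The sketch below computes accuracy, precision, recall, and F1 for the Serviceable class; which metrics the project actually reported is not stated, so treat the choice as an assumption. The labels in the example are made up.

```python
def evaluate(y_true, y_pred, positive="S"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up ground truth and predictions for four tweets.
scores = evaluate(["S", "S", "NS", "NS"], ["S", "NS", "NS", "S"])
# scores == {"accuracy": 0.5, "precision": 0.5, "recall": 0.5, "f1": 0.5}
```

Running `evaluate` once per trained model on the same held-out tweets gives directly comparable numbers.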

Interactive Dashboard (UI)

Tweets are embedded in the dashboard. Clicking a tweet redirects the user to that tweet's page on Twitter.

A drop-down menu offers Serviceable and Non-Serviceable options, each of which contains the relevant subcategories.

The framework is Python/Flask based and renders the JSON fed from the back end. Each tweet is stored in an SQLite database. Each tweet has a remove button that removes it from the dashboard once the user has completed the required task.
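A minimal sketch of the storage side using Python's built-in `sqlite3` module. The table layout, column names, and helper functions are assumptions for illustration. In particular, this sketch marks a removed tweet as done instead of deleting the row, so that a later refresh does not bring it back; the project may handle removal differently.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the tweets table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id  INTEGER PRIMARY KEY,
               full_text TEXT NOT NULL,
               topic     TEXT,
               label     TEXT,
               done      INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def save_tweet(conn, tweet_id, full_text, topic, label):
    """Insert a tweet; re-fetching the same id does not duplicate it."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets (tweet_id, full_text, topic, label) "
        "VALUES (?, ?, ?, ?)",
        (tweet_id, full_text, topic, label),
    )
    conn.commit()


def remove_tweet(conn, tweet_id):
    """Hide a tweet from the dashboard once its task is completed."""
    conn.execute("UPDATE tweets SET done = 1 WHERE tweet_id = ?", (tweet_id,))
    conn.commit()


def dashboard_tweets(conn, label):
    """Return the tweets still shown on the dashboard for one label."""
    rows = conn.execute(
        "SELECT tweet_id, full_text, topic FROM tweets "
        "WHERE label = ? AND done = 0 ORDER BY tweet_id",
        (label,),
    )
    return rows.fetchall()


# Demo with an in-memory database and made-up tweets.
conn = open_db()
save_tweet(conn, 1, "Traffic jam", "Traffic", "S")
save_tweet(conn, 2, "Thank you", "Appreciation", "NS")
save_tweet(conn, 3, "Signal broken", "Traffic", "S")
remove_tweet(conn, 1)
save_tweet(conn, 1, "Traffic jam", "Traffic", "S")  # simulated re-fetch, ignored
visible = dashboard_tweets(conn, "S")
# visible == [(3, "Signal broken", "Traffic")]
```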

Each time the user refreshes the page, a script collects tweets through the API and calls the NLP module, which tags each tweet with its topic, predicts whether it is Serviceable or Non-Serviceable, and returns the result in JSON format. We store it in the database and render it on the dashboard.
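The refresh flow can be sketched as one function that wires the steps together. The function name `refresh` and its three callables are hypothetical stand-ins for the collection script, the topic tagger, and the trained classifier; the stubs in the demo return made-up values. In the dashboard, a Flask route would call something like this and render the returned JSON.

```python
import json


def refresh(fetch, tag_topic, predict_label):
    """Run one refresh: fetch tweets, tag and classify them, return JSON."""
    results = []
    for tweet in fetch():
        topic = tag_topic(tweet["full_text"])
        results.append(
            {
                "tweet_id": tweet["tweet_id"],
                "full_text": tweet["full_text"],
                "topic": topic,
                "label": predict_label(tweet["full_text"], topic),
            }
        )
    return json.dumps(results)


# Demo with stub callables standing in for the real modules.
def fake_fetch():
    return [{"tweet_id": 1, "full_text": "traffic jam near signal"}]


def fake_tag_topic(text):
    return "Traffic" if "traffic" in text else "General Info"


def fake_predict_label(text, topic):
    return "S" if topic == "Traffic" else "NS"


payload = refresh(fake_fetch, fake_tag_topic, fake_predict_label)
```

Passing the steps in as callables keeps the flow easy to test without network access or a trained model.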

References

[1] Call for Service: Characterizing and Modeling Police Response to Serviceable Requests on Facebook.

[2] Social Media for Safety: Characterizing Online Interactions between Citizens and Police. In Proc. HCI (2015).

IIIT-H | Big Data and Policing - Spring 2019 | Projects
