
Real-Time Identification of Actionable Posts on Twitter

Motivation for our Project


India has one of the lowest ratios of police personnel per lakh of population in the world.

This makes policing in India difficult on many fronts. Given the country's status as a developing nation, it is imperative that the law enforcement arm be as effective as possible at keeping law and order, as well as crime, in check.

But for day-to-day concerns such as traffic, law-and-order issues, gatherings, etc., the police can gather information with assistance from the public themselves.



Social media platforms have attracted substantial interest from the police as a way to connect with residents. This has encouraged residents to report day-to-day law-and-order concerns such as traffic congestion, missing people, and harassment by police on these platforms; these are actionable posts. But such messages are lost in a flood of unnecessary posts such as thank-you notes, good-morning posts, generally known facts, and so on. Therein lies the challenge.

What is Our Project

We collected tweets from the official police handle on Twitter (@hydcitypolice) and applied NLP techniques to identify such actionable information in user posts.

Our solution downloads tweets from police handles and, after processing, feeds them to a Python/Flask-based dashboard that displays the tweets in a categorized format. The categories were decided in consultation with police personnel [1]. As we will see, the category is also used as a feature when identifying these important posts, which is the prime objective of our project.

From the above distribution, which covers all categories, we can see that categories such as Appreciation, Suggestion, and General Info are not serviceable.

Data Collection


We use Tweepy, a Python library for the official Twitter API, to fetch posts from verified police handles on Twitter (@hydcitypolice in our case). We extract features such as the date, the full tweet text, and the tweet_id.
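As a concrete illustration, below is a minimal sketch of this collection step, assuming Tweepy v3.x and placeholder API credentials; the handle and the tweet count are examples, not fixed parameters of our pipeline.

    import tweepy

    # Placeholder credentials; real keys come from a Twitter developer account.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    rows = []
    # tweet_mode="extended" keeps the full, untruncated tweet text.
    for status in tweepy.Cursor(api.user_timeline,
                                screen_name="hydcitypolice",
                                tweet_mode="extended").items(500):
        rows.append({
            "date": status.created_at,
            "full_text": status.full_text,
            "tweet_id": status.id_str,
        })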

We then manually annotated all tweets into 13 different topics, which are displayed below. This was done following the guidelines provided in the reference paper by Niharika Sachdeva [2].

We also add a ground-truth label column that records whether each tweet is Serviceable (S) or Not Serviceable (NS).



Some Preliminary Analysis

Tweets were analysed for emotional attribute measures to establish a correlation between these measures and the police response to each tweet.

Below are a few examples:
  1. "Instagram account Shoppers Kart is doing all sorts of online fraud and cheating innocent people.He makes the Instagram users to pay advnce and blocks them once he receives payment.Plz take a serious action on him sir"
        "disgust": 0.461023,"anger": 0.422057
         Police Response : Yes
  1. "there is patient at my home but how these people making us in trouble at midnight...can i expect action"
         "disgust": 0.32599, "anger": 0.338764 
          Police Response : Yes
  1. "Can you take some action against this TV channel for misguiding the civilians with this news. They cant even differenciate @narendramodi official twitter account and spreading fake news."

    "disgust": 0.355119, "anger": 0.224149
    Police Response : No
  1. "One can hide a pain behind a smile .Cops need to eke out that hidden pain to turn a smile into a permanent glee - always"
        "disgust": 0.012144, "anger": 0.092993,
         Police Response : No 

We can see that there is a threshold in the emotion scores that separates tweets that received a police response from those that did not. We use this as a measure of severity while ranking the serviceable tweets on the dashboard.
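A small sketch of how such a threshold can drive the ranking is given below; the threshold value and the field names are illustrative, not the exact ones tuned on our data.

    # Illustrative cutoff; the real value is chosen from the annotated data.
    ANGER_DISGUST_THRESHOLD = 0.3

    def severity(tweet):
        # Use the stronger of the two negative emotions as the severity score.
        return max(tweet["anger"], tweet["disgust"])

    def rank_serviceable(serviceable_tweets):
        # Keep tweets above the threshold and show the most severe ones first.
        flagged = [t for t in serviceable_tweets
                   if severity(t) >= ANGER_DISGUST_THRESHOLD]
        return sorted(flagged, key=severity, reverse=True)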


How we Do it

Topic Modelling

We use a Bag-of-Words approach to build a lookup table with word tokens as columns and the 13 topics as rows. Each cell value is the frequency of that word in that topic, normalised with TF-IDF.

TF-IDF

This normalisation is used to account for the imbalance in the class distribution of the dataset.

Procedure: we tokenize each tweet and, for each token, allocate the topic with the maximum weight from the lookup table.
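The sketch below illustrates this lookup under some assumptions: the annotated tweets are available as (text, topic) pairs, TF-IDF is computed over the 13 topic rows, and a tweet's final topic is taken as the one its tokens pick most often; the function names are illustrative, not the exact ones in our code.

    import math
    import re
    from collections import Counter, defaultdict

    def tokenize(text):
        # Simple tokenizer; our pipeline may use a different scheme.
        return re.findall(r"[a-z@#]+", text.lower())

    def build_lookup(annotated_tweets):
        # Rows are the 13 topics, columns are word tokens; each cell holds
        # the TF-IDF-normalised frequency of the word in that topic.
        tf = defaultdict(Counter)
        for text, topic in annotated_tweets:
            tf[topic].update(tokenize(text))
        n_topics = len(tf)
        df = Counter()                 # number of topics containing each token
        for counts in tf.values():
            df.update(counts.keys())
        lookup = {}
        for topic, counts in tf.items():
            total = sum(counts.values())
            lookup[topic] = {
                token: (count / total) * math.log(n_topics / df[token])
                for token, count in counts.items()
            }
        return lookup

    def assign_topic(text, lookup, fallback="General Info"):
        # Each token votes for the topic where its weight is highest; the
        # tweet gets the most-voted topic (fallback if nothing matches).
        votes = Counter()
        for token in tokenize(text):
            weights = {topic: table.get(token, 0.0)
                       for topic, table in lookup.items()}
            best = max(weights, key=weights.get)
            if weights[best] > 0:
                votes[best] += 1
        return votes.most_common(1)[0][0] if votes else fallback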

More Features

In addition to the topic feature extracted with the BoW approach, we also extract emotion features such as disgust, joy, anger, and sadness. These scores are obtained with the help of the IBM Watson API. Sentiment scores are extracted as well.
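The emotion and sentiment calls look roughly like the sketch below, assuming the ibm-watson Python SDK and an IBM Cloud Natural Language Understanding instance; the API key, service URL, and version date are placeholders.

    from ibm_watson import NaturalLanguageUnderstandingV1
    from ibm_watson.natural_language_understanding_v1 import (
        Features, EmotionOptions, SentimentOptions)
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Placeholder credentials and version date for the NLU service.
    authenticator = IAMAuthenticator("YOUR_API_KEY")
    nlu = NaturalLanguageUnderstandingV1(version="2022-04-07",
                                         authenticator=authenticator)
    nlu.set_service_url("YOUR_SERVICE_URL")

    def emotion_and_sentiment(text):
        # Ask Watson NLU for document-level emotion and sentiment scores.
        result = nlu.analyze(
            text=text,
            features=Features(emotion=EmotionOptions(),
                              sentiment=SentimentOptions()),
            language="en",
        ).get_result()
        emotion = result["emotion"]["document"]["emotion"]
        return {
            "disgust": emotion["disgust"],
            "joy": emotion["joy"],
            "anger": emotion["anger"],
            "sadness": emotion["sadness"],
            "sentiment": result["sentiment"]["document"]["score"],
        }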


Training

Next, we use the annotated data of 500 collected tweets mentioned earlier and train models using different ML algorithms such as Logistic Regression, SVM, and Multi-Layer Perceptron.

Prediction and Performance

We then predict serviceable tweets with the trained models and compare the performance of all the models above.
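A minimal sketch of this training and evaluation loop with scikit-learn is shown below, assuming the topic, emotion, and sentiment features are already assembled into a feature matrix X with Serviceable/Not-Serviceable labels y; the hyperparameters are library defaults, not our tuned values.

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import classification_report

    def train_and_evaluate(X, y):
        # Hold out part of the 500 annotated tweets for testing.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=42)

        models = {
            "Logistic Regression": LogisticRegression(max_iter=1000),
            "SVM": SVC(),
            "Multi-Layer Perceptron": MLPClassifier(max_iter=1000),
        }
        for name, model in models.items():
            model.fit(X_train, y_train)
            predictions = model.predict(X_test)
            print(name)
            print(classification_report(y_test, predictions))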

Interactive Dashboard (UI)

Tweets are embedded in the dashboard, and when a user clicks on one, they are redirected to the tweet's Twitter page.

A drop-down menu offers Serviceable and Non-Serviceable options, and each of them further contains the relevant subcategories.

The dashboard is Python/Flask based and renders the JSON fed from the back end. Each tweet is stored in an SQLite database. Each tweet also has a remove button that deletes it from the dashboard once the user has completed the required task.



Each time a user refreshes the page, a script collects tweets using the API and calls the NLP module, which tags each tweet with its topic, predicts Serviceable or Non-Serviceable, and returns the result in JSON format. We store it in the database and render it on the dashboard.
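A rough sketch of this refresh flow is given below; the helpers fetch_tweets() and tag_and_classify() stand in for the collection and NLP modules described above, and the table schema, database path, and route names are illustrative rather than the exact ones in our code.

    import json
    import sqlite3
    from flask import Flask, render_template

    app = Flask(__name__)
    DB_PATH = "tweets.db"   # placeholder path; a "tweets" table is assumed to exist

    @app.route("/")
    def dashboard():
        # 1. Collect fresh tweets via the Twitter API (see the collection sketch above).
        tweets = fetch_tweets("hydcitypolice")
        # 2. NLP module: tag each tweet with its topic and predict S / NS.
        tagged = [tag_and_classify(t) for t in tweets]
        # 3. Persist the results so the remove button can delete rows later.
        with sqlite3.connect(DB_PATH) as conn:
            conn.executemany(
                "INSERT OR IGNORE INTO tweets (tweet_id, payload) VALUES (?, ?)",
                [(t["tweet_id"], json.dumps(t)) for t in tagged])
        # 4. Render the dashboard template with the categorized tweets.
        return render_template("dashboard.html", tweets=tagged)

    @app.route("/remove/<tweet_id>", methods=["POST"])
    def remove(tweet_id):
        # Remove button: drop the tweet once the required task is done.
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute("DELETE FROM tweets WHERE tweet_id = ?", (tweet_id,))
        return ("", 204)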

References 

[1] Call for Service: Characterizing and Modeling Police Response to Serviceable Requests on Facebook.

[2] Social Media for Safety: Characterizing Online Interactions between Citizens and Police. In Proc. HCI (2015).
