
Real-Time Identification of Actionable Posts on Twitter

Motivation for our Project


India has one of the lowest ratios of police personnel per lakh of population in the world.

This makes policing in India difficult on many fronts. As India is a developing country, it is imperative that its law enforcement arm be as effective as possible at maintaining law and order and keeping crime in check.

For day-to-day concerns such as traffic, law and order issues, and gatherings, however, the police can get information directly from the public.



Social media platforms have attracted substantial interest from police as a way to connect with residents. This has encouraged residents to report day-to-day law and order concerns, such as traffic congestion, missing people, and harassment by cops, on these platforms. These are actionable posts. But such messages get lost in a flood of posts that need no action, such as thank-you notes, good-morning posts, and generally known facts. Therein lies the challenge.

What is Our Project

We collected tweets from an official police handle on Twitter (@hydcitypolice) and applied several NLP techniques to identify actionable information in user posts.

Our solution downloads tweets from police handles, processes them, and feeds them to a Python/Flask dashboard that displays them by category. The categories were decided in consultation with police personnel [1]. As we will see, the category is also used as a feature for identifying actionable posts, which is the prime objective of our project.

In the distribution of tweets across all categories, we can see that categories such as Appreciation, Suggestion, and General Info are Non-Serviceable.

Data Collection


We use the official Twitter API, through the Tweepy library, to fetch posts from verified police handles on Twitter (@hydcitypolice in our case). We extract fields such as the date, the full tweet text, and the tweet_id.
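As a rough sketch of this step, the snippet below assumes Tweepy's `API.user_timeline` endpoint called with `tweet_mode="extended"`. The helper names `tweet_to_row` and `fetch_rows`, the credential arguments, and the output field names are illustrative and not taken from our actual code. The demo tweet at the bottom is made up.

```python
from datetime import datetime
from types import SimpleNamespace


def tweet_to_row(status):
    """Keep only the three fields used later: date, full text, tweet id."""
    return {
        "date": status.created_at.isoformat(),
        "full_text": status.full_text,
        "tweet_id": status.id,
    }


def fetch_rows(consumer_key, consumer_secret, access_token, access_secret,
               screen_name="hydcitypolice", limit=200):
    """Download recent tweets from one handle (needs network access and valid keys)."""
    import tweepy  # imported lazily so tweet_to_row works without Tweepy installed

    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_secret
    )
    api = tweepy.API(auth, wait_on_rate_limit=True)
    cursor = tweepy.Cursor(
        api.user_timeline, screen_name=screen_name, tweet_mode="extended"
    )
    return [tweet_to_row(status) for status in cursor.items(limit)]


# Offline demo with a made-up status object standing in for an API result.
demo = SimpleNamespace(
    created_at=datetime(2019, 4, 1, 9, 30),
    full_text="Traffic jam near the signal",
    id=123,
)
row = tweet_to_row(demo)
```

Keeping the field extraction in its own function means it can be checked without touching the network.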

We then manually annotated all tweets into 13 topics, displayed below. This was done following the guidelines in the reference paper by Niharika Sachdeva [2].

We also add a ground-truth label column that records whether each tweet is Serviceable (S) or Not Serviceable (NS).



Some Preliminary Analysis

Tweets were analysed for emotional attribute measures to look for a correlation between these scores and the police response to each tweet.

Below are a few examples:
  1. "Instagram account Shoppers Kart is doing all sorts of online fraud and cheating innocent people.He makes the Instagram users to pay advnce and blocks them once he receives payment.Plz take a serious action on him sir"
     "disgust": 0.461023, "anger": 0.422057
     Police Response: Yes
  2. "there is patient at my home but how these people making us in trouble at midnight...can i expect action"
     "disgust": 0.32599, "anger": 0.338764
     Police Response: Yes
  3. "Can you take some action against this TV channel for misguiding the civilians with this news. They cant even differenciate @narendramodi official twitter account and spreading fake news."
     "disgust": 0.355119, "anger": 0.224149
     Police Response: No
  4. "One can hide a pain behind a smile .Cops need to eke out that hidden pain to turn a smile into a permanent glee - always"
     "disgust": 0.012144, "anger": 0.092993
     Police Response: No

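These examples suggest a rough threshold on the emotional scores: tweets with high disgust and anger scores tended to receive a police response, while tweets with lower scores did not. We use these scores as a measure of severity when ranking the serviceable tweets on the dashboard.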
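A minimal sketch of such a ranking follows. It assumes severity is the mean of the disgust and anger scores, which is only one possible choice, not necessarily the exact formula used on the dashboard. The `severity` and `rank_by_severity` helpers and the short ids are illustrative. The scores are the ones from the first two examples.

```python
def severity(scores):
    """Assumed severity measure: the mean of the disgust and anger scores."""
    return (scores["disgust"] + scores["anger"]) / 2


def rank_by_severity(tweets):
    """Return tweets sorted from most to least severe."""
    return sorted(tweets, key=lambda t: severity(t["emotion"]), reverse=True)


tweets = [
    {"id": "patient", "emotion": {"disgust": 0.32599, "anger": 0.338764}},
    {"id": "fraud", "emotion": {"disgust": 0.461023, "anger": 0.422057}},
]
ranked = rank_by_severity(tweets)
```

With this measure the online-fraud tweet is ranked above the midnight-disturbance tweet.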


How We Do It

Topic Modelling

We use the Bag of Words algorithm to build a lookup table with word tokens as columns and the 13 topics as rows. Each cell holds the frequency of a word within a topic, normalised with TF-IDF.

TF-IDF

This normalisation is used to compensate for the imbalance in the class distribution of the dataset: without it, topics with many tweets would dominate the raw word counts.

Procedure: We tokenize each tweet and, for each token, look up the topic with the maximum weight in the lookup table.
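The lookup table and the procedure can be sketched in plain Python as below. Several details are assumptions: each topic is treated as one document, the IDF uses a smoothed form, and a tweet takes the topic that wins the most token votes (how per-token topics are combined into one tweet-level topic is not spelled out above). The helper names and the four toy tweets are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def build_lookup(annotated):
    """Build {topic: {word: tf-idf weight}}, treating each topic as one document."""
    counts = defaultdict(Counter)
    for text, topic in annotated:
        counts[topic].update(tokenize(text))
    n_topics = len(counts)
    # Number of topics in which each word occurs at least once.
    doc_freq = Counter(word for words in counts.values() for word in words)
    table = {}
    for topic, words in counts.items():
        total = sum(words.values())
        table[topic] = {
            word: (freq / total)
            * (math.log((1 + n_topics) / (1 + doc_freq[word])) + 1)
            for word, freq in words.items()
        }
    return table


def assign_topic(text, table):
    """Each token votes for its highest-weight topic; the most voted topic wins."""
    votes = Counter()
    for token in tokenize(text):
        weights = {t: row[token] for t, row in table.items() if token in row}
        if weights:
            votes[max(weights, key=weights.get)] += 1
    return votes.most_common(1)[0][0] if votes else None


annotated = [
    ("heavy traffic jam at the signal", "Traffic"),
    ("traffic signal not working", "Traffic"),
    ("thank you police for the help", "Appreciation"),
    ("great work thank you", "Appreciation"),
]
table = build_lookup(annotated)
topic = assign_topic("traffic jam near the signal", table)
```

Dividing each count by the topic's total token count is what keeps a topic with many tweets from dominating the table.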

More Features

In addition to the topic feature obtained from the BoW algorithm, we extract emotional features such as disgust, joy, anger, and sadness. These scores are obtained from the IBM Watson API. Sentiment scores are extracted as well.
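The sketch below shows how these features could be combined into one vector per tweet. It assumes the API result is a dictionary shaped like the emotion and sentiment output of Watson Natural Language Understanding. Which Watson service was used is not stated above, so treat that shape as an assumption. Encoding the topic as an integer index is also an assumption, and `build_features` is a hypothetical helper.

```python
EMOTIONS = ("disgust", "joy", "anger", "sadness")


def build_features(topic_index, nlu_result):
    """Flatten one tweet's topic, emotion scores and sentiment score into a list."""
    emotion = nlu_result["emotion"]["document"]["emotion"]
    sentiment = nlu_result["sentiment"]["document"]["score"]
    return [float(topic_index)] + [emotion[name] for name in EMOTIONS] + [sentiment]


# Hand-written stand-in for an API response: the disgust and anger values come
# from the first example tweet, the joy, sadness and sentiment values are made up.
sample = {
    "emotion": {"document": {"emotion": {
        "disgust": 0.461023, "joy": 0.05, "anger": 0.422057, "sadness": 0.3,
    }}},
    "sentiment": {"document": {"score": -0.8}},
}
features = build_features(3, sample)
```

A one-hot encoding of the topic may suit linear models better than a single index. The index is used here only to keep the sketch short.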


Training

Next, we use the 500 annotated tweets described above to train models with different ML algorithms: Logistic Regression, SVM, and Multi-Layer Perceptron.
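To illustrate the training step without any library dependency, here is a from-scratch logistic regression trained by batch gradient descent. It stands in for a library implementation and is not the code we ran. The SVM and Multi-Layer Perceptron are not shown. The toy features are [disgust, anger] pairs and the labels (1 for Serviceable, 0 for Not Serviceable) are made up for the sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 'S' when the predicted probability of Serviceable is at least 0.5."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "S" if sigmoid(score) >= 0.5 else "NS"


# Toy training set: [disgust, anger] per tweet, with made-up labels.
X = [
    [0.461023, 0.422057], [0.32599, 0.338764], [0.48, 0.40],
    [0.012144, 0.092993], [0.05, 0.05], [0.02, 0.10],
]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice the feature vector would also include the topic and the remaining emotion and sentiment scores.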

Prediction and Performance

We then predict Serviceable tweets with the trained models and compare the performance of all the models above.
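Which metrics to compare is not fixed above. A reasonable assumption is accuracy together with precision, recall, and F1 for the Serviceable class, which can be computed as in this sketch. The `evaluate` helper and the label lists are illustrative only.

```python
def evaluate(y_true, y_pred, positive="S"):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up ground truth and predictions for five tweets.
y_true = ["S", "S", "NS", "NS", "S"]
y_pred = ["S", "NS", "NS", "S", "S"]
report = evaluate(y_true, y_pred)
```

Recall on the Serviceable class matters most here, since a missed actionable tweet is costlier than an extra tweet on the dashboard.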

Interactive Dashboard (UI)

Tweets are embedded in the dashboard, and clicking on one redirects the user to the tweet's page on Twitter.

A drop-down menu offers Serviceable and Non-Serviceable options, and each of them contains the relevant subcategories.

The framework is Python/Flask based and renders the JSON fed from the back end. Each tweet is stored in an SQLite database. Every tweet has a remove button that takes it off the dashboard once the user has completed the required task.
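The storage side can be sketched with Python's built-in `sqlite3` module as below. The table layout, column names, and helper functions are assumptions made for illustration. The Flask routes that would call them are left out.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the tweets table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id TEXT PRIMARY KEY, full_text TEXT, topic TEXT, label TEXT)"
    )
    return conn


def save_tweet(conn, tweet):
    """Insert or overwrite one classified tweet."""
    conn.execute(
        "INSERT OR REPLACE INTO tweets "
        "VALUES (:tweet_id, :full_text, :topic, :label)",
        tweet,
    )
    conn.commit()


def remove_tweet(conn, tweet_id):
    """Back the remove button: drop a tweet once it has been handled."""
    conn.execute("DELETE FROM tweets WHERE tweet_id = ?", (tweet_id,))
    conn.commit()


def tweets_as_json(conn, label):
    """Return the JSON payload the dashboard would render for one label."""
    rows = conn.execute(
        "SELECT tweet_id, full_text, topic, label FROM tweets "
        "WHERE label = ? ORDER BY tweet_id",
        (label,),
    ).fetchall()
    keys = ("tweet_id", "full_text", "topic", "label")
    return json.dumps([dict(zip(keys, row)) for row in rows])


conn = open_db()
save_tweet(conn, {"tweet_id": "1", "full_text": "Signal broken",
                  "topic": "Traffic", "label": "S"})
save_tweet(conn, {"tweet_id": "2", "full_text": "Thank you",
                  "topic": "Appreciation", "label": "NS"})
before = json.loads(tweets_as_json(conn, "S"))
remove_tweet(conn, "1")
after = json.loads(tweets_as_json(conn, "S"))
```

Using the tweet id as the primary key means that storing the same tweet again overwrites the existing row instead of duplicating it.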



Each time the user refreshes the page, a script collects tweets through the API and calls the NLP module, which tags each tweet with its topic, predicts whether it is Serviceable or Non-Serviceable, and returns the result in JSON format. We store it in the database and render it on the dashboard.
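The refresh flow can be sketched as one function that takes the collection, tagging, prediction, and storage steps as callables. The function name, its parameters, and the nested label-then-topic JSON layout (mirroring the drop-down menu) are assumptions. The stub functions below are stand-ins for the real modules.

```python
import json
from collections import defaultdict


def refresh(fetch, tag_topic, predict_label, store):
    """Collect tweets, tag each one, store it, and group as label -> topic -> tweets."""
    grouped = defaultdict(lambda: defaultdict(list))
    for tweet in fetch():
        topic = tag_topic(tweet["full_text"])
        label = predict_label(tweet["full_text"], topic)
        record = {**tweet, "topic": topic, "label": label}
        store(record)
        grouped[label][topic].append(record)
    return json.dumps(grouped, sort_keys=True)


# Stub steps standing in for the API call, the NLP module and the database.
def fake_fetch():
    return [
        {"tweet_id": "1", "full_text": "traffic jam at signal"},
        {"tweet_id": "2", "full_text": "thank you sir"},
    ]


def fake_topic(text):
    return "Traffic" if "traffic" in text else "Appreciation"


def fake_label(text, topic):
    return "S" if topic == "Traffic" else "NS"


stored = []
payload = json.loads(refresh(fake_fetch, fake_topic, fake_label, stored.append))
```

Passing the steps in as arguments keeps the flow testable without network access or a trained model.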

References 

[1] Call for Service: Characterizing and Modeling Police Response to Serviceable Requests on Facebook.

[2] Social Media for Safety: Characterizing Online Interactions between Citizens and Police. In Proc. HCI (2015).
