
Identifying Inhuman Humans

Prologue

In today's world, going through a day without checking notifications from social media platforms seems nearly impossible. Social media has changed the way we connect and interact with the world. Platforms like Twitter operate on a simple principle: open to all, with no bias towards anyone, almost like ideal equality.

The fact that anyone can write about anything to anyone, whether sitting in the comfort of home, struggling at the office, or commuting, has both a good and a bad side to it.

Terror groups like ISIS have been active on Twitter since 2014, when they captured parts of Syria. They use Twitter to spread hatred, radicalize people, and recruit new so-called 'soldiers'. Astonishing news came to light in December 2014: an ordinary software engineer in Bangalore, Shammi Witness, was involved with ISIS, helping them by radicalizing and recruiting people. What is even more astonishing is that he was doing so openly, without hiding his identity or using coded tweets.

The reach of such platforms is worldwide, so one can influence the masses through them. Cases like Shammi Witness are often overlooked or never come to light.

With this project, we tried to apply our technical knowledge to tackle this problem.

Dataset

This project was started with the aim of not only analytics but also dataset creation. Dataset creation may not be considered fancy work, but it is the heart of all analytics and machine learning. Thus, the initial weeks were spent on data collection.

We collected real-time data over a span of 1 to 1.5 months. The data comes mainly from Twitter hashtags such as '#ISIS', '#Jihaad', '#ISIL', etc.

Data collection was done using the Twitter API. Considering the rate limits, we were able to collect 45K tweets with 20 dimensions, i.e. we effectively ended up with a 45,000 x 20 matrix. That is a considerable size for analysis, so from this point onwards we moved our focus from data collection, preprocessing, cleaning, and formatting towards analytics.

Below, you can get a glimpse of the data. Notice how clean and well formatted it is. Justice is done to this step!

Some features:
  • Date Time
  • Location
  • Geo Tag
  • Tweet ID
  • Language
  • Hashtags
  • User Mentions
  • Retweet Count
  • Tweet Favourite Count
  • Device
  • User ID
  • User Name
  • Screen Name
  • Active Since
  • Tweet Count
  • Verification Status
  • Followers Count
  • Following Count
  • URL 
  • Full Text
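
To make the 45,000 x 20 shape concrete, here is a minimal sketch of how one raw tweet could be flattened into a row with the 20 features listed above. It assumes the tweets arrive as Twitter API v1.1 tweet objects (JSON parsed into a dict). The function name `flatten_tweet`, the column names, and the exact mapping from API fields to features are assumptions for illustration, not the project's actual code. In particular, "URL" is assumed to mean the links inside the tweet.

```python
from typing import Any, Dict


def flatten_tweet(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one v1.1-style tweet object into a 20-column row (illustrative mapping)."""
    user = tweet.get("user") or {}
    entities = tweet.get("entities") or {}
    return {
        "date_time": tweet.get("created_at"),
        "location": user.get("location"),          # free-text profile location
        "geo_tag": tweet.get("coordinates"),       # None when the tweet is not geotagged
        "tweet_id": tweet.get("id_str"),
        "language": tweet.get("lang"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "user_mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "retweet_count": tweet.get("retweet_count", 0),
        "favourite_count": tweet.get("favorite_count", 0),
        "device": tweet.get("source"),             # HTML anchor naming the client
        "user_id": user.get("id_str"),
        "user_name": user.get("name"),
        "screen_name": user.get("screen_name"),
        "active_since": user.get("created_at"),
        "tweet_count": user.get("statuses_count"),
        "verified": user.get("verified"),
        "followers_count": user.get("followers_count"),
        "following_count": user.get("friends_count"),
        "url": [u.get("expanded_url") for u in entities.get("urls", [])],
        # extended-mode responses carry "full_text"; older ones only "text"
        "full_text": tweet.get("full_text") or tweet.get("text"),
    }
```

Calling this on every collected tweet and stacking the rows would give one row per tweet and one column per feature.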

Language Plot


When talking about Syria and the Islamic State, the first thing that comes to mind is, 'Wouldn't there be a language problem while analysing the tweets?'. We had the same doubt, so to burst this bubble, the first plot we made (literally) was of the language of the tweets. Fortunately, a surprisingly large share of the tweets were in English. Phew!
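
The numbers behind a language plot like this come down to counting language codes. Here is a minimal sketch, assuming each tweet carries a language code in its Language column (Twitter reports "und" when it cannot determine the language). The function name `language_share` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def language_share(lang_codes: Iterable[Optional[str]]) -> List[Tuple[str, int, float]]:
    """Return (language, tweet count, share of all tweets), most common first."""
    # Treat missing codes as "und" (undetermined).
    counts = Counter(code or "und" for code in lang_codes)
    total = sum(counts.values())
    return [(lang, n, n / total) for lang, n in counts.most_common()]
```

For example, `language_share(["en", "en", "ar", "en"])` puts English first with a share of 0.75.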



This did take the burden off our shoulders, or did it? What lies ahead is a bunch of plots, each signifying one thing or another. You still with us?

Don't worry, we have tried to make the rest of the read interesting.

Day Plot

Okay, so let's begin.

Let's take a simple metric and see if anything interesting comes out. This was the mindset we had when we plotted what you can see on the right.



Interestingly, we found that ISIS supporters and anti-ISIS users aren't biased towards any day of the week. Saturday witnessed as many tweets as Wednesday.
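
A day-of-week count like this can be derived straight from the Date Time column. The sketch below assumes timestamps in the Twitter API v1.1 `created_at` style (for example "Wed Oct 10 20:19:24 +0000 2018") and counts days in UTC, which is an assumption since the tweets' local time zones are not known. The function name `tweets_per_weekday` is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Format of the v1.1 "created_at" field, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def tweets_per_weekday(created_at_values: Iterable[str]) -> Dict[str, int]:
    """Count tweets per day of the week, Monday first, zero-filled."""
    counts: Counter = Counter()
    for value in created_at_values:
        parsed = datetime.strptime(value, TWITTER_TIME_FORMAT)
        counts[DAYS[parsed.weekday()]] += 1
    return {day: counts[day] for day in DAYS}
```

If the seven counts come out roughly equal, as in our plot, no day stands out.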

Enough fun and games, let's get serious, shall we?

Device Used Plot

Now we do some real analysis. The plot on the left shows the devices used against the number of tweets posted from those devices. It can be inferred from the plot that Twitter Web Client is predominantly used for tweeting. Other devices like iPhones, Android phones, and even BlackBerry were used in some cases.



A naive assumption can be made that ISIS has some people with some amount of technical knowledge. Over time, we have seen a rise in the numbers for 'Android'.
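
In the Twitter API v1.1, the client a tweet was posted from arrives in the `source` field as a small HTML anchor rather than a plain name. Here is a minimal sketch of turning that into per-device counts, assuming the Device column holds that raw value. The helper names `client_name` and `tweets_per_device` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

_TAG = re.compile(r"<[^>]+>")  # matches any HTML tag


def client_name(source_html: Optional[str]) -> str:
    """Strip the HTML anchor and keep only the client label."""
    return _TAG.sub("", source_html or "").strip() or "unknown"


def tweets_per_device(sources: Iterable[Optional[str]]) -> Counter:
    """Count tweets per posting client."""
    return Counter(client_name(s) for s in sources)
```

Sorting the result with `most_common()` gives the bar order for a device plot.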

Location Plot

Most of the tweets came from a location listed as CtrlSec, which is a hacker group that also goes by the name Anonymous. They tweet against ISIS and help get Twitter accounts suspended by reporting to Twitter the ISIS accounts that are used for radicalization and recruitment.



One more interesting thing pops out here: the count of tweets from Syria is far lower than from many other countries such as Israel and the USA (Washington DC). Hence, we can conclude that ISIS is not a local problem of Syria; it is a global problem.

Let's see a few other analyses.

User Activity Plots

This plot tells us how many users are tweeting how much about ISIS. We restricted the graph to a certain tweet count for the sake of simplicity, as plotting beyond that wouldn't lead to any better inference.



From this plot we can infer that only a few users produce most of the content, which in our case is tweets. The rest of the users are not very active in posting tweets and do so very rarely: as we can see, 18,000 users posted only one tweet. This is consistent with the 'Power Law', which states that only 20% of users generate the bulk of the data, while the remaining 80% just view that data instead of generating any new data.

We will go further into this with the next plot.
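
Both the distribution in this plot and the 20% claim can be checked from the User ID column alone. The sketch below is one way to do it and not necessarily how the plot was produced. The function names `activity_distribution` and `top_share` and the default 20% cut-off are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def activity_distribution(user_ids: Iterable[Hashable]) -> Dict[int, int]:
    """Map 'number of tweets posted' to 'number of users who posted that many'."""
    tweets_per_user = Counter(user_ids)
    users_per_count = Counter(tweets_per_user.values())
    return dict(sorted(users_per_count.items()))


def top_share(user_ids: Iterable[Hashable], top_fraction: float = 0.2) -> float:
    """Fraction of all tweets written by the most active `top_fraction` of users."""
    per_user = sorted(Counter(user_ids).values(), reverse=True)
    if not per_user:
        return 0.0
    k = max(1, int(len(per_user) * top_fraction))  # at least one user
    return sum(per_user[:k]) / sum(per_user)
```

A `top_share` value far above 0.2 means a small group of accounts writes most of the tweets, which is the pattern described above.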


This plot shows the most active users. A small number of users are the most active, a few others are moderately active, and the rest tweet very rarely. They may be active in reading, liking, sharing, etc., but they don't post tweets that actively.

Sentiment Score

Here we analysed how active a user is on Twitter, irrespective of the ISIS subject, and what the sentiment of their tweets on the ISIS subject is.

Users who were tweeting against the ISIS
Users who were tweeting pro ISIS

Age Plot

Here we present an analysis by age group, i.e. which age groups talk more about 'ISIS'. Whether they speak in favour of or against it, we count everyone who is interested in talking about 'ISIS'.


According to our analysis, people in the 25-34 age group are the most interested in talking about 'ISIS', followed by people in the 22-24 age group. From this we can infer that young people are more keen to know about global problems.

Gender Plot

While analysing the data, it is important to check whether the topic is of equal interest to both males and females, and here we do the same for the 'ISIS' subject.



What we found is that the 'ISIS' subject is not as important among females as among males: of all the users, more than 60% are male and less than 20% are female.


Hashtag Word Cloud




Tweet Word Cloud





User Mention Word Cloud



ML for PRO ISIS Tweets

We want to predict whether a given tweet is in favour of ISIS or not. From Twitter we collected a total of nearly 35 thousand tweets, including both tweets and retweets. Out of this dataset, we took nearly 10 thousand tweets and annotated each one as being in favour of ISIS or not. We then prepared a prediction model consisting of the following layers: an LSTM layer with 64 nodes, then a Dense layer with 256 nodes and the ReLU activation function, then a Dropout layer with 0.5 probability, and finally a single-node output layer with the sigmoid activation function. The model is trained with the Binary Cross Entropy loss function.

Accuracy: With this model we were able to obtain a result of 91.2%.
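
To make the last steps of the pipeline concrete, here is a small plain-Python sketch of the sigmoid output, the Binary Cross Entropy loss, and an accuracy score. It does not reproduce the LSTM network itself. The function names, the clipping constant `eps`, and the 0.5 decision threshold are assumptions for illustration, since the write-up does not state them.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Squash a raw score into (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of tweets whose thresholded prediction matches the annotation."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)
```

Here label 1 stands for a tweet annotated as pro-ISIS and 0 for everything else. An accuracy figure like the one above is this kind of score computed on annotated tweets.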

Link For ML Model

