Prologue
In today's world, going through a day without checking social media notifications is nearly unthinkable. Social media has changed the way we connect and interact with the world. Platforms like Twitter operate on a simple principle: open to all, with no bias towards anyone, almost like ideal equality.
The fact that anyone can write about anything to anyone, whether sitting in the comfort of home, struggling at the office or commuting, has both a good and a bad side to it.
Terror groups like ISIS have been active on Twitter since 2014, the time they captured parts of Syria. They use Twitter to spread hatred, radicalize people and recruit new so-called 'soldiers'. An astonishing piece of news came to light in December 2014: an ordinary software engineer in Bangalore, known as Shammi Witness, was involved with ISIS, helping them by radicalizing and recruiting people. The astonishing thing is that he was doing so openly, without hiding his identity and without using any coded tweets.
The reach of such platforms is worldwide, so one can influence the masses through them. Cases like Shammi Witness are often overlooked or never come to light.
With this project we tried to make use of our technical knowledge and apply it to tackle this problem.
Dataset
This project was started not only with the aim of analytics but also of dataset creation. It may not be considered fancy work, but it is the heart of all analytics and machine learning. Thus, the initial weeks were spent on data collection.
We collected real-time data over a span of 1 to 1.5 months. The data comes mainly from Twitter hashtags like '#ISIS', '#Jihaad', '#ISIL', etc.
Data collection was done using the Twitter API. Taking the rate limits into account, we were able to collect 45K tweets with 20 dimensions, i.e. we effectively ended up with a 45,000 x 20 matrix. That is a considerable size for analysis, so from this point onwards we moved our focus from data collection, preprocessing, cleaning and formatting towards analytics.
Below, you can get a glimpse of the data. Notice how clean and well formatted it is. Justice is done to this step!
Some features:
- Date Time
- Location
- Geo Tag
- Tweet ID
- Language
- Hashtags
- User Mentions
- Retweet Count
- Tweet Favourite Count
- Device
- User ID
- User Name
- Screen Name
- Active Since
- Tweet Count
- Verification Status
- Followers Count
- Following Count
- URL
- Full Text
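As a rough illustration of how one tweet could be turned into one 20-column row, here is a minimal sketch. It assumes the tweets arrive as JSON objects in the Twitter API v1.1 format; the function name `flatten_tweet` and the output column names are hypothetical, and the exact mapping (for example, 'Location' taken from the user profile and 'URL' from the profile URL) is a guess rather than the project's actual code.

```python
from typing import Any


def flatten_tweet(tweet: dict[str, Any]) -> dict[str, Any]:
    """Flatten one v1.1-style tweet object into a flat 20-field record."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        "date_time": tweet.get("created_at"),
        "location": user.get("location"),          # assumption: profile location
        "geo_tag": tweet.get("coordinates"),
        "tweet_id": tweet.get("id_str"),
        "language": tweet.get("lang"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "user_mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "retweet_count": tweet.get("retweet_count"),
        "favourite_count": tweet.get("favorite_count"),
        "device": tweet.get("source"),
        "user_id": user.get("id_str"),
        "user_name": user.get("name"),
        "screen_name": user.get("screen_name"),
        "active_since": user.get("created_at"),
        "tweet_count": user.get("statuses_count"),
        "verified": user.get("verified"),
        "followers_count": user.get("followers_count"),
        "following_count": user.get("friends_count"),
        "url": user.get("url"),                    # assumption: profile URL
        # Extended-mode tweets carry "full_text"; older payloads only have "text".
        "full_text": tweet.get("full_text", tweet.get("text")),
    }
```

Applying this to every collected tweet and stacking the records would give a table with one row per tweet and 20 columns.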
Language Plot
When talking about Syria and the Islamic State, the first thing that comes to mind is 'Wouldn't there be a language problem while analysing the tweets?'. We had the same doubt, so to burst this bubble, the first plot we made (literally) was the language of the tweets. Fortunately, a surprisingly large share of the tweets were in English. Phew!
This did take the burden off our shoulders, or did it? What lies ahead is a bunch of plots, each signifying one thing or another. You still with us?
Don't worry, we have tried to make the rest of the read interesting.
Day Plot
Okay, so let's begin. Let's take a simple metric and see if anything interesting comes out. This was the mindset we had when we made the plot you can see on the right.
Interestingly, we found that ISIS supporters and anti-ISIS users aren't biased towards any day of the week. Saturday witnessed as many tweets as Wednesday.
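A per-day count like this can be sketched in a few lines. The snippet below assumes the 'Date Time' feature is stored in the `created_at` format used by the Twitter API v1.1 (for example `Wed Oct 10 20:19:24 +0000 2018`); the function name `tweets_per_weekday` is hypothetical. Note that these timestamps are in UTC, so the weekday is the UTC weekday, not the author's local one, and that parsing the English day and month abbreviations relies on the default (C/English) locale.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout of the v1.1 "created_at" field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def tweets_per_weekday(timestamps):
    """Count tweets per day of the week from created_at strings."""
    counts = Counter({day: 0 for day in WEEKDAYS})
    for ts in timestamps:
        parsed = datetime.strptime(ts, TWITTER_TIME_FORMAT)
        counts[WEEKDAYS[parsed.weekday()]] += 1  # weekday(): Monday == 0
    return counts
```

If the resulting seven counts are roughly equal, as in the plot, the day of the week carries little signal.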
Enough fun and games, let's get serious, shall we?
Device Used Plot
Now we do some real analysis. The plot on the left shows the devices used against the number of tweets sent from each. It can be inferred from the plot that the Twitter Web Client is predominantly used for tweeting. Other devices like iPhones, Android phones and even BlackBerrys were used in some cases. A naive assumption can be made that ISIS has some people with a certain amount of technical knowledge. Over time, we have seen a rise in the numbers for 'Android'.
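For reference, the Twitter API v1.1 reports the posting client in the `source` field as an HTML anchor, e.g. `<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>`. A minimal sketch of turning that into per-device counts follows; it assumes the 'Device' feature holds this raw string, and the helper names `client_name` and `device_counts` are hypothetical.

```python
import re
from collections import Counter

# Matches any HTML tag, so stripping it leaves only the anchor text.
_TAG = re.compile(r"<[^>]+>")


def client_name(source: str) -> str:
    """Extract the readable client name from a raw `source` string."""
    return _TAG.sub("", source).strip() or "unknown"


def device_counts(sources):
    """Count tweets per posting client."""
    return Counter(client_name(s) for s in sources)
```

Calling `device_counts(...).most_common(10)` on the 'Device' column would give the ten most used clients, which is the kind of ranking the plot shows.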
Location Plot
The most frequent entry in the location data is not a country at all but CtrlSec, a hacker group associated with Anonymous. They tweet against ISIS and help get accounts suspended by reporting to Twitter the ISIS accounts that are used for radicalization and recruitment. One more interesting thing pops out here: the count of tweets from Syria is far lower than from many other places such as Israel and the USA (Washington DC). This suggests that ISIS is not a local problem of Syria but a global one.
Let's see a few other analyses.
User Activity Plots
This plot tells us how many users are tweeting how much about ISIS. We restricted the graph to a limited range of tweet counts for the sake of simplicity, as plotting beyond that would not lead to any better inference. From this plot we can infer that only a few users produce most of the content (in our case, tweets), while the rest are not very active and tweet only rarely: about 18,000 users had posted only one tweet. This is consistent with a power-law (long-tail) distribution, often summarized by the 80/20 rule: roughly 20% of users generate most of the data, while the remaining 80% mostly view it instead of generating anything new.
We will go further into this with the next plot.
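The long-tail claim can be checked directly from the 'User ID' column. The sketch below is an illustration under that assumption, not the project's actual code; `activity_histogram` and `top_share` are hypothetical names. The first function gives the data behind a plot like this one (how many users posted exactly n tweets), the second measures what fraction of all tweets comes from the most active 20% of users.

```python
from collections import Counter


def activity_histogram(user_ids):
    """Map 'tweets per user' -> 'number of users with that many tweets'."""
    tweets_per_user = Counter(user_ids)
    return Counter(tweets_per_user.values())


def top_share(user_ids, top_fraction=0.2):
    """Fraction of all tweets written by the most active users."""
    counts = sorted(Counter(user_ids).values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, round(len(counts) * top_fraction))  # size of the top group
    return sum(counts[:k]) / sum(counts)
```

A `top_share` value far above 0.2 would support the reading that a small group of accounts produces most of the content.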
Sentiment Score
Here we analysed how active each user is on Twitter overall, irrespective of the ISIS topic, and what the sentiment of their tweets about ISIS is. The plots cover two groups:
- Users who were tweeting against ISIS
- Users who were tweeting pro-ISIS
Age Plot
Here we present an analysis by age group, i.e. which age group talks the most about 'ISIS'. Whether users speak in favour of or against it, we count them as long as they are talking about 'ISIS'. According to our analysis, people in the 25-34 age group are the most interested in talking about 'ISIS', followed by people in the 22-24 age group. This suggests that young people are keener to engage with global problems.
Gender Plot
While analysing the data, it is important to check whether the topic is of equal interest to males and females, and that is what we do here for the 'ISIS' topic. What we found is that 'ISIS' draws less interest among females than among males: of all the users, more than 60% are male and less than 20% are female.