
Crowd Flow Prediction and Client Discovery Using Wireless Networks






Introduction:

Due to population growth, crowd analysis has become a major interest in social and technical disciplines. Crowd analysis is used to develop crowd management strategies for public events, and also informs public space design, visual surveillance and virtual environments, making areas more convenient and helping to prevent crowd-induced disasters.

In this project, we identify crowd patterns in a sample setting (IIIT-H) from the WiFi probe requests that mobile devices send as they move around. Client locations can be triangulated to estimate where each device is at any point in time, and this data is then used to create heatmaps and perform time series analysis.
Every device with WiFi turned on performs 'Active Scanning': it periodically transmits probe requests, each containing a BSSID (the broadcast MAC address), an SSID (zero length) and the device's own MAC address. Since MAC addresses are globally unique, the movement of a particular device can be tracked. Given a MAC address, the logs can be cross-referenced to follow the device to every place that has a listener. If a listener is driven around a residential area, one could even find the house the device belongs to.

How Does WiFi Work?

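A device can discover nearby access points in one of two ways:
  1. Turn the receiver on and listen on each channel for the beacon.
  2. Broadcast a "Who is there?" packet on each channel.
With (1), the receiver has to stay on while it listens on every channel, so power usage for the WiFi adapter sky-rockets and battery life suffers. Mobile devices therefore typically use (2), in which the WiFi card transmits a probe request. The probe request consists of:
  • BSSID: the broadcast MAC address
  • SSID: zero length (a wildcard that any network may answer)
  • MAC: the device's own WiFi MAC address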
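To make the probe request layout concrete, here is a minimal pure-Python sketch that pulls these fields out of a raw 802.11 frame. It assumes the bytes start at the 802.11 MAC header (any radiotap header added by a monitor-mode capture has already been stripped) and that no trailing FCS is present; `ProbeRequest`, `fmt_mac` and `parse_probe_request` are hypothetical names, not part of the original project.

```python
from typing import NamedTuple, Optional


class ProbeRequest(NamedTuple):
    dst_mac: str  # Address 1: broadcast ff:ff:ff:ff:ff:ff for a wildcard probe
    src_mac: str  # Address 2: the transmitting device's own MAC address
    bssid: str    # Address 3: broadcast as well
    ssid: str     # empty string for a zero-length (wildcard) SSID


def fmt_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_probe_request(frame: bytes) -> Optional[ProbeRequest]:
    """Parse a raw 802.11 frame; return None unless it is a probe request.

    Assumes no radiotap header in front and no FCS at the end.
    """
    if len(frame) < 24:  # fixed management-frame header is 24 bytes
        return None
    fc = frame[0]                 # first Frame Control byte
    frame_type = (fc >> 2) & 0x3  # 0 = management
    subtype = (fc >> 4) & 0xF     # 4 = probe request
    if frame_type != 0 or subtype != 4:
        return None
    dst, src, bssid = frame[4:10], frame[10:16], frame[16:22]

    # The body is a list of tagged elements: 1-byte ID, 1-byte length, value.
    ssid, pos = "", 24
    while pos + 2 <= len(frame):
        elem_id, elem_len = frame[pos], frame[pos + 1]
        if elem_id == 0:  # SSID element
            ssid = frame[pos + 2:pos + 2 + elem_len].decode("utf-8", errors="replace")
            break
        pos += 2 + elem_len
    return ProbeRequest(fmt_mac(dst), fmt_mac(src), fmt_mac(bssid), ssid)
```

On a real listener, a capture library such as scapy (its `Dot11ProbeReq` layer) would normally do this decoding; the hand-written version only shows which bytes carry which field.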

How Do We Exploit This?

  • We set up a device that sits silently and only listens for these probe requests, writing each MAC address and timestamp to a file.
  • This tells us when a device passes within range of the listener, and for as long as it stays in the area.
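A minimal sketch of the listener's bookkeeping, assuming the log is a CSV file of `timestamp,mac` rows: `log_sighting` appends one row per probe request heard, and `presence_sessions` merges a device's rows into first-seen/last-seen intervals, which is what "for as long as it stays in the area" amounts to. Both function names and the 300-second gap that ends a session are hypothetical choices, not taken from the project.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple


def log_sighting(fh: TextIO, mac: str, timestamp: float) -> None:
    """Append one 'timestamp,mac' row, as the listener would for each probe request."""
    csv.writer(fh).writerow([f"{timestamp:.3f}", mac.lower()])


def presence_sessions(
    rows: Iterable[Tuple[float, str]], gap_s: float = 300.0
) -> Dict[str, List[Tuple[float, float]]]:
    """Merge each device's sightings into (first_seen, last_seen) sessions.

    Sightings of the same MAC at most gap_s seconds apart (hypothetical
    threshold) count as one continuous stay in range of the listener.
    """
    by_mac: Dict[str, List[float]] = defaultdict(list)
    for ts, mac in rows:
        by_mac[mac].append(ts)

    sessions: Dict[str, List[Tuple[float, float]]] = {}
    for mac, times in by_mac.items():
        times.sort()
        spans = [[times[0], times[0]]]
        for t in times[1:]:
            if t - spans[-1][1] <= gap_s:
                spans[-1][1] = t      # still in range: extend the current session
            else:
                spans.append([t, t])  # long silence: start a new session
        sessions[mac] = [(start, end) for start, end in spans]
    return sessions


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for the log file on the listener
    for ts in (0, 60, 1000):
        log_sighting(buf, "AA:BB:CC:00:11:22", ts)
    buf.seek(0)
    rows = [(float(ts), mac) for ts, mac in csv.reader(buf)]
    print(presence_sessions(rows))
```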

Underlying Math Involved in Triangulation:
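A minimal sketch of this math, assuming the common RSSI-based approach (in the spirit of reference 4) rather than the authors' exact formulation. Each listener converts the received signal strength of a probe request into a distance with the log-distance path-loss model, RSSI(d) = P0 − 10·n·log10(d/d0), so d = d0·10^((P0 − RSSI)/(10·n)). Strictly, locating a device from distances is trilateration: subtracting the first circle equation (x − x1)² + (y − y1)² = d1² from the i-th gives the linear equation 2(xi − x1)x + 2(yi − y1)y = d1² − di² + xi² − x1² + yi² − y1², and three or more listeners give a system that can be solved by least squares. P0 (the RSSI at d0 = 1 m) and the exponent n below are placeholder values that would need on-site calibration.

```python
import math
from typing import List, Sequence, Tuple


def rssi_to_distance(rssi_dbm: float, p0_dbm: float = -40.0, n: float = 2.5,
                     d0_m: float = 1.0) -> float:
    """Invert RSSI = P0 - 10*n*log10(d/d0). p0_dbm and n are placeholders."""
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10.0 * n))


def trilaterate(anchors: Sequence[Tuple[float, float]],
                dists: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (x, y) from >= 3 listener positions and estimated distances."""
    if len(anchors) < 3 or len(anchors) != len(dists):
        raise ValueError("need at least 3 anchors, each with one distance")
    (x1, y1), d1 = anchors[0], dists[0]

    # Each row: 2(xi-x1)x + 2(yi-y1)y = d1^2 - di^2 + xi^2 - x1^2 + yi^2 - y1^2
    rows: List[Tuple[float, float, float]] = []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        rows.append((2 * (xi - x1), 2 * (yi - y1),
                     d1**2 - di**2 + xi**2 - x1**2 + yi**2 - y1**2))

    # Normal equations (A^T A) p = A^T b: a 2x2 system solved by Cramer's rule.
    sxx = sum(ax * ax for ax, _, _ in rows)
    sxy = sum(ax * ay for ax, ay, _ in rows)
    syy = sum(ay * ay for _, ay, _ in rows)
    tx = sum(ax * b for ax, _, b in rows)
    ty = sum(ay * b for _, ay, b in rows)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("listeners are collinear; position is ambiguous")
    return (tx * syy - ty * sxy) / det, (sxx * ty - sxy * tx) / det


if __name__ == "__main__":
    listeners = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    true_pos = (3.0, 4.0)
    dists = [math.hypot(true_pos[0] - x, true_pos[1] - y) for x, y in listeners]
    print(trilaterate(listeners, dists))  # approximately (3.0, 4.0)
```

With noisy RSSI the circles rarely meet in one point, which is why a least-squares fit over all listeners is used instead of intersecting exactly three circles.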

Analysing the IIIT setting

Building a Heatmap of Crowd Flow

a) Heat distribution over the IIIT campus; b) corresponding satellite image
  • Setting up Raspberry Pis at multiple locations throughout the campus.
  • Collecting and pre-processing the data.
  • Using the data collected by the Raspberry Pis, we build a heatmap of crowd flow.
  • The red circles show the crowd hotspots in IIIT-H, averaged over the observation period.
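The heatmap step can be sketched as follows, assuming each located device is available as a `(timestamp, mac, x, y)` fix in metres. The sketch counts distinct devices per grid cell in each time window and averages over the windows that contain data; the grid size, the 15-minute window and the function names are hypothetical choices, and the actual post does not specify how its heatmap was binned.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Fix = Tuple[float, str, float, float]  # (timestamp_s, mac, x_m, y_m)


def crowd_heatmap(fixes: Iterable[Fix], cell_m: float, nx: int, ny: int,
                  window_s: float = 900.0) -> List[List[float]]:
    """Average number of distinct devices per grid cell per time window."""
    seen: Dict[Tuple[int, int, int], Set[str]] = defaultdict(set)
    windows: Set[int] = set()
    for t, mac, x, y in fixes:
        cx, cy = int(x // cell_m), int(y // cell_m)
        if not (0 <= cx < nx and 0 <= cy < ny):
            continue  # outside the mapped area
        w = int(t // window_s)
        windows.add(w)
        seen[(w, cx, cy)].add(mac)  # count each device once per cell per window

    grid = [[0.0] * nx for _ in range(ny)]
    for (_, cx, cy), macs in seen.items():
        grid[cy][cx] += len(macs)
    n_windows = max(len(windows), 1)  # average only over windows that had data
    return [[v / n_windows for v in row] for row in grid]


def top_hotspots(grid: List[List[float]], k: int = 3) -> List[Tuple[int, int, float]]:
    """Return the k busiest cells as (cx, cy, avg_devices)."""
    cells = [(v, cx, cy) for cy, row in enumerate(grid)
             for cx, v in enumerate(row) if v > 0]
    cells.sort(reverse=True)
    return [(cx, cy, v) for v, cx, cy in cells[:k]]
```

Counting distinct MACs per window, rather than raw probe requests, keeps a single chatty phone from looking like a crowd.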

Time series analysis of Crowd Flow

  • Time series analysis lets us forecast future crowd flow from past observations. Prophet, for example, forecasts with an additive model in which non-linear trends are fit with yearly, weekly and daily seasonality, plus holiday effects.
  • We use two forecasting approaches, implemented in Python: Prophet and ARIMA (AutoRegressive Integrated Moving Average).
      • The four graphs show the performance of ARIMA with respect to the ground truth:
          • ARIMA predictions in 2-D based on training data
          • Ground truth of the actual path taken by the target in 2-D
          • ARIMA predictions in 3-D based on training data
          • Ground truth of the actual path taken by the target in 3-D
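The post does not give the ARIMA orders behind these graphs. To illustrate the idea only, here is a deliberately small ARIMA(1,1,0) with drift written from scratch: the series is differenced once (the "I" part), an AR(1) model is fitted to the differences by ordinary least squares rather than maximum likelihood, and forecasts are integrated back to positions. Forecasting each coordinate of the target's path independently, the (1,1,0) order and all function names are assumptions.

```python
from typing import List, Sequence, Tuple


def fit_arima_110(series: Sequence[float]) -> Tuple[float, float]:
    """Fit dy_t = c + phi * dy_{t-1} + e_t (ARIMA(1,1,0) with drift) by OLS."""
    dy = [b - a for a, b in zip(series, series[1:])]  # first differences
    if len(dy) < 3:
        raise ValueError("need at least 4 observations")
    xs, ys = dy[:-1], dy[1:]  # regress each difference on the previous one
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    phi = sxy / sxx if sxx > 0 else 0.0  # constant step size -> pure drift
    c = my - phi * mx
    return c, phi


def forecast_arima_110(series: Sequence[float], steps: int) -> List[float]:
    """Iterate the fitted AR(1) on differences, then integrate back to levels."""
    c, phi = fit_arima_110(series)
    level, diff = series[-1], series[-1] - series[-2]
    out: List[float] = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        out.append(level)
    return out


def forecast_path(xs: Sequence[float], ys: Sequence[float],
                  steps: int) -> List[Tuple[float, float]]:
    """Forecast a 2-D path by modelling the x and y coordinates separately."""
    return list(zip(forecast_arima_110(xs, steps), forecast_arima_110(ys, steps)))


if __name__ == "__main__":
    print(forecast_path([0, 1, 2, 3, 4], [0, 2, 4, 6, 8], steps=2))
```

For real use, a library implementation such as statsmodels' `ARIMA(series, order=(p, d, q)).fit().forecast(steps)` handles order selection diagnostics, MA terms and proper likelihood fitting.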

Web Interface:

A live web interface collects and parses the data in real time, generating predicted crowd flows and analysis graphs as new data arrives.
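The post does not describe how the interface ingests data. One hedged way to parse the listener's log in real time is to poll the file and read only the bytes appended since the last poll, as `tail -f` does, holding back any line that is not yet terminated. The sketch assumes a log of `timestamp,mac` lines; the `LogFollower` class and that format are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple


class LogFollower:
    """Incrementally read 'timestamp,mac' rows appended to a log file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0     # byte position already consumed
        self.partial = b""  # trailing bytes of a line not yet terminated by '\n'

    def poll(self) -> List[Tuple[float, str]]:
        try:
            with open(self.path, "rb") as fh:
                fh.seek(self.offset)
                chunk = fh.read()
        except FileNotFoundError:
            return []  # listener has not written anything yet
        self.offset += len(chunk)
        lines = (self.partial + chunk).split(b"\n")
        self.partial = lines.pop()  # last piece is incomplete (or empty)
        rows: List[Tuple[float, str]] = []
        for raw in lines:
            line = raw.strip().decode("ascii")
            if not line:
                continue
            ts, mac = line.split(",", 1)
            rows.append((float(ts), mac))
        return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "probes.csv")
        follower = LogFollower(path)
        with open(path, "a") as fh:
            fh.write("1700000000.0,aa:bb:cc:00:11:22\n1700000005.0,aa:bb")
        print(follower.poll())  # only the complete first line
        with open(path, "a") as fh:
            fh.write(":cc:00:11:22\n")
        print(follower.poll())  # the now-completed second line
```

A web backend could call `poll()` every few seconds and feed the new rows into the heatmap and forecasting steps; log rotation or truncation is not handled in this sketch.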

 

Anomalies and Abnormal Crowd Behaviour:

  • Whenever there was an important event on campus, we observed a corresponding response in crowd movement.
  • As the images above show, during the "R&D showcase" the crowd was concentrated at KCIS and nearby regions.
  • During the "Farewell" a few days ago, many people gathered at the Felicity Ground and nearby regions.
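The post identifies these events by inspecting the heatmaps. As a hypothetical way to flag such spikes automatically, one could compare each zone's current device count with that zone's own history using a z-score and report zones above a threshold. The zone counts and the threshold of 3 standard deviations are illustrative assumptions, not the authors' data or method.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def flag_anomalies(history: Dict[str, Sequence[float]], current: Dict[str, float],
                   z_thresh: float = 3.0) -> List[Tuple[str, float]]:
    """Return (zone, z_score) for zones whose current count is unusually high."""
    flagged: List[Tuple[str, float]] = []
    for zone, count in current.items():
        past = history.get(zone, [])
        if len(past) < 2:
            continue  # not enough history to judge
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            z = float("inf") if count > mu else 0.0  # flat history: any rise stands out
        else:
            z = (count - mu) / sigma
        if z >= z_thresh:
            flagged.append((zone, z))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    history = {"KCIS": [10, 12, 11, 9, 13], "Felicity Ground": [20, 22, 19, 21, 18]}
    current = {"KCIS": 60, "Felicity Ground": 21}
    print(flag_anomalies(history, current))  # only KCIS is flagged
```

Comparing each zone with its own history, rather than with a campus-wide average, avoids flagging places that are always busy.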


Extensions to this Project:

  • We can analyse larger crowd settings, such as a big city like Hyderabad, by deploying more Raspberry Pis at optimal locations throughout the city.
  • We can extend this project by analysing the unencrypted packets being shared over WiFi. Using architectures such as CreepyDol, one might gain access to sensitive personal data that could be used to monitor people's actions remotely.

References:

  1. ARIMA: Autoregressive Integrated Moving Average.
  2. Facebook, Prophet: forecasting procedure implemented in R and Python.
  3. https://linuxnet.ca/ieee/oui/: looking up the make of a device from its MAC address.
  4. Chuan-Chin Pu, Indoor Location Tracking Using Received Signal Strength Indicator.
  5. https://www.raspberrypi.org: Raspberry Pi setup.
