
Real-Time and Predictive Traffic Data Analysis

Introduction

Traffic prediction is crucial to many applications, including traffic network planning, route guidance, and congestion avoidance. We aim to minimize the time a vehicle takes to travel from point A to point B and to maximize the efficiency of traffic flow, which in turn helps the traffic police manage traffic. Several essential factors affect traffic prediction:
  • Geographical factors, such as road network topology.
  • Social factors, such as holidays, concerts, and weekends.
  • Limited data, i.e., datasets that are either small or not publicly available.
The primary aim of the project is to use historical and live traffic data to control the traffic lights for efficient traffic flow.

Why is the problem statement important?

The number of vehicles on the road in India has doubled roughly every 8 years since 2000. In addition to the lack of adequately constructed roads, there is no proper system to help traffic police officers control the flow of traffic. A traffic police officer usually has no way of knowing how much traffic each lane carries. Emergency vehicles also get stuck in traffic, which can delay their response.

Data Collection


The number of lanes a road has, whether it is a one-way or a two-way street, its speed limit, and its available width determine how much traffic can flow through it efficiently before congestion occurs. Other parameters that identify a road segment are its start and end GPS coordinates and its length. Traffic followed certain patterns: it usually peaked in the morning, an hour or two before lunch, and again in the early evening, reflecting the daily commute of the working population. Traffic was also high during some festivals and events. To capture such dependencies, we collected data in the form of the average vehicle speed over each road segment in 15-minute intervals.
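The 15-minute averaging described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the dataset: the input format (segment id, minute of day, speed) and the function name `average_speed_per_interval` are assumptions, and the sample values are made up.

```python
from collections import defaultdict

INTERVAL_MINUTES = 15  # interval length stated in the text


def average_speed_per_interval(observations):
    """Average raw speed observations into 15-minute bins.

    observations: iterable of (segment_id, minute_of_day, speed) tuples
    (a hypothetical input format chosen for this sketch).
    Returns a dict mapping (segment_id, interval_index) to the mean speed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for segment_id, minute_of_day, speed in observations:
        key = (segment_id, minute_of_day // INTERVAL_MINUTES)
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up observations, for illustration only.
sample = [
    ("seg_a", 0, 40.0),
    ("seg_a", 10, 50.0),
    ("seg_a", 20, 30.0),
    ("seg_b", 5, 60.0),
]
averages = average_speed_per_interval(sample)
```

With 15-minute bins, each day is split into 96 intervals, so `interval_index` runs from 0 to 95.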

Whenever an unusually high number of online map queries for a particular destination preceded an event scheduled to happen nearby, we could predict possible congestion. Thus, there was a negative correlation between the number of people searching for a destination and the traffic speed in the surrounding area.

We used a large-scale traffic prediction dataset, the Q-Traffic dataset, which consists of three sub-datasets: a query sub-dataset, a traffic speed sub-dataset, and a road network sub-dataset. We used only the traffic speed and road network sub-datasets. The attributes of the road segments in the road network sub-dataset are shown in the figure below. There are three kinds of auxiliary domains in the Q-Traffic dataset:
  • Geographical and social attributes, including holidays, peak hours, and speed.
  • Road intersection information, such as the local road network and junctions.
  • Online crowd queries, which record map search queries from users.

Average Speed of Road Segment with Time

Historical Data for traffic at -24h, -48h, etc.

Topology of Road Network

Method

Let m be the number of road segments, t the number of consecutive days over which the detector data are collected, and τ the number of time intervals (e.g., of 15 minutes each) into which each day is partitioned.
X is a matrix with p = t×τ rows and m columns. Each column i of X holds the traffic (average speed) time series of the i-th road segment, and each row j holds the average speeds of all segments at time interval j.
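A minimal sketch of how such a matrix could be assembled, assuming the speeds are already available as an array indexed by (day, interval, segment). The function name `build_speed_matrix` and the random values are illustrative assumptions, not part of the original project.

```python
import numpy as np


def build_speed_matrix(speeds):
    """Stack per-day speed tables into X with p = t * tau rows and m columns.

    speeds: array-like of shape (t, tau, m), where speeds[d][k][i] is the
    average speed of segment i during interval k of day d.
    Row j of the result corresponds to day j // tau, interval j % tau.
    """
    speeds = np.asarray(speeds, dtype=float)
    t, tau, m = speeds.shape
    return speeds.reshape(t * tau, m)


# Synthetic example: 3 days, 96 intervals of 15 minutes, 5 road segments.
t, tau, m = 3, 96, 5
rng = np.random.default_rng(0)
speeds = rng.uniform(20.0, 60.0, size=(t, tau, m))  # made-up speeds
X = build_speed_matrix(speeds)
```

Here `X.shape` is `(288, 5)`, and row `96 + 10` holds the speeds of all five segments during interval 10 of the second day.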
After applying principal component analysis (PCA), we find r non-negligible singular values, so X effectively resides in an r-dimensional subspace of R^p.
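The idea of counting non-negligible singular values can be sketched with NumPy by computing PCA through the SVD of the column-centered matrix. The relative threshold `rel_tol` is an assumption of this sketch (the text does not say what counts as negligible), and the matrix below is synthetic, built from two daily patterns so that its effective rank is known.

```python
import numpy as np


def pca_effective_rank(X, rel_tol=0.01):
    """PCA via SVD of the column-centered matrix.

    A singular value counts as non-negligible when it exceeds
    rel_tol times the largest one (threshold chosen for illustration).
    Returns (r, scores, components, column_means).
    """
    X = np.asarray(X, dtype=float)
    column_means = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - column_means, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))
    scores = U[:, :r] * s[:r]  # coordinates of each row in the r-dim subspace
    return r, scores, Vt[:r], column_means


# Synthetic data: 3 days of 96 intervals, 4 segments that all mix
# the same two daily patterns around a base speed of 40.
tau, days = 96, 3
k = np.arange(days * tau)
A = np.column_stack([np.sin(2 * np.pi * k / tau), np.cos(2 * np.pi * k / tau)])
B = np.array([[10.0, 5.0, 0.0, 2.0], [0.0, 5.0, 10.0, 3.0]])
X = 40.0 + A @ B

r, scores, components, column_means = pca_effective_rank(X)
```

Because every column is a mix of the same two patterns, only two singular values survive the threshold, and `scores @ components + column_means` reproduces `X` up to rounding.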
Support vector regression (SVR) is then used to make the prediction, based on the attributes obtained from PCA above. The dependencies of a road segment (its nearby road segments) are weighted by their lengths before the final feature vector is assembled and fed to the SVR.
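One way the length weighting could look is sketched below. The text does not specify the exact scheme, so this sketch assumes that each neighbor's features are scaled by its share of the total neighbor length and then appended to the segment's own features. The function name and the numbers are hypothetical.

```python
def build_feature_vector(own_features, neighbor_features, neighbor_lengths):
    """Concatenate a segment's features with length-weighted neighbor features.

    Assumption of this sketch: weight = neighbor length / total neighbor length.
    """
    if len(neighbor_features) != len(neighbor_lengths):
        raise ValueError("one length is required per neighbor")
    total_length = sum(neighbor_lengths)
    if total_length <= 0:
        raise ValueError("neighbor lengths must sum to a positive value")
    vector = list(own_features)
    for features, length in zip(neighbor_features, neighbor_lengths):
        weight = length / total_length
        vector.extend(weight * value for value in features)
    return vector


# Made-up example: two PCA attributes per segment, two neighbors
# of 300 m and 100 m (weights 0.75 and 0.25).
features = build_feature_vector(
    own_features=[1.0, 2.0],
    neighbor_features=[[4.0, 8.0], [2.0, 6.0]],
    neighbor_lengths=[300.0, 100.0],
)
```

The resulting vector could then be passed to any SVR implementation, for example scikit-learn's `sklearn.svm.SVR`, which is left out here to keep the sketch dependency-free.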

Result

A one-day traffic speed sequence is used as input to predict the traffic speed for the following 2 hours. The mean absolute percentage error (MAPE) is used to evaluate and compare performance. It is defined as:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \frac{\left| v_t - \tilde{v}_t \right|}{v_t}$$

Here, v_t and ṽ_t are the actual and predicted traffic speeds at time t, and N is the number of predicted time steps. Then, depending on the expected traffic and the live traffic data, we control the traffic lights to ensure efficient traffic flow. The results for SVR are given in the figure below. We obtained an overall MAPE of 9.73%.
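The evaluation setup can be sketched in plain Python. Assuming 15-minute intervals, one day of input corresponds to 96 values and a 2-hour horizon to 8 values. The helper names `make_windows` and `mape` and the sample numbers are illustrative and not taken from the original code.

```python
def make_windows(series, input_len=96, horizon=8):
    """Split a speed series into (one-day input, next-2-hours target) pairs."""
    windows = []
    for start in range(len(series) - input_len - horizon + 1):
        inputs = series[start:start + input_len]
        targets = series[start + input_len:start + input_len + horizon]
        windows.append((inputs, targets))
    return windows


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Actual speeds must be non-zero, since each error is divided by them.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up series of 110 intervals, giving 110 - 96 - 8 + 1 = 7 windows.
series = [float(v) for v in range(1, 111)]
windows = make_windows(series)

# Two predictions, each off by 10% of the actual speed.
score = mape([50.0, 40.0], [45.0, 44.0])
```

In this toy case both predictions deviate by 10% of the actual speed, so `score` is approximately 10.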

MAPE score with Time for SVR


Conclusion

We integrate spatial data and historical traffic data with real-time traffic data to predict the flow of traffic on each road segment and to identify areas where congestion might occur. By predicting future traffic flow, we can decrease the response time of emergency vehicles and help ensure that they are not caught in a traffic jam.

Future Possible Work

Convolutional neural networks can be applied to capture the spatial dependencies between road intersections. The number of online map queries is inversely correlated with the traffic flow of the nearby area: a high number of queries for a particular destination indicates a possible future event that will lead to traffic congestion in the neighborhood. A hybrid model could be built using all three auxiliary domains, namely traffic flow, spatial relations, and query counts. We would also like to add a dedicated feature that gives emergency vehicles free passage.
