Real-Time and Predictive Traffic Data Analysis

Introduction

Traffic prediction is crucial to many applications, including traffic network planning, route guidance, and congestion avoidance. This project aims to minimize the time a vehicle needs to travel from point A to point B and to maximize the efficiency of traffic flow, in order to help the traffic police manage traffic. Several essential factors affect traffic prediction:

Geographical factors, such as topology.
Social factors, such as holidays, concerts, and weekends.
Limited datasets, i.e., datasets that are either small or not publicly available.

The primary aim of the project is to use historical and live traffic data to control the traffic lights for efficient traffic flow.

Why is the problem statement important?

The number of vehicles on the road in India has doubled every 8 years since the year 2000. Apart from the lack of adequately constructed roads, there is no proper system to help traffic police officers control the flow of traffic. A traffic police officer usually has no idea how much traffic each lane carries. Emergency vehicles also get stuck in traffic, which can delay their response.

Data Collection

The number of lanes a road has, whether it is a one-way or a two-way street, its speed limit, and its available width determine how much traffic can flow through it efficiently before congestion occurs. Each road segment was further identified by its start and end GPS coordinates and its length. Traffic followed certain patterns: it usually peaked in the morning, an hour or two before lunch, and again in the early evening, reflecting the daily commute of most of the working population. Traffic was also high during some festivals and events. To capture such dependencies, we collected data in the form of the average vehicle speed over each road segment in 15-minute intervals.
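The 15-minute aggregation can be sketched as follows. This is only an illustration, not the project's actual pipeline: the record layout `(segment_id, seconds_since_midnight, speed_kmh)` and the function name `average_speed_per_interval` are assumptions made for the example.

```python
from collections import defaultdict

INTERVAL_SECONDS = 15 * 60  # 15-minute bins, as in the text


def average_speed_per_interval(records):
    """Average raw speed readings per (segment, 15-minute interval).

    `records` is an iterable of (segment_id, seconds_since_midnight, speed_kmh)
    tuples; this layout is hypothetical.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for segment_id, seconds, speed in records:
        key = (segment_id, int(seconds // INTERVAL_SECONDS))
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up readings, for illustration only.
readings = [
    ("seg-1", 0, 40.0),
    ("seg-1", 600, 50.0),    # still interval 0 (00:00 to 00:15)
    ("seg-1", 900, 30.0),    # interval 1 (00:15 to 00:30)
    ("seg-2", 32400, 20.0),  # 09:00 falls in interval 36
]
averages = average_speed_per_interval(readings)
```

Keying the result by `(segment_id, interval_index)` keeps one value per road segment per interval, which is the granularity the rest of the method works with.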

Whenever an unusually high number of online map queries for a particular destination preceded an event scheduled nearby, we could predict possible congestion. Thus, there was a negative correlation between the number of people searching for a certain destination and the traffic speed in the surrounding areas.
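One way to check such a relationship is to compute the Pearson correlation between query counts and average speed over the same intervals. The sketch below uses made-up numbers purely for illustration; the helper name `pearson` and the sample values are assumptions, not data from the project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only: query counts for a destination
# and the average speed (km/h) on a nearby segment in the same intervals.
query_counts = [10, 20, 40, 80, 160]
avg_speeds = [50.0, 45.0, 38.0, 25.0, 12.0]
r = pearson(query_counts, avg_speeds)
```

A value of `r` close to -1 would indicate that speed drops as query volume rises, which is the pattern described above.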

We used a large-scale traffic prediction dataset, the Q-Traffic dataset, which consists of three sub-datasets: the query sub-dataset, the traffic speed sub-dataset, and the road network sub-dataset. We used only the traffic speed and road network sub-datasets. The attributes of the road segments in the road network sub-dataset are shown in the figure below. There are three kinds of auxiliary domains in the Q-Traffic dataset:

Geographical and social attributes, such as peak holidays, peak hours, and speed;
Road intersection information, such as the local road network and junctions;
Online crowd queries, which record map search queries from users.

Average Speed of Road Segment with Time

Historical Data for traffic at -24h, -48h, etc.

Topology of Road Network

Method

Let m be the number of roads, t the number of consecutive days over which the detector data are collected, and τ the number of time intervals (e.g., of 15 minutes each) into which each day is partitioned.

X is a matrix with p = t×τ rows and m columns. Each column i of X is the average-speed time series of the i-th road segment, and each row j holds the average speeds of all road segments at time interval j.
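The layout of X can be illustrated with a small sketch. The nested-list input format `speeds[d][k][i]` (day, interval, segment) and the function name `build_speed_matrix` are assumptions for this example, and the speeds are synthetic.

```python
def build_speed_matrix(speeds, num_days, intervals_per_day, num_segments):
    """Stack per-day speed tables into a matrix with p = t * tau rows, m columns.

    `speeds[d][k][i]` is the average speed of segment i in interval k of day d
    (a hypothetical nested-list layout).
    """
    X = []
    for d in range(num_days):
        for k in range(intervals_per_day):
            row = list(speeds[d][k])
            if len(row) != num_segments:
                raise ValueError("each interval needs one speed per segment")
            X.append(row)
    return X


t, tau, m = 2, 96, 3  # 2 days, 96 fifteen-minute intervals per day, 3 segments
# Synthetic speeds, for illustration only.
speeds = [[[30.0 + d + 0.1 * k + i for i in range(m)] for k in range(tau)]
          for d in range(t)]
X = build_speed_matrix(speeds, t, tau, m)
# Row j = d * tau + k holds all segment speeds at interval k of day d.
```

With 15-minute intervals, τ = 96, so two days of data give p = 192 rows.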

After PCA, we find r non-negligible singular values; the columns of X thus effectively reside in an r-dimensional subspace of $\mathbb{R}^p$.
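A minimal sketch of this step, using NumPy's SVD on the column-centered matrix. The 95% explained-variance threshold used to decide which singular values are "non-negligible" is an assumption, as are the function name `pca_rank` and the synthetic data; the original project may have chosen r differently.

```python
import numpy as np


def pca_rank(X, energy=0.95):
    """Return (r, scores, components) keeping `energy` of the variance.

    The 95% default is an assumption; the text only says "non-negligible".
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # center each road-segment column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of components whose cumulative variance reaches `energy`.
    r = min(int(np.searchsorted(explained, energy)) + 1, len(s))
    scores = U[:, :r] * s[:r]          # p x r reduced representation
    return r, scores, Vt[:r]


rng = np.random.default_rng(0)
p, m = 192, 6
# Synthetic data: 6 segments driven by 2 latent patterns plus small noise.
latent = rng.normal(size=(p, 2))
mixing = rng.normal(size=(2, m))
X = 40.0 + latent @ mixing + 0.01 * rng.normal(size=(p, m))
r, scores, components = pca_rank(X)
```

Because the synthetic matrix is built from two latent patterns, at most two components are needed to reach the threshold, even though X has six columns.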

Support vector regression (SVR) is then used to make the prediction, based on the attributes obtained from PCA above. The dependencies (nearby road segments) of a road segment are all weighted by their length before the final feature vector is built and fed to the SVR.
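The length weighting can be sketched as below. The text does not specify the exact weighting scheme, so this example assumes each neighbor's features are scaled by its share of the total neighbor length; the function name `weighted_feature_vector` and all values are hypothetical.

```python
def weighted_feature_vector(target_features, neighbor_features, neighbor_lengths):
    """Concatenate a segment's features with length-weighted neighbor features.

    Each neighbor's features are scaled by its share of the total neighbor
    length. This normalization is an assumption; the text only says that
    dependencies are weighted by their length.
    """
    if len(neighbor_features) != len(neighbor_lengths):
        raise ValueError("one length per neighbor is required")
    total = sum(neighbor_lengths)
    if total <= 0:
        raise ValueError("neighbor lengths must sum to a positive value")
    vector = list(target_features)
    for features, length in zip(neighbor_features, neighbor_lengths):
        weight = length / total
        vector.extend(weight * f for f in features)
    return vector


# Made-up values: two PCA attributes per segment, two neighboring segments.
target = [1.0, -0.5]
neighbors = [[2.0, 0.0], [4.0, 1.0]]
lengths_m = [300.0, 100.0]  # segment lengths in meters
features = weighted_feature_vector(target, neighbors, lengths_m)
```

A vector built this way could then be passed to an SVR implementation such as scikit-learn's `sklearn.svm.SVR`; that training step is not shown here.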

Result

A one-day traffic speed sequence is used as input to predict the traffic speed for the following 2 hours. The mean absolute percentage error (MAPE) is used to evaluate and compare performance. In its standard form, it is defined as:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \left| \frac{v_t - \tilde{v}_t}{v_t} \right|$$

Here, $v_t$ and $\tilde{v}_t$ are the actual and predicted traffic speeds at time t, and N is the number of predicted time steps. Then, depending on the expected traffic and the live traffic data, we control the traffic lights to ensure efficient traffic flow. The results for SVR are given in the figure below. We obtained an overall MAPE score of 9.73.

MAPE score with Time for SVR
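For reference, MAPE as commonly defined (the mean of |actual - predicted| / |actual|, expressed in percent) can be computed as in the sketch below. The sample speeds are made up for illustration and are not the project's results.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for v, v_hat in zip(actual, predicted):
        if v == 0:
            raise ValueError("actual speed must be non-zero")
        total += abs(v - v_hat) / abs(v)
    return 100.0 * total / len(actual)


# Made-up speeds (km/h) for eight 15-minute steps, i.e. a 2-hour horizon.
actual = [40.0, 50.0, 20.0, 25.0, 40.0, 50.0, 20.0, 25.0]
predicted = [36.0, 55.0, 22.0, 25.0, 44.0, 45.0, 18.0, 25.0]
score = mape(actual, predicted)
```

Because each error is divided by the actual speed, the same absolute error weighs more heavily on slow, congested segments than on free-flowing ones.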

Conclusion

We integrate spatial data and historical traffic data with real-time traffic data to predict the flow of traffic on each road segment and to identify areas where congestion might occur. By predicting future traffic flow, we can decrease the response time of emergency vehicles and ensure that they are not caught in traffic jams.

Future Possible Work

Convolutional neural networks could be applied to capture the spatial dependencies among road intersections. The volume of online map queries is inversely correlated with the traffic flow of the nearby area: a large number of queries for a particular destination indicates a possible future event that will lead to traffic congestion in the neighborhood. A hybrid model could be built using all three auxiliary domains: traffic flow, spatial relations, and query counts. We would also like to create a dedicated feature that gives emergency vehicles free passage.

IIIT-H | Big Data and Policing - Spring 2019 | Projects
