
Traffic Violations in Metropolitan Cities

Introduction

With the advent of the smartphone era and the availability of 4G internet across the country, police forces have begun to issue electronic versions of the traditional traffic challans. E-Challans are electronically generated penalty receipts that take the place of physical paper receipts and help digitize the whole process of issuing challans and penalizing violations.
In this project, we analyze the set of all unpaid E-Challans collected in metropolitan cities over a large span of time to gain unique insights about the nature of traffic violations in such cities. The problem is very relevant for a course on Big Data & Policing as it tries to answer the following important questions:
  1. How are traffic violations distributed spatially and temporally across the city boundaries?
  2. Can the most common violation types be characterized and be used for providing intervention insights?
  3. How can police leverage social media for increasing awareness and for targeted intervention measures?
The problem of characterizing traffic violations has serious implications for society at large, as there have been more than 1.3 million road accidents in India in the last decade.

Data collection:

We collect about four years' worth of data for the city of Ahmedabad. Ahmedabad was chosen in particular because its data is easily available and its portal has no Captcha. The Ahmedabad police provide all the E-Challan data on their website https://payahmedabadechallan.org/ and the challans for a particular vehicle can be obtained by entering its license plate number. We use a headless browser driven by Selenium to make repeated requests with all possible vehicle plate numbers and collect the data from the results returned. The data has several fields, one of which is the violation type, which includes violations like driving without a helmet, wrong parking, no seatbelt etc. In total, the data that we collect has the following fields:
  • Vehicle Number
  • Date of Issuing Challan
  • Place of Issue
  • Violation Type 
  • Fine Amount
E-Challan
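As a rough illustration of what "all possible vehicle plate numbers" could mean in practice, the sketch below enumerates candidate plates lazily. The plate layout (RTO code, two series letters, four digits) and the default RTO code GJ01 are assumptions made for this example and are not taken from our scraper; real registration series are less regular than this. The function name candidate_plates is hypothetical.

```python
from itertools import islice, product
from string import ascii_uppercase


def candidate_plates(rto_code="GJ01", series_letters=ascii_uppercase):
    """Yield candidate plates such as 'GJ01AA0001' in a fixed order.

    Assumed layout (not from the original post):
    RTO code + two series letters + four-digit number.
    """
    for first, second in product(series_letters, repeat=2):
        for number in range(1, 10000):
            yield f"{rto_code}{first}{second}{number:04d}"


# Peek at the first three candidates without materialising the whole space.
first_three = list(islice(candidate_plates(), 3))
```

Using a generator keeps memory use flat, which matters because the full space under this assumed layout has several million candidates per RTO code.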

Dataset Statistics:

The data was collected over a period of 15 days and the brief dataset statistics are given below:
Descriptive Statistics

Analysis:

We perform our analysis over 3 main domains: 1) Temporal Analysis, 2) Spatial Analysis & 3) Violation Distribution. Thus, the data can be used to answer questions like how to deploy forces in a particular area of the city at a given time (weekends or weekdays) and for a particular type of violation (e.g. driving without a helmet). 
In order to perform spatial analysis, we first geocode the locations provided and generate a heatmap of the traffic violations for the city of Ahmedabad. The heatmap given below can be used to infer the hotspots of traffic violations in the city. We can see from the heatmap that most of the violations are clustered in a few areas, shown as bright red regions. Such heatmaps can be made for each type of violation.

Heatmap of traffic violations
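The idea behind such a heatmap can be sketched as counting geocoded points per grid cell; the densest cells are the hotspots. This is a minimal sketch, not the code used for the figure: the function name grid_counts, the 0.01 degree cell size and the sample coordinates are all made up for illustration.

```python
from collections import Counter
from math import floor


def grid_counts(points, cell_deg=0.01):
    """Count (lat, lon) points per square cell of side `cell_deg` degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Illustrative coordinates only (roughly in the Ahmedabad area).
sample_points = [
    (23.0251, 72.5714),
    (23.0258, 72.5719),
    (23.0712, 72.5177),
]
hotspot_cell, hotspot_count = grid_counts(sample_points).most_common(1)[0]
```

Filtering the points by violation type before calling grid_counts gives one grid per violation.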
We similarly perform temporal analysis to see how traffic violations vary over time. The plot below shows the number of challans issued each day over a period of 4 years, from 2015 to 2019. The plot reveals interesting trends around festival dates: there are peaks on the days immediately following festivals like Diwali, which suggests that most people either do not travel or go on outstation trips during these holidays. 
Time series plot of E-Challan
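A minimal sketch of the daily aggregation behind such a time series, plus a helper for inspecting the days around a festival, is given below. The helper names and the toy dates are assumptions for illustration; the festival date used here is only an example value and the counts are not from our dataset.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(issue_dates):
    """Map each issue date to the number of challans issued that day."""
    return Counter(issue_dates)


def counts_around(counts, festival_day, window=1):
    """Return {offset_in_days: count} for days within `window` of festival_day."""
    return {
        offset: counts.get(festival_day + timedelta(days=offset), 0)
        for offset in range(-window, window + 1)
    }


# Toy data: few challans on the festival day, a spike the day after.
festival = date(2018, 11, 7)
toy_dates = (
    [date(2018, 11, 6)] * 2
    + [date(2018, 11, 7)] * 1
    + [date(2018, 11, 8)] * 5
)
around_festival = counts_around(daily_counts(toy_dates), festival)
```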
The third aspect of our analysis deals with the distribution of violations and their contribution to the total money owed. We can infer from the pie chart below that the majority of challans are for violations like no helmet and red light violations. Thus, reducing these two violations alone would make the roads much safer. 
Violation Distribution
The above two violations also contribute the most to the total amount of money owed to the government. The plot below shows the top 5 violations by amount owed to the government, along with the relative scale of those amounts. 
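Both views (share of challans per violation type and top violations by amount owed) can be computed in one pass over the records. The sketch below assumes each record is reduced to a (violation type, fine amount) pair; the function name violation_summary and the fine amounts in the toy data are made up for illustration.

```python
from collections import Counter, defaultdict


def violation_summary(challans, top_n=5):
    """challans: iterable of (violation_type, fine_amount) pairs.

    Returns (share of challans per type, top_n types by total amount owed).
    """
    counts = Counter()
    owed = defaultdict(int)
    for violation, amount in challans:
        counts[violation] += 1
        owed[violation] += amount
    total = sum(counts.values())
    shares = {violation: n / total for violation, n in counts.items()}
    top_owed = sorted(owed.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return shares, top_owed


# Toy records; the amounts are illustrative, not actual fines.
toy_challans = [
    ("No helmet", 100),
    ("No helmet", 100),
    ("Red light violation", 300),
    ("Wrong parking", 50),
]
shares, top_owed = violation_summary(toy_challans)
```

Note that the two rankings need not agree: a rare violation with a high fine can outrank a common one with a low fine.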

Characterizing user behaviour:

In order to undertake targeted interventions, it is necessary to characterize user behaviour and find repeat offenders. In our dataset, there were a large number of repeat offenders, and one of them even had 67 E-Challans to their name for the same type of violation. Catching hold of such offenders is the easiest way to tackle the menace of traffic violations. To characterize repeat offence behaviour, we plot the number of vehicles against the number of challans per vehicle. From the plot below, we can see that a significant share of users (> 25%) commit offences repeatedly.
No. of Challans vs Vehicles
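The distribution in this plot can be sketched as a histogram of challans per vehicle, from which the share of repeat offenders follows directly. This is a minimal sketch with hypothetical function names and placeholder vehicle identifiers; here a repeat offender is taken to mean a vehicle with more than one challan, which is an assumption of this example.

```python
from collections import Counter


def challan_histogram(vehicle_numbers):
    """Map 'challans per vehicle' to 'number of vehicles with that many'."""
    per_vehicle = Counter(vehicle_numbers)
    return Counter(per_vehicle.values())


def repeat_share(vehicle_numbers):
    """Fraction of vehicles with more than one challan."""
    per_vehicle = Counter(vehicle_numbers)
    if not per_vehicle:
        return 0.0
    repeaters = sum(1 for n in per_vehicle.values() if n > 1)
    return repeaters / len(per_vehicle)


# One entry per challan; "V1".."V4" are placeholder vehicle identifiers.
toy_vehicles = ["V1", "V1", "V1", "V2", "V3", "V3", "V4"]
histogram = challan_histogram(toy_vehicles)
share = repeat_share(toy_vehicles)
```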
To further characterize user behaviour, we collect data again after 3 months to acquire statistics about the average repayment time and corresponding distribution. The data collected revealed some shocking stats such as:
  1. Only 3.5% of the people paid some of their challans in 3 months
  2. The average time of Challan payment was a whopping 339 days (Approx 11 months)
  3. For 13.25% of the people, the number of Challans increased in the 3 months, thus, indicating repeat offence behaviour
  4. Of the people who paid their challans, 10% of the people chose to not pay all of their challans at a given time
These statistics reveal the lack of knowledge of the people about E-Challans in general or the lack of incentive to pay the challan amount as soon as possible. Thus, the effect of E-Challans is diminished significantly. 
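Statistics of this kind can be derived by comparing the two snapshots vehicle by vehicle. The sketch below is not our analysis code: it assumes each snapshot maps a vehicle to the set of its unpaid challans, that each challan can be identified by some key (the dataset fields listed earlier do not include an explicit challan ID, so the keys here are hypothetical), and that a challan missing from the later snapshot has been paid.

```python
def compare_snapshots(before, after):
    """before/after: dict mapping vehicle -> set of unpaid challan keys.

    Only vehicles present in `before` are considered.
    """
    paid_some = paid_partially = gained_new = 0
    for vehicle, old in before.items():
        new = after.get(vehicle, set())
        if old - new:                # at least one earlier challan is gone
            paid_some += 1
            if old & new:            # ...but some earlier challans remain
                paid_partially += 1
        if new - old:                # challans that were not there before
            gained_new += 1
    n = len(before)
    return {
        "paid_some": paid_some / n if n else 0.0,
        "gained_new": gained_new / n if n else 0.0,
        "partial_among_payers": paid_partially / paid_some if paid_some else 0.0,
    }


# Toy snapshots with placeholder vehicle and challan identifiers.
before = {"V1": {"c1", "c2"}, "V2": {"c3"}, "V3": {"c4"}, "V4": {"c5"}}
after = {"V1": {"c2"}, "V2": set(), "V3": {"c4", "c6"}, "V4": {"c5"}}
stats = compare_snapshots(before, after)
```

Comparing sets rather than plain counts avoids miscounting a vehicle that paid one challan and received a new one in the same period.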

Suggestions for the police:

Based on the above analysis of E-Challan data over a period of 4 years, we would like to make the following suggestions to the Ahmedabad police and the state government:
  1. Deploy traffic police in areas that are major hotspots as given in the above spatial heatmap
  2. The traffic police need to be more attentive on the days preceding and succeeding the festivals
  3. There is a general lack of awareness amongst people regarding the system of E-Challans and this needs to be solved using awareness campaigns and social media
  4. There is no incentive for the people to pay their challans early and on time, thus, disincentives like interest on challan amount can be used
There are many more plots and other interesting analyses that have been done using this data. For more information or analysis, mail us at shashank[dot]s[at]research[dot]iiit[dot]ac[dot]in
