Deep Learning Methods for Vehicle Detection in Mixed and Undisciplined Traffic Environments

Deshmukh, Prashant (2024) Deep Learning Methods for Vehicle Detection in Mixed and Undisciplined Traffic Environments. PhD thesis.

PDF (Restricted up to 04/08/2027)
Restricted to Repository staff only

54Mb

Abstract

An intelligent traffic management system (ITMS) is essential for mixed and undisciplined traffic scenarios. It relies on intelligent vehicle detection (IVD) approaches, which combine advanced sensors and computer vision algorithms to detect and track vehicles. IVD is also used to improve traffic flow, reduce congestion, enhance safety, and support autonomous driving. Implementing IVD is comparatively straightforward in disciplined traffic but becomes challenging in mixed and undisciplined traffic, chiefly because vehicles of widely varying scale travel close to each other and do not follow lane discipline. In recent years, convolutional neural network (CNN)-based deep learning (DL) methods have made considerable progress in implementing IVD for disciplined traffic. However, most CNN-based DL methods do not consider mixed and undisciplined traffic environments, and existing CNN backbones make it difficult for them to extract multi-scale features. The main challenge in multi-scale feature extraction is to accurately identify and extract the relevant features of an image across multiple scales; these features are needed to capture different levels of detail and to identify vehicles of different sizes. This research focuses on designing DL methods that implement IVD in mixed and undisciplined traffic environments while overcoming the multi-scale feature extraction problem. The approaches adopted are summarized below:

• First, a large and diverse traffic labeled dataset (DTLD) is collected and labeled for mixed and undisciplined traffic. An advanced visual computing deep learning (AVCDL) method is then designed to implement IVD under diverse traffic conditions. AVCDL ensembles the features of two CNN architectures and combines them on a single channel via feature concatenation to overcome the multi-scale feature extraction problem. It also uses an improved multi-stage vehicle detection head (MSVDH) that classifies the target vehicles into their respective categories. The detection accuracy of AVCDL leaves room for improvement because its convolution operations are restricted to a small local area of the image. The next contribution therefore uses a transformer-based self-attention mechanism, which takes the whole image into account.

• A Swin transformer-based vehicle detection (STVD) framework is designed for undisciplined traffic environments. The Swin transformer (ST) backbone exchanges information within and between image patches and provides hierarchical feature maps. It uses a shifted window mechanism with self-attention blocks to capture context across the whole image. Additionally, a bi-directional feature pyramid network (BIFPN) is connected to the output stages of the ST backbone to combine low-resolution and high-resolution features bi-directionally, which yields more robust multi-scale features. STVD effectively alleviates the multi-scale feature extraction problem. However, it runs more slowly than AVCDL because of the large number of parameters it generates. The next contribution therefore considers both speed and accuracy to provide a robust vehicle detection method.

• To balance speed and accuracy, a multi-class vehicle detection (MCVD) model is designed to detect vehicles in heterogeneous traffic using a realistic traffic dataset. MCVD consists of a CNN backbone named VDnet, a light fusion bi-directional feature pyramid network (LFBFPN), and a modified vehicle detection head (MVDH). All components of MCVD are built from depth-wise separable convolutions (DWSC) to reduce the number of parameters in the detection model. VDnet extracts multi-scale features from the input traffic images, using feature reuse to enhance extraction at multiple scales. LFBFPN combines these features bi-directionally and provides robust feature maps. Finally, MVDH detects multi-class vehicles and classifies them into their respective categories.

The above methods are analyzed and evaluated on realistic traffic scenarios using a Quadro P6000 GPU system and compared with existing state-of-the-art IVD methods. The simulation results show that AVCDL achieves 86.17% accuracy at 27 frames per second (FPS), STVD achieves 91.32% accuracy at 17 FPS, and MCVD achieves 91.45% accuracy at 35 FPS. The real-time performance of AVCDL and MCVD is also validated on NVIDIA Jetson TX2 and Nano boards. In addition, the methods are tested on other object classes using standard object detection datasets. Finally, the outputs of the IVD methods are used to estimate traffic parameters for implementing an efficient traffic management system.
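
The parameter saving that motivates the use of depth-wise separable convolution (DWSC) can be illustrated with a short calculation. The sketch below is not taken from the thesis: the function names and the layer sizes (a 3x3 kernel with 128 input and 256 output channels) are hypothetical, and bias terms are ignored. It only compares the weight count of a standard convolution with that of a depth-wise convolution followed by a 1x1 point-wise convolution.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dwsc_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(k: int, c_in: int, c_out: int) -> float:
    """DWSC weights divided by standard weights; equals 1/c_out + 1/k**2."""
    return dwsc_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    k, c_in, c_out = 3, 128, 256
    print("standard:", standard_conv_params(k, c_in, c_out))
    print("dwsc:    ", dwsc_params(k, c_in, c_out))
    print("ratio:   ", round(reduction_ratio(k, c_in, c_out), 4))
```

For this assumed layer, the standard convolution has 294,912 weights and the separable version has 33,920, roughly 8.7 times fewer. The actual savings in MCVD depend on its layer configuration, which the abstract does not specify.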

Item Type: Thesis (PhD)
Uncontrolled Keywords: Convolutional neural network; Deep learning; IVD; Mixed undisciplined traffic; Traffic parameter estimations; Transformer.
Subjects: Engineering and Technology > Electronics and Communication Engineering > Intelligent Instrumentation
Engineering and Technology > Electronics and Communication Engineering > Signal Processing
Engineering and Technology > Electronics and Communication Engineering > Image Processing
Divisions: Engineering and Technology > Department of Electronics and Communication Engineering
ID Code: 10652
Deposited By: IR Staff BPCL
Deposited On: 21 Aug 2025 11:07
Last Modified: 21 Aug 2025 11:07
Supervisor(s): Das, Santos Kumar and Sahoo, Upendra Kumar
