Sahoo, Jaya Prakash (2023) Subject Independent Vision based Hand Gesture Recognition using Convolutional Neural Networks. PhD thesis.
PDF (restricted until 27/04/2026; repository staff only), 11 MB
Abstract
Hand gestures are one of the most important ways for humans to communicate and express their expectations. Automatic recognition of hand gestures using computer vision techniques is an active area of research. Owing to their user-friendliness and flexibility, hand gesture recognition (HGR) systems are widely used in human-computer interface (HCI) and human-robot interaction (HRI) technologies. HGR has shown considerable potential in many fields, including sign language interpretation, virtual reality, robot control, and gaming. The performance of an HGR system differs substantially between subjects. In most of the literature, HGR systems are implemented in a subject-dependent mode. Such systems cannot work accurately in real time because they are user dependent. Subject-independent HGR systems are suitable for real-time applications because they require no further training to recognize the hand gestures of new subjects. Furthermore, illumination variations, complex backgrounds, and the shape of the user’s hand remain common challenges in this research area. These challenges motivate the development of improved subject-independent HGR techniques with better, more informative feature extraction and classification algorithms. In this regard, a deep convolutional neural network (CNN) based feature extraction technique is proposed for automatic recognition of hand gestures. Here, deep features from different deep CNNs are fused to represent the hand gesture image more efficiently. The derived features capture high-level, abstract information about the hand gesture image from the receptive fields of the deep CNN’s last convolutional layer. After feature fusion, a principal component analysis (PCA) based dimension reduction technique is applied to eliminate redundant and irrelevant information in the feature vector. The reduced features are used to recognize the hand gestures with a support vector machine (SVM) classifier.
Next, a deep residual block intensity (RBI) feature extraction technique, supported by a two-stage residual CNN architecture, is proposed for HGR. In this technique, a compact CNN architecture with residual learning, denoted 2RCNN, is developed. The 2RCNN architecture effectively captures hand gesture attributes from the raw input image, and the network is made compact by tuning it to use an optimum number of filters, which reduces its number of trainable parameters. The proposed RBI features are obtained from the residual blocks of the 2RCNN architecture. They capture both low-level information, such as lines, edges, and blobs, and high-level, abstract information from hand gesture images. Because the 2RCNN architecture is compact and uses an optimum number of filters, the proposed technique does not require a separate feature reduction block. The above techniques are unable to distinguish gesture poses that are similar across classes in the hand gesture datasets. A compact dual-stream dense residual fusion network (DeReFNet) is proposed to address this issue. DeReFNet consists of a global feature aggregation (GFA) residual stream, a spatial feature (SF) dense stream, and a feature concatenation module (FCM). The GFA residual stream is designed to extract low-, mid-, and high-level features from hand gesture images through global average pooling. The SF dense stream combines the spatial information of gesture images through feature reuse, which strengthens the network by extracting refined local-to-global texture features of hand gesture images. The information from the two streams is combined through the FCM, which strengthens the proposed CNN and improves its HGR performance. Then, a compact deep residual network with an attention mechanism is proposed to recognize hand gestures accurately.
The proposed channel attention mechanism improves the representative information of the feature maps by integrating the receptive fields of each convolutional branch over multiple scales. In addition, a cascaded approach that combines residual blocks with the multi-scale channel attention module is proposed to develop a deeper network that learns low-level to high-level information from input hand gesture images. Finally, a user interface system based on the proposed HGR system is developed to control a mobile robot in real time. The proposed techniques are validated on three publicly available static hand gesture datasets (MUGD, NUS-II, and ASL-FS) and on a dataset developed in-house in a laboratory environment. Qualitative and quantitative analysis of the experimental results on these four datasets indicates that the proposed techniques outperform the state-of-the-art methods reported in the literature.
Item Type: | Thesis (PhD) |
---|---|
Uncontrolled Keywords: | Hand gesture recognition; convolutional neural networks; residual block intensity feature; dual stream fusion network; attention mechanism; support vector machine |
Subjects: | Engineering and Technology > Electronics and Communication Engineering > Sensor Networks; Engineering and Technology > Electronics and Communication Engineering > Intelligent Instrumentation; Engineering and Technology > Electronics and Communication Engineering |
Divisions: | Engineering and Technology > Department of Electronics and Communication Engineering |
ID Code: | 10512 |
Deposited By: | IR Staff BPCL |
Deposited On: | 26 Apr 2024 17:49 |
Last Modified: | 26 Apr 2024 17:49 |
Supervisor(s): | Ari, Samit and Patra, Sarat Kumar |