Voice Activity Detection by Maximizing Mutual Information of an Audio-Visual Scene

Priyadarshini, Priyanka (2016) Voice Activity Detection by Maximizing Mutual Information of an Audio-Visual Scene. MTech thesis.

PDF (Full text is restricted up to 26.04.2020)
Restricted to Repository staff only
3119Kb

Abstract

Humans can extract the speech signals they need to understand from a mixture of background noise, interfering sound sources, and reverberation, which enables effective communication. Voice activity detection is one of the key signal-processing tasks that humans perform on the sound signals received by the ears. The voice activity of a speaker can also be detected from visual cues, by locating and observing lip movements, and likewise from audio information alone. Intuitively, therefore, using audio and video information together should detect a speaker's voice activity better than either modality on its own. Furthermore, effective voice activity detection may remain possible in adverse situations where neither the audio nor the video cue is prominent. To do this electronically, an automatic speaker recognition system that includes a voice activity detector (VAD) followed by feature extraction can be used. However, designing and implementing a VAD remains challenging in practice, particularly when multiple speakers talk simultaneously in the same audiovisual scene. Various audiovisual fusion methods already exist for detecting a speaker in an audio-visual scene.
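The abstract notes that voice activity can be detected from audio information alone. For reference only, a minimal audio-only detector can be sketched with short-time energy thresholding. This is a generic baseline, not the method developed in the thesis, and the frame length (400 samples, i.e. 25 ms at an assumed 16 kHz sampling rate), the hop (160 samples), the -30 dB relative threshold and the function names are all hypothetical choices.

```python
import math
from typing import List


def frame_energies(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude of each complete (non-padded) frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(signal: List[float], frame_len: int = 400, hop: int = 160,
               threshold_db: float = -30.0) -> List[bool]:
    """Flag a frame as speech when its energy lies within `threshold_db`
    of the loudest frame (a relative threshold; hypothetical defaults)."""
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    peak = max(energies)
    if peak == 0.0:
        return [False] * len(energies)  # completely silent input
    eps = 1e-12  # avoids log10(0) for all-zero frames
    return [10.0 * math.log10((e + eps) / peak) > threshold_db for e in energies]
```

For example, 1600 zero samples, then a 1600-sample 200 Hz tone at amplitude 0.5, then 1600 zero samples yields 28 frames, with the frames overlapping the tone flagged as speech. A fixed energy threshold of this kind tends to break down when background noise or a second talker is as loud as the target speaker, which is the situation the abstract's audio-visual approach targets.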
In this thesis, we use Mel-frequency cepstral coefficients (MFCCs) of the audio sequence and the optical flow of the video sequence, followed by a face detection algorithm, as features. An information-theoretic measure of cross-modal correspondence between the audio and video features is obtained with a nonparametric statistical density modelling technique, and the two modalities are fused by characterizing and maximizing their mutual information for VAD.
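The abstract does not name the density estimator, so the following is only a sketch under stated assumptions. It uses Gaussian Parzen-window (kernel density) estimates, one common nonparametric choice, to estimate the mutual information I(A; V) = E[log p(a, v) / (p(a) p(v))] by averaging over the observed frames. It also reduces each modality to a single scalar per frame (for example, one MFCC coefficient for the audio and the mean optical-flow magnitude inside a detected face region for the video), whereas the thesis works with full MFCC and optical-flow features. The function names, Silverman's rule-of-thumb bandwidth, and the scalar reduction are all hypothetical.

```python
import math
from typing import Sequence


def _gauss(u: float, h: float) -> float:
    """1-D Gaussian kernel of bandwidth h evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def silverman_bandwidth(x: Sequence[float]) -> float:
    """Silverman's rule of thumb, 1.06 * std * n^(-1/5) (assumed choice)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return 1.06 * max(std, 1e-6) * n ** (-0.2)


def kde_mutual_information(a: Sequence[float], v: Sequence[float]) -> float:
    """Resubstitution estimate of I(A; V) in nats.

    Marginals use 1-D Gaussian Parzen windows; the joint density uses the
    product of the two kernels. The estimate averages
    log p(a_i, v_i) / (p(a_i) p(v_i)) over the observed frames.
    """
    n = len(a)
    if n != len(v) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 frames")
    ha, hv = silverman_bandwidth(a), silverman_bandwidth(v)
    total = 0.0
    for i in range(n):
        ka = [_gauss(a[i] - a[j], ha) for j in range(n)]
        kv = [_gauss(v[i] - v[j], hv) for j in range(n)]
        p_a = sum(ka) / n
        p_v = sum(kv) / n
        p_av = sum(x * y for x, y in zip(ka, kv)) / n  # product-kernel joint
        total += math.log(p_av / (p_a * p_v))
    return total / n


def most_active_region(audio: Sequence[float],
                       regions: Sequence[Sequence[float]]) -> int:
    """Index of the candidate region whose video feature shares the most
    (estimated) mutual information with the audio feature."""
    scores = [kde_mutual_information(audio, r) for r in regions]
    return max(range(len(scores)), key=lambda k: scores[k])
```

In use, `audio` and each entry of `regions` would be frame-aligned feature sequences over a short analysis window, and the region that maximizes the estimated mutual information is taken as the active speaker. Because resubstitution reuses each frame in its own density estimate, the values tend to be biased upward for short windows, so they are better compared across candidate regions than read as absolute amounts of information; a leave-one-out estimate is a common alternative.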

Item Type:	Thesis (MTech)
Uncontrolled Keywords:	Voice activity detection; Feature extraction; Mutual information
Subjects:	Engineering and Technology > Electronics and Communication Engineering > Image Processing
	Engineering and Technology > Electronics and Communication Engineering > Signal Processing
Divisions:	Engineering and Technology > Department of Electronics and Communication Engineering
ID Code:	9316
Deposited By:	Mr. Sanat Kumar Behera
Deposited On:	27 Apr 2018 20:36
Last Modified:	27 Apr 2018 20:36
Supervisor(s):	Roy, Lakshi Prosad
