Barla, Abhilasha (2017) Analysis of Audio and Video in an AudioVisual Scene for Feature Extraction. MTech thesis.
Restricted to Repository staff only
Audio and visual signals arriving from a common source are analysed using a signal-level fusion technique. For effective communication, humans can extract the speech signal they need to understand from a mixture of background noise, interfering sound sources, and reverberation. Using audio information alone, one can identify a speaker, but for more reliable detection visual information is also considered: with the help of visual cues, by locating the face and observing lip movement, the voice activity of a speaker can be detected, just as it can be detected from audio information alone. Intuition therefore suggests that if audio and video information are used together, speaker voice activity detection should be better than with either modality individually. We wish to solve a conversational audiovisual correspondence problem: given sets of audio and visual signals, decide which audiovisual pairs are consistent and could have come from a single speaker.
In this thesis, for the audio coming from an audio-visual scene, audio features are extracted using Mel-Frequency Cepstral Coefficients (MFCC), and the sound source is localized using Generalized Cross-Correlation with Phase Transform (GCC-PHAT). For video feature extraction, the optical flow of the video sequence is computed, followed by a face detection algorithm.
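The GCC-PHAT step named above can be sketched compactly: the cross-power spectrum of two microphone signals is whitened by its magnitude (the phase transform), so that only phase differences drive the correlation peak, whose lag gives the inter-microphone time delay. The following is a minimal numpy sketch under assumed inputs (two mono signals and a sample rate); mapping the recovered delay to a source direction additionally requires the microphone array geometry, which is not shown here.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time delay (seconds) of signal x relative to y
    using Generalized Cross-Correlation with Phase Transform."""
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12              # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:             # optionally restrict to physically possible lags
        max_shift = min(int(fs * max_tau), max_shift)
    # rearrange so lag 0 sits at the centre of the search window
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                   # positive: x is delayed relative to y

# Demo with hypothetical signals: a 40-sample delayed copy of white noise.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
delayed = np.concatenate((np.zeros(40), ref[:-40]))
print(gcc_phat(delayed, ref, fs))       # expected near 40 / 16000 = 0.0025 s
```

The whitening is what distinguishes GCC-PHAT from plain cross-correlation: it sharpens the peak and makes the estimate more robust to reverberation, at the cost of amplifying bins with little signal energy.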
Keywords: Optical Flow Algorithm; Sound Source Localization; Voice Activity Detection; GCC-PHAT; MFCC; Speech
Subjects: Engineering and Technology > Electronics and Communication Engineering > Signal Processing
Engineering and Technology > Electronics and Communication Engineering > Image Processing
Division: Engineering and Technology > Department of Electronics and Communication Engineering
Depositing User: Mr. Kshirod Das
Date Deposited: 15 Mar 2018 16:42
Last Modified: 15 Mar 2018 16:42
Supervisor: Roy, Lakshi Prosad