Emotion recognition and text-to-speech synthesis

Roshan, Akash (2017) Emotion recognition and text-to-speech synthesis. MTech thesis.

PDF (Fulltext restricted up to 22.01.2020) - Restricted to Repository staff only - 2584 KB

Abstract

Emotion recognition is generally done by analyzing one of three things: voice, face, or body language. The main objective of this thesis is to determine the emotional state of a person entirely from his or her speech. We therefore develop a system that first records a person's voice and then analyzes it to determine the person's emotion; there is no other input to the system.
In speech emotion recognition, the extraction of speech features plays an important role. The basic speech features are pitch, rate of speech, and energy, and these values can be extracted directly from the acoustic waveform. Feature extraction from the original speech is widely used to capture the speaker's actual emotional state. The pitch of speech depends on the rate of vibration of the vocal cords and is correlated with intonation and tone; it represents the highness or lowness of a tone as perceived by a listener, and the pitch of a person's voice is a strong cue to the expressed emotion. The energy of speech also plays an important role in emotion recognition: higher energy tends to indicate anger or fear, while lower energy values indicate sadness. The rate of speech likewise helps indicate the emotional state. So, after obtaining the speech signal, we extract features from it that are then used for classification. Deciding which features to use and which classification system to employ is the most important part of the work.
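As an illustration of the feature extraction described above, the following MATLAB sketch estimates short-time energy and a rough autocorrelation-based pitch track from a recorded utterance. This is not the thesis code; the file name 'utterance.wav', the frame sizes, and the pitch search range are assumptions made for the example.

    % Minimal sketch: short-time energy and a crude pitch track.
    % Assumes a recorded utterance stored as 'utterance.wav' (hypothetical name).
    [x, fs] = audioread('utterance.wav');   % read speech samples and sample rate
    x = x(:, 1);                            % use the first channel only
    frameLen = round(0.03 * fs);            % 30 ms analysis frames (assumed)
    hop      = round(0.01 * fs);            % 10 ms hop between frames (assumed)

    nFrames = floor((length(x) - frameLen) / hop) + 1;
    energy  = zeros(nFrames, 1);
    pitch   = zeros(nFrames, 1);

    for k = 1:nFrames
        frame = x((k - 1) * hop + (1:frameLen));
        energy(k) = sum(frame .^ 2);        % short-time energy of the frame

        % crude pitch estimate: autocorrelation peak searched in 60-400 Hz
        r = xcorr(frame, 'coeff');
        r = r(frameLen:end);                % keep non-negative lags only
        lagMin = round(fs / 400);
        lagMax = round(fs / 60);
        [~, lag] = max(r(lagMin:lagMax));
        pitch(k) = fs / (lag + lagMin - 1); % convert the lag to a frequency in Hz
    end

Frame-level statistics of these energy and pitch values (means, ranges, and so on) are the kind of quantities a classifier could then use to separate emotions such as anger, fear, and sadness.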
There are many acoustic features that may be used to recognize human emotion, but there is still no established way to achieve the best recognition. Current research focuses on identifying the features that best capture the speaker's actual emotional state.
This thesis presents the design and implementation of text-to-speech synthesis and emotion recognition for the English language using the STM Discovery board (STM32F407VG) and MATLAB. Voice recording is done in C on the STM Discovery kit and a database forming an acoustic library is constructed, while text-to-speech synthesis is implemented in MATLAB R2015a.
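The abstract does not describe the synthesis method in detail, so the following MATLAB sketch only illustrates one simple possibility that is consistent with a recorded acoustic library: concatenating prerecorded per-word WAV files to speak a line of text. The function name speakText, the folder layout, and the <word>.wav naming scheme are hypothetical.

    % Minimal concatenative sketch (illustrative, not the thesis implementation):
    % speak a line of text by concatenating prerecorded word files from a folder.
    function speakText(text, libDir)
        words = strsplit(lower(text));      % split the input text into words
        fs  = 16000;                        % assumed sample rate of the library
        out = [];
        for i = 1:numel(words)
            f = fullfile(libDir, [words{i} '.wav']);
            if exist(f, 'file')
                [w, fs] = audioread(f);     % load the recorded word
                out = [out; w(:, 1)];       % append the first channel
            end
        end
        sound(out, fs);                     % play back the concatenated speech
    end

Usage might look like speakText('hello world', 'acoustic_library'), assuming the library folder contains hello.wav and world.wav.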

Item Type: Thesis (MTech)
Uncontrolled Keywords: speech synthesis; emotion recognition; STM32F4 Discovery kit; MATLAB
Subjects: Engineering and Technology > Electronics and Communication Engineering > VLSI
Engineering and Technology > Electronics and Communication Engineering > Signal Processing
Divisions: Engineering and Technology > Department of Electronics and Communication Engineering
ID Code: 8887
Deposited By: Mr. Kshirod Das
Deposited On: 02 Apr 2018 15:46
Last Modified: 02 Apr 2018 15:46
Supervisor(s): Mahapatra, Kamalakanta