Personalized speech emotion recognition using multi-staged data selection
(Korean title: Speaker-adaptation-based speech emotion recognition using a multi-layered data selection technique)

Nowadays, devices are regarded as partners rather than simple machines because users can personalize them. This tendency has been reinforced as mobile devices such as smartphones and tablet personal computers provide more advanced features that can infer a user's intention and emotional state by analyzing voice and facial expressions. Understanding emotional states plays an important role in Human-Computer Interaction (HCI) because it lets a device interact with the user in a more comfortable, friendly way and respond appropriately to the user's emotional state. Emotional information can be obtained from speech, facial expressions, gestures, biological signals, and so forth. Among these indicators, speech is a relatively natural and intuitive interface for interacting with devices. For these reasons, Speech Emotion Recognition (SER) can be an effective technology for HCI alongside speech recognition. Many researchers have introduced various approaches to SER, but unfortunately, they have failed to achieve satisfactory performance due to two critical factors. First, different speakers rarely express emotional states in the same way. Second, several pairs of emotions, such as sadness and boredom, have acoustically similar characteristics, and this ambiguity causes unreliable recognition results. This dissertation aims at increasing SER performance by resolving these domain-oriented characteristics. To deal with the large inter-speaker variations, speaker adaptation techniques are applied to SER. In this approach, Speaker Independent (SI) models are adapted with a relatively small amount of data collected from a specific speaker, so that the adapted models represent the acoustic characteristics of the target speaker. This dissertation focuses on unsupervised adaptation, which does not require pre-defined emotion labels, since manual labeling is impractical and somewhat unreliable. The proposed...
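The abstract names the ingredients of the approach (per-speaker adaptation of SI models, unsupervised labeling, and, per the keywords, MLLR) without showing the pipeline. The sketch below is a minimal illustration under stated assumptions, not the author's implementation: it assumes per-emotion GMMs over frame-level acoustic features, uses first-pass SI classification in place of manual emotion labels, and approximates the MLLR mean transform (mu_hat = A·mu + b) with the A = I special case, i.e. a damped global bias. All names (EMOTIONS, train_si_models, adapt_unsupervised, alpha) are hypothetical.

```python
# Hypothetical sketch of unsupervised speaker adaptation for SER.
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed label set

def train_si_models(frames_by_emotion, n_components=8):
    """Train one speaker-independent (SI) GMM per emotion on frame features."""
    models = {}
    for emotion, frames in frames_by_emotion.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        models[emotion] = gmm.fit(frames)
    return models

def classify(models, utterance_frames):
    """Pick the emotion whose GMM gives the highest per-frame log-likelihood."""
    return max(models, key=lambda e: models[e].score(utterance_frames))

def adapt_unsupervised(models, adaptation_utterances, alpha=0.5):
    """Adapt SI models to a target speaker without manual emotion labels.

    Step 1: first-pass SI classification supplies (possibly noisy) labels.
    Step 2: shift each labeled model's means by a damped global bias b,
    i.e. mu_hat = mu + alpha * b -- the A = I special case of the MLLR
    mean transform mu_hat = A @ mu + b.
    """
    pooled = {emotion: [] for emotion in models}
    for frames in adaptation_utterances:
        pooled[classify(models, frames)].append(frames)

    adapted = {emotion: copy.deepcopy(gmm) for emotion, gmm in models.items()}
    for emotion, chunks in pooled.items():
        if not chunks:
            continue  # no first-pass data for this emotion; keep the SI model
        speaker_frames = np.vstack(chunks)
        bias = speaker_frames.mean(axis=0) - models[emotion].means_.mean(axis=0)
        adapted[emotion].means_ = models[emotion].means_ + alpha * bias
    return adapted
```

Full MLLR would instead estimate the complete affine transform W = [A b] by maximizing the likelihood of the adaptation data over the Gaussian components; the bias-only version above keeps the sketch short while preserving the adapt-then-recognize structure the abstract describes.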
Advisors
Oh, Yung-Hwan (오영환)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2011
Identifier
467952/325007  / 020093115
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST), Department of Computer Science, 2011.2, [vi, 43 p.]

Keywords

speech emotion recognition; speaker adaptation; MLLR (maximum likelihood linear regression); likelihood; universal background model (UBM)

URI
http://hdl.handle.net/10203/180589
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=467952&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
