DSpace at KOASAS: Online speaker segmentation and clustering of spoken documents


Online speaker segmentation and clustering of spoken documents (음성 문서의 온라인 화자분할 및 군집화)


Park, Kyung-Mi / 박경미

As a variety of multimedia data, such as broadcast news, entertainment, and educational material, is produced every day and spread over the internet, content retrieval technologies have become essential for searching and managing such a large amount of data. In this context, interest in spoken document retrieval is growing, as research on speech and speaker recognition has led to major technical breakthroughs on smart devices. Spoken documents contain speech from various speakers, so speaker diarization, or speaker indexing, is important for retrieval. Speaker diarization determines how many speakers are present in a given spoken document and partitions the document into homogeneous segments according to each speaker's identity. This task answers the question "Who spoke when?", whereas speaker recognition addresses the question "Who spoke?". Speaker diarization consists of three processes: speech detection, speaker segmentation, and segment clustering. This dissertation proposes online speaker segmentation and clustering techniques for spoken documents, for use in a speaker diarization system. Speaker segmentation finds the speaker change points so that each segment contains only one speaker's speech. It has various applications, such as preprocessing for audio indexing, speaker tracking, and information extraction. The most popular criterion used in unsupervised speaker segmentation is the Bayesian Information Criterion (BIC). Conventional BIC-based speaker segmentation first takes an analysis window, a fixed-size span of speech data shifted over the audio stream, divides it into two speech streams, and fits a single Gaussian model to each. The dissimilarity between the two independent models is then estimated according to the BIC principle. This approach has been successfully applied to speaker segmentation. However, it tends to fail to detect speaker changes for short speech segments since it is hard to represent...
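The conventional BIC-based test described in the abstract can be sketched as follows. This is a minimal illustration of the generic baseline only, not the method proposed in the dissertation (the local UBM and GLR-based techniques named in the keywords are not shown). The function names `delta_bic` and `best_change_point`, the `penalty_weight` and `margin` parameters, and the use of full-covariance Gaussians over a matrix of feature vectors are all assumptions made for the example.

```python
import numpy as np


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for splitting `window` (n_frames x n_dims) at index `split`.

    Compares one Gaussian over the whole window against two independent
    Gaussians, one per side. A positive value favours a speaker change.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]
    # Log-likelihood gain of the two-model hypothesis over the one-model one.
    gain = 0.5 * (
        n * log_det_cov(window)
        - len(left) * log_det_cov(left)
        - len(right) * log_det_cov(right)
    )
    # BIC penalty for the extra mean vector and full covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_change_point(window, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    candidate split has a positive delta-BIC.

    `margin` keeps enough frames on each side to estimate a covariance.
    """
    scores = [
        (delta_bic(window, t, penalty_weight), t)
        for t in range(margin, len(window) - margin)
    ]
    score, t = max(scores)
    return (t, score) if score > 0 else (None, score)
```

The `margin` parameter hints at the limitation the abstract raises: when one side of the split holds only a few frames, its Gaussian is poorly estimated, so changes around short segments are easy to miss.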

Advisors: Oh, Yung-Hwan (오영환)

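Description: Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)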

Issue Date: 2011

Identifier: 466471/325007 / 020037223

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2011.2, [ix, 67 p.]

Keywords: local UBM; relative-GLR; intra-GLR; online speaker indexing; online speaker segmentation; online speaker clustering

URI: http://hdl.handle.net/10203/33331

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=466471&flag=dissertation

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.
