DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Oh, Hae-Yun | - |
dc.contributor.advisor | 오혜연 | - |
dc.contributor.author | Bak, Jin-Yeong | - |
dc.contributor.author | 박진영 | - |
dc.date.accessioned | 2013-09-12T01:48:56Z | - |
dc.date.available | 2013-09-12T01:48:56Z | - |
dc.date.issued | 2013 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=515124&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/180445 | - |
dc.description | Master's thesis - KAIST (한국과학기술원) : Department of Computer Science (전산학과), 2013.2, [ vi, 46 p. ] | - |
dc.description.abstract | A major obstacle to using a probabilistic topic model, such as Latent Dirichlet Allocation (LDA) or Hierarchical Dirichlet Processes (HDP), is the time required for posterior inference, especially on Web data, which are huge and continuously expanding. Recent developments in distributed inference algorithms and minibatch-based online learning algorithms offer partial solutions to this problem. In this thesis, I propose a distributed online learning algorithm for LDA and HDP that addresses both aspects of the problem at once. I apply the learning algorithm to three datasets: a corpus of 973K Twitter conversations and 4.8M Wikipedia articles, used for a quantitative evaluation of the algorithm, and a larger corpus of 5.1M Twitter conversations, used for a case study. I compare the algorithm with a distributed version of variational inference using MapReduce and with online learning using stochastic variational inference. I show that the proposed algorithm achieves the same model fit and topic quality as the other inference algorithms, but within a much shorter learning time. I also conduct a case study using the distributed online learning framework to visualize how topic proportions change over time in a stream of Web documents; through this case study, I discover interesting temporal dynamics of topics in Twitter conversations. | eng |
dc.language | eng | - |
dc.publisher | 한국과학기술원 (KAIST) | - |
dc.subject | Hierarchical Dirichlet Processes | - |
dc.subject | Latent Dirichlet Allocation | - |
dc.subject | Distributed inference | - |
dc.subject | Online Learning | - |
dc.subject | Topic modeling | - |
dc.subject | Variational inference | - |
dc.subject | 토픽 모델 (topic model) | - |
dc.subject | 온라인 학습 (online learning) | - |
dc.subject | 분산 추론 (distributed inference) | - |
dc.subject | 맵리듀스 (MapReduce) | - |
dc.subject | MapReduce | - |
dc.title | Distributed online learning for topic models | - |
dc.title.alternative | 토픽 모델의 분산 온라인 기계 학습 알고리즘 (Distributed online machine learning algorithms for topic models) | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 515124/325007 | - |
dc.description.department | KAIST (한국과학기술원) : Department of Computer Science (전산학과) | - |
dc.identifier.uid | 020113246 | - |
dc.contributor.localauthor | Oh, Hae-Yun | - |
dc.contributor.localauthor | 오혜연 | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.