Distributed online learning for topic models (Korean title: 토픽 모델의 분산 온라인 기계 학습 알고리즘)

A major obstacle in using a probabilistic topic model, such as Latent Dirichlet Allocation (LDA) or Hierarchical Dirichlet Processes (HDP), is the time required for posterior inference, especially for Web data, which are huge and continuously expanding. Recent developments in distributed inference algorithms and minibatch-based online learning algorithms have offered partial solutions to this problem. In this paper, I propose a distributed online learning algorithm for LDA and HDP that addresses both aspects of the problem at once. I apply the algorithm to three datasets: a corpus of 973K Twitter conversations and a corpus of 4.8M Wikipedia articles, both used for a quantitative evaluation of the algorithm, and a larger corpus of 5.1M Twitter conversations used for a case study. I compare the algorithm with a distributed version of variational inference using MapReduce and with online learning using stochastic variational inference. I show that the algorithm achieves the same model fit and topic quality as the other inference algorithms in a much shorter learning time. I also conduct a case study using the distributed online learning framework to visualize how topic proportions change over time in a stream of Web documents. Through this case study, I discover interesting temporal dynamics of topics in Twitter conversations.
Advisors
Oh, Hae-Yun (오혜연)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
515124/325007  / 020113246
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2013.2, [vi, 46 p.]

Keywords

Hierarchical Dirichlet Processes; Latent Dirichlet Allocation; Distributed inference; Online learning; Topic modeling; Variational inference; MapReduce

URI
http://hdl.handle.net/10203/180445
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=515124&flag=dissertation
Appears in Collection
CS-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
