DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Oh, Hae-Yun | - |
dc.contributor.advisor | 오혜연 | - |
dc.contributor.author | Bak, Jin-Yeong | - |
dc.contributor.author | 박진영 | - |
dc.date.accessioned | 2013-09-12T01:48:56Z | - |
dc.date.available | 2013-09-12T01:48:56Z | - |
dc.date.issued | 2013 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=515124&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/180445 | - |
dc.description | Master's thesis - KAIST (한국과학기술원) : Department of Computer Science (전산학과), 2013.2, [ vi, 46 p. ] | - |
dc.description.abstract | A major obstacle to using a probabilistic topic model, such as Latent Dirichlet Allocation (LDA) or Hierarchical Dirichlet Processes (HDP), is the time required for posterior inference, especially on Web data, which are huge and continuously expanding. Recent developments in distributed inference algorithms and minibatch-based online learning algorithms offer partial solutions to this problem. In this thesis, I propose a distributed online learning algorithm for LDA and HDP that addresses both aspects of the problem at once. I apply the learning algorithm to three datasets: a corpus of 973K Twitter conversations and 4.8M Wikipedia articles, used for a quantitative evaluation of the algorithm, and a larger corpus of 5.1M Twitter conversations, used for a case study. I compare the algorithm with a distributed version of variational inference using MapReduce and with online learning using stochastic variational inference. I show that the proposed algorithm achieves the same model fit and topic quality as the other inference algorithms, but within a much shorter learning time. I also conduct a case study using the distributed online learning framework to visualize how topic proportions change over time in a stream of Web documents; through this case study, I discover interesting temporal dynamics of topics in Twitter conversations. | eng |
dc.language | eng | - |
dc.publisher | 한국과학기술원 (KAIST) | - |
dc.subject | Hierarchical Dirichlet Processes | - |
dc.subject | Latent Dirichlet Allocation | - |
dc.subject | Distributed inference | - |
dc.subject | Online Learning | - |
dc.subject | Topic modeling | - |
dc.subject | Variational inference | - |
dc.subject | 토픽 모델 (topic model) | - |
dc.subject | 온라인 학습 (online learning) | - |
dc.subject | 분산 추론 (distributed inference) | - |
dc.subject | 맵리듀스 (MapReduce) | - |
dc.subject | MapReduce | - |
dc.title | Distributed online learning for topic models | - |
dc.title.alternative | 토픽 모델의 분산 온라인 기계 학습 알고리즘 (Distributed online machine learning algorithms for topic models) | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 515124/325007 | - |
dc.description.department | KAIST (한국과학기술원) : Department of Computer Science (전산학과) | - |
dc.identifier.uid | 020113246 | - |
dc.contributor.localauthor | Oh, Hae-Yun | - |
dc.contributor.localauthor | 오혜연 | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.