Guided HTM: Hierarchical Topic Model with Dirichlet Forest Priors

Despite the proliferation of topic models, the organization of topics produced by probabilistic models needs improvement in two ways: a better-structured presentation of topics and the incorporation of domain knowledge about the corpus. A structured presentation, i.e., a hierarchical topic model, helps categorize similar topics; incorporating domain knowledge concentrates the sampling of predefined keywords during mixture parameter learning. This paper presents a hierarchical topic model with incorporated domain knowledge, called the Guided Hierarchical Topic Model (GHTM). Specifically, we assigned the prior information from the domain knowledge to the Dirichlet Forest prior. By adjusting this prior, we obtained a topic tree guided by the domain knowledge. This paper also contributes by enumerating four different knowledge extraction methods and applying the extracted knowledge to GHTM. We evaluated GHTM in terms of hierarchical clustering accuracy and found a significant improvement as measured by F-measures. This improvement is also verified by perplexity analyses. Additionally, we measured topic quality with KL-divergence and visualization, and both confirm that GHTM better separates topic distributions. Finally, we tested hierarchical topic quality through human experiments, which also revealed significant improvements originating from the guidance.
Publisher
IEEE COMPUTER SOC
Issue Date
2017-02
Language
English
Article Type
Article
Citation

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, v.29, no.2, pp.330 - 343

ISSN
1041-4347
DOI
10.1109/TKDE.2016.2625790
URI
http://hdl.handle.net/10203/223034
Appears in Collection
IE-Journal Papers