Utilizing non-local information to large-scale hierarchical text classification (비국소적 정보를 이용한 대규모 계층적 문서 분류)

Hierarchical text classification into a web taxonomy is challenging because it is a very large-scale problem involving hundreds of thousands of categories and their associated documents. Furthermore, categories vary widely in conceptual level and in the amount of training data available. In contrast to earlier work that relied solely on machine learning, the state-of-the-art narrow-down approach uses a search engine to generate candidate categories from the taxonomy and then builds a classifier for the final category selection. However, we observed that previous work trains this classifier using only the local information associated with the candidate categories. In this thesis, we take the same approach but address the use of non-local information, i.e., global and path information, to improve classification effectiveness. To this end, motivated by the need for non-local information, we propose methods that incorporate it within the statistical language modeling framework, which is well developed in the information retrieval area. For evaluation, we constructed a document collection from web pages in the Open Directory Project (ODP). A series of exhaustive experiments demonstrates the superiority of our methods and reveals the role of non-local information in hierarchical text classification.
Advisors
Myaeng, Sung-Hyon (맹성현)
Description
KAIST (Korea Advanced Institute of Science and Technology) : Department of Computer Science
Publisher
KAIST (Korea Advanced Institute of Science and Technology)
Issue Date
2014
Identifier
568609/325007  / 020095234
Language
eng
Description

Doctoral thesis (Ph.D.) - KAIST : Department of Computer Science, 2014.2, [ vi, 86 p. ]

Keywords

language modeling (언어모델); web taxonomy (웹 텍사노미); hierarchical text classification (계층적 문서 분류)

URI
http://hdl.handle.net/10203/197821
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=568609&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
