Utilizing global and path information with language modelling for hierarchical text classification

Cited 4 times in Web of Science; cited 8 times in Scopus
Hierarchical text classification over a Web taxonomy is challenging because it is a very large-scale problem with hundreds of thousands of categories and associated documents. Furthermore, categories vary widely in conceptual level and in the amount of training data available. The state-of-the-art narrow-down approach uses a search engine to generate candidate categories from the taxonomy and then builds a classifier to select the final category. In this paper, we adopt the same approach but investigate how global information can be used within a language modelling framework to improve effectiveness. We propose three methods for using non-local information: a passive method that uses global information for smoothing; an aggressive method that builds a top-level classifier and integrates it with the local model; and a method that uses the label terms on the path from a category to the root, motivated by our systematic observation that these terms are underrepresented in the documents. For evaluation, we constructed a document collection from Web pages in the Open Directory Project. A series of experiments demonstrates the superiority of our methods and reveals the role of global information in hierarchical text classification.
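The narrow-down pipeline and the three uses of non-local information described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' formulation: the toy taxonomy, all function names, and the weights `lam`, `mu`, `beta`, `alpha` and `k` are hypothetical; a simple term-overlap ranking stands in for the search engine; smoothing is assumed to be Jelinek-Mercer interpolation of local, top-level and collection unigram models; the top-level classifier is assumed to be combined with the local model linearly in log space; and path labels are simply appended to each category's training text.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy: category path (top level -> leaf) -> training text.
TRAIN = {
    ("arts", "music", "jazz"): "saxophone improvisation swing bebop records",
    ("arts", "music", "classical"): "symphony orchestra composer sonata concert",
    ("science", "physics", "optics"): "lens light refraction laser wavelength",
    ("science", "biology", "genetics"): "gene dna sequence mutation chromosome",
}


def ml_model(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def build_models(train, use_path_labels=True):
    """Build local (per-category), top-level and whole-collection unigram models."""
    local, top_tokens = {}, {}
    for path, text in train.items():
        tokens = text.lower().split()
        if use_path_labels:
            # Path method (assumed form): add the label terms on the
            # category-to-root path to the category's training text.
            tokens += list(path)
        local[path] = ml_model(tokens)
        top_tokens.setdefault(path[0], []).extend(tokens)
    top = {t: ml_model(toks) for t, toks in top_tokens.items()}
    collection = ml_model([w for toks in top_tokens.values() for w in toks])
    return local, top, collection


def candidates(doc, local, k):
    """Stand-in for the search-engine step: rank categories by term overlap."""
    query = set(doc)
    return sorted(local, key=lambda c: len(query & set(local[c])), reverse=True)[:k]


def smoothed_loglik(doc, path, local, top, collection, lam=0.6, mu=0.3):
    """Passive use of global information (assumed Jelinek-Mercer form):
    interpolate the local model with the top-level and collection models."""
    score = 0.0
    for w in doc:
        if w not in collection:  # ignore terms never seen in training
            continue
        p = (lam * local[path].get(w, 0.0)
             + mu * top[path[0]].get(w, 0.0)
             + (1 - lam - mu) * collection[w])
        score += math.log(p)
    return score


def top_level_loglik(doc, t, top, collection, beta=0.7):
    """Log-likelihood score of a hypothetical top-level classifier for category t."""
    score = 0.0
    for w in doc:
        if w in collection:
            score += math.log(beta * top[t].get(w, 0.0) + (1 - beta) * collection[w])
    return score


def classify(text, train=TRAIN, k=2, alpha=0.7):
    """Narrow-down classification: generate candidates, then pick the best one.
    Aggressive method (assumed form): linearly combine local and top-level scores."""
    local, top, collection = build_models(train)
    doc = text.lower().split()

    def combined(path):
        return (alpha * smoothed_loglik(doc, path, local, top, collection)
                + (1 - alpha) * top_level_loglik(doc, path[0], top, collection))

    return max(candidates(doc, local, k), key=combined)


print(classify("a laser lens focuses light"))  # ('science', 'physics', 'optics')
print(classify("jazz"))                        # ('arts', 'music', 'jazz')
```

In this toy setup, building the models with `use_path_labels=False` leaves the label term "jazz" absent from every model, so a query consisting only of that label cannot be matched; this loosely illustrates why adding path labels may help when such terms are underrepresented in the documents themselves.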
Publisher: SAGE Publications Ltd
Issue Date: 2014-04
Language: English
Article Type: Article
Keywords: Naive Bayes
Citation: Journal of Information Science, v.40, no.2, pp.127-145
ISSN: 0165-5515
DOI: 10.1177/0165551513507415
URI: http://hdl.handle.net/10203/188782
Appears in Collection: CS-Journal Papers