Automatic genre detection of Web documents

Cited 6 times in Web of Science; cited 0 times in Scopus
A genre, or style, offers a view of documents that differs from their subject or topic, and it can also serve as a criterion for classifying documents. Several studies have addressed detecting the genre of textual documents, but only a few have dealt with web documents. In this paper, we propose sets of features for detecting the genres of web documents. Web documents differ from plain textual documents in that they have a URL and contain HTML tags within their pages. We introduce features specific to web documents, extracted from URLs and HTML tags. Experimental results allow us to evaluate the characteristics and performance of these features.
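The abstract does not enumerate the individual features. As an illustration only, here is a minimal Python sketch that assumes two hypothetical feature families: tokens and path depth taken from the URL, and relative frequencies of HTML start tags. The function names (`url_features`, `html_tag_features`, `web_genre_features`) and feature keys are illustrative assumptions, not the authors' feature sets. The resulting dictionary could be fed to any standard text classifier.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts start tags (e.g. 'a', 'table', 'li') seen in an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased
        self.tags[tag] += 1


def url_features(url):
    """Hypothetical URL features: lowercase alphanumeric tokens of host+path, and path depth."""
    parts = urlparse(url)
    text = (parts.netloc + parts.path).lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]
    depth = len([seg for seg in parts.path.split("/") if seg])
    feats = {f"url_token={t}": 1 for t in tokens}
    feats["url_depth"] = depth
    return feats


def html_tag_features(html):
    """Hypothetical HTML features: relative frequency of each start tag in the page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.tags.values())
    if total == 0:
        return {}
    return {f"tag={t}": c / total for t, c in parser.tags.items()}


def web_genre_features(url, html):
    """Merge the URL-based and tag-based features into one sparse feature dict."""
    feats = url_features(url)
    feats.update(html_tag_features(html))
    return feats


page = "<html><body><h1>FAQ</h1><ul><li>Q1</li><li>Q2</li></ul></body></html>"
feats = web_genre_features("http://www.example.com/help/faq.html", page)
```

In this example the URL contributes tokens such as `faq` and `help` plus a path depth of 2, while the list-heavy markup gives `li` the largest tag share. Such cues are the kind of web-specific signals that plain-text genre features would miss.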
Publisher
SPRINGER-VERLAG BERLIN
Issue Date
2005
Language
English
Article Type
Article; Proceedings Paper
Citation

LECTURE NOTES IN COMPUTER SCIENCE, v.3248, pp.310 - 319

ISSN
0302-9743
URI
http://hdl.handle.net/10203/89101