Concept unification of terms in different languages via web mining for Information Retrieval

Cited 3 time in webofscience Cited 7 time in scopus
  • Hit : 270
  • Download : 24
For historical and cultural reasons, English phases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalences in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalences in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries which can not be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval. (C) 2008 Elsevier Ltd. All rights reserved.
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Issue Date
2009-03
Language
English
Article Type
Article
Citation

INFORMATION PROCESSING & MANAGEMENT, v.45, pp.246 - 262

ISSN
0306-4573
DOI
10.1016/j.ipm.2008.09.006
URI
http://hdl.handle.net/10203/175068
Appears in Collection
CS-Journal Papers(저널논문)
Files in This Item
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 3 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0