An ensemble of transliteration models for information retrieval

Cited 8 times in Web of Science; cited 15 times in Scopus
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Oh, JH | ko |
| dc.contributor.author | Choi, Key-Sun | ko |
| dc.date.accessioned | 2008-04-10T07:42:54Z | - |
| dc.date.available | 2008-04-10T07:42:54Z | - |
| dc.date.created | 2012-02-06 | - |
| dc.date.issued | 2006-07 | - |
| dc.identifier.citation | INFORMATION PROCESSING & MANAGEMENT, v.42, no.4, pp.980 - 1002 | - |
| dc.identifier.issn | 0306-4573 | - |
| dc.identifier.uri | http://hdl.handle.net/10203/3777 | - |
| dc.description | Received 21 June 2005; accepted 29 September 2005; available online 16 November 2005 | en |
| dc.description.abstract | Transliteration is used to phonetically translate proper names and technical terms, especially from languages written in Roman alphabets to languages written in non-Roman alphabets, such as from English to Korean, Japanese, and Chinese. Because transliterations are usually representative index terms for documents, proper handling of transliterations is important for an effective information retrieval system. However, handling transliterations by dictionary lookup is limited, because transliterations are usually not registered in the dictionary. For this reason, many researchers have tried to overcome the problem using machine transliteration. In this paper, we propose a method for improving machine transliteration using an ensemble of three different transliteration models. Because a single transliteration model is limited in its ability to reflect all possible transliteration behaviors, several transliteration models should be used in a complementary manner to achieve a high-performance machine transliteration system. This paper describes a method for producing transliterations with several machine transliteration models and for ranking them using web data and the relevance scores given by each transliteration model. We report evaluation results for our ensemble transliteration model and experimental results for its impact on IR effectiveness. Machine transliteration tests on English-to-Korean and English-to-Japanese transliteration show that our proposed method achieves 78-80% word accuracy. Information retrieval tests on the KTSET and NTCIR-1 test collections show that our transliteration model can improve the performance of an information retrieval system by about 10-34%. (c) 2005 Elsevier Ltd. All rights reserved. | - |
| dc.language | English | - |
| dc.language.iso | en_US | en |
| dc.publisher | PERGAMON-ELSEVIER SCIENCE LTD | - |
| dc.title | An ensemble of transliteration models for information retrieval | - |
| dc.type | Article | - |
| dc.identifier.wosid | 000236006600008 | - |
| dc.identifier.scopusid | 2-s2.0-29244488760 | - |
| dc.type.rims | ART | - |
| dc.citation.volume | 42 | - |
| dc.citation.issue | 4 | - |
| dc.citation.beginningpage | 980 | - |
| dc.citation.endingpage | 1002 | - |
| dc.citation.publicationname | INFORMATION PROCESSING & MANAGEMENT | - |
| dc.identifier.doi | 10.1016/j.ipm.2005.09.007 | - |
| dc.embargo.liftdate | 9999-12-31 | - |
| dc.embargo.terms | 9999-12-31 | - |
| dc.contributor.localauthor | Choi, Key-Sun | - |
| dc.contributor.nonIdAuthor | Oh, JH | - |
| dc.type.journalArticle | Article | - |
| dc.subject.keywordAuthor | machine transliteration | - |
| dc.subject.keywordAuthor | ensemble-based transliteration model | - |
| dc.subject.keywordAuthor | web data | - |
| dc.subject.keywordAuthor | information retrieval | - |
| dc.subject.keywordAuthor | machine learning | - |
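
The abstract describes producing candidate transliterations with several models and ranking them using web data together with the relevance score each model assigns. The record does not give the actual scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions: the `Model` type, the `rank_candidates` function, the `web_weight` parameter, and the linear mix of normalized model scores and web counts are all hypothetical, and the toy scores and counts are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps a source word to
# (candidate transliteration, relevance score) pairs.
Model = Callable[[str], List[Tuple[str, float]]]


def rank_candidates(
    word: str,
    models: List[Model],
    web_count: Callable[[str], int],
    web_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Pool candidates from all models and rank them.

    Assumption (not from the paper): the final score is a linear mix of
    the normalized summed model scores and the normalized web counts.
    """
    # Sum the relevance scores each candidate receives across models.
    model_score: Dict[str, float] = defaultdict(float)
    for model in models:
        for candidate, score in model(word):
            model_score[candidate] += score
    if not model_score:
        return []

    # Normalize both evidence sources so they are comparable.
    total_model = sum(model_score.values()) or 1.0
    counts = {c: web_count(c) for c in model_score}
    total_web = sum(counts.values()) or 1

    combined = {
        c: (1.0 - web_weight) * model_score[c] / total_model
        + web_weight * counts[c] / total_web
        for c in model_score
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy example with invented scores and web counts for the word "data".
toy_models: List[Model] = [
    lambda w: [("데이터", 0.6), ("데이타", 0.4)],
    lambda w: [("데이타", 0.7), ("데이터", 0.3)],
    lambda w: [("데이터", 0.5)],
]
toy_web_counts = {"데이터": 900, "데이타": 100}
ranked = rank_candidates("data", toy_models, lambda c: toy_web_counts.get(c, 0))
```

In this sketch a candidate proposed by only one model can still rank highly if it is frequent on the web, which is one way multiple models and web evidence could complement each other; the real system may combine them quite differently.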
Appears in Collection
CS-Journal Papers(저널논문)