Constructing a paraphrase database for agglutinative languages

Paraphrase databases (PPDBs) are valuable resources for applications that use natural language processing (NLP) technology. To construct a high-quality PPDB for agglutinative languages, we propose a phrasal paraphrase extraction method, the affix modification-based bilingual pivoting method (AMBPM). AMBPM is suitable for agglutinative languages because it addresses two problems: lexical data sparsity and the failure to consider morphological word structure. In addition, we propose “improved AMBPM”, which uses a rule-based filtering approach to address the extraction of incorrect stem paraphrase pairs caused by low semantic content stems (LSCSs). In our experiments on AMBPM, we evaluate AMBPM and compare it with two state-of-the-art paraphrase extraction methods: the syntactic constraints-based bilingual pivoting method (SCBPM) and a word embedding method. In the experiments on improved AMBPM, we evaluate our method and compare the resulting PPDB with four types of databases: a PPDB constructed by using the original AMBPM, two PPDBs constructed by using two types of word-embedding-based methods (stem embedding and phrase embedding), and an existing thesaurus. The comparison is performed by using two NLP applications: sentential paraphrase generation and a question answering (QA) system. The experimental results demonstrate that AMBPM outperforms the state-of-the-art paraphrase extraction methods. In addition, the improved AMBPM, which uses a rule-based filtering method, significantly improves on AMBPM. Moreover, although a small amount of training data was used with no aid from linguistic resources, the PPDB constructed with the improved AMBPM is more useful than the four databases for the agglutinative language used in our study. We have also publicly released the Korean PPDB that was constructed using the improved AMBPM.
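For readers unfamiliar with bilingual pivoting, the following is a minimal sketch of the generic technique that the method names in the abstract build on: a paraphrase candidate is scored by summing, over shared pivot-language translations, the product of the two translation probabilities. It is not the authors' AMBPM; the affix modification step and the LSCS filtering rules are not described in this record and are not reproduced here. The function name, the table layout, the placeholder phrases, and all probabilities are illustrative assumptions.

```python
from collections import defaultdict


def pivot_paraphrases(p_pivot_given_src, p_src_given_pivot, source_phrase):
    """Score paraphrase candidates for source_phrase by generic bilingual pivoting.

    score(candidate) = sum over pivot phrases f of
        P(f | source_phrase) * P(candidate | f)
    Both tables are nested dicts of hypothetical translation probabilities.
    """
    scores = defaultdict(float)
    for pivot, p_f in p_pivot_given_src.get(source_phrase, {}).items():
        for candidate, p_e in p_src_given_pivot.get(pivot, {}).items():
            if candidate != source_phrase:  # a phrase is not its own paraphrase
                scores[candidate] += p_f * p_e
    # Return candidates ordered from highest to lowest score.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy translation tables (assumed values, placeholder phrase names).
p_pivot_given_src = {"stem_a": {"pivot_1": 0.6, "pivot_2": 0.4}}
p_src_given_pivot = {
    "pivot_1": {"stem_a": 0.5, "stem_b": 0.5},
    "pivot_2": {"stem_b": 0.25, "stem_c": 0.75},
}

result = pivot_paraphrases(p_pivot_given_src, p_src_given_pivot, "stem_a")
for candidate, score in result.items():
    print(f"{candidate}\t{score:.2f}")
```

In this toy example, stem_b is reachable through both pivot phrases, so its contributions are summed and it ranks above stem_c, which is reachable through only one.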
Publisher
ELSEVIER SCIENCE BV
Issue Date
2019-09
Language
English
Article Type
Article
Citation

DATA & KNOWLEDGE ENGINEERING, v.123, pp.1 - 20

ISSN
0169-023X
DOI
10.1016/j.datak.2017.07.007
URI
http://hdl.handle.net/10203/268479
Appears in Collection
CS-Journal Papers(저널논문)