Question-Answering in a Low-resourced Language: Benchmark Dataset and Models for Tigrinya

Question-Answering (QA) has seen significant advances recently, achieving near human-level performance on some benchmarks. However, these advances focus on high-resourced languages such as English, while the task remains unexplored for most other languages, mainly due to the lack of annotated datasets. This work presents a native QA dataset for an East African language, Tigrinya. The dataset contains 10.6K question-answer pairs spanning 572 paragraphs extracted from 290 news articles on various topics. We discuss the dataset construction method, which is applicable to building similar resources for related languages. We present comprehensive experiments and analyses of several resource-efficient approaches to QA, including monolingual, cross-lingual, and multilingual setups, along with comparisons against machine-translated silver data. Our strong baseline models reach an F1 score of 76%, while the estimated human performance is 92%, indicating that the benchmark presents a good challenge for future work. We make the dataset, models, and leaderboard publicly available.
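For context, span-extraction QA benchmarks of this kind are typically scored with a token-overlap F1 between the predicted and gold answer spans. The sketch below shows that standard SQuAD-style computation; it is an illustration only, and the exact normalization used for Tigrinya in the paper may differ (for instance, the English article stripping shown here would not apply).

```python
from collections import Counter
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace
    (SQuAD-style normalization; shown here for English as an illustration)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer span and one gold answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise no credit.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A partially correct span receives partial credit:
print(token_f1("the East African language Tigrinya", "Tigrinya"))  # 0.4
```

In benchmark evaluation, each prediction is usually scored against all annotated gold answers for a question and the maximum F1 is taken, then averaged over the dataset; the human-performance estimate is obtained by scoring one annotator's answers against the others in the same way.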
Publisher
Association for Computational Linguistics (ACL)
Issue Date
2023-07-11
Language
English
Citation

61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, pp. 11857–11870

URI
http://hdl.handle.net/10203/314658
Appears in Collection
CS-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
