Many NLP datasets are generated through crowdsourcing because it is a relatively low-cost and scalable solution. An important issue in crowdsourced datasets is annotation artifacts: a model trained on such a dataset learns annotators' writing strategies that are irrelevant to the task itself. While this problem has been identified and studied, little research approaches it from the perspective of crowdsourcing workflow design. We suggest a simple but powerful adjustment to the dataset collection procedure: instruct workers not to use a word that is highly indicative of annotation artifacts. In a case study on natural language inference (NLI) dataset construction, results from two rounds of studies on Amazon Mechanical Turk show that applying this word-level constraint reduces annotation artifacts in the generated dataset by 9.2% in terms of the accuracy-gap score, at the cost of a 19.7-second increase in time per unit task.
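As a concrete illustration of how an artifact-indicative word might be identified before instructing workers to avoid it, the sketch below ranks hypothesis tokens by their pointwise mutual information (PMI) with NLI labels, a common analysis in the annotation-artifact literature. This is a hypothetical helper written for illustration (the function name, tokenization, and `min_count` threshold are our own assumptions), not the paper's actual selection procedure.

```python
import math
from collections import Counter

def pmi_indicative_words(hypotheses, labels, min_count=10):
    """Rank (token, label) pairs by PMI over examples.

    High-PMI tokens (e.g. negation words for 'contradiction')
    are candidate annotation artifacts whose use workers could
    be instructed to avoid. Illustrative sketch only.
    """
    n = len(hypotheses)
    tok_lab = Counter()       # examples containing token t with label l
    tok = Counter()           # examples containing token t
    lab = Counter(labels)     # examples with label l
    for hyp, label in zip(hypotheses, labels):
        for t in set(hyp.lower().split()):  # count each token once per example
            tok_lab[(t, label)] += 1
            tok[t] += 1
    scores = []
    for (t, l), c in tok_lab.items():
        if tok[t] < min_count:              # skip rare, noisy tokens
            continue
        # PMI(t, l) = log( p(t, l) / (p(t) * p(l)) ), estimated over examples
        pmi = math.log((c / n) / ((tok[t] / n) * (lab[l] / n)))
        scores.append((pmi, t, l))
    return sorted(scores, reverse=True)
```

Under this kind of analysis, the top-ranked token for a given label would be the word workers are asked not to use, which is the word-level constraint described above.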