Reducing annotation artifacts in crowdsourcing datasets for natural language processing
(Crowdsourcing methods for natural language processing datasets that reduce annotation artifacts)

DC Field | Value | Language
dc.contributor.advisor | Oh, Haeyun | -
dc.contributor.advisor | 오혜연 | -
dc.contributor.advisor | Kim, Juho | -
dc.contributor.advisor | 김주호 | -
dc.contributor.author | Han, Donghoon | -
dc.date.accessioned | 2022-04-27T19:32:04Z | -
dc.date.available | 2022-04-27T19:32:04Z | -
dc.date.issued | 2021 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=948468&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/296135 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2021.2, [iv, 23 p.] | -
dc.description.abstract | Many NLP datasets are built through crowdsourcing because it is a relatively low-cost and scalable solution. One important issue in crowdsourced datasets is annotation artifacts: a model trained on such a dataset learns the annotators' writing strategies, which are irrelevant to the task itself. While this problem has already been identified and studied, little research has approached it from the perspective of crowdsourcing workflow design. We propose a simple but powerful adjustment to the data collection procedure: instruct workers not to use a word that is highly indicative of annotation artifacts. In a case study on natural language inference (NLI) dataset construction, two rounds of studies on Amazon Mechanical Turk show that applying this word-level constraint reduces annotation artifacts in the generated dataset by 9.2% in terms of accuracy-gap score, at the cost of a 19.7-second increase in completion time per unit task. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | Datasets; annotation artifacts; crowdsourcing | -
dc.subject | Dataset; bias; crowdsourcing | -
dc.title | Reducing annotation artifacts in crowdsourcing datasets for natural language processing | -
dc.title.alternative | Crowdsourcing methods for natural language processing datasets that reduce annotation artifacts | -
dc.type | Thesis (Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | KAIST: School of Computing | -
dc.contributor.alternativeauthor | 한동훈 | -
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
