DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Oh, Haeyun | - |
dc.contributor.advisor | 오혜연 | - |
dc.contributor.advisor | Kim, Juho | - |
dc.contributor.advisor | 김주호 | - |
dc.contributor.author | Han, Donghoon | - |
dc.date.accessioned | 2022-04-27T19:32:04Z | - |
dc.date.available | 2022-04-27T19:32:04Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=948468&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/296135 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Computing, 2021.2, [iv, 23 p.] | - |
dc.description.abstract | Many NLP datasets are built through crowdsourcing because it is a relatively low-cost and scalable solution. One important issue in crowdsourced datasets is annotation artifacts: a model trained on such a dataset learns annotators' writing strategies, which are irrelevant to the task itself. While this problem has already been identified and studied, little research has approached it from the perspective of crowdsourcing workflow design. We suggest a simple but powerful adjustment to the dataset collection procedure: instruct workers not to use a word that is highly indicative of annotation artifacts. In a case study of natural language inference dataset construction, the results from two rounds of studies on Amazon Mechanical Turk show that applying a word-level constraint reduces annotation artifacts in the generated dataset by 9.2% in terms of accuracy-gap score, at a time cost of an additional 19.7 seconds per unit task. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Datasets; annotation artifacts; crowdsourcing | - |
dc.subject | Datasets; bias; crowdsourcing | - |
dc.title | Reducing annotation artifacts in crowdsourcing datasets for natural language processing | - |
dc.title.alternative | A crowdsourcing technique for natural language processing datasets that reduces annotation artifacts | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Computing | - |
dc.contributor.alternativeauthor | 한동훈 | - |