DSpace at KOASAS: Data augmentation for abusive language detection via back-translation and domain knowledge

Data augmentation for abusive language detection via back-translation and domain knowledge (언어폭력 탐지를 위한 데이터 증강: 역번역과 도메인 지식을 활용하여)

DC Field	Value	Language
dc.contributor.advisor	Park, Jong C.	-
dc.contributor.advisor	박종철	-
dc.contributor.author	Shin, Jisu	-
dc.date.accessioned	2023-06-26T19:31:28Z	-
dc.date.available	2023-06-26T19:31:28Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1021047&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/309533	-
dc.description	Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2022.8, [iii, 32 p.]	-
dc.description.abstract	Abusive language has emerged as a social problem, and many researchers have attempted to detect it automatically in online text. Prior work has addressed various aspects of abusive language, such as hate speech, derogatory language, and profanity, and has performed various detection tasks, such as abusiveness detection, target detection, and target identification. These diverse aspects call for new datasets. However, constructing a new dataset is inefficient because annotation is labor-intensive. Accordingly, there is a trend toward improving detection performance with data augmentation techniques. In this study, we propose automatically augmenting an existing dataset through back-translation, which preserves the meaning of the original data while diversifying its words and structures. Previous studies using back-translation augmentation showed performance degradation due to the use of a specific pivot language. Our experiments show that data augmentation with guaranteed linguistic diversity is possible by using various pivot languages. In addition, to address the limitations identified in previous studies, we introduce a post-processing method based on domain knowledge and validate its effectiveness through experiments.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Natural Language Processing; Abusive Language Detection; Data Augmentation; Back-translation	-
dc.subject	자연 언어 처리 (natural language processing); 언어폭력 탐지 (abusive language detection); 데이터 증강 (data augmentation); 역번역 (back-translation)	-
dc.title	Data augmentation for abusive language detection via back-translation and domain knowledge	-
dc.title.alternative	언어폭력 탐지를 위한 데이터 증강: 역번역과 도메인 지식을 활용하여	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Computing	-
dc.contributor.alternativeauthor	신지수	-
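
The augmentation idea summarized in the abstract (back-translation through several pivot languages, followed by a knowledge-based post-processing step) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The `translate` callable, the function names, and the lexicon-based filter are all assumptions made for illustration; the record does not say which translation system, which pivot languages, or what post-processing rule the author used.

```python
from typing import AbstractSet, Callable, Iterable, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# In practice this would wrap whatever machine translation system is available.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, pivots: Iterable[str], translate: Translator,
                   source: str = "en") -> List[str]:
    """Return distinct paraphrases of `text`, one attempt per pivot language."""
    seen = {text.strip().lower()}
    paraphrases = []
    for pivot in pivots:
        forward = translate(text, source, pivot)      # source -> pivot
        candidate = translate(forward, pivot, source)  # pivot -> source
        key = candidate.strip().lower()
        # Skip empty results and paraphrases identical to something already kept.
        if key and key not in seen:
            seen.add(key)
            paraphrases.append(candidate)
    return paraphrases


def keeps_lexicon_terms(original: str, candidate: str,
                        lexicon: AbstractSet[str]) -> bool:
    """Assumed post-processing rule (not from the thesis): if the original
    contains lexicon terms, the candidate must retain at least one of them."""
    original_hits = set(original.lower().split()) & lexicon
    if not original_hits:
        return True
    return bool(original_hits & set(candidate.lower().split()))


def augment(dataset: Iterable[Tuple[str, int]], pivots: Iterable[str],
            translate: Translator,
            lexicon: AbstractSet[str]) -> List[Tuple[str, int]]:
    """Append label-preserving back-translations that pass the filter."""
    original = list(dataset)
    pivots = list(pivots)
    augmented = list(original)
    for text, label in original:
        for candidate in back_translate(text, pivots, translate):
            if keeps_lexicon_terms(text, candidate, lexicon):
                augmented.append((candidate, label))
    return augmented
```

Using several pivots and deduplicating the results is one simple way to obtain varied paraphrases. The lexicon filter stands in for the domain-knowledge step by discarding back-translations in which every abusive term from the original has been lost in translation.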

Appears in Collection: CS-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
