DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Park, Jong C. | - |
dc.contributor.advisor | 박종철 | - |
dc.contributor.author | Shin, Jisu | - |
dc.date.accessioned | 2023-06-26T19:31:28Z | - |
dc.date.available | 2023-06-26T19:31:28Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1021047&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/309533 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Computing, 2022.8, [iii, 32 p.] | - |
dc.description.abstract | Abusive language has emerged as a social problem, and many researchers have attempted to detect it automatically in online text. This research has addressed various aspects of abusive language, such as hate speech, derogatory language, and profanity, and has covered various detection tasks, such as abusiveness detection, target detection, or target identification. These diverse aspects of abusive language call for new datasets. However, constructing a new dataset is inefficient because annotation is labor-intensive. Accordingly, there is a trend toward improving detection performance with data augmentation techniques. In this study, we propose automatically augmenting an existing dataset with back-translation, which preserves the meaning of the original data while diversifying its words and structures. Previous studies that used back-translation augmentation showed performance degradation due to the use of a specific pivot language. Our study shows experimentally that using various pivot languages makes data augmentation with guaranteed linguistic diversity possible. In addition, to address the limitations reported in previous studies, we introduce a post-processing method based on domain knowledge and validate its effectiveness through experiments. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Natural Language Processing; Abusive Language Detection; Data Augmentation; Back-translation | - |
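dc.subject | Natural language processing; Verbal abuse detection; Data augmentation; Back-translation (translated from the Korean subject terms) | - |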
dc.title | Data augmentation for abusive language detection via back-translation and domain knowledge | - |
dc.title.alternative | Data augmentation for abusive language detection: using back-translation and domain knowledge (translated from the Korean title) | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Computing | - |
dc.contributor.alternativeauthor | 신지수 | - |