Data augmentation for abusive language detection via back-translation and domain knowledge

As abusive language has emerged as a social problem, many researchers have attempted to detect it automatically in online text. This work has addressed various aspects of abusive language, such as hate speech, derogatory language, and profanity, and has covered various detection tasks, such as abusiveness detection, target detection, and target identification. These diverse aspects call for new datasets. However, constructing a new dataset is inefficient because annotation is labor-intensive. Accordingly, there is a trend toward improving detection performance through data augmentation. In this study, we propose automatically augmenting an existing dataset with back-translation, which preserves the meaning of the original data while diversifying its words and structures. Previous studies that used back-translation for augmentation reported performance degradation caused by reliance on a specific pivot language. In contrast, our experiments show that using various pivot languages makes data augmentation with guaranteed linguistic diversity possible. In addition, to address the limitations reported in previous studies, we introduce a post-processing method based on domain knowledge and validate its effectiveness through experiments.
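The augmentation idea in the abstract (round-trip translation through several pivot languages, followed by a domain-knowledge check) can be sketched roughly as below. This is a minimal illustration, not the thesis's implementation: the record does not describe the translation system, the pivot languages, or the post-processing rule. The names `back_translate`, `keeps_abusive_term`, `augment`, and `toy_translate` are all hypothetical, the lookup-table translator stands in for a real machine translation system, and the lexicon-based filter is only one assumed form that domain-knowledge post-processing could take.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, pivots: List[str], translate: Translator,
                   src: str = "en") -> List[str]:
    """Return distinct paraphrases of `text`, one round trip per pivot language."""
    seen = {text.strip().lower()}  # skip round trips that reproduce the input
    paraphrases = []
    for pivot in pivots:
        pivot_text = translate(text, src, pivot)
        candidate = translate(pivot_text, pivot, src).strip()
        key = candidate.lower()
        if candidate and key not in seen:
            seen.add(key)
            paraphrases.append(candidate)
    return paraphrases


def keeps_abusive_term(original: str, candidate: str, lexicon: Set[str]) -> bool:
    """Assumed domain-knowledge check: if the original contains a lexicon term,
    the paraphrase must contain one too, otherwise the label may no longer hold."""
    had_term = set(original.lower().split()) & lexicon
    return not had_term or bool(set(candidate.lower().split()) & lexicon)


def augment(dataset: List[Tuple[str, int]], pivots: List[str],
            translate: Translator, lexicon: Set[str]) -> List[Tuple[str, int]]:
    """Append label-preserving paraphrases to a copy of the dataset."""
    augmented = list(dataset)
    for text, label in dataset:
        for candidate in back_translate(text, pivots, translate):
            if keeps_abusive_term(text, candidate, lexicon):
                augmented.append((candidate, label))
    return augmented


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real MT system: a fixed lookup table of made-up round trips."""
    table = {
        ("you are an idiot", "en", "de"): "du bist ein idiot",
        ("du bist ein idiot", "de", "en"): "you are an idiot",
        ("you are an idiot", "en", "fr"): "tu es un imbécile",
        ("tu es un imbécile", "fr", "en"): "you are a fool",
        ("you are an idiot", "en", "ko"): "너는 바보야",
        ("너는 바보야", "ko", "en"): "you are silly",
    }
    return table.get((text, src, tgt), text)


if __name__ == "__main__":
    data = [("you are an idiot", 1)]
    print(augment(data, ["de", "fr", "ko"], toy_translate, {"idiot", "fool"}))
    # [('you are an idiot', 1), ('you are a fool', 1)]
```

In this toy run the German round trip reproduces the input and is skipped, the French round trip yields a new paraphrase that keeps an abusive term, and the Korean round trip is filtered out because the abusive term was softened away. Using several pivots rather than one is what supplies the variety of candidates to filter.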
Advisors
Park, Jong C. (박종철)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, August 2022 [iii, 32 p.]

Keywords

Natural Language Processing; Abusive Language Detection; Data Augmentation; Back-translation

URI
http://hdl.handle.net/10203/309533
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1021047&flag=dissertation
Appears in Collection
CS-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
