Data augmentation for abusive language detection via back-translation and domain knowledge / 언어폭력 탐지를 위한 데이터 증강: 역번역과 도메인 지식을 활용하여

Cited 0 times in Web of Science; cited 0 times in Scopus
  • Hits: 97
  • Downloads: 0
DC Field | Value | Language
dc.contributor.advisor | Park, Jong C. | -
dc.contributor.advisor | 박종철 | -
dc.contributor.author | Shin, Jisu | -
dc.date.accessioned | 2023-06-26T19:31:28Z | -
dc.date.available | 2023-06-26T19:31:28Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1021047&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/309533 | -
dc.description | Thesis (Master's), Korea Advanced Institute of Science and Technology (KAIST): School of Computing, August 2022, [iii, 32 p.] | -
dc.description.abstract | As abusive language has emerged as a social problem, many researchers have attempted to detect it automatically in online text. They have addressed various aspects of abusive language, such as hate speech, derogatory language, and profanity, and have performed various detection tasks, such as abusiveness detection, target detection, and target identification. These diverse aspects of abusive language call for new datasets. However, constructing a new dataset is inefficient because annotation is labor-intensive. Accordingly, data augmentation techniques are increasingly used to improve performance in abusive language detection. In this study, we propose automatically augmenting existing datasets with back-translation, which preserves the meaning of the original data while diversifying its words and sentence structures. Previous studies that used back-translation for augmentation reported performance degradation attributed to the use of a specific pivot language. In contrast, our experiments show that using various pivot languages makes it possible to augment data while ensuring linguistic diversity. In addition, to address the limitations reported in previous studies, we introduce a post-processing method based on domain knowledge and validate its effectiveness experimentally. | -
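dc.language | eng | -
dc.publisher | 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST) | -
dc.subject | Natural Language Processing; Abusive Language Detection; Data Augmentation; Back-translation | -
dc.subject | 자연 언어 처리 (natural language processing); 언어폭력 탐지 (abusive language detection); 데이터 증강 (data augmentation); 역번역 (back-translation) | -
dc.title | Data augmentation for abusive language detection via back-translation and domain knowledge | -
dc.title.alternative | 언어폭력 탐지를 위한 데이터 증강: 역번역과 도메인 지식을 활용하여 (Data augmentation for abusive language detection: using back-translation and domain knowledge) | -
dc.type | Thesis (Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | 한국과학기술원: 전산학부 (KAIST: School of Computing) | -
dc.contributor.alternativeauthor | 신지수 | -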
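The abstract describes augmenting a labeled dataset by back-translating each example through several pivot languages and then post-processing the results with domain knowledge. This record does not specify the thesis's translation system, pivot languages, or post-processing rules, so the following is only a minimal Python sketch under stated assumptions: `translate` is a hypothetical callable taking `(text, src_lang, tgt_lang)` that would be backed by a real machine-translation system, `toy_translate` is a deterministic stand-in, and `keeps_abusive_cue` with its tiny lexicon is an illustrative label-preservation filter, not the author's method.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Assumed translator signature: translate(text, src_lang, tgt_lang) -> str.
# A real MT system would go here; toy_translate below is only a stand-in.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, pivot: str, translate: Translator, src: str = "en") -> str:
    """Round-trip `text` through `pivot`: src -> pivot -> src."""
    return translate(translate(text, src, pivot), pivot, src)


def keeps_abusive_cue(original: str, candidate: str, lexicon: Set[str]) -> bool:
    """Hypothetical domain-knowledge filter (illustrative, not the thesis's method).

    If the original contains a lexicon term, require the candidate to still
    contain one. Uses naive lowercase whitespace tokenization.
    """
    if not any(w in lexicon for w in original.lower().split()):
        return True
    return any(w in lexicon for w in candidate.lower().split())


def augment(
    dataset: Iterable[Tuple[str, int]],
    pivots: List[str],
    translate: Translator,
    lexicon: Set[str],
    src: str = "en",
) -> List[Tuple[str, int]]:
    """Back-translate each labeled example through several pivot languages."""
    augmented: List[Tuple[str, int]] = []
    for text, label in dataset:
        seen = {text}
        for pivot in pivots:
            candidate = back_translate(text, pivot, translate, src)
            if candidate in seen:
                continue  # exact duplicates add no lexical or structural diversity
            if not keeps_abusive_cue(text, candidate, lexicon):
                continue  # post-processing: drop candidates that lost the abusive cue
            seen.add(candidate)
            augmented.append((candidate, label))  # label copied from the original
    return augmented


# --- Hypothetical toy data and "translator", for demonstration only ---
LEXICON = {"stupid", "dumb", "idiot"}
DATASET = [("you are stupid", 1), ("have a nice day", 0)]
_SWAPS = {"de": {"stupid": "dumb", "nice": "good"}, "fr": {"stupid": "silly"}, "ja": {}}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Outbound: tag the text with the pivot. Return trip: strip the tag and apply
    # a pivot-specific word swap to mimic the rewording real MT introduces.
    if tgt != "en":
        return f"{tgt}|{text}"
    body = text.split("|", 1)[1]
    swaps = _SWAPS.get(src, {})
    return " ".join(swaps.get(w, w) for w in body.split())


if __name__ == "__main__":
    print(augment(DATASET, ["de", "fr", "ja"], toy_translate, LEXICON))
    # [('you are dumb', 1), ('have a good day', 0)]
```

Using several pivots gives each example several chances to come back reworded. Exact duplicates are discarded because they add no diversity, and candidates that lose every lexicon term are discarded because the copied "abusive" label may no longer hold.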
Appears in Collection
CS-Theses_Master (석사논문, Master's theses)
Files in This Item
There are no files associated with this item.