Generation of Korean Offensive Language by Leveraging Large Language Models via Prompt Design

Cited 0 time in webofscience Cited 0 time in scopus
  • Hit : 59
  • Download : 0
The research for detecting offensive language on online platforms has much advanced. However, the majority of these studies have primarily focused on English. Given the unique characteristics of offensive language, where social and cultural contexts significantly influence content understanding, language-specific datasets are essential. Acquiring comprehensive datasets in Korean, a less-resourced language, has mostly relied on human annotations, suffering from inherent limitations in terms of labor intensity and potential annotator bias. Automatic generation of datasets using generative methods offers an alternative approach to address these limitations, yet faces challenges in capturing linguistic and cultural diversities while maintaining native-level fluency. To address these challenges, we introduce a prompt design methodology, Korean Offensive language Machine Generation (K-OMG), using large language models. By manipulating three prompt factors, we find an effective prompt design to generate culturally aligned offensive language with fluent expressions. Experimental results demonstrate the high quality and utility of our automatically generated dataset. Our detailed analysis shows that the proposed approach achieves exceptional fluency in generating texts while effectively incorporating social and cultural diversities.
Publisher
Association for Computational Linguistics (ACL)
Issue Date
2023-11-02
Language
English
Citation

The 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

URI
http://hdl.handle.net/10203/314595
Appears in Collection
CS-Conference Papers(학술회의논문)
Files in This Item
There are no files associated with this item.

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0