Finding cracks in content moderation: delexicalized distant supervision for illicit drug jargon detection

dc.contributor.advisor: Shin, Seungwon
dc.contributor.author: Song, Minkyoo
dc.description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.2, [iv, 34 p.]
dc.description.abstract: Sales and discussions of illicit drugs have become commonplace online, including on social media. Social media platforms hosting user-generated content must therefore moderate harmful drug content. However, this is difficult because drug discussions use a vast amount of jargon. Previous work on drug jargon detection was limited to extracting a list of terms, and systems that rely on such a banlist have two limitations. First, they are trivially evaded by word substitution. Second, they cannot tell whether a drug euphemism (e.g., pot, crack) is being used as drug jargon or in its ordinary sense. An effective drug content moderation system must instead be trained to recognize drug jargon from context rather than rely on a banlist. Because drug-related language is obscure and constantly changing, manually annotated training datasets for this task are not only expensive to create but also prone to becoming obsolete. We present JEDIS, a system that detects illicit drug jargon terms by learning from distantly supervised data. To evaluate drug jargon detection, we manually annotate two datasets from two sources, Reddit and Silk Road Forums. Our experiments show that JEDIS outperforms state-of-the-art word-based baselines in drug jargon detection by 26.16 and 9.27 F1 points on the two evaluation datasets, respectively. We also use JEDIS to extract a list of drug jargon terms from the corpus and find that it is robust against pitfalls that other systems face.
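The two banlist weaknesses named in the abstract can be illustrated with a minimal sketch of a naive word-matching filter. The banlist contents and example sentences are hypothetical, chosen only for illustration, and the altered spelling "p0t" stands in for just one simple kind of substitution; this is not JEDIS and not any baseline evaluated in the thesis.

```python
import re

# Hypothetical banlist built from the abstract's example euphemisms.
BANLIST = {"pot", "crack"}


def banlist_flags(text: str) -> bool:
    """Return True if any banlisted word appears as a whole token (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(tok in BANLIST for tok in tokens)


# Hypothetical example posts.
examples = {
    "drug use": "anyone selling pot near campus?",       # correctly flagged
    "benign use": "there is a crack in my phone screen",  # flagged anyway: no context
    "substitution": "anyone selling p0t near campus?",    # missed: altered spelling
}

for label, text in examples.items():
    print(f"{label:12} -> flagged={banlist_flags(text)}")
```

The filter flags the benign "crack" sentence (it cannot use context) and misses the altered spelling (it only matches exact tokens), which is why the abstract argues for a context-based detector instead of a banlist.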
dc.subject: Automatic content moderation; Distant supervision; Jargon detection; Jargon extraction
dc.subject: Automatic content moderation; Distant supervision learning; Jargon detection and extraction
dc.title: Finding cracks in content moderation
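dc.title.alternative: Illicit drug jargon detection via delexicalized distant supervision for content moderation
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering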
dc.title.subtitle: delexicalized distant supervision for illicit drug jargon detection
