Finding cracks in content moderation: delexicalized distant supervision for illicit drug jargon detection
(Korean title, translated: Illicit drug jargon detection using delexicalized distant supervision for content moderation)

Sales and discussions of illicit drugs have become commonplace online, including on social media. Social media platforms hosting user-generated content must therefore moderate harmful drug content. This is a difficult task because of the vast amount of jargon used in drug discussions. Previous work on drug jargon detection was limited to extracting a list of terms, but systems that rely on a banlist of words have two limitations. First, they are trivially evaded using word substitutions. Second, they cannot distinguish whether a drug euphemism (e.g., pot, crack) is being used as drug jargon or in its ordinary sense. An effective drug content moderation system must be trained to find drug jargon using context, rather than relying on a banlist. Since the language around drugs is difficult and constantly changing, manually annotated datasets for training on this task are not only expensive to create but also prone to becoming obsolete. We present JEDIS, a system that detects illicit drug jargon terms by learning from distantly supervised data. We manually annotate two datasets from two sources, Reddit and Silk Road Forums, to evaluate drug jargon detection. Our experiments show that JEDIS outperforms state-of-the-art word-based baselines in drug jargon detection by 26.16 F1-score and by 9.27 F1-score on the two evaluation datasets. We also use JEDIS to extract a list of drug jargon terms from the corpus, and find that it is robust against pitfalls that other systems face.
Advisors
Shin, Seungwon (신승원)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.2, [iv, 34 p.]

Keywords

Automatic content moderation; Distant supervision; Jargon detection; Jargon extraction

URI
http://hdl.handle.net/10203/309944
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032898&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
