DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Shin, Seungwon | - |
dc.contributor.advisor | 신승원 | - |
dc.contributor.author | Song, Minkyoo | - |
dc.date.accessioned | 2023-06-26T19:34:15Z | - |
dc.date.available | 2023-06-26T19:34:15Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032898&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/309944 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, February 2023, [iv, 34 p.] | - |
dc.description.abstract | Sales and discussions of illicit drugs have become commonplace online, including on social media. Social media platforms hosting user-generated content must therefore moderate harmful drug content. However, this is difficult because of the vast amount of jargon used in drug discussions. Previous work on drug jargon detection was limited to extracting a list of terms, and systems relying on such a banlist of words have two limitations. First, they are trivially evaded through word substitutions. Second, they cannot distinguish whether a drug euphemism (e.g., pot, crack) is being used as drug jargon or in its ordinary sense. An effective drug content moderation system must instead be trained to find drug jargon from context rather than relying on a banlist. Because the language around drugs is difficult and constantly changing, manually annotated datasets for this task are not only expensive to create but also prone to becoming obsolete. We present JEDIS, a system that detects illicit drug jargon terms by learning on distantly supervised data. We manually annotate two datasets from two sources, Reddit and Silk Road Forums, to evaluate drug jargon detection. Our experiments show that JEDIS outperforms state-of-the-art word-based baselines in drug jargon detection by 26.16 and 9.27 F1 points on the two evaluation datasets. We also use JEDIS to extract a list of drug jargon terms from the corpus and find that it is robust against pitfalls that other systems face. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Automatic content moderation; Distant supervision; Jargon detection; Jargon extraction | - |
dc.subject | Automatic content moderation; Distant supervision; Jargon detection and extraction | - |
dc.title | Finding cracks in content moderation | - |
dc.title.alternative | Illicit drug jargon detection using delexicalized distant supervision for content moderation | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering | - |
dc.contributor.alternativeauthor | 송민규 | - |
dc.title.subtitle | delexicalized distant supervision for illicit drug jargon detection | - |
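
The abstract's two objections to banlists (trivial evasion by word substitution, and no way to tell drug jargon from ordinary usage) and its alternative of learning from context can be illustrated with a small sketch. Everything here is a hypothetical illustration, not the JEDIS pipeline: the banlist contents, the `[JARGON]` mask token, and the helper names `banlist_flags` and `delexicalize` are assumptions. Masking the matched term is one common way to delexicalize an input so that a model has to rely on the surrounding words; how the thesis actually constructs its distantly supervised data may differ.

```python
import re

# Hypothetical banlist and mask token (assumptions, not taken from the thesis).
BANLIST = {"pot", "crack", "molly"}
MASK = "[JARGON]"


def banlist_flags(text: str) -> bool:
    """Naive banlist filter: flag text if any banlisted word appears as a token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BANLIST for token in tokens)


def delexicalize(text: str, terms: set[str]) -> list[tuple[str, str]]:
    """Emit one (masked_text, term) pair per known term found in the text.

    Only the term under consideration is replaced by MASK, so a classifier
    trained on the masked sentence cannot see the word itself and must use
    the surrounding context to decide whether it is drug jargon.
    """
    examples = []
    for term in sorted(terms):
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        if pattern.search(text):
            examples.append((pattern.sub(MASK, text), term))
    return examples


if __name__ == "__main__":
    # Drug usage is caught ...
    print(banlist_flags("selling some good pot, dm me"))
    # ... but so is an innocent kitchen sentence (false positive) ...
    print(banlist_flags("I cooked soup in a big pot"))
    # ... and a one-character substitution evades the filter entirely.
    print(banlist_flags("selling some good p0t, dm me"))
    # A delexicalized training example: the cue word is hidden.
    print(delexicalize("Anyone have crack for sale?", BANLIST))
```

In this sketch the masked sentences would be paired with labels derived automatically from a seed term list rather than from manual annotation, which is the general idea behind distant supervision; the choice of seed list and label source here is an assumption.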