Noise-tolerant relation annotation for knowledge extension (Korean title: 지식 확장을 위한 잡음 내성 관계 주석 기법)

Knowledge learning, or knowledge extension, is the study of extracting knowledge from natural language text, which is information in unstructured form, and it plays a vital role in natural language processing research. Relation extraction, a core task in knowledge extension, classifies the relation between two entities in a sentence as one of a predefined set of relations and then expresses the extracted relation and the two entities as a machine-readable RDF triple. Distant supervision is a method for automatically generating relation annotation data for the relation extraction task. It rests on a strong assumption: 'when a sentence containing the two entities of an RDF triple in the knowledge base is collected, the sentence will express the relation between the two entities.' The sentence and the two entities together are called an instance. However, distant supervision data contains many noise instances, that is, sentences that do not contain discriminative evidence of the relation. In our observation, 48% of the Korean distant supervision dataset is noise. This noise degrades the performance of the relation extraction model and gradually accumulates in the knowledge base, so the quality of both the relation extraction training data and the model continues to deteriorate. To solve this problem, semi-supervised relation extraction (SSRE) studies using small-scale, high-quality seed data have been conducted; however, to the best of our knowledge, no study starts from seed data that contains some noise. SSRE using noisy seed data seeks to improve both the quality of newly annotated data and the performance of the relation extraction model. Noisy seed data is easy to expand, because it can be collected easily by distant supervision or created in large amounts at relatively low cost through crowdsourcing.
To achieve this goal, this study proposes a noise-tolerant relation annotation method for knowledge extension. The proposed method improved the performance of the relation extraction model by 10%, which demonstrates its advantage. This study then proposes a method for constructing a knowledge extension environment for low-resource languages. Taking Korean as a concrete example, four types of knowledge extension data were constructed with our crowdsourcing method, and a knowledge extension framework was designed using this data. Moreover, we released the collected language resources and a Korean knowledge extension API. Next, we propose a surface knowledge graph, which generates one graph for a whole sentence without relying on a knowledge base ontology. Ontology-based knowledge extension misses much of the information in the text, which reduces its usefulness in applications such as question answering. To solve this problem, we design and implement a Korean surface knowledge graph extraction system. The surface knowledge graph can be combined with an ontology-based knowledge graph, and we demonstrate its value by applying it in a question answering system. Finally, we define various problems that arise in real-world knowledge extension and discuss practical solutions to each. We expect that the efficient data construction and knowledge extension framework proposed in this paper, together with the definition of real-world knowledge extension problems and their solutions, will offer one way to overcome the low-resource knowledge extension environment.
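As a rough illustration of the distant supervision assumption described in the abstract, the sketch below labels every sentence that mentions both entities of a knowledge base triple with that triple's relation. It is not the thesis's implementation: the names `Triple`, `Instance`, and `distant_supervision`, the toy knowledge base, and the sentences are all hypothetical, and entity matching is reduced to plain substring search.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One knowledge base fact (hypothetical toy structure)."""
    subject: str
    relation: str
    obj: str


class Instance(NamedTuple):
    """A sentence plus two entities, labeled with a relation."""
    sentence: str
    subject: str
    obj: str
    relation: str


def distant_supervision(sentences: List[str], triples: List[Triple]) -> List[Instance]:
    """Label each sentence that mentions both entities of a triple
    with that triple's relation, without checking the sentence meaning."""
    instances = []
    for sentence in sentences:
        for t in triples:
            # Assumption: naive substring matching stands in for entity linking.
            if t.subject in sentence and t.obj in sentence:
                instances.append(Instance(sentence, t.subject, t.obj, t.relation))
    return instances


# Hypothetical toy data, invented for this example.
KB = [Triple("Seoul", "capitalOf", "South Korea")]
CORPUS = [
    "Seoul is the capital of South Korea.",
    "Seoul hosted a summit attended by leaders from South Korea and Japan.",
    "Busan is a port city.",
]

instances = distant_supervision(CORPUS, KB)
for inst in instances:
    print(inst.relation, "<-", inst.sentence)
```

Both sentences that mention "Seoul" and "South Korea" receive the `capitalOf` label, although the second one gives no evidence for that relation. That second instance is the kind of noise the abstract refers to.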
Advisors
Choi, Key-Sun (최기선)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2021.2, [119+viii :]

Keywords

Natural Language Processing; Knowledge Graph; Relation Annotation; Relation Extraction; Knowledge Extension; Crowdsourcing

URI
http://hdl.handle.net/10203/295747
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956457&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
