Resolving homonymy with correlation clustering in scholarly digital libraries
(Korean title: A study on resolving same-named authors in scholarly databases through correlation clustering)

As scholarly data grows rapidly, scholarly digital libraries, which supply vast amounts of scholarly data through convenient online interfaces, have become increasingly popular and important tools for researchers. However, because of the limitations of the naming conventions widely practiced in academic fields, a large number of scholarly publications suffer from the problem of correctly identifying authors with common names. In particular, conventions such as abbreviating first and middle names make it even harder to identify and distinguish authors whose names share the same representation (i.e., spelling). Several disambiguation methods have been suggested to tackle the problem, but most of them require inputs that are less practical to obtain, such as the number of same-named authors, a training set, or rich information about papers. Based on the assumption that coauthors are likely to write more than one paper together, we propose an autonomous approach that groups papers by the same author using the most commonly available information, author lists. We employ several techniques to achieve this goal. First, we represent the input set of papers as a data matrix and reduce the dimension of the matrix to find groups of coauthors who frequently appear together. Second, we devise a relative correlation distance measure suited to the reduced space and apply it to density-based clustering, which is used to cluster papers with similar coauthors. Finally, we adopt a concept of summarization to represent a cluster of papers as a single vector. We evaluate our method on publication records for 11 ambiguous names and show that our approach achieves better disambiguation while maintaining high cluster purity compared with four other density-based clustering methods.
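The three steps named in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the record does not define the relative correlation distance, the reduction method, or the clustering parameters, so the sketch assumes a plain truncated SVD for dimension reduction, cosine distance (an uncentered correlation) as a stand-in distance, a minimal DBSCAN as the density-based clusterer, and a per-cluster mean vector as the summary. All function names, the toy coauthor lists, and the values of `k`, `eps`, and `min_pts` are hypothetical.

```python
import numpy as np

def coauthor_matrix(papers):
    """Binary paper-by-coauthor matrix; each paper is a list of coauthor names
    (the ambiguous author's own name is assumed to be left out)."""
    authors = sorted({a for p in papers for a in p})
    col = {a: j for j, a in enumerate(authors)}
    X = np.zeros((len(papers), len(authors)))
    for i, p in enumerate(papers):
        for a in p:
            X[i, col[a]] = 1.0
    return X, authors

def reduce_dim(X, k):
    """Keep the top-k singular directions (assumed reduction: truncated SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine_distance(Z):
    """Pairwise 1 - cosine similarity; a stand-in for the thesis's measure."""
    norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    Zn = Z / norms
    return np.clip(1.0 - Zn @ Zn.T, 0.0, 2.0)

def dbscan(D, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix; -1 marks noise."""
    n = len(D)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if D[i, j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = [m for m in range(n) if D[j, m] <= eps]
            if len(nj) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nj)
    return labels

def summarize(X, labels):
    """Represent each cluster by the mean of its papers' coauthor vectors."""
    return {c: X[[i for i, l in enumerate(labels) if l == c]].mean(axis=0)
            for c in sorted(set(labels)) if c != -1}

# Toy input: coauthor lists of five papers that share one ambiguous author name.
papers = [
    ["A. Lee", "B. Park"],
    ["A. Lee", "B. Park", "C. Choi"],
    ["A. Lee", "C. Choi"],
    ["D. Smith", "E. Jones"],
    ["D. Smith", "E. Jones", "F. Brown"],
]
X, authors = coauthor_matrix(papers)
Z = reduce_dim(X, k=2)
labels = dbscan(cosine_distance(Z), eps=0.1, min_pts=2)
summaries = summarize(X, labels)
print(labels)
```

On this toy input the two coauthor groups share no names, so the first three papers fall into one cluster and the last two into another. Real author lists overlap and contain one-off collaborators, which is presumably where the choice of distance measure and density parameters matters most.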
Advisors
Moon, Sue-Bok (문수복)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
515143/325007  / 020113625
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2013.2, [ iv, 23 p. ]

Keywords

Digital Libraries; Homonymy; Scholarly databases; Same-named authors; Clustering

URI
http://hdl.handle.net/10203/180426
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=515143&flag=dissertation
Appears in Collection
CS-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
