DSpace at KOASAS: Effective data clustering for large volume high dimensional datasets

Effective data clustering for large volume high dimensional datasets (대용량 고차원 데이타집합을 위한 효과적인 데이타 클러스터링)

Woo, Kyoung-Gu / 우경구

Data clustering is one of the most frequently used tools in data mining. It refers to the process of partitioning data so that intra-group similarities are maximized and inter-group similarities are minimized at the same time. Data clustering gives a rough idea of the composition of a given dataset, and it is especially useful when little is known about that dataset. As datasets grow larger in volume and higher in dimensionality, however, more efficient clustering methods are required. In particular, high dimensionality makes it very difficult to generate a meaningful clustering result, because the distances between all pairs of data objects become similar in a high-dimensional space. In this thesis, we present a study of effective data clustering for large-volume, high-dimensional datasets. To deal with the curse of dimensionality, the proposed method follows the philosophy of subspace clustering, which assumes that the important dimensions can differ between clusters. We first define a new similarity measure devised for high-dimensional datasets. To measure the similarity between two data objects, the proposed measure focuses on the number of dimensions along which the two objects are sufficiently close to each other, rather than merely averaging the similarities along all dimensions. We then present a novel way to find each cluster's important dimensions (i.e., its subspace). The suggested subspace-finding method uses nearest neighbor query results to gather the information required for selecting important dimensions. The gathered information is then used to determine, based on a binomial probability model, whether each dimension is important. Finally, we propose an algorithm that adopts our similarity measure and subspace-finding method to cluster large-volume, high-dimensional datasets. Through experimental results on various datasets, the proposed algorithm is shown to meet many requirements fo...
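The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual formulation, which the abstract does not give. The closeness threshold `eps`, the chance probability `p_chance`, the significance level `alpha`, and all function names are hypothetical, and the data are assumed to be scaled to [0, 1] in every dimension.

```python
from math import comb


def count_similarity(x, y, eps):
    """Count the dimensions in which x and y differ by at most eps.

    Assumption: 'near enough' is modelled as a fixed per-dimension
    threshold eps; the thesis may define closeness differently.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return sum(1 for a, b in zip(x, y) if abs(a - b) <= eps)


def binomial_tail(k, n, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def important_dimensions(point, neighbors, eps, p_chance, alpha=0.01):
    """Select dimensions in which the neighbors lie close to point
    more often than chance alone would explain.

    Assumption: under the null hypothesis, each neighbor falls within
    eps of point in a given dimension independently with probability
    p_chance, so the hit count is binomially distributed.
    """
    n = len(neighbors)
    selected = []
    for d in range(len(point)):
        hits = sum(1 for q in neighbors if abs(q[d] - point[d]) <= eps)
        # A small tail probability means this many hits is unlikely by chance.
        if binomial_tail(hits, n, p_chance) < alpha:
            selected.append(d)
    return selected


if __name__ == "__main__":
    # Two of the three dimensions are within eps = 0.05 of each other.
    print(count_similarity([0.10, 0.50, 0.90], [0.12, 0.95, 0.88], eps=0.05))

    # Ten hypothetical neighbors: tightly packed in dimension 0,
    # spread across the whole range in dimension 1.
    neighbors = [[0.5 + 0.001 * i, 0.1 * i] for i in range(10)]
    print(important_dimensions([0.5, 0.5], neighbors, eps=0.05, p_chance=0.1))
```

In this sketch a dimension counts as important only when the number of close neighbors is statistically surprising under the binomial null model. A counting similarity of this kind avoids averaging over dimensions that carry no cluster structure.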

Advisors: Lee, Yoon-Joon (이윤준)

Description: Korea Advanced Institute of Science and Technology: Computer Science major

Publisher: Korea Advanced Institute of Science and Technology (한국과학기술원)

Issue Date: 2004

Identifier: 237666/325007 / 000985231

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: Computer Science major, 2004.2, [vi, 43 p.]

Keywords: DATA MINING; HIGH DIMENSION; HIGH-DIMENSIONAL CLUSTERING; CLUSTERING

URI: http://hdl.handle.net/10203/32863

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=237666&flag=dissertation

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
