DSpace at KOASAS: (A) clustering with high dimensional categorical data for stream

(A) clustering with high dimensional categorical data for stream
스트림 데이터를 위한 고차원 범주 데이터 클러스터링 기법

DC Field	Value	Language
dc.contributor.advisor	Lee, Yoon-Joon	-
dc.contributor.advisor	이윤준	-
dc.contributor.author	Lee, Jeong-Hoon	-
dc.contributor.author	이정훈	-
dc.date.accessioned	2011-12-13T05:28:06Z	-
dc.date.available	2011-12-13T05:28:06Z	-
dc.date.issued	2011	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=466482&flag=dissertation	-
dc.identifier.uri	http://hdl.handle.net/10203/33342	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Computer Science, 2011.2, [ vi, 57 p. ]	-
dc.description.abstract	Progress in hardware and sensor technology has given rise to a new kind of data that calls for new management approaches. Such data, generated continuously and rapidly and growing over time, are referred to as stream data. Stream data have become a challenge for Knowledge Discovery and Data Mining (KDD) because of their large size and the dynamics of their generation and processing. Moreover, the high-dimensional attributes and multi-valued categorical values found in recent stream data pose a new challenge for managing and processing them. When processing stream data, three aspects should be considered. First, stream data are too large to fit in limited system memory. Second, stream data are strongly affected by time, because they arrive along a timeline and their characteristics are subject to change. Third, recent applications of stream data require more sophisticated processing of complicated data formats, such as summarizing the data or finding hidden knowledge in them, rather than only simple data management or filtering. Based on these factors, we propose a sampling method for limited memory, a clustering method for multi-valued categorical data in high-dimensional space, and a method to detect the evolution of data characteristics and learn from it. We propose a sampling method, based on a Quantile system, that reflects the time feature of stream data. The importance of data tends to depend on the data arrival rate, so our method samples more data in intervals with a high arrival rate. Our sampling method can be applied to sophisticated knowledge applications, such as clustering from multiple sources, and helps them reflect the characteristics of stream data effectively. We propose an effective method to quantify the level of dissimilarity between categorical values and develop a framework of unsupervised learning for high-dimensional categorical data.
Clustering is the most representative unsupervised learning technique in KDD, used to group similar data and to find hidden information about the ch...	eng
dc.language	eng	-
dc.publisher	한국과학기술원	-
dc.subject	High dimensional data	-
dc.subject	Categorical data	-
dc.subject	Clustering	-
dc.subject	Stream data	-
dc.subject	Concept drift	-
dc.subject	속성 변화	-
dc.subject	고차원	-
dc.subject	범주 데이터	-
dc.subject	클러스터링	-
dc.subject	스트림 데이터	-
dc.title	(A) clustering with high dimensional categorical data for stream	-
dc.title.alternative	스트림 데이터를 위한 고차원 범주 데이터 클러스터링 기법	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	466482/325007	-
dc.description.department	Korea Advanced Institute of Science and Technology : Department of Computer Science,	-
dc.identifier.uid	020025234	-
dc.contributor.localauthor	Lee, Yoon-Joon	-
dc.contributor.localauthor	이윤준	-

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.