(A) clustering with high dimensional categorical data for stream
스트림 데이터를 위한 고차원 범주 데이터 클러스터링 기법 (A clustering technique for high-dimensional categorical data in stream data)

DC Field | Value | Language
dc.contributor.advisor | Lee, Yoon-Joon | -
dc.contributor.advisor | 이윤준 | -
dc.contributor.author | Lee, Jeong-Hoon | -
dc.contributor.author | 이정훈 | -
dc.date.accessioned | 2011-12-13T05:28:06Z | -
dc.date.available | 2011-12-13T05:28:06Z | -
dc.date.issued | 2011 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=466482&flag=dissertation | -
dc.identifier.uri | http://hdl.handle.net/10203/33342 | -
dc.description | Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2011.2, [ vi, 57 p. ] | -
dc.description.abstract | Progress in hardware and sensor technology has given rise to a new kind of data management problem. Data that are generated continuously and rapidly, and that keep growing over time, are referred to as stream data. Stream data have become a challenge for Knowledge Discovery and Data Mining (KDD) because of their large size and the dynamics of their generation and processing. The high-dimensional attributes and multi-valued categorical values found in recent stream data pose a further challenge for their management and processing. When processing stream data, three aspects should be considered. First, stream data are too large to fit in limited system memory. Second, stream data are strongly affected by time: they arrive along a timeline, and their characteristics are subject to change. Third, recent applications of stream data require more sophisticated processing of complicated data formats, such as summarizing the data or finding hidden knowledge in them, rather than only simple data management or filtering. Based on these factors, we propose a sampling method for limited memory, a clustering method for multi-valued categorical data in high-dimensional space, and a method to detect the evolution of data characteristics and learn from it. The sampling method is based on a Quantile system and reflects the time feature of stream data. The importance of data tends to depend on the data arrival rate, so our method samples more data in intervals with a high arrival rate. It can be applied to sophisticated knowledge applications, such as clustering from multiple sources, and helps them reflect the characteristics of stream data effectively. We also propose an effective method to quantify the level of dissimilarity between categorical values and develop a framework of unsupervised learning for high-dimensional categorical data. Clustering is the most representative unsupervised learning technique in KDD for grouping similar data and finding hidden information about the ch... | eng
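dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (한국과학기술원) | -
dc.subject | High dimensional data | -
dc.subject | Categorical data | -
dc.subject | Clustering | -
dc.subject | Stream data | -
dc.subject | Concept drift | -
dc.subject | 속성 변화 (attribute change) | -
dc.subject | 고차원 (high dimension) | -
dc.subject | 범주 데이터 (categorical data) | -
dc.subject | 클러스터링 (clustering) | -
dc.subject | 스트림 데이터 (stream data) | -
dc.title | (A) clustering with high dimensional categorical data for stream | -
dc.title.alternative | 스트림 데이터를 위한 고차원 범주 데이터 클러스터링 기법 (A clustering technique for high-dimensional categorical data in stream data) | -
dc.type | Thesis(Ph.D) | -
dc.identifier.CNRN | 466482/325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science | -
dc.identifier.uid | 020025234 | -
dc.contributor.localauthor | Lee, Yoon-Joon | -
dc.contributor.localauthor | 이윤준 | -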
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
