DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, Yoon-Joon | - |
dc.contributor.advisor | 이윤준 | - |
dc.contributor.author | Lee, Jeong-Hoon | - |
dc.contributor.author | 이정훈 | - |
dc.date.accessioned | 2011-12-13T05:28:06Z | - |
dc.date.available | 2011-12-13T05:28:06Z | - |
dc.date.issued | 2011 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=466482&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/33342 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2011.2, [ vi, 57 p. ] | - |
dc.description.abstract | Progress in hardware and sensor technologies has given rise to a new kind of data that calls for new management methods. These data, which are generated continuously and grow rapidly over time, are referred to as stream data. Stream data have become a challenge for Knowledge Discovery and Data mining (KDD) because of their large size and the dynamics of their generation and processing. Moreover, the high-dimensional attributes and multi-valued categorical values found in recent stream data pose new challenges for their management and processing. When processing stream data, three aspects should be considered. First, stream data are too large to fit in limited system memory. Second, stream data are strongly affected by time, because they arrive along a timeline and their characteristics are subject to change. Third, recent stream data applications require more sophisticated processing of complex data formats, such as summarizing the data or discovering hidden knowledge in them, rather than only simple data management or filtering. Based on these factors, we propose a sampling method for limited memory, a clustering method for multi-valued categorical data in high-dimensional space, and a method to detect the evolution of data characteristics and learn from it. Our sampling method, based on a quantile system, reflects the temporal features of stream data. Because the importance of data tends to depend on the data arrival rate, our method samples more data from intervals with high arrival rates. It can be applied to sophisticated knowledge-discovery applications, such as clustering from multiple sources, and helps them reflect the characteristics of stream data effectively. We also propose an effective method to quantify the level of dissimilarity between categorical values and develop a framework of unsupervised learning for high-dimensional categorical data. 
Clustering is the most representative unsupervised learning method in KDD; it groups similar data and finds hidden information about the ch... | eng |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | High dimensional data | - |
dc.subject | Categorical data | - |
dc.subject | Clustering | - |
dc.subject | Stream data | - |
dc.subject | Concept drift | - |
dc.subject | 속성 변화 (attribute change) | - |
dc.subject | 고차원 (high dimensional) | - |
dc.subject | 범주 데이터 (categorical data) | - |
dc.subject | 클러스터링 (clustering) | - |
dc.subject | 스트림 데이터 (stream data) | - |
dc.title | (A) clustering with high dimensional categorical data for stream | - |
dc.title.alternative | 스트림 데이터를 위한 고차원 범주 데이터 클러스터링 기법 (A high-dimensional categorical data clustering technique for stream data) | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 466482/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science | - |
dc.identifier.uid | 020025234 | - |
dc.contributor.localauthor | Lee, Yoon-Joon | - |
dc.contributor.localauthor | 이윤준 | - |
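
The abstract says only that the sampling method, built on a quantile system, draws more samples from intervals with high arrival rates. The following is a minimal sketch of that idea under stated assumptions. It is not the thesis's quantile-based algorithm. It assumes fixed-length time intervals, splits a sample budget across intervals in proportion to their arrival counts (largest-remainder rounding), and samples uniformly within each interval. The names `allocate_budget`, `sample_by_arrival_rate`, `interval_length`, and `budget` are hypothetical. The sketch buffers all events in memory, so unlike a true stream sampler it does not respect a memory bound.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Optional, Sequence, Tuple


def allocate_budget(counts: Sequence[int], budget: int) -> List[int]:
    """Split a sample budget across intervals in proportion to their arrival counts.

    Integer largest-remainder rounding keeps the quotas summing exactly to
    min(budget, total arrivals) without exceeding any interval's count.
    """
    total = sum(counts)
    budget = min(budget, total)
    if total == 0:
        return [0] * len(counts)
    quotas = [c * budget // total for c in counts]
    remainders = [c * budget % total for c in counts]
    # Give the samples lost to flooring to the intervals with the largest remainders.
    leftover = budget - sum(quotas)
    order = sorted(range(len(counts)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas


def sample_by_arrival_rate(
    events: Iterable[Tuple[float, object]],
    interval_length: float,
    budget: int,
    seed: Optional[int] = None,
) -> List[Tuple[float, object]]:
    """Sample (timestamp, item) pairs, drawing more from busier time intervals."""
    # Hypothetical partitioning: group events into fixed-length time intervals.
    buckets = defaultdict(list)
    for ts, item in events:
        buckets[int(ts // interval_length)].append((ts, item))
    keys = sorted(buckets)
    quotas = allocate_budget([len(buckets[k]) for k in keys], budget)
    rng = random.Random(seed)
    sample = []
    for k, q in zip(keys, quotas):
        # Uniform sampling without replacement inside each interval.
        sample.extend(rng.sample(buckets[k], q))
    return sample
```

For instance, with 2 arrivals in the first interval, 8 in the second, and `budget=5`, `allocate_budget([2, 8], 5)` returns `[1, 4]`, so the busier interval contributes four of the five samples. Proportional allocation is only one reading of "samples more data in the interval with high arrival rate"; the thesis's quantile-based weighting may differ.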