Robust and efficient uniform sampling method for kernel-based algorithms
Korean title: 커널 기반 알고리즘을 위한 균일 표집법 (Uniform sampling method for kernel-based algorithms)

The importance of data mining and management grows as vast amounts of data accumulate. However, traditional data mining algorithms, such as clustering and locality-sensitive hashing, do not scale well in the presence of large and complex data sets. This is primarily because a custom similarity function, known as the “kernel function”, is used to compute distances between data points with complex data types and non-linear relationships. To use kernelization, the original data mining algorithm must be re-formulated in terms of inner products between the given data points. Unfortunately, such a re-formulation comes at the cost of expensive operations such as an eigen-decomposition of a large similarity matrix. In this work, we show how uniform sampling among the given data points can be used to reduce the high computational complexity of kernelized data mining algorithms. In particular, we focus on three major algorithms used in data mining and retrieval: kernel k-means, kernel principal component analysis (KPCA), and locality-sensitive hashing (LSH). For kernel k-means clustering, we use uniform sampling to compute a (1 + n^{-δ})-approximation of the per-iteration cost with complexity O(n^{1+δ}), for any δ ∈ (0, 1). For KPCA, we lower the complexity to O(kn^{1+δ} + k^3), where k is the number of principal components, and prove that the reduced-size problem we solve is spectrally equivalent to the original KPCA problem. For the LSH algorithm we present, we address an additional issue of distribution-sensitivity, where the query time and accuracy vary depending on the underlying distribution of the data. We show that Voronoi-partitioning the data set around centers chosen uniformly at random yields stable and fast query time while maintaining high accuracy with fewer hash tables. We also show that our algorithm satisfies the locality-sensitive property. Through extensive experiments, we confirm that our algorithms are ve...
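The Voronoi-partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, only an assumption-laden illustration of a single hash table: centers are drawn uniformly at random from the data, and each point is hashed to the index of its nearest center under the kernel-induced distance d²(x, c) = k(x, x) − 2k(x, c) + k(c, c). The RBF kernel, the `gamma` value, and all function names (`rbf_kernel`, `kernel_sq_distance`, `voronoi_hash`, `build_voronoi_table`) are hypothetical choices made for this example.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice is an assumption for illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_sq_distance(x, c, kernel):
    """Squared distance between x and c in the kernel-induced feature space."""
    return kernel(x, x) - 2.0 * kernel(x, c) + kernel(c, c)


def voronoi_hash(x, centers, kernel):
    """Hash value = index of the nearest center, i.e. the Voronoi cell containing x."""
    return min(range(len(centers)),
               key=lambda j: kernel_sq_distance(x, centers[j], kernel))


def build_voronoi_table(points, num_centers, kernel, seed=0):
    """Pick centers uniformly at random and bucket every point by its nearest center."""
    rng = random.Random(seed)
    center_ids = rng.sample(range(len(points)), num_centers)  # uniform, no repeats
    centers = [points[i] for i in center_ids]
    buckets = {j: [] for j in range(num_centers)}
    for idx, p in enumerate(points):
        buckets[voronoi_hash(p, centers, kernel)].append(idx)
    return centers, buckets


# Hypothetical usage: candidates for a query are the points sharing its cell.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
centers, buckets = build_voronoi_table(points, num_centers=2, kernel=rbf_kernel)
candidates = buckets[voronoi_hash((0.05, 0.0), centers, rbf_kernel)]
```

In this sketch a query only compares against the sampled centers and then against the points in one cell, which is the source of the speed-up; a practical scheme would presumably combine several such tables, and the number of centers per table is a tuning parameter not specified here.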
Advisors
Jung, Kyo-Min (정교민)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
566042/325007  / 020097003
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2013.8, [ vi, 61 p. ]

Keywords

kernel; data mining; uniform sampling; approximation; hashing

URI
http://hdl.handle.net/10203/197805
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=566042&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
