Efficient and accurate eigen-decomposition of large-scale PSD matrices via sample subspace compression

The Nyström method is a sampling-based method for spectral decomposition of positive semi-definite (PSD) matrices, and is widely used in kernel-based machine learning for large-scale data sets. Since its introduction, there has been a large body of work that improves the approximation accuracy while maintaining computational efficiency. In this paper, we present novel Nyström schemes that improve both accuracy and efficiency based on a new theoretical analysis. We first prove that the One-shot Nyström Method (ONM), one of the existing Nyström methods, solves the sample-based kernel PCA problem given the sample subspace, and suggest that a subspace distance measure is important for the accuracy of Nyström methods. We then prove novel upper error bounds based on the subspace distance measure, and propose Principal Subspace Approximation (PSA) sampling, which minimizes our error bounds based on the notion of compressing sample matrices with sparse representations. By combining ONM and PSA sampling, we present our Double Nyström Method (DNM), which efficiently reduces the size of the decomposition problem in two stages. We report the results of extensive experiments that provide a detailed comparison of various sampling strategies and our PSA sampling, and show that PSA sampling is superior, in terms of both accuracy and efficiency, even to sampling strategies that use clustering algorithms. We also demonstrate that DNM is highly efficient and accurate compared to other state-of-the-art Nyström methods for large-scale data sets.

Next, we generalize DNM and present the Nested Nyström Method (NNM), a multilayer method based on a nested sequence of subsamples and multiple compressions. To compute the spectral decomposition of a PSD matrix, it compresses the sample matrices, solves a smaller-sized optimization problem, and updates the eigenspace on each layer. We prove that its upper error bound decreases as additional layers are used. Experimental results show that NNM is more accurate than DNM within the same short time.

Finally, we tackle the local triangle counting problem on graph streams by using the Nyström extension. We first derive a local triangle counting algorithm based on the Nyström method, and design MELTING-U, a memory-efficient and accurate local triangle counting algorithm on graph streams. We also propose a fast version of MELTING-U, called MELTING. By using DNM, we show that MELTING-U and MELTING are memory-efficient and more accurate than competitive algorithms on a number of real data sets.
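The methods summarized above all build on the standard single-stage Nyström approximation, which eigendecomposes a small sampled block of the PSD matrix and extends its eigenvectors to the full matrix. A minimal NumPy sketch of that baseline follows; the function name, toy matrix, and rank are illustrative, not the thesis code:

```python
import numpy as np

def nystrom(K, idx, k):
    """Rank-k Nystrom approximation K ~= U @ np.diag(s) @ U.T.

    K   : (n, n) PSD matrix
    idx : indices of the m sampled rows/columns (m << n)
    k   : target rank (k <= m)
    """
    C = K[:, idx]                          # n x m  sampled columns
    W = K[np.ix_(idx, idx)]                # m x m  intersection block
    s, V = np.linalg.eigh(W)               # eigendecompose the small block
    s, V = s[::-1][:k], V[:, ::-1][:, :k]  # keep the k largest eigenpairs
    U = (C @ V) / s                        # extend eigenvectors to all n points
    return U, s                            # U diag(s) U^T equals C W_k^+ C^T

# Toy check: for a PSD matrix of exact rank r, sampling columns whose
# intersection block has the same rank makes the approximation exact.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
K = A @ A.T                                # PSD, rank 5
idx = rng.choice(50, size=10, replace=False)
U, s = nystrom(K, idx, k=5)
K_hat = (U * s) @ U.T
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

The exactness in the toy check is the standard Nyström identity: when the sampled block `W` captures the full range of `K`, the reconstruction `C W_k^+ C^T` recovers `K`; on full-rank kernel matrices the same formula instead yields a low-rank approximation whose error the thesis bounds via subspace distances.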
Advisors
Bae, Doo Hwan; Park, Haesun
Description
KAIST, School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2017
Identifier
325007
Language
eng
Description

Doctoral thesis - KAIST, School of Computing, 2017.8, [vi, 69 p.]

Keywords

Positive Semi-Definite Matrix; Eigen-decomposition; Nyström Method; Large-Scale Learning; Kernel Methods; Low-Rank Approximation

URI
http://hdl.handle.net/10203/242098
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=718889&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
