Efficient time-series subsequence matching using duality in constructing windows (Korean title: 윈도우를 구성하는 방법의 이원성을 이용한 효율적인 시계열 서브시퀀스 매칭)

Subsequence matching in time-series databases is an important problem in data mining and has attracted considerable research interest. The problem is to find the data sequences that contain subsequences similar to a given query sequence, together with the offsets of those subsequences in the original data sequences. In this dissertation, we first propose a new subsequence matching method, Dual Match, which exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows; it is thus the dual of the approach by Faloutsos et al. (FRM for short), which divides data sequences into sliding windows and the query sequence into disjoint windows. We formally prove that our dual approach is correct, i.e., it incurs no false dismissal. We also prove that, given the minimum query length, there is an upper bound on the window size that guarantees the correctness of Dual Match, and we discuss the effect of the window size on performance. FRM causes many false alarms (i.e., candidates that do not qualify) because, to save index storage space, it stores minimum bounding rectangles (MBRs) rather than the individual points representing windows. Storing only MBRs causes false alarms because it does not allow point-to-point comparison when checking distances; we call the benefit of such comparison the point-filtering effect. Dual Match solves this problem by storing points directly without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides a large improvement over FRM in both false alarms and performance given the same amount of storage space. In particular, for low selectivities (less than $10^{-4}$), Dual Match drastically reduces the number of candidates, down to as little as $\frac{1}{8800}$ of that for FRM, reduces the number of page accesses by up to 26.9 times, and improves performance up to 430-fold. On the oth...
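The window duality described in the abstract can be illustrated with a minimal in-memory sketch. This is not the dissertation's implementation: it uses no index and no dimensionality-reducing transform, and all function names (`euclidean`, `max_window_size`, `dual_match_sketch`) are hypothetical. Two assumptions are made here rather than taken from the abstract. First, the per-window threshold is simply the query tolerance `eps`, which is safe because the distance over one window can never exceed the distance over the whole subsequence. Second, the window-size limit `(min_query_len + 1) // 2` comes from a plain alignment argument: a query of length L is guaranteed to cover one full disjoint window of size w at any alignment only if L >= 2w - 1.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_window_size(min_query_len):
    """Largest w with 2*w - 1 <= min_query_len (assumed alignment argument)."""
    return (min_query_len + 1) // 2


def dual_match_sketch(data, query, eps, w):
    """Return (matches, candidates) as offsets into `data`.

    Data is cut into disjoint windows, the query into sliding windows.
    Every window pair within `eps` proposes one candidate offset, and each
    candidate is then verified against the full query (post-processing).
    """
    n, m = len(data), len(query)
    if w < 1 or 2 * w - 1 > m:
        raise ValueError("window size must satisfy 1 <= w and 2*w - 1 <= len(query)")

    # Sliding windows of the query: one per offset j.
    query_windows = [(j, query[j:j + w]) for j in range(m - w + 1)]

    candidates = set()
    # Disjoint windows of the data: offsets 0, w, 2w, ...
    for start in range(0, n - w + 1, w):
        data_window = data[start:start + w]
        for j, query_window in query_windows:
            offset = start - j  # where the whole query would have to begin
            if 0 <= offset <= n - m and euclidean(data_window, query_window) <= eps:
                candidates.add(offset)

    # Post-processing: discard false alarms by checking the full distance.
    matches = sorted(o for o in candidates if euclidean(data[o:o + m], query) <= eps)
    return matches, candidates


if __name__ == "__main__":
    data = [0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 1, 2, 3, 4, 5, 0]
    query = [1, 2, 3, 4, 5]
    w = max_window_size(len(query))
    matches, candidates = dual_match_sketch(data, query, eps=0.5, w=w)
    print("window size:", w)
    print("candidates:", sorted(candidates))
    print("matches:", matches)
```

In this sketch, candidates that fail the final check correspond to the false alarms mentioned in the abstract, and comparing individual windows directly, rather than bounding rectangles, is a rough analogue of the point-filtering effect.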
Advisors
Whang, Kyu-Young (황규영)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2001
Identifier
169620/325007 / 000975106
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science, 2001.8, [ x, 96 p. ]

Keywords

Similarity Search; Data Mining; Time-Series Data; Subsequence Matching; Dual Match

URI
http://hdl.handle.net/10203/33349
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=169620&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
