Accelerating similarity- and model-based outlier detection from a data stream

Recent advances in network technologies for cloud computing, the Internet of Things, and 5G mobile communication, combined with advances in hardware technologies for semiconductors and sensors, have facilitated the collection, management, and processing of data streams generated in real time. As a result, there is a growing need across industries for technologies that rapidly extract valuable information from real-time data streams. In particular, outlier detection techniques, which find abnormal data points that deviate significantly from normal data points, are widely used in applications such as finance, manufacturing, and healthcare. This dissertation aims to detect various types of outliers with high accuracy and low latency, mainly by preventing the redundant updates incurred by existing algorithms. There are two representative outlier detection approaches, similarity-based and model-based: the former measures the similarity between data points, whereas the latter learns parameters that explain the properties of data points. This dissertation addresses two main challenges that these approaches face with sliding windows: immediacy and complexity. Immediacy refers to the need to detect outliers quickly while continuously updating either the similarities between data points in sliding windows or a model that reflects the properties of data points. Complexity refers to the differing types of outliers and their related accuracy criteria, such as global, local, and high-dimensional outliers, each of which poses different constraints and objectives for outlier detection performance. Based on similarity measures and models suitable for each type of outlier, this dissertation presents four studies on efficiently updating these similarities and models.
The first study proposes the NETS algorithm, which implements a set-based update of distance-based similarity to reduce redundant computations; it detected global outliers 17 times faster on average than state-of-the-art algorithms. The second study proposes the MDUAL algorithm, which processes multiple, dynamic distance-based outlier detection queries by exploiting the duality of data grouping and query grouping; it detected global outliers 217 times faster on average than state-of-the-art algorithms. The third study proposes the STARE algorithm, which uses density-based similarity to detect local outliers efficiently by skipping stationary regions; it detected local outliers 11 times faster on average than state-of-the-art algorithms. The fourth study proposes ARCUS, a deep learning model-based outlier detection framework that uses a model pooling approach to detect high-dimensional outliers that cannot be easily identified by comparing similarities; it demonstrated higher accuracy and efficiency than state-of-the-art algorithms. This dissertation is expected to bring great value to many real-world applications by resolving the immediacy and complexity challenges of outlier detection in a data stream.
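To make the problem setting concrete, the following is a minimal sketch of naive distance-based outlier detection over a sliding window. It is not NETS, MDUAL, or any other algorithm from the dissertation. It assumes the common definition that a point is an outlier if it has fewer than `k` neighbors within distance `r` in the current window; the function names and the parameters `window_size`, `slide`, `r`, and `k` are hypothetical and chosen for illustration only.

```python
from collections import deque
from math import dist


def window_outliers(window, r, k):
    """Return points in the window with fewer than k neighbors within distance r."""
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        # Count the other points in the window that lie within distance r of p.
        neighbors = sum(
            1 for j, q in enumerate(points) if i != j and dist(p, q) <= r
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def detect_stream(stream, window_size, slide, r, k):
    """Yield the outlier list once the window is full, then after every `slide` points."""
    window = deque(maxlen=window_size)  # the oldest points expire automatically
    for n, point in enumerate(stream, start=1):
        window.append(point)
        if n >= window_size and (n - window_size) % slide == 0:
            # Naive baseline: every pairwise distance is recomputed per slide.
            yield window_outliers(window, r, k)


stream = [(0.0,), (0.1,), (0.2,), (5.0,), (0.3,), (0.15,)]
results = list(detect_stream(stream, window_size=4, slide=2, r=0.5, k=2))
```

Because consecutive windows overlap, this baseline recomputes most distances from scratch on every slide, which illustrates the kind of redundant update the abstract refers to.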
Advisors
Lee, Jae-Gil (이재길)
Description
Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Knowledge Service Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

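Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Knowledge Service Engineering, 2021.2, [v, 107 p.]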

Keywords

outlier; data stream; similarity-based outlier detection; model-based outlier detection

URI
http://hdl.handle.net/10203/295763
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956348&flag=dissertation
Appears in Collection
KSE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
