A study on variational inference-based deep learning algorithms for multi-source data with dataset shift

With the rapid advancement of deep learning in recent years, services across many industries are becoming more intelligent. Time series prediction and anomaly detection are key modules for building such services. In the power sector, for example, time series prediction is essential for accurately forecasting the future power consumption of individual buildings, and anomaly detection is used to improve the integrity and reliability of the power supply network. In maritime surveillance systems, time series prediction is used to forecast ship trajectories, and anomaly detection is used to automatically detect illegal vessels and accidents, thereby enhancing maritime security. These applications serve not only as standalone solutions but also as essential modules for control and management in many industrial domains. As a result, deep learning models for time series prediction and anomaly detection that account for the characteristics of each type of data have been researched extensively. However, training and deploying these models effectively in real-world services requires consideration of the multi-source environment. Traditional deep neural network models assume that the training data are drawn from a single distribution. In real services, however, training uses data collected from various sensors and client environments, and the service itself must account for each data source independently. In such cases, dataset shift can occur. Dataset shift is a broad term covering distribution discrepancies between data sources, distribution shifts over time, and covariate changes that depend on environmental characteristics.
This dissertation addresses the distribution discrepancy between datasets collected from various sources and the distribution shift between the training and testing phases. To solve these problems, it proposes deep learning techniques for time series prediction and anomaly detection that account for the distribution discrepancies of data collected from multiple sources. It also proposes techniques to address the dataset shift that occurs after a model is deployed to edge devices. Chapter 3 proposes a deep learning technique that addresses the distribution discrepancy and distribution shift of multi-source data in a time series power load prediction model. With the recent expansion of Advanced Metering Infrastructure (AMI) and smart meter installations, the power consumption of individual buildings is measured periodically at the terminal and transmitted to the central cloud. There, the power consumption of individual buildings is predicted from power data collected in real time from multiple buildings, and the predictions are used for power supply and trading. The data used by the load prediction model come from various sources, such as smart meters, with different characteristics. In addition, because changes in the source data distribution during the testing phase can degrade prediction performance, deep learning techniques that account for such changes are required. To address these issues, Chapter 3 proposes a distribution estimation technique that uses variational inference and real-time time series clustering models, along with group-based model personalization built on the estimated distributions.
This technique considers not only the distribution discrepancy among multiple sources but also the temporal distribution changes of single-source data, yielding prediction performance that is robust to concept drift. Chapter 4 proposes an anomaly detection technique that accounts for the distribution discrepancy of ship trajectory data collected from the Automatic Identification System (AIS) terminals of multiple ships in the maritime environment. Anomaly detection is an application in which estimating the data distribution is inherently crucial. This dissertation shows that the conditional distribution of ship route data varies across sources, and proposes a deep learning technique that detects anomalous ships based on a more precise estimate of the normal data distribution. Specifically, the latent route condition variables of ships are estimated, and a model is proposed that uses variational inference to evaluate the normality of a given route under the estimated conditional data distribution. Additionally, to ensure reliable maritime surveillance, a ship trajectory estimation technique is proposed that combines two heterogeneous datasets, AIS and satellite imagery, and thus accounts for the data characteristics of multiple objects, in this case ships. This demonstrates that the accuracy of deep network-based services can be improved by considering the characteristics of multiple objects. Chapter 5 addresses a different setting: rather than data from multiple sources being transmitted to the central cloud, a model trained on the server is deployed to terminals, where the data distribution differs from that seen in training. For this setting, methods are proposed to adapt the deployed model to the distribution of user data even when labels are unavailable.
Specifically, the model is assumed to be trained on the server with multi-source data, and learning techniques are proposed to overcome a limitation of existing studies, which require source information for data collected from multiple sources. Unlike previous approaches, in which the model size grows with the number of source domains in the dataset, the proposed technique adapts the model regardless of the number of source domains, making it suitable for adaptation on terminal devices. In summary, this dissertation proposes deep learning techniques that account for distribution shift within a dataset when data collected from multiple sources are used to train a model. The focus is on two applications, time series prediction and anomaly detection, in which distribution changes and distribution estimation are crucial. The proposed learning techniques consider not only model accuracy but also resource management efficiency from the perspective of service providers operating the central server. The result is a set of robust learning techniques that address concept drift and covariate shift, two common dataset shift problems. In particular, performance improvements are confirmed in anomaly detection applications, which are sensitive to the data distribution, when model training accounts for distribution discrepancies within the dataset.
Advisors
윤찬현
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.8, [ix, 135 p.]

Keywords

dataset shift; deep learning personalization; individual load forecasting; vessel trajectory anomaly detection; unsupervised domain adaptation; test time adaptation

URI
http://hdl.handle.net/10203/320950
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047246&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
