I/O acceleration schemes for SSD swap memory and decoding cache in DL-based image processing
(Korean title: A study on I/O management for the performance and fairness of SSD swap memory and deep learning-based image processing systems)

With the advancement of big data processing, HPC, and deep learning-based AI, applications such as autonomous driving, disaster prediction, face recognition, drone control, and satellite image analysis, which were infeasible in earlier computing environments, are now being realized with high accuracy. To achieve their goals accurately, these workloads must process large volumes of input data while guaranteeing real-time responsiveness. However, I/O performance, which governs how quickly data moves to and from storage, urgently needs improvement relative to rapidly growing computational power. SSDs, with sequential read and write speeds of several GB/s, have largely overcome the roughly 150 MB/s throughput of HDDs, but they remain dozens of times slower than DRAM main memory, whose bandwidth is on the order of 50 GB/s, and thus become a major bottleneck for overall system performance. Computation on multi-core CPUs and GPGPUs with thousands of cores scales through parallel processing, but unless the I/O bottleneck is resolved, overall system performance gains remain limited. I/O frameworks such as NVIDIA's DALI have recently been proposed to address this, but they apply only to deep learning tasks running on NVIDIA GPUs and suffer fairness problems in multi-GPU environments.

Meanwhile, supporting these many cores requires a large data processing space. DRAM, the conventional main memory, is difficult to expand indefinitely because of limits in process miniaturization, power consumption, and price per capacity. As core counts keep rising to compensate for clock speeds stagnating under heat and power constraints, the capacity problem of DRAM main memory is intensifying. Fairness between concurrently running tasks is also an urgent problem, because starvation can occur in which the progress of a particular task slows down unpredictably; in a cloud computing environment, for example, only some tasks of users on the same billing plan may be slowed down.

In this dissertation, we propose I/O acceleration techniques to solve these problems and study a shared resource allocation method that guarantees fairness while bridging the gap between computational power and data processing performance. First, we propose a technique that uses the SSD as storage-class memory to solve the capacity expansion problem of DRAM main memory in general-purpose computing. The SSD serves as swap memory alongside DRAM, forming an inexpensive and power-efficient hybrid memory, and we resolve the performance and fairness problems that arise because the existing swap mechanism is optimized for HDDs. Applying the conventional swap scheme directly to an SSD lowers the hit rate and degrades performance, so we manage the swap I/O size adaptively by exploiting the performance characteristics of the SSD. In addition, we propose a swap I/O equalization technique to prevent the starvation and loss of fairness caused by the imbalanced frequency of page faults during swap operation.
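To make the adaptive swap I/O sizing concrete, below is a minimal Python sketch of one plausible policy; the class name, thresholds, and window bounds are illustrative assumptions, not the dissertation's actual kernel mechanism. The idea is to grow the read-ahead window while prefetched pages are being hit and to shrink it when they go unused, so swap requests stay at sizes the SSD serves efficiently.

```python
# Minimal sketch (hypothetical policy, not the dissertation's algorithm):
# adapt the swap read-ahead window from the observed prefetch hit rate.

class AdaptiveSwapReadahead:
    def __init__(self, min_pages=1, max_pages=64):
        self.min_pages = min_pages
        self.max_pages = max_pages   # SSDs serve large sequential reads cheaply
        self.window = min_pages      # current read-ahead size in pages
        self.prefetched = 0          # pages prefetched in the current epoch
        self.hits = 0                # prefetched pages that were actually used

    def record(self, page_was_used):
        # Called when a prefetched page is referenced, or evicted unused.
        self.prefetched += 1
        if page_was_used:
            self.hits += 1

    def next_window(self):
        # Re-evaluate once a full window's worth of feedback has arrived.
        if self.prefetched >= self.window:
            hit_rate = self.hits / self.prefetched
            if hit_rate > 0.5:     # prefetches pay off: fetch bigger chunks
                self.window = min(self.window * 2, self.max_pages)
            elif hit_rate < 0.25:  # mostly wasted I/O: back off
                self.window = max(self.window // 2, self.min_pages)
            self.prefetched = self.hits = 0
        return self.window
```

Keeping such feedback state per task rather than globally is one way the equalization goal might be expressed: a task whose page faults cluster heavily could then be throttled to its fair share of the swap device instead of starving others.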
Second, a DRAM-SSD cache that accelerates data preprocessing in deep learning-based image training is proposed to relieve the I/O bottleneck. Because image training datasets span hundreds of GB, they cannot be held in main memory due to its capacity limits. Even with existing I/O frameworks, the data required for training must be regenerated repeatedly in every epoch, and this time occupies a large part of the entire training process. To solve this, we propose a DRAM-SSD decoding cache together with an algorithm for prefetching into this cache (a conceptual sketch follows below).

This dissertation presents performance evaluations of the proposed techniques. System performance improves significantly over the traditional Linux swap mechanism, and the performance imbalance between tasks is greatly reduced, preventing starvation. In deep learning image training workloads, the image decoding cache substantially reduces the GPU's waiting time for input data, which shortens training time. The proposed techniques are judged to sustain improved performance even under the huge gap between I/O and computational performance.
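The decoding cache can be pictured with the following minimal Python sketch; the class, file layout, LRU policy, and background prefetcher are hypothetical illustrations, not the dissertation's implementation. Images are decoded once, stored as raw arrays on the SSD, and served from a small DRAM tier, so later epochs skip the expensive JPEG decode.

```python
# Minimal sketch (hypothetical design, not the dissertation's implementation):
# a two-tier DRAM/SSD cache of decoded training images with a naive prefetcher.

import os
import threading
from collections import OrderedDict

import numpy as np
from PIL import Image

class DecodingCache:
    def __init__(self, ssd_dir="/mnt/ssd/decode_cache", dram_items=1024):
        self.ssd_dir = ssd_dir        # assumed SSD mount point
        self.dram = OrderedDict()     # DRAM tier: LRU map of decoded arrays
        self.dram_items = dram_items
        os.makedirs(ssd_dir, exist_ok=True)

    def _ssd_path(self, key):
        return os.path.join(self.ssd_dir, key.replace("/", "_") + ".npy")

    def get(self, image_path):
        if image_path in self.dram:   # DRAM hit: no I/O, no decode
            self.dram.move_to_end(image_path)
            return self.dram[image_path]
        ssd_path = self._ssd_path(image_path)
        if os.path.exists(ssd_path):  # SSD hit: raw read, decode skipped
            arr = np.load(ssd_path)
        else:                         # miss: decode once, persist to SSD
            arr = np.asarray(Image.open(image_path).convert("RGB"))
            np.save(ssd_path, arr)
        self.dram[image_path] = arr
        if len(self.dram) > self.dram_items:
            self.dram.popitem(last=False)  # evict least recently used
        return arr

    def prefetch(self, image_paths):
        # Warm the cache for the next batch in the background so the GPU
        # does not stall on decode I/O (a simplistic stand-in for the
        # prefetching algorithm described in the abstract).
        t = threading.Thread(
            target=lambda: [self.get(p) for p in image_paths], daemon=True)
        t.start()
        return t
```

Caching decoded pixels trades SSD capacity for decode time; since SSD sequential reads run at several GB/s while JPEG decoding is CPU-bound, rereading raw arrays is typically the cheaper path on repeated epochs.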
Advisors
Youn, Chan-Hyun (윤찬현); Park, Kyu Ho (박규호)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2021.2, [iv, 98 p.]

Keywords

flash memory; SSD; operating system; deep learning; image processing

URI
http://hdl.handle.net/10203/295677
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956630&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
