Managing interference and scheduling deep learning tasks on consolidated GPU computing environment
(Korean title: 다중 GPU 환경에서의 딥 러닝 작업을 위한 성능 간섭 제어 및 작업 배치 기법 연구 - A study on performance-interference control and task-placement techniques for deep-learning tasks in multi-GPU environments)

Abstract
Deep-learning applications have gained popularity among service providers who wish to offer cognitive services to consumers. As deep learning has grown, warehouse-scale servers have been developed and studied to guarantee the Quality-of-Service (QoS) and throughput of deep-learning tasks. Deep-learning tasks such as training and inference are compute-intensive and are accelerated by exploiting their inherent parallelism, so deep-learning applications are executed on warehouse-scale servers with GPUs as a means of acceleration. While deep-learning applications and related frameworks have been developed to make use of clustered resources, maximizing the utilization of a limited amount of resources remains a key challenge. When there are not enough tasks to fully utilize the resources, servers are underutilized, and the problem persists when the running tasks themselves underutilize the GPU. To fully utilize server resources, deep-learning tasks must be executed in a consolidated computing environment where multiple tasks can run on multiple devices and flexibly use the available resources. However, the main problem in consolidated environments arises when latency-sensitive (LS) tasks receive interference from co-located tasks; the resulting performance degradation is especially severe for compute-intensive deep-learning tasks. To prevent this degradation, this dissertation proposes an adaptive control method that adjusts the duration for which tasks co-located with an LS task are halted. Since the main source of interference is queuing time, adjusting how long co-located tasks are halted is effective. The ratio of halting time to total execution time is determined adaptively from the performance degradation of the last LS task on the GPU. Additionally, an imbalanced workload also hurts performance, so to keep the workload balanced among GPUs we propose greedy task scheduling and task migration. For evaluation, we prepared a machine with multiple GPUs, and Caffe implementations of Convolutional Neural Networks (CNNs) were used as benchmarks throughout this study. The experimental results show that the proposed control method prevents performance degradation and provides bandwidth for batch tasks by sharing resources effectively.
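
As a rough illustration of the two mechanisms the abstract describes, the Python sketch below shows (1) a controller that adaptively tunes the halting ratio of co-located batch tasks from the measured slowdown of the last LS task, and (2) greedy placement of tasks onto the least-loaded GPU. This is a minimal sketch, assuming a simple step-wise update rule and a 5% slowdown target; all names, signatures, and constants are illustrative assumptions, not the thesis's actual implementation.

    # Hypothetical sketch of the adaptive halting-ratio control and greedy
    # GPU scheduling described in the abstract. Update rule, names, and
    # constants are assumptions, not the thesis's actual implementation.

    class HaltRatioController:
        """Per-GPU controller for the ratio of halting time to total
        execution time of batch tasks co-located with an LS task."""

        def __init__(self, target_slowdown=1.05, step=0.05):
            self.halt_ratio = 0.0          # fraction of time batch tasks are halted
            self.target = target_slowdown  # tolerated LS slowdown (assumed 5%)
            self.step = step               # adjustment granularity per update

        def update(self, ls_latency, solo_latency):
            """Adapt the halt ratio to the degradation of the last LS task."""
            slowdown = ls_latency / solo_latency
            if slowdown > self.target:
                # LS task suffered queuing interference: halt batch tasks longer.
                self.halt_ratio = min(1.0, self.halt_ratio + self.step)
            else:
                # LS task met its target: return GPU bandwidth to batch tasks.
                self.halt_ratio = max(0.0, self.halt_ratio - self.step)
            return self.halt_ratio


    def greedy_schedule(task_load, gpu_loads):
        """Place an incoming task on the GPU with the smallest current load,
        keeping the workload balanced across GPUs."""
        gpu = min(range(len(gpu_loads)), key=lambda g: gpu_loads[g])
        gpu_loads[gpu] += task_load
        return gpu


    if __name__ == "__main__":
        loads = [0.0, 0.0]                 # two GPUs, initially idle
        print(greedy_schedule(0.3, loads)) # -> 0 (first GPU was least loaded)
        print(greedy_schedule(0.2, loads)) # -> 1 (second GPU is now less loaded)

        ctrl = HaltRatioController()
        # A 1.2x slowdown exceeds the 1.05 target, so the halt ratio grows.
        print(ctrl.update(ls_latency=12.0, solo_latency=10.0))  # -> 0.05

In this sketch the controller reacts only to the last LS task's slowdown, mirroring the abstract's description; a production scheduler would also trigger task migration when the per-GPU loads drift apart.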
Advisors
Huh, Jae Hyuk (허재혁)
Description
KAIST : School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2017
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Computing, 2017.8, [iv, 30 p.]

Keywords

GPU; Deep Learning; Neural Network; Performance Isolation; Scheduling

URI
http://hdl.handle.net/10203/243456
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=718736&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
