Workload characterization for GPU partitioning-based machine learning inference server

Maximizing the utilization of graphics processing unit (GPU) resources is a crucial factor in directly reducing a data center's total cost of ownership (TCO). To address this, GPU partitioning technology has been developed, enabling the simultaneous execution of multiple workloads by dividing a single GPU. However, the characteristics of real-world machine learning inference systems that apply GPU partitioning technology have not been actively studied. This thesis characterizes workloads that utilize GPU partitioning to enhance the efficiency of machine learning inference systems, focusing on resource utilization, throughput, and latency under GPU partitioning. Based on the characterization results, we propose an efficient batching system for GPU partitioning-based machine learning inference that further optimizes overall system performance.
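The batching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual system; the class name, parameters, and policy (fill a batch up to a size cap or until a short timeout expires, whichever comes first) are illustrative assumptions about how a dynamic batcher in front of a GPU partition might work:

```python
import queue
import time

class DynamicBatcher:
    """Illustrative dynamic batcher: collects individual inference
    requests into batches bounded by a maximum batch size and a
    maximum wait time (batching timeout)."""

    def __init__(self, max_batch_size=8, timeout_s=0.005):
        self.max_batch_size = max_batch_size
        self.timeout_s = timeout_s
        self._queue = queue.Queue()

    def submit(self, request):
        # Called by the serving frontend for each incoming request.
        self._queue.put(request)

    def next_batch(self):
        """Block until at least one request arrives, then keep pulling
        requests until the batch is full or the timeout expires."""
        batch = [self._queue.get()]  # wait for the first request
        deadline = time.monotonic() + self.timeout_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._queue.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

# Usage: submit ten requests, then drain them in batches of at most 4.
batcher = DynamicBatcher(max_batch_size=4, timeout_s=0.01)
for i in range(10):
    batcher.submit(i)
sizes = [len(batcher.next_batch()) for _ in range(3)]
# sizes == [4, 4, 2]
```

In a partitioned-GPU setting, one such batcher could sit in front of each GPU partition; the batch-size cap and timeout then become per-partition tuning knobs that trade latency against throughput, which is the design space the abstract's characterization addresses.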
Advisors
유민수 (Minsoo Rhu)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [iv, 25 p.]

Keywords

Machine learning; GPU partitioning; Machine learning inference; Batching

URI
http://hdl.handle.net/10203/321777
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097302&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
