(An) FPGA-based preprocessing system for GPU-partitioned machine learning inference server

In machine learning inference servers, unlike training servers, inference requests arrive irregularly and must be completed within a limited time. Consequently, operations are performed with small batch sizes, leading to inefficient utilization of GPU resources. Recent GPUs provide partitioning technology that enables efficient use of GPU resources by dividing a single hardware device into independent partitions sized to users' needs. When this technology is applied to inference servers, GPU processing capacity and resource utilization increase, but the CPU-side preprocessing stage that accompanies each inference request becomes a bottleneck. This thesis analyzes the bottleneck points of the preprocessing stage in a GPU-partitioned machine learning inference server and proposes an FPGA-based hardware design that offloads data preprocessing to increase the overall throughput of the ML inference server.
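To make the bottleneck concrete, the sketch below (Python; a hypothetical image-classification serving pipeline, not code from the thesis) shows the per-request CPU work that GPU partitioning multiplies: once one GPU is split into several independent inference instances, each instance demands its own stream of decoded, resized, and normalized tensors from the CPU.

    import io
    import numpy as np
    from PIL import Image

    def preprocess(request_bytes: bytes) -> np.ndarray:
        # Decode, resize, and normalize one inference request on the CPU.
        # Illustrative only: the thesis proposes offloading this stage to an FPGA.
        img = Image.open(io.BytesIO(request_bytes)).convert("RGB")
        img = img.resize((224, 224))                   # spatial resize
        x = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        x = (x - mean) / std                           # per-channel normalization
        return np.transpose(x, (2, 0, 1))              # HWC -> CHW layout

Moving this stage off the CPU, as the thesis proposes, removes the CPU from the per-request critical path so that aggregate throughput can scale with the number of GPU partitions.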
Advisors
유민수
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iv, 25 p.]

Keywords

CPU; GPU; FPGA; Machine learning inference server; GPU partitioning technique

URI
http://hdl.handle.net/10203/321605
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097177&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
