Cooperative Distributed GPU Power Capping for Deep Learning Clusters

Cited 5 times in Web of Science · Cited 0 times in Scopus
  • Hit : 1480
  • Download : 0
DC Field | Value | Language
dc.contributor.author | Kang, Dong-Ki | ko
dc.contributor.author | Ha, Yungi | ko
dc.contributor.author | Peng, Limei | ko
dc.contributor.author | Youn, Chan-Hyun | ko
dc.date.accessioned | 2022-02-25T06:41:08Z | -
dc.date.available | 2022-02-25T06:41:08Z | -
dc.date.created | 2021-09-09 | -
dc.date.issued | 2022-07 | -
dc.identifier.citation | IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, v.69, no.7, pp.7244 - 7254 | -
dc.identifier.issn | 1557-9948 | -
dc.identifier.uri | http://hdl.handle.net/10203/292394 | -
dc.description.abstract | deep neural network (DNN) models, and high computational complexity. Thus, the traditional power capping methods for CPU-based clusters or small-scale GPU devices do not apply to GPU-based clusters handling DL tasks. This paper develops a cooperative distributed GPU power capping (CD-GPC) system for GPU-based clusters, aiming to minimize the training completion time of invoked DL tasks without exceeding the limited power budget. Specifically, we first design a frequency scaling (FS) approach using online model estimation based on the recursive least squares (RLS) method. This approach achieves accurate tuning of DL task training time and GPU power usage without requiring offline profiling. Then, we formulate the proposed FS problem as a Lagrangian dual decomposition-based economic model predictive control (EMPC) problem for large-scale heterogeneous GPU clusters. We conduct both lab-scale experiments on real NVIDIA GPUs and simulation experiments driven by real job traces for performance evaluation. Experimental results validate that the proposed system improves power capping accuracy to a mean absolute error below 1%, and reduces the deadline violation ratio of invoked DL tasks by 21.5% compared with other recent counterparts. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Cooperative Distributed GPU Power Capping for Deep Learning Clusters | -
dc.type | Article | -
dc.identifier.wosid | 000753527500074 | -
dc.identifier.scopusid | 2-s2.0-85110848368 | -
dc.type.rims | ART | -
dc.citation.volume | 69 | -
dc.citation.issue | 7 | -
dc.citation.beginningpage | 7244 | -
dc.citation.endingpage | 7254 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS | -
dc.identifier.doi | 10.1109/TIE.2021.3095790 | -
dc.contributor.localauthor | Youn, Chan-Hyun | -
dc.contributor.nonIdAuthor | Kang, Dong-Ki | -
dc.contributor.nonIdAuthor | Peng, Limei | -
dc.description.isOpenAccess | N | -
dc.subject.keywordAuthor | Deep learning (DL) cluster | -
dc.subject.keywordAuthor | Economic model predictive control (EMPC) | -
dc.subject.keywordAuthor | GPU power capping | -
dc.subject.keywordAuthor | Lagrangian dual decomposition | -
dc.subject.keywordAuthor | Lipschitz continuity | -
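The abstract above describes online model estimation based on recursive least squares (RLS) for tuning GPU frequency against training time and power usage. As a rough illustration only, and not the authors' implementation, the following Python sketch shows a standard RLS update with a forgetting factor, fitting a hypothetical affine power-versus-frequency model; the class name, feature choice, and numeric values are assumptions made for the example.

import numpy as np

class RLSEstimator:
    """Recursive least squares (RLS) with a forgetting factor (illustrative sketch)."""
    def __init__(self, dim, forgetting=0.98, delta=1e3):
        self.theta = np.zeros(dim)        # estimated model parameters
        self.P = delta * np.eye(dim)      # inverse correlation matrix
        self.forgetting = forgetting      # forgetting factor, 0 < value <= 1

    def update(self, x, y):
        # Fold in one observation: feature vector x, measured output y.
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.forgetting + x @ Px)   # Kalman-style gain vector
        error = y - self.theta @ x               # prediction error on this sample
        self.theta = self.theta + gain * error   # parameter update
        self.P = (self.P - np.outer(gain, Px)) / self.forgetting
        return self.theta

# Hypothetical usage: fit GPU power (W) as an affine function of SM frequency (MHz).
estimator = RLSEstimator(dim=2)
for freq_mhz, power_w in [(1200, 180.0), (1350, 205.0), (1500, 232.0)]:
    estimator.update([freq_mhz, 1.0], power_w)
print(estimator.theta)   # roughly [slope in W/MHz, intercept in W]

With such an online estimate in hand, a controller of the kind the abstract describes could solve the EMPC problem over the predicted power and time models at each step; that optimization layer is beyond this sketch.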
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.