PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

Cited 71 times in Web of Science · Cited 32 times in Scopus
Abstract
To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8x, 1.4x, and 4.8x improvement in latency, throughput, and SLA satisfaction, respectively.
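The abstract pairs two ideas: hardware mechanisms that let an NPU be preempted, and a scheduling policy that uses predicted execution times to decide when a high-priority inference request should take over the accelerator. As an illustration only, the sketch below shows one plausible shape of such a policy: each task carries a priority and a predicted runtime, preemption happens at checkpoint boundaries, and ties are broken by the shortest predicted remaining time. The class names, fields, checkpoint granularity, and preemption rule here are hypothetical assumptions for the sketch, not the algorithm from the PREMA paper itself.

```python
"""Minimal sketch of a predictive, preemptive NPU scheduler (illustrative only)."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int                 # larger = more latency-critical (assumed convention)
    predicted_runtime_ms: float   # estimate from a latency predictor (assumed to exist)
    executed_ms: float = 0.0

    @property
    def predicted_remaining_ms(self) -> float:
        return max(self.predicted_runtime_ms - self.executed_ms, 0.0)


class PreemptiveNPUScheduler:
    """Picks the next task to run; allows preemption only at checkpoint boundaries."""

    def __init__(self, checkpoint_ms: float = 1.0):
        self.ready: List[Task] = []
        self.running: Optional[Task] = None
        self.checkpoint_ms = checkpoint_ms   # preemption granularity (assumed)

    def submit(self, task: Task) -> None:
        self.ready.append(task)

    def _pick(self) -> Optional[Task]:
        # Highest priority first; break ties with shortest predicted remaining time.
        candidates = self.ready + ([self.running] if self.running else [])
        if not candidates:
            return None
        return min(candidates, key=lambda t: (-t.priority, t.predicted_remaining_ms))

    def step(self) -> Optional[str]:
        """Advance the NPU by one checkpoint interval and report what ran."""
        best = self._pick()
        if best is None:
            return None
        if best is not self.running:
            # Context switch: the preempted task keeps its progress and re-queues.
            if self.running is not None:
                self.ready.append(self.running)
            if best in self.ready:
                self.ready.remove(best)
            self.running = best
        self.running.executed_ms += self.checkpoint_ms
        if self.running.predicted_remaining_ms == 0.0:
            finished, self.running = self.running.name, None
            return f"finished {finished}"
        return f"ran {self.running.name}"


if __name__ == "__main__":
    sched = PreemptiveNPUScheduler()
    sched.submit(Task("batch-resnet", priority=1, predicted_runtime_ms=5.0))
    sched.submit(Task("interactive-bert", priority=3, predicted_runtime_ms=2.0))
    for _ in range(8):
        print(sched.step())
```

In the demo run, the short high-priority "interactive-bert" request finishes before the lower-priority "batch-resnet" job resumes, which illustrates the kind of latency benefit preemptive multi-tasking aims for; the actual mechanisms and policies evaluated in the paper are described in the full text.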
Publisher
IEEE
Issue Date
2020-02-24
Language
English
Citation
26th IEEE International Symposium on High Performance Computer Architecture (HPCA 2020), pp. 220-233
ISSN
1530-0897
DOI
10.1109/HPCA47549.2020.00027
URI
http://hdl.handle.net/10203/276230
Appears in Collection
EE-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.