FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support

Cited 2 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Noh, Seock-Hwan | ko
dc.contributor.author | Koo, Jahyun | ko
dc.contributor.author | Lee, Seunghyun | ko
dc.contributor.author | Park, Jongse | ko
dc.contributor.author | Kung, Jaeha | ko
dc.date.accessioned | 2023-08-28T08:00:22Z | -
dc.date.available | 2023-08-28T08:00:22Z | -
dc.date.created | 2023-08-28 | -
dc.date.issued | 2023-09 | -
dc.identifier.citation | IEEE TRANSACTIONS ON COMPUTERS, v.72, no.9, pp.2522 - 2535 | -
dc.identifier.issn | 0018-9340 | -
dc.identifier.uri | http://hdl.handle.net/10203/311897 | -
dc.description.abstract | When training deep neural networks (DNNs), expensive floating point arithmetic units are used in GPUs or custom neural processing units (NPUs). To reduce the burden of floating point arithmetic, the community has started exploring the use of more efficient data representations, e.g., block floating point (BFP). The BFP format allows a group of values to share an exponent, which effectively reduces the memory footprint and enables cheaper fixed point arithmetic for multiply-accumulate (MAC) operations (illustrated in the sketch following this table). However, existing BFP-based DNN accelerators target a specific precision, making them less versatile. In this paper, we present FlexBlock, a DNN training accelerator with three BFP modes, possibly different among activation, weight, and gradient tensors. By configuring FlexBlock to a lower BFP precision, the number of MACs handled by the core increases by up to 4x in 8-bit mode or 16x in 4-bit mode compared to 16-bit mode. To reach this theoretical upper bound, FlexBlock maximizes core utilization across various precision levels and layer types, and allows dynamic precision control to keep throughput at its peak without sacrificing training accuracy. We evaluate the effectiveness of FlexBlock using representative DNNs on the CIFAR, ImageNet, and WMT14 datasets. As a result, training in FlexBlock significantly improves training speed by 1.5-5.3x and energy efficiency by 2.4-7.0x compared to other training accelerators. | -
dc.language | English | -
dc.publisher | IEEE COMPUTER SOC | -
dc.title | FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support | -
dc.type | Article | -
dc.identifier.wosid | 001047175700008 | -
dc.identifier.scopusid | 2-s2.0-85149901909 | -
dc.type.rims | ART | -
dc.citation.volume | 72 | -
dc.citation.issue | 9 | -
dc.citation.beginningpage | 2522 | -
dc.citation.endingpage | 2535 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON COMPUTERS | -
dc.identifier.doi | 10.1109/TC.2023.3253050 | -
dc.contributor.localauthor | Park, Jongse | -
dc.contributor.nonIdAuthor | Noh, Seock-Hwan | -
dc.contributor.nonIdAuthor | Koo, Jahyun | -
dc.contributor.nonIdAuthor | Lee, Seunghyun | -
dc.contributor.nonIdAuthor | Kung, Jaeha | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Tensors | -
dc.subject.keywordAuthor | Hardware | -
dc.subject.keywordAuthor | Arithmetic | -
dc.subject.keywordAuthor | Parallel processing | -
dc.subject.keywordAuthor | Deep learning | -
dc.subject.keywordAuthor | Scalability | -
dc.subject.keywordAuthor | Block floating point | -
dc.subject.keywordAuthor | DNN training accelerator | -
dc.subject.keywordAuthor | low precision training | -
dc.subject.keywordAuthor | precision scalability | -
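The abstract above describes block floating point (BFP): a block of values shares a single exponent, so each value carries only a small signed fixed-point mantissa and MAC operations reduce to integer multiply-accumulates. The following is a minimal NumPy sketch of that idea, not FlexBlock's actual datapath; the 16-element block size and 8-bit mantissa width are illustrative assumptions, and the paper's 4/8/16-bit modes and dynamic precision control are not modeled.

```python
import numpy as np

def to_bfp(x, mantissa_bits=8, block_size=16):
    """Quantize a 1-D array to block floating point (BFP): every block of
    `block_size` values shares one exponent, and each value keeps only a
    signed fixed-point mantissa of `mantissa_bits` bits."""
    x = np.asarray(x, dtype=np.float64)
    x = np.pad(x, (0, (-len(x)) % block_size))   # pad up to a whole block
    blocks = x.reshape(-1, block_size)

    # Shared exponent per block, chosen so the largest magnitude in the
    # block fits into the signed mantissa range.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    exp = np.where(max_abs > 0, np.ceil(np.log2(max_abs + 1e-38)), 0.0)
    scale = 2.0 ** (exp - (mantissa_bits - 1))

    mant = np.clip(np.round(blocks / scale),
                   -(2 ** (mantissa_bits - 1)),
                   2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mant, exp.astype(np.int32)

def bfp_dot(mant_a, exp_a, mant_b, exp_b, mantissa_bits=8):
    """Per-block dot product in BFP: integer multiply-accumulate on the
    mantissas, plus one exponent addition and one final scaling per block."""
    acc = np.sum(mant_a.astype(np.int64) * mant_b.astype(np.int64), axis=1)
    shift = (exp_a + exp_b).squeeze(-1) - 2 * (mantissa_bits - 1)
    return acc * (2.0 ** shift)

# Usage: one 16-element block, compared against the float64 reference.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(16), rng.standard_normal(16)
ma, ea = to_bfp(a)
mb, eb = to_bfp(b)
print(float(bfp_dot(ma, ea, mb, eb)[0]), float(np.dot(a, b)))
```

The final lines compare one BFP dot product against the float64 reference; the memory saving the abstract refers to comes from storing one exponent per block instead of one per value, and the cheaper arithmetic comes from the integer multiply-accumulates on the mantissas.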
Appears in Collection
CS-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.