Super Floating-Point (SuFP): Multi-region piecewise quantization with scalable bias

Deep Neural Networks (DNNs) are transforming numerous fields, but as they do so, the size of these models and their computational requirements are growing at an exponential rate. In response to these challenges, various quantization techniques have emerged as highly effective solutions. However, quantization methods that use conventional data types, such as integer or floating-point, face limitations in balancing accuracy loss against computational benefit. With the advent of hardware accelerator design for AI processing, quantization research has entered a new phase: custom data types and specialized hardware have emerged as innovative alternatives. In particular, piecewise quantization and block floating-point quantization show notable performance and efficiency improvements, but they still struggle to handle outliers with large dynamic ranges. To solve this issue, we introduce Super Floating-Point (SuFP), a breakthrough data type and quantization method that improves both memory footprint and logic efficiency without compromising model accuracy. The key idea of SuFP is multi-region piecewise quantization with a tensor-wise scalable bias: each region can be configured with an optimized precision, capturing both the dense near-zero data and the outliers. In addition, the scalable bias adapts flexibly to diverse data distributions while requiring only a single addition operation at the tensor level. Furthermore, the hardware tailored for SuFP employs only integer arithmetic units and shifters, enabling a highly compact realization. Our experimental results show that SuFP quantization achieves accuracy on par with, and in some cases exceeding, full-precision floating-point (FP32) across vision, language, and generative model benchmarks. SuFP improves computational capability by 9.00× and energy efficiency by 17.04× over FP32 implementations. These improvements are notable compared with the state-of-the-art MSFP and BSFP, which achieve up to 7.20× and up to 8.27×, respectively.
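
For intuition, the sketch below illustrates the general idea of multi-region piecewise quantization with a tensor-wise scalable bias as described in the abstract. It is a minimal NumPy toy: the region boundaries, per-region bit budgets, and the `sufp_like_quantize` helper are assumptions made for illustration and do not reproduce the thesis's actual SuFP format or its integer-and-shifter hardware datapath.

```python
# Minimal illustrative sketch, NOT the thesis's actual SuFP encoding: the
# region boundaries, bit budgets, and helper names below are assumptions.
import numpy as np

def sufp_like_quantize(tensor, region_edges=(0.125, 0.5), region_bits=(6, 4, 2)):
    """Toy multi-region piecewise quantizer with a tensor-wise scalable bias.

    The whole tensor shares one power-of-two bias (the abstract notes the
    real scalable bias costs only a single addition at the tensor level).
    After rescaling, each element is quantized with a step size chosen by
    the magnitude region it falls into: fine steps for the dense near-zero
    region, coarser steps for the outlier region.
    """
    # Tensor-wise scalable bias: one shared power-of-two scale.
    bias = np.floor(np.log2(np.max(np.abs(tensor)) + 1e-12))
    scaled = tensor / (2.0 ** bias)

    # Assign each element to a magnitude region (boundaries are assumed).
    region = np.digitize(np.abs(scaled), region_edges)

    # Per-region quantization step from the assumed fractional bit budget.
    step = np.array([2.0 ** -b for b in region_bits])[region]
    quantized = np.round(scaled / step) * step

    return quantized * (2.0 ** bias), bias

# Example: mostly dense near-zero values plus a few large outliers.
x = np.concatenate([0.05 * np.random.randn(1000), np.array([25.0, -40.0])])
xq, shared_bias = sufp_like_quantize(x)
```

Under these assumptions, a single power-of-two bias shared across the tensor adapts the quantizer to each tensor's dynamic range, while the per-element work reduces to a region lookup and a rounding step.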
Advisors
김주영
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [iv, 34 p.]

Keywords

Post-training quantization; Piecewise quantization; Block floating-point quantization; Hardware-friendly data type

URI
http://hdl.handle.net/10203/321597
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097169&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
