Efficient NLP executions by exploiting redundancies in transformer-based language models

Abstract
Transformer-based language models have recently become the de facto standard in natural language processing (NLP) applications, driven by significant advances in algorithm performance. These models are deeper and larger than traditional deep neural network (DNN) models, requiring far more weight parameters and computation. This leads to substantial energy consumption and long execution times when running transformer-based language models on resource-constrained mobile devices, ultimately limiting their practical usability. This thesis therefore investigates methods for running transformer-based language models efficiently on diverse hardware platforms.

To this end, the thesis first demonstrates that the execution of transformer-based language models contains inherent redundancies that depend on the task or the input sentence. Among these, it focuses on three: (1) redundant self-attention operations, (2) redundant parameters in multi-task NLP models, and (3) repetitive decoder operations during word generation. The objective is to enable efficient execution of NLP applications.

To exploit these redundancies, the following approaches are proposed. First, to mitigate redundant self-attention operations, a window-based self-attention mechanism is introduced based on an analysis of the characteristics of NLP applications; it significantly reduces the computational load of self-attention while maintaining algorithm performance. Second, to alleviate redundant parameters in multi-task NLP models, the base model is shared across tasks and the task-specific parameters are compressed, which notably reduces the number of parameters required to run multi-task NLP models. Third, to reduce repetitive computation during word generation, a token-adaptive early-exit technique is proposed that decreases the number of transformer layers required for each output word. Together, these techniques mitigate the inherent redundancies of transformer-based language models and enable efficient execution of NLP applications while preserving algorithm performance.
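For illustration, the first technique mentioned in the abstract, window-based self-attention, restricts each token's attention to a local neighborhood so that the quadratic cost of full self-attention shrinks roughly to O(n·w) for window size w. The sketch below is a minimal, assumed implementation and not the thesis's own code: it masks out-of-window positions instead of skipping their computation, and all names, dimensions, and the window size are placeholders.

```python
# Minimal sketch of window-based self-attention (illustrative only).
import torch
import torch.nn.functional as F

def windowed_self_attention(q, k, v, window: int):
    """q, k, v: (seq_len, d_model). Each token attends only to tokens
    within +/- `window` positions; all other pairs are masked out."""
    seq_len, d_model = q.shape
    scores = q @ k.transpose(0, 1) / d_model ** 0.5          # (seq_len, seq_len)
    pos = torch.arange(seq_len)
    outside = (pos[None, :] - pos[:, None]).abs() > window   # True outside the window
    scores = scores.masked_fill(outside, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Illustrative usage: 128 tokens, 64-dimensional representations, window of 8.
x = torch.randn(128, 64)
out = windowed_self_attention(x, x, x, window=8)
print(out.shape)  # torch.Size([128, 64])
```

In an actual kernel or accelerator implementation, the out-of-window scores would simply not be computed at all, which is where the computational savings described in the abstract would come from.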
Advisors
김이섭
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [vii, 87 p.]

Keywords

DNN accelerator; Natural language processing applications; Transformer-based language model

URI
http://hdl.handle.net/10203/322138
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100038&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
