Efficient NLP executions by exploiting redundancies in transformer-based language models

Abstract
Transformer-based language models have recently become the de facto standard in natural language processing (NLP) applications, driven by significant advances in algorithm performance. These models are deeper and larger than traditional deep neural network (DNN) models, requiring far more weight parameters and computation. This leads to substantial energy consumption and long execution times when running transformer-based language models on resource-constrained mobile devices, ultimately limiting their practical usability. This thesis therefore investigates methods for running transformer-based language models efficiently on diverse hardware platforms.

To this end, the thesis first demonstrates that the execution of transformer-based language models contains inherent redundancies that depend on the task or the input sentence. Among these, it focuses on three: (1) redundant self-attention operations, (2) redundant parameters in multi-task NLP models, and (3) repetitive decoder operations during word generation. The objective is to enable efficient execution of NLP applications.

To exploit these redundancies, the following approaches are proposed. First, to mitigate redundant self-attention operations, a window-based self-attention mechanism is introduced based on an analysis of the characteristics of NLP applications; it significantly reduces the computational load of self-attention while maintaining algorithm performance. Second, to alleviate redundant parameters in multi-task NLP models, the base model is shared across tasks and the task-specific parameters are compressed, which notably reduces the number of parameters required to run multi-task NLP models. Third, to reduce repetitive computation during word generation, a token-adaptive early-exit technique is proposed that decreases the number of transformer layers required for each output word. Together, these techniques mitigate the inherent redundancies of transformer-based language models and enable efficient execution of NLP applications while preserving algorithm performance.
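For illustration, the first technique mentioned in the abstract, window-based self-attention, restricts each token's attention to a local neighborhood so that the quadratic cost of full self-attention shrinks roughly to O(n·w) for window size w. The sketch below is a minimal, assumed implementation and not the thesis's own code: it masks out-of-window positions instead of skipping their computation, and all names, dimensions, and the window size are placeholders.

```python
# Minimal sketch of window-based self-attention (illustrative only).
import torch
import torch.nn.functional as F

def windowed_self_attention(q, k, v, window: int):
    """q, k, v: (seq_len, d_model). Each token attends only to tokens
    within +/- `window` positions; all other pairs are masked out."""
    seq_len, d_model = q.shape
    scores = q @ k.transpose(0, 1) / d_model ** 0.5          # (seq_len, seq_len)
    pos = torch.arange(seq_len)
    outside = (pos[None, :] - pos[:, None]).abs() > window   # True outside the window
    scores = scores.masked_fill(outside, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Illustrative usage: 128 tokens, 64-dimensional representations, window of 8.
x = torch.randn(128, 64)
out = windowed_self_attention(x, x, x, window=8)
print(out.shape)  # torch.Size([128, 64])
```

In an actual kernel or accelerator implementation, the out-of-window scores would simply not be computed at all, which is where the computational savings described in the abstract would come from.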
Advisors
김이섭
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [vii, 87 p.]

Keywords

DNN accelerator; Natural language processing applications; Transformer-based language model

URI
http://hdl.handle.net/10203/322138
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100038&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
