Efficient NLP executions by exploiting redundancies in transformer-based language models
트랜스포머 기반 자연어 모델의 불필요한 중복성을 활용한 효율적인 자연어 처리

DC Field: Value

dc.contributor.advisor: 김이섭
dc.contributor.author: Kang, Myeonggu
dc.contributor.author: 강명구
dc.date.accessioned: 2024-08-08T19:31:33Z
dc.date.available: 2024-08-08T19:31:33Z
dc.date.issued: 2024
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100038&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/322138
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2024.2, [vii, 87 p.]
dc.description.abstract: Recently, along with significant advances in algorithm performance, transformer-based language models have become the de facto standard in natural language processing (NLP) applications. These models are deeper and larger than traditional deep neural network (DNN) models, requiring far more weight parameters and computation. This leads to substantial energy consumption and long execution times when such models run on resource-constrained mobile devices, ultimately limiting their practical usability. This thesis therefore investigates methods for efficiently executing transformer-based language models on diverse hardware platforms. It first demonstrates that the execution of these models contains inherent redundancies that depend on the task or the input sentence, and it focuses on three of them: (1) redundant self-attention operations, (2) redundant parameters within multi-task NLP models, and (3) repetitive decoder operations during word generation. To exploit these redundancies, the following approaches are proposed. First, to mitigate redundant self-attention operations, a window-based self-attention mechanism is introduced based on an analysis of the characteristics of NLP applications; it significantly reduces the computational load of self-attention while maintaining algorithm performance. Second, to alleviate redundant parameters in multi-task NLP models, a strategy is proposed that shares a base model across tasks and compresses the task-specific parameters, notably reducing the number of parameters required to run multi-task NLP models. Lastly, to reduce repetitive computation during word generation, a token-adaptive early-exit technique is proposed, which decreases the number of transformer layers required for each output word. Through these techniques, this research mitigates the inherent redundancies of transformer-based language models, enabling efficient execution of NLP applications while preserving algorithm performance. (A brief illustrative sketch of the window-based self-attention idea appears after the metadata listing below.)
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: 딥-뉴럴 네트워크 가속기; 자연어처리; 트랜스포머 기반 자연어 모델
dc.subject: DNN accelerator; Natural language processing applications; Transformer-based language model
dc.title: Efficient NLP executions by exploiting redundancies in transformer-based language models
dc.title.alternative: 트랜스포머 기반 자연어 모델의 불필요한 중복성을 활용한 효율적인 자연어 처리
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 : 전기및전자공학부 (KAIST, School of Electrical Engineering)
dc.contributor.alternativeauthor: Kim, Lee-Sup
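
The first technique in the abstract, window-based self-attention, restricts each token to attending only to nearby tokens instead of the whole sequence. The dissertation text is not available on this page, so the following is only a minimal illustrative sketch in Python/NumPy of that general idea, assuming a single attention head; the function name windowed_self_attention, the window parameter, and the toy sizes are hypothetical and are not taken from the thesis.

# Minimal sketch (not the thesis implementation): each query attends only to keys
# within a fixed local window, so the number of attention scores drops from
# n*n to roughly n*(2*window + 1).
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def windowed_self_attention(Q, K, V, window=4):
    """Q, K, V: (seq_len, dim) arrays for a single attention head."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)   # local window around token i
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)               # at most 2*window + 1 scores
        out[i] = softmax(scores) @ V[lo:hi]
    return out

# Toy usage: 16 tokens, 8-dimensional head.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
print(windowed_self_attention(Q, K, V).shape)   # -> (16, 8)

With window=4 and 16 tokens, each query computes at most 9 scores instead of 16, and the saving grows with sequence length; this is the kind of reduction in self-attention computation the abstract refers to, though the actual mechanism and window choice are described in the dissertation itself.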
Appears in Collection: EE-Theses_Ph.D.(박사논문)
Files in This Item: There are no files associated with this item.
