DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 김이섭 | - |
dc.contributor.author | Kang, Myeonggu | - |
dc.contributor.author | 강명구 | - |
dc.date.accessioned | 2024-08-08T19:31:33Z | - |
dc.date.available | 2024-08-08T19:31:33Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100038&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/322138 | - |
dc.description | Thesis (Ph.D.) - 한국과학기술원 : 전기및전자공학부, 2024.2, [vii, 87 p.] | - |
dc.description.abstract | Recently, along with significant advances in algorithm performance, transformer-based language models have become the de facto standard in natural language processing (NLP) applications. These models are deeper and larger than traditional deep neural network (DNN) models, requiring far more weight parameters and computation. This leads to substantial energy consumption and long execution times when running transformer-based language models on resource-constrained mobile devices, ultimately limiting their practical usability. This thesis therefore investigates methods for efficiently executing transformer-based language models on diverse hardware platforms. To this end, it first demonstrates that the execution of transformer-based language models contains inherent redundancies that depend on the task or the input sentence. Among these, the thesis focuses on (1) redundant self-attention operations, (2) redundant parameters in multi-task NLP models, and (3) repetitive decoder operations during word generation, with the goal of enabling efficient execution of NLP applications. To exploit these redundancies, the following approaches are proposed. First, to mitigate redundant self-attention operations, a window-based self-attention mechanism is introduced based on an analysis of the characteristics of NLP applications; it significantly reduces the computational load of self-attention while maintaining algorithm performance. Second, to alleviate redundant parameters in multi-task NLP models, a strategy of sharing a base model across tasks and compressing the task-specific parameters is proposed, notably reducing the number of parameters required to run multi-task NLP models. Last, to reduce repetitive computation during word generation, a token-adaptive early-exit technique is proposed, which effectively decreases the number of transformer layers required for each output word. Together, these techniques mitigate the inherent redundancies in transformer-based language models, enabling efficient execution of NLP applications while maintaining algorithm performance. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 딥-뉴럴 네트워크 가속기; 자연어처리; 트랜스포머 기반 자연어 모델 | - |
dc.subject | DNN accelerator; Natural language processing applications; Transformer-based language model | - |
dc.title | Efficient NLP executions by exploiting redundancies in transformer-based language models | - |
dc.title.alternative | 트랜스포머 기반 자연어 모델의 불필요한 중복성을 활용한 효율적인 자연어 처리 | - |
dc.type | Thesis (Ph.D.) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | 한국과학기술원 : 전기및전자공학부 | - |
dc.contributor.alternativeadvisor | Kim, Lee-Sup | - |
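The window-based self-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the thesis implementation: the function names are hypothetical, and for clarity it masks a full score matrix, whereas a real kernel would compute only the in-window (banded) entries to actually realize the claimed savings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(Q, K, V, window):
    """Self-attention where each token attends only to tokens at most
    `window` positions away, instead of the full O(n^2) interaction.
    Illustrative only: a real implementation would skip computing
    out-of-window scores entirely."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) similarities
    idx = np.arange(n)
    outside = np.abs(idx[:, None] - idx[None, :]) > window
    scores[outside] = -np.inf                         # drop far-away tokens
    return softmax(scores) @ V                        # (n, d_v)
```

Passing the identity matrix as `V` makes the output equal to the attention-weight matrix itself, which is a convenient way to check that out-of-window weights are exactly zero while each row still sums to one.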