Roles and Utilization of Attention Heads in Transformer-based Neural Language Models

Cited 9 times in Web of Science · Cited 0 times in Scopus
  • Hits: 179
  • Downloads: 0
DC Field: Value (Language)
dc.contributor.author: Jo, Jae-young (ko)
dc.contributor.author: Myaeng, Sung-Hyon (ko)
dc.date.accessioned: 2020-11-24T12:30:20Z
dc.date.available: 2020-11-24T12:30:20Z
dc.date.created: 2020-11-06
dc.date.issued: 2020-07
dc.identifier.citation: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3404-3417
dc.identifier.uri: http://hdl.handle.net/10203/277591
dc.description.abstract: Sentence encoders based on the transformer architecture have shown promising results on various natural language tasks. The main impetus lies in pre-trained neural language models that capture long-range dependencies among words, owing to the multi-head attention that is unique to the architecture. However, little is known about how linguistic properties are processed, represented, and utilized for downstream tasks among the hundreds of attention heads inside a pre-trained transformer-based model. With the initial goal of examining the roles of attention heads in handling a set of linguistic features, we conducted a set of experiments with ten probing tasks and three downstream tasks on four pre-trained transformer families (GPT, GPT2, BERT, and ELECTRA). Meaningful insights are revealed through heat map visualizations and utilized to propose a relatively simple sentence representation method that takes advantage of the most influential attention heads, resulting in additional performance improvements on the downstream tasks.
dc.language: English
dc.publisher: Association for Computational Linguistics
dc.title: Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
dc.type: Conference
dc.identifier.wosid: 000570978203071
dc.type.rims: CONF
dc.citation.beginningpage: 3404
dc.citation.endingpage: 3417
dc.citation.publicationname: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
dc.identifier.conferencecountry: US
dc.identifier.conferencelocation: Online
dc.identifier.doi: 10.18653/v1/2020.acl-main.311
dc.contributor.localauthor: Myaeng, Sung-Hyon
dc.contributor.nonIdAuthor: Jo, Jae-young
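
The abstract above describes building sentence representations from the most influential attention heads. The sketch below is only a minimal illustration of that general idea, written against the Hugging Face transformers API, and is not the authors' exact procedure: the (layer, head) pairs are hypothetical placeholders (the paper selects heads empirically via probing-task heat maps), and [CLS]-row attention-weighted pooling stands in for whatever aggregation the paper actually uses.

    # Illustrative sketch only: pool token representations weighted by the
    # [CLS] attention row of a few hand-picked heads. The head list is a
    # placeholder; the paper identifies influential heads via probing tasks.
    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL_NAME = "bert-base-uncased"          # any of the probed families could be used
    INFLUENTIAL_HEADS = [(8, 2), (9, 7)]      # hypothetical (layer, head) pairs

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
    model.eval()

    def sentence_vector(text: str) -> torch.Tensor:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        hidden = out.last_hidden_state[0]                 # (seq_len, dim)
        pooled = []
        for layer, head in INFLUENTIAL_HEADS:
            # attentions[layer]: (batch, heads, seq, seq); take the [CLS] query row
            cls_attn = out.attentions[layer][0, head, 0]  # (seq_len,)
            pooled.append(cls_attn @ hidden)              # attention-weighted sum
        return torch.stack(pooled).mean(dim=0)            # (dim,)

    vec = sentence_vector("Attention heads encode distinct linguistic properties.")
    print(vec.shape)  # torch.Size([768]) for bert-base

Averaging over a handful of selected heads keeps the representation the same dimensionality as a single hidden state, so it can be swapped into a downstream classifier in place of the usual [CLS] vector.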
Appears in Collection
CS-Conference Papers(학술회의논문)
Files in This Item
There are no files associated with this item.