Ahn, Kwangjun; Cheng, Xiang; Song, Minhak; Yun, Chulhee; Jadbabaie, Ali; Sra, Suvrit. "Linear attention is (maybe) all you need (to understand Transformer optimization)." 12th International Conference on Learning Representations (ICLR 2024), 2024-05-07.