Linear attention is (maybe) all you need (to understand Transformer optimization). Ahn, Kwangjun; Cheng, Xiang; Song, Minhak; Yun, Chulhee; Jadbabaie, Ali; Sra, Suvrit. 12th International Conference on Learning Representations, ICLR 2024, International Conference on Learning Representations (ICLR), 2024-05-07
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory. Song, Minhak; Yun, Chulhee. 37th Annual Conference on Neural Information Processing Systems, Neural Information Processing Systems, 2023-12-13