Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, Ankit Singh Rawat, and Samet Oymak
Advances in Neural Information Processing Systems, 2024
In this work, we develop a stronger characterization of the optimization landscape of ICL through contributions on architectures, low-rank parameterization, and correlated designs: (1) We study the landscape of 1-layer linear attention and 1-layer H3 models and prove that both implement 1-step preconditioned gradient descent (PGD). Additionally, thanks to its native convolution filters, H3 can implement sample weighting, which allows it to outperform linear attention. (2) We provide new risk bounds for retrieval-augmented generation (RAG) and task-feature alignment, which reveal how ICL sample complexity benefits from distributional alignment. (3) We derive the optimal risk for low-rank parameterized attention weights in terms of the covariance spectrum. Through this, we also shed light on how LoRA can adapt to a new distribution by capturing the shift between task covariances.
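The claim that a 1-layer linear attention model implements one step of preconditioned gradient descent can be illustrated numerically. The sketch below is a toy under stated assumptions, not the paper's construction: the merged attention weight `W`, step size `eta`, and preconditioner `P` are illustrative names. With `W = eta * P / n`, the attention-style prediction `x_q^T W (sum_i x_i y_i)` coincides with the prediction of a single PGD step from `w = 0` on the in-context least-squares loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 8
X = rng.standard_normal((n, d))   # in-context example features
y = X @ rng.standard_normal(d)    # labels from a random linear task
x_q = rng.standard_normal(d)      # query feature

P = rng.standard_normal((d, d))   # arbitrary preconditioner (illustrative assumption)
eta = 0.1                         # step size (illustrative assumption)

# One preconditioned GD step from w = 0 on L(w) = (1/2n) * sum_i (y_i - w.x_i)^2
grad0 = -(X.T @ y) / n            # gradient of L at w = 0
w1 = -eta * P @ grad0             # w1 = eta * P * (X^T y) / n
pred_pgd = x_q @ w1

# Linear-attention-style prediction with merged weight W = eta * P / n
W = eta * P / n
pred_attn = x_q @ W @ (X.T @ y)

# The two predictions match exactly
assert np.allclose(pred_pgd, pred_attn)
```

The equivalence is algebraic: both predictions reduce to `(eta/n) * x_q^T P X^T y`, so the learned attention weight plays exactly the role of a preconditioned step.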