The driving factor limiting context window size is the quadratic scaling of self-attention in transformers: attention compares every token against every other token, so compute and memory grow with the square of the sequence length.
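A minimal sketch of single-head self-attention (not any particular library's implementation) makes the quadratic cost concrete: the score matrix holds one entry per pair of tokens, so doubling the sequence length quadruples it.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over x of shape (n, d)."""
    n, d = x.shape
    q, k, v = x, x, x                      # identity projections, for brevity
    scores = q @ k.T / np.sqrt(d)          # shape (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                     # shape (n, d)

for n in (1_000, 2_000, 4_000):
    # 1e6 -> 4e6 -> 16e6 score entries as the context length doubles
    print(n, "tokens ->", n * n, "attention scores")
```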
New research explores alternative mechanisms such as Hyena operators, state space models, and hierarchical attention to improve context window efficiency.
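To illustrate why such alternatives scale better, here is a toy diagonal state space model recurrence; the parameters are made-up values, not taken from any published model. It processes the sequence in a single pass, so time is linear in n and it never forms an n-by-n matrix.

```python
import numpy as np

def ssm_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Toy diagonal SSM: h_t = a * h_{t-1} + b * x_t, y_t = c . h_t.
    One pass over the sequence: O(n) time, O(state) memory."""
    n = x.shape[0]
    h = np.zeros_like(a)
    y = np.empty(n)
    for t in range(n):
        h = a * h + b * x[t]   # elementwise (diagonal A) state update
        y[t] = c @ h           # scalar readout
    return y

rng = np.random.default_rng(0)
state = 8
y = ssm_scan(rng.normal(size=1024),
             rng.uniform(0.5, 0.99, size=state),  # stable decay rates
             rng.normal(size=state),
             rng.normal(size=state))
print(y.shape)  # (1024,) produced without any quadratic score matrix
```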
For effective LLM performance, careful context curation and retrieval systems matter more than simply increasing the context window size.
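A minimal sketch of the curation idea, using a hypothetical corpus and simple bag-of-words vectors in place of a real embedding model: retrieve only the top-k most relevant chunks and place those in the prompt, rather than stuffing everything into a huge context window.

```python
from collections import Counter
import math

def bow(text: str) -> Counter:
    """Bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

# Hypothetical corpus: only the relevant chunks reach the model's context.
chunks = [
    "Self-attention compares every pair of tokens.",
    "State space models process sequences with a linear-time recurrence.",
    "The team meeting is scheduled for Thursday.",
]
context = "\n".join(retrieve("why does attention scale quadratically", chunks))
print(context)  # curated context, far smaller than the full corpus
```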