TheSequence • 84 implied HN points • 28 Jan 26
- Two new commercial companies spun out of the vLLM and SGLang teams, Inferact and RadixArk, have raised large funding rounds and are positioning themselves as major players in the inference stack.
- The focus is shifting from building bigger models to improving inference unit economics, so the software that manages memory, scheduling, and kernels is now the main battleground.
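- Efficient serving is bottlenecked by scarce VRAM and the KV cache tax (the GPU memory every in-flight request holds for its attention keys and values), and asynchronous, unpredictable request patterns make that memory hard to plan for, driving up cost and complexity.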
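
To make the KV cache tax concrete, here is a rough back-of-the-envelope sketch of how much VRAM the cache takes per token and per request. It uses the standard sizing for a decoder-only transformer (one key and one value vector per layer per KV head). The model shape, the 24 GiB budget, and the function names are illustrative assumptions, not figures from the article, and the estimate ignores model weights, activations, and allocator fragmentation.

```python
# Rough KV cache sizing for a decoder-only transformer.
# All model numbers below are assumed for illustration only.

GIB = 1024 ** 3


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies across all layers."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(seq_len, batch_size, **model):
    """Bytes of KV cache for batch_size sequences of seq_len tokens each."""
    return seq_len * batch_size * kv_bytes_per_token(**model)


def max_concurrent_sequences(free_vram_bytes, seq_len, **model):
    """How many full-length sequences fit in the VRAM left after weights."""
    return free_vram_bytes // kv_cache_bytes(seq_len, 1, **model)


# Hypothetical model shape: 32 layers, 8 KV heads (grouped-query attention),
# head dimension 128, fp16 cache entries (2 bytes each).
MODEL = dict(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

per_token = kv_bytes_per_token(**MODEL)
per_sequence = kv_cache_bytes(seq_len=8192, batch_size=1, **MODEL)
fits = max_concurrent_sequences(24 * GIB, seq_len=8192, **MODEL)

print(f"KV cache per token:    {per_token / 1024:.0f} KiB")
print(f"KV cache per 8k seq:   {per_sequence / GIB:.2f} GiB")
print(f"8k sequences in 24 GiB: {fits}")
```

Under these assumptions a single 8k-token request pins about 1 GiB of VRAM for as long as it is in flight, which is why cache management and scheduling weigh so heavily on serving cost.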