More Than Moore • 957 implied HN points • 16 Mar 26
- NVIDIA has folded Groq’s engineering and chip technology into its product line and is shipping the Groq LP30 inside LPX nodes to accelerate inference decode workloads.
- The LP30 offers about 1.2 PFLOPS of FP8 compute and ~500 MB of SRAM per chip. An 8-chip LPX unit provides 4 GB of SRAM, and full systems scale to 256 chips (128 GB), a design that favors very high SRAM bandwidth for high-throughput decoding.
- NVIDIA will use its Dynamo orchestration software to split work across Rubin, Rubin CPX and Groq LPX hardware (customers can mix in up to ~25% Groq hardware), so that prefill and decode each run on the best-suited chips, boosting tokens per second for premium use cases.
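The LP30 capacity figures scale linearly with chip count: 8 × ~500 MB gives 4 GB, and 256 × ~500 MB gives 128 GB. The minimal sketch below reproduces that arithmetic. It assumes naive linear scaling, ignoring interconnect overhead and utilization. The aggregate FP8 totals it prints are a simple extrapolation from the ~1.2 PFLOPS per-chip figure, not numbers quoted in the article.

```python
# Naive linear scaling of the per-chip LP30 figures quoted in the article.
# Assumption: aggregate SRAM and FP8 compute scale linearly with chip count
# (ignores interconnect overhead, utilization, and any system-reserved capacity).

LP30_SRAM_MB = 500      # ~500 MB SRAM per chip (from the article)
LP30_FP8_PFLOPS = 1.2   # ~1.2 PFLOPS FP8 per chip (from the article)


def aggregate(chips: int) -> tuple[float, float]:
    """Return (total SRAM in decimal GB, total FP8 PFLOPS) for a chip count."""
    sram_gb = chips * LP30_SRAM_MB / 1000  # decimal GB, matching 4 GB / 128 GB
    pflops = chips * LP30_FP8_PFLOPS
    return sram_gb, pflops


for chips in (8, 256):
    sram_gb, pflops = aggregate(chips)
    print(f"{chips:>3} chips: {sram_gb:.0f} GB SRAM, {pflops:.1f} PFLOPS FP8")
```

For 8 chips this gives 4 GB and 9.6 PFLOPS. For 256 chips it gives 128 GB and about 307.2 PFLOPS. Real systems would deliver less than the linear compute total.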
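To make the mixed-hardware split concrete, here is a hypothetical sketch of disaggregated prefill/decode placement with a cap on the Groq share. It is not Dynamo's API. `Deployment`, `route`, `PREFERRED_POOL` and `MAX_GROQ_FRACTION` are invented for illustration. It also assumes three things the article does not specify:

- The ~25% limit is measured as a fraction of nodes.
- Prefill goes to Rubin CPX and decode to Groq LPX.
- Rubin acts as a general-purpose fallback.

```python
from dataclasses import dataclass

# Hypothetical sketch of disaggregated prefill/decode placement; NOT Dynamo's API.
# Assumptions: the "~25% Groq" limit is a fraction of nodes in a deployment;
# prefill prefers Rubin CPX, decode prefers Groq LPX, and Rubin is the fallback.

MAX_GROQ_FRACTION = 0.25


@dataclass
class Deployment:
    rubin: int
    rubin_cpx: int
    groq_lpx: int

    def groq_fraction(self) -> float:
        """Share of nodes that are Groq LPX (0.0 for an empty deployment)."""
        total = self.rubin + self.rubin_cpx + self.groq_lpx
        return self.groq_lpx / total if total else 0.0

    def is_valid(self) -> bool:
        """True if the Groq share stays within the assumed cap."""
        return self.groq_fraction() <= MAX_GROQ_FRACTION


# Preferred pool per inference phase (assumed mapping).
PREFERRED_POOL = {"prefill": "rubin_cpx", "decode": "groq_lpx"}


def route(phase: str, d: Deployment) -> str:
    """Pick a pool for the phase, falling back to Rubin if the preferred pool is empty."""
    pool = PREFERRED_POOL[phase]
    return pool if getattr(d, pool) > 0 else "rubin"


d = Deployment(rubin=8, rubin_cpx=4, groq_lpx=4)
print(d.groq_fraction(), d.is_valid())  # 0.25 True
print(route("prefill", d), route("decode", d))
```

Keeping the phase-to-pool mapping in a dictionary keeps the placement policy separate from the capacity check. That makes it easy to try other assumed splits.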