The hottest Memory Systems Substack posts right now

And their main takeaways
More Than Moore 957 implied HN points 16 Mar 26
  1. NVIDIA has folded Groq’s engineering and chip technology into its product line and is shipping the Groq LP30 inside LPX nodes to accelerate inference decode workloads.
  2. The LP30 offers about 1.2 PFLOPS of FP8 performance and ~500 MB of SRAM per chip, with 8-chip LPX units giving 4 GB and full systems scaling to 256 chips / 128 GB, prioritizing huge SRAM bandwidth for high-throughput decoding.
  3. NVIDIA will use its Dynamo orchestration to split work across Rubin, Rubin CPX and Groq LPX hardware (customers can mix up to ~25% Groq) so prefill and decode are handled by the best-suited chips to boost tokens-per-second for premium use cases.
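The capacity figures in item 2 follow directly from the per-chip number. A minimal sketch of the arithmetic, assuming decimal units (1 GB = 1000 MB) and treating ~500 MB as exact; the function name is hypothetical:

```python
# Approximate per-chip SRAM from the summary above (treated as exact here).
SRAM_PER_CHIP_MB = 500

def total_sram_gb(chips, per_chip_mb=SRAM_PER_CHIP_MB):
    """Aggregate SRAM across `chips` chips, in decimal gigabytes."""
    return chips * per_chip_mb / 1000

print(total_sram_gb(8))    # one 8-chip LPX unit
print(total_sram_gb(256))  # a full 256-chip system
```

Under those assumptions the results match the quoted 4 GB and 128 GB.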
More Than Moore 467 implied HN points 03 Feb 26
  1. They use a dataflow architecture that runs the compiler's intermediate graph directly instead of a traditional instruction stream, so pipelines stay full and ALUs can execute whole loops every cycle for much higher effective throughput.
  2. Memory is handled by many small, localized MMU-like units plus runtime telemetry that adapts allocations to reduce false sharing, enabling an order of magnitude more outstanding memory requests and very high HBM utilization even on irregular workloads like GUPS.
  3. Their go-to-market and tooling are HPC-first while supporting common parallel models (OpenMP, CUDA, Kokkos) with a "bring your own code" approach, hardware-accelerated low-overhead kernel reconfiguration, and chiplet/RDMA-style scaling, with AI-specialized designs planned later.
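GUPS (giga-updates per second) in item 2 measures how fast a system can apply small updates at random memory addresses, which is why it stresses outstanding-request capacity rather than raw bandwidth. A simplified sketch of that access pattern, assuming Python's `random` module in place of the benchmark's own random stream; the function name is hypothetical and this is not the official benchmark code:

```python
import random

def random_access_updates(table_bits, num_updates, seed=0):
    """Apply XOR updates at pseudo-random indices of a power-of-two table."""
    size = 1 << table_bits
    table = list(range(size))
    rng = random.Random(seed)
    for _ in range(num_updates):
        r = rng.getrandbits(64)
        # The low bits of r pick the slot; each update touches an
        # unpredictable address, so caches and prefetchers help very little.
        table[r & (size - 1)] ^= r
    return table

table = random_access_updates(table_bits=10, num_updates=4096)
print(len(table))
```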
TheSequence 21 implied HN points 21 Jan 26
  1. The current LLM trend is to scale models to very large sizes and use sparsity techniques such as Mixture-of-Experts, so only a small part of the model activates per token, reducing FLOPs.
  2. Reusing an old technique — storing large, static lookup-like memories on CPU RAM and conditionally accessing them — can let models hold around 100B parameters off-GPU and avoid expensive dense computation.
  3. The key insight is that many LLM costs come from simulating static lookup tables with neural computation, so replacing that simulation with real conditional lookups makes models much more efficient.
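The idea in items 2 and 3 can be illustrated with a toy example: keep a static table in ordinary host memory and read from it only when a gate says the lookup is worthwhile. This is a sketch under stated assumptions, not the method from the post; `HostLookupMemory`, `augment`, the gate rule, and the threshold are all hypothetical stand-ins:

```python
import random

class HostLookupMemory:
    """A static table held in ordinary host RAM (hypothetical stand-in)."""

    def __init__(self, num_slots, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(num_slots)]
        self.num_slots = num_slots
        self.reads = 0  # counts how many lookups were actually performed

    def fetch(self, token_id):
        self.reads += 1
        return self.table[token_id % self.num_slots]

def augment(hidden, token_id, memory, gate_threshold=0.5):
    """Add a looked-up row to `hidden` only when the gate opens."""
    # Assumed gate: mean absolute activation. A real model would learn this.
    gate = sum(abs(x) for x in hidden) / len(hidden)
    if gate < gate_threshold:
        return hidden  # skip the lookup entirely
    row = memory.fetch(token_id)
    return [h + r for h, r in zip(hidden, row)]

memory = HostLookupMemory(num_slots=16, dim=4)
augment([0.1] * 4, token_id=3, memory=memory)  # gate closed, no read
augment([1.0] * 4, token_id=3, memory=memory)  # gate open, one read
print(memory.reads)
```

The point of the design is that a skipped lookup costs nothing, and a performed lookup is a memory read rather than a dense matrix multiply.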
Arjun Panickssery 58 implied HN points 04 Mar 23
  1. Free-recall questions are better than multiple-choice questions for effective learning.
  2. Automating basic facts through rote memorization can decrease the load on working memory and aid in understanding complex ideas.
  3. Using spaced repetition systems can be beneficial for understanding and retaining knowledge in various fields.
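One common way to implement the spaced repetition mentioned in item 3 is a Leitner-style box scheduler: a correctly recalled card moves to a box with a longer review interval, and a missed card returns to the first box. A minimal sketch; the interval values and the `review` function are assumptions chosen for illustration, not taken from the post:

```python
# Assumed review intervals per box, in days.
INTERVALS_DAYS = [1, 2, 4, 8, 16]

def review(box, recalled):
    """Return (new_box, days_until_next_review) after one review."""
    if recalled:
        # Promote the card, capping at the last box.
        new_box = min(box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # A miss sends the card back to the shortest interval.
        new_box = 0
    return new_box, INTERVALS_DAYS[new_box]

print(review(0, True))
print(review(3, False))
```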
Nano Thoughts 1 implied HN point 14 Jan 26
  1. Memory is organized as a graph not to store everything, but so edges can decay and useless paths are forgotten; forgetting is an intentional feature, not a bug.
  2. What gets remembered depends on the agent’s goals, so memory must be filtered by a utility function before or during encoding; a single universal context that keeps everything produces noise, not useful memory.
  3. Current AI systems are mostly search/archives, not true memory; real memory needs valuation-driven, lossy compression (e.g., reinforcing repetition or preserving surprise) to avoid overfitting and enable useful prediction.
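The combination of utility-filtered encoding and decaying edges described above can be sketched as a small weighted graph. This is an illustrative toy, assuming a multiplicative decay per time step and a fixed pruning threshold; the class name, method names, and parameter values are hypothetical:

```python
class DecayingGraphMemory:
    """Weighted edges that fade over time unless reinforced."""

    def __init__(self, decay=0.5, prune_below=0.1):
        self.edges = {}  # (src, dst) -> weight
        self.decay = decay
        self.prune_below = prune_below

    def reinforce(self, src, dst, utility):
        """Encode or strengthen an edge, but only if it has positive utility."""
        if utility <= 0:
            return  # filtered out before encoding
        self.edges[(src, dst)] = self.edges.get((src, dst), 0.0) + utility

    def tick(self):
        """Advance one time step: decay every edge and forget the weak ones."""
        self.edges = {
            edge: weight * self.decay
            for edge, weight in self.edges.items()
            if weight * self.decay >= self.prune_below
        }

memory = DecayingGraphMemory()
memory.reinforce("coffee", "morning", utility=1.0)
memory.reinforce("coffee", "receipt", utility=0.15)
memory.tick()
print(sorted(memory.edges))
```

After one step the weakly reinforced edge falls below the threshold and is dropped, while the useful edge survives at reduced weight.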