The hottest AI Infrastructure Substack posts right now

And their main takeaways
SemiAnalysis • 15961 implied HN points • 25 Feb 26
  1. NVIDIA built Rubin as an "extreme co-design" where the rack is treated as one integrated compute unit, combining Rubin GPUs, Vera CPUs, NVLink‑6 switches, ConnectX‑9 NICs, BlueField‑4 DPUs and Spectrum switches to push performance and tight system control.
  2. Rubin GPUs prioritize low‑precision scaling (big FP4/FP8 gains), much higher HBM bandwidth and an adaptive compression engine for sparsity, but they also bring very large power envelopes (up to 2300W), driving big thermal and cost impacts.
  3. The NVL72 rack is redesigned for manufacturing and reliability: cableless modular trays with board‑to‑board connectors, upgraded high‑end PCBs, 100% liquid cooling and 50V power delivery, which shifts component, cooling and assembly supply chains and raises TCO considerations.
Big Technology • 6755 implied HN points • 27 Feb 26
  1. AI training is shifting heavily toward reinforcement learning, which teaches models to complete real tasks instead of just predicting text.
  2. Task-based training needs detailed simulated environments and far more compute because models must try many steps to learn workflows like banking or booking.
  3. Reinforcement learning often doesn’t generalize well, so models are likely to specialize and diverge, with different systems becoming better at different kinds of tasks.
SemiAnalysis • 22426 implied HN points • 09 Feb 26
  1. Datacenter CPUs are back in demand because reinforcement learning, agentic models, and RAG-style inference need lots of general-purpose compute for environments, tool use, data sharding and media decode, which is driving hyperscalers and AI labs to build large CPU clusters and straining inventories.
  2. CPU architecture is rapidly shifting to chiplet/disaggregated designs, higher core counts and mesh interconnects with advanced packaging, and vendors are diverging — AMD and hyperscale ARM designs are outperforming while Intel faces delays and questionable design choices that hurt competitiveness.
  3. The broader system ecosystem now matters as much as raw CPU cores: GPUs and specialized CPUs act as head nodes with shared memory, DPUs and context-memory platforms change how memory is used, and DRAM shortages plus packaging yields are shaping performance, supply and pricing.
SemiAnalysis • 21820 implied HN points • 01 Jan 26
  1. Co-packaged optics (CPO) is moving from labs to shipping products and will be the key way to scale high-bandwidth, low-latency AI scale-up networks because it offers much higher bandwidth density and longer reach than copper.
  2. CPO cuts or removes power-hungry DSPs and long-reach SerDes, unlocking big energy and density gains by integrating optical engines near the chip and using enablers like TSMC COUPE, modulators (MRM/MZM/EAM), WDM, and FAUs.
  3. Wide adoption still faces real hurdles — supply chain, manufacturability, reliability, serviceability and standards — so early wins will be limited, but hyperscaler commitments and compelling scale-up economics should drive a larger ramp later this decade.
SemiAnalysis • 15456 implied HN points • 06 Jan 26
  1. Scaling reinforcement learning (post‑training) is the main engine of recent capability and utility gains, with labs pouring compute into RL and using broad real‑world evals like GDPval to measure progress.
  2. Building RL environments and datasets is a large, specialized industry — firms clone UIs, create coding and software gyms, and hire domain experts to write tasks and rubrics, spawning many vendors and "RL as a service" offerings.
  3. Applying RL to science and biology requires closed‑loop physical experiments and robotics, faces long costly rollouts and sparse rewards, and will push models and labs toward specialized, non‑commodified solutions.
SemiAnalysis • 33539 implied HN points • 28 Nov 25
  1. Google's TPUs are becoming a serious competitor to NVIDIA's GPUs, especially as large customers such as Anthropic start to use them, which could shift the balance in AI hardware.
  2. The design and architecture of Google's TPU systems, especially the new TPUv7, are optimized for performance and cost efficiency, so companies may be able to lower their AI infrastructure costs.
  3. Google is improving its TPU software tools to make them easier to use, which could attract more developers and boost TPU adoption relative to NVIDIA's GPUs.
TheSequence • 126 implied HN points • 15 Mar 26
  1. AI is rapidly shifting from chat assistants to autonomous, persistent workers that can plan, act, and even modify their own code, enabling self-improving research loops and agentic code review.
  2. Multi-agent frameworks and locally hosted persistent agents are spreading quickly, letting individuals automate complex workflows while also creating serious security and governance risks when agents gain deep system access.
  3. Massive capital is pouring into compute and new model paradigms — gigawatt-scale GPU factories and billion-dollar bets on grounded "world models" — alongside releases like multimodal embeddings that make retrieval and agent memory far more powerful.
@adlrocha Weekly Newsletter • 64 implied HN points • 22 Feb 26
  1. Some industry voices argue that orbiting data centres could solve Earth’s energy limits by tapping continuous, stronger solar power and avoiding on-ground grid and land constraints.
  2. Physics and operations pose major roadblocks: vacuum cooling needs huge radiators, cosmic rays cause silent data corruptions, laser links and atmospheric downlinks have bandwidth and reliability limits, and launch, upgrade, and debris risks make huge satellite fleets impractical today.
  3. A more viable approach may be to design far more energy-efficient computing paradigms (photonic chips, thermodynamic samplers, non‑deterministic hardware) so AI can scale on Earth without shipping massive GPU fleets to space.
TheSequence • 245 implied HN points • 04 Feb 26
  1. Kimi 2.5 represents a paradigm shift from scale-driven "emergence" to orchestration, where the model coordinates complex workflows instead of just generating text.
  2. It functions as an end-to-end agent that manages execution environments, spawns subprocesses, and debugs its own visual outputs in a closed-loop system.
  3. The system uses sparsity to deliver trillion-parameter capability with a latency and cost profile similar to that of a ~32B dense model.
Democratizing Automation • 142 implied HN points • 02 Feb 26
  1. Arcee released Trinity-Large-Preview, an ultra-sparse MoE with 400B total parameters and about 13B active parameters, plus a public tech report and base models.
  2. LiquidAI’s LFM2.5-1.2B-Instruct punches above its size, often matching larger models in tests and coming with Japanese, vision, and audio variants.
  3. Kimi-K2.5 is a multimodal continual-pretrain model (15T tokens) that’s cheaper and stronger on coding and agent tasks, though its writing quality has slipped compared to earlier K2 models.
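As a rough illustration of what "ultra-sparse" means for the Trinity-Large-Preview figures quoted above (400B total parameters, about 13B active), the share of parameters used per token can be worked out directly. The helper name `active_fraction` is hypothetical and used only for this sketch; it is not taken from the post or from any library.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of a sparse MoE's parameters that are active per token.

    Both arguments are in billions of parameters. Hypothetical helper for
    illustration only.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must be between 0 and total_params_b")
    return active_params_b / total_params_b


# Figures quoted in the summary: 400B total, about 13B active.
fraction = active_fraction(400, 13)
print(f"{fraction:.2%} of parameters active per token")
```

Since the 13B figure is approximate, the result is best read as "roughly 3% of the weights are used for each token".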
Brad DeLong's Grasping Reality • 169 implied HN points • 18 Dec 25
  1. Big tech is building lots of AI infrastructure not because it’s betting the farm on core AI products, but to capture the rents from the AI boom by selling infrastructure and services.
  2. The AI labs are the ones digging for breakthrough models and customer demand, but core AI products may have low margins and fickle users, so those businesses carry higher risk of a bust.
  3. Cloud and platform companies often commoditize or give away core AI tools to protect their high‑margin businesses, and investors are increasingly valuing firms based on real cash generation rather than AI hype.
Clouded Judgement • 20 implied HN points • 20 Feb 26
  1. A global NAND/SSD shortage has emerged as AI demand has ballooned, driving big gains in memory-related stocks and creating a structural supply problem.
  2. AI has shifted from being compute-bound to data- and memory-bound. Inference, KV caches, and the flood of AI-generated artifacts need huge, low-latency memory and expose inefficiencies in legacy tiering and NAS data paths.
  3. The answer is efficiency, not just buying more flash: orchestrate data so local GPU NVMe can be used as fast Tier‑0, tier cold data to HDDs, recover stranded capacity, use hybrid cloud, and deduplicate across regions to cut flash demand.
TheSequence • 56 implied HN points • 14 Dec 25
  1. AI is moving to an agent-first model where LLMs act as operators for long-running, multi-step workflows, improving planning, tool use, and end-to-end task completion.
  2. Open-weight and deployable model families are maturing, letting teams host, fine-tune, and run agentic coding and workflow assistants on their own infrastructure.
  3. Compute and energy limits are now a primary bottleneck, driving investment in efficient architectures like MoEs, distillation, edge inference, and new hardware approaches.
Kesav’s Lab • 8 implied HN points • 26 Jan 26
  1. Using an inference provider gets you serverless endpoints, streaming, and time-to-first-token optimizations fast and is great for experimentation, but it sacrifices control over data residency and token logging. Building your own infra gives maximum control and compliance but is costly, slow to provision, and requires tradeoffs between speed, quality, and price.
  2. Provisioning large GPU instances is as much political and logistical as it is technical — expect weeks of lead time, enterprise support, and close coordination with cloud vendors to get high-end capacity. Tools like managed notebooks speed prototyping, but real deployments involve lots of debugging and operational overhead.
  3. TechBio workloads need specialized compute and tight lab-in-the-loop integration, which opens a market for domain-specific inference platforms that help fine-tune models and evaluate clinical viability. Because downstream clinical validation is slow and expensive, models that focus on toxicology and clinical outcomes are especially valuable for capturing real-world ROI.
Enterprise AI Trends • 192 HN points • 03 Jul 24
  1. Building an AI infrastructure startup is hard because competition is intense, and many startups struggle to differentiate enough to attract enterprise customers.
  2. It is hard for these startups to keep an edge because bigger companies like AWS and Google can quickly copy any good idea.
  3. To succeed, startups should narrow their focus to a specific market or problem. Doing one thing well helps them stand out more than trying to cater to everyone.
Jakob Nielsen on UX • 11 implied HN points • 02 Dec 24
  1. Cookie consent banners waste a huge amount of time for users, costing billions in productivity. Most people ignore them or find them useless.
  2. NVIDIA's approach to building AI infrastructure allows for significantly faster performance improvements compared to traditional methods, promising exciting advancements in AI capabilities.
  3. Virtual try-on technology is becoming more accessible, allowing users to see how clothes look on them without needing a photoshoot, which can change the shopping experience.
The Product Channel By Sid Saladi • 13 implied HN points • 28 Jan 24
  1. AI product management spans several roles, including AI Infrastructure PMs, Ranking PMs, Generative AI PMs, Conversational AI PMs, Computer Vision PMs, AI Security PMs, and AI Analytics PMs.
  2. Each AI PM role has its own skills and responsibilities: deep knowledge of the full AI infrastructure stack for AI Infrastructure PMs, tuning relevance algorithms for Ranking PMs, and incorporating human-in-the-loop feedback for Generative AI PMs.
  3. To excel in AI Product Management, it's crucial to understand the landscape, develop relevant skills, and embrace a mindset of continuous learning and adaptation to innovate effectively.