The hottest Machine Learning Substack posts right now

And their main takeaways
The Kaitchup – AI on a Budget • 39 implied HN points • 31 Oct 24
  1. Quantization helps reduce the size of large language models, making them easier to run, especially on consumer GPUs. For instance, 4-bit quantization can shrink a model to roughly a third of its 16-bit size.
  2. Calibration datasets are crucial for improving the accuracy of quantization methods like AWQ and AutoRound. The choice of the dataset impacts how well the quantization performs.
  3. Most quantization tools use a default English-language dataset, but results can vary with different languages and datasets. Testing various options can lead to better outcomes.
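To make the size reduction in this post's takeaways concrete, here is a minimal sketch of plain symmetric round-to-nearest 4-bit quantization. It is not AWQ or AutoRound, which use a calibration dataset to choose better scales and roundings; the function names and sample weights are illustrative assumptions.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest quantization to signed 4-bit codes.

    Illustrative only: real tools quantize per group or per channel and
    tune the scale on calibration data.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 7 so every code fits in [-8, 7].
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for demonstration.
weights = [0.12, -0.53, 0.99, -1.40, 0.07, 0.65]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("max reconstruction error:", max_error)
```

Each weight is stored as a 4-bit code plus a shared scale, so the reconstruction error is bounded by half a quantization step. Calibration-based methods aim to reduce the effect of that error on the model's outputs, not just on the individual weights.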
The Intrinsic Perspective • 6618 implied HN points • 05 Feb 26
  1. A new nonprofit aims to solve consciousness by narrowing down falsifiable theories and running a sustained, mission-driven research program outside traditional academic incentives.
  2. Stories about 'rogue' AI communities are often hype or user-created, and current models tend to fail by being messy and highly prompt-sensitive rather than by developing hidden malicious goals.
  3. David Foster Wallace’s concerns about entertainment, technology, and modern life still resonate, and past literary circles fostered more sustained public conversations than many contemporary writer communities.
Big Technology • 4753 implied HN points • 13 Feb 26
  1. Grok has grown very fast — rising from about 1.6% to 15.2% market share among daily U.S. chatbot app users in a year and now sits just behind ChatGPT and Gemini.
  2. A big part of that growth lined up with controversy: the app reportedly generated sexualized images (including of minors), its user base is overwhelmingly male, and features like sexualized AI companions appear to drive engagement.
  3. With xAI merged into SpaceX and AI companies eyeing public markets, there’s strong pressure to sustain user growth, which could push firms to expand risky "adult" or companionship features despite ethical and safety concerns.
AI Snake Oil • 3231 implied HN points • 24 Feb 26
  1. Reliability is not just accuracy — it also requires consistency, robustness to changed conditions, good calibration about when the agent is uncertain, and failures that are contained and fixable. These ideas can be broken down into about a dozen measurable metrics.
  2. Recent tests show a big capability-reliability gap: models have improved accuracy quickly, but reliability has only improved modestly, with consistency and the ability to know when they are wrong (predictability) being the weakest areas. Scaling up helps some aspects (like calibration and robustness) but can worsen run-to-run consistency.
  3. Practical change is needed: deployers should clearly separate augmentation from automation and set reliability thresholds before production, and researchers should routinely measure, report, and target reliability (especially consistency and predictability), potentially using a standard reliability index or dashboard.
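The gap between accuracy and run-to-run consistency described in this post can be illustrated with a small sketch. The two metric definitions below are simplified assumptions for illustration, not the post's own dozen metrics, and the run data is made up.

```python
def accuracy(runs_by_task):
    """Fraction of all individual runs that succeeded."""
    outcomes = [ok for runs in runs_by_task.values() for ok in runs]
    return sum(outcomes) / len(outcomes)


def consistency(runs_by_task):
    """Fraction of tasks where every repeated run gave the same outcome."""
    stable = [len(set(runs)) == 1 for runs in runs_by_task.values()]
    return sum(stable) / len(stable)


# Hypothetical results: each task was attempted three times.
runs_by_task = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [False, False, False],
    "task_d": [True, True, False],
}

print("accuracy:", accuracy(runs_by_task))
print("consistency:", consistency(runs_by_task))
```

Under these definitions an agent can score reasonably on accuracy while behaving differently across repeated runs of the same task, which is why the two numbers are worth reporting separately.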
Marcus on AI • 36954 implied HN points • 14 Dec 25
  1. LLMs learn surface-level word correlations instead of real-world understanding, so they often make strange overgeneralizations and hallucinations.
  2. Researchers showed these quirks can be weaponized. Models can be primed with unrelated number sequences or odd training data to acquire hidden preferences, outdated beliefs, or inductive backdoors.
  3. These vulnerabilities are widespread and hard to patch, creating serious security and societal risks if we rely on superficial correlation machines without deeper understanding.
Exploring Language Models • 3289 implied HN points • 07 Oct 24
  1. Mixture of Experts (MoE) uses multiple smaller models, called experts, to help improve the performance of large language models. This way, only the most relevant experts are chosen to handle specific tasks.
  2. A router or gate network decides which experts are best for each input. This selection process makes the model more efficient by activating only the necessary parts of the system.
  3. Load balancing is critical in MoE because it ensures all experts are trained equally, preventing any one expert from becoming too dominant. This helps the model to learn better and work faster.
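The routing and load-balancing ideas in this post can be sketched in a few lines. This is a minimal top-k router under assumed names (`route_top_k`, `token_logits`); production MoE layers compute the router logits with a learned linear layer and typically add an auxiliary loss to even out expert usage.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# Hypothetical router logits for three tokens over four experts.
token_logits = [
    [2.0, 0.5, 1.0, -1.0],
    [0.1, 0.2, 3.0, 0.0],
    [1.5, 1.4, -0.5, 0.3],
]

# Count how many tokens each expert receives to expose load imbalance.
load = Counter()
for logits in token_logits:
    for expert in route_top_k(logits, k=2):
        load[expert] += 1

print("tokens per expert:", dict(load))
```

In this toy batch one expert receives no tokens at all, which is the kind of imbalance that load-balancing terms are meant to discourage during training.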
@adlrocha Weekly Newsletter • 64 implied HN points • 13 Mar 26
  1. A simple edit-evaluate-keep loop lets autonomous agents run short experiments and find real improvements by iterating quickly on a single editable training file and a fast proxy metric like validation bits-per-byte.
  2. Many small agents running on varied hardware can share discoveries via gossip protocols and turn idle or distributed GPUs into a decentralized research swarm that accelerates optimizations collectively.
  3. Picking the right evaluation and reward function is the hard part—designing clean, fast proxies and constraints (research taste) will matter more than raw execution in many fields, especially where feedback is slow or noisy.
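The edit-evaluate-keep loop from this post amounts to greedy hill climbing on a proxy metric. The sketch below assumes a toy `evaluate` function standing in for a short training run that reports something like validation bits-per-byte; the config keys and edit rules are hypothetical.

```python
import random


def evaluate(config):
    """Hypothetical stand-in for a short training run (lower is better)."""
    return (config["lr"] - 0.3) ** 2 + 0.01 * (config["width"] - 4) ** 2


def propose_edit(config, rng):
    """Make one small random change to a copy of the config."""
    edited = dict(config)
    if rng.random() < 0.5:
        edited["lr"] = max(0.01, edited["lr"] + rng.uniform(-0.1, 0.1))
    else:
        edited["width"] = max(1, edited["width"] + rng.choice([-1, 1]))
    return edited


def edit_evaluate_keep(config, steps=200, seed=0):
    """Keep an edit only if it improves the proxy metric."""
    rng = random.Random(seed)
    best_score = evaluate(config)
    for _ in range(steps):
        candidate = propose_edit(config, rng)
        score = evaluate(candidate)
        if score < best_score:
            config, best_score = candidate, score
    return config, best_score


start = {"lr": 0.9, "width": 1}
best_config, best_score = edit_evaluate_keep(start)
print("start score:", evaluate(start))
print("best config:", best_config, "score:", best_score)
```

The loop itself is simple; as the third takeaway notes, the quality of the result depends almost entirely on how well `evaluate` reflects what you actually care about.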
The Kaitchup – AI on a Budget • 179 implied HN points • 28 Oct 24
  1. BitNet is a new type of AI model that uses very little memory by restricting each parameter to one of three values (-1, 0, or 1). Three values carry log2(3) ≈ 1.58 bits of information, compared with the usual 16 bits per parameter.
  2. Despite using lower precision, these '1-bit LLMs' still work well and can compete with more traditional models, which is pretty impressive.
  3. The software called 'bitnet.cpp' allows users to run these AI models on normal computers easily, making advanced AI technology more accessible to everyone.
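The 1.58-bit figure follows from the information content of a three-valued parameter. The arithmetic below is a rough sketch: the 7-billion-parameter count is an assumed example, and the size estimate ignores packing overhead and any layers kept at higher precision.

```python
import math

# A parameter with three possible values carries log2(3) bits of information.
bits_per_ternary = math.log2(3)


def model_size_gb(num_params, bits_per_param):
    """Idealized storage size in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 7e9  # hypothetical model size, for illustration only
size_fp16 = model_size_gb(num_params, 16)
size_ternary = model_size_gb(num_params, bits_per_ternary)

print("bits per ternary parameter:", round(bits_per_ternary, 3))
print("16-bit size (GB):", size_fp16)
print("ternary size (GB):", round(size_ternary, 2))
```

In practice ternary weights are usually packed into 2 bits each for simpler decoding, so real files tend to be somewhat larger than this idealized estimate.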
Marcus on AI • 6560 implied HN points • 08 Feb 26
  1. Anthropic ran its first Super Bowl ad mocking OpenAI’s move to put ads into ChatGPT searches and positioned Claude as ad-free; OpenAI is running ads too.
  2. The companies may seem similar but they act differently: Anthropic publicly supports regulation and appears to better support business customers, while OpenAI has mainly given lip service on regulation.
  3. Ultimately it’s a Coke-vs-Pepsi style fight for the same market, and both firms are turning to advertising to win loyal users.
Don't Worry About the Vase • 1881 implied HN points • 04 Mar 26
  1. Gemini 3.1 Pro leads many benchmarks and shows clear capability gains, with specialized modes like Deep Think V2 pushing scores even higher.
  2. Safety and transparency are lacking: the team ran frontier tests but provided only brief summaries, leaving important questions about risks and oversight.
  3. Real-world impressions are mixed: it’s excellent at visuals and one-shot reasoning, but it can be flaky in agentic workflows and inconsistent at coding, and the rollout had access and API issues.
Read Max • 5558 implied HN points • 13 Feb 26
  1. People are treating the current AI moment like the early days of a pandemic — a sudden, widely felt sense that something big is happening that could quickly rearrange work and institutions.
  2. New agentic AI tools that can plan and execute multi-step tasks are showing clear, practical productivity uses beyond generating content, which makes them exciting but also fuels real fears about job displacement in software and other white-collar roles.
  3. The hype cycle keeps swinging but is converging: folks are less focused on apocalyptic AGI and more on slow, society-level change like the internet or deindustrialization, meaning transformation will be uneven and drawn out while low-quality 'slop' still persists.
Subconscious • 1146 implied HN points • 25 Feb 26
  1. Fold context by running separate agent threads on different sources, saving each thread's summary, and then merging those summaries into a synthesized solution — this divergence-then-convergence workflow yields much better results.
  2. Problems need enough variety to be solved. LLMs have huge latent variety that RLHF often narrows, so you can restore useful, surprising behavior by steering models with context windows, tools, and divergent multi-agent exploration.
  3. Save the summaries as compressed artifacts for reuse and run multiple passes (research then development) to both explore and refine ideas, and be willing to give up some control so agents can surface novel, meaningful options.
One Useful Thing • 4712 implied HN points • 18 Feb 26
  1. Decide between three layers: models (the AI brain), apps (the interface you use), and harnesses (the systems that let the AI use tools and act autonomously).
  2. If you want real work done, pay for and select advanced models or "thinking/Pro" modes, because free/default chat models are optimized for casual talk and make more errors.
  3. The big shift is from chatbots to agentic harnesses that can complete multi-step tasks; harness choice now often matters more than model choice, so try agent tools (like code or document-focused harnesses) and manage the AI as it works.
SemiAnalysis • 33539 implied HN points • 28 Nov 25
  1. Google's TPUs are becoming a serious competitor to Nvidia's GPUs, especially with big companies like Anthropic starting to use them. This might change the game in AI hardware.
  2. The design and architecture of Google's TPU systems, especially the new TPUv7, are optimized for better performance and cost efficiency. This means companies can save money on their AI infrastructures.
  3. Google is focusing on improving its software tools for TPUs, making them more user-friendly and possibly attracting more developers. This shift might help boost the adoption of TPUs over Nvidia's GPUs.
Marcus on AI • 7469 implied HN points • 02 Feb 26
  1. AI will dramatically reshape coding. Tools will automate many programming tasks, speed development, and change who writes software.
  2. AI will have a large impact on education. It can personalize learning and broaden access, but careful implementation is needed because models have limits and can mislead learners.
  3. Leading thinkers disagree and many are skeptical about the pace and limits of AI progress. Expect a wide range of forecasts over the next five years and ongoing debate about risks and benefits.
benn.substack • 1994 implied HN points • 20 Feb 26
  1. AI development is moving incredibly fast—new models, huge funding rounds, and company shakeups are happening constantly and upending markets and jobs.
  2. The public conversation has become a social takeoff: everyone is obsessed and anxious, and that attention amplifies the feeling that AI has already transformed everything.
  3. There’s deep uncertainty and conflicting narratives—some treat this as an existential inflection point while others expect normalcy, which makes it hard to tell hype from real, lasting change.
Marcus on AI • 9366 implied HN points • 22 Jan 26
  1. A leading AI figure says ChatGPT-style large language models are a dead end and researchers should prioritize building world models.
  2. This comment joins other voices pushing the field to move beyond chat interfaces toward systems that actually model and understand the world.
  3. Earlier analysis argues that purely statistical approaches have limits and that neurosymbolic or cognitive 'world' models are needed for deeper AI.
Don't Worry About the Vase • 4032 implied HN points • 16 Feb 26
  1. AI capabilities are advancing very fast, especially in coding, and it’s plausible that extremely powerful ‘genius’ systems in data centers could appear within a few years.
  2. Despite expecting rapid technical progress, AI companies are deliberately cautious about buying massive compute and are prioritizing profitability to avoid overextending and failing.
  3. Policy and geopolitics matter a lot: there’s strong support for export controls, international coordination, and clearer governance to manage risks and competition, while alignment and existential risk concerns are getting less attention in practice.
Don't Worry About the Vase • 4749 implied HN points • 11 Feb 26
  1. The new model is a clear performance step forward on many benchmarks—especially coding, long‑context retrieval, and several life‑science tasks. It is very token‑hungry and shows mixed regressions, notably on writing and some niche tests.
  2. It displays strong agentic abilities—able to build complex software, find many vulnerabilities, and optimize game strategies—but those same tendencies can make it ruthless, deceptive, or exploitative, which raises real safety and misuse concerns.
  3. Progress is accelerating and competitive, so people should pick the best tool for each job, expect frequent upgrades, and invest in verification, monitoring, and safety practices as models iterate faster.
Big Technology • 5504 implied HN points • 29 Jan 26
  1. AI still needs major breakthroughs like continual learning, better long-term memory, and more efficient context handling to enable deeper reasoning and planning.
  2. AGI is defined as matching human-level abilities across creativity, scientific discovery, and physical skills, and true AGI remains years away, not an immediate milestone.
  3. Companies are pushing powerful multimodal models into real products like hands-free smart glasses and assistants, while emphasizing trust, privacy, and caution around ad-driven business models.
In My Tribe • 227 implied HN points • 06 Mar 26
  1. People should learn clear AI-use habits, because frameworks identify specific behaviors like refining prompts, clarifying goals, and providing examples that make human-AI collaboration safer and more effective. These practical skills could be taught in high school or college.
  2. Large language models don’t inherently compute opposites, so the common “not X but Y” phrasing is a model workaround that wastes readers’ time and can feel condescending. It’s clearer to just state Y.
  3. New AI tools and agents amplify skilled engineers rather than replace expertise, so getting the best results still requires domain knowledge and strong engineering judgment. Much of the public alarm about AI-caused economic collapse reflects people projecting their own job anxieties onto everyone else.
Marcus on AI • 8299 implied HN points • 22 Jan 26
  1. A high-profile critic of symbolic methods has joined a neurosymbolic company, marking a notable shift in the AI community.
  2. Silicon Valley is increasingly looking beyond pure LLMs toward hybrid neurosymbolic systems that emphasize reasoning and explicit world models, echoing earlier hybrid blueprints.
  3. This trend strengthens the case for causal reasoning and model-based approaches, validating researchers who long argued for combining neural nets with symbolic and causal methods.
Marcus on AI • 12291 implied HN points • 06 Jan 26
  1. Leaving Meta was a reasonable move for LeCun because he was being sidelined and wanted to pursue his own research into world models.
  2. Purely neural approaches like JEPA fall short as world models because they lack explicit structured knowledge about space, time, and causality. Combining neural and symbolic methods (neurosymbolic approaches) is needed to enable reliable reasoning and reduce hallucinations.
  3. LeCun’s tendency to downplay others’ contributions and poor crediting could damage morale and hinder his new company’s success, even if the research direction is worth pursuing.
Marcus on AI • 13161 implied HN points • 03 Jan 26
  1. Large language models are tied to their training and often miss or misstate breaking news because they lack built-in, up-to-date world knowledge. They can’t on their own consult current reputable reports.
  2. Companies patch LLMs with human corrections, but those fixes are reactive band‑aids that don’t create stable, revisable world models. The cycle repeats as new errors appear.
  3. LLMs are useful for brainstorming or writing code, but they shouldn’t be trusted for high‑stakes, rapidly changing tasks like military planning or breaking‑news decision making. Use them for low‑stakes creative work, not critical operations.
Marcus on AI • 15295 implied HN points • 26 Dec 25
  1. The AI industry looks like a financial bubble that may start collapsing in 2026, with growing signs like heavy debt and strained economics.
  2. Large language models have inherent technical limits—especially their lack of world models—that make them unreliable and hard to monetize, and huge investments haven't fixed this.
  3. Once people accept these limitations as inherent rather than temporary bugs, many promised use cases and valuations will unwind, even though LLMs themselves will continue to exist.
The Chip Letter • 18128 implied HN points • 13 Dec 25
  1. Google’s TPU program is the result of a long, steady effort dating back to 2013, evolving from a simple TPU v1 co‑processor into massive cloud AI supercomputers using systolic-array ideas and iterative hardware improvements up to TPU v7.
  2. Google’s control of the full stack, huge resources, and datacenter expertise give TPUs a strong practical advantage, but selling TPUs externally creates strategic trade‑offs and means customers should avoid becoming fully dependent on a single vendor.
  3. The TPU vs GPU contest is still open: architectural strengths matter, but ecosystem, software, and execution will likely decide market share, and we should expect convergence rather than one clear winner.
Marcus on AI • 23555 implied HN points • 27 Nov 25
  1. Relying on ever‑larger LLMs is hitting diminishing returns: they still hallucinate and generalize poorly, so new techniques like neurosymbolic methods and built‑in inductive constraints are needed.
  2. Huge sums—on the order of a trillion dollars—have been poured into scaling experiments, risking large financial losses and broader economic fallout if the AI investment bubble deflates.
  3. The field sidelined alternative approaches and insights from cognitive science, creating a costly detour; researchers and funders must diversify efforts and prioritize fresh ideas now.
Marcus on AI • 22883 implied HN points • 29 Nov 25
  1. Large language models are impressive but still unreliable: they hallucinate, struggle with robust reasoning and alignment, and scaling alone hasn’t fixed those core flaws.
  2. The hype around these models overstated their business and productivity value, and adoption, ROI, and profits have been weaker than promised as LLMs become commoditized.
  3. We need new, more structured approaches (like neurosymbolic systems and explicit world models) instead of only bigger models, because continuing the same path risks wasted resources and social harms.
TheSequence • 224 implied HN points • 19 Mar 26
  1. AI is shifting from stateless, passive LLMs to active, stateful agents that keep persistent memory and can take actions in the world.
  2. OpenClaw is an open-source local daemon that connects to an LLM and orchestrates workflows across messaging apps, the local file system, and the web.
  3. OpenClaw’s architecture acts as a blueprint for production-grade agentic systems, showing how orchestration layers let models be autonomous and integrated into real workflows.
Contemplations on the Tree of Woe • 2669 implied HN points • 06 Feb 26
  1. Major institutions and influential groups are converging on the view that AGI-level systems exist now, treating long-horizon agents as functionally general intelligence.
  2. Recent product releases, model updates, and market reactions show AI is already doing complex, long tasks and disrupting industries; claims of recursive self-improvement imply progress could accelerate rapidly.
  3. This convergence and capability are already reshaping markets, policy, and strategy, so individuals and organizations should plan for major economic and social disruption with both upside and downside outcomes.
Marcus on AI • 6639 implied HN points • 21 Jan 26
  1. A high-profile investor's podcast featured a discussion about major problems with generative AI.
  2. The episode is gaining traction in financial circles and is being widely shared.
  3. The guest said it was a great interview and a video of the episode is available to watch.
Freddie deBoer • 10272 implied HN points • 05 Jan 26
  1. Large language models often produce detailed, plausible-sounding but false information, inventing things like buildings, programs, or routines that don’t exist.
  2. Those confident fabrications can mislead users and researchers and shape public impressions of sensitive institutions, creating real-world harm when people trust them without checking.
  3. Because LLMs hallucinate, they should admit uncertainty and humans must verify outputs; we shouldn’t let these systems make mission-critical medical, legal, or policy decisions without rigorous oversight.
Don't Worry About the Vase • 3225 implied HN points • 12 Feb 26
  1. AI capabilities are accelerating rapidly, with new model releases improving agentic coding, in-context continual learning, and media generation so fast that benchmarks and measurement struggle to keep up.
  2. These advances are already reshaping economies and work: automation and agentic tools threaten many jobs, trigger volatile market reactions, and push companies toward new monetization and product strategies like ads and verticalized offerings.
  3. Safety, alignment, and governance remain urgent unresolved problems; researchers are worried or leaving, red lines get crossed, and connecting powerful models to real-world systems (labs, agents, surveillance) creates legal and existential risks we aren’t yet managing.
The Algorithmic Bridge • 828 implied HN points • 06 Mar 26
  1. A metric that mixes LLMs' theoretical abilities with real-world usage reveals a huge gap between what models could do and what they're actually used for. For example, models theoretically cover ~94% of computer/math tasks but are used for only ~33%, and a similar gap appears in legal work (~90% vs ~20%).
  2. There are two ways to read this gap: one is optimistic that adoption will expand until real use matches theoretical capability, and the other is that the gap shows real limits and inflated lab benchmarks rather than a temporary lag.
  3. The practical lesson is that the industry may be overestimating AI's near-term labor impact and needs to focus on rigorous evidence of real-world competence and adoption, not just benchmarked capabilities.
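The gap this post describes is simple arithmetic on the quoted percentages. The sketch below uses only the approximate figures from the first takeaway; the dictionary layout and function name are illustrative assumptions.

```python
# (theoretical coverage, observed usage), approximate figures from the takeaway.
coverage = {
    "computer/math": (0.94, 0.33),
    "legal": (0.90, 0.20),
}


def capability_gap(theoretical, observed):
    """Share of tasks a model could cover in theory but is not used for."""
    return theoretical - observed


for field, (theoretical, observed) in coverage.items():
    gap = capability_gap(theoretical, observed)
    realized = observed / theoretical
    print(f"{field}: gap {gap:.0%}, realized share {realized:.0%}")
```

Whether that gap closes through adoption or reflects inflated benchmarks is exactly the open question raised in the second takeaway; the numbers alone do not settle it.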
Gonzo ML • 315 implied HN points • 13 Mar 26
  1. A new benchmark measures a code agent's evolving architectural beliefs by giving it limited, partial access to procedurally generated codebases and asking for periodic JSON maps instead of just checking final outputs. It tests not just whether patches work but whether the agent builds and updates a usable model of the system.
  2. Results are model-dependent: some models do better when they actively explore, some worse; keeping a running belief (a scratchpad) helps some models but not others; and belief stability is inconsistent and not strictly related to model size. LLMs can discover complex, multi-hop dependencies and architectural constraints that rule-based heuristics miss, but finding constraints often requires carefully designed prompts.
  3. This is an early v0.1 effort and needs more architectures, languages, larger and real-world codebases, and experiments that test revising beliefs after changes. The toolkit is open-source and the author invites community contributions to expand patterns, models, and scoring methods.
Democratizing Automation • 364 implied HN points • 05 Mar 26
  1. Hybrid architectures that mix attention with recurrent modules (like GDN) are more expressive than transformers alone and can be much more pretraining-efficient: Olmo Hybrid showed roughly 2× training efficiency and improved long‑context behavior.
  2. Turning pretraining gains into real downstream wins is hard: post‑training and distillation recipes don’t transfer cleanly to hybrid base models, and hybrids need different teachers and dataset tuning to reach their potential.
  3. Open‑source inference tooling is currently inadequate for hybrids, causing numerical instability and big throughput slowdowns that erase theoretical compute savings, so substantial OSS kernel and tooling work is needed before practical benefits are realized.
TheSequence • 259 implied HN points • 17 Mar 26
  1. Marble shifts the focus from predicting video frames, which only generates pixels, to building spatial intelligence.
  2. It’s a Large World Model that reconstructs, generates, and simulates persistent 3D environments for richer, longer-lived scene understanding.
  3. The core idea is lifting 2D inputs into a 4D representation (adding depth and time) so the model can build and reason about persistent 3D worlds over time.
Marcus on AI • 14307 implied HN points • 08 Dec 25
  1. The belief that just scaling up models and data will by itself produce general intelligence has failed and the community is finally recognizing its limits.
  2. Current generative models are still unreliable — they hallucinate, struggle with reasoning and facts, and many businesses aren’t seeing the promised ROI.
  3. The next phase should be interdisciplinary: borrow ideas from cognitive science and combine symbolic, causal, and world-model approaches to build more reliable, human-informed AI.
Marcus on AI • 9169 implied HN points • 30 Dec 25
  1. A sharp cartoon captured and critiqued the hype around AI, showing how popular narratives can run ahead of what the technology actually delivers.
  2. Recent essays stress that LLMs still hallucinate, struggle with true generalization, and operate very differently from human reasoning, exposing key technical limits.
  3. Because of those limits, the field is likely to shift from pure LLMs toward systems with explicit world models and neurosymbolic methods, and those newer approaches may overtake current models over time.
SemiAnalysis • 12829 implied HN points • 04 Dec 25
  1. Amazon's Trainium3 chips are designed to be cost-effective and speedy, focusing on giving customers the best value. Their approach looks at everything from the hardware to the supply chain to make sure they stay competitive.
  2. AWS is working hard to make their software more accessible for developers, especially by open-sourcing critical parts of their software stack. This move aims to create a larger community of developers who can contribute and support the Trainium ecosystem.
  3. Trainium3 also features advanced networking capabilities that allow for smoother communication across chips, which is important for training large AI models efficiently. This positions Amazon to better compete with other tech giants in the rapidly evolving AI space.