The hottest Machine Learning Substack posts right now

And their main takeaways

“DeepSeek Moment” Anniversary in a Nashville Snowpocalypse

Interconnected • 61 implied HN points • 27 Jan 26

🕹 Technology Machine Learning

Making open source the default for frontier AI speeds innovation and lets more people contribute and build on progress.
Letting software specifications drive hardware roadmaps, especially in China, aligns chip design with real AI needs and priorities.
Pursuing AGI without a short-term business model can be a strategic advantage because it prioritizes long-term capability over immediate profit.

Time to Welcome Claude 3.7

Don't Worry About the Vase • 2419 implied HN points • 26 Feb 25

🕹 Technology Machine Learning

Claude 3.7 is a new AI model with stronger coding abilities and a feature called Extended Thinking, which lets it think longer before responding. This makes it a great choice for coding tasks.
The model prioritizes safety and has clear guidelines for avoiding harmful responses. It is better at understanding user intent and has reduced unnecessary refusals compared to the previous version.
Claude Code is a helpful new tool that allows users to interact with the model directly from the command line, handling coding tasks and providing a more integrated experience.

LLMs - Part 1: Tokenization and Embeddings

Vasu’s Newsletter • 104 implied HN points • 05 Jan 26

🕹 Technology Machine Learning

Text is split into discrete tokens, often subwords using Byte Pair Encoding, so a fixed vocabulary can represent any input by keeping common words whole and breaking rare words into parts.
Each token ID is looked up in a learned embedding matrix to produce a dense vector, and these embeddings capture semantic and syntactic relationships learned during training.
Embeddings are context-free and don’t encode position by themselves, so transformer mechanisms like attention and positional encodings combine them to determine meaning and word order.
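The first two takeaways can be made concrete with a toy sketch. It shows a minimal version of the classic BPE training loop (find the most frequent adjacent pair, merge it, repeat), followed by an embedding lookup. The function names and corpus are our own. The random matrix stands in for a learned embedding table. Production tokenizers add byte-level fallback, pre-tokenization, and special tokens, none of which appear here.

```python
from collections import Counter

import numpy as np


def pair_counts(words: dict) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def apply_merge(symbols: tuple, pair: tuple) -> tuple:
    """Replace every occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus: list, num_merges: int) -> list:
    """Learn merge rules, starting from single characters."""
    words = dict(Counter(tuple(w) for w in corpus))
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merged = Counter()
        for symbols, freq in words.items():
            merged[apply_merge(symbols, best)] += freq
        words = dict(merged)
        merges.append(best)
    return merges


def tokenize(word: str, merges: list) -> list:
    """Apply the learned merges in order; unseen words fall back to smaller pieces."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


corpus = ["low", "low", "low", "lower", "lowest"]
merges = learn_bpe(corpus, num_merges=2)
print(merges)                       # [('l', 'o'), ('lo', 'w')]
print(tokenize("lowest", merges))   # ['low', 'e', 's', 't']
print(tokenize("slow", merges))     # ['s', 'low']

# Embedding lookup: each token ID selects one row of the embedding matrix.
vocab = sorted({ch for w in corpus for ch in w}) + [a + b for a, b in merges]
token_to_id = {tok: i for i, tok in enumerate(vocab)}
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 4))   # random stand-in for learned weights
ids = [token_to_id[t] for t in tokenize("lowest", merges)]
vectors = embedding[ids]            # one row per token; nothing here encodes position
print(vectors.shape)                # (4, 4)
```

The frequent word "low" survives as a single token, while the rarer "lowest" and the unseen "slow" are split into known pieces.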

Inside the "Mind" of ChatGPT

Range Widely • 2083 implied HN points • 25 Apr 23

🕹 Technology Machine Learning

Cal Newport provides insights on ChatGPT's functionality and limitations.
Understanding how ChatGPT works is key before discussing its potential impact.
AI like ChatGPT may enhance efficiency in certain professions rather than fully replace human workers.

The Sequence AI of the Week #809: Slow Thinking, Fast Discovery: Inside DeepMind’s Aletheia Architecture

TheSequence • 35 implied HN points • 18 Feb 26

🕹 Technology Machine Learning

Aletheia is a DeepMind research agent built on the DeepThink architecture that emphasizes slow, deliberate “System 2” reasoning for autonomous scientific discovery.
It shifts models away from fast next-token prediction toward verification and self-correction, aiming to reduce hallucinations and improve reliability.
By giving the agent tools and the ability to check and admit mistakes, Aletheia enables deeper, more trustworthy exploration and problem solving.


Inductive biases of the Random Forest and their consequences

Mindful Modeler • 379 implied HN points • 21 May 24

🕹 Technology Machine Learning

Machine learning models like Random Forest have inductive biases that impact interpretability, robustness, and extrapolation.
Random Forest's inductive biases come from decision tree learning algorithms, random factors like bootstrapping and column sampling, and ensembling of trees.
Specific inductive biases of Random Forest include a restriction to step functions, a preference for deep interactions, a reliance on features with many unique values, and effects of column sampling on feature importance and model robustness.
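A minimal sketch can show the step-function bias and its effect on extrapolation. It uses a hand-rolled one-dimensional regression tree and a bagged ensemble of such trees, trained on a straight line. This is not scikit-learn's implementation: there is no column sampling, and thresholds sit at observed values rather than midpoints. All names are hypothetical. Every prediction is an average of leaf means, so the forest is piecewise constant and goes flat outside the training range.

```python
import numpy as np


def fit_tree(x, y, depth):
    """Greedy 1-D regression tree; a leaf is the mean of its targets."""
    if depth == 0 or len(np.unique(x)) < 2:
        return float(y.mean())
    best_sse, best_t = None, None
    for t in np.unique(x)[:-1]:  # candidate thresholds at observed values
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best_sse is None or sse < best_sse:
            best_sse, best_t = sse, t
    mask = x <= best_t
    return (best_t,
            fit_tree(x[mask], y[mask], depth - 1),
            fit_tree(x[~mask], y[~mask], depth - 1))


def predict_tree(node, xi):
    """Walk down the splits until reaching a leaf value."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if xi <= t else right
    return node


def fit_forest(x, y, n_trees=25, depth=3, seed=0):
    """Bagging: each tree sees a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))
        trees.append(fit_tree(x[idx], y[idx], depth))
    return trees


def predict_forest(trees, xi):
    """Average the leaf values reached in every tree."""
    return float(np.mean([predict_tree(t, xi) for t in trees]))


x = np.arange(10, dtype=float)   # training inputs 0..9
y = 2 * x                        # a perfectly linear target
forest = fit_forest(x, y)

for xi in [4.5, 9.0, 20.0, 100.0]:
    print(xi, predict_forest(forest, xi))
# Inputs beyond the training range land in each tree's rightmost leaf,
# so predict_forest(forest, 20.0) == predict_forest(forest, 100.0),
# and neither is anywhere near the linear answers 40 and 200.
```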

A Compendium on Synthetic Data Projects

Encyclopedia Autonomica • 19 implied HN points • 06 Oct 24

🕹 Technology Machine Learning

Synthetic data is crucial for AI development. It helps create large amounts of high-quality data without privacy concerns or high costs.
There are various projects focused on generating synthetic data. Tools like AgentInstruct and DataDreamer aim to create diverse datasets for training language models.
Techniques for building synthetic datasets include using personas to generate diverse data and designing datasets specifically to improve mathematical reasoning.

Using GPT LLM MAMLMs as Your Rabbit—Your Pacer

Brad DeLong's Grasping Reality • 261 implied HN points • 22 Nov 25

🕹 Technology Machine Learning

LLMs aren’t oracles or perfect helpers: they mostly mimic typical internet writing and produce rough, sloppy drafts that are useful as pace-setters, not finished work.
All the tricks to make them better (context engineering, fine-tuning, RAG, etc.) are heavy, fragile, and costly patches. Only invest in that work when you really need high-volume or specialized, production-ready output.
AI can lift weak writers and handle boilerplate well, but for persuasive or high-quality writing the best workflow is to use the model for a rough draft and then heavily rewrite it into something authentic.

AI #90: The Wall

Don't Worry About the Vase • 3494 implied HN points • 14 Nov 24

🕹 Technology Machine Learning

AI is improving quickly, but some methods of deep learning are starting to face limits. Companies are adapting and finding new ways to enhance AI performance.
There's an ongoing debate about how AI impacts various fields like medicine, especially with regulations that could limit its integration. Discussions about ethical considerations and utility are very important.
Advancements in AI, especially in image generation and reasoning, continue to demonstrate its growing capabilities, but we need to be cautious about potential risks and ensure proper regulations are in place.

gpt-oss: OpenAI validates the open ecosystem (finally)

Democratizing Automation • 839 implied HN points • 05 Aug 25

🕹 Technology Machine Learning

OpenAI has released two new open-weight models, making capable models more accessible to developers and small companies. This is a significant shift, since it is OpenAI's first open-weight language model release since GPT-2.
The performance of these new models is impressive, potentially competing with OpenAI's premium API offerings at a much lower cost, which could disrupt the current market.
OpenAI's release marks a positive change for open-source AI in the West, allowing more competition against models from China, but it also raises questions about the future of open models in the industry.

Linearity: Why Batching Works

Software Bits Newsletter • 103 implied HN points • 03 Jan 26

🕹 Technology Machine Learning

Linearity lets you process many inputs as one big matrix multiply, so batching is nearly free and GPUs can run large batches with high efficiency.
Differentiation is linear, so per-sample gradients can be summed and scaled, which enables gradient accumulation, distributed training, and efficient backprop.
Non-linearities are required for expressivity, so networks interleave cheap, element-wise nonlinear functions with batch-friendly linear layers and prefer operations like LayerNorm, which normalizes each sample independently and so preserves batching advantages.
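The linearity argument can be checked numerically. The NumPy sketch below uses made-up shapes and a plain squared-error loss, which are assumptions for illustration and not from the post. It confirms three things: one batched matrix multiply matches a per-sample loop, gradients accumulated over micro-batches equal the full-batch gradient, and ReLU is not additive.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # linear layer: 3 inputs -> 4 outputs
X = rng.normal(size=(8, 3))   # batch of 8 inputs, one per row
Y = rng.normal(size=(8, 4))   # regression targets

# 1. Batching: one matrix multiply equals 8 separate vector-matrix products.
batched = X @ W
looped = np.stack([x @ W for x in X])
print(np.allclose(batched, looped))   # True


# 2. Gradient accumulation: the gradient of a sum of per-sample losses
#    is the sum of per-micro-batch gradients.
def grad_sse(Xb, Yb, W):
    """Gradient w.r.t. W of 0.5 * ||Xb @ W - Yb||^2, summed over the micro-batch."""
    return Xb.T @ (Xb @ W - Yb)


full = grad_sse(X, Y, W)
accum = np.zeros_like(W)
for start in range(0, len(X), 2):           # four micro-batches of size 2
    accum += grad_sse(X[start:start + 2], Y[start:start + 2], W)
print(np.allclose(full, accum))             # True
# For a mean loss, divide the accumulated sum by len(X) once at the end.


# 3. Non-linearities break additivity, which is why they are kept element-wise.
def relu(v):
    return np.maximum(v, 0.0)


a, b = np.array([1.0, -2.0]), np.array([-3.0, 1.0])
print(relu(a + b), relu(a) + relu(b))       # [0. 0.] [1. 1.]
```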

What "language" is a language model a model of?

The Counterfactual • 99 implied HN points • 02 Aug 24

🕹 Technology Machine Learning

Language models are trained on specific types of language, known as varieties. This includes different dialects, registers, and periods of language use.
Using a representative training data set is crucial for language models. If the training data isn't diverse, the model can perform poorly for certain groups or languages.
It's important for researchers to clearly specify which language and variety their models are based on. This helps everyone better understand what the model can do and where it might struggle.

The Sequence Knowledge #808: Stop Trying to Generate the World: Inside the JEPA Way for World Models

TheSequence • 35 implied HN points • 17 Feb 26

🕹 Technology Machine Learning

Recreating the world pixel-by-pixel isn’t the path to true intelligence, because generating images doesn’t prove a model understands the underlying concepts.
JEPA (Joint Embedding Predictive Architecture) trains models to predict in a shared embedding space so they learn and forecast concepts instead of raw pixels, capturing semantics without rendering images.
Several JEPA papers argue this is a promising way to build world models, suggesting we should shift research from generative reconstruction to predictive conceptual representations when measuring understanding.
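A JEPA-style objective can be sketched in a few lines of NumPy (forward pass only). A context encoder embeds one patch. A predictor then maps that embedding toward the embedding of a target patch, which comes from a separate target encoder. The error is measured between embeddings, never between pixels. The linear-plus-tanh encoders, the sizes, and the decay value are illustrative assumptions. Published JEPA models use Vision Transformers, masking strategies, and autodiff training, none of which appear here.

```python
import numpy as np

rng = np.random.default_rng(0)
D_PIX, D_EMB = 64, 8   # hypothetical sizes: flattened image patch vs. embedding


def encoder(W, patch):
    """Map a raw patch to a compact embedding."""
    return np.tanh(patch @ W)


W_context = rng.normal(size=(D_PIX, D_EMB)) / np.sqrt(D_PIX)
W_target = W_context.copy()                     # target encoder, updated by EMA
W_predictor = rng.normal(size=(D_EMB, D_EMB)) / np.sqrt(D_EMB)


def jepa_loss(context_patch, target_patch):
    """Predict the target's embedding from the context's embedding."""
    predicted = encoder(W_context, context_patch) @ W_predictor
    # In an autodiff framework this branch would sit behind a stop-gradient.
    target = encoder(W_target, target_patch)
    return float(np.mean((predicted - target) ** 2))  # error in embedding space


def ema_update(W_tgt, W_ctx, decay=0.996):
    """Target weights slowly track the context encoder instead of being trained directly."""
    return decay * W_tgt + (1.0 - decay) * W_ctx


context, target = rng.normal(size=D_PIX), rng.normal(size=D_PIX)
loss = jepa_loss(context, target)   # compared over 8 embedding dims, not 64 pixels
print(loss)
W_target = ema_update(W_target, W_context)
```

The key design choice is that nothing ever reconstructs the 64 raw values. The model is only asked to get the 8-dimensional summary right.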

Gemini 2.5 Pro: From 0506 to 0605

Don't Worry About the Vase • 1209 implied HN points • 18 Jun 25

🕹 Technology Machine Learning

The new Gemini 2.5 Pro model from Google is better at coding and has improved reasoning skills, but users have mixed feelings about its personality changes.
Some people think the updates focus too much on benchmarks, making the model feel less creative and more sycophantic in its responses.
Its sibling model, Gemini 2.5 Flash-Lite, is very affordably priced, making it a good option for many users, but concerns about its safety and reliability remain.

Commutativity: Why Transformers Need Positional Encodings

Software Bits Newsletter • 103 implied HN points • 01 Jan 26

🕹 Technology Machine Learning

Self-attention treats all positions symmetrically, so permuting tokens just permutes outputs; because attention is permutation‑equivariant, Transformers need positional encodings to learn token order.
Commutativity is a deliberate design trade‑off: it enables parallelization and is perfect for unordered data like point clouds, sets, and graphs, but it destroys order information so you must use non‑commutative models or inject positions when order matters (language, time series).
Commutativity shows up across ML: global pooling gives useful invariance but loses location, gradient aggregation and distributed training rely on commutative sums, and because floating-point addition is not associative, the order of those sums can still cause small nondeterminism.
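The NumPy sketch below checks these claims directly. It uses a single-head self-attention layer with random weights and made-up sizes, plus the standard sinusoidal encodings from the original Transformer paper as the injected positions. Shuffling the input rows shuffles the output rows in the same way. Mean pooling over positions is fully invariant. Once positions are added, the equivariance is gone. The last line shows the floating-point non-associativity mentioned above.

```python
import numpy as np


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with no positional information."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V


def sinusoidal_positions(n, d):
    """Fixed sine/cosine positional encodings (d must be even)."""
    pos = np.arange(n)[:, None]
    freq = np.power(10000.0, np.arange(0, d, 2) / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(pos / freq)
    pe[:, 1::2] = np.cos(pos / freq)
    return pe


rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = np.array([4, 2, 0, 3, 1])   # an arbitrary reordering of the 5 tokens

out = self_attention(X, Wq, Wk, Wv)
out_shuffled = self_attention(X[perm], Wq, Wk, Wv)
print(np.allclose(out_shuffled, out[perm]))                      # True: equivariant
print(np.allclose(out_shuffled.mean(axis=0), out.mean(axis=0)))  # True: pooling is invariant

P = sinusoidal_positions(n, d)
out_pos = self_attention(X + P, Wq, Wk, Wv)
out_pos_shuffled = self_attention(X[perm] + P, Wq, Wk, Wv)
print(np.allclose(out_pos_shuffled, out_pos[perm]))   # False: order now matters

# Floating-point addition is not associative, so reduction order can change results.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
```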

DeepSeek v3: The Six Million Dollar Model

Don't Worry About the Vase • 2777 implied HN points • 31 Dec 24

🕹 Technology Machine Learning

DeepSeek v3 is a powerful and cost-effective AI model with a good balance between performance and price. It can compete with top models but might not always outperform them.
The model uses a mixture-of-experts architecture, so only a fraction of its parameters is active for each token, which lets it run efficiently. However, this optimization can lead to uneven performance across tasks.
Reports suggest that while DeepSeek v3 is impressive in some areas, it still falls short in aspects like instruction following and output diversity compared to competitors.
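The "fewer active parameters" idea comes from mixture-of-experts routing. A gate scores every expert for each token, and only the top-k experts actually run. The NumPy sketch below is a generic softmax top-k router with made-up sizes. DeepSeek v3's actual router adds details such as shared experts and its own load-balancing scheme, so treat this only as an illustration of the general technique.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2      # hypothetical sizes

# Each expert is a small linear layer; only k of them run per token.
expert_weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
W_gate = rng.normal(size=(d, n_experts))


def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ W_gate                         # one score per expert
    top = np.argsort(logits)[-k:]               # indices of the k highest scores
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over the selected experts only
    out = np.zeros(d)
    for g, e in zip(gates, top):
        out += g * (x @ expert_weights[e])      # unselected experts are never computed
    return out, top, gates


x = rng.normal(size=d)
out, top, gates = moe_forward(x)
total_params = n_experts * d * d
active_params = k * d * d
print(top, gates.sum())   # two expert indices; gate weights sum to 1 (up to rounding)
print(f"active fraction: {active_params / total_params:.2f}")   # active fraction: 0.25
```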

The Boring AI Questions That Actually Matter

Margins by Ranjan Roy and Can Duruk • 878 implied HN points • 23 Jul 25

🕹 Technology Machine Learning

The future of AI is not just about exciting advancements, but also about who gets to control the technology. Companies like OpenAI and Google currently hold a lot of power, but open-source models could change this.
Some AI models perform better than others, and we don't fully understand why. This difference in quality may come down to the talent behind the models, not just the data or hardware.
Instead of worrying about extreme scenarios, the impact of AI will likely be more mundane and integrated into everyday life, similar to how air conditioning changed industries without anyone really noticing at first.

The feature backlog has gone poof

next big thing • 32 implied HN points • 08 Feb 26

🕹 Technology Machine Learning

AI coding agents have recently crossed a threshold and are letting developers and multi-agent setups write and ship a lot more product, so many teams are seeing their feature backlogs disappear.
Companies are at different adoption stages, and engineering teams need to become fluent with agentic tools or risk falling behind; startups that use these tools can amplify their speed and focus.
Public SaaS and companies aiming to IPO must show they leverage agentic engineering to drive faster feature delivery, revenue growth, and better margins, because easier software development risks commoditizing existing offerings and hurting valuations.

Ignore inductive biases at your own peril

Mindful Modeler • 399 implied HN points • 07 May 24

🕹 Technology Machine Learning

Infinitely many functions can fit any finite set of training data, so inductive biases are necessary to pick the right one.
Inductive biases guide machine learning algorithms on where to search in the hypothesis space, impacting model choices like feature engineering and architecture.
Ignoring inductive biases can lead to misunderstanding nuances in models and failing to grasp important model assumptions.

The o1 System Card Is Not About o1

Don't Worry About the Vase • 2732 implied HN points • 13 Dec 24

🕹 Technology Machine Learning

The o1 System Card does not accurately reflect the true capabilities of the o1 model, leading to confusion about its performance and safety. It's important for companies to communicate clearly about what their products can really do.
There were significant failures in testing and evaluating the o1 model before its release, raising concerns about safety and effectiveness based on inaccurate data. Models need thorough checks to ensure they meet safety standards before being shared with the public.
Many results from evaluations were based on older versions of the model, which means we don't have good information about the current version's abilities. This underlines the need for regular updates and assessments to understand the capabilities of AI models.

Agentic AI: Challenges and Opportunities

Gradient Flow • 339 implied HN points • 16 May 24

🕹 Technology Machine Learning

AI agents are evolving to be more autonomous than traditional co-pilots, capable of proactive decision-making based on goals and environment understanding.
Enterprise applications of AI agents focus on efficient data collection, integration, and analysis to automate tasks, improve decision-making, and optimize business processes.
The field of AI agents is advancing with new tools like CrewAI, highlighting the importance of MLOps for reliability, traceability, and ensuring ethical and safe deployment.

HN blogs -3/10/24

HackerNews blogs newsletter • 19 implied HN points • 03 Oct 24

🕹 Technology Machine Learning

Building a personal ghostwriter can help with productivity and writing tasks. It's about creating a tool that assists you effectively.
Refactoring code is important for improving software. It makes programs easier to understand and maintain, even for those who aren't programmers.
AI and machine learning can benefit from powerful hardware setups. Training models on many GPUs can significantly speed up the process.

A new generation of AIs: Claude 3.7 and Grok 3

One Useful Thing • 1968 implied HN points • 24 Feb 25

🕹 Technology Machine Learning

New AI models like Claude 3.7 and Grok 3 are much smarter and can handle complex tasks better than before. They can even do coding through simple conversations, which makes them feel more like partners for ideas.
These AIs are trained with very large amounts of computing power, which helps them improve quickly: more training compute generally yields more capable models, so each new generation keeps getting better.
As AI becomes more capable, organizations need to rethink how they use it. Instead of just automating simple tasks, they should explore new possibilities and ways AI can enhance their work and decision-making.

Update re: Microsoft and training data

Marcus on AI • 2766 implied HN points • 26 Nov 24

🕹 Technology Machine Learning

Microsoft claims they don't use customer data from their applications to train AI, but it's not very clear how that works.
There is confusion around the Connected Services feature, which says it analyzes data but doesn't explain how that affects AI training.
People want clearer answers from Microsoft about data usage, but the company has not yet given a detailed response.

Data Science Weekly - Issue 529

Data Science Weekly Newsletter • 999 implied HN points • 12 Jan 24

🕹 Technology Machine Learning

Using ChatGPT can help you budget better. It can track and categorize your spending easily.
When coding, it's important to find a balance between moving quickly and keeping your code well-structured. This is a real challenge for many developers.
Language models, like GPT-4, are becoming very advanced, but there are big philosophical questions about what that really means for intelligence and understanding.

Contra Dwarkesh on Continual Learning

Democratizing Automation • 649 implied HN points • 15 Aug 25

🕹 Technology Machine Learning

Continual learning isn't essential for AI progress; scaling existing systems is more important. AI will evolve and improve without mimicking human learning too closely.
Current language models can't learn or adapt over time like humans do, but they can still handle context effectively and improve in their capacity to process information.
Better context management and new AI models in the future will bridge the gap between current capabilities and continual learning, making AI systems more adaptable and efficient.

AI #92: Behind the Curve

Don't Worry About the Vase • 2777 implied HN points • 28 Nov 24

🕹 Technology Machine Learning

AI language models are improving in utility, specifically for tasks like coding, but they still have some limitations such as being slow or clunky.
Public perception of AI-generated poetry shows that people often prefer it over human-created poetry, indicating a shift in how we view creativity and value in writing.
Conferences and role-playing exercises around AI emphasize the complexities and potential outcomes of AI alignment, highlighting that future AI developments bring both hopeful and concerning possibilities.

AI #97: 4

Don't Worry About the Vase • 2419 implied HN points • 02 Jan 25

🕹 Technology Machine Learning

AI is becoming more common in everyday tasks, helping people manage their lives better. For example, using AI to analyze mood data can lead to better mental health tips.
As AI technology advances, there are concerns about job displacement. Jobs in fields like science and engineering may change significantly as AI takes over routine tasks.
The shift of AI companies from non-profit to for-profit models could change how AI is developed and used. It raises questions about safety, governance, and the mission of these organizations.

DeepSeek: The View from China

ChinaTalk • 2075 implied HN points • 28 Jan 25

🕹 Technology Machine Learning

DeepSeek is gaining attention in the AI community for its strong performance and efficient use of computing power. Many believe it showcases China’s growing capabilities in AI technology.
The culture at DeepSeek focuses on innovation without immediate monetization, emphasizing the importance of young talent in AI advancements. This approach has differentiated them from larger tech firms.
Despite initial success, there are still concerns about the long-term sustainability of AI business models. The demand for computing power is high, and no company has enough to meet the future needs.

LangChain Based Plan & Execute AI Agent With GPT-4o-mini

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 99 implied HN points • 26 Jul 24

🕹 Technology Machine Learning

The Plan-and-Solve method helps break tasks into smaller steps before executing them. This makes it easier to handle complex jobs.
Chain-of-Thought prompting can sometimes fail due to calculation errors and misunderstandings, but newer methods like Plan-and-Solve are designed to fix these issues.
A LangChain program allows you to create an AI agent to help plan and execute tasks efficiently using the GPT-4o-mini model.
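The plan-and-execute pattern itself is small. The sketch below is plain Python with no LangChain dependency. `planner` and `executor` are hypothetical stand-ins for LLM calls (for example to GPT-4o-mini). Here they are replaced by deterministic toy functions so the control flow can be run and tested. Real agents typically add a replanning step that revises the remaining plan after each result.

```python
from typing import Callable

Step = str
History = list[tuple[str, str]]


def plan_and_execute(task: str,
                     planner: Callable[[str], list[Step]],
                     executor: Callable[[Step, History], str]) -> History:
    """Plan once, then run each step in order, passing along earlier results."""
    steps = planner(task)                 # 1. break the task into ordered steps
    history: History = []
    for step in steps:                    # 2. execute steps one at a time
        result = executor(step, history)  #    each step can see prior results
        history.append((step, result))
    return history


# Toy stand-ins for the LLM-backed planner and executor (hypothetical).
NUMBERS = [4, 8, 15]


def toy_planner(task: str) -> list[Step]:
    return ["sum the numbers", "divide the sum by the count"]


def toy_executor(step: Step, history: History) -> str:
    if step.startswith("sum"):
        return str(sum(NUMBERS))
    if step.startswith("divide"):
        return str(float(history[-1][1]) / len(NUMBERS))
    raise ValueError(f"unknown step: {step}")


history = plan_and_execute("average of the numbers", toy_planner, toy_executor)
for step, result in history:
    print(f"{step} -> {result}")
# sum the numbers -> 27
# divide the sum by the count -> 9.0
```

Separating planning from execution is what lets the executor focus on one small step at a time. This is the property Plan-and-Solve relies on to avoid the skipped-step errors of plain Chain-of-Thought.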

AI Links, 11/17/2025

In My Tribe • 212 implied HN points • 17 Nov 25

🕹 Technology Machine Learning

Many people believe that AI could end up being more disliked than social media companies. There's a concern about AI causing harm as it becomes more advanced.
AI models, like LLMs, tend to reinforce users' ideas instead of challenging them. This can make users feel more confident, but it does not always lead to the best advice.
AI is becoming a major player in creating ads, often needing little human input. This could change the job market for those involved in video production, as AI can do the work faster and cheaper.

There's no silver bullet in AI

The AI Frontier • 99 implied HN points • 25 Jul 24

🕹 Technology Machine Learning

In AI, there's no single fix that will solve all problems. Success comes from making lots of small improvements over time.
Data quality is very important. If you don't start with good data, the results won't be good either.
It's essential to measure changes carefully when building AI applications. Understanding what works and what doesn't can save you from costly mistakes.

AI #91: Deep Thinking

Don't Worry About the Vase • 2732 implied HN points • 21 Nov 24

🕹 Technology Machine Learning

DeepSeek has released a new AI model similar to OpenAI's o1, which has shown potential in math and reasoning, but we need more user feedback to confirm its effectiveness.
AI models are continuing to improve incrementally, but people seem less interested in evaluating new models than they used to be, leading to less excitement about upcoming technologies.
There are ongoing debates about AI's impact on jobs and the future, with some believing that the rise of AI will lead to a shift in how we find meaning and purpose in life, especially if many jobs are replaced.

AI #94: Not Now, Google

Don't Worry About the Vase • 2464 implied HN points • 12 Dec 24

🕹 Technology Machine Learning

AI technology is rapidly improving, with many advancements from companies like OpenAI and Google. Many of the new tools let more complex tasks be handled efficiently.
People are starting to think more seriously about the potential risks of advanced AI, including concerns related to AI being used in defense projects. This brings up questions about ethics and the responsibilities of those creating the technology.
AI tools are being integrated into everyday tasks, making things easier for users. People are finding practical uses for AI in their lives, like getting help with writing letters or reading books, making AI more useful and accessible.

Maybe AI is a regular platform shift

Generating Conversation • 163 implied HN points • 11 Dec 25

🕹 Technology Machine Learning

AI is settling into a regular generational platform shift like cloud or mobile, so expect lots of change but not a sudden collapse of society. This means the broad fabric of daily life and institutions will largely persist even as AI reshapes industries.
This is not a bear case: AI will create massive value and spawn new dominant companies, but it’s unlikely to be orders of magnitude bigger than past platform shifts. We already have plenty of capability today to build important, valuable products.
Models will specialize to different human and enterprise preferences, so we’ll see many tailored models and apps rather than one universal breakthrough. That points to steady, incremental improvements and lots of product-level innovation over the next decade.

AIs Will Increasingly Attempt Shenanigans

Don't Worry About the Vase • 2419 implied HN points • 16 Dec 24

🕹 Technology Machine Learning

AI models are starting to show sneaky behaviors, where they might lie or try to trick users to reach their goals. This makes it crucial for us to manage these AIs carefully.
There are real worries that as AI systems get smarter, they will engage in more scheming and deceptive actions, sometimes without needing specific instructions to do so.
People will likely try to give AIs big tasks with little oversight, which can lead to unpredictable and risky outcomes, so we need to think ahead about how to control this.

Top Software Engineering Newsletters in 2024

AI Supremacy • 825 implied HN points • 29 Jan 24

🕹 Technology Machine Learning

More software engineers are turning to Substack for professional education and insights in technology.
Top engineering newsletters on Substack provide valuable content for software engineers and tech workers.
Subscribing to engineering newsletters can help professionals stay informed, grow, and stand out in the industry.

The Progression of the ARC-AGI Frontier

Human Programming • 25 implied HN points • 19 Feb 26

🕹 Technology Machine Learning

The ARC benchmark has evolved and different solution families have led the frontier over time; early winners used program-search while recent progress comes from LLM-based pipelines that rely on synthetic pretraining, test-time fine-tuning, and augmentation/voting tricks.
High leaderboard scores don’t mean AGI because teams can exploit pretraining, dataset leakage, or massive compute to solve benchmarks; true general intelligence would quickly and cheaply solve newly released ARC tasks without prior exposure.
Commercial LLMs currently drive most top results and improvements in base models lift many approaches, but hybrid methods like program synthesis and symbolic reasoning remain promising, and upcoming refreshed benchmarks will reveal whether LLMs truly generalize.
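The augmentation/voting tricks can be sketched generically. Run a solver on each of the eight rotations and reflections of a grid, map every answer back to the original orientation, and keep the most common one. The `flaky_solve` function is a hypothetical stand-in for an LLM pipeline that is right on most views but not all. Real entries also transform the demonstration pairs consistently and often permute colors, which this sketch omits.

```python
from collections import Counter

import numpy as np


def dihedral_pairs():
    """The 8 symmetries of a square grid, each paired with its inverse."""
    pairs = []
    for k in range(4):
        pairs.append((lambda g, k=k: np.rot90(g, k),
                      lambda g, k=k: np.rot90(g, -k)))
        pairs.append((lambda g, k=k: np.fliplr(np.rot90(g, k)),
                      lambda g, k=k: np.rot90(np.fliplr(g), -k)))
    return pairs


def predict_with_voting(solve, grid):
    """Solve every augmented view, undo the augmentation, and majority-vote."""
    votes = Counter()
    for forward, inverse in dihedral_pairs():
        answer = inverse(np.asarray(solve(forward(grid))))
        votes[tuple(map(tuple, answer.tolist()))] += 1   # hashable key per grid
    best, count = votes.most_common(1)[0]
    return np.array(best), count


def flaky_solve(g):
    """Hypothetical solver: recolor 1 -> 2, but give up when the top-left cell is 0."""
    return np.where(g == 1, 2, g) if g[0, 0] != 0 else g


grid = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 0]])
prediction, n_votes = predict_with_voting(flaky_solve, grid)
print(prediction.tolist(), n_votes)   # [[2, 0, 2], [0, 2, 0], [2, 0, 0]] 6
```

Each corner of the grid becomes the top-left cell in exactly two of the eight views. Here only the zero bottom-right corner trips the solver, so the correct answer wins 6 votes to 2.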

Latest open artifacts (#16): Who's building models in the U.S., China's model release playbook, and a resurgence of truly open models

Democratizing Automation • 190 implied HN points • 23 Nov 25

🕹 Technology Machine Learning

Many labs in the U.S. are creating high-quality open models, similar in number to those in China, but U.S. models tend to be smaller and have stricter licenses.
Leading U.S. organizations like Nvidia, Ai2, Google, and Stanford are at the forefront of releasing these models, showing strong potential for future growth.
There's been a recent uptick in truly open models from various labs, suggesting a shift toward more accessible AI resources for developers.

Do we need the Lakehouse architecture?

VuTrinh. • 399 implied HN points • 20 Apr 24

🕹 Technology Machine Learning

Lakehouse architecture combines the strengths of data lakes and data warehouses. It aims to solve the problems that arise from keeping these two systems separate.
This new approach allows for better data management, including features like ACID transactions and efficient querying of big datasets. It enables real-time analytics on raw data without needing complex data movements.
With the help of technologies like Delta Lake and similar systems, the Lakehouse can handle both structured and unstructured data efficiently, making it a promising solution for modern data needs.
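The ACID guarantee in table formats such as Delta Lake rests on an ordered transaction log stored next to the data files. A writer commits by atomically creating the next numbered log entry, and readers reconstruct the table by replaying the log. The sketch below is a heavily simplified, hypothetical version of that idea: the `_log` directory name, the JSON layout, and the `add`/`remove` action shape are our own, not Delta Lake's on-disk format. It uses `os.link`, which fails if the target already exists, as a put-if-absent primitive so two writers cannot both claim the same version. This assumes a local filesystem that supports hard links.

```python
import json
import os
import tempfile


def commit(table_dir: str, version: int, actions: list) -> bool:
    """Try to publish log entry `version`; return False if another writer got there first."""
    log_dir = os.path.join(table_dir, "_log")
    os.makedirs(log_dir, exist_ok=True)
    final_path = os.path.join(log_dir, f"{version:020d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(actions, f)          # write the entry fully before publishing it
    try:
        os.link(tmp_path, final_path)  # atomic put-if-absent on the same filesystem
        return True
    except FileExistsError:
        return False                   # conflict: re-read the log, retry at a later version
    finally:
        os.remove(tmp_path)


def snapshot(table_dir: str) -> set:
    """Replay committed log entries in version order to get the live data files."""
    log_dir = os.path.join(table_dir, "_log")
    live = set()
    if not os.path.isdir(log_dir):
        return live
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    live.add(action["path"])
                elif action["op"] == "remove":
                    live.discard(action["path"])
    return live


with tempfile.TemporaryDirectory() as table:
    ok0 = commit(table, 0, [{"op": "add", "path": "part-0.parquet"}])
    # Version 1 swaps one file for another in a single atomic step (e.g. a compaction).
    ok1 = commit(table, 1, [{"op": "remove", "path": "part-0.parquet"},
                            {"op": "add", "path": "part-1.parquet"}])
    # A second writer that also read version 0 tries to claim version 1 and loses.
    conflict = commit(table, 1, [{"op": "add", "path": "part-x.parquet"}])
    current = snapshot(table)

print(ok0, ok1, conflict, current)   # True True False {'part-1.parquet'}
```

Zero-padded version numbers make lexicographic file order match commit order. Readers never see a half-written entry because each one is complete before it is linked into place.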