The hottest Machine Learning Substack posts right now

And their main takeaways

A ‘Shocking Number’ of Top AI Researchers Don't Use AI

The Algorithmic Bridge • 265 implied HN points • 01 Aug 25

🕹 Technology Machine Learning

Many top AI researchers don’t use the AI tools they create, which seems strange.
This reflects a common idea across cultures that in the places we expect to find certain skills or tools, they might actually be missing.
Some people think it’s interesting and even suspicious that those who know a lot about AI aren’t using it in their own work.

Art to Science

Abstraction • 39 implied HN points • 02 Jan 26

🕹 Technology Machine Learning

Forecasting bots can run continuously, answer many questions, and be scored in real time, turning forecasting from a slow craft into a fast, repeatable process.
Large, scored tournaments and shared datasets will let people empirically test different methods and finally learn which forecasting approaches actually work at scale.
Simple heuristics get you most of the way there, but reaching the frontier requires deeper techniques and open sharing of methods to accelerate progress.

Large language models, explained with a minimum of math and jargon

The Counterfactual • 599 implied HN points • 28 Jul 23

🕹 Technology Machine Learning

Large language models, like ChatGPT, work by predicting the next word based on patterns they learn from tons of text. They don’t just use letters like we do; they convert words into numbers to understand their meanings better.
These models handle the many meanings of words by changing their representation based on context. This means that the same word could have different meanings depending on how it's used in a sentence.
The training of these models does not require labeled data. Instead, they learn by guessing the next word in a sentence and adjusting their processes based on whether they are right or wrong, which helps them improve over time.

ByteDance's new fine-tuning technique boosts LLMs for reasoning tasks

TechTalks • 314 implied HN points • 22 Jan 24

🕹 Technology Machine Learning

A new fine-tuning technique called Reinforced Fine-Tuning improves large language models for reasoning tasks.
Reinforced Fine-Tuning combines supervised fine-tuning with reinforcement learning to enhance reasoning capabilities.
ReFT helps models discover new reasoning paths without needing extra training data and outperforms traditional fine-tuning methods on reasoning benchmarks.

Pre-Mortems and Sanity Checks

Abstraction • 29 implied HN points • 14 Jan 26

🕹 Technology Machine Learning

Do a pre-mortem: assume the forecast is wrong and list plausible ways it could fail (like cancellations, acquisitions, or shifted definitions) so you don’t miss important paths.
Run a sanity check to make sure the probability fits basic world knowledge and common sense, and correct obvious errors like using the wrong base rate.
Make these checks the final gate: if either one flags a problem, rework the forecast or use a different approach before submitting.

The Sequence Opinion #782: The New Gradient: Research Directions That Will Ship in 2026

TheSequence • 42 implied HN points • 01 Jan 26

🕹 Technology Machine Learning

Blanket scaling of transformers with more data and compute is showing diminishing returns, so new research directions are needed to keep improving frontier models.
The field is shifting from generative AI that just looks right to verifiable AI that can deliberate and produce correct, auditable outputs, effectively adding a "System 2" for reasoning.
Emerging methods like RLVR aim to give models unit-test-style feedback and tighter verification, and these kinds of approaches are poised to influence models shipping in 2026.

Flavours of Creation

Fragmentary • 569 implied HN points • 05 May 23

🕹 Technology Machine Learning

Using copyrighted material to train AI requires proper authorization and compensation.
Different countries have varying perspectives on intellectual property related to AI creation.
AI does not inherently create, but rather replicates based on patterns and codes created by humans.

Data Science Weekly - Issue 546

Data Science Weekly Newsletter • 119 implied HN points • 10 May 24

🕹 Technology Machine Learning

Time-series analysis and Gaussian processes are powerful tools for interpreting data. They allow for flexibility and control in modeling data, making them essential for data practitioners.
Understanding A/B testing is crucial for making informed business decisions. Using a reliable experimentation system can save time and lead to better results.
New advancements in AI and data science are enhancing applications in various fields, like biomedical research and recommendation systems. These innovations help combine human creativity with machine learning capabilities.

Can GPT-4 Actually Write Code?

Tyler Glaiel's Blog • 567 HN points • 17 Mar 23

🕹 Technology Machine Learning

GPT-4 can write code when given existing algorithms or well-known problems, as it remixes existing solutions.
However, when faced with novel or unique problems, GPT-4 struggles to provide accurate solutions and can make incorrect guesses.
It's crucial to understand that while GPT-4 can generate code, it may not be reliable for solving complex, new problems in programming.

Open LLMs don’t need to beat OpenAI

The AI Frontier • 119 implied HN points • 09 May 24

🕹 Technology Machine Learning

Open LLMs, like Llama 3, are getting really good and can perform well in many tasks. This improvement makes them a strong option for various applications.
Fine-tuning open LLMs is becoming more attractive because of their improved quality and lower costs. This means smaller, specialized models can be more easily developed and used.
However, open models likely won't surpass OpenAI's offerings. The proprietary models have a big advantage, but open LLMs can still thrive by focusing on efficiency and specific use cases.

The Broken Leg Check

Abstraction • 34 implied HN points • 07 Jan 26

🕹 Technology Machine Learning

Do a quick "broken leg" check first because a decisive news event can resolve a question immediately and save the time and cost of running the full forecasting pipeline.
Be cautious: a wrongly triggered broken-leg update is dangerous since proper scoring heavily penalizes confident incorrect forecasts, so false positives can wipe out gains.
Treat it as an empirical trade-off: implement a news-based detector, clearly define what "overwhelmingly resolves" means, track when it fires, and tune thresholds, confidence damping, or disable it if blowouts outweigh the savings.
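
The penalty for a confident wrong forecast can be made concrete with two common proper scoring rules. This is a minimal sketch: the summary does not say which rule the tournament uses, so the Brier and log scores here are assumptions, and the function names are hypothetical.

```python
import math


def brier_score(p: float, outcome: int) -> float:
    """Squared error between the forecast p and the 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def log_score(p: float, outcome: int) -> float:
    """Negative log of the probability assigned to what happened (lower is better)."""
    assigned = p if outcome == 1 else 1.0 - p
    return -math.log(assigned)


# A falsely triggered broken-leg update (99% on an event that does not happen)
# compared with a more cautious forecast (70%) on the same question.
for forecast in (0.99, 0.70):
    print(forecast, brier_score(forecast, 0), log_score(forecast, 0))
```

Under the log score the 99% miss costs more than three times as much as the 70% miss, which is why a handful of false positives can cancel out many cheap, correct early resolutions.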

o3: The grand finale of AI in 2024

Democratizing Automation • 815 implied HN points • 20 Dec 24

🕹 Technology Machine Learning

OpenAI's new model, o3, is a significant improvement in AI reasoning. It will be available to the public in early 2025, and many experts believe it could change how we use AI.
The o3 model has shown it can solve complex tasks better than previous models. This includes performing well on math and coding benchmarks, marking a big step for AI.
As the costs of using AI decrease, we can expect to see these models used more widely, impacting jobs and industries in ways we might not yet fully understand.

Data Science Weekly - Issue 540

Data Science Weekly Newsletter • 179 implied HN points • 29 Mar 24

🕹 Technology Machine Learning

SQL is seen as an easier way to write relational algebra, but it's not ideal for building new query tools. Understanding its limits can help in learning and using SQL better.
Many successful companies have developed their own AI models, showing a trend in the tech industry. Knowing about these companies can give insights into future developments in AI.
Binary vector search methods can save a lot of memory compared to traditional methods. However, it's important to balance memory savings with maintaining accuracy.
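
The memory trade-off in binary vector search can be sketched as follows. The example assumes the baseline stores each dimension as a 32-bit float and that binarization keeps only the sign of each dimension; both are illustrative assumptions, and the function names are hypothetical rather than taken from the linked article.

```python
from typing import List


def binarize(vector: List[float]) -> int:
    """Pack the sign of each dimension into one integer, one bit per dimension."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")


def memory_bytes(dimensions: int, bits_per_dimension: int) -> int:
    """Storage needed for one vector, rounded up to whole bytes."""
    return (dimensions * bits_per_dimension + 7) // 8


query = binarize([0.5, -1.0, 2.0, -0.1])
candidate = binarize([-0.3, 0.8, 1.5, -0.7])
print(hamming_distance(query, candidate))
print(memory_bytes(1024, 32), memory_bytes(1024, 1))
```

Dropping everything but the sign is what costs accuracy, so one common design is to use the Hamming distance only to shortlist candidates and then rescore the shortlist with the full-precision vectors.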

RAG Implementations Fail Due To Insufficient Focus On Question Intent

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 18 Jul 24

🕹 Technology Machine Learning

Large Language Models (LLMs) can create useful text but often struggle with specific knowledge-based questions. They need better ways to understand the question's intent.
Retrieval-augmented generation (RAG) systems try to solve this by using extra knowledge from sources like knowledge graphs, but they still make many mistakes.
The Mindful-RAG approach focuses on understanding the question's intent more clearly and finding the right context in knowledge graphs to improve answers.

Triplex — a SOTA LLM for Knowledge Graph Construction

Owen’s Substack • 59 implied HN points • 19 Jul 24

🕹 Technology Machine Learning

Triplex is a new tool that helps create knowledge graphs quickly and cheaply. It's much cheaper to use than older methods, making it easier for more people to utilize.
This tool is small enough to run on regular laptops, which means you don't need powerful computers to build knowledge graphs. This makes technology more accessible to everyone.
Triplex is open-source, allowing anyone to use and improve it. The community can experiment with it freely and innovate new ways to organize and understand information.

The Sequence AI of the Week #785: Gradient Highway Maintenance: Inside DeepSeek’s Latest Breakthrough

TheSequence • 35 implied HN points • 07 Jan 26

🕹 Technology Machine Learning

DeepSeek's mHC challenges established assumptions about AI scaling and suggests new architectural ideas that could change how larger models are built and trained.
Residual connections are the unsung scaffolding of modern deep networks, providing a 'gradient highway' that keeps training stable across many layers.
The simple rule y = f(x) + x—adding the input back to a layer's output—was revolutionary because it preserves signals and gradients, making very deep networks trainable.
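
The rule y = f(x) + x can be sketched in a few lines. This is a plain-Python illustration, not DeepSeek's mHC; `tiny_layer` is a hypothetical stand-in for a learned layer.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(f: Callable[[Vector], Vector], x: Vector) -> Vector:
    """Return f(x) + x element-wise, i.e. add the input back to the layer output."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the input dimension")
    return [a + b for a, b in zip(fx, x)]


def tiny_layer(x: Vector) -> Vector:
    """Hypothetical layer: scale by a small weight, then apply ReLU."""
    return [max(0.0, 0.1 * v) for v in x]


def dead_layer(x: Vector) -> Vector:
    """A layer that outputs only zeros, to show the skip path still carries x."""
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
print(residual_block(tiny_layer, x))

# Even through 50 stacked dead layers the input survives unchanged.
h = x
for _ in range(50):
    h = residual_block(dead_layer, h)
print(h == x)
```

Because the identity term passes straight through, the derivative of each block with respect to its input contains a constant 1, which is the sense in which the skip path acts as a highway for gradients.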

Data Science Weekly - Issue 538

Data Science Weekly Newsletter • 199 implied HN points • 14 Mar 24

🕹 Technology Machine Learning

Serverless computing can handle big tasks without limits, but it also brings challenges like managing large uploads effectively.
Art careers can be influenced by the reputation of institutions, with established artists facing less access to elite spaces early on compared to newcomers.
Learning about LLM evaluation metrics can help improve understanding and performance when working with large language models.

The Sequence Radar #771: Last Week in AI: GPT-5.2, Mistral, and Google’s Agent Stack

TheSequence • 56 implied HN points • 14 Dec 25

🕹 Technology Machine Learning

AI is moving to an agent-first model where LLMs act as operators for long-running, multi-step workflows, improving planning, tool use, and end-to-end task completion.
Open-weight and deployable model families are maturing, letting teams host, fine-tune, and run agentic coding and workflow assistants on their own infrastructure.
Compute and energy limits are now a primary bottleneck, driving investment in efficient architectures like MoEs, distillation, edge inference, and new hardware approaches.

GPT-4.5 Feels Like a Letdown But It’s OpenAI’s Biggest Bet Yet

The Algorithmic Bridge • 605 implied HN points • 28 Feb 25

🕹 Technology Machine Learning

GPT-4.5 is not as impressive as expected, but it's part of a plan for bigger advancements in the future. OpenAI is using this model to build a better foundation for what's to come.
Despite being larger and more expensive, GPT-4.5 isn't leading in new capabilities compared to older models. It's more focused on creativity and communication, which might not appeal to all users.
OpenAI wants to improve the basic skills of AI rather than just aiming for high scores in tests. This step back is meant to ensure future models are smarter and more capable overall.

The Sequence Radar #763: Last Week AI Trifecta: Opus 4.5, DeepSeek Math, and FLUX.2

TheSequence • 70 implied HN points • 30 Nov 25

🕹 Technology Machine Learning

Claude Opus 4.5 is impressively smart and can handle complex coding tasks, making it feel like a senior engineer rather than just a chatbot.
DeepSeek Math V2 shows how AI can self-correct and improve its mathematical reasoning, hitting new highs in performance and reliability.
FLUX.2 brings amazing visual quality and features for generative media, proving that open models can achieve top-notch results without being locked down.

Bretton Goods is becoming Speculative Decoding

Bretton Goods • 38 implied HN points • 27 Dec 25

🕹 Technology Machine Learning

The blog is changing focus from explaining why countries get rich to studying AI — especially how to tell what AI systems are actually doing.
The author shifted careers from policy and macroeconomics to computer science and now works on AI evaluations and reducing hallucinations through internships and a job at Elicit.
Bretton Goods will be archived and its audience moved to a new Substack, Speculative Decoding, with a commitment to roughly one post a month about AI evaluations, safety, policy, and related research.

We need better LLM evaluations

The AI Frontier • 159 implied HN points • 04 Apr 24

🕹 Technology Machine Learning

Current methods for evaluating large language models (LLMs) are not effective because they try to give one-size-fits-all answers. Each LLM is better suited for different tasks, so we need evaluations that reflect that.
It’s important to look at specific skills of LLMs, like how well they follow instructions or retrieve information. This will help users understand which model works best for their needs.
We need more detailed benchmarks that assess individual capabilities rather than general performance scores. This way, developers can make smarter choices when selecting LLMs for their projects.

self learning models (1)

Eternal Sunshine of the Stochastic Mind • 119 implied HN points • 02 May 24

🕹 Technology Machine Learning

Machine Learning is a leap of faith in Computer Science where data shapes the outcome rather than instructions.
In machine learning, viewing yourself as a neural network model can offer insights into self-improvement.
Understanding machine learning concepts can help in identifying learning failures, training the mind, and reflecting on personal objectives.

RAG Foundry By Intel

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 13 Aug 24

🕹 Technology Machine Learning

RAG Foundry is an open-source framework that helps make the use of Retrieval-Augmented Generation systems easier. It brings together data creation, model training, and evaluation into one workflow.
This framework allows for the fine-tuning of large language models like Llama-3 and Phi-3, improving their performance with better, task-specific data.
There is a growing trend in using synthetic data for training models, which helps create tailored datasets that match specific needs or tasks better.

Expanding AI Horizons: The Rise of Function Calling in LLMs

Gradient Flow • 279 implied HN points • 25 Jan 24

🕹 Technology Machine Learning

Function Calling in AI enables models to interact with external functions, going beyond basic text generation to execute actions based on requests.
Combining Retrieval Augmented Generation (RAG) with Function Calling enhances AI systems, allowing them to access external APIs to improve adaptability and assist in various tasks.
Despite its potential, Function Calling in AI faces challenges like security risks, ethical alignment, and technical limitations, and it needs advances in contextual understanding to reach its full potential.

A Comprehensive Approach to Using LLMs

Gradient Flow • 519 implied HN points • 05 Oct 23

🕹 Technology Machine Learning

Starting with proprietary models through public APIs, like GPT-4 or GPT-3.5, is a common and easy way to begin working with Large Language Models (LLMs). This stage allows exploration with tools like Haystack.
Transitioning to open source LLMs provides benefits like cost control, speed, and stability, but requires expertise in managing models, data, and infrastructure. Using open source LLMs like Llama models from Anyscale can be efficient.
Creating custom LLMs offers advantages of tailored accuracy and performance for specific tasks or domains, though it requires calibration and domain-specific data. Managing multiple custom LLMs enhances performance and user experience but demands robust serving infrastructure.

SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?

LatchBio • 41 implied HN points • 26 Dec 25

🔬 Science Machine Learning

SpatialBench is a realistic suite of 146 verifiable spatial biology problems across five platforms and seven task types that recreates real analyst workspaces using snapshots of data and images.
Current agent models perform poorly overall (roughly 20–38% accuracy) and vary widely by task and platform, and the choice of execution harness or wrapper can change outcomes as much as changing the base model.
Inspecting agent trajectories reveals clear failure modes and productive strategies, showing that detailed traces help explain performance and that benchmarks like this are a practical first step toward engineering agents that can reliably automate spatial biology analysis.

Data Science Weekly - Issue 525

Data Science Weekly Newsletter • 359 implied HN points • 15 Dec 23

🕹 Technology Machine Learning

Learning about causal models is important in data analysis because it helps explain what caused the data. This understanding can improve how we interpret results using Bayesian methods.
There's growing concern over data privacy in AI tools like Dropbox. Users are worried their private files could be used for AI training, even though companies deny this.
Netflix recently held a Data Engineering Forum to share best practices. They discussed ways to improve data pipelines and processing, which could benefit many in the data engineering community.

SAI Notes #04: CI/CD for Machine Learning.

SwirlAI Newsletter • 511 implied HN points • 28 May 23

🕹 Technology Machine Learning

In Machine Learning projects, CI/CD processes need to treat the ML training pipeline separately from regular software pipelines.
Efficient MLOps implementation requires an organizational structure where ML product development flows within a single end-to-end ML team.
ML systems in mature MLOps setups involve ML teams building and delivering pipelines that expose predictions to end users through backend and frontend services.

SHAP Is Not All You Need

Mindful Modeler • 898 implied HN points • 07 Feb 23

🕹 Technology Machine Learning

It's important to avoid assuming one method is always the best for all interpretation contexts when working with machine learning interpretability tools like SHAP.
Different interpretability methods like SHAP and permutation feature importance (PFI) have unique goals and can provide different insights, so it's crucial to choose the method that aligns with the specific question you want to answer.
Research on interpretability should be more driven by questions rather than methods, to ensure that the tools used provide meaningful insights based on the context.
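
To make the contrast concrete, here is a minimal sketch of permutation feature importance (PFI), which asks how much a model's error grows when one feature is shuffled. The toy model, the data, and the function names are hypothetical, and the error metric (mean squared error) is an assumption.

```python
import random
from typing import Callable, List

Row = List[float]


def mean_squared_error(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_feature_importance(
    model: Callable[[Row], float], X: List[Row], y: List[float], feature: int, seed: int = 0
) -> float:
    """Increase in error after shuffling one feature column across rows."""
    rng = random.Random(seed)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_permuted = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return mean_squared_error(model, X_permuted, y) - mean_squared_error(model, X, y)


def toy_model(row: Row) -> float:
    """Hypothetical model that uses feature 0 and ignores feature 1."""
    return 2.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(permutation_feature_importance(toy_model, X, y, feature=0))
print(permutation_feature_importance(toy_model, X, y, feature=1))
```

PFI answers a question about the model's performance on a dataset, whereas SHAP attributes individual predictions to features, so the two can rank features differently without either being wrong.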

Technically Monthly (February 2026)

Technically • 14 implied HN points • 05 Feb 26

🕹 Technology Machine Learning

Modern generative models mirror pathways in the human brain, and many researchers believe leveraging that similarity could be key to much stronger AI.
Real cloud-spend data shows the fastest-growing AI use cases are coding agents, low-latency LLM inference, and computational biology, while AI art and video generation have plateaued as the market professionalizes.
Models overuse em dashes mainly because of their training data and tokenization quirks—older texts and auto-converted punctuation make the em dash common—and this highlights how dataset quality and representativeness drive model behavior.

Decay Functions

Abstraction • 29 implied HN points • 09 Jan 26

🕹 Technology Machine Learning

A single probability for a time window needs a decay model because where the probability mass sits across the window determines how much chance remains as time passes.
Probability can follow different hazard patterns—constant (linear decay), increasing (back-loaded, like last‑minute negotiations), decreasing (front‑loaded, like ceasefires), or event‑driven—and each pattern changes how fast the cumulative probability is consumed over time.
The forecasting bot classifies which hazard applies (defaulting to constant when unsure) and uses that to update remaining probability as time elapses, but this is a refinement that can be misclassified and matters most for long‑horizon questions.
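
One way to turn these patterns into an update rule is sketched below. It is an illustration under assumptions, not the bot's actual code: the "constant" case is taken to mean the window's probability mass is spread evenly over time (the "linear decay" above), and the curves `t**2` and `t**0.5` for the back-loaded and front-loaded cases are hypothetical choices. The update itself is ordinary conditioning on the event not having happened yet.

```python
from typing import Callable, Dict

# Fraction of the window's probability mass consumed after a fraction t of the window.
SHAPES: Dict[str, Callable[[float], float]] = {
    "constant": lambda t: t,           # mass spread evenly across the window
    "increasing": lambda t: t ** 2,    # back-loaded (assumed curve)
    "decreasing": lambda t: t ** 0.5,  # front-loaded (assumed curve)
}


def remaining_probability(p_window: float, elapsed_fraction: float, shape: str = "constant") -> float:
    """P(event in the rest of the window | no event so far)."""
    if not 0.0 <= p_window <= 1.0:
        raise ValueError("p_window must be in [0, 1]")
    if not 0.0 <= elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be in [0, 1]")
    consumed = p_window * SHAPES[shape](elapsed_fraction)
    if consumed >= 1.0:
        return 0.0  # the whole window has passed on a certain event
    return (p_window - consumed) / (1.0 - consumed)


# A 40% forecast, halfway through the window, under each hazard pattern.
for name in SHAPES:
    print(name, round(remaining_probability(0.4, 0.5, name), 3))
```

Halfway through the window, the back-loaded pattern leaves the most probability remaining and the front-loaded pattern the least, which is why a misclassified hazard matters more as the horizon grows.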

Using AI: Queries, Conversations, and Projects

In My Tribe • 303 implied HN points • 11 Jun 25

🕹 Technology Machine Learning

A conversation with AI is different from simply asking a question. You can explore topics more deeply and learn from the back-and-forth interaction.
Using AI for projects is essential to becoming skilled with it. It’s like doing a group assignment, where you can create something together.
Providing clear instructions and materials to AI helps it assist you better. Treating it like a partner, rather than just a tool, can lead to better results.

Why We Must Build World Models

Reasons to Be Optimistic • 6 implied HN points • 17 Feb 26

🕹 Technology Machine Learning

Text-only models are powerful but incomplete because language misses how the world actually looks, moves, and feels; video offers a far richer, high-volume source of physics, sound, and human behavior.
True world models must be causal and action-conditioned, predicting the next state step-by-step under intervention; autoregressive diffusion transformer architectures trained on multimodal video and actions are a promising path.
General world models will turn naive software into systems that understand and interact with the real world, enabling adaptive robots, immersive simulations, new learning tools, and large-scale scientific discovery.

Classification and Method Selection

Abstraction • 29 implied HN points • 08 Jan 26

🕹 Technology Machine Learning

Match the forecasting method to the question type: classify questions into base-rate, time-series, conditional-chain, or novel-event and route each to a specialized approach.
Use the right technique for each class: use historical reference classes and adjustments for base rates, simulate trajectories for time-series questions, multiply conditional probabilities for conjunctive chains, and apply a Laplace-style prior for unprecedented events.
Track and improve empirically: use an LLM classifier (defaulting to base rate when unsure), choose reference classes and decompositions carefully, and measure which methods are over- or under-confident as you scale.
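
Two of these techniques reduce to short formulas, sketched here with hypothetical function names. The chain estimate assumes each probability is conditional on the previous steps having happened, and the novel-event estimate uses Laplace's rule of succession, (successes + 1) / (trials + 2), as one concrete reading of a "Laplace-style prior".

```python
from math import prod
from typing import Sequence


def chain_probability(conditionals: Sequence[float]) -> float:
    """Probability that every step of a conjunctive chain happens.

    Each entry is P(step i | all earlier steps happened).
    """
    return prod(conditionals)


def laplace_estimate(successes: int, trials: int) -> float:
    """Rule of succession: never returns exactly 0 or 1."""
    return (successes + 1) / (trials + 2)


# A bill must pass committee (0.8), then a floor vote (0.5), then be signed (0.5).
print(chain_probability([0.8, 0.5, 0.5]))

# An event that has never occurred in 18 comparable periods.
print(laplace_estimate(0, 18))
```

Chains shrink quickly, so a forecast built from several individually plausible steps often ends up lower than intuition suggests, while the Laplace estimate keeps an unprecedented event from being assigned a probability of exactly zero.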

2026 - the year of AGI

davidj.substack • 23 implied HN points • 13 Jan 26

🕹 Technology Machine Learning

AGI means an AI that can learn many different tasks and perform many things at least as well as a typical human — it doesn't require sentience or being a superintelligence.
Progress toward AGI will rely more on post-training learning: agents that can learn after deployment, retain skills, and build or use tools, rather than just bigger pretraining runs.
Narrow AGI will appear in specific domains soon via agents that learn and share useful skills while keeping private data local, but these systems will still have clear limits and won't replace all human abilities.

Data Science Weekly - Issue 542

Data Science Weekly Newsletter • 139 implied HN points • 12 Apr 24

🕹 Technology Machine Learning

This newsletter provides links and updates about data science, AI, and machine learning. It's a helpful resource for anyone wanting to stay informed in this field.
One article teaches how to handle real questions using Python, which is great for people wanting practical coding skills. Another discusses techniques to make sure AI outputs stay on task.
The newsletter also features resources and courses to help people learn and improve their skills in data science and related areas. It's a good place to find learning opportunities.

Claude 4 and Anthropic's bet on code

Democratizing Automation • 332 implied HN points • 27 May 25

🕹 Technology Machine Learning

Claude 4 is a strong AI model from Anthropic, focused on coding and software tasks. It has a unique personality and improved performance over its predecessors.
The benchmarks for Claude 4 might not look impressive compared to others like ChatGPT and Gemini, which could affect its market position. It's crucial for Anthropic to show real-world utility beyond just numbers.
Anthropic aims to lead in software development, but they fall behind in general benchmarks. This may limit their ability to compete with bigger players like OpenAI and Google in the race for advanced AI.

Week #1: Getting Started With Conformal Prediction For Classification

Mindful Modeler • 1018 implied HN points • 20 Dec 22

🕹 Technology Machine Learning

Model predictions should consider uncertainty to make informed decisions. Decisions relying only on point predictions can be risky.
Conformal prediction is a method that can provide rigorous uncertainty scores, giving probabilistic guarantees of covering the true outcome.
Conformal prediction is simple to apply, often with just 3 lines of code. It is model-agnostic, distribution-free, and comes with coverage guarantees.
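
A minimal sketch of split conformal prediction for classification follows. It is not the code from the post: the nonconformity score (one minus the probability of the true class), the toy calibration data, and the function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def conformal_threshold(
    cal_probs: Sequence[Sequence[float]], cal_labels: Sequence[int], alpha: float = 0.1
) -> float:
    """Score threshold from a held-out calibration set.

    The score of each calibration point is 1 - P(true class).
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every class must be included
    return scores[rank - 1]


def prediction_set(probs: Sequence[float], threshold: float) -> List[int]:
    """All classes whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data: predicted class probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
cal_labels = [0, 0, 1, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
print(prediction_set([0.7, 0.3], threshold))
```

The coverage guarantee holds on average over exchangeable calibration and test data, so the calibration set must be held out from training and drawn from the same distribution as the data seen at prediction time.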

Grok 3 and an accelerating AI roadmap

Democratizing Automation • 554 implied HN points • 18 Feb 25

🕹 Technology Machine Learning

Grok 3 is a new AI model that's designed to compete with existing top models. It aims to improve quickly, with updates happening daily.
There's increasing competition in the AI field, which is pushing companies to release their models faster, leading to more powerful AI becoming available to users sooner.
Current evaluations of AI models might not be very practical or useful for everyday life. It's important for companies to share more about their evaluation processes to help users understand AI advancements.