The hottest Machine Learning Substack posts right now

And their main takeaways

The AI user's guide to evals

Technically • 14 implied HN points • 11 Dec 25

🕹 Technology Machine Learning

Evals are software tests for AI that turn fuzzy model outputs into measurable metrics so you can find and fix errors instead of guessing.
Look at your data first — analyze real outputs to spot where the model fails, because you can’t measure or fix problems you don’t identify.
Start with simple keyword checks and assertions before building complex “LLM-as-judge” setups, and iterate: test, fix, measure, repeat; otherwise your system just feels like a slot machine.
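
The "simple keyword checks and assertions" step can be sketched in a few lines of Python. This is a minimal illustration, not the method from the post: the `EvalCase` class, the `check_output` and `pass_rate` helpers, and the sample strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One eval case: a prompt plus keyword expectations (hypothetical schema)."""
    prompt: str
    must_include: list = field(default_factory=list)
    must_exclude: list = field(default_factory=list)


def check_output(case, output):
    """Pass if every required keyword appears and no forbidden keyword does."""
    text = output.lower()
    has_required = all(k.lower() in text for k in case.must_include)
    has_forbidden = any(k.lower() in text for k in case.must_exclude)
    return has_required and not has_forbidden


def pass_rate(cases, outputs):
    """Fraction of cases whose output passes its keyword check."""
    if not cases:
        return 0.0
    passed = sum(check_output(c, o) for c, o in zip(cases, outputs))
    return passed / len(cases)
```

Tracking `pass_rate` across prompt or model changes gives the "test, fix, measure, repeat" loop a number to move, and a check like this is cheap enough to run before reaching for an LLM-as-judge setup.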

Artificial Intelligence: The importance of temporal validity for financial AI

The Fintech Blueprint • 78 implied HN points • 09 Jan 24

🕹 Technology Machine Learning

Understanding time series data can give a competitive edge in the financial markets.
Fintech's future relies on building better AI models with temporal validity.
AI in finance involves LLMs, generative AI, machine learning, deep learning, and neural networks.

Transformers are Eating Quantum

TheSequence • 217 implied HN points • 24 Nov 24

🕹 Technology Machine Learning

Quantum computing faces challenges due to noise affecting performance. AI, specifically AlphaQubit, helps improve error correction in quantum systems.
AlphaQubit uses a neural network design from language models to better decode quantum errors. It shows greater accuracy and adapts to various data types effectively.
While AlphaQubit is a major step forward, there are still issues to tackle, mainly concerning its speed and ability to scale for larger quantum systems.

Moving Past RLHF: In 2025 We Will Transition from Preference Tuning to Reward Optimization in Foundation Models

TheSequence • 189 implied HN points • 29 Dec 24

🕹 Technology Machine Learning

Artificial intelligence is moving from preference tuning to reward optimization for better alignment with human values. This change aims to improve how models respond to our needs.
Preference tuning has its limits because it can't capture all the complexities of human intentions. Researchers are exploring new reward models to address these limitations.
Recent models like GPT-o3 and Tülu 3 showcase this evolution, showing how AI can become more effective and nuanced in understanding and generating language.

The Sequence AI of the Week #769: Inside Gemini Deep Think

TheSequence • 14 implied HN points • 10 Dec 25

🕹 Technology Machine Learning

Gemini Deep Think is a “thinking layer” added on top of large multimodal models that turns a mixture-of-experts into a coordinated swarm of small reasoning agents.
It runs parallel, coordinated inference-time processes, which let it solve very hard problems and achieve state-of-the-art results on benchmarks like Olympiad-level math.
The key insight is that how you use compute at inference time matters as much as raw parameter count, pushing future model design toward dynamic runtime strategies.

Edge 375: Meta's System 2 Attention is a Very Unique LLM Reasoning Method

TheSequence • 462 implied HN points • 05 Mar 24

🕹 Technology Machine Learning

Meta's System 2 Attention method for LLM reasoning is inspired by cognitive psychology and directly affects how the model reasons.
LLMs reason well by focusing intensely on the context to predict the next word, but they can be misled by irrelevant correlations in that context.
Understanding Meta's System 2 Attention helps explain how Transformer-based LLMs work.

Recursive Language Models ("RLMs")

Why Now • 7 implied HN points • 09 Jan 26

🕹 Technology Machine Learning

Models suffer from "context rot" on very long inputs: attention gets diluted, positional signals degrade, and small mistakes compound over long sequences.
Recursive Language Models (RLMs) handle long context by having a root model peek, create targeted context slices, spawn sub-models to summarize or process each chunk, and then combine results, so each model sees much less context.
RLMs have shown strong empirical gains and cost savings on long-context benchmarks, and they could enable scalable codebase reasoning, long-running assistants, and other tasks that need effectively unlimited context.
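
The slice, delegate, and combine pattern described above can be sketched as follows. This is a toy illustration under stated assumptions, not the RLM implementation: `model` stands in for any language model call, `keyword_model` is a hypothetical stub that only filters lines, and splitting on line boundaries is a simplification of "targeted context slices".

```python
def split_lines(text, max_lines):
    """Split text into chunks of at most max_lines lines each."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]


def recursive_answer(text, query, model, max_lines=4):
    """Answer `query` over `text` so that no single call sees more than max_lines lines."""
    n_lines = len(text.splitlines())
    if n_lines <= max_lines:
        # Small enough: one direct call on the full context.
        return model(text, query)
    # Otherwise delegate each slice to a sub-call and merge the partial results.
    partials = [model(chunk, query) for chunk in split_lines(text, max_lines)]
    combined = "\n".join(p for p in partials if p)
    if len(combined.splitlines()) >= n_lines:
        # The sub-calls did not shrink the context; stop to guarantee termination.
        return combined
    return recursive_answer(combined, query, model, max_lines)


def keyword_model(context, query):
    """Hypothetical stand-in for an LLM: keep only lines that mention the query."""
    return "\n".join(line for line in context.splitlines()
                     if query.lower() in line.lower())
```

In this sketch each call handles at most `max_lines` lines regardless of total input size, which is the property the takeaways credit for avoiding context rot.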

FIT-RAG: Are RAG Architectures Settling On A Standardised Approach?

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 02 Apr 24

🕹 Technology Machine Learning

As RAG systems evolve, they are integrating more smart features to enhance their effectiveness. This means they are not just providing basic responses but are becoming more advanced and adaptable.
The challenges with RAG include static rules for retrieving data and the problem of excessive tokens during processing. These issues can slow down performance and reduce efficiency.
FIT-RAG is addressing these challenges with new tools, like a special document scorer and token reduction strategies, to improve how information is retrieved and used. This helps RAG systems provide better answers while using fewer resources.

GPT-4 is "WEIRD"—what should we do about it?

The Counterfactual • 59 implied HN points • 12 Feb 24

🕹 Technology Machine Learning

Large Language Models (LLMs) like GPT-4 often reflect the views of people from Western, educated, industrialized, rich, and democratic (WEIRD) cultures. This means they may not accurately represent other cultures or perspectives.
When using LLMs for research, it's important to consider who they are modeling. We should check if the data they were trained on includes a variety of cultures, not just a narrow subset.
To improve LLMs and make them more representative, researchers should focus on creating models that include diverse languages and cultural contexts, and be clear about their limitations.

The Smartest Bear vs the Dumbest Human: The Implications of AI's Growing Problem-Solving Abilities

The Future of Life • 19 implied HN points • 04 Jun 24

🕹 Technology Machine Learning

AI is getting really good at problem-solving, even beating humans at some tasks, like solving CAPTCHAs. This shows that AI can reason better than many humans, especially in certain situations.
The Turing test isn't just one hurdle to jump over; it's a series of challenges that measure how closely AI can act like a human. As AI improves, it passes more of these challenges, showing its capabilities.
While current AI isn't fully intelligent like a human, it's almost ready to solve a lot of problems. The only big limitation is how much computing power is available for training these AI systems.

The Sequence Opinion #480: What is GPT-o1 Actually Doing?

TheSequence • 161 implied HN points • 30 Jan 25

🕹 Technology Machine Learning

GPT models are becoming more advanced in reasoning and problem-solving, not just generating text. They are now synthesizing programs and refining their results.
There's a focus on understanding how these models work internally through ideas like hypothesis search and program synthesis. This helps in grasping the real innovation they bring.
Reinforcement learning is a key technique used by newer models to improve their outputs. This shows that they are evolving and getting better at what they do.

Data Science Weekly - Issue 501

Data Science Weekly Newsletter • 179 implied HN points • 30 Jun 23

🕹 Technology Machine Learning

Data scientists are sharing tips on how to make their scientific data more accessible and useful. This helps others to understand and use the data better.
There are many discussions happening about the benefits and drawbacks of large language models (LLMs) like ChatGPT. Some people believe they are amazing, while others think they aren't very helpful.
Naming things in programming can be tough, but there are resources and books that can help. Learning the right naming conventions can improve coding practices.

Data Science Weekly - Issue 497

Data Science Weekly Newsletter • 199 implied HN points • 02 Jun 23

🕹 Technology Machine Learning

Data drift doesn't always hurt model performance, so it's important to analyze the context before reacting to it.
Work on solving bigger problems as you grow in your career, instead of waiting for difficult tasks to be handed to you.
To improve a model's reasoning skills, reward it for each correct step in problem-solving, not just the final answer.

The Future of Search and How You Can Shape It

Gradient Flow • 199 implied HN points • 23 Feb 23

🕹 Technology Machine Learning

The blend of artificial intelligence and chatbot interfaces, as seen in ChatGPT, is transforming search applications, with startups emphasizing large language models for better search experiences.
Chatbot-equipped search engines are changing what users expect from company websites, which will need AI and foundation models to deliver better responses that incorporate text, images, video, and audio.
Data and AI teams are crucial to developing, testing, and maintaining next-generation search applications; companies will likely seek more control over their data and may build custom models for better privacy and innovation.

Neologisms

johan’s substack • 19 implied HN points • 02 Jun 24

🕹 Technology Machine Learning

Exploring neologisms can reveal insights into AI models and their inner workings.
Speculative neologisms can provide a framework for understanding how AI processes information and feelings.
Using neologisms can help simulate and investigate complex behaviors in AI models and uncover hidden structures.

Comparing Human, LLM & LLM-RAG Responses

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 09 Feb 24

🕹 Technology Machine Learning

The study compared answers from humans, a basic LLM, and an LLM that uses RAG to see which is most accurate in healthcare. The LLM with RAG performed the best.
Using RAG, the model was much quicker than humans, taking only about 15-20 seconds. Humans took around 10 minutes to respond.
GPT-4, especially with RAG, showed high accuracy and can support doctors by providing fast and reliable answers, but humans should still check the information.

HOISTED FROM COMMENTS: RAFAEL KAUFMANN: Carving Nature at the Joints: Faithful Representation, the Platonic Dream, & the Unreasonable Near-Success of GPT LLM MAMLMs

Brad DeLong's Grasping Reality • 69 implied HN points • 25 Jun 25

🕹 Technology Machine Learning

Machines, like large language models, can imitate human language because they find patterns hidden in how we express ourselves. They simplify the chaos of our words into something easier to understand.
Even though these models are good at predicting responses, they struggle with truly understanding the world. They can replicate language well, but grasping the deeper meaning remains a challenge.
The hope is that with better training and understanding causal relationships, these models could evolve to not only imitate but truly comprehend the world around them.

Last Words on AI

God's Spies by Thomas Neuburger • 80 implied HN points • 10 Jun 25

🕹 Technology Machine Learning

AI can't solve new problems unless they've been solved by humans before. It relies on previous data and patterns to operate.
AI is largely a tool driven by greed, impacting our environment negatively. Its energy demands could worsen the climate crisis.
Current AI models are not genuinely intelligent; they mimic patterns they've learned without real reasoning ability. This highlights that we are far from achieving true artificial general intelligence.

10 ways to estimate SHAP

Mindful Modeler • 119 implied HN points • 18 Jul 23

🕹 Technology Machine Learning

SHAP values are estimated with a variety of methods because of computational constraints.
Estimation methods include the exact explainer, the sampling explainer, the permutation explainer, and more, all of which attribute model predictions to features.
The `shap` package implements multiple estimation methods, with defaults based on the type of data and model.
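
To make the difference between exact and sampled estimation concrete, here is a small pure-Python sketch. It does not use or reproduce the `shap` package; the function names are hypothetical, and replacing "absent" features with a single `baseline` vector is a simplifying assumption (real explainers typically average over background data).

```python
import random
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values by enumerating every feature subset (cost grows as 2^n)."""
    n = len(x)

    def value(subset):
        # Features outside the subset are set to their baseline value.
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def sampled_shapley(predict, x, baseline, n_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal gains over random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        z = list(baseline)
        prev = predict(z)
        for i in order:
            z[i] = x[i]  # switch feature i from baseline to its actual value
            cur = predict(z)
            phi[i] += cur - prev
            prev = cur
    return [p / n_permutations for p in phi]
```

The exact version needs on the order of 2^n model evaluations per explained instance, while the sampled version needs `n_permutations * (n + 1)`, which is why approximate estimators become necessary as the number of features grows.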

The Sequence Opinion #662: From Words to Worlds: Some Observations About World Models

TheSequence • 77 implied HN points • 12 Jun 25

🕹 Technology Machine Learning

LLMs are great with words, but they struggle with understanding and acting in real-life environments. They need to develop spatial intelligence to navigate and manipulate the world around them.
Spatially grounded AI systems can build internal models of their surroundings, which helps them operate in real spaces. This advancement represents a big step forward in general intelligence for AI.
The essay discusses how new AI designs focus on spatial reasoning instead of just language, emphasizing that understanding the physical world is a key part of being intelligent.

Creating Individualised Chess Engines

Chess Engine Lab • 39 implied HN points • 26 Mar 24

🕹 Technology Machine Learning

An engine called Maia focused on predicting human moves accurately instead of just being the strongest in chess, resulting in a more meaningful impact, especially for club-level players.
By individualizing chess engines to predict moves of specific players, accuracy can be increased by 4-5% and players can be identified with 98% accuracy from a pool of 400, based on their game patterns.
Identifying players through their mistakes is crucial: mistakes are unique to individual players, so understanding and fixing them can greatly aid chess improvement.

A reply to Michael Huemer on AI

Matthew Barnett’s Blog • 117 implied HN points • 17 Feb 23

🕹 Technology Machine Learning

We don't fully understand how AI like ChatGPT works, and it may have some true understanding.
AI models like ChatGPT are not perfect and do have limitations.
It's important to differentiate between what an AI model is optimized to do and what it actually does.

LLM Links

In My Tribe • 167 implied HN points • 23 Dec 24

🕹 Technology Machine Learning

AI-generated podcasts can share information in new ways, like converting written essays into audio. This shows how AI can create engaging content without much input.
Large Language Models (LLMs) struggle to learn new concepts as effectively as humans do because they rely on past data. Humans continue to adapt and learn from everyday experiences.
The potential economic impact of robots is huge, especially for tasks like cleaning and driving. The market for humanoid robots could reach trillions, and they might also help reduce accidents.

Dashing Data Viz - Issue 233

Dashing Data Viz • 117 implied HN points • 11 Apr 23

🕹 Technology Machine Learning

Dive into various topics in data visualization, such as interactive data visualization and mapping rivers.
Explore unique projects like creating pretty maps from OpenStreetMap data and generating stylized visualizations.
Discover resources for improving dashboard designs and achieving realistic color mixing in web projects.

Delegating To Computers

As Clay Awakens • 117 implied HN points • 17 Sep 23

🕹 Technology Machine Learning

Delegating tasks to computers can be challenging because the task itself is hard to convey.
Approaches to delegation include instruction, demonstration, and explanation.
Delegation via instruction requires detailed guidance, while delegation via explanation involves explaining the task to the assistant.

The best kept secret to ML success.

Data Engineering Central • 117 implied HN points • 17 Apr 23

🕹 Technology Machine Learning

The best secret to ML success is Databricks + Delta Lake.
There's an overload of Machine Learning content, and not all of it is valuable.
Consider a 7-day free trial to learn more about the best-kept secret to ML success.

Releasing Vodka V2 and All the Details How We Made it [Part 2]

followfox.ai’s Newsletter • 117 implied HN points • 18 May 23

🕹 Technology Machine Learning

Vodka V2 was released with an updated dataset and a marginally better model compared to V1.
The key changes in V2 included using a better dataset, increasing data volume, and cleaning the data more thoroughly.
The training protocol for V2 involved a lower learning rate and enhanced data cleaning to achieve smoother training and optimize model performance.

More thoughts on chatbot design; TruthGPT, really?

Paola Writes • 117 implied HN points • 21 Apr 23

🕹 Technology Machine Learning

Design chatbots to be noticeably different from humans for ethical purposes.
Avoid language and actions in AI that simulate human feelings to protect autonomy.
Consider the implications and potential risks of creating AI models like TruthGPT.

2023 Trends To Watch: Data, Machine Learning, AI

Gradient Flow • 219 implied HN points • 12 Jan 23

🕹 Technology Machine Learning

Data, machine learning, and AI are the key areas to watch in 2023 for advancements and innovations.
Tech job market shifts: Despite challenges, demand for skilled professionals in MLOps and MLflow showcases opportunities for job seekers.
Financial market impacts on data companies: Young data infrastructure companies faced stock value drops in 2022, with some like Klarna, Stripe, and Thoughtspot showing resilience amidst challenges.

A New Study Compares RAG & Fine-Tuning For Knowledge Base Use-Cases

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 25 Mar 24

🕹 Technology Machine Learning

Choosing technology depends on what you need to achieve. Focus on the specific requirements of the problem to find the right solution.
Retrieval-Augmented Generation (RAG) is often more effective than Fine-Tuning for knowledge base tasks. It allows for quick searches and better accuracy.
RAG systems are easier to update with new information compared to Fine-Tuned models. You can simply add new data without complex adjustments.

ChatGPT has gone berserk

Marcus on AI • 432 HN points • 21 Feb 24

🕹 Technology Machine Learning

ChatGPT has had some issues reported by users recently, causing concern.
Generative AI is complex and sometimes unpredictable due to the nature of data and prompts used.
There is a call for alternative technologies that are more interpretable and reliable when compared to current AI systems.

On ChatGPT-4, Mistral 7B OpenChat and OpenAI going vertical

ML Under the Hood • 98 implied HN points • 12 Nov 23

🕹 Technology Machine Learning

OpenAI released new versions of its GPT models, making them cheaper but still effective.
OpenAI is expanding its offerings with larger context windows, image processing, and ChatGPT apps.
Mistral 7B OpenChat is an affordable and competitive open-source model catching up with GPT-3.5.

Data Science Weekly - Issue 480

Data Science Weekly Newsletter • 279 implied HN points • 02 Feb 23

🕹 Technology Machine Learning

The newsletter is now hosted on Substack and remains free for everyone. A paid option is available for more features and interactions.
Data teams need to build trust with stakeholders to effectively measure their value and justify their budgets. Having good relationships is more important than just metrics.
Understanding MLOps is crucial for the industry. It involves not only the tools but also the culture and practices around machine learning operations.

AI Hallucinations: Proven Unsolvable - What Do We Do?

Mind Prison • 73 implied HN points • 17 Jun 25

🕹 Technology Machine Learning

AI hallucinations happen because AI relies on patterns from limited data, which can't cover everything. This means AI will always make mistakes when trying to understand things outside its knowledge.
We need to treat all AI outputs with caution since they can all be hallucinations. It's important to check and verify what the AI says, especially in critical situations.
The issue of hallucinations is built into how AI works, so trying to completely fix them isn't possible. Instead, we should focus on verifying AI results to ensure reliability.

The Super Weight in Large Language Models

Gonzo ML • 189 implied HN points • 29 Nov 24

🕹 Technology Machine Learning

There's a special weight in large language models called the 'super weight.' If you remove it, the model's performance crashes dramatically, showing just how crucial it is.
Super weights are linked to what's called 'super activations,' meaning they help generate better text. Without them, the model struggles to create coherent sentences.
Finally, researchers found ways to identify and protect these super weights during the model training and quantization processes. This makes the model more efficient and retains its quality.

Opus 4.5, Z-Image, DeepSeekMath-V2, FLUX.2, Fara-7B, Hunyuan 3D global, Supertonic TTS, ltx Retake and more

AI Brews • 15 implied HN points • 28 Nov 25

🕹 Technology Machine Learning

FLUX.2 can create super detailed images and infographics with up to 10 references combined. This means it can help artists and designers make more complex visuals easily.
Z-Image is a powerful image generation model that works well even on regular computers. It can produce amazing images while accurately handling both English and Chinese text.
The Retake feature from ltx studio lets users quickly change parts of a video after it's made. This saves time by keeping most of the video the same while only adjusting specific scenes.

Pay Attention to Small Weights

Gonzo ML • 63 implied HN points • 06 Jul 25

🕹 Technology Machine Learning

Small weight updates during model training can lead to better results, especially since large weights might hold key features that we don't want to change.
Using a method called NanoAdam, we can focus on smaller weights, which allows for more efficient memory usage and better performance during fine-tuning.
It seems that large gradients often come from small weights, suggesting that sometimes it’s smarter to update these smaller weights instead of the larger ones.
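
The idea of updating only small-magnitude weights can be illustrated with a masked gradient step. This sketch is not NanoAdam: it uses plain SGD instead of Adam, and the helper names, the `fraction` parameter, and the selection rule (the smallest weights by absolute value) are assumptions made for the example.

```python
def small_weight_mask(weights, fraction):
    """Mark the `fraction` of weights with the smallest absolute value as trainable."""
    k = max(1, int(len(weights) * fraction))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    chosen = set(order[:k])
    return [i in chosen for i in range(len(weights))]


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Apply a gradient step only where the mask is True; other weights stay frozen."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]
```

Because only the masked entries are ever updated, an optimizer built this way would need to keep per-weight state for that subset alone, which is one plausible reading of the memory savings mentioned above.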

Why I’m excited about AI-assisted human feedback

Musings on the Alignment Problem • 459 implied HN points • 29 Mar 22

🕹 Technology Machine Learning

Reinforcement learning from human feedback (RLHF) has been successful in aligning models with human intent, such as following instructions.
Training AI systems on tasks that are hard for humans to evaluate may not be directly solvable with RLHF due to challenges in generalization and evaluation.
AI-assisted human feedback, like recursive reward modeling (RRM), can help tackle complex tasks by involving human evaluation in aligning AI systems.

An Introduction To DSPy

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 28 May 24

🕹 Technology Machine Learning

DSPy is a programming tool that simplifies how we work with language models by separating the tasks from the prompts. This means you tell DSPy what to do, not how to do it.
It uses something called 'signatures' to describe tasks in a simple way, which helps in generating and optimizing prompts automatically. This reduces the need for manual prompt crafting.
DSPy offers an iterative workflow for optimizing language tasks, making it suitable for complex applications. It can improve performance with minimal effort by tweaking how it uses language models.

Scaling human feedback with fine-tuned open-source LLMs

LLMs for Engineers • 59 implied HN points • 30 Jan 24

🕹 Technology Machine Learning

Fine-tuned open-source models like Llama and Mistral can produce accurate feedback, similar to high-performing custom models. They're a great option for companies needing control over their data.
Using tools like Axolotl and Modal makes it easier to fine-tune these models. They help create customized training jobs and simplify deploying models across multiple GPUs.
Fine-tuning significantly improves the clarity and structure of the model's output. It reduces irrelevant information, allowing for cleaner, more useful results.