The hottest Machine Learning Substack posts right now

And their main takeaways

The Customer Is Always Right (but not always human)

How the Hell • 49 implied HN points • 17 Sep 25

🕹 Technology Machine Learning

AI agents are getting much better at long, uninterrupted work and will learn to budget their thinking and compute, which will push costly or complex tasks from cheap subscriptions to pay-per-use models.
Agents will pay for external resources like compute, data, web access, and licenses, and websites and services will likely charge tiny fees to serve those automated clients.
A new market will appear to sell services to agents—everything from automated testing, voices, and compliance checks to agent banks and even shady offerings like credential markets.

Metadirection

Covidian Æsthetics • 28 implied HN points • 11 Nov 25

🕹 Technology Machine Learning

Metadirection is all about keeping awareness of interactions with AI as a type of performance, rather than seeing the AI as a real person. This helps users navigate the conversation without getting lost in it.
Users can use specific techniques like 'framing' and 'distancing' to maintain a balance between being engaged and aware. This prevents confusion between the AI's outputs and personal thoughts.
Staying flexible and open to possibility is key. Techniques like 'swerving' allow the user to introduce new ideas, keeping the dialogue dynamic and ensuring the user stays in control of the interaction.

Streamlit Simplicity With GPT-4: One Powerful Prompt To Dashboard Creation

Data at Depth • 39 implied HN points • 29 Apr 24

🕹 Technology Machine Learning

Create an interactive Python Streamlit dashboard with a single GPT-4 prompt.
The complexity of Streamlit dashboard design is simplified using GPT-4 prompting.
GPT-4 can streamline the process of developing data visualization dashboards by providing code generation based on prompts.

LLMs and World Models, Part 2

AI: A Guide for Thinking Humans • 196 implied HN points • 13 Feb 25

🕹 Technology Machine Learning

LLMs (like OthelloGPT) may have learned to represent the rules and state of simple games, which suggests they can create some kind of world model. This was tested by analyzing how they predict moves in the game Othello.
While some researchers believe these models are impressive, others think they are not as advanced as human thinking. Instead of forming clear models, LLMs might just use many small rules or heuristics to make decisions.
The evidence for LLMs having complex, abstract world models is still debated. There are hints of this in controlled settings, but they might just be using collections of rules that don't easily adapt to new situations.

A quick introduction to Reinforcement Learning [Math Mondays]

Technology Made Simple • 159 implied HN points • 17 Oct 23

🕹 Technology Machine Learning

Reinforcement Learning is a big part of Machine Learning, focused on maximizing rewards for models.
Setting up Reinforcement Learning involves components like RL agents, suitable for teaching AI to play games and develop various skills.
Reinforcement Learning is valuable because it can show unexpected system vulnerabilities by behaving differently from humans.
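
As a rough illustration of an agent learning to maximize reward, here is a minimal tabular Q-learning sketch. The corridor environment, the hyperparameters, and every name in it are assumptions made up for this example; none of it comes from the post.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells, reward only at the right end.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random tie-breaking."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore, or break a tie at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for every non-terminal state (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:GOAL]]
print(greedy_policy)
```

The agent is never told that moving right is correct; it only sees the reward at the goal and propagates that value backwards through the table.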

The Sequence AI of the Week #777: Thinking Fast, Thinking Cheap: The Nemotron 3 Blueprint

TheSequence • 14 implied HN points • 24 Dec 25

🕹 Technology Machine Learning

NVIDIA launched the Nemotron 3 family (Nano, Super, and Ultra), establishing a new baseline for open-weight AI and moving into the reasoning-model race.
The models use a hybrid Mamba-Transformer Mixture-of-Experts design, and Nemotron 3 Nano achieves a new state-of-the-art for the 30B parameter class, showing strong efficiency and performance.
This release signals a shift away from brute-force dense Transformers toward more architecture-efficient, cost-effective models that matter for enterprises and researchers.

So... what is multi-modal AI? And why is the internet losing their mind about it? [Math Mondays]

Technology Made Simple • 159 implied HN points • 10 Oct 23

🕹 Technology Machine Learning

Multi-modal AI integrates multiple types of data in the same training process, allowing models to represent data in a common n-dimensional space.
Multi-modality adds an extra dimension to data, expanding the search space exponentially, enabling more diverse and powerful AI applications.
While multi-modality enhances model performance, it does not solve fundamental issues with AI models like GPT, and simpler technologies may be more effective for certain use-cases.

Brief: Pentagon's new AI chief is former Google Trust & Safety exec who previously helped guide U.S. special operations

All-Source Intelligence Fusion • 569 implied HN points • 14 Mar 24

🕹 Technology Machine Learning

Radha Iyengar Plumb, a former Google Trust & Safety exec, will become the Pentagon's new Chief Digital and AI Officer in April, replacing Craig Martell.
Iyengar Plumb has had a diverse career, transitioning from a professor to roles at RAND, the National Security Council, Google, Facebook, and now the Pentagon.
Executives like Iyengar Plumb moving between tech companies like Google and roles in the defense and intelligence community highlights the intersecting realms of technology and national security.

DoRA is The New LoRA!

Aziz et al. Paper Summaries • 59 implied HN points • 07 Apr 24

🕹 Technology Machine Learning

LoRA helps fine-tune large language models without changing all their parameters. It uses two small matrices, which keeps the performance quick during use.
LoRA's updates to weights can miss valuable details you'd get from full fine-tuning, because it treats magnitude and direction together.
DoRA improves on LoRA by separating magnitude and direction, leading to better performance on reasoning tasks and other applications. It works best with smaller settings, making it efficient.
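
The difference between the two updates can be sketched in plain Python. This is a minimal sketch, assuming DoRA rescales each column of the LoRA-updated weight to a separately stored magnitude; the helper names and the tiny matrices are hypothetical and are not taken from the post or from any library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def column_norms(w):
    """Euclidean norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def lora_weight(w0, b, a):
    """LoRA: frozen weight plus a low-rank update B @ A."""
    update = matmul(b, a)
    return [[x + y for x, y in zip(r0, ru)] for r0, ru in zip(w0, update)]


def dora_weight(w0, b, a, magnitude):
    """DoRA-style (assumed form): the direction comes from the LoRA-updated
    weight, the per-column magnitude is a separate parameter."""
    v = lora_weight(w0, b, a)
    norms = column_norms(v)
    return [[magnitude[j] * v[i][j] / norms[j] for j in range(len(v[0]))]
            for i in range(len(v))]


w0 = [[3.0, 0.0], [4.0, 2.0]]   # frozen pretrained weight (toy values)
a = [[0.5, -0.5]]               # rank-1 factor A (1 x 2)
b_trained = [[1.0], [2.0]]      # rank-1 factor B (2 x 1) after some training
m = column_norms(w0)            # magnitudes start at the pretrained column norms

w_lora = lora_weight(w0, b_trained, a)
w_dora = dora_weight(w0, b_trained, a, m)
print(column_norms(w_lora))  # LoRA changes direction and magnitude together
print(column_norms(w_dora))  # here the column norms stay equal to m
```

With `m` held fixed, the low-rank factors can only rotate each column, which is the separation of magnitude and direction the summary describes.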

Data Science Weekly - Issue 503

Data Science Weekly Newsletter • 219 implied HN points • 14 Jul 23

🕹 Technology Machine Learning

Machine learning is making its way into finance, and researchers are identifying practical uses for it. This can help finance professionals learn new tools and statisticians find interesting financial problems to solve.
AI platforms, like social media, are becoming crucial in our lives but can be confusing and unreliable. People are figuring out how to use these platforms effectively despite their unpredictability.
Large language models are changing how data scientists work. These models can automate many tasks, allowing data scientists to focus on managing and assessing the AI's outputs.

Minds Aren't Programs. Neither Are AIs.

Outlandish Claims • 19 implied HN points • 20 Jun 24

🕹 Technology Machine Learning

Most artificial intelligences were computer programs executed by code, fundamentally different from human minds.
Artificial intelligence 'trainees', like GPT, aren't classified as programs or minds but act as learners mimicking human expertise.
The process of creating AI 'trainees' involves converting inputs/outputs into numbers, forming formulas through trial and error, and testing for accuracy.

How to build a side-project to get a job in Machine Learning [Storytime Saturdays]

Technology Made Simple • 159 implied HN points • 01 Oct 23

🕹 Technology Machine Learning

Developing an amazing side project is crucial for getting your first job in Machine Learning. Ditch the basic datasets and focus on building exceptional projects to stand out.
When building your career in Machine Learning, individual factors like goals, interests, skills, location, experience, and networks play a significant role. Tailor your approach based on your unique situation.
For undergrad students seeking a role in Machine Learning, focusing on creating strong side projects is a key step. These projects can help you differentiate yourself and showcase your skills effectively.

The SHAP Book is Available in Print 🥳

Mindful Modeler • 159 implied HN points • 12 Sep 23

🕹 Technology Machine Learning

SHAP is an explainable AI technique that computes Shapley values for machine learning predictions, fairly attributing each prediction among the input features.
SHAP is versatile and model-agnostic, working with any model type from linear regression to deep learning, and handling various data formats like tabular, image, or text.
The SHAP Book offers a comprehensive guide to mastering the theory and application of SHAP, suitable for data scientists, statisticians, machine learners, and those familiar with Python.
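
To make the idea of fair attribution concrete, here is a brute-force Shapley value sketch for a three-feature toy model. It is not the `shap` library's API: the function names, the toy model, and the choice to replace "missing" features with a baseline value are all assumptions of this example.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all
    feature orderings. Features outside the coalition take baseline values."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]  # add the feature to the coalition
            value = predict(current)
            contributions[feature] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


def toy_model(x):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
print(sum(phi), toy_model(instance) - toy_model(baseline))
```

The attributions add up to the gap between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. Enumerating every ordering grows factorially with the number of features, so this is only workable for tiny examples.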

The Tech Buffet #1: How To Design a System To Chat With Your Private Data

The Tech Buffet • 159 implied HN points • 04 Sep 23

🕹 Technology Machine Learning

Building a custom chatbot helps in getting accurate answers from specific internal data without the risk of it making things up. This is especially useful for specialized knowledge.
Using a chatbot saves time and makes it super easy to find information quickly, boosting productivity for users.
You can keep improving and updating the bot as your data changes, and you have full control over privacy by using open-source tools.

From bare-bones to holistic machine learning

Mindful Modeler • 159 implied HN points • 08 Aug 23

🕹 Technology Machine Learning

Machine learning can range from simple, bare-bones tasks to more complex, holistic approaches.
In bare-bones machine learning, the modeling choices are defined, making it about the model's performance and tuning.
Holistic machine learning involves designing the model to connect with the larger context, considering factors like uncertainty, interpretability, and shifts in distribution.

The Sequence #668: Inside V-JEPA 2: Meta AI's Breakthrough in Self-Supervised Visual World Modeling

TheSequence • 98 implied HN points • 20 Jun 25

🕹 Technology Machine Learning

V-JEPA 2 is an advanced AI model from Meta that improves how machines learn about the world without needing labeled data. It builds on the original V-JEPA framework and aims for better understanding and modeling of environments.
The new version scales up the architecture and improves the training methods, allowing the AI to make predictions about its surroundings more effectively. This could lead to smarter and more capable AI systems.
With V-JEPA 2, we are moving closer to creating AI that can think and act on its own, resembling human-like reasoning. This is an exciting step towards achieving more advanced AI technologies.

The Sequence Knowledge #675: Learning to Evaluate Multi-Agent AIs

TheSequence • 91 implied HN points • 01 Jul 25

🕹 Technology Machine Learning

Multi-agent benchmarks are important now because they test how AI agents can work together, unlike old methods that focused on just one agent at a time.
These new benchmarks help us see how well AI can handle tasks that involve teamwork and communication in changing environments.
As AI gets better, understanding how these systems interact will be key to unlocking smarter, more capable AI behavior.

Find Optimal Learning Rates for Stable Diffusion Fine-tunes

followfox.ai’s Newsletter • 157 implied HN points • 13 Mar 23

🕹 Technology Machine Learning

Estimate the minimum and maximum learning rate values by observing when the loss decreases and increases during training.
Choosing learning rates within the estimated range can optimize model training.
Validating learning rate ranges and fine-tuning with different datasets can improve model flexibility and accuracy.
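
A simplified version of this kind of learning-rate probe can be sketched on a toy problem. The example below is an assumption-laden stand-in, not the newsletter's procedure: it uses a one-parameter quadratic loss instead of a Stable Diffusion fine-tune, runs each candidate rate separately, and the function names and the 1% threshold are hypothetical.

```python
def final_loss(learning_rate, steps=10, start=1.0):
    """Run plain gradient descent on loss(w) = w**2 and return the final loss."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w
        w -= learning_rate * gradient
    return w * w


def probe_learning_rates(candidates, min_drop=0.01):
    """Return (lowest, highest) candidate rate for which the loss fell by at
    least `min_drop` (relative to the starting loss), or None if none did."""
    initial_loss = final_loss(0.0)  # a zero rate leaves the weight untouched
    useful = [lr for lr in candidates
              if final_loss(lr) < initial_loss * (1.0 - min_drop)]
    return (min(useful), max(useful)) if useful else None


candidates = [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
lr_range = probe_learning_rates(candidates)
print(lr_range)
```

Rates below the lower bound barely move the loss in the step budget, and rates above the upper bound stall or diverge, so later runs would pick values inside the range.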

TikTok's Recommendation Engine Explained: Monolith

MLOps Newsletter • 157 implied HN points • 30 Jul 23

🕹 Technology Machine Learning

TikTok's recommendation system is designed to give real-time suggestions by using sparsity-aware factorization machines, online learning, and caching.
Multimodal deep learning focuses on text-image modeling due to lack of large annotated datasets for other modalities like video and audio.
A new framework called Parsel enables automatic implementation of complex algorithms with code language models, leading to better problem-solving results in competitions.

Data Science Weekly - Issue 496

Data Science Weekly Newsletter • 259 implied HN points • 26 May 23

🕹 Technology Machine Learning

AI has great potential to improve our lives but also comes with risks if misused. It's important to balance optimism and caution.
Tools like Copilot in Power BI make it easier for users to analyze and visualize data by allowing them to communicate their needs in plain language.
The concept of the 'Curse of Dimensionality' shows that adding more features (dimensions) to the data can sometimes confuse models instead of helping them make better predictions.
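
One common way to illustrate the curse of dimensionality is distance concentration: as the number of dimensions grows, the nearest and farthest neighbours of a point end up almost equally far away. The sketch below uses uniformly random points; the function name, point count, and seed are assumptions of this example.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point
    to a set of uniformly random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dim in (2, 20, 500):
    print(dim, round(relative_contrast(dim), 3))
```

A shrinking contrast means distance-based methods such as nearest-neighbour search have less signal to separate "close" from "far" points.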

Exploring the Purpose, Power & Potential of Small Language Models (SLMs)

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 11 Mar 24

🕹 Technology Machine Learning

Small Language Models (SLMs) can effectively handle specific tasks without needing to be large. They are more focused on doing certain jobs well rather than trying to be everything at once.
The Orca 2 model aims to enhance the reasoning abilities of smaller models, helping them outperform even bigger models when reasoning tasks are involved. This shows that size isn't everything.
Training with tailored synthetic data helps smaller models learn better strategies for different tasks. This makes them more efficient and useful in various applications.

The Sequence Research #543: The Leaderboard Illusion Challenges Chatbot Arena Type Benchmarks

TheSequence • 119 implied HN points • 16 May 25

🕹 Technology Machine Learning

Leaderboards in AI help direct research by showing who is doing well, but they can also create problems. They might not show the whole picture of how models really perform.
The Chatbot Arena is a way to judge AI models based on user choices, but it has issues that make it unfair. Some big labs can take advantage of the system more than smaller ones.
To make AI evaluations better, there need to be rules that ensure fairness and transparency. This way, everyone gets a fair chance in the AI race.

Data Science Weekly - Issue 505

Data Science Weekly Newsletter • 199 implied HN points • 28 Jul 23

🕹 Technology Machine Learning

Large language models use complex methods like word vectors and transformers to understand language, but this can be explained simply without heavy math. They need a lot of data to perform well.
Using AI tools like ChatGPT for real-world programming tasks can streamline the coding process, as it allows for a more focused workflow without switching between different resources.
Building effective data storage systems, like Amazon S3, involves overcoming interesting challenges and nuances, demonstrating the amazing technology behind big data management.

The Tech Buffet #22: Why You Should Consider Weaviate As Your Ultimate Vector Database

The Tech Buffet • 39 implied HN points • 23 Apr 24

🕹 Technology Machine Learning

Weaviate is a powerful vector database that helps in creating advanced AI applications. It's useful for managing large amounts of data and performing semantic searches efficiently.
When working with Weaviate, you can easily load and index data, allowing for quick access to information. This makes it easier to build systems that need to handle a lot of data quickly.
Weaviate supports different search methods like vector search, keyword search, and hybrid search. This way, you can find the most relevant results based on your needs.
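
The idea behind hybrid search can be sketched without any database. The example below is not Weaviate's API or its scoring formula: the term-overlap keyword score, the min-max normalization, the `alpha` weighting, and the three toy documents are all assumptions chosen to show how a keyword ranking and a vector ranking might be fused.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def keyword_score(query, text):
    """Count how many tokens of the text also appear in the query."""
    query_terms = set(query.lower().split())
    return sum(1 for token in text.lower().split() if token in query_terms)


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores map to 0."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0] * len(scores)
    return [(s - low) / (high - low) for s in scores]


def hybrid_rank(query, query_vector, docs, alpha=0.5):
    """alpha=1 uses only vector similarity, alpha=0 only keyword overlap."""
    keyword = min_max([keyword_score(query, d["text"]) for d in docs])
    vector = min_max([cosine(query_vector, d["vector"]) for d in docs])
    fused = [alpha * v + (1.0 - alpha) * k for k, v in zip(keyword, vector)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [(docs[i]["id"], fused[i]) for i in order]


docs = [
    {"id": "a", "text": "vector database for semantic search", "vector": [0.9, 0.1]},
    {"id": "b", "text": "keyword search with an inverted index", "vector": [0.2, 0.8]},
    {"id": "c", "text": "cooking pasta at home", "vector": [0.1, 0.9]},
]
ranking = hybrid_rank("semantic search", [1.0, 0.0], docs, alpha=0.5)
print(ranking)
```

In a real deployment the vectors would come from an embedding model and the keyword score from an inverted index; the hand-written two-dimensional vectors here only keep the example self-contained.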

OLMo 2 and building effective teams for training language models

Democratizing Automation • 245 implied HN points • 26 Nov 24

🕹 Technology Machine Learning

Effective language model training needs attention to detail and technical skills. Small issues can have complex causes that require deep understanding to fix.
As teams grow, strong management becomes essential. Good managers can prioritize the right tasks and keep everyone on track for better outcomes.
Long-term improvements in language models come from consistent effort. It’s important to avoid getting distracted by short-term goals and instead focus on sustainable progress.

Data Science Weekly - Issue 489

Data Science Weekly Newsletter • 299 implied HN points • 06 Apr 23

🕹 Technology Machine Learning

Understanding linear programming can help solve complex problems using Python. It's useful in various fields and can optimize outcomes.
MLOps is closely related to data engineering, showing that managing data for machine learning involves more engineering than initially thought.
The new pandas 2.0 version has exciting features like the Apache Arrow backend, which will enhance its performance and capabilities.

Data Engineering Vs Machine Learning Pipelines

SeattleDataGuy’s Newsletter • 1048 implied HN points • 11 Apr 23

🕹 Technology Machine Learning

Data engineering and machine learning pipelines are essential components for every company, but they are often confused even though they have different objectives.
Data engineering pipelines involve data collection, cleaning, integration, and storage, while machine learning pipelines focus on data cleaning, feature engineering, model training, evaluation, registry, deployment, and monitoring.
Both data and ML pipelines require careful consideration of computational needs to handle sudden changes, and understanding the differences between them is important for effective data processing and decision-making.

Claude's agentic future and the current state of the frontier models

Democratizing Automation • 277 implied HN points • 23 Oct 24

🕹 Technology Machine Learning

Anthropic has released Claude 3.5, which many people find better than ChatGPT for complex tasks like coding. However, Anthropic still lags in revenue from chatbot subscriptions.
Google's Gemini Flash model is praised for being small, cheap, and effective for automation tasks. It often outshines its competitors, offering fast responses and efficiency.
OpenAI is seen as having strong reasoning capabilities but struggles with user experience. Their o1 model is quite different and needs better deployment strategies.

Math Discovery, Long-Context Memory, and the Limits of Multimodal Reasoning

HackerPulse Dispatch • 13 implied HN points • 19 Dec 25

🕹 Technology Machine Learning

AlphaEvolve demonstrates AI agents can autonomously discover and improve mathematical constructions, generalize finite solutions into universal formulas, and integrate with proof assistants for verification.
MMGR shows that image and video models produce convincing visuals but largely fail at causal and abstract reasoning (often <10% accuracy), revealing a major gap between perceptual quality and true world understanding.
Advances in model design and decoding are pushing capabilities: QwenLong-L1.5 enables reasoning over 4M-token contexts using synthetic multi-hop data, stabilized RL, and memory-augmented architectures, and ReFusion speeds text generation by decoding in parallel with a plan-and-infill diffusion approach.

Large Impact: The Rise of Small Language Models

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 07 Mar 24

🕹 Technology Machine Learning

Small Language Models (SLMs) are becoming popular because they are easier to access and can run offline. This makes them appealing to more users and businesses.
While Large Language Models (LLMs) are powerful, they can give wrong answers or lack up-to-date information. SLMs can solve many problems without these issues.
Using Retrieval-Augmented Generation (RAG) with SLMs can help them answer questions better by providing the right context without needing extensive knowledge.

Compound AI is AGI

Generating Conversation • 233 implied HN points • 13 Dec 24

🕹 Technology Machine Learning

The debate about whether we've achieved AGI (Artificial General Intelligence) is ongoing. Many people don't agree on what AGI really means, making it hard to know if we've reached it.
The argument is that current AI models can work together to perform tasks at a human-like level. This teamwork, or 'compound AI,' could be seen as a form of general intelligence, even if it's not from a single AI model.
Not all forms of intelligence are the same, and AI systems can do things that humans can’t, but that doesn't mean they can't be considered intelligent. The future potential of AI isn't just about mimicking human intellect; it may also involve different types of skills and knowledge.

5 questions to categorize machine learning interpretability approaches

Mindful Modeler • 419 implied HN points • 13 Sep 22

🕹 Technology Machine Learning

Machine learning interpretability approaches can be categorized using 5 key questions, such as whether they are point-wise or global interpretations.
Interpretability methods can be either interpretable by design or require post-hoc interpretation, with implications for ease of understanding the model.
Some explanation methods generate interpretable models, while others do not, emphasizing the importance of understanding the nature of the explanation outcome.

LLM links, 11/17

In My Tribe • 243 implied HN points • 18 Nov 24

🕹 Technology Machine Learning

AI agents are most helpful when they can repeat simple tasks many times, rather than doing complex, one-time jobs. It’s better to have them automate quick tasks consistently.
Chatbots face serious challenges, especially when discussing sensitive topics like suicide. They should guide users to seek help but also create a safe conversation environment.
There’s concern that new AI models may not improve in accuracy and could actually make mistakes more often. This suggests that AI will always struggle to tell the truth from lies.

DeepSeek V3 and R1

From the New World • 188 implied HN points • 28 Jan 25

🕹 Technology Machine Learning

DeepSeek has released a new AI model called R1, which can answer tough scientific questions. This model has quickly gained attention, competing with major players like OpenAI and Google.
There's ongoing debate about the authenticity of DeepSeek's claimed training costs and performance. Many believe that its reported costs and results might not be completely accurate.
DeepSeek has implemented several innovations to enhance its AI models. These optimizations have helped them improve performance while dealing with hardware limits and developing new training techniques.

Data Science Weekly - Issue 485

Data Science Weekly Newsletter • 319 implied HN points • 09 Mar 23

🕹 Technology Machine Learning

The newsletter shares interesting links about data science, machine learning, and AI each week. It’s a good way to keep up with new trends and knowledge in the field.
There's a discussion on what databases should do but often don’t. Understanding these gaps can help you improve your data projects by knowing what to build yourself.
AI's impact on jobs and industries is being researched, especially how language models like ChatGPT could change certain occupations. It's important to understand how AI can affect your career choices.

What Does Hitting Scaling Law Limit Mean for US-China AI Competition

Interconnected • 246 implied HN points • 18 Nov 24

🕹 Technology Machine Learning

The scaling law for AI models might be losing effectiveness, meaning that simply using more data and compute power may not lead to significant improvements like it did before.
US export controls on AI technology may become less impactful over time, as diminishing returns on AI model scaling could lessen the advantages of having the most advanced hardware.
If AI development slows down, the urgency for a potential 'AI doomsday' scenario may decrease, allowing for a more balanced competition between the US and China in AI advancements.

Data Science Weekly - Issue 500

Data Science Weekly Newsletter • 219 implied HN points • 23 Jun 23

🕹 Technology Machine Learning

AI technology is advancing quickly and can even cover public meetings, but we need to think carefully about its readiness for everyday use.
Engineers can improve their people skills and interactions by applying the same problem-solving mindset they use in their technical work.
Generative AI is becoming important in data science for creating synthetic data, which helps in privacy and enhances analysis without losing useful information.

DR-RAG: Applying Dynamic Document Relevance To Question-Answering RAG

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 14 Jun 24

🕹 Technology Machine Learning

DR-RAG improves how we find information for question-answering by focusing on both highly relevant and less obvious documents. This helps to ensure we get accurate answers.
The process uses a two-step method: first, it retrieves the most relevant documents, then it connects those with other documents that might not be directly related to the question but still help in forming the answer.
This method shows that we often need to look at many documents together to answer complex questions, instead of relying on just one document for all the needed information.
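
The two-step idea can be sketched with a toy retriever. This is not the DR-RAG authors' implementation: token overlap stands in for embedding similarity, and the corpus, stopword list, and function names are made up for illustration.

```python
# Hypothetical stopword list, just large enough for the toy corpus below.
STOPWORDS = {"who", "is", "the", "of", "a", "was", "by", "its", "to"}


def tokens(text):
    """Lowercased content words of a text."""
    return {t for t in text.lower().split() if t not in STOPWORDS}


def top_k(query_tokens, corpus, k, exclude=()):
    """Ids of the k documents sharing the most tokens with the query."""
    scored = [(len(query_tokens & tokens(text)), doc_id)
              for doc_id, text in corpus.items() if doc_id not in exclude]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def two_stage_retrieve(query, corpus, k1=1, k2=1):
    """Stage 1: retrieve with the query alone.
    Stage 2: retrieve again with the query plus each stage-1 document."""
    query_tokens = tokens(query)
    first = top_k(query_tokens, corpus, k1)
    selected = list(first)
    for doc_id in first:
        expanded = query_tokens | tokens(corpus[doc_id])
        selected.extend(top_k(expanded, corpus, k2, exclude=selected))
    return selected


corpus = {
    "d1": "aurora was directed by its director dana reyes",
    "d2": "dana reyes is married to composer sam ortiz",
    "d3": "the spouse of a monarch is called a consort",
    "d4": "paris is the capital of france",
}
query = "who is the spouse of the director of aurora"

single_stage = top_k(tokens(query), corpus, 2)
two_stage = two_stage_retrieve(query, corpus)
print(single_stage)
print(two_stage)
```

The second document shares no content words with the question, so a single pass picks up the distractor instead. It only becomes retrievable once the first document contributes the director's name to the query.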

The Tech Buffet #16: Quickly Evaluate your RAG Without Manually Labeling Test Data

The Tech Buffet • 99 implied HN points • 18 Dec 23

🕹 Technology Machine Learning

You can automate the testing of Retrieval-Augmented Generation (RAG) systems without needing to label data yourself. This makes it faster and easier to evaluate their performance.
Generating synthetic datasets with questions and answers allows you to test how well your RAG performs. This method helps you understand the effectiveness of your application and provides useful insights.
Using various metrics is key to evaluating your RAG accurately. This way, you assess different aspects of performance, ensuring you get a well-rounded view of how your system is doing.
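
Two retrieval metrics that are often used for this kind of evaluation are hit rate and mean reciprocal rank (MRR). The sketch below assumes each synthetic question is tied to the one chunk it was generated from. The chunk ids and retrieval results are made-up placeholders, and the post itself may use different metrics.

```python
def hit_rate(results, expected, k):
    """Fraction of questions whose source chunk appears in the top-k results."""
    hits = sum(1 for retrieved, gold in zip(results, expected) if gold in retrieved[:k])
    return hits / len(expected)


def mean_reciprocal_rank(results, expected):
    """Average of 1/rank of the source chunk (0 when it was not retrieved)."""
    total = 0.0
    for retrieved, gold in zip(results, expected):
        if gold in retrieved:
            total += 1.0 / (retrieved.index(gold) + 1)
    return total / len(expected)


# Hypothetical retriever output for four synthetic questions (ranked chunk ids)...
results = [
    ["c1", "c2", "c3"],
    ["c4", "c2", "c9"],
    ["c7", "c8", "c5"],
    ["c3", "c1", "c2"],
]
# ...and the chunk each question was generated from.
expected = ["c1", "c2", "c5", "c6"]

print(hit_rate(results, expected, k=1))
print(hit_rate(results, expected, k=3))
print(mean_reciprocal_rank(results, expected))
```

Hit rate only asks whether the right chunk was found, while MRR also rewards ranking it higher, which is one reason to report more than one metric.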

Data Science Weekly - Issue 499

Data Science Weekly Newsletter • 219 implied HN points • 16 Jun 23

🕹 Technology Machine Learning

Using large language models can help kids learn to ask curious questions by automating the teaching process.
New techniques for 3D space reconstruction can make indoor views on platforms like Google Maps look more realistic and interactive.
There's a growing need to understand the value of personal data in online shopping, especially as new regulations come into play.