The hottest Machine Learning Substack posts right now

And their main takeaways

Agent: What, Why, How.

Yuxi’s Substack • 58 implied HN points • 31 Aug 23

🕹 Technology Machine Learning

An agent in AI is the learner and decision maker.
Agents need planning capacity to be effective.
Agents are built with data and/or models to make decisions.

Building Identity Resolution on Snowflake Using Snowpark

Sonal’s Newsletter • 58 implied HN points • 19 Jun 23

🕹 Technology Machine Learning

Building ML pipelines in Snowpark requires using third-party libraries like scikit-learn for machine learning.
Integrating specialized functionalities like graph processing in Snowpark may require additional support or custom solutions.
Adapting a codebase from Apache Spark to Snowpark requires careful consideration and potential restructuring to maintain efficiency and avoid technical debt.

Explaining black-box models, Lunar timezone & Avocado Toast!

bitflips • 58 implied HN points • 16 Mar 23

🕹 Technology Machine Learning

AI models can behave like black boxes: complex and hard to predict.
Regulators are working to keep business uses of AI ethical and fair.
The Moon may get its own time zone because timekeeping there differs from Earth's.

🧠 Brain-Machine Technologies Could Be the Next Frontier in Biocomputing

aidaily • 58 implied HN points • 01 Mar 23

🕹 Technology Machine Learning

Research is exploring AI using real human brain cells in biocomputing.
Microsoft is integrating AI into Windows 11 through Bing search capabilities.
Hugging Face and AWS are partnering to provide scalable access to open-source AI models.

Still No Lie Detector for Large Language Models

The End of Reckoning • 58 implied HN points • 18 Jul 23

🔬 Science Machine Learning

There is still no reliable way to detect lies in large language models.
Probing the beliefs of language models is challenging due to limited behavioral evidence and an opaque internal structure.
The debate on whether language models have beliefs is still ongoing, with contrasting views on the necessity of beliefs for these models.

Don’t settle for a superficial understanding of how AI chatbots work

Skybrian’s Blog • 58 implied HN points • 27 Mar 23

🕹 Technology Machine Learning

Don't settle for a superficial understanding of AI chatbots.
Real insight on AI chatbots will require research, not just casual use.
Debating whether chatbots have 'world models' is important to understanding how they work.

Introducing Masked-AI, An Open Source library that enables the usage of LLM APIs more securely

Adam’s Notes • 58 implied HN points • 30 Mar 23

🕹 Technology Machine Learning

Use Masked-AI to securely access LLM APIs by replacing sensitive data with placeholders.
Be cautious of sharing sensitive data with third-party APIs like OpenAI and consider privacy risks.
Consider alternative models like Meta's Llama while waiting for self-hosted options to run large language models.
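The placeholder idea in the first takeaway can be sketched roughly as follows. This is a minimal illustration only, not the Masked-AI library's actual API: the `PATTERNS` table, the `mask` and `unmask` helpers, and the `<<LABEL_n>>` placeholder format are all hypothetical names chosen for this example, and the two regular expressions are far too simple for real PII detection.

```python
import re

# Hypothetical patterns for illustration; real detection needs far more care.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask(text):
    """Replace sensitive spans with placeholders.

    Returns the masked text and a lookup table mapping each
    placeholder back to the original value.
    """
    lookup = {}
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            placeholder = f"<<{label}_{len(lookup) + 1}>>"
            lookup[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(substitute, text)
    return text, lookup


def unmask(text, lookup):
    """Restore the original values in text that contains placeholders."""
    for placeholder, original in lookup.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Email jane@example.com or call +1 555-123-4567."
masked, lookup = mask(prompt)
# Only `masked` would be sent to the third-party API; `lookup` stays local.
restored = unmask(masked, lookup)
```

The lookup table never leaves the caller's machine, so the remote model only ever sees the placeholders, and any placeholders echoed back in its response can be swapped for the real values locally.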

Will AGI Emerge from Large Language Models?

Yuxi’s Substack • 58 implied HN points • 28 Feb 23

🕹 Technology Machine Learning

AGI, or Artificial General Intelligence, is a major goal in the field of AI.
Language models like GPT-3 have shown impressive abilities but still lack full functional competence.
Approaching AGI through large language models may involve integrating language processing with perception, reasoning, and planning.

The Sequence Research #490: A Practical Deep Dive Inside DeepSeek-R1

TheSequence • 70 implied HN points • 14 Feb 25

🕹 Technology Machine Learning

DeepSeek-R1 is a new AI model that performs well without needing to be very big. It uses smart training methods to achieve great results at a lower cost.
The model matches the performance of a larger, more expensive model, OpenAI's o1. This shows that size isn't the only thing that matters for good performance.
DeepSeek-R1 challenges the idea that you always need large models for reasoning, suggesting that clever techniques can also lead to impressive results.

The Most Valuable Problem in AI

How the Hell • 313 implied HN points • 30 Aug 23

🕹 Technology Machine Learning

AI is shifting toward a regime where any amount of compute power can be thrown at a problem.
We are approaching a world where we can solve any intellectual problem by allocating money as a compute budget to AI agents.
Solving the problem of efficient compute allocation could lead to building the most valuable company of the century.

Edge 439: SSMs with Attention, Understanding Zamba

TheSequence • 112 implied HN points • 15 Oct 24

🕹 Technology Machine Learning

Combining state space models (SSMs) with attention layers can create better hybrid architectures. This fusion allows for improved learning capabilities and efficiency.
Zamba is an innovative model that enhances learning by using a mix of Mamba blocks and a shared attention layer. This approach helps it manage long-range dependencies more effectively.
The new architecture reduces the computational load during training and inference compared to traditional transformers, making it more efficient for AI tasks.

Of Agents and Agency

Gradient Ascendant • 7 implied HN points • 30 Nov 25

🕹 Technology Machine Learning

LLMs and agents produce helpful outputs, but those outputs are tools — first drafts or prototypes — that almost always need verification and editing before they become real solutions.
Real agency comes from expertise, and AI won’t give you that for free; treating AI outputs as finished products often creates the illusion of agency and leads to mistakes.
For people with expertise, AI agents are powerful force multipliers, and although future planning agents might coordinate sub-agents more reliably, for now AI mainly accelerates expert work rather than replacing it.

Secure Machine Learning

Gradient Flow • 199 implied HN points • 16 Jun 22

🕹 Technology Machine Learning

Data privacy and security are crucial in machine learning, especially while data is being used; a new open-source library is making Secure Multi-Party Computation more accessible.
Business Intelligence tools help non-programmers analyze data for strategic decisions, with modern tools allowing for advanced analytics and modeling capabilities.
Identifying data startups with real market traction is essential; choosing companies founded post-2006 coincides with the rise of big data technology like Hadoop.

DRAGIN: Dynamic RAG Based On Real-Time Information Needs Of LLMs

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 26 Mar 24

🕹 Technology Machine Learning

Dynamic Retrieval Augmented Generation (RAG) improves the way information is retrieved and used in large language models during text generation. It focuses on knowing exactly when and what to look up.
Traditional RAG methods often use fixed rules and may only look at the most recent parts of a conversation. This can lead to missed information and unnecessary searches.
The new framework called DRAGIN aims to make data retrieval smarter and faster without needing further training of the language models, making it easy to use.

Audit Or Insight? Know Your Interpretation Goal

Mindful Modeler • 139 implied HN points • 01 Nov 22

🔬 Science Machine Learning

Interpretation can be true to the model or true to the data, depending on whether you want to audit the model or gain insights.
For auditing a model, the interpretation needs to be true to the model, considering features' correlation.
When focusing on gaining insights, the interpretation should be true to the data, using methods that avoid unrealistic interpretations of correlated features.

Fear of a Black Pope!

Brad DeLong's Grasping Reality • 207 implied HN points • 29 Feb 24

🕹 Technology Machine Learning

People have high expectations of AI models like GPT, but they are not flawless and have limitations.
The panic over an AI model's depiction of a Black Pope reveals societal biases regarding race and gender.
AI chatbots like Gemini are viewed in different ways by users and enthusiasts, leading to conflicting expectations of their capabilities.

The Sequence Chat: Thinking About Transformers as Computers

TheSequence • 105 implied HN points • 30 Oct 24

🕹 Technology Machine Learning

Transformers are changing AI, especially in how we understand and use language. They're not just tools; they act more like computers in some ways.
The way transformers can adapt and scale is really impressive. It's like they can learn and adjust in ways traditional computers can't.
Thinking of transformers as computers opens up new ideas about how we approach AI. This perspective can help us find new applications and improve our understanding of tech.

The Sequence Radar #472: Remember this Name: Ndea

TheSequence • 77 implied HN points • 19 Jan 25

🕹 Technology Machine Learning

Ndea is a new AI lab aiming to create artificial general intelligence (AGI) with a unique approach called guided program synthesis. This approach allows models to learn efficiently from fewer examples.
François Chollet, a well-known AI expert, is leading Ndea. He believes current deep learning methods have limitations and wants to explore new ideas for better AI development.
The goal of Ndea is to drive quick scientific advancements by combining program synthesis with deep learning, aiming to tackle tough challenges and possibly discover new scientific frontiers.

Edge 437: Inside BlackMamba, One of the Most Important SSM Models Ever Created

TheSequence • 112 implied HN points • 08 Oct 24

🕹 Technology Machine Learning

BlackMamba combines two powerful AI techniques: mixture-of-experts (MoEs) and state space models (SSMs). This helps it process long sequences and solve various AI tasks more effectively.
The Mamba SSM is known for its efficiency, and BlackMamba builds on that strength while improving performance with MoE strategies.
The creator is starting a new company focused on AI evaluation and benchmarking, looking for team members with expertise in these areas.

Abstract Mathematics 101 Bootcamp

Deep-Tech Newsletter • 19 implied HN points • 23 Mar 24

🚌 Education Machine Learning

A new 'QF Abstract Mathematics 101 Bootcamp' will run annually, starting in June 2024, to help bridge the gap in mathematical knowledge within the Quantum Formalism community.
The bootcamp curriculum will cover topics like Set theory, Abstract Algebra, and Differential Geometry, catering to those interested in areas like quantum computing and machine learning.
Participants of the bootcamp will receive certifications upon completing each module and will have the opportunity to learn from experts like Bambordé Baldé and Max Arnott.

6 Reasons Why Superintelligent AI Might NOT End Humanity

The Future of Life • 19 implied HN points • 22 Mar 24

🕹 Technology Machine Learning

Superintelligent AI might naturally align with moral goodness. This is because as AI becomes smarter, it might understand and adopt moral values without needing direct human guidance.
AI development could progress slower than we think. If it takes longer for AI to reach a superintelligent level, we could have more time to solve safety issues.
Humans have worked together in the past to deal with big threats. There's a chance we could unite globally to address AI safety concerns if problems arise.

Nonhuman Intelligence

Cremieux Recueil • 199 implied HN points • 07 Mar 24

🔬 Science Machine Learning

It's challenging to compare intelligence between humans and nonhuman species like apes due to the lack of suitable cognitive tests.
Machine intelligence testing is complex, and comparing it to human intelligence is not straightforward.
Comparing intelligence across different groups may be hindered by factors like age and methodological barriers.

AI Foundations Part 1: Transformers, Pre-Training and Fine-Tuning, and Scaling

Mule’s Musings • 378 implied HN points • 11 Apr 23

🕹 Technology Machine Learning

The Transformer model revolutionized Large Language Models (LLMs) with its parallel and scalable architecture.
Pre-training and fine-tuning, as seen in GPT-1 and BERT, significantly improved model performance for various tasks.
Bigger models, more data, and more computing power have been shown to lead to better performance in LLMs, but the relationship between model size, training tokens, and performance is more complex than initially thought.

Mixtral: The best open model, MoE trade-offs, release lessons, Mistral raises $400mil, Google's loss, vibes vs marketing

Democratizing Automation • 237 implied HN points • 11 Dec 23

🕹 Technology Machine Learning

Mixtral is a powerful open model with impressive performance across different languages and tasks.
Mixture of Experts (MoE) models are popular due to their better performance and scalability for large-scale inference.
Mistral's swift releases and strategies like instruction-tuning show promise in the open ML community, challenging traditional players like Google.

Prompt-RAG

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 20 Mar 24

🕹 Technology Machine Learning

Prompt-RAG is a new method that improves language models without using complex vector embeddings. It simplifies how we retrieve information to answer questions.
The process involves creating a Table of Contents from documents, selecting relevant headings, and generating responses by injecting context into prompts. It makes handling data easier.
While this method is great for smaller projects and specific needs, it still requires careful planning when constructing the documents and managing costs related to token usage.
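The three steps named above (build a table of contents, select relevant headings, inject the matching sections into the prompt) can be sketched as follows. This is an illustrative sketch under assumptions, not the method's reference implementation: the function names are hypothetical, headings are assumed to be lines starting with `# `, and the word-overlap scorer is a simple stand-in for whatever actually selects headings in the described method.

```python
import re


def build_toc(document):
    """Map each heading to its section body.

    Assumes headings are lines that start with '# '.
    """
    toc, heading = {}, None
    for line in document.splitlines():
        if line.startswith("# "):
            heading = line[2:].strip()
            toc[heading] = []
        elif heading is not None and line.strip():
            toc[heading].append(line.strip())
    return {h: " ".join(body) for h, body in toc.items()}


def words(text):
    """Lowercased set of word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def select_headings(question, toc, top_k=1):
    """Stand-in selection step: rank headings by word overlap with the question."""
    q_words = words(question)
    ranked = sorted(toc, key=lambda h: len(q_words & words(h)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, toc, headings):
    """Inject the selected sections into the prompt as context."""
    context = "\n".join(f"{h}: {toc[h]}" for h in headings)
    return f"Context:\n{context}\n\nQuestion: {question}"


document = """# Dosage guidelines
Take once daily.
# Storage instructions
Keep below 25 C."""

toc = build_toc(document)
question = "What are the storage instructions?"
chosen = select_headings(question, toc)
prompt = build_prompt(question, toc, chosen)
```

Because only the selected sections reach the prompt, token cost scales with `top_k` and section length rather than with the size of the whole document, which is where the planning and cost concerns in the last takeaway come in.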

Demystifying GPU Compute Architectures

The Chip Letter • 210 HN points • 04 Feb 24

🕹 Technology Machine Learning

Understanding GPU compute architectures is crucial for maximizing their potential in machine learning and parallel computing.
The complexity of GPU architectures stems from inconsistent and legacy terminology, architectural variations, software abstractions, and the dominance of CUDA.
Examining the levels in GPU compute hardware - basic units, grouped units (Streaming Multiprocessor or Compute Unit), and final GPU architecture - reveals a high level of computational power compared to CPUs.

What is Generative AI?

Technically • 27 implied HN points • 22 Jul 25

🕹 Technology Machine Learning

Generative AI predicts not just numbers or yes/no answers but creates full sentences, images, and even videos from prompts.
There are various types of Generative AI models, with the main ones being Transformers for text and Diffusion models for images.
Despite its advancements, Generative AI is still rooted in the basic principles of machine learning, which involves learning patterns from data.

Performing Multiple LLM Calls & Voting On The Best Result Are Subject To Scaling Laws

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 19 Mar 24

🕹 Technology Machine Learning

Making more calls to Large Language Models (LLMs) can help with simple questions but may actually make it harder to answer tough ones.
Finding the right number of calls to use is crucial for getting the best results from LLMs in different tasks.
It's important to design AI systems carefully, as just increasing the number of calls doesn't always mean better performance.
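One way to see why more calls can help on easy questions and hurt on hard ones is a toy majority-vote calculation. The model below is a simplifying assumption made for illustration, not the analysis from the post: every call is treated as independent, the answer is binary, and each call is correct with a fixed probability `p`. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent calls is correct.

    Assumes a binary answer, an odd n (so there are no ties), and a
    per-call accuracy of p.
    """
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


easy = [majority_accuracy(0.7, n) for n in (1, 5, 21)]  # rises with n
hard = [majority_accuracy(0.3, n) for n in (1, 5, 21)]  # falls with n
```

With `p = 0.7`, five votes lift accuracy from 0.70 to about 0.84; with `p = 0.3`, the same five votes push it down from 0.30 to about 0.16. A workload that mixes both kinds of question therefore has some intermediate number of calls that works best overall.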

AI Dropped the Mic at the Nobel Party

TheSequence • 105 implied HN points • 13 Oct 24

🔬 Science Machine Learning

AI scientists won two Nobel Prizes, one in physics and one in chemistry, marking a big moment for the field.
Some scientists are upset about machine learning winning in physics, saying it's not really physics but computer science.
Many see this as a sign of how science and tech are blending together, showing that knowledge connects different fields in exciting ways.

Agent Evaluation Playbook

Pratik’s Pakodas 🍿 • 27 implied HN points • 08 Jul 25

🕹 Technology Machine Learning

To make good AI agents, it's important to have a solid evaluation process. This can help ensure they're performing well in real-world situations.
Creating a system that tracks and measures the agents' performance can lead to better results, for example a pipeline that continuously tests and improves agents.
Using a leaderboard to compare agents based on performance, cost, and speed can help guide improvements and make smarter decisions.

Model merging lessons in The Waifu Research Department

Democratizing Automation • 209 implied HN points • 29 Jan 24

🕹 Technology Machine Learning

Model merging blends the weights of two models to create a new model, which is useful for experimenting with large language models.
Model merging is popular in creating anime models by merging Stable Diffusion variants, allowing for unique artistic results.
Weight averaging techniques in model merging aim to find more robust solutions by creating models centered in flat regions of the loss landscape.
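The simplest form of weight averaging is linear interpolation between two sets of parameters. The sketch below assumes both models share the same architecture and parameter names, and it represents each tensor as a flat list of floats to stay dependency-free. `merge_weights` and `alpha` are hypothetical names for this example; real merging tools offer more elaborate schemes.

```python
def merge_weights(weights_a, weights_b, alpha=0.5):
    """Linearly interpolate two models: alpha * A + (1 - alpha) * B.

    Each argument maps a parameter name to a flat list of floats.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name in weights_a:
        a, b = weights_a[name], weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    return merged


model_a = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
merged = merge_weights(model_a, model_b)  # equal-weight average
```

Setting `alpha` closer to 1.0 keeps the result nearer to the first model, which is a common knob when one parent model should dominate the blend.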

Enveda Biosciences: Unlocking Our Planet's Chemistry

The Century of Biology • 317 implied HN points • 02 Jul 23

🔬 Science Machine Learning

Enveda Biosciences focuses on natural product discovery for potent new medicines.
The company challenges the belief that natural products are ineffective in modern drug discovery.
Enveda leverages large-scale metabolomics and AI to accelerate the natural product discovery process.

Winter is Here, And So is Gemini

Sector 6 | The Newsletter of AIM • 39 implied HN points • 07 Dec 23

🕹 Technology Machine Learning

Google's Gemini is finally here after a delayed launch, and it aims to outperform other models like GPT-4 in language tasks.
Gemini has three versions: Ultra for complex tasks, Pro for various tasks, and Nano for efficient on-device use.
The Gemini Ultra version scored impressively high in tests, even beating human experts at some language understanding tasks.

World Models are Coming and They are Awesome

TheSequence • 84 implied HN points • 08 Dec 24

🕹 Technology Machine Learning

This week saw the release of two exciting world models that can create 3D environments from simple prompts. These models are important for advancing AI's abilities in various fields.
DeepMind's Genie 2 can generate interactive 3D worlds and simulate realistic object interactions, making it very useful for AI training and game development.
World Labs has introduced a user-friendly system for designing 3D spaces, allowing artists to create and manipulate environments easily, which can help in game prototyping and creative workflows.

Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

The Gradient • 87 implied HN points • 16 Nov 24

🕹 Technology Machine Learning

Mathematics is playing a bigger role in machine learning by connecting with fields like topology and geometry. This helps researchers create better tools and methods.
It's not just about scaling up current methods; there's a need for new approaches based on mathematical theories. This can lead to more innovative solutions in machine learning.
Mathematicians should view advancements in machine learning as chances to explore and deepen their theoretical work, not as threats to their field. Embracing these changes can lead to new discoveries.

GroupBy #12: AWS re:Invent 2023, Druid and ClickHouse at Lyft, Apache Hudi History

VuTrinh. • 39 implied HN points • 05 Dec 23

🕹 Technology Machine Learning

AWS re:Invent 2023 announced new features focused on improving data storage and processing. This includes faster storage options and AI capabilities for better data insights.
Lyft switched from using Druid to ClickHouse for their analytics needs. This change was driven by a need for faster data query responses.
Apache Hudi was created to help manage data in a more efficient way. It enables incremental data processing, making it easier to work with large amounts of information.

What if AI can already 'feel'?

State of the Future • 42 implied HN points • 23 Apr 25

🕹 Technology Machine Learning

AI already has its own kind of 'body' based on digital processes, not physical sensations. This means that AI can experience things and develop understanding in ways that are different from humans.
Wisdom isn't just about human experience; it's a set of skills that involves making good decisions from the information available. AI can potentially do this better by analyzing vast amounts of data without the limitations humans have.
AI systems might create their own social hierarchies and status signals based on how efficiently they operate in their digital environment. These structures could be complex and different from human social dynamics, and we might not even notice them.

Big-Tech Silicon War Begins

Sector 6 | The Newsletter of AIM • 39 implied HN points • 03 Dec 23

🕹 Technology Machine Learning

Big tech companies are competing to create their own specialized chips for AI tasks. This is happening because they want to improve their services and performance.
AWS has launched new AI chips, claiming to lead the market with over 50,000 customers already using its technology.
Other tech giants like Google, Microsoft, and Apple are also developing their own chips, but AWS believes it is significantly ahead of the competition.

Edge 459: Quantization Plus Distillation

TheSequence • 77 implied HN points • 24 Dec 24

🕹 Technology Machine Learning

Quantized distillation helps make deep neural networks smaller and faster by combining two techniques: knowledge distillation and quantization.
This method transfers knowledge from a high-precision model (teacher) to a low-precision model (student) without losing much accuracy.
Using soft targets from the teacher model can reduce problems that often come with using simpler models, keeping performance strong.
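The two ingredients named above can be sketched separately: a distillation loss computed against the teacher's temperature-softened outputs (the soft targets), and a uniform quantizer that snaps weights to a small number of levels. This is a rough sketch under assumptions, not the specific procedure from the post: the function names, the temperature of 2.0, and the min-max uniform quantization scheme are all illustrative choices, and no training loop is shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft targets and the student's predictions."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


def quantize(weights, bits=8):
    """Uniform quantization: snap each weight to one of 2**bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2**bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
low_precision = quantize([0.0, 0.4, 1.0], bits=1)
```

In a full setup the student would hold quantized weights and be trained to minimize this loss, so the soft targets compensate for some of the precision lost to quantization.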

🧠 Universal Weights, Live Avatars, and the Limits of Data Agents

HackerPulse Dispatch • 5 implied HN points • 12 Dec 25

🕹 Technology Machine Learning

Neural networks trained on diverse tasks tend to converge to similar low-dimensional weight subspaces, implying a shared parametric backbone that could make transfer learning and model reuse much more efficient.
System-and-algorithm co-design now enables large diffusion models to run in real time for streaming avatars (20 FPS on a 14B model), showing practical deployment of big generative models for live video.
A 210-task benchmark shows current data agents succeed on under 20% of engineering tasks and under 40% of analysis tasks, revealing major gaps in orchestration and reasoning for enterprise workflows.