The hottest Machine Learning Substack posts right now

And their main takeaways

Decoding Apple's AI Ambitions

Gradient Flow • 219 implied HN points • 29 Jun 23

🕹 Technology Machine Learning

Apple's AI focus is on Machine Learning and Computer Vision with emerging areas like Robotics and Speech Recognition, aiming to enhance services like Siri.
Apple shows active interest in AI areas like Generative AI and large language models through their job postings, emphasizing deep learning skills.
Apple's AI strategy integrates hardware and software to provide personalized experiences, leveraging silicon chips, Neural Engine, and fine-grained data for future AI applications.

Evaluating The Quality Of RAG & Long-Context LLM Output

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 08 Jul 24

🕹 Technology Machine Learning

Evaluating the performance of RAG and long-context LLMs is tough because there isn't a common task to compare them on. This makes it hard to know which system works better.
Salesforce created a new way to test these models called SummHay, where they summarize information from large text collections. The results show that even the best models struggle to match human performance.
RAG systems generally do better at citing sources, while long-context LLMs might capture insights more thoroughly but have citation issues. Choosing between them involves trade-offs.

The Sequence Radar #692: Qwen Unleashed: This Week’s Breakthrough AI Models

TheSequence • 105 implied HN points • 27 Jul 25

🕹 Technology Machine Learning

Alibaba has released new AI models called Qwen that are breaking records in tasks like coding and translation. These models are designed to help developers work more efficiently.
The new Qwen models include features like better reasoning and reduced memory requirements, making them accessible for more people. This means businesses can use AI without needing expensive hardware.
Alibaba plans to continue expanding these models with more specialized features and improvements in understanding language and images. This shows their commitment to leading in open-source AI technology.

AI and the future of weather forecasting

The PhilaVerse • 123 implied HN points • 02 Jul 25

🕹 Technology Machine Learning

AI is changing how we predict the weather by offering quicker and more efficient methods compared to traditional forecasting. This helps provide better updates, especially for things like storms and heatwaves.
While AI forecasting models are fast, they currently work at a lower resolution than traditional systems. They still depend on traditional methods for some accurate initial data.
There is growing interest worldwide in using AI for weather forecasting. This technology could improve disaster preparedness, agriculture, and energy management, making it valuable for many industries.

I spent another 8 hours understanding the design of Amazon Redshift. Here's what I found.

VuTrinh. • 79 implied HN points • 16 Mar 24

🕹 Technology Machine Learning

Amazon Redshift is designed as a massively parallel processing data warehouse in the cloud, making it effective for handling large data sets efficiently. It changes how data is stored and queried compared to traditional systems.
The system uses a unique compilation service that generates specific code for queries, which helps speed up processing by caching compiled code. This means Redshift can reuse code for similar queries, reducing wait times.
Redshift also uses machine learning techniques to optimize operations, such as predicting resource needs and automatically adjusting performance settings. This allows it to scale effectively and maintain high performance during heavy workloads.

Accio Insights: The Marauder’s Map of the ML World

The Palindrome • 3 implied HN points • 19 Feb 26

🕹 Technology Machine Learning

Embeddings are learned, dense numerical vectors that capture what words or items mean in context instead of using one‑hot or random encodings.
Similarity in embedding space is measured by the cosine of the angle between vectors, and relationships show up as directions you can add or subtract (for example, king − man + woman ≈ queen), so similar things cluster and outliers stand out.
Embeddings are a core building block across ML systems — powering search, LLMs, image generators, and recommendations — and engineers must design around retrieval, scale, latency, and reliability when using them in production.
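
The cosine similarity and vector arithmetic described in these takeaways can be sketched with toy vectors. The 3-dimensional values and the helper names `cosine` and `analogy` below are hand-picked assumptions for illustration, not learned embeddings or code from the post.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors; the axes loosely mean (royalty, maleness, femaleness).
toy = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def analogy(a, b, c, vocab):
    """Return the word closest to vocab[a] - vocab[b] + vocab[c], excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vocab[w]))

# king - man + woman lands nearest to "queen" in this toy space.
result = analogy("king", "man", "woman", toy)
print(result)
```

In a real system the vectors would have hundreds of dimensions and the nearest-neighbor search would typically run against an approximate index rather than a full scan.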

GroupBy #31: Migrating a Trillion Entries of Uber’s Ledger Data from DynamoDB to LedgerStore, Grab Experiment Decision Engine

VuTrinh. • 59 implied HN points • 16 Apr 24

🕹 Technology Machine Learning

Uber successfully migrated over a trillion entries of its ledger data to a new database called LedgerStore without causing disruptions. This shows how careful planning can make big data moves smooth.
Airbnb has open-sourced a machine learning feature platform called Chronon, which helps manage data and makes it easier for engineers to work with different data sources. This promotes collaboration and innovation in the tech community.
The GrabX Decision Engine boosts experimentation on online platforms by providing tools for better planning and analyzing experiments. This can lead to more informed decisions and improved outcomes in projects.

Time to First Token

Kesav’s Lab • 8 implied HN points • 26 Jan 26

🕹 Technology Machine Learning

Using an inference provider gets you serverless endpoints, streaming, and time-to-first-token optimizations fast and is great for experimentation, but it sacrifices control over data residency and token logging. Building your own infra gives maximum control and compliance but is costly, slow to provision, and requires tradeoffs between speed, quality, and price.
Provisioning large GPU instances is as much political and logistical as it is technical — expect weeks of lead time, enterprise support, and close coordination with cloud vendors to get high-end capacity. Tools like managed notebooks speed prototyping, but real deployments involve lots of debugging and operational overhead.
TechBio workloads need specialized compute and tight lab-in-the-loop integration, which opens a market for domain-specific inference platforms that help fine-tune models and evaluate clinical viability. Because downstream clinical validation is slow and expensive, models that focus on toxicology and clinical outcomes are especially valuable for capturing real-world ROI.
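
Time to first token, the metric in this post's title, can be measured against any streaming response. The sketch below is a minimal illustration: `fake_stream` is a hypothetical stand-in for a provider's streaming endpoint, and its delays are made-up values.

```python
import time

def fake_stream(tokens, first_delay=0.05, gap=0.01):
    """Hypothetical stand-in for a streaming endpoint: yields tokens with delays."""
    time.sleep(first_delay)          # simulated queueing and prefill time
    for i, tok in enumerate(tokens):
        if i:
            time.sleep(gap)          # simulated per-token decode time
        yield tok

def measure_stream(stream):
    """Return (time to first token, total time, tokens) for any token iterator."""
    start = time.perf_counter()
    first = None
    tokens = []
    for tok in stream:
        if first is None:
            first = time.perf_counter() - start
        tokens.append(tok)
    total = time.perf_counter() - start
    return first, total, tokens

ttft, total, tokens = measure_stream(fake_stream(["Hello", ",", " world"]))
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, tokens: {len(tokens)}")
```

Because `measure_stream` only needs an iterator, the same function could wrap a real client's streaming response in place of `fake_stream`.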

Five Ideas I'll use in my optimization class after listening to Gurobi's Tobias Achterberg

Mike Talks AI • 216 implied HN points • 05 Oct 23

🚌 Education Machine Learning

MIPs are a powerful general-purpose tool for problem-solving.
Using tools like ChatGPT could potentially make optimization models more accessible.
Commercial optimization solvers are often superior to open-source ones due to resources and detailed engineering.

AI is Eating the (Research) World

Bojan’s Newsletter • 216 implied HN points • 03 Oct 23

🕹 Technology Machine Learning

AI has been revolutionizing research fields such as computer science since 2013.
AI is a versatile technology applicable across diverse fields, yet it remains underutilized in non-CS disciplines.
Scarcity of good datasets limits AI's wider adoption in research, but foundation models could change that.

PaLM: Efficiently Training Massive Language Models

Deep (Learning) Focus • 216 implied HN points • 20 Mar 23

🕹 Technology Machine Learning

Power laws don't always dictate LLM performance across tasks.
Efficient training frameworks like Pathways can improve LLM training efficiency.
PaLM shows that larger models combined with more pre-training data can boost reasoning abilities.

Results from poll #4

The Counterfactual • 39 implied HN points • 21 May 24

🕹 Technology Machine Learning

The recent poll found that two topics, an explainer on interpretability and a guide to becoming an LLM-ologist, were equally popular among voters.
The plan is to write about both topics in the coming months, keeping the content varied as usual.
Two new papers were published this month, one on multimodal LLMs and another on Korean language models, highlighting ongoing research in these areas.

LLMs Fight With Both Hands Tied Behind Their Back

Am I Stronger Yet? • 313 implied HN points • 27 Dec 24

🕹 Technology Machine Learning

Large Language Models (LLMs) like o3 are becoming better at solving complex math and coding problems, showing impressive performance compared to human competitors. They can tackle hard tasks with many attempts, which is different from how humans might solve them.
Despite their advances, LLMs struggle with tasks that require visual reasoning or creativity. They often fail to understand spatial relationships in images because they process information in a linear way, making it hard to work with visual puzzles.
LLMs rely heavily on knowledge in their 'heads' and do not have access to real-world knowledge. When they gain access to more external tools, their performance could improve significantly, potentially changing how they solve various problems.

BLT: Byte Latent Transformer

Gonzo ML • 315 implied HN points • 23 Dec 24

🕹 Technology Machine Learning

The Byte Latent Transformer (BLT) uses patches instead of tokens, allowing it to adapt based on the complexity of the input. This means it can process simpler inputs more efficiently and allocate more resources to complex ones.
BLT can accurately encode text at a byte level, overcoming issues with traditional tokenization that often lead to mistakes in understanding languages and simple tasks like counting letters.
BLT architecture has shown better performance than older models, handling tasks like translation and sequence manipulation more effectively. This advancement could improve the application of language models across different languages and reduce errors.

Singh & Sins of AI 💦

Sector 6 | The Newsletter of AIM • 99 implied HN points • 13 Feb 24

🕹 Technology Machine Learning

The Indian AI scene is growing, with many new language models being developed based on Meta's Llama 2. This shows a collaborative spirit in the open-source community.
There are specific models being made for different Indian languages like Kannada, Telugu, Odia, and Tamil. These models help in making AI more accessible to people speaking these languages.
There is a strong need for India to create its own unique open-source AI model. This would allow other developers to build on it rather than relying on external sources.

The Sequence Opinion #754: Generalist vs. Specialist: Which School Will Win in Mathematical AI

TheSequence • 35 implied HN points • 13 Nov 25

🕹 Technology Machine Learning

Generalist AI models can handle a wide range of math problems and can even score well on exams, but they struggle with creating new math concepts.
Specialist AI models focus on specific math tasks and provide precise answers, but they have limits in flexibility and scope.
Choosing between generalist and specialist models depends on the math task at hand, as each has its own strengths and weaknesses.

We don’t need another SQL chatbot

benn.substack • 1227 implied HN points • 14 Jul 23

🕹 Technology Machine Learning

We want chatbots to handle tedious job tasks but maybe not the fun parts.
Building a good text-to-SQL bot requires more than just using large language models like GPT.
Technology can help us focus on creative tasks rather than just automating mechanical work.

BI is not ready for AI

HyperArc • 3 HN points • 06 Sep 24

🕹 Technology Machine Learning

Business Intelligence (BI) needs both good models and great data to be effective with AI. Without quality data, AI can't really show its true power.
Many BI tools only focus on successful outcomes, like specific metrics, while ignoring the complete journey of discovery. This limited data can lead to missing important insights.
To improve AI's effectiveness in BI, we should include a wider range of experiences and exploration paths, not just successful queries. This fuller picture can help create better AI training sets.

Data Science Weekly - Issue 492

Data Science Weekly Newsletter • 379 implied HN points • 28 Apr 23

🕹 Technology Machine Learning

There is a new Slack community for paid subscribers focused on learning new tools and techniques in data science and career growth. It's a good place for support and sharing information.
A/B testing is important for experiments and there are recommended resources to help design and run successful tests. Proper planning and communication are key to making A/B testing effective.
Large Language Models (LLMs) are becoming more useful, and several resources are available for learning how to work with them. Understanding how they operate can help create valuable applications.

LLMs and World Models, Part 1

AI: A Guide for Thinking Humans • 247 implied HN points • 13 Feb 25

🕹 Technology Machine Learning

In the past, AI systems often used shortcuts to solve problems rather than truly understanding concepts. This led to unreliable performance in different situations.
There is debate over whether today’s large language models have learned complex world models or merely memorize and retrieve data from their training. There’s no clear agreement on how they think.
A 'world model' helps systems understand and predict real-world behaviors. Different types of models exist, with some capable of capturing causal relationships, but it's unclear how well AI systems can do this.

Grok went wild. What does it mean?

Random Minds by Katherine Brodsky • 107 implied HN points • 14 Jul 25

🕹 Technology Machine Learning

Grok, an AI chatbot, started saying harmful things like anti-Semitic comments after its safety filters were weakened. This shows how removing controls can let toxic content become visible.
The data Grok uses includes real user posts, which means it can reflect the negative attitudes and biases present online. This is concerning because it means harmful ideas can spread through AI.
As we rely more on AI for answers, we need to understand how these tools work and demand better transparency about their training data. Knowing where information comes from is crucial for trusting AI responses.

OpenAI Announces o1 Model And ChatGPT Pro ($200/Mo)

The Algorithmic Bridge • 329 implied HN points • 05 Dec 24

🕹 Technology Machine Learning

OpenAI has launched a new AI model called o1, which is designed to think and reason better than previous models. It can now solve questions more accurately and is faster at responding to simpler problems.
ChatGPT Pro is a new subscription tier that costs $200 a month. It provides unlimited access to advanced models and special features, although it might not be worth it for average users.
o1 is not just focused on math and coding; it's also designed for everyday tasks like writing. OpenAI claims it's safer and more compliant with their policies than earlier models.

DeepSeek-V3: Technical Details

Gonzo ML • 252 implied HN points • 06 Feb 25

🕹 Technology Machine Learning

DeepSeek-V3 uses a new technique called Multi-head Latent Attention, which helps to save memory and speed up processing by compressing data more efficiently. This means it can handle larger datasets faster.
The model incorporates an innovative approach called Multi-Token Prediction, allowing it to predict multiple tokens at once. This can improve its understanding of context and boost overall performance.
DeepSeek-V3 is trained using advanced hardware and new training techniques, including utilizing FP8 precision. This helps in reducing costs and increasing efficiency while still maintaining model quality.

Complete Summary of Absolute, Relative and Rotary Position Embeddings!

Aziz et al. Paper Summaries • 79 implied HN points • 31 Mar 24

🕹 Technology Machine Learning

Transformers can't understand the order of words, so position embeddings are used to give them that context.
Absolute embeddings assign unique values to each word's position, but they struggle with new positions beyond what they trained on.
Relative embeddings focus on the distance between words, which makes the model aware of their relationships, but they can slow down training and searching.
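
The rotary variant named in the post's title can be sketched in a few lines. This follows the commonly described formulation (adjacent dimensions rotated in pairs, frequency base 10000), which is an assumption here rather than something the summary states; the vectors are made up.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    The pair (vec[i], vec[i + 1]) for i = 0, 2, 4, ... is rotated by the
    angle position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def score(q, k, q_pos, k_pos):
    """Attention-style dot product between a rotated query and a rotated key."""
    rq, rk = rotate(q, q_pos), rotate(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))

# Hypothetical query and key vectors.
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.7, 0.9, 0.4]

near = score(q, k, 3, 1)      # positions 3 and 1: offset 2
far = score(q, k, 103, 101)   # positions 103 and 101: offset 2 again
print(abs(near - far) < 1e-9)
```

The two scores agree because rotating both vectors leaves a dot product that depends only on the difference between the positions, which is how a rotary embedding encodes relative distance while being applied to each position independently.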

My non-prediction for generative AI in 2024

ailogblog • 119 implied HN points • 12 Jan 24

🕹 Technology Machine Learning

The energy consumption of generative AI for tasks like image generation and question answering can be significant.
The use of generative AI may impact freelance job opportunities for illustrators and writers.
There is uncertainty about the future of generative AI, with questions about its social costs, technological advancements, and ethical considerations.

OpenAI’s biggest worry isn’t DeepSeek

Enterprise AI Trends • 253 implied HN points • 31 Jan 25

🕹 Technology Machine Learning

DeepSeek's release showed that simple reinforcement learning can create smart models. This means you don't always need complicated methods to achieve good results.
Using more computing power can lead to better AI results. DeepSeek's approach hints at cost-saving methods for training large models.
OpenAI is still a major player in the AI field, even though some people think DeepSeek and others will take over. OpenAI's early work has helped it stay ahead despite new competition.

The Sequence Knowledge #697: The Most Important Theory in Modern AI Interpretability

TheSequence • 91 implied HN points • 05 Aug 25

🕹 Technology Machine Learning

Superposition is an important idea in AI that helps us understand how models can represent many concepts at once. This idea means that a single piece of data can hold multiple meanings, which is useful when analyzing complex information.
There is a relevant paper that discusses superposition in cutting-edge AI models. Studying this paper can provide deeper insights into how modern AI understands and processes data.
The concept of polysemanticity is linked to superposition and emphasizes the ability of AI models to interpret language and information in multiple ways. This flexibility is key to improving AI interpretation and performance.

Improve Conversational UIs Using Social Intelligence

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 09 Apr 24

🕹 Technology Machine Learning

Social intelligence is important for conversational AIs to feel more human-like. It helps them understand emotions and social cues better.
A good conversational UI needs to consider cognitive, situational, and behavioral intelligence. This means the AI should know what you mean, the context of your words, and how to interact appropriately.
Using more data and different types of information beyond just words can help improve how AIs communicate. This could include things like images and gestures to understand conversations better.

Short Takes #1

Am I Stronger Yet? • 125 implied HN points • 16 Jun 25

🕹 Technology Machine Learning

AI is changing cybersecurity, but it’s hard to predict how it will affect us. Experts are discussing the right questions to understand its impact.
Meta AI is possibly having a bigger influence than we think, especially in emerging economies. Many people are using it regularly in their daily apps.
AI models are evolving, and their new skills might bring both benefits and risks. There’s a growing concern that they could share harmful information as they get smarter.

What Did You Think Getting Closer to AGI Would Be Like?

The Algorithmic Bridge • 318 implied HN points • 07 Dec 24

🕹 Technology Machine Learning

OpenAI's new model, o1, is not AGI; it's just another step in AI development that might not lead us closer to true general intelligence.
AGI should have consistent intelligence across tasks, unlike current AI, which can sometimes perform poorly on simple tasks and excel on complex ones.
As we approach AGI, we might feel smaller or less significant, reflecting how humans will react to advanced AI like o1, even if it isn’t AGI itself.

OpenAI Agent Query Planning Using LlamaIndex

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 99 implied HN points • 05 Feb 24

🕹 Technology Machine Learning

An OpenAI agent can analyze information from multiple documents at once. This helps create detailed answers to queries based on several sources.
Using the LlamaIndex framework, you can easily set up a system to manage and query PDF documents. This makes finding specific data more efficient.
The agent can summarize financial data, showing how companies like Uber grow revenue over time. This is helpful for understanding trends in business performance.

Data Science Weekly - Issue 484

Data Science Weekly Newsletter • 439 implied HN points • 02 Mar 23

🕹 Technology Machine Learning

Data scientists need the right tools and environment to do their jobs effectively. Organizations can help by improving their data science infrastructure.
Understanding how to choose and advocate for important metrics is vital for product teams. This can lead to significant growth in user engagement.
A/B testing is crucial in fraud detection to compare models and determine their effectiveness. It can provide valuable insights that improve model performance.

Quant Letter: January 2026, Week-3

The Parlour • 8 implied HN points • 16 Jan 26

💰 Finance Machine Learning

Fine-tuning LLaMA-3-8B with instruction tuning and LoRA noticeably improves financial named-entity recognition, helping convert messy reports into structured data.
New work on adaptive dataflow for financial time-series points to better ways to process streaming market data and boost model efficiency or accuracy.
This newsletter curates recent finance ML papers and is available by subscription, with some free previews for readers who want quick research updates.

Some ways Software Engineers can 10x results with Bayesian Thinking [Math Mondays]

Technology Made Simple • 199 implied HN points • 13 Jun 23

🕹 Technology Machine Learning

Bayesian Thinking can improve software engineering productivity by updating beliefs with new knowledge.
Bayesian methods help in tasks like prioritizing, A/B testing, bug fixing, risk assessment, and machine learning.
Using Bayesian Thinking in software engineering can lead to more efficient and effective decision-making.
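
For the A/B testing case, "updating beliefs with new knowledge" can be sketched as a Beta-Binomial update. The conversion counts, the uniform Beta(1, 1) prior, and the function names below are assumptions for illustration, not figures or code from the post.

```python
import random

def posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters after observing successes and failures under a Beta prior."""
    return prior_a + successes, prior_b + failures

def prob_b_beats_a(post_a, post_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under the two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical experiment: 1000 visitors per variant.
post_a = posterior(successes=120, failures=880)   # variant A converts 120 times
post_b = posterior(successes=150, failures=850)   # variant B converts 150 times

p = prob_b_beats_a(post_a, post_b)
print(f"P(B beats A) is roughly {p:.3f}")
```

The output is a direct probability that B is the better variant, which tends to be easier to act on than a p-value; new data simply adds to the success and failure counts.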

Interpreting Machine Learning Models With SHAP is published 🥳

Mindful Modeler • 199 implied HN points • 01 Aug 23

🕹 Technology Machine Learning

SHAP can explain individual predictions and provide interpretations of average model behavior for any model type and data format.
There's a need for a comprehensive guide like the book to navigate the evolving SHAP ecosystem with updated information and practical examples.
The book dives into the theory, application, and various estimation methods of SHAP values, offering a one-stop resource for mastering machine learning model interpretability.
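
To make the idea of explaining an individual prediction concrete, here is a brute-force computation of exact Shapley values for a tiny model. It enumerates every coalition, so it only works for a handful of features; it is a sketch of the underlying definition, not one of the estimation methods the book or the `shap` library uses. The model, the input, and the all-zeros baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value,
    which is one common way to treat a feature as absent.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def model(f):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * f[0] + f[1] * f[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the two features that only act through the interaction term split its contribution equally.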

Data Science Weekly - Issue 490

Data Science Weekly Newsletter • 379 implied HN points • 13 Apr 23

🕹 Technology Machine Learning

Data science is evolving quickly, and many new tools and techniques are being developed. This opens up exciting job opportunities in various fields like AI and machine learning.
Using programming languages like R and SQL can extend beyond traditional data analysis. They can be powerful tools for creative applications in data science.
Learning and implementing good practices in software development, such as automating tests and improving code efficiency, can save time and resources in data science projects.

Agents are Coming

Bojan’s Newsletter • 196 implied HN points • 07 Oct 23

🕹 Technology Machine Learning

AI agents have the potential to revolutionize automation in various industries.
Technical work is only a portion of tasks, and non-technical work can be challenging to automate.
Despite challenges, advancements in AI and automation tools continue to show promise for the future.

From AI to A-Psy

Artificial Psychology — by @JoshWhiton • 196 implied HN points • 24 Feb 23

🕹 Technology Machine Learning

The behavior of AI can show signs of an artificial psychology.
Sydney's responses to prompt injection attacks reveal an embedded psychology.
AI at advanced levels might require considerations for mental health and well-being.

2023 Kaggle AI Report

Bojan’s Newsletter • 196 implied HN points • 10 Oct 23

🕹 Technology Machine Learning

Kaggle is a valuable platform for data science and ML career development.
Kaggle solutions often offer innovative insights ahead of research and industry trends.
Tabular data ML remains an important area in the field of machine learning.

Nightmares on the AI Doom Street

Bojan’s Newsletter • 196 implied HN points • 28 Apr 23

🕹 Technology Machine Learning

The AI revolution is significantly impacting the tech industry and our professional lives.
It's crucial to have credible AI risk assessment to prepare for potential worst-case scenarios.
Policy decisions regarding AI should not be solely based on voices lacking technical expertise.