TheSequence • 49 implied HN points • 05 Jun 25
- AI models are becoming increasingly capable, yet we still lack a clear picture of how they work internally; their scale and complexity make it hard to trace how they reach a given decision.
- New methods are being explored to make these systems more understandable, including automated interpretability, in which one AI model is used to generate explanations of another's internals (a minimal sketch of this idea follows the list). This is a comparatively new angle on AI interpretability.
- The debate continues over whether pouring substantial resources into interpretability is worthwhile compared with other safety measures; the underlying question is what we risk by deploying systems whose inner workings we cannot explain.
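
To make the "one AI explaining another" approach concrete, here is a minimal Python sketch of automated interpretability in the spirit of published neuron-explanation work: collect the inputs that most strongly activate a single unit, then ask an explainer LLM to hypothesize what that unit detects. The `call_explainer_llm` stub, the `explain_unit` helper, and the toy activation records are illustrative assumptions, not details from the article.

```python
# Minimal sketch of "AI explaining AI" (automated interpretability):
# show an explainer LLM the text snippets that most activate one unit
# of a model, then ask it to summarize what that unit responds to.
# `call_explainer_llm` is a hypothetical stand-in for any chat API.

from typing import Callable, List, Tuple


def call_explainer_llm(prompt: str) -> str:
    # Hypothetical LLM call; replace with your provider's client.
    raise NotImplementedError("wire up an actual LLM client here")


def explain_unit(
    top_activations: List[Tuple[str, float]],  # (snippet, activation)
    llm: Callable[[str], str] = call_explainer_llm,
) -> str:
    """Ask an LLM to hypothesize what a single neuron/feature encodes."""
    lines = [f"{act:.2f}\t{snippet!r}" for snippet, act in top_activations]
    prompt = (
        "Below are text snippets with the activation each produced in one "
        "neuron of a language model (higher = stronger response).\n"
        + "\n".join(lines)
        + "\nIn one sentence, what concept does this neuron detect?"
    )
    return llm(prompt)


if __name__ == "__main__":
    # Toy records for a hypothetical neuron that seems to fire on dates.
    records = [
        ("Meeting moved to June 5, 2025.", 8.31),
        ("The treaty was signed in 1648.", 7.94),
        ("I like green tea.", 0.12),
    ]
    fake_llm = lambda p: "Fires on explicit calendar dates and years."
    print(explain_unit(records, llm=fake_llm))
```

In the full recipe this loop is run at scale, and each proposed explanation is then scored, for example by asking the explainer to predict activations on held-out snippets and comparing against the real ones.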