The hottest Machine Learning Substack posts right now

And their main takeaways

The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency

TheSequence • 56 implied HN points • 07 Dec 25

🕹 Technology Machine Learning

AI model development is changing focus from just making models bigger to making them smarter and more specialized. It's now about using different tools for specific tasks instead of one model for everything.
Google's Gemini 3 Deep Think is a significant release that uses a new way of thinking to solve problems. It focuses on careful reasoning rather than quick responses, leading to much better problem-solving skills.
Amazon's Nova 2 and Mistral's Large 3 provide new options for businesses by focusing on efficiency and privacy. These models allow companies to create tailored solutions without relying on large, generic AI models.

OK, I can partly explain the LLM chess weirdness now

DYNOMIGHT INTERNET NEWSLETTER • 796 implied HN points • 21 Nov 24

🕹 Technology Machine Learning

LLMs like `gpt-3.5-turbo-instruct` can play chess well, but most other models struggle. Using specific prompts can improve their performance.
Providing legal moves to LLMs can actually confuse them. Instead, repeating the game before making a move helps them make better decisions.
Fine-tuning and giving examples both improve chess performance for LLMs, but combining them may not always yield the best results.

Supercharge Your GPT Model: Custom Data Fine-Tuning using Node.js

Nader's Thoughts • 471 implied HN points • 19 Mar 23

🕹 Technology Machine Learning

You can fine-tune GPT models with your own custom data using Node.js.
OpenAI provides APIs and SDKs for easy model training in multiple languages.
Creating and uploading training data is essential to customize and improve your model.

Teaching Small Language Models to Reason

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 10 Jul 24

🕹 Technology Machine Learning

Using Chain-Of-Thought prompting helps large language models think through problems step by step, which makes them more accurate in their answers.
Smaller language models struggle with Chain-Of-Thought prompting and often get confused because they lack the knowledge and understanding of the bigger models.
Google Research has a method to teach smaller models by learning from larger ones. This involves using the bigger models to create helpful examples that the smaller models can then learn from.

Issue #2 - The Data Ecosystem: Where do you even start?

The Data Ecosystem • 119 implied HN points • 21 Apr 24

🕹 Technology Machine Learning

Data can be really complicated, and it's easy to miss how everything connects. People often focus on their own area and forget about the bigger picture of the data ecosystem.
Chief Data Officers (CDOs) are important but can only do so much to fix data issues. They deal with many challenges, including limited power, lack of experience, and politics within the organization.
To improve in the data field, we need to recognize the gaps in our knowledge, prioritize what to focus on, and continuously educate ourselves in both our own areas and related data domains.

Latest open artifacts (#13): The abundance era of open models

Democratizing Automation • 182 implied HN points • 11 Aug 25

🕹 Technology Machine Learning

The open-weight AI ecosystem has become a competitive market with many quality releases over the past year. This means there's a lot more choice and better options available now.
Open models are gaining popularity because they are trusted, low-cost, and often better than closed models. Many users are starting with them instead of going for expensive alternatives.
While text-based models are commonly discussed, there are also many valuable multimodal and specialized models that show the strength of the open AI ecosystem. It's exciting to see growth in these areas too.

The Sequence AI of the Week #793: DeepSeek's New Paper: Storing 100B Parameters on CPU RAM

TheSequence • 21 implied HN points • 21 Jan 26

🕹 Technology Machine Learning

The current LLM trend is to scale models to huge sizes and use sparsity tricks like Mixture-of-Experts, so that only a small part of the model activates per token, reducing FLOPs.
Reusing an old technique — storing large, static lookup-like memories on CPU RAM and conditionally accessing them — can let models hold around 100B parameters off-GPU and avoid expensive dense computation.
The key insight is that many LLM costs come from simulating static lookup tables with neural computation, so replacing that simulation with real conditional lookups makes models much more efficient.

Deploying a Forecasting Bot

Abstraction • 29 implied HN points • 05 Jan 26

🕹 Technology Machine Learning

A structured, reproducible forecasting pipeline models how strong human forecasters think so methods can be tested and refined systematically.
Huge cost cuts made iteration affordable: per-question cost dropped from $0.109 to $0.004 (about 27×), enabling many more experiments across the tournament.
The team accepts a likely short-term performance hit by using cheaper models and fewer tokens because the priority is learning which pipeline parts truly matter using the tournament as a feedback loop.
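
The cost figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the two per-question costs come from the summary, while the $10 budget is a hypothetical number chosen for illustration, not something the post reports.

```python
from decimal import Decimal

# Per-question costs quoted in the summary above.
old_cost = Decimal("0.109")
new_cost = Decimal("0.004")

# How many times cheaper the new pipeline is per question.
ratio = old_cost / new_cost

# Hypothetical experiment budget in dollars (an assumption for illustration).
budget = Decimal("10")
questions_before = int(budget // old_cost)
questions_after = int(budget // new_cost)

print(f"cost ratio: {ratio}x")
print(f"questions per ${budget}: {questions_before} before, {questions_after} after")
```

Using `Decimal` rather than floats keeps the division exact, which matters when the amounts are fractions of a cent.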

Import AI 341: Neural nets can smell; technofeudalism via AI; China releases another solid open access model

Import AI • 459 implied HN points • 25 Sep 23

🕹 Technology Machine Learning

China released open access language models trained on both English and Chinese data, emphasizing safety practices tailored to China's social context.
Google and collaborators created a digital map of smells, pushing AI capabilities to not just recognize visual and audio data but also scents, opening new possibilities for exploration and understanding.
An economist outlines possible societal impacts of AI advancement, predicting a future where superintelligence prompts dramatic changes in governance structures, requiring adaptability from liberal democracies.

Our Human Creativity Is Becoming More Uniform Due To ChatGPT

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 09 Jul 24

🕹 Technology Machine Learning

Using ChatGPT for creativity can lead to less unique ideas among different users. This means many people might come up with similar concepts.
People might feel more creative while using ChatGPT, but this doesn't always result in original or diverse thoughts.
Reliance on a single AI tool can limit the creative process. It's important for new tools to encourage individual input instead of providing complete solutions right away.

Charting the Graphical Roadmap to Smarter AI

Gradient Flow • 399 implied HN points • 02 Nov 23

🕹 Technology Machine Learning

Knowledge graphs can enhance large language models (LLMs) by providing structured factual knowledge about the world, improving their reasoning abilities and usefulness for real-world applications.
Augmenting the pre-training of LLMs with knowledge graphs, for example by integrating them into training objectives and model inputs, can create models proficient in both language generation and factual knowledge.
Enterprises can leverage their data to enhance LLM applications with knowledge graphs, as tools exist to automatically turn semi-structured data into structured knowledge graphs.

Embrace the clash between domain expertise and machine learning

Mindful Modeler • 139 implied HN points • 02 Apr 24

🕹 Technology Machine Learning

There can be clashes between domain expertise and machine learning models.
Machine learning is prediction-focused, while domain expertise is often theory-driven.
Embracing and investigating the gaps between machine learning models and domain knowledge can lead to improved understanding and model refinement.

Data Science Weekly - Issue 523

Data Science Weekly Newsletter • 339 implied HN points • 01 Dec 23

🕹 Technology Machine Learning

Data science is evolving quickly, and it's important to stay updated with new advances and tools. Courses and reading lists can help you catch up and enhance your skills.
Using machine learning to solve real-world problems, like correctly attributing quotes, shows the practical applications of data science. Collaboration between universities and organizations can lead to innovative solutions.
The job market for data scientists is challenging right now. Many applicants are competing for limited positions, so if you're looking for a job, patience is key.

The One and a Half Gemini

Don't Worry About the Vase • 1657 implied HN points • 22 Feb 24

🕹 Technology Machine Learning

Gemini 1.5 introduces a breakthrough in long-context understanding: it can process up to 1 million tokens, giving it a much longer context window.
Gemini 1.5 combines a mixture-of-experts architecture with the Transformer, which contributes to its enhanced performance and could give Google an edge over competing models like GPT-4.
Gemini 1.5 offers opportunities for new and improved applications, such as translation of low-resource languages like Kalamang, providing high-quality translations and enabling various innovative use cases.

Data Science Weekly - Issue 536

Data Science Weekly Newsletter • 179 implied HN points • 01 Mar 24

🕹 Technology Machine Learning

The DSPy framework makes working with large language models easier by focusing on programming instead of complex prompting techniques. This helps reduce errors and improves usability.
A new sequence model approach shows better performance than traditional Transformers, especially for long data sequences. It also works faster, making it a promising development in the field.
Learning resources like online courses and free books on deep learning and causal ML can help deepen understanding of data science. They provide structured material that is great for both beginners and advanced learners.

I cannot believe the shit that morons are getting up to with ChatGPT

Read Max • 2739 implied HN points • 30 May 23

🕹 Technology Machine Learning

Consulting AI for factual information can lead to misinformation and embarrassing situations.
AI models like ChatGPT are better at capturing vibes than providing accurate facts.
Using AI as a search engine for factual research can be risky due to its tendency to generate fake content.

✨🎄 Some AGI optimism: an early Xmas present

Faster, Please! • 639 implied HN points • 23 Dec 24

🕹 Technology Machine Learning

OpenAI has released a new AI model called o3, which is designed to improve skills in math, science, and programming. This could help advance research in various scientific fields.
The o3 model performs much better than the previous model, o1, and other AI systems on important tests. This shows significant progress in AI performance.
There's a feeling of optimism about AGI technology as these advancements might bring us closer to achieving more intelligent and capable AI systems.

The Sequence Knowledge #784: The Convergence of Synthetic Data and World Models Models Are Unlocking Embodied AI

TheSequence • 28 implied HN points • 06 Jan 26

🕹 Technology Machine Learning

Collecting high-quality, perfectly labeled 3D data from the real world is slow, expensive, and misses rare edge cases, so 'reality' is the main bottleneck for embodied AI.
Pairing synthetic data generation with world models lets teams create rich, diverse, and labeled simulated environments, so agents can be trained and tested without costly real-world collection.
New world models like Google DeepMind's Genie show this approach in action by enabling interactive, dynamic 3D simulations where robots and autonomous vehicles can learn more robust behaviors.

James Zou: one of the most prolific and creative A.I. researchers in both life science and medicine

Ground Truths • 2012 implied HN points • 01 Nov 23

🕹 Technology Machine Learning

James Zou is a prolific and creative A.I. researcher in life science and medicine.
His work focuses on using large language models for peer review and analyzing pathology posts from Twitter.
He is exploring the use of text descriptions of genes for improving genomic analysis.

Machine Learning's Secret Sauce: Competition

Mindful Modeler • 219 implied HN points • 30 Jan 24

🕹 Technology Machine Learning

Competition drives progress in both running marathons and advancing in machine learning.
In machine learning, progress often comes from a series of small improvements rather than a single breakthrough.
Intense competition can lead to shortcuts and undesirable practices in both sports and machine learning.

Imbalanced data? Why "Do Nothing" should be the default

Mindful Modeler • 419 implied HN points • 19 Sep 23

🔬 Science Machine Learning

For imbalanced classification tasks, 'Do Nothing' should be the default approach, especially when dealing with calibration, strong classifiers, and class-based metrics.
Addressing imbalanced data should be considered in scenarios where misclassification costs vary, metrics are impacted by imbalance, or weaker classifiers are used.
Rather than oversampling with methods like SMOTE, it is more effective to handle class imbalance by adjusting data weighting, using cost-sensitive machine learning, and tuning the decision threshold.
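
Threshold tuning, one of the alternatives named above, can be sketched in plain Python. Everything here is an illustrative assumption rather than the post's own code: the labels, the predicted probabilities, the misclassification costs, and the helper names `total_cost` and `best_threshold` are all made up for the example.

```python
def total_cost(y_true, y_prob, threshold, cost_fp=1.0, cost_fn=10.0):
    """Sum misclassification costs when predicting 1 for p >= threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative
    return cost


def best_threshold(y_true, y_prob, candidates, **costs):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, y_prob, t, **costs))


# Toy imbalanced data: 8 negatives, 2 positives (hypothetical scores).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.45, 0.35, 0.80]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
chosen = best_threshold(y_true, y_prob, candidates)

print("cost at default 0.5:", total_cost(y_true, y_prob, 0.5))
print("chosen threshold:", chosen, "cost:", total_cost(y_true, y_prob, chosen))
```

The model and the training data are left untouched; only the cut-off applied to the predicted probabilities moves, which is why this approach does not distort calibration the way resampling can.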

🤖 An economic super-boom needs humanoid robots, not just human-level AI

Faster, Please! • 1736 implied HN points • 11 Jan 24

🕹 Technology Machine Learning

An economic super-boom requires humanoid robots, not just human-level AI.
To achieve exponential economic growth, automation of tasks and idea production is crucial.
Advances in generative AI are beneficial, but physical interaction data is necessary for real-world robotics development.

The Sequence AI of the Week #753: Inside Kimi K2 Thinking: The Architecture of Long-Horizon Reasoning

TheSequence • 70 implied HN points • 12 Nov 25

🕹 Technology Machine Learning

Kimi K2 Thinking is a new AI model that thinks in a more advanced way than just giving one answer at a time. It can plan and act over longer periods while staying on track.
This model is built on a powerful billion-parameter system designed to improve how it learns and uses data efficiently. It makes the most of its resources when solving problems.
Kimi K2 also uses smart training methods, like reinforcement learning, to help it use tools better and think through problems in a layered way.

The Sequence Radar #477: The R1 Moment

TheSequence • 546 implied HN points • 26 Jan 25

🕹 Technology Machine Learning

DeepSeek-R1 is a new AI model that performs as well as or better than big-name AI models at a much lower cost. This means smaller companies can now compete in AI innovation without needing huge budgets.
The way DeepSeek-R1 is trained is different from traditional methods. It relies heavily on reinforcement learning, which helps the model learn smarter reasoning skills without needing a ton of supervised data.
The open-source nature of DeepSeek-R1 means anyone can access and use the code for free. This encourages collaboration and allows more people to innovate in AI, making technology more accessible to everyone.

The latest open artifacts (#10): More permissive licenses, everything as a reasoner, and from artifacts to agents

Democratizing Automation • 277 implied HN points • 29 May 25

🕹 Technology Machine Learning

There is a rise in Chinese AI models that use more open licenses, influencing other models to adopt similar practices. This pressure is especially affecting Western companies like Meta and Google.
Qwen models are becoming more popular for fine-tuning compared to Llama models, with smaller American startups favoring Qwen. These trends show a shift in preferences in the AI community.
The focus in AI is shifting from just model development to creating tools that leverage these models. This means future releases will often be tool-based rather than just about the AI models themselves.

SAI Notes #07: What is a Vector Database?

SwirlAI Newsletter • 412 implied HN points • 18 Jun 23

🕹 Technology Machine Learning

Vector Databases are essential for working with Vector Embeddings in Machine Learning applications.
Partitioning and Bucketing are important concepts in Spark for efficient data storage and processing.
Vector Databases have various real-life applications, from natural language processing to recommendation systems.
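
The core operation behind a vector database, finding the stored embeddings most similar to a query, can be sketched with a brute-force search. This is a minimal illustration, not how any particular product works: production systems typically use approximate nearest-neighbor indexes, and the class name `TinyVectorStore` and the three-dimensional "embeddings" below are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store that scans every vector on each query."""

    def __init__(self):
        self._items = {}

    def add(self, key, vector):
        self._items[key] = vector

    def search(self, query, k=2):
        # Score every stored vector, then return the keys of the k best.
        scored = [(cosine_similarity(query, vec), key)
                  for key, vec in self._items.items()]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]


store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])  # made-up embeddings
store.add("dog", [0.8, 0.2, 0.1])
store.add("car", [0.0, 0.1, 0.9])

nearest = store.search([1.0, 0.0, 0.0], k=2)
print(nearest)
```

A linear scan like this costs time proportional to the number of stored vectors per query, which is the scaling problem that dedicated vector databases are built to avoid.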

Data Science Weekly - Issue 521

Data Science Weekly Newsletter • 339 implied HN points • 17 Nov 23

🕹 Technology Machine Learning

JAX is becoming popular for its speed and capabilities, and learning it may be essential for those familiar with PyTorch. It does have a steeper learning curve, but there are resources to help ease the transition.
The demand for GPUs is skyrocketing, driven by various market factors. Understanding these dynamics can help anticipate the future of technology and resource availability in industries reliant on powerful computing.
Freelancing in data science can lead to an overwhelming number of job offers. Tips on finding clients on platforms like Upwork and LinkedIn can help navigate this new freelance landscape.

Data Science Weekly - Issue 518

Data Science Weekly Newsletter • 379 implied HN points • 27 Oct 23

🕹 Technology Machine Learning

Web development is evolving with the use of local models and technologies for building applications, moving beyond just Python-based machine learning.
It's becoming increasingly important for developers to understand GPUs since they're widely used in deep learning and can greatly enhance performance.
Companies are exploring various use cases for generative AI that provide real value, focusing on practical implementations that drive return on investment.

Data Science Weekly - Issue 531

Data Science Weekly Newsletter • 219 implied HN points • 26 Jan 24

🕹 Technology Machine Learning

AI often gets criticized for the quality of its output, but that might not be the real issue people have with it. If quality is fixed, the conversation about AI could change significantly.
Common sense is tricky to define and measure, but researchers are developing ways to quantify it both individually and collectively. This could help clarify how we understand common sense in different contexts.
Large language models (LLMs) can transform education by encouraging hands-on learning. They offer opportunities for more interactive and engaging learning experiences.

Data Science Weekly - Issue 524

Data Science Weekly Newsletter • 299 implied HN points • 08 Dec 23

🕹 Technology Machine Learning

Data engineering is evolving with new design patterns that help improve efficiency in handling data. A new book dives into these patterns and their importance.
Machine learning is being used to understand and control the movement of silicon atoms in materials, which could lead to advancements in technology like better electronics.
A new model called PoseGPT can estimate 3D human poses from images and text, linking physical movements to broader concepts about humans, showing the capabilities of large language models.

For the Next Month: Going All-in on The Browser Company's Dia-AI

Brad DeLong's Grasping Reality • 176 implied HN points • 01 Aug 25

🕹 Technology Machine Learning

The Dia Browser is a new tool that aims to combine AI with web browsing, helping users get more control and streamline their information processing.
Large language models like ChatGPT can handle information overload by summarizing and organizing data, acting like advanced autocomplete systems that enhance productivity.
While these technologies are powerful, they lack true understanding and reasoning, meaning users still play a crucial role in guiding their use effectively.

In the land of LLMs, can we do better mock data generation?

Neurelo Engineering’s Substack • 1 HN point • 27 Sep 24

🕹 Technology Machine Learning

Mock data is super useful for testing software, but it hasn't really improved much over the years. It needs to be more flexible and easier to generate high-quality data.
Using LLMs (large language models) can be tricky for creating mock data. Instead of trying to generate everything, it’s often better to use techniques like topological sorting to keep relationships correct between data entries.
A new approach is turning to strategies like the Genesis Point Strategy, which helps create unique mock data efficiently. It shows that you can simplify processes to get good results without overcomplicating things.
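
The topological-sorting idea mentioned above can be sketched with Python's standard `graphlib` module (Python 3.9+). The schema, table names, and the `generate_mock_rows` helper are hypothetical; the point is only that parent tables are generated before the tables that reference them, so every foreign key points at a row that already exists.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables it references.
dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}


def generation_order(deps):
    """Return tables ordered so that referenced tables come first."""
    return list(TopologicalSorter(deps).static_order())


def generate_mock_rows(deps, rows_per_table=2):
    """Create simple rows whose foreign keys point at existing parent ids."""
    data = {}
    for table in generation_order(deps):
        rows = []
        for i in range(rows_per_table):
            row = {"id": i + 1}
            for parent in sorted(deps[table]):
                parent_ids = [r["id"] for r in data[parent]]
                row[f"{parent}_id"] = parent_ids[i % len(parent_ids)]
            rows.append(row)
        data[table] = rows
    return data


order = generation_order(dependencies)
mock = generate_mock_rows(dependencies)
print(order)
print(mock["order_items"])
```

In a setup like this, an LLM (or any other generator) would only need to fill in the non-key columns, while the ordering keeps the relationships between entries consistent.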

Tokens Aren't Fungible

Who is Nnamdi • 7 implied HN points • 11 Feb 26

🕹 Technology Machine Learning

Cheaper, equally intelligent open-source models still capture under 30% of usage, which shows price and benchmark scores explain only a small part of why people choose models.
Most users pick one model and stick with it, and price cuts mainly shift volume rather than grow revenue, so being a user's primary model creates strong lock-in.
Benchmarks miss key, hard-to-measure factors like trust, safety, privacy, tooling, and support, so differentiation on intangibles matters and tokens aren’t fungible.

From 0 to 2M Users in Generative AI

Startup Pirate by Alex Alexakis • 235 implied HN points • 12 Jan 24

🕹 Technology Machine Learning

Uizard has over 2 million users and enables fast product design creation with AI and an intuitive editor.
Their technology includes deep learning, computer vision, and natural language processing to power their platform.
Product market fit for Uizard was achieved by shifting focus to non-experts and iterating based on user feedback.

Machine learning changed how I see the world

Mindful Modeler • 399 implied HN points • 29 Aug 23

🕹 Technology Machine Learning

Professions strongly influence how people think and solve problems.
Machine learning has expanded the way we view and approach problem-solving through the lens of prediction.
A background in supervised ML can lead to seeing various situations in life as prediction or learning problems.

MLOps Basics - For Data Engineers.

Data Engineering Central • 393 implied HN points • 15 May 23

🕹 Technology Machine Learning

Working on Machine Learning as a Data Engineer is not as hard as it seems; it is of moderate difficulty.
Machine Learning work for Data Engineers focuses on MLOps like feature stores, model prediction, automation, and metadata storage.
The key aspects of MLOps include automating tasks, using tools like Apache Airflow, and managing metadata for a stable ML environment.

A Grandmaster's Guide to Machine Learning Challenges

Mindful Modeler • 339 implied HN points • 07 Nov 23

🕹 Technology Machine Learning

Focus on creating an end-to-end pipeline first, experiment with simple models, and then scale up gradually for better results in machine learning challenges.
Success in a challenge correlates with time invested, so choose challenges that motivate you and spend time understanding the data before committing.
Adopt a strategy to pick challenges that interest you, prioritize an experimentation loop, and aim to optimize later for overall success.

Horse rides Astronaut, redux

Marcus on AI • 1462 implied HN points • 13 Feb 24

🕹 Technology Machine Learning

DALL-E 2 and Gemini Ultra struggled with complex prompts and concepts, showing limitations in language understanding.
Proper prompts and iterations are crucial to achieve desired results with AI models like Gemini Ultra.
Despite progress in some areas, challenges persist in neural networks' factuality and compositionality.

AI and neuroscience

Technically • 21 implied HN points • 13 Jan 26

🕹 Technology Machine Learning

Neural networks are deliberately inspired by the brain: they use many simple "neurons" wired together to detect patterns and process information.
This brain-inspired approach has a long history and has been applied to real problems since early work by neuroscientists and engineers, showing the idea actually works in practice.
The brain is still poorly understood, so AI only roughly approximates biological brains, and many researchers think learning more about the brain could be key to building far more powerful intelligence.

MILKEN INSTITUTE REVIEW: Behind the Hype: What "AI" Is & Isn't

Brad DeLong's Grasping Reality • 176 implied HN points • 24 Jul 25

🕹 Technology Machine Learning

AI is reshaping jobs and how companies operate, especially in Silicon Valley where big players are fighting for profit. It's changing the game of technology investment and control.
Investors need to carefully consider whether they're joining a genuine revolution or just chasing another tech bubble like cryptocurrency. Understanding the real nature of AI is crucial.
AI is really about complex models that process information, not the magical intelligence people often hype it up to be. There’s a big difference between the promise of AI and what it can actually do right now.