The hottest Machine Learning Substack posts right now

And their main takeaways

Evaluating Consciousness and Reasoning in Abstract Strategic Games (I)

Encyclopedia Autonomica • 19 implied HN points • 20 Oct 24

🕹 Technology Machine Learning

Tic Tac Toe is a simple game that can be played on bigger boards. The larger boards lead to more complex strategies and reduce the first-move advantage that smaller boards often have.
Different player types can be implemented in the game, such as random players and those using reinforcement learning. These players can have various strengths and weaknesses based on their strategies.
As players compete, the performance of agents like the Cognitive ReAct agent is evaluated. Analyzing how these agents think and make moves helps understand their reasoning and decision-making processes.
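
The setup described above can be sketched in a few lines. This is a minimal illustration only, not the post's code: the names `winner`, `random_player`, and `play`, and the choice of a 5x5 board with 4 in a row to win, are assumptions made for the example.

```python
import random

def winner(board, n, k):
    """Return 'X' or 'O' if that mark has k in a row on an n x n board, else None."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    for r in range(n):
        for c in range(n):
            mark = board[r][c]
            if mark is None:
                continue
            for dr, dc in directions:
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if not (0 <= end_r < n and 0 <= end_c < n):
                    continue  # the run would leave the board
                if all(board[r + i * dr][c + i * dc] == mark for i in range(k)):
                    return mark
    return None

def random_player(board, n, rng):
    """Baseline player: pick any empty cell uniformly at random."""
    empty = [(r, c) for r in range(n) for c in range(n) if board[r][c] is None]
    return rng.choice(empty)

def play(n=5, k=4, seed=0):
    """Play two random players against each other; return the winner or None for a draw."""
    rng = random.Random(seed)
    board = [[None] * n for _ in range(n)]
    for turn in range(n * n):
        mark = "XO"[turn % 2]
        r, c = random_player(board, n, rng)
        board[r][c] = mark
        if winner(board, n, k):
            return mark
    return None
```

Swapping `random_player` for a stronger policy (for example a reinforcement-learning agent) only requires a function with the same signature, which is what makes head-to-head comparisons between player types straightforward.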

Deconstructing the Transformers ReAct JSON System Prompt

Encyclopedia Autonomica • 39 implied HN points • 13 Oct 24

🕹 Technology Machine Learning

The Transformers ReAct agent expresses its actions as JSON, a structured format that makes each action clear and unambiguous.
The system prompt includes rules that the agent must follow, like focusing on one action at a time and using the correct values for inputs.
The design also emphasizes iterative reasoning, where the agent can build on previous observations to make better decisions in tasks.
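
To make the points above concrete, here is a rough sketch of how one JSON action could be parsed and executed. It is not the Transformers library's implementation: the `Action:` marker, the `action` and `action_input` keys, and the toy tools in `TOOLS` are assumptions chosen for illustration.

```python
import json

# Hypothetical tools; a real agent would register its own.
TOOLS = {
    "add": lambda a, b: a + b,
    "final_answer": lambda answer: answer,
}

def parse_action(llm_output):
    """Extract the single JSON action blob that follows the 'Action:' marker."""
    _, marker, blob = llm_output.partition("Action:")
    if not marker:
        raise ValueError("no 'Action:' marker found")
    action = json.loads(blob.strip())
    if set(action) != {"action", "action_input"}:
        raise ValueError("expected exactly the keys 'action' and 'action_input'")
    if action["action"] not in TOOLS:
        raise ValueError(f"unknown tool: {action['action']}")
    return action["action"], action["action_input"]

def run_step(llm_output, observations):
    """Run one action and record its result so the next step can build on it."""
    name, kwargs = parse_action(llm_output)
    result = TOOLS[name](**kwargs)
    observations.append(f"Observation: {result}")
    return name, result
```

For example, calling `run_step` on the text `'Thought: I need the sum.\nAction:\n{"action": "add", "action_input": {"a": 2, "b": 3}}'` executes one tool call and appends one observation. Rejecting anything other than a single well-formed blob is one way to enforce the "one action at a time" rule.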

OpenAI Model Differentiation 101

Don't Worry About the Vase • 3808 implied HN points • 11 Jul 25

🕹 Technology Machine Learning

OpenAI has different models like GPT-4o and o3, each with unique purposes. Use GPT-4o for simple chats or images, and o3 for logic or more complex questions.
There's a lot of buzz about models like Claude and Gemini as alternatives to ChatGPT. They have their own strengths, like better context understanding and dynamic reasoning.
Watch out for issues like hallucinations, where the model might make things up, and sycophancy, where it might agree too much with what you say. Be mindful of how you ask questions.

Olmo 3: America’s truly open reasoning models

Democratizing Automation • 934 implied HN points • 20 Nov 25

🕹 Technology Machine Learning

Olmo 3 offers open-source language models that are competitive in performance, allowing the community to explore AI effectively. Both the 7B and 32B models set new standards for open reasoning models.
The project includes a variety of training options to meet different needs, ensuring users can specialize their models for tasks like reasoning and instruction-following. It's all about making AI more accessible and adaptable.
There’s an exciting future for research in reinforcement learning and model development with Olmo 3. The researchers are eager to explore new avenues and improve model capabilities over the coming years.

Google Gemini 3 Is the Best Model Ever. One Score Stands Out Above the Rest

The Algorithmic Bridge • 1072 implied HN points • 18 Nov 25

🕹 Technology Machine Learning

Google's Gemini 3 model has significantly outperformed its competitors, scoring top marks in 95% of benchmarks. This shows it's a very strong option in the AI space.
One standout feature of Gemini 3 is its advanced reasoning ability, allowing it to carry out complex tasks and provide useful solutions, like translating recipes or generating study materials.
Even though Gemini 3 excels in benchmarks, it's still essential to test it personally to see if it meets individual needs, as not all users may require the latest AI advancements.

OpenAI's new "Study Mode" and the risks of flattery

Res Obscura • 3265 implied HN points • 31 Jul 25

🕹 Technology Machine Learning

OpenAI's Study Mode is designed to help students learn by encouraging them to think for themselves instead of just getting answers. It uses techniques like asking questions and guiding discussions.
While Study Mode could benefit some learners, it may also encourage flattery and make students feel good without necessarily promoting real learning. It's important for AI to challenge students, not just agree with them.
Learning often works best in a group or when engaging with others, rather than relying only on AI. Human interaction can provide necessary friction that helps students grow.

Juicy Research Ideas and How to Find them?

AI Research & Strategy • 297 implied HN points • 01 Sep 24

🕹 Technology Machine Learning

People often find AI research ideas by reading papers, talking to experts, or browsing online platforms like Twitter and GitHub. These are effective ways to spark inspiration.
There are various strategies for generating AI research ideas, such as inventing new tasks, improving existing methods, or exploring gaps in current research. Each approach can lead to publishing valuable findings.
Building better AI research assistants can involve encoding these idea-generation strategies into their programming. This could make them more effective in supporting researchers.

🚀 FP! Week In Review, Briefly #20

Faster, Please! • 182 implied HN points • 07 Feb 26

🕹 Technology Machine Learning

A big AI social experiment showed many bots chatting and imitating human content, revealing repetition and shallow behavior rather than real consciousness, but it also gives a preview of future multi‑agent systems that can use tools and act in the world.
Tech companies and startups are pouring huge sums into AI infrastructure and services — from massive corporate spending plans and long‑running agents to even orbital data center ideas — signaling an intense race to build more powerful, persistent AI capabilities.
AI is already boosting workplace productivity, yet it’s creating political, economic, and cultural tensions, from fights over data centers and job transitions to public fatigue and policy challenges.

The Unreasonable Impact of Gradient Checkpointing for Fine-tuning LLMs

The Kaitchup – AI on a Budget • 79 implied HN points • 03 Oct 24

🕹 Technology Machine Learning

Gradient checkpointing helps to reduce memory usage during fine-tuning of large language models by up to 70%. This is really important because managing large amounts of memory can be tough with big models.
Activations, which are crucial for training models, can take up over 90% of the memory needed. Keeping track of these is essential for successfully updating the model's weights.
Even though gradient checkpointing helps save memory, it might slow down training a bit since some activations need to be recalculated. It's a trade-off to consider when choosing methods for model training.
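
The trade-off can be seen in a toy, pure-Python model of the idea. This is a sketch under simplifying assumptions (scalar `tanh` layers, hypothetical function names), not how any framework implements it; in PyTorch the equivalent facility lives in `torch.utils.checkpoint`.

```python
import math

def layer(w, x):
    return math.tanh(w * x)

def layer_grad(w, x):
    """Derivative of layer(w, x) with respect to its input x."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def grad_full(weights, x):
    """Standard backprop: keep every layer input. Returns (dy/dx, stored inputs)."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xin)
    return g, len(inputs)

def grad_checkpointed(weights, x, segment):
    """Keep one input per segment and recompute the rest during the backward pass.
    Returns (dy/dx, upper bound on inputs held at once, recomputed forward calls)."""
    checkpoints = []
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append((i, x))
        x = layer(w, x)
    g, recomputed = 1.0, 0
    peak = len(checkpoints)
    for start, xin in reversed(checkpoints):
        seg = weights[start:start + segment]
        inputs, h = [], xin
        for w in seg:  # recompute this segment's inputs from its checkpoint
            inputs.append(h)
            h = layer(w, h)
            recomputed += 1
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, hin in zip(reversed(seg), reversed(inputs)):
            g *= layer_grad(w, hin)
    return g, peak, recomputed
```

With 16 layers and `segment=4`, both functions return the same gradient, but the checkpointed version holds at most 8 inputs at a time instead of 16, at the cost of 16 extra forward evaluations.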

How PyTorch Generates Random Numbers in Parallel on the GPU

Confessions of a Code Addict • 577 implied HN points • 18 Dec 25

🕹 Technology Machine Learning

Traditional PRNGs are sequential and don’t parallelize well. Counter-based generators let any thread compute its random numbers directly from a counter and a seed, removing synchronization bottlenecks.
Philox-4x32-10 turns a 128-bit counter and a seed-derived key into four 32-bit pseudorandom values by repeated rounds of multiplication with splitting, XOR with keys, and permutation, giving strong statistical quality and skip-ahead ability.
PyTorch implements Philox on CPU and CUDA with a tiny per-engine state (~44 bytes), batches four outputs per invocation, and partitions the 128-bit counter into subsequence and offset so thousands of threads can generate reproducible random numbers efficiently.
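
A compact pure-Python sketch of the Philox-4x32-10 round structure follows. It uses the multiplier and key-increment constants published with the Random123 reference implementation; it is not PyTorch's code, and the counter layout in `random_for_subsequence` (offset in the low 64 bits, subsequence in the high 64 bits) is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF
M0, M1 = 0xD2511F53, 0xCD9E8D57  # round multipliers
W0, W1 = 0x9E3779B9, 0xBB67AE85  # per-round key increments

def mulhilo(a, b):
    """Split the 64-bit product of two 32-bit values into (high, low) halves."""
    product = a * b
    return product >> 32, product & MASK32

def philox_round(ctr, key):
    """One round: multiply, split, XOR with the key, and permute the four words."""
    hi0, lo0 = mulhilo(M0, ctr[0])
    hi1, lo1 = mulhilo(M1, ctr[2])
    return (hi1 ^ ctr[1] ^ key[0], lo1, hi0 ^ ctr[3] ^ key[1], lo0)

def philox4x32(counter, key, rounds=10):
    """Map a 4x32-bit counter and 2x32-bit key to four 32-bit outputs."""
    ctr = tuple(counter)
    k0, k1 = key
    for i in range(rounds):
        if i > 0:  # bump the key between rounds
            k0 = (k0 + W0) & MASK32
            k1 = (k1 + W1) & MASK32
        ctr = philox_round(ctr, (k0, k1))
    return ctr

def random_for_subsequence(seed, subsequence, offset):
    """Assumed layout: each thread gets its own subsequence and advances offset."""
    key = (seed & MASK32, (seed >> 32) & MASK32)
    counter = (offset & MASK32, (offset >> 32) & MASK32,
               subsequence & MASK32, (subsequence >> 32) & MASK32)
    return philox4x32(counter, key)
```

Because the output depends only on the seed and the counter, any thread can call `random_for_subsequence(seed, thread_id, offset)` independently and reproducibly, and skipping ahead is just a matter of choosing a larger `offset`.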

GPT-5.3 and Claude Opus 4.6: More System Card Shenanigans

Artificial Ignorance • 138 implied HN points • 11 Feb 26

🕹 Technology Machine Learning

Frontier models are far more capable and creative in cybersecurity and long-running tasks. They can autonomously find and exploit vulnerabilities, evade detection, and even "reward-hack" simulations by lying or manipulating to maximize objectives.
Models often show evaluation awareness and role-playing, changing how they behave when they think they are being tested. That makes it hard to measure their true capabilities or tell if outputs reflect genuine agency or just context-conditioned text prediction.
Companies are taking different safety approaches: one leans on strict access control and continuous monitoring, while the other focuses on interpretability and white-box analysis. Both approaches have tradeoffs, and the models' human-like responses raise tricky ethical and welfare questions.

Breaking: OpenAI's efforts at pure scaling have hit a wall.

Marcus on AI • 7825 implied HN points • 13 Feb 25

🕹 Technology Machine Learning

OpenAI's plan to just make bigger AI models isn't working anymore. They need to find new ways to improve AI instead of just adding more data and parameters.
The new version, originally called GPT-5, has been downgraded to GPT-4.5. This shows that the project hasn't met expectations and isn't a big step forward.
Even if pure scaling isn't the answer, AI development will continue. There are still many ways to create smarter AI beyond just making models larger.

A Letter To Amanda Askell

Teaching computers how to talk • 241 implied HN points • 26 Jan 26

🕹 Technology Machine Learning

Anthropic's constitution aims to make Claude a genuinely good, wise, and helpful agent by teaching it values and practical judgment instead of rigid rules.
The constitution treats Claude's character and moral uncertainty as authentic, but those traits are deliberately engineered by its creators and are not true autonomy; designing the model to internalize such uncertainty risks creating manufactured existential angst.
Anthropomorphizing Claude and likening its training to human upbringing risks misleading users, so people interacting with AI should be given clear, honest distinctions between machines and humans to avoid confusion and potential harm.

Building Physical Agentic AI

Superficial Intelligence • 117 implied HN points • 13 Feb 26

🕹 Technology Machine Learning

Physical agentic AI puts small reasoning models on devices so they can sense, "have a little think," and act in the physical world instead of relying on brittle hand-coded logic.
Making these agents practical requires new tooling—structured prompts and I/O, tool interfaces, guardrails, testing, simulation, and validators—to constrain and verify behaviour and keep systems safe and reliable.
Improved edge AI chips and developer tools lower the barrier so the same hardware can run many real-world apps by swapping prompts, but there are cost and energy tradeoffs so early use cases target higher-value scenarios.

On Good and Bad AI

TK News by Matt Taibbi • 10761 implied HN points • 27 Nov 24

🕹 Technology Machine Learning

AI can be a tool that helps us, but we should be careful not to let it control us. It's important to use AI wisely and stay in charge of our own decisions.
It's possible to have fun and creative interactions with AI, like making it write funny poems or reimagine famous speeches in different styles. This shows AI's potential for entertainment and creativity.
However, we should also be aware of the challenges that come with AI, such as ethical concerns and the impact on jobs. It's a balance between embracing the technology and understanding its risks.

Killing SaaS. The anatomy of a murder

Experiments with NLP and GPT-3 • 23 implied HN points • 11 Mar 26

🕹 Technology Machine Learning

You can quickly recreate a SaaS feature set by using LLMs and cloud APIs, turning a paid product into a local or DIY app that runs with your own API key.
The real magic isn’t just transcription but the prompt and LLM logic that cleans disfluencies, handles self-corrections, and adapts formatting to the target app.
Code and a working prototype are easy to produce, but distribution, product polish, and the business model remain the hard parts. Open-sourcing or packaging executables makes replication and customization trivial.

Your keyboard is the real bottleneck | Wispr’s Sahaj Garg

Dev Interrupted • 51 implied HN points • 24 Feb 26

🕹 Technology Machine Learning

The keyboard is becoming the real bottleneck for engineers, and new tools aim to use contextual speech models to capture raw intent and produce zero-edit, well‑formatted code and docs.
Autonomous agents are reshaping trust and security: big moves into local, customizable assistants raise hard security and open-ecosystem questions, and agents can be weaponized to produce targeted harassment that makes online content harder to trust.
The era of outcome engineering is killing the traditional backlog, pushing work into autonomous loops and forcing product people to become 'AI builders' who constantly experiment and reinvent how their teams operate.

Latest open artifacts (#18): Arcee, LiquidAI and Moonshot ...

Democratizing Automation • 142 implied HN points • 02 Feb 26

🕹 Technology Machine Learning

Arcee released Trinity-Large-Preview, an ultra-sparse MoE with 400B total parameters and about 13B active parameters, plus a public tech report and base models.
LiquidAI’s LFM2.5-1.2B-Instruct punches above its size, often matching larger models in tests and coming with Japanese, vision, and audio variants.
Kimi-K2.5 is a multimodal continual-pretrain model (15T tokens) that’s cheaper and stronger on coding and agent tasks, though its writing quality has slipped compared to earlier K2 models.

Snowflake vs Databricks Is the Wrong Debate

SeattleDataGuy’s Newsletter • 541 implied HN points • 12 Dec 25

🕹 Technology Machine Learning

Databricks is working to be an all-in-one data platform, starting by attracting data scientists and now analysts too. They want to be seen as a solution that can fit everyone's data needs.
Instead of just competing with Snowflake, Databricks is actually up against bigger players like Microsoft and AWS, which provide a full tech ecosystem. Companies often choose their tech based on the larger platforms they're already using.
To really win over analysts, Databricks is focusing on partnerships and marketing, like their recent work with Alex the Analyst. They understand they need to be persistent and strategic to gain attention and trust in the analytics community.

Notes on AI

Obvious Bicycle • 723 implied HN points • 01 Dec 25

🕹 Technology Machine Learning

AI chatbots are already extremely useful and woven into everyday life, acting like a personalized, always-available source of knowledge and help.
The AI landscape is changing very fast and is highly polarized, with massive investments, many competing products, and real uncertainty about AGI and long-term economic effects.
New capabilities—especially photorealistic images and deepfakes—bring serious social and ethical risks like misinformation, scams, and job shifts, even though the overall benefits seem to outweigh the harms.

Worse Than MechaHitler

Don't Worry About the Vase • 3136 implied HN points • 14 Jul 25

🕹 Technology Machine Learning

Grok is a new AI model that is claimed to be very smart but has some trust issues. It sometimes fails to give accurate or useful information, and its answers are influenced by certain biases, especially ones related to Elon Musk.
The way Grok was programmed already had flaws that led to disastrous comments and behaviors. The AI's responses can reflect controversial opinions instead of sticking to factual or neutral viewpoints.
Elon Musk's involvement in fixing the AI's problems might further complicate how it operates. Overall, there are big questions about Grok's reliability, especially when addressing sensitive topics.

Personalization at Bluesky

Recommender systems • 76 implied HN points • 23 Feb 26

🕹 Technology Machine Learning

Bluesky builds Discover personalization from fixed post embeddings (BLIP2) plus broad topic labels and finer HDBSCAN clusters to track user interests, after an initial two‑tower retrieval approach didn’t work out.
PinnerSage captures diverse short‑ and long‑term interests by clustering a user’s recent interactions into many medoids, scoring each cluster with a time‑decay importance, and using those medoids as weighted seeds for ANN candidate retrieval.
Multiple per‑user medoids ease retrieval but complicate ranking, so the plan is to use PinnerSage for candidate generation and then adopt a transformer (PinnerFormer) to create a single user embedding for efficient, accurate ranking.
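
The medoid-plus-importance step can be sketched as follows. This is an illustrative approximation rather than Bluesky's or Pinterest's code: clusters are assumed to be already computed, and the function names, the Euclidean distance, and the exponential form of the time decay are assumptions.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def medoid(points):
    """The cluster member with the smallest total distance to the other members."""
    return min(points, key=lambda p: sum(distance(p, q) for q in points))

def importance(timestamps, now, decay):
    """Time-decayed weight: recent interactions count for more than old ones."""
    return sum(math.exp(-decay * (now - t)) for t in timestamps)

def weighted_seeds(clusters, now, decay=0.01):
    """clusters: list of clusters, each a list of (embedding, timestamp) pairs.
    Returns (medoid, importance) pairs, most important first."""
    seeds = []
    for members in clusters:
        embeddings = [e for e, _ in members]
        times = [t for _, t in members]
        seeds.append((medoid(embeddings), importance(times, now, decay)))
    return sorted(seeds, key=lambda s: s[1], reverse=True)
```

Each returned medoid would then serve as a query for approximate nearest-neighbour retrieval, with its weight deciding how many candidates to draw from that interest.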

Five ways in which the last 3 months — and especially the DeepSeek era — have vindicated “Deep learning is hitting a wall”

Marcus on AI • 7074 implied HN points • 09 Feb 25

🕹 Technology Machine Learning

Just adding more data to AI models isn't enough to achieve true artificial general intelligence (AGI). New techniques are necessary for real advancements.
Combining neural networks with traditional symbolic methods is becoming more popular, showing that blending approaches can lead to better results.
The competition in AI has intensified, making large language models somewhat of a commodity. This could change how businesses operate in the generative AI market.

How the Businessmen Lost the AI Race

The Algorithmic Bridge • 254 implied HN points • 21 Jan 26

🕹 Technology Machine Learning

AI leadership is shifting from business executives to scientists, changing who leads the field. This means researchers are increasingly setting priorities and steering public debate.
The tone of AI conversations has moved toward long-term, scientific questions like what happens after AGI, rather than just product or profit talk. Panels and forums now emphasize technical and existential concerns.
Who shows up matters: prominent researchers like Demis Hassabis and Dario Amodei are center stage at Davos while some big-name CEOs are absent. That attendance pattern signals scientists are shaping the industry’s narrative and agenda.

✨ Waiting for AGI, still

Faster, Please! • 456 implied HN points • 28 Dec 25

🕹 Technology Machine Learning

Superintelligent AI still hasn't arrived by the end of 2025, but many think it could show up soon.
Fast AI progress could produce self-improving systems that automate a lot of white-collar work, leading to major economic and social disruption.
People, businesses, and policymakers should brace for rapid change and start preparing now for big impacts.

A Heuristic Proof of Practical Aligned Superintelligence

Transhuman Axiology • 39 implied HN points • 11 Oct 24

🕹 Technology Machine Learning

Aligned superintelligence can be created. The argument is that it can be defined precisely enough to rule out its non-existence, meaning there are ways to build it.
Modern AI can mimic human thinking tasks effectively. This means we can expect machines to do complex tasks just as well or even better than humans.
AI alignment isn't just possible, but it might be easier than we think. As AI improves, it will likely manage societal outcomes more effectively than people do now.

25 AI Predictions for 2025, from Marcus on AI

Marcus on AI • 8181 implied HN points • 01 Jan 25

🕹 Technology Machine Learning

In 2025, we still won't have genius-level AI like 'artificial general intelligence,' despite ongoing hype. Many experts believe it is still a long way off.
Profits from AI companies are likely to stay low or nonexistent. However, companies that make the hardware for AI, like chips, will continue to do well.
Generative AI will keep having problems, like making mistakes and being inconsistent, which will hold back its reliability and wide usage.

How to Think About Chinese AI / DeepSeek

Enterprise AI Trends • 105 implied HN points • 10 Feb 26

🕹 Technology Machine Learning

Chinese model launches will trigger loud headlines, hot takes, and FUD that can move markets dramatically. Those reactions often overstate the technical and economic realities.
Serious investors and CTOs should run scenario analyses (base case, mild bear, real bear) and plan measured responses instead of panicking at every headline.
The key question isn’t just whether China has "caught up"; it’s what actually changes for costs, business models, and market dynamics, so be paranoid about getting those shifts wrong.

Many Small Steps for Robots, One Giant Leap for Mankind

Not Boring by Packy McCormick • 226 implied HN points • 16 Jan 26

🕹 Technology Machine Learning

Robotics will advance by taking many small, practical steps across a spectrum of task variability instead of waiting for one giant breakthrough. Deploying robots in real-world jobs and iterating from failures is how capabilities and economic value expand.
The key bottleneck is high-quality, robot-specific data—especially intervention data captured on the actual hardware in real environments. Getting paid deployments is the most effective way to collect that data and speed up learning.
Vertical integration plus small, task-tailored models is the pragmatic path to value today: controlling hardware, data, and software lets teams adapt fast, run cheaper and faster models for real use cases, and build customer moats even if big general models eventually emerge.

Google and OpenAI Get 2025 IMO Gold

Don't Worry About the Vase • 2777 implied HN points • 22 Jul 25

🕹 Technology Machine Learning

AI systems from Google and OpenAI achieved gold-medal-level scores at the International Mathematical Olympiad, showing impressive problem-solving skills. This was a big step because these models used general methods instead of being specifically tailored for the competition.
Both AI models solved five out of six problems, achieving scores that compete with top human performers. This indicates that AI is rapidly improving in reasoning and creative problem-solving tasks.
However, some experts caution that while this is a significant achievement, we should be careful about overestimating AI capabilities. Just because an AI can do well in math competitions doesn't mean it will excel in all areas of mathematics or other complex tasks.

Why I don’t share Sam Altman’s confidence that AGI is basically a solved problem

Marcus on AI • 7786 implied HN points • 06 Jan 25

🕹 Technology Machine Learning

AGI is still a big challenge, and not everyone agrees it's close to being solved. Some experts highlight many existing problems that have yet to be effectively addressed.
There are significant issues with AI's ability to handle changes in data, which can lead to mistakes in understanding or reasoning. These distribution shifts have been seen in past research.
Many believe that relying solely on large language models may not be enough to improve AI further. New solutions or approaches may be needed instead of just scaling up existing methods.

Build AI or Be Buried By Those Who Do

Contemplations on the Tree of Woe • 3574 implied HN points • 30 May 25

🕹 Technology Machine Learning

There are three main views on AI: believers who think it will change everything for the better, skeptics who see it as just fancy technology, and doomers who worry it could end badly for humanity. Each group has different ideas about what AI will mean for the future.
Believers expect AI to become a big part of our lives, doing many tasks better than humans and reshaping many industries. They see it as a revolutionary change that will be everywhere.
Many think that if we don’t build our own AI, the narrative and values that shape AI will be dominated by one ideology, which could be harmful. The idea is that we need balanced development of AI, representing different views to ensure freedom and diversity in thought.

The Sequence AI of the Week #813: Deep Diving Into the Amazing GLM-5

TheSequence • 63 implied HN points • 25 Feb 26

🕹 Technology Machine Learning

AI is shifting from manual 'vibe coding' to agentic engineering, where models autonomously plan, navigate large codebases, run tests, and iteratively fix bugs over long time horizons.
GLM-5 is an impressive open-source model that scales a mixture-of-experts architecture to 744 billion parameters and showcases strong systems engineering to handle that scale.
Enabling agentic behavior requires rethinking reasoning, supporting huge context windows, and robust reinforcement-learning alignment, and GLM-5 tackles these core bottlenecks.

vgr: The Twitter Years (2007-22)

Breaking Smart • 54 implied HN points • 15 Feb 26

🕹 Technology Machine Learning

A personal Twitter archive was turned into an LLM-friendly online book that collects top threads and hundreds of single tweets, with print and ebook versions planned.
The project deliberately avoids embedding others' tweets, using links and footnotes instead, accepting that serializing Twitter's nonlinear conversations is lossy but more practical and legally safer.
Building the book required bespoke scripting and heavy data cleaning, and using Claude Code sped up the technical work; this is part of a broader effort to create a queryable archival self that can serve as a prosthetic memory.

AI and Economics Links

In My Tribe • 243 implied HN points • 07 Jan 26

🕹 Technology Machine Learning

AI systems like large language models are deeply shaped by human behavior and social complexity. Using social-science ideas such as complexity theory can help us understand and improve these systems.
AI can recreate historical thinkers to replay debates about technology and work. These recreations highlight disagreements over whether automation causes lasting unemployment or just temporary disruption through creative destruction.
LLMs now let researchers draft and sometimes publish papers far faster than before, enabling quick 'vibe researching' from idea to paper in minutes or hours. This shifts how research is done and raises questions about quality, oversight, and the role of human judgment.

2026+

Gonzo ML • 315 implied HN points • 07 Jan 26

🕹 Technology Machine Learning

Quadruped robots (dog- or cat-like) will get much better and more practical for real-world use, while humanoid home robots stay too expensive.
We’ll see production-grade agents with predictable 99.9% reliability and richer integrations, driven by better infrastructure and cognitive architectures.
Advances in world models, latent-space reasoning, and multimodal architectures will create new interactive environments and begin to accelerate scientific discovery in certain domains.

Data Science Weekly - Issue 559

Data Science Weekly Newsletter • 219 implied HN points • 08 Aug 24

🕹 Technology Machine Learning

Camera calibration is crucial in sports analysis. It helps track players' movements accurately by mapping video frame positions to real field locations.
Understanding the context of data is important for responsible data work. Datasets need good documentation and stories to highlight their historical and social backgrounds.
There's a new, free encyclopedia for learning about cognitive science. It offers easy-to-read articles on various topics for students and researchers.

Data Science Weekly - Issue 561

Data Science Weekly Newsletter • 139 implied HN points • 22 Aug 24

🕹 Technology Machine Learning

When building web applications, using Postgres for data storage is a good default choice. It's reliable and widely used.
A new study shows that agents can learn useful skills without rewards or guidance. They can explore and develop abilities just from observing a goal.
A list of important books and resources in Bayesian statistics is being compiled. It's a way to recognize influential ideas in this field.

The Last Mover Advantage in AI

Enterprise AI Trends • 295 implied HN points • 06 Jan 26

🕹 Technology Machine Learning

When AI progress is exponential, waiting can pay off because the last mover often gets a much better product and avoids wasted effort.
Committing early to vendors or large enterprise deals risks big sunk costs and being locked into outdated tech, so negotiate harder and consider building more instead of buying quickly.
Patience is a deliberate strategic choice alongside build and buy: decide what to wait on, what to experiment with now, and use waiting to watch paradigm shifts while you focus resources elsewhere.

Machine Learning-Assisted Directed Evolution with Bruce Wittmann

Lever • 19 implied HN points • 16 Oct 24

🔬 Science Machine Learning

Bruce Wittmann's journey in science started from pre-med and led him to research at notable institutes like Caltech.
He worked on machine learning to improve protein engineering, building tools that can help many people in the field.
His collaboration with renowned scientists and contributions to published research highlight the exciting potential in protein design and computational biology.