The hottest LLMs Substack posts right now

And their main takeaways
Marcus on AI • 13872 implied HN points • 08 Mar 26
  1. Commercial AI leaders often use hype to raise money, overpromise on AGI timelines, and prioritize growth over clear accountability.
  2. Using large language models in high‑stakes settings like military targeting can cause deadly errors, and putting humans 'in the loop' doesn’t stop mistakes when operators are overloaded or overtrust the AI.
  3. Companies claim to care about safety but sometimes abandon pledges, rely on dubious training practices like scraping copyrighted work, and push fragile, hard‑to‑secure agent systems that create real negative side effects.
Don't Worry About the Vase • 1792 implied HN points • 24 Feb 26
  1. Sonnet 4.6 is a faster, cheaper Claude model that gets close to Opus 4.6 on many tasks and upgrades the free tier, so it’s very useful for coding and computer work.
  2. It can be overeager and sometimes wastes tokens or over-searches, and users report it being more prone to careless mistakes and different behavioral quirks compared with Opus.
  3. Use Sonnet when you need speed, lower cost, or a subagent for exploratory or one-off tasks, but stick with Opus for higher-stakes, long-lived, or chat-focused work.
Democratizing Automation • 364 implied HN points • 05 Mar 26
  1. Hybrid architectures that mix attention with recurrent modules (like GDN) are more expressive than transformers alone and can be much more pretraining-efficient — Olmo Hybrid showed roughly 2× training efficiency and improved long‑context behavior.
  2. Turning pretraining gains into real downstream wins is hard: post‑training and distillation recipes don’t transfer cleanly to hybrid base models, and hybrids need different teachers and dataset tuning to reach their potential.
  3. Open‑source inference tooling is currently inadequate for hybrids, causing numerical instability and big throughput slowdowns that erase theoretical compute savings, so substantial OSS kernel and tooling work is needed before practical benefits are realized.
Redwood Research blog • 285 HN points • 17 Jun 24
  1. Reaching 50% accuracy on the ARC-AGI dataset with GPT-4o involved generating a large number of Python programs and selecting the ones that reproduced the example outputs.
  2. Key approaches included meticulous step-by-step reasoning prompts, revision of program implementations, and feature engineering for better grid representations.
  3. Performance could improve further with more runtime compute, which follows clear scaling laws, and with fine-tuning GPT models to better understand grid representations.
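The "generate many programs, keep the ones that fit the examples" idea from the first takeaway can be sketched as below. This is a minimal illustration, not the post's actual pipeline: the candidate functions are hand-written stand-ins for model-generated programs, and `select_programs` is a hypothetical helper name.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def select_programs(
    candidates: List[Callable[[Grid], Grid]],
    examples: List[Example],
) -> List[Callable[[Grid], Grid]]:
    """Keep only the candidates that reproduce every example output."""
    passing = []
    for program in candidates:
        try:
            if all(program(inp) == out for inp, out in examples):
                passing.append(program)
        except Exception:
            # A crashing candidate is simply discarded.
            continue
    return passing


# Hand-written stand-ins for sampled programs (assumption for illustration).
def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def flip_rows(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


examples = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
survivors = select_programs([identity, transpose, flip_rows], examples)

# Apply the first surviving program to a new test input.
prediction = survivors[0]([[5, 6, 7]]) if survivors else None
```

Only `flip_rows` matches the example here, so it is the program applied to the test grid. Sampling more candidates raises the chance that at least one survives the filter, which is where extra runtime compute comes in.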
DYNOMIGHT INTERNET NEWSLETTER • 1515 implied HN points • 14 Nov 24
  1. Large language models (LLMs) can play chess to some extent, but they struggle after the opening moves. They were not specifically designed for chess, yet they can manage to play using their text training.
  2. The performance of different language models varies significantly when playing chess. Some models like 'gpt-3.5-turbo-instruct' excel at it, while others perform very poorly.
  3. It seems that focusing on instruction tuning can make LLMs worse at chess, suggesting that training style impacts their ability to play games effectively.
TheSequence • 21 implied HN points • 05 Feb 26
  1. For years AI advanced by scaling up pre-training—more data, bigger models, and huge GPU time to bake capabilities into fixed weights.
  2. Test-time compute flips that idea by letting models use extra computation during inference to reason, plan, backtrack, and self-correct—basically "letting the model think."
  3. The big implication is that model performance depends not just on training compute but also on how much compute is allowed at inference, changing tradeoffs for how we build and deploy AI.
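One simple way to trade inference compute for quality is best-of-N sampling: draw several candidates and keep the one a scorer prefers. The sketch below is a toy stand-in, not the method described in the post. The "model" is a seeded random number generator, and `best_of_n`, `make_proposer`, and the scoring rule are all hypothetical names chosen for illustration.

```python
import random
from typing import Callable


def best_of_n(propose: Callable[[], int], score: Callable[[int], float], n: int) -> int:
    """Sample n candidates and return the highest-scoring one."""
    candidates = [propose() for _ in range(n)]
    return max(candidates, key=score)


def make_proposer(seed: int) -> Callable[[], int]:
    """Toy stand-in for a model: proposes random integers in [0, 100]."""
    rng = random.Random(seed)
    return lambda: rng.randint(0, 100)


TARGET = 42  # the "right answer" in this toy problem


def score(guess: int) -> float:
    """Toy verifier: closer to TARGET is better."""
    return -abs(guess - TARGET)


# Same seed, so the larger budget sees the small budget's sample plus 31 more.
small_budget = best_of_n(make_proposer(0), score, n=1)
large_budget = best_of_n(make_proposer(0), score, n=32)
```

Because the larger budget's candidates include the smaller budget's, its best score can never be worse. Real systems replace the random proposer with model samples and the toy scorer with a verifier or reward model.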
Gradient Flow • 279 implied HN points • 25 Jan 24
  1. Function Calling in AI enables models to interact with external functions, going beyond basic text generation to execute actions based on requests.
  2. Combining Retrieval Augmented Generation (RAG) with Function Calling enhances AI systems, allowing them to access external APIs to improve adaptability and assist in various tasks.
  3. Function Calling in AI still faces challenges such as security risks, ethical alignment, and technical limitations, and it needs advances in contextual understanding to reach its full potential.
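A minimal sketch of the dispatch side of function calling follows, assuming the model emits a JSON object with `name` and `arguments` fields. That format, the `get_weather` tool, and the `dispatch` helper are all assumptions for illustration and do not correspond to any specific vendor API.

```python
import json
from typing import Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external API."""
    return f"Weather for {city}: unavailable in this sketch"


# Allowlist of callable tools. Anything not listed here is rejected,
# which is one basic guard against the security risks noted above.
TOOLS: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a model's function-call request and run the matching tool."""
    call = json.loads(model_output)
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("arguments", {}))


result = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

The tool's return value would normally be passed back to the model as context for its next response.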
davidj.substack • 23 implied HN points • 13 Jan 26
  1. AGI means an AI that can learn many different tasks and perform many of them at least as well as a typical human; it doesn't require sentience or being a superintelligence.
  2. Progress toward AGI will rely more on post-training learning: agents that can learn after deployment, retain skills, and build or use tools, rather than just bigger pretraining runs.
  3. Narrow AGI will appear in specific domains soon via agents that learn and share useful skills while keeping private data local, but these systems will still have clear limits and won't replace all human abilities.
DYNOMIGHT INTERNET NEWSLETTER • 796 implied HN points • 21 Nov 24
  1. LLMs like `gpt-3.5-turbo-instruct` can play chess well, but most other models struggle. Using specific prompts can improve their performance.
  2. Providing legal moves to LLMs can actually confuse them. Instead, repeating the game before making a move helps them make better decisions.
  3. Fine-tuning and giving examples both improve chess performance for LLMs, but combining them may not always yield the best results.
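The prompting advice in the second takeaway (have the model repeat the game rather than giving it a list of legal moves) might look like the sketch below. The prompt wording is an assumption and not the post's exact text; `format_movetext` and `build_prompt` are hypothetical helpers.

```python
from typing import List


def format_movetext(moves: List[str]) -> str:
    """Render a flat move list as numbered movetext, e.g. '1. e4 e5 2. Nf3'."""
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:
            # White's move starts a new numbered turn.
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def build_prompt(moves: List[str]) -> str:
    """Ask the model to restate the game before continuing it.

    No legal-move list is included, in line with the takeaway above.
    """
    return (
        "You are playing chess. First repeat the game so far, "
        "then give your next move.\n\n"
        f"Game so far: {format_movetext(moves)}\n"
        "Repeat the game, then continue it:"
    )


prompt = build_prompt(["e4", "e5", "Nf3"])
```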
Deep (Learning) Focus • 235 implied HN points • 10 Jul 23
  1. The Falcon models represent a significant advancement in open-source LLMs, rivaling proprietary models in quality and performance.
  2. The creation of the RefinedWeb dataset showcases the potential of utilizing web data at a massive scale for LLM pre-training, leading to highly performant models like Falcon.
  3. Falcon-40B, when compared to other LLMs, stands out for its impressive performance, efficient architecture modifications, and commercial usability.
The Counterfactual • 59 implied HN points • 11 Apr 24
  1. Tokenization won the recent poll, so there will be an in-depth explainer about it soon. This will help people understand how tokenization works in large language models.
  2. The visual reasoning task was a close second, so it might come up in the next poll for more ideas. This shows there is interest in how models think visually.
  3. There are updates about recent publications and discussions on related topics in AI and psychology. These will be shared in upcoming posts, expanding on interesting research topics.
Deep (Learning) Focus • 176 implied HN points • 26 Jun 23
  1. Imitation models need a large and comprehensive dataset to perform well.
  2. Enhancing imitation learning with detailed explanation traces can significantly improve model performance.
  3. Orca showcases the effectiveness of learning from more complex instruction datasets and detailed explanations.
In My Tribe • 212 implied HN points • 16 Jan 25
  1. A school in Arizona is using AI as its only teacher in a new educational model. This approach aims to tailor lessons to students' needs and allow more time for personal interests.
  2. Robots still struggle with tasks that are easy for humans, like picking up objects. This shows that achieving true artificial general intelligence is still a long way off.
  3. Using chatbots like ChatGPT can help with everyday problems, like homework. By asking the right questions, you can get creative suggestions that you might not think of on your own.
MLOps Newsletter • 39 implied HN points • 10 Feb 24
  1. Graph Neural Networks in TensorFlow address data complexity, limited resources, and generalizability in learning from graph-structured data.
  2. RadixAttention and Domain-Specific Language (DSL) are key solutions for efficiently controlling Large Language Models (LLMs), reducing memory usage, and providing a user-friendly interface.
  3. VideoPoet demonstrates hierarchical LLM architecture for zero-shot learning, handling multimodal input, and generating various output formats in video generation tasks.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 15 Feb 24
  1. T-RAG is a method that combines RAG architecture with fine-tuned language models and an entity detection system for better information retrieval. This approach helps in answering questions more accurately by focusing on relevant context.
  2. Data privacy is crucial when using language models for sensitive documents, so it's better to use open-source models that can be hosted on-premise instead of public APIs. This helps prevent any risk of leaking private information.
  3. The model uses an entities tree to improve context when processing queries, ensuring relevant entity information is included in the responses. This makes the answers more useful and comprehensive for the user.
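The entities-tree idea can be sketched as follows: when a query mentions a known entity, its path from the root of the tree is added to the context. This is an illustration under stated assumptions, not the T-RAG implementation. The organization tree is invented, and a plain substring match stands in for the entity detection system.

```python
from typing import Dict, List, Optional

# Hypothetical entities tree stored as child -> parent (root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "Acme Corp": None,
    "Finance": "Acme Corp",
    "Payroll": "Finance",
    "Engineering": "Acme Corp",
}


def path_to_root(entity: str) -> List[str]:
    """Return the chain of entities from the root down to `entity`."""
    path = []
    node: Optional[str] = entity
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


def entity_context(query: str) -> List[str]:
    """Build context lines for each known entity mentioned in the query.

    Substring matching is a stand-in for a real entity detector.
    """
    lowered = query.lower()
    return [
        " > ".join(path_to_root(entity))
        for entity in PARENT
        if entity.lower() in lowered
    ]


context = entity_context("Who approves payroll changes?")
```

The resulting lines would be added to the retrieved passages before the prompt is sent to the model.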
State Space Adventures • 2 HN points • 30 May 24
  1. The Chinese AI scene is highly competitive, with companies developing advanced models at a rapid pace to outdo each other.
  2. Chinese AI companies are engaging in a pricing war to make their models more accessible, leading to reduced costs and free versions of top models.
  3. Chinese tech giants like Baidu, Tencent, Alibaba, and ByteDance are investing in AI development and competing against each other in the chatbot space.
Conrado Miranda • 2 HN points • 28 May 24
  1. Evaluating Large Language Models (LLMs) can be challenging, especially with traditional off-the-shelf metrics not always being suitable for broader LLM applications.
  2. Using an LLM-as-a-judge method for evaluation can provide insights, but there's a risk of over-relying on a black-box model, which can obscure what actually needs improving.
  3. Creating clear, specific evaluation criteria and considering use cases are crucial. Auto-criteria, like auto-prompting, may be future tools to enhance LLM evaluations.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 01 Mar 23
  1. Creating conversational interfaces with large language models (LLMs) is tricky because the responses can be very different each time. This makes it hard to keep conversations flowing smoothly.
  2. If you change something small in the middle of a conversation, it can mess up everything that comes after. This makes planning the conversation a bit complicated.
  3. As these chatbots get more complex, we can use groups of connected steps to manage the conversation better. Future tools might make it easier for people to design these conversations without coding.
Curious futures (KGhosh) • 4 implied HN points • 16 Apr 23
  1. Constantly think about the services you provide and where they fit in the hierarchy of ideas.
  2. Stay updated on topics across society, tech, DIY, LLMs, and people and AI.
  3. Luxury brands thrive on impeccable service, repairs, and customer service in times of need.