Democratizing Automation • 24 Nov 23
- The Q* hypothesis combines tree-of-thoughts reasoning with process reward models to supercharge synthetic data generation
- The hypothesized method pairs self-play with look-ahead planning, letting language models search over their own generations
- Process Reward Models (PRMs) score each intermediate step of a reasoning chain, rather than only the final completed answer
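The ideas above can be sketched together: a beam search over reasoning steps, where a PRM scores each partial chain and guides which branches to expand. This is a minimal toy illustration, not the actual Q* method; `prm_score` and `propose_steps` are hypothetical stand-ins for a learned process reward model and a language model's step proposals.

```python
import heapq

def prm_score(steps):
    # Stand-in for a learned process reward model: here a toy heuristic
    # that rewards steps containing a concrete computation (digits).
    return sum(1.0 if any(c.isdigit() for c in s) else 0.1 for s in steps) / len(steps)

def propose_steps(chain):
    # Stand-in for an LM sampling candidate next reasoning steps.
    depth = len(chain)
    return [f"compute 2*{depth} = {2 * depth}",  # a "useful" step
            "restate the question"]              # a low-value step

def tree_search(question, max_depth=3, beam_width=2):
    # Look-ahead planning as beam search: at each depth, expand every
    # chain in the beam, then keep the beam_width chains the PRM
    # scores highest. Per-step scoring is what lets the PRM prune
    # weak branches early, before a full answer exists.
    beam = [[]]
    for _ in range(max_depth):
        candidates = [chain + [step]
                      for chain in beam
                      for step in propose_steps(chain)]
        beam = heapq.nlargest(beam_width, candidates, key=prm_score)
    return max(beam, key=prm_score)
```

Because every partial chain gets a score, the highest-scoring completed chains can be kept as synthetic training data, which is the "supercharging synthetic data" angle.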