Yuxi’s Substack

Yuxi's Substack focuses on the intricacies and challenges of artificial intelligence, particularly agents, reinforcement learning, large language models (LLMs), artificial general intelligence (AGI), and the evolving norms of AI research and application. It examines vulnerabilities, the significance of human alignment, the role of AI Stores, and the future of AI through a critical, analytical lens.

Artificial Intelligence, Reinforcement Learning, Large Language Models, Artificial General Intelligence, AI Research and Applications, Human-AI Interaction, Machine Learning, Blockchain Technology

The hottest Substack posts of Yuxi’s Substack

And their main takeaways
19 implied HN points 14 Nov 23
  1. DeepMind published a paper on levels of AGI and autonomy.
  2. Current large language models are far from being Superhuman AGI.
  3. AI as an agent is still a distant concept due to current technology limitations.
19 implied HN points 18 Jul 23
  1. Ground-truth-in-the-loop is crucial for designing and evaluating systems, especially in AI and machine learning.
  2. For AI systems, trustworthy training data, evaluation feedback, and a reliable world model are essential (a minimal evaluation sketch follows this list).
  3. Researchers should inform non-experts about limitations and potential issues when building systems without ground-truth.
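
To illustrate the ground-truth-in-the-loop idea, here is a minimal sketch (not taken from the post; the function and example names are hypothetical) that scores a model's predictions against trusted labels and surfaces disagreements for human review instead of accepting them silently:

```python
def evaluate_with_ground_truth(predict, labeled_examples):
    """Score predictions against trusted labels and collect disagreements.

    labeled_examples: list of (input, trusted_label) pairs.
    """
    disagreements, correct = [], 0
    for x, label in labeled_examples:
        pred = predict(x)
        if pred == label:
            correct += 1
        else:
            disagreements.append((x, pred, label))  # route these to human review
    accuracy = correct / len(labeled_examples) if labeled_examples else 0.0
    return accuracy, disagreements

# Toy usage with a trivially checkable "model":
examples = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
accuracy, disagreements = evaluate_with_ground_truth(lambda q: str(eval(q)), examples)
print(accuracy, disagreements)  # 1.0 []
```

The point of the loop is that evaluation is anchored to trusted labels; without such a ground truth, the accuracy number would be meaningless.
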
0 implied HN points 24 Nov 23
  1. Key resources for studying Reinforcement Learning include David Silver's classic course and the textbook by Sutton & Barto.
  2. Online platforms such as OpenAI Spinning Up and Coursera offer specialized courses on Reinforcement Learning.
  3. Advanced resources such as DeepMind's lecture series and UC Berkeley's Deep RL course provide in-depth coverage of the subject (a toy Q-learning sketch follows this list).
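
To complement those resources, here is a toy tabular Q-learning sketch; the chain environment and hyperparameters are illustrative assumptions, not taken from any of the courses or textbooks above:

```python
import random

# Tiny 5-state chain: states 0..4, reaching state 4 yields reward 1.
N_STATES = 5
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

for _ in range(500):                               # training episodes
    s = 0
    for _ in range(100):                           # cap episode length
        if random.random() < EPS:
            a = random.choice(ACTIONS)             # explore
        else:                                      # exploit, random tie-break
            a = max(ACTIONS, key=lambda a_: (Q[(s, a_)], random.random()))
        s_next, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s_next, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])  # temporal-difference update
        s = s_next
        if done:
            break

print({s: round(max(Q[(s, a)] for a in ACTIONS), 3) for s in range(N_STATES)})
```

The marked update is the standard temporal-difference rule covered in the listed resources; the learned values increase toward the rewarding end of the chain.
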
0 implied HN points 17 Jul 23
  1. Multi-objective optimization cannot optimize all objectives at once; improving one objective typically comes at the expense of another (see the sketch after this list).
  2. More and tighter constraints make finding a solution harder.
  3. Tasks should be divided into groups and handled collaboratively, rather than expecting a single model to do everything.
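
As a tiny worked illustration of the first point (a hypothetical example, not from the post), two quadratic objectives pull the decision variable toward different points, so a weighted-sum scalarization can only trade one off against the other:

```python
# Minimizing f1 pulls x toward 0; minimizing f2 pulls x toward 1.
def f1(x): return x ** 2
def f2(x): return (x - 1) ** 2

def weighted_optimum(w):
    # Closed form for minimizing w*f1(x) + (1-w)*f2(x): x* = 1 - w.
    return 1 - w

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    x = weighted_optimum(w)
    print(f"w={w:.2f}  x*={x:.2f}  f1={f1(x):.3f}  f2={f2(x):.3f}")
# Every choice of w improves one objective at the expense of the other;
# no single x minimizes both.
```

Each weight picks a different point on the trade-off curve, which is why splitting the task among specialized models can be preferable to forcing one model to satisfy every objective.
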
0 implied HN points 13 Feb 23
  1. Yuxi Li has a Substack newsletter coming soon
  2. The Substack link is yuxili.substack.com
  3. The newsletter is from Yuxi Li and will be available soon
0 implied HN points 23 Jul 23
  1. Reinforcement learning from human feedback helps with human value alignment in language models.
  2. Direct Preference Optimization (DPO) optimizes for preferences directly, without explicit reward modeling or a reinforcement learning step (a sketch of the DPO loss follows this list).
  3. Various methods beyond DPO, such as TAMER, handle human preference and alignment in language models.
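
For reference, a minimal sketch of the DPO loss, assuming precomputed per-sequence log-probabilities under the trained policy and a frozen reference model (tensor names here are my own, not from the post):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a (batch,) tensor of summed sequence log-probabilities."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer chosen over rejected completions,
    # measured relative to the reference model and scaled by beta.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with random log-probabilities:
policy_chosen = torch.randn(4, requires_grad=True)
policy_rejected = torch.randn(4, requires_grad=True)
loss = dpo_loss(policy_chosen, policy_rejected, torch.randn(4), torch.randn(4))
loss.backward()
print(loss.item())
```

The reward model and RL loop of classic RLHF are folded into this single classification-style loss on preference pairs, which is what makes DPO simpler to run.
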