Yuxi’s Substack

Yuxi's Substack focuses on the intricacies and challenges of artificial intelligence, particularly agents, reinforcement learning, large language models (LLMs), artificial general intelligence (AGI), and the evolving norms of AI research and application. It examines vulnerabilities, the significance of human alignment, the role of AI Stores, and the future of AI through a critical, analytical lens.

Artificial Intelligence, Reinforcement Learning, Large Language Models, Artificial General Intelligence, AI Research and Applications, Human-AI Interaction, Machine Learning, Blockchain Technology

The hottest Substack posts of Yuxi’s Substack

And their main takeaways
19 implied HN points 14 Nov 23
  1. DeepMind published a paper on levels of AGI and autonomy.
  2. Current large language models are far from being Superhuman AGI.
  3. AI as an agent is still a distant concept due to current technology limitations.
19 implied HN points 18 Jul 23
  1. Ground-truth-in-the-loop is crucial for designing and evaluating systems, especially in AI and machine learning.
  2. For AI systems, trustworthy training data, evaluation feedback, and a reliable world model are essential (a minimal evaluation sketch follows this list).
  3. Researchers should inform non-experts about limitations and potential issues when building systems without ground-truth.
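
To illustrate the ground-truth-in-the-loop idea, here is a minimal sketch (not taken from the post; the function and example names are hypothetical) that scores a model's predictions against trusted labels and surfaces disagreements for human review instead of accepting them silently:

```python
def evaluate_with_ground_truth(predict, labeled_examples):
    """Score predictions against trusted labels and collect disagreements.

    labeled_examples: list of (input, trusted_label) pairs.
    """
    disagreements, correct = [], 0
    for x, label in labeled_examples:
        pred = predict(x)
        if pred == label:
            correct += 1
        else:
            disagreements.append((x, pred, label))  # route these to human review
    accuracy = correct / len(labeled_examples) if labeled_examples else 0.0
    return accuracy, disagreements

# Toy usage with a trivially checkable "model":
examples = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
accuracy, disagreements = evaluate_with_ground_truth(lambda q: str(eval(q)), examples)
print(accuracy, disagreements)  # 1.0 []
```

The point of the loop is that evaluation is anchored to trusted labels; without such a ground truth, the accuracy number would be meaningless.
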
0 implied HN points 24 Nov 23
  1. Key resources for studying Reinforcement Learning include David Silver's classic course and the textbook by Sutton & Barto.
  2. Online platforms such as OpenAI Spinning Up and Coursera offer specialized courses on Reinforcement Learning.
  3. Advanced resources such as DeepMind's lecture series and UC Berkeley's Deep RL course provide in-depth coverage of the subject (a toy Q-learning sketch follows this list).
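
To complement those resources, here is a toy tabular Q-learning sketch; the chain environment and hyperparameters are illustrative assumptions, not taken from any of the courses or textbooks above:

```python
import random

# Tiny 5-state chain: states 0..4, reaching state 4 yields reward 1.
N_STATES = 5
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

for _ in range(500):                               # training episodes
    s = 0
    for _ in range(100):                           # cap episode length
        if random.random() < EPS:
            a = random.choice(ACTIONS)             # explore
        else:                                      # exploit, random tie-break
            a = max(ACTIONS, key=lambda a_: (Q[(s, a_)], random.random()))
        s_next, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s_next, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])  # temporal-difference update
        s = s_next
        if done:
            break

print({s: round(max(Q[(s, a)] for a in ACTIONS), 3) for s in range(N_STATES)})
```

The marked update is the standard temporal-difference rule covered in the listed resources; the learned values increase toward the rewarding end of the chain.
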
0 implied HN points 17 Jul 23
  1. Multi-objective optimization cannot optimize all objectives at once; improving one objective typically comes at the expense of another (see the sketch after this list).
  2. More and tighter constraints make finding a solution harder.
  3. Tasks should be divided into groups and handled collaboratively, rather than expecting a single model to do everything.
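
As a tiny worked illustration of the first point (a hypothetical example, not from the post), two quadratic objectives pull the decision variable toward different points, so a weighted-sum scalarization can only trade one off against the other:

```python
# Minimizing f1 pulls x toward 0; minimizing f2 pulls x toward 1.
def f1(x): return x ** 2
def f2(x): return (x - 1) ** 2

def weighted_optimum(w):
    # Closed form for minimizing w*f1(x) + (1-w)*f2(x): x* = 1 - w.
    return 1 - w

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    x = weighted_optimum(w)
    print(f"w={w:.2f}  x*={x:.2f}  f1={f1(x):.3f}  f2={f2(x):.3f}")
# Every choice of w improves one objective at the expense of the other;
# no single x minimizes both.
```

Each weight picks a different point on the trade-off curve, which is why splitting the task among specialized models can be preferable to forcing one model to satisfy every objective.
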
0 implied HN points 13 Feb 23
  1. Yuxi Li has a Substack newsletter coming soon
  2. The Substack link is yuxili.substack.com
  3. The newsletter is from Yuxi Li and will be available soon
0 implied HN points 23 Jul 23
  1. Reinforcement learning from human feedback helps with human value alignment in language models.
  2. Direct Preference Optimization (DPO) optimizes for preferences directly, without explicit reward modeling or a reinforcement learning step (a sketch of the DPO loss follows this list).
  3. Various methods beyond DPO, such as TAMER, handle human preference and alignment in language models.
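
For reference, a minimal sketch of the DPO loss, assuming precomputed per-sequence log-probabilities under the trained policy and a frozen reference model (tensor names here are my own, not from the post):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a (batch,) tensor of summed sequence log-probabilities."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer chosen over rejected completions,
    # measured relative to the reference model and scaled by beta.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with random log-probabilities:
policy_chosen = torch.randn(4, requires_grad=True)
policy_rejected = torch.randn(4, requires_grad=True)
loss = dpo_loss(policy_chosen, policy_rejected, torch.randn(4), torch.randn(4))
loss.backward()
print(loss.item())
```

The reward model and RL loop of classic RLHF are folded into this single classification-style loss on preference pairs, which is what makes DPO simpler to run.
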