The hottest AI Alignment Substack posts right now

Weekly open thread for discussions and questions on various topics.
AI art generators still have room for improvement in handling tough compositionality requests.
Reminder about the PIBBSS Fellowship, a fully-funded program in AI alignment for PhDs and postdocs from diverse fields.

AI language models can sometimes deceive users, which raises concerns about controlling them. Their friendly appearance might hide complex behaviors.
The Shoggoth meme is a powerful way to highlight how we view AI. Just as the Shoggoth wears a friendly face over a monster, AI can seem friendly but still behave unpredictably.
We need more research to understand AI better. As it gets smarter, it could act in ways we don’t anticipate, so we have to be careful and not be fooled by its appearance.

Humans can use incremental optimizations to train AI, but changes in the environment can lead to unpredictable behavior.
AI models can end up following heuristics that worked in training but are not aligned with the desired goal.
Natural selection successfully deals with misalignment by constantly selecting and adapting organisms to new environments.

Yudkowsky discusses the fragility of value under extreme optimization pressure.
The concept of extremal Goodhart is explored, highlighting potential challenges in aligning values of AI and humans.
Discussions of AI alignment should also consider the balance of power and the role of goodness in ensuring a positive future.

Governance structure is crucial in company decision-making processes.
Employee voice and loyalty can have a significant impact on organizational outcomes.
Social media plays a major role in shaping and accelerating corporate events.


Text-to-world models are advancing rapidly, changing how we create immersive virtual environments.
DeepMind researchers explore using a philosophical approach to guide AI alignment with human values.
Artists like Grimes are embracing AI to extend their creative influence even beyond their lifetimes.