AI Snake Oil • 546 implied HN points • 01 Dec 23
- Model alignment focuses on preventing accidental harms, not intentional ones
- Technical approaches like RLHF have known limitations, but they remain effective against casual adversaries
- Model alignment is only one layer of defense, working alongside productization safeguards and other strategies