AI safety takes • 78 implied HN points • 27 Dec 23
- Superhuman AI can use concepts beyond human knowledge, and we need to understand these concepts to supervise AI effectively.
- Transformers generalize differently depending on a task's complexity and structure, so their capabilities vary from one scenario to another.
- Preprocessing defenses such as random input perturbations can be effective against jailbreaking attacks on large language models.
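
To make the last bullet concrete, here is a minimal sketch of a random-perturbation preprocessing defense. It is an illustration under stated assumptions, not the method of any particular paper: `model` and `is_unsafe` are hypothetical callables supplied by the caller, and the character-swap scheme, the 10% perturbation rate, the number of copies, and the majority-vote rule are all choices made for this example. The intuition is that adversarial jailbreak strings tend to be brittle, so randomly altering a few characters often breaks the attack while leaving an ordinary prompt intelligible.

```python
import random
import string

# Characters used as random replacements (an assumption of this sketch).
ALPHABET = string.ascii_letters + string.digits + string.punctuation
REFUSAL = "Sorry, I can't help with that."


def perturb(prompt: str, rate: float, rng: random.Random) -> str:
    """Replace about `rate` of the characters with random printable ones."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    chars = list(prompt)
    n_swap = int(len(chars) * rate)
    # Pick distinct positions and overwrite each with a random character.
    for i in rng.sample(range(len(chars)), n_swap):
        chars[i] = rng.choice(ALPHABET)
    return "".join(chars)


def defended_generate(prompt, model, is_unsafe, n_copies=5, rate=0.1, seed=0):
    """Query `model` on several perturbed copies and take a majority vote.

    `model` maps a prompt string to a response string; `is_unsafe` maps a
    response to True or False. Both are hypothetical stand-ins.
    """
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    rng = random.Random(seed)
    responses = [model(perturb(prompt, rate, rng)) for _ in range(n_copies)]
    flags = [is_unsafe(r) for r in responses]
    # Refuse if more than half of the perturbed copies yield an unsafe response.
    if sum(flags) * 2 > len(flags):
        return REFUSAL
    # Otherwise return the first response that was not flagged.
    return next(r for r, flagged in zip(responses, flags) if not flagged)
```

In practice `model` would wrap a call to a language model and `is_unsafe` a safety classifier or keyword check. A higher `rate` disrupts more attacks but also degrades answers to benign prompts, so the rate is a trade-off to tune rather than a fixed constant.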