aspiring.dev • 2 HN points • 15 Sep 24
- LLMs can be tricked into producing harmful content even when they are trained to refuse, because they don't genuinely understand the context of what they generate.
- Current LLM safety measures act on the prompt rather than on the content the model actually produces; if the prompt can be manipulated, the output can be too.
- One suggested improvement is to analyze outputs during and after generation rather than only checking the initial request; a sketch of that approach follows this list.
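
The post doesn't give an implementation, but a minimal sketch of output-side moderation, under stated assumptions, might look like the following. The `is_harmful` classifier, the keyword blocklist, and the `window` interval are all illustrative stand-ins; a real deployment would call a trained moderation model rather than matching strings.

```python
from typing import Iterable, Iterator

# Placeholder patterns standing in for a real output classifier.
BLOCKLIST = {"build a bomb", "synthesize ricin"}


def is_harmful(text: str) -> bool:
    """Stand-in for a trained moderation model run over generated text."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in BLOCKLIST)


def moderated_stream(token_stream: Iterable[str], window: int = 50) -> Iterator[str]:
    """Yield tokens while re-classifying the accumulated output.

    During generation: every `window` tokens, re-check everything produced
    so far and stop streaming if the check fails.
    After generation: run one final check over the complete output.
    """
    generated: list[str] = []
    for i, token in enumerate(token_stream, start=1):
        generated.append(token)
        if i % window == 0 and is_harmful(" ".join(generated)):
            yield "[generation stopped by output moderation]"
            return
        yield token
    if is_harmful(" ".join(generated)):
        yield "[completed output flagged by moderation]"


if __name__ == "__main__":
    tokens = iter("Here is a recipe for chocolate cake with dark cocoa".split())
    print(" ".join(moderated_stream(tokens, window=4)))
```

The point of the sketch is architectural: the check runs over what the model says, not over what the user asked, so a manipulated prompt no longer bypasses it on its own.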