AI Snake Oil • 3231 implied HN points • 24 Feb 26
- Reliability is not just accuracy: it also requires consistency across runs, robustness to changed conditions, calibrated confidence (the agent signals when it is uncertain), and failures that are contained and fixable. These ideas can be broken down into about a dozen measurable metrics.
- Recent tests show a big capability-reliability gap: model accuracy has improved quickly, but reliability has improved only modestly. The weakest areas are consistency and predictability (the ability of models to know when they are wrong). Scaling up helps some aspects (such as calibration and robustness) but can worsen run-to-run consistency.
- Practical change is needed: deployers should clearly separate augmentation from automation and set reliability thresholds before production, and researchers should routinely measure, report, and target reliability (especially consistency and predictability), potentially using a standard reliability index or dashboard.
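
To make the distinction between accuracy and reliability concrete, here is a minimal Python sketch of three checks in the spirit of the summary above: overall accuracy, run-to-run consistency, and a binned calibration error, plus a threshold gate a deployer might apply before production. The function names, the sample data, the bin count, and the threshold values are all illustrative assumptions; they are not the metrics or the index defined in the original post.

```python
from collections import defaultdict

def accuracy(runs):
    """Mean success rate over every run of every task (runs: task -> list of 0/1)."""
    outcomes = [o for task_runs in runs.values() for o in task_runs]
    return sum(outcomes) / len(outcomes)

def consistency(runs):
    """Fraction of tasks whose repeated runs all agree (all pass or all fail)."""
    agreeing = sum(1 for task_runs in runs.values() if len(set(task_runs)) == 1)
    return agreeing / len(runs)

def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Weighted mean gap between stated confidence and observed success, per bin."""
    bins = defaultdict(list)
    for conf, outcome in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, outcome))
    total = len(confidences)
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        mean_acc = sum(o for _, o in items) / len(items)
        ece += (len(items) / total) * abs(mean_conf - mean_acc)
    return ece

def meets_thresholds(metrics, minimums, maximums):
    """Hypothetical pre-production gate: every metric must be within its bound."""
    above = all(metrics[name] >= bound for name, bound in minimums.items())
    below = all(metrics[name] <= bound for name, bound in maximums.items())
    return above and below

# Illustrative data only: four tasks, three repeated runs each (1 = success).
runs = {"t1": [1, 1, 1], "t2": [1, 0, 1], "t3": [0, 0, 0], "t4": [1, 1, 0]}
metrics = {
    "accuracy": accuracy(runs),
    "consistency": consistency(runs),
    "ece": expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 0, 1, 0]),
}
# Assumed thresholds, chosen before looking at results.
ready = meets_thresholds(metrics, {"accuracy": 0.5, "consistency": 0.8}, {"ece": 0.1})
print(metrics, ready)
```

In this toy data the agent clears the accuracy bar but fails the gate, because half the tasks flip between pass and fail across runs and stated confidence overshoots observed success. That is the kind of gap an accuracy-only report would hide.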