Reasons to Be Optimistic • 6 implied HN points • 17 Feb 26
- Text-only models are powerful but incomplete because language misses how the world actually looks, moves, and feels; video offers a far richer, high-volume source of physics, sound, and human behavior.
- True world models must be causal and action-conditioned, predicting the next state step by step under intervention; autoregressive diffusion transformer architectures trained on multimodal video and actions are a promising path.
- General world models will turn naive software into systems that understand and interact with the real world, enabling adaptive robots, immersive simulations, new learning tools, and large-scale scientific discovery.
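
To make the second point concrete, here is a minimal sketch of what "causal and action-conditioned" means in practice: the model predicts one state at a time, each prediction is fed back in as the input for the next step, and changing the actions changes the predicted future. Everything in it is an illustrative assumption rather than anything from the article: `toy_step` is a hand-written point-mass update standing in for a learned model (such as the autoregressive diffusion transformer mentioned above), and the names `rollout`, `State`, and `Action` are hypothetical.

```python
from typing import Callable, List, Sequence

State = List[float]   # hypothetical state layout: [position, velocity]
Action = float        # hypothetical action: a force applied for one time step


def toy_step(state: State, action: Action) -> State:
    """Stand-in for a learned world model: predicts the next state
    from the current state and the action taken (assumed dynamics)."""
    position, velocity = state
    dt = 0.1  # assumed time step
    new_velocity = velocity + action * dt
    new_position = position + new_velocity * dt
    return [new_position, new_velocity]


def rollout(
    step_fn: Callable[[State, Action], State],
    initial_state: State,
    actions: Sequence[Action],
) -> List[State]:
    """Autoregressive rollout: each predicted state becomes the input
    for the next prediction, conditioned on the next action."""
    states = [list(initial_state)]
    for action in actions:
        states.append(step_fn(states[-1], action))
    return states


# Same starting state, two different interventions.
start = [0.0, 0.0]
pushed = rollout(toy_step, start, [1.0] * 5)  # push on every step
idle = rollout(toy_step, start, [0.0] * 5)    # do nothing
```

Comparing `pushed[-1]` with `idle[-1]` shows the two action sequences leading to different predicted futures from the same start. A model that only continues a video without taking actions as input cannot answer that kind of "what if" question.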