TheSequence • 252 implied HN points • 24 Feb 26
- Video generation models increasingly behave like learned physics engines: they pick up object dynamics and interactions from data and use them to predict how a scene evolves.
- OpenAI's Sora marked a turning point by framing video models as world simulators, shifting the focus from generating pixels to building data-driven models of physical reality.
- This shift is enabled by architectures like diffusion transformers, which use a transformer as the denoising network inside a diffusion process so that attention can capture complex spatiotemporal dynamics.
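
To make the diffusion transformer idea concrete, here is a minimal NumPy sketch of one training step. It is an illustration under stated assumptions, not the architecture of Sora or any published model: the function names (`patchify`, `add_noise`, `denoiser`), the patch sizes, and the single attention layer are all hypothetical, and real systems add timestep conditioning, positional embeddings, a learned latent space, and many stacked blocks.

```python
import numpy as np

def patchify(video, pt, ph, pw):
    """Split a (T, H, W, C) video into flattened spacetime patches (tokens)."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)    # patch-grid axes first, patch contents last
    return x.reshape(-1, pt * ph * pw * C)  # (num_tokens, token_dim)

def add_noise(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention: every spacetime patch attends to every other."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def denoiser(tokens, params):
    """Toy noise predictor: one residual attention layer plus a linear head."""
    h = tokens + self_attention(tokens, params["wq"], params["wk"], params["wv"])
    return h @ params["w_out"]

rng = np.random.default_rng(0)
video = rng.standard_normal((4, 8, 8, 3))      # stand-in for 4 frames of 8x8 RGB
tokens = patchify(video, pt=2, ph=4, pw=4)     # 8 tokens, each of dimension 96
d = tokens.shape[1]
params = {name: rng.standard_normal((d, d)) * 0.02
          for name in ("wq", "wk", "wv", "w_out")}

noisy, eps = add_noise(tokens, alpha_bar=0.5, rng=rng)
pred = denoiser(noisy, params)
loss = float(np.mean((pred - eps) ** 2))       # noise-prediction (MSE) objective
```

Because attention runs over patches drawn from both space and time, a single layer can relate an object's position in one frame to its position in another, which is the property the spatiotemporal claim above relies on.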