Squirrel Squadron Substack • 3 implied HN points • 04 Feb 26
- Compression works by removing redundancy to make data smaller; lossless compression preserves every bit while lossy methods discard detail, and truly random data resists any meaningful shrinking. Recompressing already-compressed data rarely helps and can make files bigger, because the first pass has already removed most of the redundancy; there are strict limits to how far data can be compressed.
- Information theory sets hard limits on compression, and Kolmogorov complexity measures the information in a piece of data as the length of the shortest program that reproduces it. Effective compression depends on clever representations and adaptive algorithms that capture structure in the data.
- Large language models behave like powerful compression-and-prediction systems that build compact internal models by learning to predict the next token. This predictive compression explains much of their useful, seemingly intelligent behavior and their value as productivity tools, even if they are not human thinkers.
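
The claims in the bullets above can be checked with a small experiment. This is a minimal sketch, not code from the original post. The toy vocabulary, the fixed seed, the choice of `zlib`, and the helper names `ratio` and `predictive_bits` are all assumptions made for illustration. It compresses redundant text and random bytes, recompresses the already-compressed text, and then estimates how many bits a simple adaptive next-byte predictor would need. That predictor is a very rough stand-in for the "predict the next token" idea and bears no resemblance to a real language model.

```python
import math
import random
import zlib
from collections import defaultdict


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


def predictive_bits(data: bytes) -> float:
    """Ideal code length, in bits, under an adaptive order-1 byte model.

    Each byte is charged -log2 p(byte | previous byte), where p comes from
    the counts seen so far with add-one smoothing over 256 byte values.
    An arithmetic coder driven by the same model would come close to this.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][cur]
    totals = defaultdict(int)                       # totals[prev]
    bits = 0.0
    prev = 0  # arbitrary starting context
    for cur in data:
        p = (counts[prev][cur] + 1) / (totals[prev] + 256)
        bits -= math.log2(p)
        counts[prev][cur] += 1  # learn only after predicting
        totals[prev] += 1
        prev = cur
    return bits


rng = random.Random(0)  # fixed seed so runs are repeatable

# Hypothetical toy vocabulary: lots of redundancy, little real information.
vocab = ["squirrel", "acorn", "tree", "branch", "nest", "winter", "stash", "leap"]
text = " ".join(rng.choice(vocab) for _ in range(20_000)).encode()

# Random bytes of the same length: no structure for a compressor to exploit.
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)  # recompress the already-compressed output

text_ratio = ratio(text)
noise_ratio = ratio(noise)
recompress_ratio = len(twice) / len(once)
text_bits_per_byte = predictive_bits(text) / len(text)
noise_bits_per_byte = predictive_bits(noise) / len(noise)

print(f"redundant text, zlib ratio:   {text_ratio:.3f}")
print(f"random bytes, zlib ratio:     {noise_ratio:.3f}")
print(f"second pass vs first pass:    {recompress_ratio:.3f}")
print(f"predictor on text, bits/byte: {text_bits_per_byte:.2f}")
print(f"predictor on noise, bits/byte:{noise_bits_per_byte:.2f}")
```

Under these assumptions, the redundant text should shrink substantially, the random bytes should come out slightly larger than they went in, and the second compression pass should gain little or nothing. The predictor should need far fewer than 8 bits per byte on the text and roughly 8 on the noise, which is the sense in which better prediction means better compression.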