The hottest Large Models Substack posts right now

A new wave of flagship open-weight models from Chinese labs (like Qwen 3.5, GLM-5, MiniMax-M2.5, and StepFun) is pushing architectures such as MoE and hybrid dense variants, and many releases are multimodal with reasoning enabled by default.
Adoption patterns are surprising: a normalized metric shows unexpected winners and losers — some smaller or open-source models (e.g., GPT-OSS, Kimi K2, OCR models) have very high early adoption while notable releases like DeepSeek V3.2 have underperformed.
The ecosystem is maturing and commercializing — demand has already driven price increases for large models, smaller models can rival much larger ones on benchmarks, and there’s rising focus on agentic reasoning plus long-context and sparse-attention capabilities.

Arcee released Trinity-Large-Preview, an ultra-sparse MoE with 400B total parameters and about 13B active parameters, plus a public tech report and base models.
LiquidAI’s LFM2.5-1.2B-Instruct punches above its size, often matching larger models in tests and coming with Japanese, vision, and audio variants.
Kimi-K2.5 is a multimodal model built by continual pretraining (15T tokens); it is cheaper and stronger on coding and agent tasks, though its writing quality has slipped compared to earlier K2 models.
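
For a sense of how sparse "ultra-sparse" is, the Trinity-Large-Preview figures quoted above (400B total, about 13B active) can be turned into a per-token active fraction. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the 13B figure is the approximate value given in the summary.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a sparse MoE's parameters that are used for each token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Approximate figures from the summary above (assumed, not exact).
TOTAL_PARAMS = 400e9
ACTIVE_PARAMS = 13e9

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{frac:.2%} of parameters active per token")
```

Roughly one parameter in thirty is touched per token, which is why per-token compute for such a model is closer to that of a 13B dense model than a 400B one, while memory requirements still scale with the full 400B.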

AI is shifting from manual 'vibe coding' to agentic engineering, where models autonomously plan, navigate large codebases, run tests, and iteratively fix bugs over long time horizons.
GLM-5 is an impressive open-source model that scales a mixture-of-experts architecture to 744 billion parameters and showcases strong systems engineering to handle that scale.
Enabling agentic behavior requires rethinking how models reason, supporting very large context windows, and robust reinforcement-learning alignment; GLM-5 tackles these core bottlenecks.

The rapid advance of AI has led to a surge in building intelligent computing centers.
Many new intelligent computing centers sit idle much of the time, which is a major problem given their high operating costs.
Resolving the idling problem depends on developing effective large-model applications.

Major media companies are making equity and licensing deals with AI labs so their characters and franchises can be used inside consumer AI products.
As model quality improvements become harder for users to notice, AI firms are increasingly buying exclusive IP and data access instead of just chasing benchmark gains.
Those exclusive IP deals can shut rivals out and reshape streaming and studio battles, turning content ownership into a strategic moat for consumer AI.

AI is moving to an agent-first model where LLMs act as operators for long-running, multi-step workflows, improving planning, tool use, and end-to-end task completion.
Open-weight and deployable model families are maturing, letting teams host, fine-tune, and run agentic coding and workflow assistants on their own infrastructure.
Compute and energy limits are now a primary bottleneck, driving investment in efficient architectures like MoEs, distillation, edge inference, and new hardware approaches.

GLM-4.7 is built to act like an "employee" rather than a chatty companion, prioritizing reliable task execution over conversational flair.
Its architecture—mixing a mixture-of-experts design with a "Preserved Thinking" approach—is optimized for long-context loops, terminal error recovery, and stateful reasoning to handle real-world workflows.
As an open-weight model focused on engineering and autonomous workflows, it’s positioned to become a standard choice for software development and task automation in 2026.

Gemini Deep Think is a “thinking layer” added on top of large multimodal models that turns a mixture-of-experts model into a coordinated swarm of small reasoning agents.
It runs parallel, coordinated inference-time processes, which let it solve very hard problems and achieve state-of-the-art results on benchmarks like Olympiad-level math.
The key insight is that how you use compute at inference time matters as much as raw parameter count, pushing future model design toward dynamic runtime strategies.
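
One simple way to see why inference-time compute matters is parallel sampling with a majority vote: run several independent attempts and keep the most common answer. The sketch below is a toy illustration of that general idea, not a description of how Gemini Deep Think works. `noisy_solver` is a hypothetical stand-in for a single reasoning run, and its 60% accuracy is an arbitrary assumption.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def noisy_solver(seed: int) -> int:
    """Hypothetical single reasoning run: returns 42 about 60% of the time."""
    rng = random.Random(seed)
    return 42 if rng.random() < 0.6 else rng.choice([41, 43, 44])


def majority_vote(answers):
    """Return the most frequent answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def solve_with_budget(n_samples: int, base_seed: int = 0) -> int:
    """Spend more inference-time compute by sampling n_samples runs in parallel."""
    seeds = range(base_seed, base_seed + n_samples)
    with ThreadPoolExecutor(max_workers=8) as pool:
        answers = list(pool.map(noisy_solver, seeds))
    return majority_vote(answers)


if __name__ == "__main__":
    for budget in (1, 5, 25):
        print(budget, solve_with_budget(budget))
```

Because wrong answers are spread across several values while the right one is concentrated, the vote tends to become more reliable as the sample budget grows, even though the underlying solver is unchanged.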