Key Technical Breakthroughs
According to official and community technical blogs, V4 introduces several architectural shifts:
- 1 Million Token Context: Both models ship with a 1M-token context window as standard, enabling reasoning over an entire codebase in a single prompt.
- Hybrid Attention: V4 introduces a hybrid system pairing Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), which reportedly lets V4-Pro run at only about 27% of the inference FLOPs of its predecessor, V3.2.
- Engram Memory: This architecture (published in January 2026) enables efficient retrieval from extremely long contexts without a large increase in parameter count.
- Massive Scale: V4-Pro features 1.6 trillion total parameters (49B active), while V4-Flash has 284 billion total (13B active).
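The FLOPs reduction attributed to the hybrid attention system comes down to attending over a compressed key/value set rather than every token. The details of CSA and HCA are not public, so the sketch below shows only the generic idea: mean-pooling K and V in blocks shrinks the attention score matrix, and the function name and block-pooling scheme are illustrative assumptions, not V4's actual mechanism.

```python
import numpy as np

def compressed_attention(q, k, v, block=4):
    """Attention against block-pooled keys/values.

    Mean-pooling K and V in blocks of `block` tokens shrinks the
    score matrix by that factor, cutting attention FLOPs to roughly
    1/block of full attention. This is a generic KV-compression
    illustration, NOT the specific CSA/HCA scheme described above.
    """
    t, d = k.shape
    n = t // block
    kc = k[: n * block].reshape(n, block, d).mean(axis=1)  # (n, d)
    vc = v[: n * block].reshape(n, block, d).mean(axis=1)  # (n, d)
    scores = q @ kc.T / np.sqrt(d)                         # (tq, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax
    return weights @ vc

rng = np.random.default_rng(1)
tq, tk, d = 8, 64, 32
q = rng.standard_normal((tq, d))
k = rng.standard_normal((tk, d))
v = rng.standard_normal((tk, d))

out = compressed_attention(q, k, v, block=4)
print(out.shape)  # (8, 32)
# Score-matrix work drops from tq*tk to tq*(tk/block): 4x fewer here.
```

With `block=1` this degenerates to ordinary softmax attention, which makes the compression factor the single knob trading recall for compute.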
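The total-vs-active parameter split quoted above is the signature of Mixture-of-Experts routing: only a few experts run per token, so active parameters are a small fraction of the total. A minimal top-k routing sketch follows; it is a generic MoE illustration with made-up sizes, not V4's actual router.

```python
import numpy as np

def topk_moe(x, experts, gate_w, k=2):
    """Route a token vector x to the top-k of N expert networks.

    Only k of N experts execute per token, so active parameters are
    roughly k/N of the total -- the same sparsity pattern behind
    "1.6T total / 49B active" style figures. Generic sketch, not
    the actual V4 architecture.
    """
    logits = x @ gate_w                       # (N,) gating scores
    top = np.argsort(logits)[-k:]             # indices of top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
# Each "expert" is a tiny linear layer purely for illustration.
expert_mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_mats]
gate_w = rng.standard_normal((d, n_experts))

y = topk_moe(rng.standard_normal(d), experts, gate_w, k=2)
print(y.shape)            # (16,)
# Active fraction at V4-Pro's reported scale: 49B / 1600B ~ 3%.
print(f"{49/1600:.1%}")   # 3.1%
```

At V4-Flash's reported scale the same arithmetic gives 13B / 284B, or about 4.6% of parameters active per token.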