Scaling Up Diffusion Language Models to 100B, Adding 1 Attention Layer & Making Visual Encoders Generate Images, LayerNorm Is Not Needed in Transformers, and more
Basically recapping what I missed in the last 4 months
PretrainZero, Stabilizing RL with LLMs and more
Breaking down "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"
FreeFlow, DeepSeekMath-V2, Soft Adaptive Policy Optimization, and more
Plus more on Seer, Virtual Width Networks, SAM 3, and Evolution Strategies at the Hyperscale
LeJEPA, The Path Not Taken, and more
From Memorization to Reasoning in the Spectrum of Loss Curvature and Introducing Nested Learning: A new ML paradigm for continual learning
And more on Kimi Linear, Looped Transformers, How FP16 Fixes RL...
How to Compress Long Text into Images to Reduce LLM Tokens, and more
RLM, RAE, Reasoning with Sampling, and more
Plus more about Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences and LLM Fine-Tuning Beyond Reinforcement Learning