The AI Timeline

Archive

There Will Be a Scientific Theory of Deep Learning

plus more about Hyperloop Transformer, Qwen-3.5 Omni, and Scaling Self-Play with Self-Guidance

Apr 21, 2026

Kimi Moonshot: Prefill-as-a-Service!?

plus more about Looped Transformers, Nexus, RNN with Memory, and more

by cloud

Apr 14, 2026

Neural Computer: Running an OS within an AI?!

plus more about In-Place TTT, TriAttention, and Interleaved Head Attention.

by cloud

Apr 07, 2026

Embarrassingly Simple Self-Distillation Technique

plus more on Path-Constrained MoE, HISA, and Screening is not enough

by cloud

Mar 31, 2026

LeWorldModel: JEPA but more practical

plus more on Claudini, Composer 2, and self-distillation

by cloud

Mar 25, 2026

Rotate attention by 90 degrees...? Kimi's New Attention Residuals

plus more about V-JEPA 2.1, Mamba 3, and latent planning

by cloud

Mar 17, 2026

You can train OpenClaw just by talking to it?

and more about GLM-OCR, pre-pre-training on NCA, IndexCache, and neural thickets

by cloud

Mar 10, 2026

Flash Attention 4 is nuts

and more about Speculative Speculative Decoding, SWE-CI, and Beyond Language Modeling

by cloud

Mar 04, 2026

Compress Context... Into a LoRA!?

plus more on Learning Without Training and The Geometry of Noise

by cloud

Feb 24, 2026

Google Presents A Brand New Way To Train Latents

plus more about Experiential RL, GLM-5 Report, and Attention Matching

by cloud

Feb 17, 2026

Using Diffusion To Interpret LLMs?! Generative Latent Prior

plus more on Evolving Agents via Recursive Skill-Augmented RL and Low Hanging Fruits in Vision Transformers

by cloud

Feb 10, 2026

New Generative Paradigm: Drifting Model

an insanely big week in AI research

by cloud
