Plus more about Inference-Time Scaling for Generalist Reward Modeling and Why do LLMs attend to the first token?
Plus more about Defeating Prompt Injections by Design and Reasoning to Learn from Latent Thoughts
Plus more about RWKV-7 "Goose" with Expressive Dynamic State Evolution and Measuring AI Ability to Complete Long Tasks
Plus more about Generalized Kullback-Leibler Divergence Loss and Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Plus more about Optimal Hyperparameter Scaling Law in Large Language Model Pretraining and PokéChamp: an Expert-level Minimax Language Agent
Plus more about SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution and Reasoning with Latent Thoughts: On the Power of Looped Transformers
Plus more about Mixture of Block Attention for Long-Context LLMs and Idiosyncrasies in Large Language Models
Plus more about Continuous Concepts (CoCoMix) and Distillation Scaling Laws
Plus more about OmniHuman-1 and Simple test-time scaling
Plus more about Supervised Fine-Tuning (SFT) vs Reinforcement Learning (RL) and Janus-Pro
Plus more about Transformer² and Kimi k1.5
Plus more about MiniMax-01 and Scaling LLM Test-Time Compute