Google Presents A Brand New Way To Train Latents

plus more about Experiential RL, GLM-5 Report, and Attention Matching

Feb 19th ~ Feb 24th
#96 Latest AI Research Explained Simply

🗞️ Industry News in 1 Line

  1. ♥ 8.2k Replit has introduced Replit Animation, a new tool that lets users create animated videos in minutes using conversational prompts. It is powered by Gemini 3.1 Pro and makes it easier to produce polished, shareable content without traditional editing software.

  2. ♥ 22k Anthropic has released Claude Sonnet 4.6, which comes with stronger performance in complex spreadsheet tasks, multi-step web forms, and expanded integrations through Excel MCP connectors. The Claude API now also supports more accurate web search, dynamic filtering, and general availability of code execution, memory, and programmatic tool use.

  3. ♥ 49k Anthropic has announced Claude Code Security to scan codebases for vulnerabilities and suggest targeted patches for human review. This release triggered a sharp selloff in cybersecurity stocks, with companies like JFrog, CrowdStrike, Okta, and Cloudflare all recording declines. The system uses reasoning to trace data flows and flag subtle errors, while enforcing a human-in-the-loop safeguard to ensure developer oversight. If you want to try it, then join the waitlist here.

Learn LLMs Intuitively - Intuitive AI Academy

Want to learn about LLMs but never had a good place to start?

My latest project, Intuitive AI Academy, has the perfect starting point for you! We focus on building your intuition for LLMs, from transformer components to post-training logic. All in one place.

content overview

We currently have an early bird offer: early users get 40% off the yearly plan.

Use code: TIMELINE

Unified Latents (UL): How to train your latents

Heek [Google DeepMind Amsterdam]

♥ 2.1k   ALDiffusion  

To create high-quality images and videos efficiently, models usually compress data into a "latent" space (a digital shorthand that represents the original image in a smaller, more manageable package). If you compress the data too much, the AI loses fine details like textures and sharp edges; if you don't compress it enough, the AI becomes incredibly slow and expensive to train.

Schematic overview of the model, including the encoder, the prior latent diffusion model, and the diffusion decoder.

Traditional methods often rely on manual tuning to strike this balance, which is more art than science. Researchers have recently introduced a framework called Unified Latents (UL) to turn this guesswork into a systematic, more efficient process. By co-training the compression and generation steps together, they have found a way to maintain stunningly high-quality details while actually lowering the computational cost required for training.

Instead of treating the compressed representation as a static container, Unified Latents uses a "diffusion prior" to monitor and regularize the information flow. By linking the noise level produced during the encoding process directly to the precision of the diffusion model, researchers could create a mathematically tight way to control the latent bitrate.

A selection of samples from a text-to-image model trained with Unified Latents

They also paired this with a diffusion-based decoder, which is remarkably better at reconstructing high-frequency details than previous methods. This unified approach allows the system to navigate the trade-off between compression and quality with much greater precision.
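The bitrate-control idea can be sketched numerically. As a simplification, assume the encoder injects Gaussian noise with a learned scale `enc_sigma` and the prior is a standard normal (a stand-in for the paper's learned diffusion prior); the latent rate then reduces to a Gaussian KL, and the encoder's noise level becomes a direct knob on how many bits the latent carries:

```python
import numpy as np

def gaussian_kl(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    # Per-dimension KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) in nats.
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)

def latent_bitrate(mu, enc_sigma):
    # Rate of noisy latents z = mu + enc_sigma * eps against the prior.
    # Raising the encoder noise lowers the bitrate; UL ties this noise
    # level to the precision of the downstream diffusion model.
    return gaussian_kl(mu, enc_sigma).sum() / np.log(2)  # nats -> bits

mu = np.zeros(16)
low_noise = latent_bitrate(mu, 0.3)   # fine detail kept, higher rate
high_noise = latent_bitrate(mu, 0.9)  # heavier compression, lower rate
```

The standard-normal prior here is only illustrative; the point is that the injected noise scale, not a hand-tuned architecture choice, sets the effective latent bitrate.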

Fast KV Compaction via Attention Matching

Zweiger et al. [Massachusetts Institute of Technology]

♥ 196   Attention   bycloud’s pick  

AI models can take on more complex tasks like long-form coding and multi-day conversations, but they face a significant memory hurdle known as the key-value (KV) cache bottleneck. Every word the model processes adds to a digital "short-term memory" that can quickly balloon into several gigabytes of data.

Until now, researchers have managed this by either summarizing the text (which often strips away vital nuances) or by using expensive optimization techniques that take hours of computing time to compress a single document.

This paper introduces a technique called Attention Matching that changes how we think about memory compaction. Instead of relying on slow, iterative training to condense information, this approach treats memory like a mathematical puzzle that can be solved directly in "latent space".

By focusing on how a model "pays attention" to specific pieces of information, the researchers found they could create a compact version of the memory that mimics the original's behavior. They discovered that this problem can be broken down into smaller sub-problems with efficient, closed-form solutions, allowing them to bypass the slow trial-and-error process of traditional machine learning.

Accuracy vs. compaction ratio across methods.

This new framework can shrink a model's memory by up to 50 times in just a matter of seconds, rather than hours, with almost no impact on the quality of the model's output.
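A toy version of the closed-form idea, assuming a single attention head and a fixed set of compact keys `K_c` (the paper's method is more general; here only the compact values are solved for): fitting values so the small cache reproduces the original attention outputs is an ordinary least-squares problem.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def compact_values(Q, K, V, K_c):
    """Solve for compact values V_c so attention over (K_c, V_c)
    mimics attention over the full cache (K, V) for probe queries Q."""
    target = softmax(Q @ K.T) @ V    # original attention outputs
    A_c = softmax(Q @ K_c.T)         # weights over the compact slots
    # Closed-form least squares: argmin_Vc || A_c @ V_c - target ||^2
    V_c, *_ = np.linalg.lstsq(A_c, target, rcond=None)
    return V_c

rng = np.random.default_rng(0)
Q = rng.normal(size=(32, 8))                    # 32 probe queries, head dim 8
K, V = rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
K_c = K[::4]                                    # keep every 4th key: 4x smaller
V_c = compact_values(Q, K, V, K_c)              # solved in one lstsq call
```

Because each head (and here each value column) is an independent least-squares sub-problem, the whole compaction runs in seconds rather than requiring hours of gradient descent.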

Experiential Reinforcement Learning

Shi et al. [KRAFTON, University of Wisconsin–Madison, UC Berkeley, Microsoft Research]

♥ 1k   LLM RL  

Teaching artificial intelligence to navigate complex tasks is often a game of high-stakes guessing. In standard reinforcement learning, a model typically receives a single "reward" signal. This makes it incredibly difficult for the AI to pinpoint exactly where it tripped up or how to adjust its behavior for the next try.

Researchers recognized that this "blind" trial-and-error is far less efficient than how humans naturally learn. When we fail at a task, we don't just try again at random; we stop, reflect on what went wrong, and form a mental plan to do better. To solve this, researchers introduced Experiential Reinforcement Learning (ERL), a new approach designed to turn silent failures into structured, durable lessons.

In Experiential Reinforcement Learning (ERL), instead of learning from feedback or outcomes directly, the model reflects on its failures and consolidates the corrections.

The researchers developed a clever "experience-reflection-consolidation" loop that embeds human-like reasoning directly into the training process. Instead of moving on immediately after an attempt, the model receives feedback from its environment and is prompted to generate a verbal reflection on its own performance.

It looks at its errors and produces a self-critique that guides a refined second attempt at the same task. If this second try succeeds, the model "internalizes" the successful correction. This is the breakthrough moment: by training the model to reproduce the improved behavior from the original task alone, the AI eventually learns to skip the reflection step entirely.
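The experience-reflection-consolidation loop can be sketched with stubbed-out components. The `act`, `reflect`, and `evaluate` callables below are hypothetical stand-ins for the LLM and its environment, not the paper's API:

```python
class ERLTrainer:
    """Minimal sketch of one experience-reflection-consolidation cycle."""

    def __init__(self):
        self.consolidation_buffer = []  # (task, corrected answer) pairs

    def step(self, task, act, reflect, evaluate):
        first = act(task, hint=None)          # initial attempt
        if evaluate(task, first):
            return first                      # success: nothing to correct
        critique = reflect(task, first)       # verbal self-critique
        second = act(task, hint=critique)     # refined retry
        if evaluate(task, second):
            # Consolidate the fix with NO critique attached, so later
            # training reproduces the corrected behavior from the task
            # alone -- the model learns to skip the reflection step.
            self.consolidation_buffer.append((task, second))
        return second

# Toy components: this "model" only succeeds when guided by a critique.
act = lambda task, hint: "right" if hint else "wrong"
reflect = lambda task, attempt: f"'{attempt}' failed; try the opposite"
evaluate = lambda task, answer: answer == "right"

trainer = ERLTrainer()
trainer.step("toy task", act, reflect, evaluate)
```

The key design choice is that the consolidation buffer stores only the task and the corrected behavior, never the critique, which is why the deployed model needs no extra reflection compute.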

Conceptual comparison of learning dynamics in RLVR and Experiential Reinforcement Learning (ERL)

This ensures that the final model is both smarter and faster, maintaining high performance at deployment without any extra computational cost.

In complex multi-step tasks like Sokoban, which require deep planning and spatial reasoning, ERL improved performance by a staggering 81% over standard methods. It also showed reliable gains in agentic reasoning tasks that involve using external tools to answer questions.

Overview of Experiential Reinforcement Learning (ERL).

By allowing the AI to accumulate "corrective knowledge" in a persistent memory, researchers have created a way for models to build on their past successes rather than repeating the same mistakes.

GLM-5: from Vibe Coding to Agentic Engineering

Zhipu AI & Tsinghua University

♥ 290   LLM Agents  

Researchers are working to shift our relationship with AI from "vibe coding" (where humans provide constant prompts) toward a more autonomous era of "agentic engineering." Traditional models often struggle with the sheer computational cost of keeping track of long conversations, or they lose their way during tasks that require hours of planning and execution.

GLM-5 was designed to overcome these bottlenecks by creating a more independent assistant that can plan, implement, and iterate on technical challenges with minimal human intervention.

To make this possible, the researchers introduced a dynamic mechanism called DeepSeek Sparse Attention. Rather than the model straining to analyze every single word in a massive document with equal intensity (which is incredibly expensive and slow), it now identifies which tokens are truly important for the task at hand.

This allows the model to manage massive amounts of data, such as entire codebases or long-term business simulations, while significantly reducing the hardware power required. The model also features an "interleaved thinking" process where it pauses to reason before every action it takes.
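A simplified sketch of the token-selection idea, assuming plain top-k scoring on a single head (the actual mechanism uses a learned selector and differs in detail): the model scores every cached token but only attends to the few that matter.

```python
import numpy as np

def sparse_attention(q, K, V, top_k=8):
    # Score every cached token, then attend only over the top_k most
    # relevant ones instead of the full cache.
    scores = K @ q / np.sqrt(q.shape[-1])
    keep = np.argsort(scores)[-top_k:]        # indices of tokens to keep
    w = np.exp(scores[keep] - scores[keep].max())
    w /= w.sum()                              # softmax over kept tokens
    return w @ V[keep]

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K, V = rng.normal(size=(1024, 8)), rng.normal(size=(1024, 8))
out = sparse_attention(q, K, V, top_k=64)     # touches 64 of 1024 tokens
```

With `top_k` equal to the cache size this reduces to ordinary dense attention; shrinking it trades a little fidelity for a large cut in memory traffic, which is what makes very long contexts affordable.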

It can "preserve" these thoughts across a long conversation, ensuring it doesn't lose its train of thought when moving between different stages of a project. By using a new asynchronous training infrastructure that lets the model learn from complex, long-running tasks, it reached an unprecedented ability to solve end-to-end software engineering challenges.
