You can train OpenClaw just by talking to it?

and more about GLM-OCR, pre-pre-training on NCA, IndexCache, and neural thickets

Mar 10th ~ Mar 17th
#99 Latest AI Research Explained Simply

🗞️ Industry News in 1 Line

  1. ♥ 24k Claude 3.5 models (Opus and Sonnet) now support a 1-million-token context window, allowing users to process large codebases, extensive document sets, and up to 600 images or PDF pages per request. This update is available across all plans and is enabled by default in Claude Code at standard pricing.

  2. ♥ 44k Google Maps has integrated Gemini AI to help you explore and navigate more easily. You can now use a new "Ask Maps" feature to get conversational answers to specific, real-world questions, like finding a place to charge your phone or a well-lit tennis court. Additionally, a new "Immersive Navigation" tool is rolling out, which provides vivid 3D visuals and more detailed route guidance to help you navigate your surroundings with more confidence.

  3. ♥ 5.3k Hermes Agent by Nous Research is an open-source, Python-based tool designed to grow with you by utilizing a multi-level memory system and persistent machine access, similar to OpenClaw. It works across your CLI and various messaging platforms, and offers developers an extensible framework for complex tasks like subagent management and programmatic tool calling.

  4. ♥ 6.2k Covenant-72B is the largest decentralized LLM pre-training run in history: a 72B-parameter model trained across commodity internet connections without centralized clusters or whitelisting. By using innovative techniques like SparseLoCo for bandwidth efficiency and a blockchain-based "Gauntlet" system for validation, the project achieved performance competitive with models trained in traditional data centers. Try it in your browser.

  5. ♥ 19k Yann LeCun’s new startup Advanced Machine Intelligence (AMI) has secured $1.03 billion in one of the largest seed rounds in history to develop AI systems capable of advanced reasoning, persistent memory, and world-model understanding. These models are designed to understand the physical world while featuring persistent memory and the ability to reason, plan, and operate safely.

Intuitive AI Academy - NEW Distillation Chapter!

My latest project, Intuitive AI Academy, is the perfect starting point for you! We focus on building your intuition for LLMs, from transformer components to post-training logic, all in one place.

We just added a new chapter on Distillation too!

We currently have an early bird offer: early users get 40% off the yearly plan.

Use code: TIMELINE

OpenClaw-RL: Train Any Agent Simply by Talking

Wang et al. [Princeton University]

♥ 674   RL  

Every time we interact with an AI, it receives immediate feedback: a follow-up question, a software error, or a screen transition. Existing AI systems throw this valuable experience away. They treat our replies merely as context for their very next move, missing a massive opportunity to learn. Researchers wanted to solve this by capturing these everyday reactions, what they call next-state signals, and turning them into a live, continuous learning stream.

Their insight is that an AI can improve simply by being used: ordinary conversations and software tasks become a seamless training loop, with no pauses for offline updates. To realize this, the researchers built a unified framework called OpenClaw-RL.

The AI receives two powerful forms of information. First, there are evaluative signals, which act like a simple score indicating whether an action succeeded or frustrated the user. Second, there are directive signals. When a user corrects an AI by explaining how it should have responded, or when a software tool outputs a detailed error, it provides a clear map for improvement.

The framework extracts these specific textual hints to give the AI rich, word-by-word guidance. Because the system is built asynchronously, the AI can chat with a user, a background judge can evaluate its performance, and a training engine can update the AI's core behavior all at the exact same time, without ever interrupting the workflow.
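To make the two signal types concrete, here is a toy sketch of turning a "next-state" user reply into an evaluative score plus an optional directive hint. The function name and the keyword heuristics are hypothetical illustrations, not the paper's actual extraction method, which the authors describe only at a high level.

```python
# Toy sketch: convert a follow-up message (the "next state") into
# training signals. The keyword lists are made-up stand-ins for a
# learned judge.

def extract_signals(next_state: str) -> dict:
    """Split a follow-up message into an evaluative score and a directive hint."""
    text = next_state.lower()
    # Evaluative signal: a coarse success/frustration score.
    if any(w in text for w in ("thanks", "perfect", "great")):
        reward = 1.0
    elif any(w in text for w in ("wrong", "error", "no,")):
        reward = -1.0
    else:
        reward = 0.0
    # Directive signal: explicit corrections carry word-by-word guidance.
    directive = None
    if "should have" in text or "instead" in text:
        directive = next_state  # keep the full correction as a revision target
    return {"reward": reward, "directive": directive}

batch = [
    "Thanks, that's perfect!",
    "Wrong. You should have used the CSV export instead.",
]
signals = [extract_signals(s) for s in batch]
```

In the asynchronous design described above, a background judge would produce signals like these while the chat continues, and a separate training engine would consume them to update the model.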

By combining basic scoring with these rich textual hints, researchers found that personal assistants rapidly adapt their tone, becoming much more natural after just a handful of conversations.

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Gan and Isola [MIT CSAIL]

♥ 818   Pretraining   bycloud’s pick  

Teaching an AI a new skill feels like searching for a microscopic needle in a haystack. Researchers have long believed that adapting a massive, billion-parameter model required highly complex, meticulous, step-by-step mathematical adjustments just to find a version of the system that performed a specific task well. Blindly guessing the right settings was considered out of the question.

However, this study discovered that as models grow larger and undergo extensive initial training, adapting them to specific tasks like mathematical reasoning or coding no longer requires such rigid, painstaking effort. The heavy lifting of learning has already been done, signaling an exciting era where customizing powerful technology is becoming surprisingly natural, fast, and accessible.

The researchers found that scaling up these models fundamentally transforms their underlying structure. Instead of a desolate landscape where a good solution is a lone needle, massive models are surrounded by a dense, flourishing "thicket" of specialized solutions.

By making random, tiny adjustments to the model's underlying numbers, the researchers uncovered an abundance of hidden specialists waiting nearby. One random tweak might produce an expert in creative writing, while another creates a brilliant chemist, each specializing in one area while forgetting others.

To harness this rich diversity, the team tested a beautifully simple approach: they generated thousands of random tweaks simultaneously, kept the top performers for a specific task, and had them vote on the final answer. This parallel guess-and-check strategy matched the accuracy of today’s most advanced training methods but operated in a fraction of the time because it avoided slow, sequential updates.
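The guess-and-check strategy can be sketched on a toy problem. The snippet below perturbs a "pretrained" linear classifier with random noise, keeps the top scorers on a task, and has them vote; every number here (perturbation scale, population size, top-k) is an illustrative assumption, not a value from the paper.

```python
import numpy as np

# Toy illustration of parallel guess-and-check: sample many random
# tweaks around pretrained weights, keep the best, and ensemble them.

rng = np.random.default_rng(0)

# "Pretrained" linear classifier near, but not at, the task optimum.
w_pretrained = np.array([0.9, -0.2])
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.0, 0.5]) > 0).astype(int)   # hidden task rule

def accuracy(w):
    return ((X @ w > 0).astype(int) == y).mean()

# Thousands of tiny random tweaks, scored in parallel (no gradients).
candidates = w_pretrained + 0.3 * rng.normal(size=(1000, 2))
scores = np.array([accuracy(w) for w in candidates])

# Keep the top "experts" and combine them by majority vote.
experts = candidates[np.argsort(scores)[-16:]]
votes = (X @ experts.T > 0).mean(axis=1) > 0.5
ensemble_acc = (votes.astype(int) == y).mean()
```

The point of the toy: because good solutions are dense around the starting weights, blind sampling plus selection is enough, with no sequential gradient updates required.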

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Bai et al. [Tsinghua University, Z.ai]

♥ 540   Attention  

AI models use an attention mechanism to connect different pieces of information, but as the text gets longer, this demands a staggering amount of computing power. Developers recently introduced a clever shortcut called sparse attention, which uses a specialized "indexer" tool to scan the text and pick only the most relevant words for the AI to focus on at each step. It is a brilliant fix, but researchers ran into a new bottleneck. This indexer operates independently at every single layer of the AI network. As the text grows, just running this indexer consumes a massive chunk of the system's processing time, slowing everything down.

Side-by-side comparison of inference loops.

Looking closely at how these models process data, researchers noticed an incredible inefficiency. They discovered that consecutive layers of the AI were repeatedly selecting almost the exact same important words, often with a near-perfect overlap. To solve this, the researchers developed an elegant, hopeful solution called IndexCache.

Instead of forcing every layer to do the heavy lifting of scanning and selecting information, IndexCache designates a few specific layers as "Full" layers to run the indexer. The remaining layers become "Shared" layers, which simply borrow the selected words from the nearest Full layer. The team created two ways to apply this: a training-free method that calculates the absolute best pattern of Full and Shared layers for existing models, and a training-aware method that actively teaches the AI to share this data.
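The Full/Shared pattern can be sketched in a few lines. In this illustration the "indexer" is a stand-in that scores keys by dot product with the query, and the layer placement and top-k size are made-up values; the real system learns or searches for the best layout.

```python
import numpy as np

# Minimal sketch of IndexCache's Full/Shared layer pattern: only a few
# layers run the indexer, the rest reuse the cached token selection.

rng = np.random.default_rng(0)
n_layers, seq_len, d, k = 8, 512, 64, 32
full_layers = {0, 4}                            # only these run the indexer

def run_indexer(query, keys, k):
    scores = keys @ query                       # relevance of each token
    return np.argsort(scores)[-k:]              # indices of the top-k tokens

cached = None
selected_per_layer = []
for layer in range(n_layers):
    query = rng.normal(size=d)
    keys = rng.normal(size=(seq_len, d))
    if layer in full_layers:
        cached = run_indexer(query, keys, k)    # Full layer: compute and cache
    selected_per_layer.append(cached)           # Shared layer: reuse the cache

# Sparse attention would now attend only over selected_per_layer[layer].
```

With 2 Full layers out of 8, only a quarter of the indexer calls remain, which is exactly the 75 percent reduction the section below describes.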

Training-free IndexCache at 1/2, 1/4, and 1/8 indexer retention. ‘Long’ and ‘G&R’ aggregate benchmark scores.

By simply reusing this cached information, researchers eliminated 75 percent of the heavy indexer computations with negligible drops in the model's reasoning quality. When tested on massive systems, this straightforward change nearly doubled the speed at which the AI reads information and significantly accelerated its ability to generate answers.

Training Language Models via Neural Cellular Automata

Lee et al. [MIT, Improbable AI Lab]

♥ 1.5k   LLM Sampling  

LLMs rely on massive amounts of human-written text to learn how to reason and communicate. However, this approach faces a looming wall: high-quality human data is finite, often riddled with biases, and mixes actual reasoning with messy, subjective language. The researchers investigated whether models could learn the fundamental mechanics of reasoning by training on purely synthetic, non-linguistic data before ever seeing a human sentence.

To test this, researchers turned to neural cellular automata (NCA), algorithmic systems that generate complex, ever-changing grid patterns using simple, local rules. Unlike static text, these patterns can be generated cheaply and in infinite supply, allowing for precise control over the "complexity" of the training data.
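To show what "complex grid patterns from simple, local rules" means, here is a tiny generator. The paper uses neural cellular automata; the classic rule-110 elementary automaton below is just an easy-to-verify proxy for the same idea, and the serialization into a flat token stream is an illustrative assumption.

```python
# A stand-in for the synthetic pretraining data: evolve a 1-D grid with
# a fixed local rule, then flatten the trajectory into tokens.

RULE = 110  # each cell's next value depends only on itself and two neighbours

def step(cells):
    """Apply the local rule to every cell (wrap-around boundaries)."""
    n = len(cells)
    return [
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def trajectory(width=16, steps=8):
    cells = [0] * width
    cells[width // 2] = 1                  # single seed cell
    frames = [cells]
    for _ in range(steps):
        frames.append(step(frames[-1]))
    return frames

# Flatten the grid trajectory into a cheap, infinitely renewable token stream.
tokens = [c for frame in trajectory() for c in frame]
```

Because the rule and grid size are parameters, the "complexity" of the generated data can be dialed up or down, which is the control knob the next paragraphs describe.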

The team pre-trained models on these synthetic grid trajectories, then followed up with standard training on natural language. The results show that models that practiced on just 164 million synthetic tokens learned faster and performed better than those trained solely on significantly larger amounts of traditional internet text.

While traditional text training can cause a model to rely on human biases or semantic shortcuts, the synthetic grids force the model to focus purely on tracking long-range patterns and inferring underlying rules. Furthermore, the researchers found they could tune the complexity of this synthetic data to match specific domains; for instance, code benefited from simpler rules, while math and web text thrived on higher complexity.

GLM-OCR Technical Report

Duan et al. [Zhipu AI, Tsinghua University]

♥ 1.1k   LLM RL  

Modern information systems rely heavily on extracting knowledge from complex, visually dense documents like financial reports, invoices, and scientific papers. While recent multimodal AI models have improved how we read these documents, they often suffer from a major drawback: their massive size makes them slow, memory-intensive, and difficult to deploy in practical, real-world settings.

Architecture and workflow of the GLM-OCR framework

The researchers developed GLM-OCR, a lightweight, highly optimized framework that packs significant power into a compact 0.9-billion-parameter design. The system merges a specialized visual encoder with a streamlined language decoder. What makes it particularly clever is the shift away from standard “one-token-at-a-time” generation, which is notoriously slow for structured documents.

Instead, the researchers implemented a Multi-Token Prediction mechanism that allows the model to predict several tokens simultaneously. By using a shared-parameter scheme to keep memory usage low, the model effectively boosts throughput, the speed at which it processes data, by roughly 50% without sacrificing accuracy.
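The shared-parameter idea can be sketched as one output projection reused for every predicted position. The per-position shift vectors below are a hypothetical way to keep the heads distinct without duplicating parameters; the report's actual head design may differ.

```python
import numpy as np

# Sketch of multi-token prediction with a shared-parameter head: one
# projection matrix serves all predicted positions, keeping memory low.

rng = np.random.default_rng(0)
d_model, vocab, n_predict = 32, 100, 4

W_out = rng.normal(size=(d_model, vocab))        # shared output projection
shifts = rng.normal(size=(n_predict, d_model))   # small per-position parameters

def predict_block(hidden):
    """Predict n_predict tokens at once from a single hidden state."""
    logits = (hidden + shifts) @ W_out           # (n_predict, vocab)
    return logits.argmax(axis=-1)

hidden = rng.normal(size=d_model)
tokens = predict_block(hidden)                   # several tokens per decode step
```

Emitting several tokens per decoding step is what drives the roughly 50% throughput gain: the expensive trunk runs once per block instead of once per token.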

To handle real-world complexities, the system employs a two-stage pipeline. First, it uses an analysis module to detect the layout of a document, breaking a complex page into manageable regions. These regions are then processed in parallel, allowing for faster and more robust recognition of everything from handwritten text to complicated table structures.
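The two-stage pipeline can be sketched as layout detection followed by parallel recognition. Both functions below are placeholders for the models described in the report: the "layout detector" just splits on a delimiter and the "recognizer" uppercases its input, purely to make the dataflow concrete.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the two-stage pipeline: split a page into regions, then
# recognize the regions in parallel.

def detect_layout(page):
    # Stand-in: a real layout model would return bounding boxes per region.
    return [{"kind": "text", "crop": part} for part in page.split("|")]

def recognize(region):
    return region["crop"].strip().upper()        # stand-in for the OCR model

def ocr_page(page):
    regions = detect_layout(page)
    with ThreadPoolExecutor() as pool:           # regions processed in parallel
        return list(pool.map(recognize, regions))

result = ocr_page("invoice no. 42 | total: 17.50 | handwritten note")
```

Because each region is independent once the layout is known, recognition parallelizes cleanly, which is where the robustness and speed on complex pages comes from.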
