Breaking down "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"
FreeFlow, DeepSeekMath-V2, Soft Adaptive Policy Optimization, and more
Plus more on Seer, Virtual Width Networks, SAM 3, and Evolution Strategies at the Hyperscale
LeJEPA, The Path Not Taken, and more
From Memorization to Reasoning in the Spectrum of Loss Curvature and Introducing Nested Learning: A new ML paradigm for continual learning
and more on Kimi Linear, Looped Transformer, How FP16 fixes RL...
How to Compress Long Text into Images To Reduce LLM Tokens and more
RLM, RAE, Reasoning with Sampling, and more
Plus more about Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences and LLM Fine-Tuning Beyond Reinforcement Learning
Plus more about Polychromic Objectives for Reinforcement Learning and Stochastic activations
Plus more about Thinking Augmented Pre-training and Reinforcement Learning on Pre-Training Data
Plus more about Discovery of Unstable Singularities and AToken: A Unified Tokenizer for Vision