From Memorization to Reasoning in the Spectrum of Loss Curvature and Introducing Nested Learning: A new ML paradigm for continual learning
and more on Kimi Linear, Looped Transformer, How FP16 fixes RL...
How to Compress Long Text into Images to Reduce LLM Tokens, and more
RLM, RAE, Reasoning with Sampling, and more
Plus more about Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences and LLM Fine-Tuning Beyond Reinforcement Learning
Plus more about Polychromic Objectives for Reinforcement Learning and Stochastic activations
Plus more about Thinking Augmented Pre-training and Reinforcement Learning on Pre-Training Data
Plus more about Discovery of Unstable Singularities and AToken: A Unified Tokenizer for Vision
Plus more about Analog in-memory computing attention mechanism for fast and energy-efficient large language models and The Majority is not always right: RL training for solution aggregation
Plus more about Small Language Models are the Future of Agentic AI and Why Do MLLMs Struggle with Spatial Understanding?
Plus more about StepWiser: Stepwise Generative Judges for Wiser Reasoning and Prophesy in LLMs: Diffusion LMs know the answer before decoding
Plus more about Reinforcement Learning with Rubric Anchors and DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization