Plus more about Inference-Time Scaling for Generalist Reward Modeling and Why do LLMs attend to the first token?
Plus more about Defeating Prompt Injections by Design and Reasoning to Learn from Latent Thoughts
Plus more about RWKV-7 "Goose" with Expressive Dynamic State Evolution and Measuring AI Ability to Complete Long Tasks
Plus more about Generalized Kullback-Leibler Divergence Loss and Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Plus more about Optimal Hyperparameter Scaling Law in Large Language Model Pretraining and PokéChamp: an Expert-level Minimax Language Agent
Plus more about SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution and Reasoning with Latent Thoughts: On the Power of Looped Transformers
Plus more about Mixture of Block Attention for Long-Context LLMs and Idiosyncrasies in Large Language Models
Plus more about Continuous Concepts (CoCoMix) and Distillation Scaling Laws
Plus more about OmniHuman-1 and Simple test-time scaling
Plus more about Supervised Fine-Tuning (SFT) vs Reinforcement Learning (RL) and Janus-Pro
Plus more about Transformer² and Kimi k1.5
Plus more about MiniMax-01 and Scaling LLM Test-Time Compute