Plus more about BitNet b1.58 2B4T Technical Report and ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Plus more about One-Minute Video Generation with Test-Time Training and Gaussian Mixture Flow Matching Models
Plus more about Inference-Time Scaling for Generalist Reward Modeling and Why do LLMs attend to the first token?
Plus more about Defeating Prompt Injections by Design and Reasoning to Learn from Latent Thoughts
Plus more about RWKV-7 "Goose" with Expressive Dynamic State Evolution and Measuring AI Ability to Complete Long Tasks
Plus more about Generalized Kullback-Leibler Divergence Loss and Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Plus more about Optimal Hyperparameter Scaling Law in Large Language Model Pretraining and PokéChamp: an Expert-level Minimax Language Agent
Plus more about SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution and Reasoning with Latent Thoughts: On the Power of Looped Transformers
Plus more about Mixture of Block Attention for Long-Context LLMs and Idiosyncrasies in Large Language Models
Plus more about Continuous Concepts (CoCoMix) and Distillation Scaling Laws
Plus more about OmniHuman-1 and Simple test-time scaling
Plus more about Supervised Fine-Tuning (SFT) vs Reinforcement Learning (RL) and Janus-Pro