plus more about If LLMs Have Human-Like Attributes, Then So Does Age of Empires II, Cosmos 3, and Robots Need More than VLA and World Models
plus more about Bitter Lesson in Data Filtering, Do Language Models Need Sleep, and Neural Weight Norm
plus more on the Benefits of Subword Tokenization, HRM-Text, Probabilistic Tiny Recursive Model, and Vector Policy Optimization
plus more about Self-distilled Agentic RL, Embedded Language Flows, and Negation Neglect
plus more on Sparser, Faster, Lighter Transformer LMs, Manifold Steering, and Teaching Claude Why
can't believe they removed this paper unknowingly
plus more about Hyperloop Transformer, Qwen-3.5 Omni, and Scaling Self-Play with Self-Guidance
plus more about Looped Transformers, Nexus, and RNN with Memory
plus more about In-Place TTT, TriAttention, and Interleaved Head Attention
plus more on Path-Constrained MoE, HISA, and Screening is not enough
plus more on Claudini, Composer 2, and self-distillation
plus more about V-JEPA 2.1, Mamba 3, and latent planning