plus more about Self-distilled Agentic RL, Embedded Language Flows, and Negation Neglect
plus more on Sparser, Faster, Lighter Transformer LMs, Manifold Steering, and Teaching Claude Why
can't believe they removed this paper unknowingly
plus more about Hyperloop Transformer, Qwen-3.5 Omni, and Scaling Self-Play with Self-Guidance
plus more about Looped Transformers, Nexus, and RNN with Memory
plus more about In-Place TTT, TriAttention, and Interleaved Head Attention
plus more on Path-Constrained MoE, HISA, and Screening is not enough
plus more on Claudini, Composer 2, and self-distillation
plus more about V-JEPA 2.1, Mamba 3, and latent planning
and more about GLM-OCR, pre-pre-training on NCA, IndexCache, and neural thickets
and more about Speculative Speculative Decoding, SWE-CI, and Beyond Language Modeling
plus more on Learning Without Training and The Geometry of Noise