🚨This week’s top AI/ML research papers - Oct 26th
(Oct 19 ~ Oct 26, 2024)
Sparse Crosscoders
Rethinking Softmax
ZIP-FIT
Mechanistic Unlearning
Decomposing The Dark Matter of Sparse Autoencoders
Automatically Interpreting Millions of Features in Large Language Models
Breaking the Memory Barrier
Can Knowledge Editing Really Correct Hallucinations?
Framer: Interactive Frame Interpolation
Beyond Position
A Hitchhiker's Guide to Scaling Law Estimation
Scaling up Masked Diffusion Models on Text
Why Does the Effective Context Length of LLMs Fall Short?
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Improve Vision Language Model Chain-of-thought Reasoning
PyramidDrop
FrugalNeRF
SAM2Long
SeerAttention
FiTv2
overview for each + authors' explanations ⬇️
Sparse Crosscoders for Cross-Layer Features and Model Diffing
Overview:
This research introduces "sparse crosscoders," a tool that tracks shared features across layers in neural networks, simplifying feature analysis and model comparisons.
Crosscoders support tracking features across layers and over the course of training, streamline circuit analysis by deduplicating features that persist across layers, and surface features unique to one of two models being compared (model diffing), aiding fine-tuning and architecture studies.
Early results show they outperform per-layer methods in capturing cross-layer structures, though with higher computational cost.
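The core idea, a single sparse feature dictionary shared across layers, can be sketched in a few lines of NumPy. The shapes, initialization, and summed-encoder readout below are illustrative assumptions, not the paper's exact architecture or training objective:

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseCrosscoder:
    """One shared sparse feature dictionary that reads from, and
    reconstructs, the activations of several layers at once."""

    def __init__(self, n_layers, d_model, n_features):
        self.W_enc = [rng.normal(0, 0.02, (d_model, n_features))
                      for _ in range(n_layers)]
        self.W_dec = [rng.normal(0, 0.02, (n_features, d_model))
                      for _ in range(n_layers)]
        self.b_enc = np.zeros(n_features)

    def encode(self, layer_acts):
        # sum per-layer encoder projections, then ReLU for sparsity
        pre = sum(a @ W for a, W in zip(layer_acts, self.W_enc)) + self.b_enc
        return np.maximum(pre, 0.0)

    def decode(self, features):
        # the same feature vector reconstructs every layer's activations
        return [features @ W for W in self.W_dec]

cc = SparseCrosscoder(n_layers=3, d_model=8, n_features=16)
acts = [rng.normal(size=8) for _ in range(3)]     # one activation vector per layer
features = cc.encode(acts)   # shared cross-layer feature activations
recons = cc.decode(features) # one reconstruction per layer
```

A per-layer SAE would instead train an independent dictionary for each layer; the shared dictionary is what lets one feature be followed through the whole stack.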
Blog:
Author's Explanation:
New Anthropic research: Evaluating feature steering.
In May, we released Golden Gate Claude: an AI fixated on the Golden Gate Bridge due to our use of “feature steering”. We've now done a deeper study on the effects of feature steering.
Read the post: anthropic.com/research/evalu…
— Anthropic (@AnthropicAI)
3:47 PM • Oct 25, 2024
---
Rethinking Softmax: Self-Attention with Polynomial Activations
Overview:
This paper reevaluates the role of softmax in attention mechanisms, proposing that its effectiveness is due to implicit regularization of the Frobenius norm of the attention matrix rather than probability distribution generation.
The study introduces polynomial activations as alternatives that similarly regularize the Frobenius norm, showing potential for use in attention-based architectures.
Experiments show these polynomial activations perform on par with or better than softmax across multiple computer vision and language tasks, suggesting promising new directions for attention mechanisms.
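A minimal NumPy sketch contrasting standard softmax attention with a polynomial activation. The cubic power and the 1/n scaling used here to keep the attention matrix's norm bounded are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax_attention(Q, K, V):
    d = Q.shape[-1]
    s = Q @ K.T / np.sqrt(d)
    s -= s.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(s)
    w /= w.sum(axis=-1, keepdims=True)   # rows are probability distributions
    return w @ V

def poly_attention(Q, K, V, p=3):
    n, d = Q.shape
    # no normalization into probabilities; instead divide by n so the
    # Frobenius norm of the attention matrix stays bounded (this exact
    # scaling is an assumption for the sketch)
    w = (Q @ K.T / np.sqrt(d)) ** p / n
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out_soft = softmax_attention(Q, K, V)
out_poly = poly_attention(Q, K, V)
```

Note the polynomial variant drops the row-stochastic "probability" interpretation entirely, which is exactly the paper's point: the regularization, not the distribution, is what matters.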
Paper:
---
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Overview:
ZIP-FIT introduces a data selection framework using gzip compression to evaluate alignment between training data and target task distribution, optimizing LLM performance for tasks like Autoformalization and Python code generation.
It significantly outperforms established baselines such as DSIR and D4, achieving up to 85.1% faster convergence in cross-entropy loss and selecting data up to 65.8% faster.
The approach highlights that smaller, well-aligned datasets can surpass larger, less targeted ones, emphasizing the importance of quality data over quantity.
The research underscores the efficacy of compression-based task alignment in enhancing domain adaptation and model learning efficiency.
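The compression-based alignment idea can be illustrated with stdlib gzip and the classic Normalized Compression Distance; the paper's exact scoring function may differ, so treat this as a hedged sketch of the principle:

```python
import gzip

def gz_len(s: str) -> int:
    """Compressed size in bytes under gzip."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance: lower = better aligned,
    because shared structure compresses away when concatenated."""
    cx, cy = gz_len(x), gz_len(y)
    return (gz_len(x + y) - min(cx, cy)) / max(cx, cy)

def select_top_k(pool, target, k):
    """Keep the k candidates closest to the target distribution."""
    return sorted(pool, key=lambda c: ncd(c, target))[:k]

target = "def add(a, b):\n    return a + b\n"   # target-task sample (code)
pool = [
    "def mul(a, b):\n    return a * b\n",       # aligned: same style as target
    "The quick brown fox jumps over the lazy dog.",
    "Colorless green ideas sleep furiously.",
]
best = select_top_k(pool, target, k=1)          # picks the code-like example
```

No embeddings or GPUs are involved, which is where the selection-speed advantage over embedding-based methods like DSIR comes from.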
Paper:
Author's Explanation:
🚨 What’s the best way to select data for fine-tuning LLMs effectively?
📢Introducing ZIP-FIT—a compression-based data selection framework that outperforms leading baselines, achieving up to 85% faster convergence in cross-entropy loss, and selects data up to 65% faster.
🧵1/8 x.com/i/web/status/1…
— Elyas Obbad (@ObbadElyas)
5:35 PM • Oct 25, 2024
---
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Overview:
This paper explores the enhancement of knowledge editing and unlearning in LLMs through mechanistic interpretability, focusing on identifying specific model components (circuits) linked to particular mechanisms.
It finds that robustness differs significantly depending on how the trained components are localized, particularly between methods that merely preserve model outputs and those that identify mechanisms with predictable intermediate states.
Localizing edits to components linked with lookup-table mechanisms for factual recall enhances robustness across input/output formats and reduces side effects, outperforming baselines on datasets like sports facts and CounterFact.
Additionally, certain localized edits more effectively disrupt latent knowledge, making unlearning more resilient to attacks.
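The "edit only the localized components" step amounts to gradient masking: apply the editing objective's update only to parameters the localization method flagged, leaving everything else frozen. The parameter names and the plain SGD step below are hypothetical, for illustration only:

```python
import numpy as np

def localized_update(params, grads, localized, lr=1e-2):
    """Gradient step applied only to components flagged by localization;
    all other parameters stay frozen."""
    return {name: (p - lr * grads[name]) if name in localized else p
            for name, p in params.items()}

params = {
    "mlp.5.W_out": np.ones((4, 4)),   # hypothetical: flagged by localization
    "attn.3.W_q": np.ones((4, 4)),    # hypothetical: not flagged, stays frozen
}
grads = {k: np.full_like(v, 0.5) for k, v in params.items()}
edited = localized_update(params, grads, localized={"mlp.5.W_out"})
```

The paper's contribution is in *which* components end up in the `localized` set: targeting the lookup-table-style factual-recall components makes the resulting edit far harder to undo.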
Paper:
Author's Explanation:
We used mechanistic interpretability to supercharge model editing and unlearning! 🚀 Our new method removes unwanted knowledge more effectively and prevents it from coming back. With @aaquib_syed1, @abhayesian, @aidanprattewart, @gkdziugaite
— Phillip Guo (@phuguo)
1:40 AM • Oct 21, 2024
---