🚨This week’s top AI/ML research papers - Oct 20th

(Oct 13 – Oct 20, 2024)

🚨This week’s top AI/ML research papers:

  • REPA: Representation Alignment for Generation

  • Sabotage evaluations for frontier models

  • Janus

  • What Matters in Transformers? Not All Attention is Needed

  • The Curse of Multi-Modalities

  • When Attention Sink Emerges in Language Models

  • Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

  • Sample what you can't compress

  • Mix Data or Merge Models? 

  • Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

  • SeedLM

  • LOKI

  • Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

  • Baichuan-Omni Technical Report

  • KV Prediction for Improved Time to First Token

  • Thinking LLMs

  • MoH

  • WorldCuisines

  • Fluid

  • FlatQuant

  • A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

  • Revealing the Barriers of Language Agents in Planning

  • HumanEval-V

  • EvolveDirector

  • Self-Data Distillation for Recovering Quality in Pruned Large Language Models

  • CoTracker3

  • SANA

  • LLM X MapReduce

  • MLLM can see?

  • Animate-X

  • Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

  • Model Swarms

  • Fundamental Limitations on Subquadratic Alternatives to Transformers

  • Inference Scaling for Long-Context Retrieval Augmented Generation

  • Refined LLC

  • TorchTitan

  • You Know What I'm Saying: Jailbreak Attack via Implicit Reference

  • Strong Model Collapse

overview for each + authors' explanations ⬇️ 

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Overview:

This paper introduces REPresentation Alignment (REPA), a regularization technique that enhances diffusion model training by aligning projections of the noisy input's hidden states with clean-image representations from pretrained self-supervised visual encoders.

By incorporating high-quality external visual representations, REPA significantly improves training efficiency and generation quality in diffusion and flow-based transformers such as DiTs and SiTs.

This approach accelerates SiT training by over 17.5 times while achieving state-of-the-art generation results, such as an FID score of 1.42 with classifier-free guidance.
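For intuition, here is a minimal sketch of what a REPA-style alignment term could look like in training code. The projection head, the loss weighting, and the use of plain cosine similarity below are illustrative assumptions, not the paper's exact recipe; `f_clean` stands in for features of the clean image from a frozen pretrained encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepaHead(nn.Module):
    """Small MLP that projects the diffusion transformer's hidden states
    onto the frozen encoder's feature dimension (illustrative shape)."""
    def __init__(self, hidden_dim: int, target_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, target_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # (B, N, hidden_dim)
        return self.proj(h)

def repa_alignment_loss(h_noisy: torch.Tensor,
                        f_clean: torch.Tensor,
                        head: RepaHead,
                        lam: float = 0.5) -> torch.Tensor:
    """Negative patch-wise cosine similarity between projected noisy-input
    states and frozen clean-image features (higher similarity -> lower loss)."""
    z = head(h_noisy)                              # (B, N, D)
    cos = F.cosine_similarity(z, f_clean, dim=-1)  # (B, N)
    return -lam * cos.mean()

# Usage sketch: total loss = standard denoising objective + alignment term.
B, N, H, D = 2, 16, 384, 768
head = RepaHead(H, D)
h_noisy = torch.randn(B, N, H)   # intermediate states for the noisy input
f_clean = torch.randn(B, N, D)   # frozen pretrained features of the clean image
print(repa_alignment_loss(h_noisy, f_clean, head).item())
```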

Paper:

Author's Explanation:

Sabotage Evaluations for Frontier Models 

Overview: 

This paper examines the potential for AI models to engage in sabotage, such as undermining oversight, evading behavior monitoring, or interfering with deployment decisions. 

The authors develop threat models and evaluations to assess whether a model could successfully disrupt the activities of a major organization. Tests on Anthropic’s Claude 3 Opus and Claude 3.5 Sonnet suggest that minimal mitigations are currently sufficient to address these risks, though stronger measures will likely be needed as AI capabilities advance.

The study also highlights the benefits of mitigation-aware evaluations and simulating large-scale deployments through smaller-scale tests.

Paper:

Author's Explanation:

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Overview:

Janus is an autoregressive framework designed to unify multimodal understanding and generation by decoupling visual encoding into separate pathways, while employing a single transformer architecture.

This design addresses the suboptimal performance that arises when a single visual encoder serves both tasks, since understanding and generation require different levels of information granularity.

By allowing independent selection of encoding methods for understanding and generation, Janus enhances flexibility and effectively mitigates encoder role conflicts.

Experiments indicate that Janus not only outperforms previous unified models but also rivals or surpasses the performance of task-specific models, showcasing its potential as a leading multimodal framework.
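A rough structural sketch of the decoupled design follows. The encoder stub, dimensions, and the use of a generic transformer without causal masking are simplifying assumptions for illustration, not the released implementation: the key point is that understanding and generation each get their own visual pathway while sharing one transformer.

```python
import torch
import torch.nn as nn

class UnderstandingEncoder(nn.Module):
    """Stand-in for a semantic vision encoder producing continuous patch features."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.out_dim = out_dim
        self.patchify = nn.Conv2d(3, out_dim, kernel_size=16, stride=16)

    def forward(self, image: torch.Tensor) -> torch.Tensor:     # (B, 3, H, W)
        return self.patchify(image).flatten(2).transpose(1, 2)  # (B, N, out_dim)

class JanusStyleModel(nn.Module):
    """One shared transformer, two decoupled visual pathways."""
    def __init__(self, d_model: int = 512, text_vocab: int = 1024, image_codebook: int = 512):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, num_layers=2)  # causal masking omitted for brevity
        self.text_embed = nn.Embedding(text_vocab, d_model)
        # Understanding pathway: continuous semantic features plus an adaptor.
        self.und_encoder = UnderstandingEncoder()
        self.und_adaptor = nn.Linear(self.und_encoder.out_dim, d_model)
        # Generation pathway: discrete image codes with their own embedding.
        self.gen_embed = nn.Embedding(image_codebook, d_model)

    def understand(self, image: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        vis = self.und_adaptor(self.und_encoder(image))
        return self.llm(torch.cat([vis, self.text_embed(text_ids)], dim=1))

    def generate_step(self, text_ids: torch.Tensor, image_codes: torch.Tensor) -> torch.Tensor:
        seq = torch.cat([self.text_embed(text_ids), self.gen_embed(image_codes)], dim=1)
        return self.llm(seq)

model = JanusStyleModel()
out = model.understand(torch.randn(1, 3, 64, 64), torch.randint(0, 1024, (1, 8)))
print(out.shape)  # (1, 16 image patches + 8 text tokens, 512)
```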

Paper:

Code & Model:

Author's Explanation:

---

What Matters in Transformers? Not All Attention is Needed

Overview:

This paper investigates redundancy in Transformer-based LLMs, focusing on MLP and Attention layers, using a similarity-based metric.

It reveals that many attention layers can be pruned without significant performance loss, exemplified by Llama-2-70B, which realized a 48.4% speedup with only a 2.4% performance reduction by pruning half of its attention layers.

Additionally, a joint method for dropping both Attention and MLP layers is proposed, enabling more aggressive pruning, as demonstrated by Llama-2-13B retaining 90% performance on the MMLU task after dropping 31 layers.
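A minimal sketch of how such a similarity-based redundancy score could be used to choose attention layers to drop; the calibration activations, the keep ratio, and the helper names are illustrative assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def layer_redundancy(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """Cosine similarity between a block's input and output activations.
    A value near 1.0 means the block barely transforms its input,
    making it a candidate for pruning."""
    sim = F.cosine_similarity(hidden_in.flatten(1), hidden_out.flatten(1), dim=-1)
    return sim.mean().item()

def select_layers_to_drop(per_layer_states, keep_ratio: float = 0.5):
    """Rank attention blocks by redundancy and drop the most redundant ones."""
    scores = [layer_redundancy(i, o) for i, o in per_layer_states]
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    n_drop = int(len(scores) * (1 - keep_ratio))
    return sorted(order[:n_drop])  # indices of layers to remove

# Example with random activations standing in for a small calibration set:
states = [(torch.randn(4, 16, 64), torch.randn(4, 16, 64)) for _ in range(8)]
print(select_layers_to_drop(states, keep_ratio=0.5))
```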

Paper:

GitHub:

---

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
