🚨This week’s top AI/ML research papers - Nov 3rd
(Oct 28 ~ Nov 3rd, 2024)
🚨This week’s top AI/ML research papers:
GPT-4o System Card
Are LLMs Better than Reported?
Can Language Models Replace Programmers?
CLEAR: Character Unlearning in Textual and Visual Modalities
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking
SelfCodeAlign
Mixture of Parrots
Unpacking SDXL Turbo
A prescriptive theory for brain-like inference
Modular Duality in Deep Learning
Learning Video Representations without Natural Videos
CORAL
Task Vectors are Cross-Modal
Mind Your Step (by Step)
ShadowKV
MarDini
COAT
Fast Best-of-N Decoding via Speculative Rejection
Continuous Speech Synthesis using per-token Latent Diffusion
Teach Multimodal LLMs to Comprehend Electrocardiographic Images
FasterCache
Read-ME
VibeCheck
HoPE
In-Context LoRA for Diffusion Transformers
Knowledge Graph Enhanced Language Agents for Recommendation
$100K or 100 Days
On Memorization of Large Language Models in Logical Reasoning
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
Grounding by Trying
Relaxed Recursive Transformers
Combining Induction And Transduction For Abstract Reasoning
overview for each + authors' explanations ⬇️
GPT-4o System Card
Overview:
The system card details the model's safety and alignment work, describes its capabilities and limitations across a wide range of languages, and includes third-party assessments of potential societal impacts.
Paper:
---
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Overview:
This paper explores how LLMs can improve label accuracy in NLP benchmarks by flagging potential errors in existing datasets, using an ensemble of LLMs in the "LLM-as-a-judge" approach.
The authors find a large number of label errors, and correcting them leads to notable improvements in model performance — suggesting that many apparent mistakes by LLMs stem from inaccurate labels rather than model deficiencies.
The paper also proposes strategies for handling mislabeled data during training to further boost performance.
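The ensemble-voting idea behind "LLM-as-a-judge" error detection can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `query_judge` callable, the agreement threshold, and all names here are hypothetical stand-ins for real LLM API calls.

```python
from collections import Counter

def flag_label_errors(examples, judges, query_judge, agreement_threshold=0.75):
    """Flag examples whose gold label disagrees with a strong majority
    of independent LLM judges.

    examples: iterable of (text, gold_label) pairs.
    judges: identifiers for the ensemble members.
    query_judge(judge, text) -> predicted label; in practice this would
    prompt an LLM to independently re-annotate the example (hypothetical).
    """
    flagged = []
    for text, gold_label in examples:
        votes = Counter(query_judge(judge, text) for judge in judges)
        top_label, top_count = votes.most_common(1)[0]
        # Strong judge consensus on a *different* label suggests the
        # gold annotation, not the model, may be wrong.
        if top_label != gold_label and top_count / len(judges) >= agreement_threshold:
            flagged.append((text, gold_label, top_label))
    return flagged
```

Flagged examples would then be re-reviewed or relabeled before evaluating models against the benchmark.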
Paper:
Author's Explanation:
🚀 Excited to share our research: "Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance" 👇
arxiv.org/abs/2410.18889 #NLP #LLMs #MachineLearning
— Omer Nahum (@omer6nahum)
12:59 PM • Oct 27, 2024
---
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'
Overview:
LLMs demonstrate high accuracy on Python coding in benchmarks like HumanEval and MBPP.
However, they do not yet match human developers at code completion on real-world tasks, a gap that current benchmarks fail to capture.
To address this, the authors introduce REPOCOD, a benchmark with 980 problems from actual projects, featuring longer solutions and higher complexity.
Evaluations reveal that none of the ten LLMs tested surpassed 30% pass@1 on REPOCOD, highlighting the need for more robust LLMs to aid in real-world software development.
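For context on the metric: pass@1 is the probability that a single sampled solution passes the problem's tests, typically estimated with the standard unbiased pass@k formula from the code-generation evaluation literature. A small sketch (the function name is my own; REPOCOD's exact evaluation harness may differ):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled solutions per problem,
    of which c pass the tests, estimate P(at least one of k samples passes).
    """
    if n - c < k:
        # Fewer failing samples than k: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# A benchmark-level pass@1 score averages this over all problems, so
# "under 30% pass@1" means fewer than ~3 in 10 problems are solved
# on the first try.
```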
Paper:
---