
Self-Attention with Polynomial Activation, The Failure of Human Preference Alignment, and System Prompts Don't Improve Performance

#29 | Latest AI Research Explained Simply

In this issue: 4x industry news, 3x AI research papers

Oct 21st ~ Oct 27th

🗞️ Industry News in 1 Line

  1. ♥ 3.4k An AI startup called “genmo” has developed Mochi 1, a new open-source video generation model that can produce videos at 30 fps for up to 5 seconds. This is a big deal as it is one of the few models that produce realistic videos and has been released under the Apache 2.0 license. You can download the Mochi 1 weights via torrent or try it on their online playground.

    Mochi 1
  2. ♥ 1.8k Clone, a humanoid robotics company, has released a musculoskeletal torso that uses hydraulic tendon muscles to mimic human actions. Right now, it has fairly limited functionality. See this video demonstration where it autonomously moves its arms and elbows, or where it mimics human hand movements.

    clone robot
  3. ♥ 4.9k A Twitter user compared the performance of the old and new Claude 3.5 by setting them loose in a Minecraft world. The setup uses an automation agent called Mindcraft, which can run any LLM in a Minecraft server. Watch this video to see Claude 3.5 building a Minecraft world.

    mindcraft
  4. ♥ 2.2k Meta has released a new set of Llama 3.2 models that can run on mobile devices. These models were quantized using two techniques: Quantization-Aware Training and SpinQuant. Together, these techniques reduced the model size by 56% and memory usage by 41%.

    llama-3.2 quantized

When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models

Zheng et al. [Carnegie Mellon University, Stanford University, LG AI Research, University of Illinois Chicago, University of Michigan]

♥ 416   System Prompt

Intro to AI Personas

AI companies commonly include role-based personas in their system prompts when deploying Large Language Models (LLMs) - for instance, ChatGPT uses "You are a helpful assistant" as its default persona. While role-based chatbots have gained popularity for business applications, we haven't had clear evidence about whether giving an LLM a specific persona helps or hurts its ability to provide accurate responses.

This paper examines 162 different personas across four major LLM families and tests them against 2,410 factual questions. The researchers carefully selected personas representing six types of interpersonal relationships and eight domains of expertise, ensuring broad coverage. They evaluated not just overall performance, but also dug deeper to understand how factors like gender, role type, and domain alignment affected accuracy.

Research Methodology

The researchers set up a controlled experimental framework with three distinct prompt types: a baseline with no role, speaker-specific prompts where the LLM assumes a persona, and audience-specific prompts where the LLM addresses a specific type of audience. This experimental design allowed for direct comparison of performance across different prompting strategies while maintaining consistent question content.

The researchers chose the MMLU (Massive Multitask Language Understanding) dataset for several reasons: its established use as a benchmark in the field, its diverse subject coverage, which enables testing of domain-aligned personas, and its consistent question format, which reduces unwanted variability in the results.

They implemented a sampling pipeline to ensure balanced representation across subjects while controlling for question length - limiting questions to 150 words (a cap that retains 99% of the sample) to manage computational resources effectively. The final dataset of 2,410 questions was carefully balanced across 26 subjects, which were then mapped to eight core categories (Law, Medicine, Computer Science, Math, Politics, Psychology, Natural Science, and Economics).
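
A minimal sketch of what such a balanced-sampling step could look like is shown below; the dataframe column names and the per-subject quota are assumptions made for illustration, not the authors' actual pipeline.

```python
# A rough sketch of a balanced sampling pipeline in the spirit of the setup above.
# Column names ("subject", "question") and the per-subject quota are assumptions.
import pandas as pd

MAX_WORDS = 150          # length cap described in the paper
TOTAL_QUESTIONS = 2410   # target dataset size
N_SUBJECTS = 26

def sample_balanced(mmlu: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    # Drop overly long questions to keep inference costs manageable.
    short = mmlu[mmlu["question"].str.split().str.len() <= MAX_WORDS]
    per_subject = TOTAL_QUESTIONS // N_SUBJECTS
    # Draw roughly the same number of questions from each subject.
    return (
        short.groupby("subject", group_keys=False)
             .apply(lambda g: g.sample(min(per_subject, len(g)), random_state=seed))
             .reset_index(drop=True)
    )
```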

When implementing this, the researchers differentiated between speaker-specific prompts (where the LLM assumes a role) and audience-specific prompts (where the LLM addresses a specific type of audience), which allowed them to test different aspects of persona-based interaction. This distinction is particularly important for understanding how different types of role framing might impact model performance, and the standardized prompt templates ensure consistency across experiments while accommodating model-specific formatting requirements.
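
For concreteness, here is a minimal sketch of the three prompt conditions (no-role baseline, speaker-specific, audience-specific); the exact template wording is an assumption for illustration, not the paper's verbatim prompts.

```python
# Hypothetical templates for the three prompt conditions described above.
def build_system_prompt(condition: str, role: str | None = None) -> str:
    if condition == "baseline":
        return ""                                # no persona at all
    if condition == "speaker":
        return f"You are a {role}."              # the LLM assumes the persona
    if condition == "audience":
        return f"You are talking to a {role}."   # the LLM addresses the persona
    raise ValueError(f"unknown condition: {condition}")

# Example: a domain-aligned persona paired with an MMLU-style question.
system_prompt = build_system_prompt("speaker", role="lawyer")
user_prompt = "Answer the following multiple-choice question about law: ..."
```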

Evaluating Different AI Personas for LLMs

The researchers also tested whether it is possible to automatically select the optimal persona or role for different types of questions when prompting LLMs. These tests showed that no single persona consistently improved performance across all question types, although certain personas were effective when aligned with specific domains.

While selecting the optimal role for each question could significantly improve model performance (as shown by the "best role per question" upper bound), automatic role selection proved difficult to implement in practice. Most automatic selection strategies only marginally outperformed random role selection, and in some cases, such as with the Qwen model, they actually performed worse than random selection. This suggests that the impact of personas on LLM performance might be inherently unpredictable and that creating reliable automated systems for role selection remains a significant challenge in the field.
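
The contrast between random selection and the oracle upper bound can be made concrete with a small sketch; the data layout (a per-question mapping from role to correctness) is an assumption for illustration.

```python
# Sketch of the evaluation contrast above: random role selection versus the
# "best role per question" oracle upper bound. The data layout is assumed.
import random

def random_selection_accuracy(results: dict[str, dict[str, bool]], seed: int = 0) -> float:
    rng = random.Random(seed)
    picks = [outcomes[rng.choice(list(outcomes))] for outcomes in results.values()]
    return sum(picks) / len(picks)

def oracle_accuracy(results: dict[str, dict[str, bool]]) -> float:
    # Upper bound: a question counts as solved if *any* role answers it correctly.
    return sum(any(outcomes.values()) for outcomes in results.values()) / len(results)
```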

Rethinking Softmax: Self-Attention with Polynomial Activations

Saratchandran et al. [Australian Institute of Machine Learning]

♥ 377   Attention

Introduction to Self-Attention with Polynomial Activations

The traditional understanding of softmax attention in transformer architectures has centered around three key properties: non-negativity, normalized row sums (summing to 1), and sparsity. These properties have been widely accepted as crucial for attention mechanisms because they allow the model to interpret attention weights as probabilities, enabling it to focus on relevant input elements while filtering out less important ones. 

This paper challenges this established perspective by proposing a novel theoretical framework that suggests softmax's success is primarily due to its ability to implicitly regularize the Frobenius norm of the attention matrix during training, rather than its probability distribution properties. The authors demonstrate this by developing alternative polynomial activations that intentionally violate one or more of the traditional softmax properties, yet still achieve the crucial regularization effect.

Understanding Transformer Attention and Its Mathematical Properties

The core component of the transformer architecture is its attention mechanism, which processes sequences of data. An attention head A(X) transforms an input sequence using three key components: query (q), key (k), and value (v) matrices, which are computed from the input through trainable weight matrices Q, K, and V.
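
In the standard scaled dot-product form, a single softmax attention head over an input sequence X (with N tokens and head dimension d) can be written as:

```latex
% Softmax attention head: q = XQ, k = XK, v = XV.
A(X) = \operatorname{softmax}\!\left(\frac{(XQ)(XK)^{\top}}{\sqrt{d}}\right) XV
```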

The traditional transformer uses softmax as its activation function, which has been widely adopted but not fully understood theoretically. The researchers make two significant theoretical discoveries about softmax's properties in attention mechanisms:

  1. Implicit Regularization: Softmax naturally constrains the Frobenius norm of the self-attention matrix to grow no faster than the square root of the sequence length (√N). This acts as a built-in stabilizer during training, preventing the attention weights from growing too large. Specifically:

    1. The Frobenius norm of softmax(A) is bounded by √N

    2. The gradient of softmax has a Frobenius norm bounded by 2√N

  2. Polynomial Alternatives: The researchers prove that properly scaled polynomial functions can achieve similar regularization properties as softmax. By scaling polynomial activations by 1/√N, they can:

    1. Match softmax's O(√N) growth behavior

    2. Maintain stable gradients during training

    3. Potentially outperform softmax in certain tasks

This theoretical framework provides new insights into why transformers work so well and suggests alternative activation functions that could be more computationally efficient while maintaining the desirable properties of softmax. 
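
A minimal sketch of the idea, assuming a single attention head and an elementwise cubic activation scaled by 1/√N in place of softmax (not the authors' implementation), is shown below; it also checks the √N Frobenius-norm bound for softmax numerically.

```python
# Minimal sketch (not the authors' code): softmax attention vs. a polynomial
# alternative that applies an elementwise cubic activation scaled by 1/sqrt(N).
import torch

def attention_head(x, W_q, W_k, W_v, activation="softmax"):
    N, d = x.shape                        # N: sequence length, d: head dimension
    q, k, v = x @ W_q, x @ W_k, x @ W_v   # queries, keys, values
    scores = q @ k.T / d**0.5             # raw attention logits (N x N)
    if activation == "softmax":
        attn = torch.softmax(scores, dim=-1)   # rows are probability vectors
    else:
        attn = scores**3 / N**0.5              # cubic activation, can be negative
    return attn @ v

# Each row of a softmax attention matrix is a probability vector with Euclidean
# norm at most 1, so the Frobenius norm of the whole matrix is at most sqrt(N).
N, d = 64, 32
x = torch.randn(N, d)
W_q, W_k, W_v = (torch.randn(d, d) / d**0.5 for _ in range(3))
attn = torch.softmax((x @ W_q) @ (x @ W_k).T / d**0.5, dim=-1)
assert torch.linalg.matrix_norm(attn) <= N**0.5 + 1e-4
```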

Experimental Results and Performance Analysis of SoftMax

The researchers conducted comprehensive experiments across multiple domains to validate their theoretical findings about polynomial activations as alternatives to softmax. Heat-map visualizations of attention patterns in Vision Transformers show distinct differences between the softmax and polynomial (1/√14 x³) activations. The polynomial activation produced both positive and negative attention values, unlike softmax, which only produces positive values. Moreover, visual inspection revealed that the different activation functions focused on different regions of the input images, suggesting they learn distinct feature representations.

For natural language processing tasks, experiments on the Long Range Arena (LRA) benchmark showed mixed results: softmax maintained superior performance on the Text Classification (63.8%) and Pathfinder (72.9%) tasks. Overall, though, these results suggest that properly scaled polynomial activations can serve as viable alternatives to softmax in transformer architectures, often matching or exceeding its performance while potentially offering computational advantages.

Beyond Preferences in AI Alignment

Zhi-Xuan et al. [MIT, UC Berkeley, University College London, University of Cambridge]

♥ 1k   LLM Alignment   bycloud’s pick  

Intro to AI alignment

The term “AI alignment” refers to getting AI systems to do what humans want, even when they operate autonomously without being told what to do. The dominant approach assumes that human values can be effectively captured and operationalized through preference modeling, utility functions, and reward optimization. However, this approach has proven problematic on multiple levels, from the technical difficulties of preference inference and aggregation to deeper philosophical questions about whether preferences can truly represent the full complexity of human values.

This paper presents a new approach by challenging four fundamental theses of preference-based alignment, concerning rational choice theory, expected utility theory, and both single-principal and multi-principal alignment. Instead of trying to match or aggregate human preferences, the authors propose reframing AI alignment around normative standards that are specific to AI systems' social roles.

For instance, rather than attempting to align a general-purpose AI assistant with individual user preferences or aggregated human values, they suggest developing role-appropriate normative standards through stakeholder negotiation and agreement. This alternative framework promises to better accommodate the plurality of human values while establishing clear boundaries for AI behavior that promote mutual benefit and minimize harm, even in the context of divergent human values and goals.

Rethinking AI Alignment via RLHF

The current approach to making AI systems behave in alignment with human values primarily relies on Reinforcement Learning from Human Feedback (RLHF) and similar reward-learning methods. While these techniques have shown impressive results, particularly in training large language models like GPT-4, they operate under some potentially problematic assumptions. The core idea is that by learning a reward function from human preferences and optimizing for it, we can create AI systems that act in accordance with human values. However, this approach treats human preferences as static, context-independent, and purely individual – assumptions that don't reflect the complex reality of human values and decision-making.
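
As a concrete reference for the "learning a reward function from human preferences" step, here is a minimal sketch of the Bradley-Terry-style pairwise preference loss commonly used in RLHF pipelines; it illustrates the general recipe, not anything specific to this paper.

```python
# Sketch of reward learning from pairwise preferences (Bradley-Terry-style loss):
# the reward model is trained so preferred responses score higher than rejected ones.
import torch.nn.functional as F

def preference_loss(reward_model, chosen_batch, rejected_batch):
    r_chosen = reward_model(chosen_batch)      # scalar reward for preferred responses
    r_rejected = reward_model(rejected_batch)  # scalar reward for rejected responses
    # Maximize the modeled probability that the preferred response wins.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```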

A more nuanced understanding reveals that human preferences are dynamic (changing over time), context-dependent (varying based on situation), and socially constructed (influenced by others and societal norms). Our preferences adapt as we learn and grow, shift based on our experiences, and are deeply intertwined with our social connections and moral principles. Additionally, some choices involve incomparable options where there isn't a clear "better" choice that can be optimized for. These aspects of human nature pose fundamental challenges to preference-based alignment methods.

The paper suggests a significant shift in how we approach AI alignment: instead of trying to align AI systems with individual or aggregate human preferences, we should align them with role-specific normative standards. For instance, a general-purpose AI assistant should be aligned with the normative ideal of what makes a good assistant – one that respects user autonomy, acknowledges uncertainty, understands value changes, and considers broader social impacts. This approach would require AI systems to learn not just preferences, but the underlying values and principles that generate those preferences, while respecting the plurality of human values and the legitimate ways they can change over time. This reframing could lead to AI systems that better serve diverse human needs while maintaining appropriate ethical boundaries.

Rethinking AI Alignment via Rational Choice Theory 

Rational choice theory relies on simplifying assumptions about human decision-making. It assumes that humans consistently act to maximize their preferences, which can be represented as a utility function. However, this framework struggles to capture the nuanced reality of human behavior. It fails to account for systematic deviations from optimality, such as cognitive biases and heuristics, and ignores the limitations of human cognitive resources.

Resource rationality offers a more realistic model by acknowledging the computational costs of decision-making, explaining seemingly irrational behavior as a consequence of limited cognitive resources. This approach provides a stronger inductive bias for learning human preferences, since it models how an agent with bounded cognitive resources would act. Beyond simply refining the connection between preferences and actions, alternative preference representations like temporal logics, reward machines, and vector/interval-valued utilities offer richer ways to capture the complexities of human values. These approaches address the limitations of traditional utility functions by allowing for time-extended preferences and acknowledging situations with incomplete or incommensurable values, ultimately providing a more robust and nuanced framework for modeling human decision-making.
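
To make one of these alternative representations concrete, the sketch below shows a toy reward machine: a small finite-state automaton whose reward depends on the history of events, which lets it express time-extended preferences that a single utility over outcomes cannot. The states, events, and rewards are invented purely for illustration.

```python
# Toy reward machine: reward depends on the history of events, not just the
# current outcome, so it can encode time-extended preferences. All states,
# events, and reward values here are invented for illustration.
TRANSITIONS = {
    # (current state, observed event) -> (next state, reward)
    ("start", "ask_consent"): ("consented", 0.0),
    ("start", "act"): ("start", -1.0),        # acting before consent is penalized
    ("consented", "act"): ("done", +1.0),     # acting after consent is rewarded
}

def total_reward(events):
    state, total = "start", 0.0
    for event in events:
        state, reward = TRANSITIONS.get((state, event), (state, 0.0))
        total += reward
    return total

print(total_reward(["act", "ask_consent", "act"]))  # -1.0 + 0.0 + 1.0 = 0.0
print(total_reward(["ask_consent", "act"]))         # 1.0
```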

Final Thoughts on AI Alignment

The paper argues that rather than treating preferences as the ultimate target for AI alignment, we should be viewing them as indicators or proxies of deeper underlying structures - our values, norms, and reasoning processes. This shift represents a fundamental reimagining of AI alignment and suggests that future AI systems should be developed with a richer understanding of human decision-making that acknowledges our bounded rationality, complex value systems, and the context-dependent nature of our choices.

Rather than creating AI that maximizes human preferences, we should focus on developing AI systems that can engage meaningfully with the full spectrum of human values while recognizing the normative complexity of their social roles. This more sophisticated approach promises to lead to AI systems that don't just align with what we prefer in the moment, but truly serve and enhance the values we hold dear as individuals and as a society.
