AI Scientist, DeepSeek Prover V1.5, Diffusion Guided LM

#19 | Latest AI Research Explained Simply

#19 | In this issue: x3 industry news, x3 AI research papers

Aug 12th ~ Aug 18th

🗞️ Industry News in 1 Line

  1. ♥ 1.5k Nous Research, an open-source focused lab, has released Hermes 3, an open-source model which aims to push the boundaries of alignment and inclusivity in a way that big companies are too afraid to try. It comes in 7 different variants and offers advanced capabilities like function calling and better instruction following.

  2. ♥ 638 Last time Google released a model, it received a lot of criticism as it was producing offensive and discriminatory images in the name of diversity. This week, Google has released Imagen 3, their latest text-to-image model capable of producing ultra-realistic photos which are pretty impressive.

  3. ♥ 483 A new mechanism called FlexAttention combines the flexibility of PyTorch with the performance of FlashAttention. This new API lets you implement diverse attention variants in just a few lines of PyTorch code (see the sketch below).
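
Here is a minimal sketch of what a custom attention variant looks like with FlexAttention, assuming PyTorch ≥ 2.5; the ALiBi-style distance penalty and the single slope value of 0.5 are illustrative choices for this sketch, not part of the announcement.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod receives the raw attention score plus batch, head, query and key
# indices, and returns a modified score. Here: a simple distance-based penalty.
def alibi_bias(score, b, h, q_idx, kv_idx):
    return score - 0.5 * (q_idx - kv_idx)

# Shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

out = flex_attention(q, k, v, score_mod=alibi_bias)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

Wrapping the call in torch.compile fuses the score modification into a single FlashAttention-style kernel, which is where the performance claim comes from.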

Thinkbuddy: macOS AI Lab for Power Users

🌀 LLM Remix | 🤖 Multi-Models | ⚡️ Shortcuts + 🎙️ Whisper | 🔑 All-Access Single Subscription

Hey, AI enthusiasts! We've created the ultimate macOS companion for power users. Thinkbuddy isn't just another ChatGPT wrapper; it's deeply integrated with macOS, leveraging shortcuts, prompt templates, and advanced features like AI model mixing for ultimate responses.

You don't need to pay $25 for Claude and $40 for ChatGPT + Gemini, or get stuck with different UIs from various inference companies just to test Meta's new LLaMa 3.1 405B. Our solution offers all of this in one place!

Features:

🤖 LLM Remix: Combine GPT-4o + Claude Sonnet 3.5 + Gemini 1.5 Pro

🔀 10+ leading models, no extra subscriptions - free to download!

💻 50+ Local LLMs (coming soon)

🍎 Deep macOS integration

🌐 Multilingual support (80+ languages)

🎙️ Audio transcription & 📸 quick screenshot analysis

📄 PDF, DOCX, XLSX, JSON, SQL file support

⚡ Async LLM requests & remix under 10 seconds

🔒 Privacy-focused: Local chat storage

🌐 Web 📱 Mobile 💻 Windows (coming soon)

Curious?

Access all assistants with one subscription. Remix their best answers for the ultimate AI response. No need to pay $20 each. Download now for free!

First 20 sales only: Pay once, get all models for life - 30% OFF - $130 (COUPON CODE: BYCLOUD) and watch Alex's review to see all features in 5 minutes

(Not satisfied? 30-day return policy - No questions asked!)

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Lu et al. [Sakana AI, FLAIR, University of Oxford, University of British Columbia, Vector Institute, Canada CIFAR AI Chair]

♥ 5.4k   LLM Agents

Architecture of The AI Scientist

Introduction to The AI Scientist

AI models can already draft articles and reviews, but they cannot yet conduct scientific research and discover new knowledge on their own. LLMs have been used to assist human scientists with tasks like brainstorming ideas or writing code, yet they still require extensive manual supervision or are heavily constrained to narrow, specific tasks.

This paper introduces "The AI Scientist," which is the first comprehensive system for fully automatic scientific discovery. This system aims to enable foundation models, such as LLMs, to perform research independently. The AI Scientist is designed to handle all aspects of the research process, including:

  • Generating novel research ideas

  • Writing necessary code

  • Executing experiments

  • Summarizing experimental results

  • Visualizing results

  • Presenting findings in a full scientific manuscript

  • Simulating a peer-review process for quality control

How Does The AI Scientist Work?

The AI Scientist is designed to emulate the entire scientific process, from ideation to peer review, without requiring any human intervention. Here's a breakdown of how The AI Scientist works (a rough pseudo-code sketch of the full loop follows this list):

  1. Idea Generation: The process begins with a broad research direction and a simple starting codebase, often sourced from open-source repositories on platforms like GitHub. The AI Scientist then engages in a "brainstorming" phase and generates a diverse set of novel research directions. To ensure originality, it queries Semantic Scholar to verify the novelty of its ideas.

  2. Experimental Iteration: Once an idea is selected, The AI Scientist moves into the experimental phase. It autonomously:

    1. Executes proposed experiments

    2. Obtains results

    3. Produces plots and visualizations

    4. Creates descriptive notes for each plot

  3. Paper Write-up: The AI Scientist then composes a comprehensive scientific manuscript in LaTeX format which adheres to standard machine learning conference proceedings style. In this stage, it performs the following actions:

    1. Summarizes experimental results

    2. Contextualizes findings within the broader field

    3. Cites relevant literature (using Semantic Scholar for reference)

    4. Generates appropriate figures and tables

  4. Automated Paper Reviewing: This review process serves two purposes: it helps refine the current project and informs future iterations, creating a continuous feedback loop for open-ended ideation. The automated reviewer can evaluate generated papers with decent accuracy (it rejected most of the generated papers), provide detailed feedback, and suggest improvements, but it is still far from perfect when evaluated on real ICLR 2022 submissions: a trivial "reject everything" baseline scores 59% accuracy, while the best LLM reviewer configuration reaches about 70%.

  5. Iterative Improvement: The AI Scientist uses insights from previous ideas and feedback to improve subsequent generations of research. This iterative process mimics the collaborative nature of the human scientific community and allows for continuous refinement and exploration of new research directions.
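
The following is a minimal pseudo-code sketch of that loop. The helper functions (generate_ideas, check_novelty, run_experiments, write_paper, review_paper) are hypothetical stand-ins for the stages described above, not the authors' actual code.

```python
# Hypothetical placeholder stages; in the real system each would be backed by an LLM
# plus tools (Semantic Scholar, a code runner, LaTeX, an automated reviewer).
def generate_ideas(direction, codebase, archive, n):  return [f"{direction} idea {i}" for i in range(n)]
def check_novelty(idea):                              return True      # e.g. query Semantic Scholar
def run_experiments(idea, codebase):                  return {"metric": 0.0, "plots": []}
def write_paper(idea, results):                       return f"\\title{{{idea}}} ..."   # LaTeX manuscript
def review_paper(paper):                              return {"score": 4, "feedback": "weak accept"}

def ai_scientist(research_direction, codebase, n_ideas=5, n_rounds=3):
    archive = []                                       # papers + reviews from earlier rounds
    for _ in range(n_rounds):
        # 1. Brainstorm candidate directions, conditioned on feedback from earlier rounds.
        ideas = generate_ideas(research_direction, codebase, archive, n_ideas)
        for idea in ideas:
            if not check_novelty(idea):                # skip ideas that already exist in the literature
                continue
            results = run_experiments(idea, codebase)  # 2. execute experiments, collect plots/notes
            paper = write_paper(idea, results)         # 3. LaTeX write-up with figures and citations
            review = review_paper(paper)               # 4. automated reviewer scores the manuscript
            archive.append((idea, paper, review))      # 5. feedback seeds the next round of ideation
    return archive

print(len(ai_scientist("grokking in transformers", codebase="template repo")))
```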

Violin plot of The AI Scientist's performance with different LLM backbones. Sonnet 3.5 generated papers that the LLM peer reviewer considers "accepted" (2 is a strong reject; 6 is a weak accept).

Results and Real-World Implications of The AI Scientist

The AI Scientist can take an idea through implementation and develop it into a full paper at an approximate cost of $15, when paired with the most capable LLMs. With a Sonnet 3.5 backbone, it is able to produce papers that its automated reviewer judges as a "Weak Accept" at top-tier machine learning conferences (read sample papers).

Benchmark results of The AI Scientist

The AI Scientist still has a lot of room for improvement. It can't see or fix visual problems in papers, sometimes creating unreadable plots or messy layouts. The system occasionally implements ideas incorrectly or makes unfair comparisons that lead to misleading results. It also sometimes makes big mistakes when writing about and evaluating results, such as struggling to compare numbers correctly.

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Xin et al. [DeepSeek-AI]

♥ 497   LLM Reasoning

Introduction to DeepSeek-Prover-V1.5

DeepSeek-Prover-V1.5’s Overall Framework

LLMs face many challenges when it comes to proving formal theorems, and current methods for automated theorem proving have their limitations. Proof-step generation requires verification at every single step, which is computationally intensive and time-consuming. Whole-proof generation, on the other hand, is prone to errors because there is no intermediate feedback, which leads to inaccuracies in the final output.

DeepSeek-Prover-V1.5 aims to improve the accuracy and efficiency of automated theorem proving by combining the benefits of both proof-step and whole-proof generation through a "truncate-and-resume" method. The model first generates a whole proof, which is then checked for errors. If an error is detected, the process truncates the incorrect section and resumes generation from the last correct point, keeping the advantages of whole-proof generation while adding the intermediate feedback it normally lacks.
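
Here is a minimal sketch of the truncate-and-resume loop under stated assumptions: verify_with_lean and the prover model interface are hypothetical stand-ins for the Lean 4 verifier and the prover LLM, and the dummy stubs exist only so the sketch runs end to end.

```python
# Dummy stand-in for the Lean verifier: returns (ok, first_error_position, tactic_state).
def verify_with_lean(theorem, proof):
    ok = "Nat.add_comm" in proof          # a real verifier would actually check the proof
    return ok, len(proof) // 2, "⊢ a + b = b + a"

# Dummy stand-in for the prover LLM: completes a whole proof from a verified prefix.
class ProverModel:
    def complete_proof(self, theorem, prefix):
        return prefix + "  exact Nat.add_comm a b\n"

def truncate_and_resume(model, theorem, max_attempts=16):
    prefix = f"theorem {theorem} := by\n"
    for _ in range(max_attempts):
        candidate = model.complete_proof(theorem, prefix)        # whole-proof attempt
        ok, error_pos, state = verify_with_lean(theorem, candidate)
        if ok:
            return candidate                                     # complete, verified proof
        # Truncate at the first error, keep the verified part, and append the
        # current tactic state as a comment to guide the next attempt.
        prefix = candidate[:error_pos] + f"\n  -- tactic state: {state}\n"
    return None

print(truncate_and_resume(ProverModel(), "add_comm_example (a b : Nat) : a + b = b + a"))
```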

How Does DeepSeek-Prover-V1.5 Work?

In the past, we explained how the DeepSeek team improved upon the architecture of DeepSeek V2 to create a new LLM called DeepSeek Coder V2. Similarly in this study, the researchers improved their AI system for solving math problems in a formal language called Lean. Here's how they did it:

  1. The researchers made the AI write out its thoughts in plain English before solving the problem. This helps the AI think through the problem step-by-step, just like a human would.

  2. Next, they added extra information to the training data to help the AI learn to keep track of its progress. After each step of a proof, they included comments showing the Lean tactic state, i.e. what the proof assistant knows about the problem at that point.

  3. This study used reinforcement learning with feedback from the proof assistant. The AI tries to solve many problems, and whenever the Lean verifier accepts one of its proofs, it's rewarded. This encourages the AI to find better ways to solve problems.

  4. The researchers tested their AI on two sets of math problems: one for high school level and one for college level. They found that their new method works better than the old one. The AI can now solve more problems correctly, especially when it's allowed to try multiple times. They also discovered that when the AI writes out its thoughts in English (they call this "chain-of-thought"), it does even better than when it just tries to solve the problem directly.

In simple terms, they taught their AI to "think out loud" in English, keep track of its progress, and learn from its successes. This made the AI much better at solving formal math problems.
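
For intuition, here is a toy Lean 4 snippet, not taken from the paper's data, annotated in the spirit of steps 1 and 2: a natural-language plan followed by tactic-state comments after each step.

```lean
-- Plan (natural-language chain of thought): addition on Nat is commutative,
-- so rewriting with Nat.add_comm turns the goal into a trivial equality.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- tactic state: ⊢ a + b = b + a
  rw [Nat.add_comm]
  -- tactic state: no goals
```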

Evaluating DeepSeek-Prover-V1.5

This paper shows that DeepSeek-Prover-V1.5 can prove complex formal theorems using a number of techniques including specialized pre-training, supervised fine-tuning, and reinforcement learning. The system achieved a 60.2% pass rate on the miniF2F test set, which is a 10.2 percentage point improvement over the previous version. When using tree search techniques, the pass rate further increased to 63.5%, setting a new state-of-the-art record. The system also performed well on the ProofNet benchmark and achieved pass rates of 23.7% on the test set and 25.3% with tree search. 

Benchmark results of DeepSeek-Prover-V1.5

Diffusion Guided Language Modeling

Lovelace et al. [Cornell University]

♥ 533   LLM Alignment

Architecture of Diffusion Guided Language Modeling

Introduction to Diffusion Guided Language Modeling

LLMs are good at generating text, but they struggle to control specific attributes of that text, such as keeping the tone positive or making sure the output isn't offensive. When you try to guide these models to produce text with specific characteristics, the errors often compound as generation continues, resulting in lower-quality output.

The paper introduces a new approach called Diffusion Guided Language Modeling (DGLM). This method generates a rough idea (or proposal) of what the text should look like, including the desired characteristics, and then passes this to a traditional language model, which uses its strength in fluency to produce the final, polished text.

This way, the model can generate text that is both high-quality and tailored to specific needs, like being funny but not rude, without causing the errors that usually happen with guided text generation. Additionally, if you want to control a new attribute (like making the text sound formal), you only need to train a simple classifier, making it much easier and more flexible than other methods. 

Inner-Workings of Diffusion Guided Language Modeling

This system uses three main parts to generate text that not only makes sense but also meets specific requirements, like being polite or non-offensive (a rough sketch of the pipeline follows this list).

  1. Diffusion Network: First, we have the diffusion network which is an idea generator. When you give a piece of text (let's call it the "prefix"), this network suggests possible ways the text could continue. This suggestion isn't an exact continuation but rather a "semantic proposal," meaning it represents the general idea or direction in which the text could go.

  2. Prompt Generator: Once the diffusion network has suggested a continuation, the prompt generator takes this rough idea and translates it into a "soft prompt," a form that the text generator (decoder) can easily understand and use. 

  3. Auto-Regressive Decoder: Finally, the auto-regressive decoder uses the soft prompt to create the actual text. Because this decoder is trained on lots of text data, it can produce fluent, natural-sounding text that aligns with the suggestion made by the diffusion network.
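
The following is a hedged sketch of a DGLM-style pipeline under stated assumptions: the module names, dimensions, and decoder interface are illustrative choices, and the lambdas are toy stand-ins for the diffusion network and decoder, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    """Translates a semantic proposal embedding into a sequence of soft prompt vectors."""
    def __init__(self, embed_dim=768, prompt_len=8, model_dim=1024):
        super().__init__()
        self.proj = nn.Linear(embed_dim, prompt_len * model_dim)
        self.prompt_len, self.model_dim = prompt_len, model_dim

    def forward(self, proposal):
        # (batch, embed_dim) -> (batch, prompt_len, model_dim)
        return self.proj(proposal).view(-1, self.prompt_len, self.model_dim)

def dglm_generate(prefix_emb, diffusion_sample, prompt_gen, decode, noise_scale=0.1):
    # 1. The diffusion network proposes a continuation embedding ("semantic proposal").
    proposal = diffusion_sample(prefix_emb)
    # 2. A little noise lets the decoder correct imperfect proposals; more noise
    #    means the decoder leans more on its own language-modeling abilities.
    proposal = proposal + noise_scale * torch.randn_like(proposal)
    # 3. Translate the proposal into soft prompts the decoder understands.
    soft_prompt = prompt_gen(proposal)
    # 4. The auto-regressive decoder produces the final fluent text.
    return decode(soft_prompt)

# Toy stand-ins, just to exercise the pipeline end to end.
prefix_emb = torch.randn(1, 768)
text = dglm_generate(
    prefix_emb,
    diffusion_sample=lambda e: e,
    prompt_gen=PromptGenerator(),
    decode=lambda p: f"<decoded text conditioned on {tuple(p.shape)} soft prompts>",
)
print(text)
```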

Semantic Proposal Conditioning

The system uses a special model called Sentence-T5 to understand and encode the meaning behind sentences. When you give the system a text prefix and its continuation, it splits them up and uses Sentence-T5 to create a high-level summary or "embedding" of what the continuation should be like.

The diffusion network works in this high-level space to come up with its semantic proposals. To help the decoder (the final text generator) understand these proposals, the prompt generator translates them into a format the decoder can use to produce the final text. This step is fine-tuned so that the decoder learns to follow these prompts closely, resulting in text that fits the original idea.
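
As a rough illustration of this conditioning setup, here is how one could obtain Sentence-T5 embeddings for a prefix/continuation pair with the sentence-transformers library; the specific checkpoint and the framing of "prefix embedding in, continuation embedding out" are assumptions for the sketch, and the paper's exact training recipe differs in detail.

```python
from sentence_transformers import SentenceTransformer

# Sentence-T5 encoder (checkpoint choice is an assumption for this sketch).
encoder = SentenceTransformer("sentence-transformers/sentence-t5-base")

prefix = "The restaurant was packed on a Friday night,"
continuation = "but the staff stayed friendly and the food arrived quickly."

# Each text is mapped to a single fixed-size semantic embedding.
prefix_emb, continuation_emb = encoder.encode([prefix, continuation])

# The diffusion network operates in this embedding space: given the prefix
# embedding, it learns to produce a plausible continuation embedding (the
# "semantic proposal") that the prompt generator later turns into soft prompts.
print(prefix_emb.shape, continuation_emb.shape)
```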

Noise Handling

Sometimes, the ideas generated by the diffusion network aren't perfect. To deal with this, the system adds a little bit of noise to the proposal. This noise helps the decoder correct any minor mistakes made by the diffusion network and ensures the final text is smooth and accurate. The amount of noise can be adjusted. If there's less noise, the decoder sticks closely to the proposal. If there's more noise, the decoder relies more on its own abilities, which helps fix any issues with the proposal.

Final Text Generation

When the system is actually generating text, it uses the diffusion network's proposal (with some noise) to guide the decoder. The decoder then produces text that not only flows well but also meets any specific conditions you want, like being positive or formal. This setup lets the system generate high-quality text that can be easily controlled to match different needs.

Benchmark Results of Diffusion Guided Language Models

DGLM can produce text that is not only high-quality but also more diverse than the standard methods. It outperformed the popular GPT-2 model in terms of diversity and accuracy. The model smoothly shifted between relying on its original training and the new diffusion-guided ideas, resulting in better text generation.

This method reduced toxic language and could adjust the sentiment (positive or negative) of the text without losing its natural flow or diversity. The model could also handle more complex tasks, like combining multiple controls (e.g., making the text both positive and about a specific topic), with good results.

If there's an option to support us for $2/mo, would you sub?

ad free + just support my endeavor in general <3
