**Jiayi Zhang*¹, Simon Yu¹, Derek Chong², Anthony Sicilia³,**

**Michael R. Tomz², Christopher D. Manning², Weiyan Shi¹**

¹Northeastern University ²Stanford University ³West Virginia University

*\*: Project co-lead. Order determined randomly.*


🌐 **Homepage** | 📜 **Paper** | 🐦 **X Thread** | 💻 **GitHub** | 📓 **Colab** | 🖼️ **Examples** | 📡 **Podcast Summary**


<aside>

TL;DR

Figure 1. An illustration of Verbalized Sampling (VS) mitigating mode collapse. Left: How typicality bias causes a base LLM to collapse to a single modal response when prompted directly. Right: Our method Verbalized Sampling can mitigate mode collapse. While direct prompting (1) repeatedly yields the same collapsed output, Verbalized Sampling (2) asks the model to generate a diverse set of responses with their probabilities, effectively improving output variety and bypassing mode collapse.
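The caption above describes the core idea: instead of asking for one answer, ask the model to verbalize a distribution of answers with probabilities. As a rough illustration only (the exact prompt template is in the paper and GitHub repo; the wording below is a hypothetical sketch), the difference between direct prompting and a VS-style prompt can be shown with plain prompt construction:

```python
# Hypothetical sketch of the prompting difference behind Verbalized
# Sampling. The template text here is illustrative, not the exact
# prompt used in the paper.

def direct_prompt(task: str) -> str:
    """Direct prompting: one request, which tends to yield the
    single modal (most typical) response."""
    return task

def verbalized_sampling_prompt(task: str, k: int = 5) -> str:
    """VS-style prompting: ask for k candidate responses, each
    annotated with the model's own probability estimate, so a
    downstream step can sample from that verbalized distribution."""
    return (
        f"Generate {k} responses to the task below. "
        f"For each response, also state its estimated probability.\n\n"
        f"Task: {task}"
    )

print(direct_prompt("Tell me a joke about coffee."))
print(verbalized_sampling_prompt("Tell me a joke about coffee."))
```

In practice the returned candidates are parsed and one is drawn according to the verbalized probabilities, which is what restores output variety relative to repeatedly sending the direct prompt.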


The Problem: Alignment Causes Mode Collapse

You ask your favorite LLM for a joke about coffee. You ask again. You get the same joke, no matter which model you try. You ask for a story, and it always begins with "Once upon a time..." The brainstorming ideas feel generic, the outputs repetitive. This frustrating phenomenon is known as **mode collapse**.


Figure 2. Mode Collapse in Action. Three leading AI models (Claude, Gemini, and ChatGPT) all respond with the exact same joke when asked for one about coffee. This convergence on the most probable answer illustrates mode collapse.

Why This Matters: Mode collapse reduces the diversity of LLM outputs, and thus limits LLMs’ usefulness in many important applications. For instance: