**Jiayi Zhang*¹, Simon Yu¹, Derek Chong², Anthony Sicilia³,**
**Michael R. Tomz², Christopher D. Manning², Weiyan Shi¹**
¹Northeastern University ²Stanford University ³West Virginia University
$^*$: Project co-lead. Order determined randomly.
[🌐 **Homepage**] [📜 **Paper**] [💻 **Code**] [📦 **Package**] [🐦 **X Thread**] [📓 **Colab Notebook**]
Figure 1. An illustration of Verbalized Sampling (VS) mitigating mode collapse. Left: how typicality bias in preference data causes an aligned LLM to collapse to a single modal response when prompted directly. Right: how Verbalized Sampling mitigates this collapse. While direct prompting (1) repeatedly yields the same collapsed output, Verbalized Sampling (2) asks the model to generate a diverse set of responses together with their probabilities, improving output variety and bypassing mode collapse.
You ask your favorite LLM for a joke about coffee. You ask again. You get the same joke, no matter which model you try. You ask for a story, and it always begins with "Once upon a time..." The brainstorming ideas feel generic, the outputs repetitive. This frustrating phenomenon is known as **mode collapse**.
Figure 2. Mode Collapse in Action. Three leading AI models (Claude, Gemini, and ChatGPT) all respond with the exact same joke when asked for one about coffee. This convergence on the most probable answer illustrates mode collapse.
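To make the contrast between (1) and (2) in Figure 1 concrete, here is a minimal Python sketch of the two prompting styles. It is only an illustration: the OpenAI client, the model name, and the exact prompt wording are our assumptions, not the project's official package interface (see the Code and Package links above for that).

```python
# Minimal sketch contrasting direct prompting with a Verbalized Sampling-style
# prompt. Illustrative only: the model name, prompt wording, and use of the
# OpenAI chat API are assumptions, not the project's official package API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model shows the same contrast
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# (1) Direct prompting: tends to return the same modal joke on every call.
direct = ask("Tell me a joke about coffee.")

# (2) Verbalized Sampling: ask for a set of candidate responses, each with a
# verbalized probability, then pick from that set instead of the single mode.
verbalized = ask(
    "Generate 5 different jokes about coffee. For each joke, also state the "
    "probability you would assign to producing it. Format each line as "
    "'<joke> : <probability>'."
)

print(direct)
print(verbalized)
```

The only difference between the two calls is the prompt: instead of requesting one answer, Verbalized Sampling requests a distribution of answers, verbalized together with their probabilities.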
**Why This Matters:** Mode collapse reduces LLM output diversity and thus limits LLMs’ potential in many important applications. For instance:
Past research has largely attributed mode collapse to algorithmic causes, such as inadequate reward models or majority-favoring optimization processes [5, 6]. But we discovered a more fundamental cause: The problem isn't just the algorithms. It's also us humans. Specifically, we identify a systematic human **typicality bias**, where annotators consistently prefer familiar, conventional text over equally valid but less typical alternatives. Critically, this implies that even with a perfect reward model and optimization process, the bias inherent in preference datasets will still lead to mode collapse, affecting the majority of alignment methods.
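To see why such a bias is enough to cause collapse even under perfect optimization, it helps to fold a typicality term into a standard KL-regularized alignment objective. The sketch below is illustrative, not a verbatim reproduction of the paper's derivation: it assumes the learned reward $\hat{r}$ adds a bonus $\alpha \log \pi_{\text{ref}}$ (with $\alpha > 0$ capturing typicality bias) to the true reward, and that $\beta$ is the strength of the KL penalty toward the reference model $\pi_{\text{ref}}$.

$$
\hat{r}(x, y) = r_{\text{true}}(x, y) + \alpha \log \pi_{\text{ref}}(y \mid x), \qquad \alpha > 0
$$

$$
\pi^{*}(y \mid x) \;\propto\; \pi_{\text{ref}}(y \mid x)\, e^{\hat{r}(x, y)/\beta} \;=\; \pi_{\text{ref}}(y \mid x)^{\,1 + \alpha/\beta}\, e^{\,r_{\text{true}}(x, y)/\beta}
$$

Because the exponent $1 + \alpha/\beta$ is greater than 1, the reference distribution gets sharpened toward its mode: among responses with equal true reward, the most typical one dominates, no matter how good the reward model or the optimizer is.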