**Jiayi Zhang*¹, Simon Yu¹, Derek Chong², Anthony Sicilia³,**
**Michael R. Tomz², Christopher D. Manning², Weiyan Shi¹**
¹Northeastern University ²Stanford University ³West Virginia University
$^*$: Project co-lead. Order determined randomly.
[🌐 **Homepage**] [📜 **Paper**] [💻 **Code**] [📦 **Package**] [🐦 **X Thread**] [📓 **Colab Notebook**]
Figure 1. An illustration of Verbalized Sampling (VS) mitigating mode collapse. Left: how typicality bias in preference data causes an aligned LLM to collapse to a single modal response when prompted directly. Right: how Verbalized Sampling mitigates this collapse. While direct prompting (1) repeatedly yields the same collapsed output, Verbalized Sampling (2) asks the model to generate a diverse set of responses together with their probabilities, improving output variety and bypassing mode collapse.
You ask your favorite LLM for a joke about coffee. You ask again. You get the same joke, no matter which model you try. You ask for a story, and it always begins with "Once upon a time..." The brainstorming ideas feel generic, the outputs repetitive. This frustrating phenomenon is known as **mode collapse**.
Figure 2. Mode Collapse in Action. Three leading AI models (Claude, Gemini, and ChatGPT) all respond with the exact same joke when asked for one about coffee. This convergence on the most probable answer illustrates mode collapse.
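To make the contrast between (1) and (2) in Figure 1 concrete, here is a minimal Python sketch of the two prompting styles. It is only an illustration: the OpenAI client, the model name, and the exact prompt wording are our assumptions, not the project's official package interface (see the Code and Package links above for that).

```python
# Minimal sketch contrasting direct prompting with a Verbalized Sampling-style
# prompt. Illustrative only: the model name, prompt wording, and use of the
# OpenAI chat API are assumptions, not the project's official package API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model shows the same contrast
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# (1) Direct prompting: tends to return the same modal joke on every call.
direct = ask("Tell me a joke about coffee.")

# (2) Verbalized Sampling: ask for a set of candidate responses, each with a
# verbalized probability, then pick from that set instead of the single mode.
verbalized = ask(
    "Generate 5 different jokes about coffee. For each joke, also state the "
    "probability you would assign to producing it. Format each line as "
    "'<joke> : <probability>'."
)

print(direct)
print(verbalized)
```

The only difference between the two calls is the prompt: instead of requesting one answer, Verbalized Sampling requests a distribution of answers, verbalized together with their probabilities.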
**Why This Matters:** Mode collapse reduces LLM output diversity and thus limits LLMs’ potential in many important applications. For instance:
Past research has largely attributed mode collapse to algorithmic causes, such as inadequate reward models or majority-favoring optimization processes [5, 6]. But we discovered a more fundamental cause: The problem isn't just the algorithms. It's also us humans. Specifically, we identify a systematic human **typicality bias**, where annotators consistently prefer familiar, conventional text over equally valid but less typical alternatives. Critically, this implies that even with a perfect reward model and optimization process, the bias inherent in preference datasets will still lead to mode collapse, affecting the majority of alignment methods.
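To see why such a bias is enough to cause collapse even under perfect optimization, it helps to fold a typicality term into a standard KL-regularized alignment objective. The sketch below is illustrative, not a verbatim reproduction of the paper's derivation: it assumes the learned reward $\hat{r}$ adds a bonus $\alpha \log \pi_{\text{ref}}$ (with $\alpha > 0$ capturing typicality bias) to the true reward, and that $\beta$ is the strength of the KL penalty toward the reference model $\pi_{\text{ref}}$.

$$
\hat{r}(x, y) = r_{\text{true}}(x, y) + \alpha \log \pi_{\text{ref}}(y \mid x), \qquad \alpha > 0
$$

$$
\pi^{*}(y \mid x) \;\propto\; \pi_{\text{ref}}(y \mid x)\, e^{\hat{r}(x, y)/\beta} \;=\; \pi_{\text{ref}}(y \mid x)^{\,1 + \alpha/\beta}\, e^{\,r_{\text{true}}(x, y)/\beta}
$$

Because the exponent $1 + \alpha/\beta$ is greater than 1, the reference distribution gets sharpened toward its mode: among responses with equal true reward, the most typical one dominates, no matter how good the reward model or the optimizer is.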