Reasoning Boosts Opinion Alignment in LLMs
ICLR 2026
ETH Zurich, Switzerland
Overview

Can LLMs faithfully represent the political opinions of individuals? Current approaches rely on demographic prompting, which suffers from poor representativeness, limited steerability, and inconsistency across topics. We propose a different approach: training LLMs to reason about political opinions using reinforcement learning on survey data. By framing opinion formation as a structured reasoning task, agents learn to generate a justification before committing to a stance, and are rewarded when the stance matches the respondent’s actual survey answer.
Method
We employ Group Relative Policy Optimization (GRPO) to train per-individual agents on political survey responses. Each agent produces output in a fixed schema: a reasoning trace wrapped in <reasoning> tags followed by an answer in <answer> tags. A composite reward function scores each generation along three dimensions: format compliance, reasoning length, and correctness (match with the ground-truth survey response). An optional supervised fine-tuning (SFT) stage warm-starts the model with synthetic chain-of-thought demonstrations before GRPO training.
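The composite reward described above can be sketched as follows. The tag schema (`<reasoning>`/`<answer>`) and the three reward dimensions come from the text; the specific weights, the length-reward shape, and the target reasoning length are illustrative assumptions, not the paper's exact values.

```python
import re

# Generations must follow the fixed schema:
# <reasoning>...</reasoning><answer>...</answer>
SCHEMA = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<answer>(?P<answer>.*?)</answer>\s*$",
    re.DOTALL,
)

def composite_reward(generation: str, ground_truth: str,
                     target_len: int = 100) -> float:
    """Score one generation on format, reasoning length, and correctness.
    Weights (0.2 / 0.2 / 0.6) and target_len are assumed for illustration."""
    m = SCHEMA.match(generation)
    if m is None:
        return 0.0  # format compliance: schema violations earn nothing
    format_r = 1.0
    # length reward: peaks at target_len whitespace tokens, linear falloff
    n_tokens = len(m["reasoning"].split())
    length_r = max(0.0, 1.0 - abs(n_tokens - target_len) / target_len)
    # correctness: exact (case-insensitive) match with the survey response
    correct_r = float(m["answer"].strip().lower() == ground_truth.strip().lower())
    return 0.2 * format_r + 0.2 * length_r + 0.6 * correct_r
```

In GRPO, this scalar reward is computed for each sampled generation in a group, and advantages are taken relative to the group mean, so no learned value model is needed.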
We evaluate on three datasets spanning distinct political systems: Swiss candidates (smartvote, binary stances), German party positions (Wahl-o-Mat, three-class including Neutral), and U.S. voters (ANES 2020, three-class). Each dataset is split into training and held-out test questions, and we train one model per individual or party.
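A minimal sketch of the per-individual split: each respondent's question-to-answer map is divided into training and held-out test questions, and one agent is trained per respondent. The 80/20 fraction and the seeding scheme are assumptions for illustration.

```python
import random

def split_questions(responses: dict[str, str], test_frac: float = 0.2,
                    seed: int = 0) -> tuple[dict[str, str], dict[str, str]]:
    """Split one respondent's {question_id: answer} pairs into train/test,
    so the per-individual agent is evaluated only on unseen questions."""
    qids = sorted(responses)            # deterministic base order
    random.Random(seed).shuffle(qids)   # reproducible shuffle per respondent
    n_test = max(1, int(len(qids) * test_frac))
    test_ids = set(qids[:n_test])
    train = {q: a for q, a in responses.items() if q not in test_ids}
    test = {q: a for q, a in responses.items() if q in test_ids}
    return train, test
```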
Results

SFT+GRPO outperforms naive baselines, in-context learning, and SFT-only across most settings, with the best macro-F1 reaching 70.73% on smartvote and 53.21% on Wahl-o-Mat. Performance on ANES is lower (45.43%), partly because the Neutral class aggregates multiple response behaviors (uncertainty, strategic non-commitment) that are difficult to learn through reasoning alone.
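Macro-F1, the metric quoted above, averages per-class F1 with equal weight per class, which is why a hard-to-learn Neutral class drags the ANES score down even if it is rare. A self-contained reference implementation:

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Macro-averaged F1: per-class F1 scores weighted equally,
    so a rare class (e.g. Neutral) counts as much as a frequent one."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); zero when the class is never hit
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)
```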

Analysis of agent positions in semantic space reveals that trained agents do not exhibit the left-libertarian bias commonly reported for general-purpose LLMs. Instead, agents are pulled toward the political center: left-wing individuals become more conservative, while right-wing individuals shift left.

An inversion experiment, in which all survey answers are flipped, confirms that the performance gap between political groups is not solely due to base-model bias. Right-leaning candidates improve after inversion but do not fully recover the F1 levels of left-leaning candidates, suggesting that certain preference profiles are intrinsically harder to learn from survey signals.
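The inversion can be sketched as a label map applied to every response before training. The concrete label names, and the choice to keep Neutral fixed in the three-class case, are assumptions here; the source only states that answers are flipped.

```python
# Hypothetical label vocabulary: binary (Yes/No) and three-class
# (Agree/Neutral/Disagree) answers, with Neutral assumed self-inverse.
INVERT = {
    "Yes": "No", "No": "Yes",
    "Agree": "Disagree", "Disagree": "Agree",
    "Neutral": "Neutral",
}

def invert_answers(responses: dict[str, str]) -> dict[str, str]:
    """Flip every survey answer to its opposite stance."""
    return {q: INVERT[a] for q, a in responses.items()}
```

Training agents on the inverted profiles and comparing F1 across political groups then isolates how much of the original gap comes from the base model's priors versus the learnability of the preference profile itself.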
Citation
@inproceedings{berdoz2026opinion,
author = {Berdoz, F. and Billeter, Y. and Vonlanthen, Y. and Wattenhofer, R.},
title = {{Reasoning Boosts Opinion Alignment in LLMs}},
booktitle = {{International Conference on Learning Representations (ICLR)}},
year = {2026}
}