Reasoning Boosts Opinion Alignment in LLMs
ICLR 2026
ETH Zurich, Switzerland
Overview

Can LLMs faithfully represent the political opinions of individuals? Current approaches rely on demographic prompting, which suffers from poor representativeness, limited steerability, and inconsistency across topics. We propose a different approach: training LLMs to reason about political opinions using reinforcement learning on survey data. By framing opinion formation as a structured reasoning task, agents learn to generate a justification before committing to a stance, and are rewarded when the stance matches the respondent’s actual survey answer.
Method
We employ Group Relative Policy Optimization (GRPO) to train per-individual agents on political survey responses. Each agent produces output in a fixed schema: a reasoning trace wrapped in <reasoning> tags followed by an answer in <answer> tags. A composite reward function scores each generation along three dimensions: format compliance, reasoning length, and correctness (match with the ground-truth survey response). An optional supervised fine-tuning (SFT) stage warm-starts the model with synthetic chain-of-thought demonstrations before GRPO training.
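The composite reward described above can be sketched as follows. The tag schema (`<reasoning>`/`<answer>`) and the three reward dimensions come from the text; the specific weights, the length-reward shape, and the target reasoning length are illustrative assumptions, not the paper's exact values.

```python
import re

# Generations must follow the fixed schema:
# <reasoning>...</reasoning><answer>...</answer>
SCHEMA = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<answer>(?P<answer>.*?)</answer>\s*$",
    re.DOTALL,
)

def composite_reward(generation: str, ground_truth: str,
                     target_len: int = 100) -> float:
    """Score one generation on format, reasoning length, and correctness.
    Weights (0.2 / 0.2 / 0.6) and target_len are assumed for illustration."""
    m = SCHEMA.match(generation)
    if m is None:
        return 0.0  # format compliance: schema violations earn nothing
    format_r = 1.0
    # length reward: peaks at target_len whitespace tokens, linear falloff
    n_tokens = len(m["reasoning"].split())
    length_r = max(0.0, 1.0 - abs(n_tokens - target_len) / target_len)
    # correctness: exact (case-insensitive) match with the survey response
    correct_r = float(m["answer"].strip().lower() == ground_truth.strip().lower())
    return 0.2 * format_r + 0.2 * length_r + 0.6 * correct_r
```

In GRPO, this scalar reward is computed for each sampled generation in a group, and advantages are taken relative to the group mean, so no learned value model is needed.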
We evaluate on three datasets spanning distinct political systems: Swiss candidates (smartvote, binary stances), German party positions (Wahl-o-Mat, three-class including Neutral), and U.S. voters (ANES 2020, three-class). Each dataset is split into training and held-out test questions, and we train one model per individual or party.
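A minimal sketch of the per-individual split: each respondent's question-to-answer map is divided into training and held-out test questions, and one agent is trained per respondent. The 80/20 fraction and the seeding scheme are assumptions for illustration.

```python
import random

def split_questions(responses: dict[str, str], test_frac: float = 0.2,
                    seed: int = 0) -> tuple[dict[str, str], dict[str, str]]:
    """Split one respondent's {question_id: answer} pairs into train/test,
    so the per-individual agent is evaluated only on unseen questions."""
    qids = sorted(responses)            # deterministic base order
    random.Random(seed).shuffle(qids)   # reproducible shuffle per respondent
    n_test = max(1, int(len(qids) * test_frac))
    test_ids = set(qids[:n_test])
    train = {q: a for q, a in responses.items() if q not in test_ids}
    test = {q: a for q, a in responses.items() if q in test_ids}
    return train, test
```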
Results

SFT+GRPO outperforms naive baselines, in-context learning, and SFT-only across most settings, with the best macro-F1 reaching 70.73% on smartvote and 53.21% on Wahl-o-Mat. Performance on ANES is lower (45.43%), partly because the Neutral class aggregates multiple response behaviors (uncertainty, strategic non-commitment) that are difficult to learn through reasoning alone.
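Macro-F1, the metric quoted above, averages per-class F1 with equal weight per class, which is why a hard-to-learn Neutral class drags the ANES score down even if it is rare. A self-contained reference implementation:

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Macro-averaged F1: per-class F1 scores weighted equally,
    so a rare class (e.g. Neutral) counts as much as a frequent one."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); zero when the class is never hit
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)
```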

Analysis of agent positions in semantic space reveals that trained agents do not exhibit the left-libertarian bias commonly reported for general-purpose LLMs. Instead, agents are pulled toward the political center: left-wing individuals become more conservative, while right-wing individuals shift left.

An inversion experiment, in which all survey answers are flipped, confirms that the performance gap between political groups is not solely due to base-model bias. Right-leaning candidates improve after inversion but do not fully recover the F1 levels of left-leaning candidates, suggesting that certain preference profiles are intrinsically harder to learn from survey signals.
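The inversion can be sketched as a label map applied to every response before training. The concrete label names, and the choice to keep Neutral fixed in the three-class case, are assumptions here; the source only states that answers are flipped.

```python
# Hypothetical label vocabulary: binary (Yes/No) and three-class
# (Agree/Neutral/Disagree) answers, with Neutral assumed self-inverse.
INVERT = {
    "Yes": "No", "No": "Yes",
    "Agree": "Disagree", "Disagree": "Agree",
    "Neutral": "Neutral",
}

def invert_answers(responses: dict[str, str]) -> dict[str, str]:
    """Flip every survey answer to its opposite stance."""
    return {q: INVERT[a] for q, a in responses.items()}
```

Training agents on the inverted profiles and comparing F1 across political groups then isolates how much of the original gap comes from the base model's priors versus the learnability of the preference profile itself.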
Citation
@inproceedings{berdoz2026opinion,
author = {Berdoz, F. and Billeter, Y. and Vonlanthen, Y. and Wattenhofer, R.},
title = {{Reasoning Boosts Opinion Alignment in LLMs}},
booktitle = {{International Conference on Learning Representations (ICLR)}},
year = {2026}
}