Projects

Research projects on AI alignment, LLM reasoning, and more.

Can AI Agents Agree?

We evaluate LLM-based agents on Byzantine consensus and find that reliable agreement is not yet a dependable capability, even in benign no-stake settings.

multi-agent, consensus, byzantine-fault-tolerance, language-models

High-Fidelity Speech Enhancement via Discrete Audio Tokens

We introduce DAC-SE1, a simplified language model framework on high-resolution discrete audio tokens that achieves state-of-the-art speech enhancement without multi-stage pipelines.

speech-enhancement, audio-tokens, language-models, bandwidth-extension

Text-to-Scene with Large Reasoning Models

We introduce Reason-3D, a text-to-3D scene synthesis system that leverages large reasoning models for collision-aware spatial reasoning and semantic object retrieval.

text-to-scene, 3d-generation, reasoning-models, spatial-reasoning

Reasoning Boosts Opinion Alignment in LLMs

We show that reasoning boosts opinion alignment in LLMs, enabling models to produce profile-aligned opinions across political datasets for faithful digital twins of democratic processes.

opinion-modeling, reinforcement-learning, political-alignment, language-models

Steering Pretrained Drafters during Speculative Decoding

We introduce a lightweight steering mechanism that injects verifier hidden states into pretrained drafters during speculative decoding, boosting acceptance rates by up to 35% with negligible overhead.

speculative-decoding, llm-inference, activation-steering, language-models

Reasoning Structure of Large Language Models

Logical Reasoning of LLMs Workshop @ ICLR 2026

We introduce a graph-based framework and benchmark for analyzing the reasoning structure of LLMs, revealing that token count is a poor proxy for reasoning quality.

reasoning, benchmarks, graph-analysis, language-models

Subliminal Signals in Preference Labels

Agents in the Wild Workshop @ ICLR 2026

We show that binary preference labels can function as a covert communication channel, enabling a biased judge to transmit hidden behavioral traits to a student model through alignment.

ai-alignment, preference-learning, superalignment, llm-evaluation

Alignment-Aware Decoding

We introduce alignment-aware decoding (AAD), a training-free inference-time method that steers LLM decoding toward aligned outputs using the implicit reward signal from DPO.

ai-alignment, decoding, preference-optimization, language-models

Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications

We expose 11 manipulation strategies in voting advice applications and propose robustness metrics and more resilient matching methods.

voting-advice, adversarial-robustness, computational-social-choice, democracy

Can an AI Agent Safely Run a Government? Existence of Probably Approximately Aligned Policies

We provide formal guarantees for AI alignment in social decision-making and introduce a practical safeguarding method that makes any autonomous agent provably safe.

ai-alignment, social-choice, formal-guarantees, autonomous-agents

Fundamentals of Task-Agnostic Data Valuation

We introduce a task-agnostic data valuation framework based on diversity and relevance metrics over second-moment statistics, without requiring a downstream task or validation set.

data-valuation, data-markets, federated-learning, privacy

Scalable Collaborative Learning via Representation Sharing

Decentralization and Trustworthy ML Workshop @ NeurIPS 2022

We introduce a privacy-preserving collaborative learning framework where clients share feature prototypes via contrastive knowledge distillation, achieving scalable learning with minimal communication.

federated-learning, knowledge-distillation, privacy, collaborative-learning