Projects

Research projects on AI alignment, LLM reasoning, and more.

Can AI Agents Agree?

Preprint 2026
We evaluate LLM-based agents on Byzantine consensus and find that reaching agreement is not yet a dependable capability, even in benign, no-stakes settings.
multi-agent, consensus, byzantine-fault-tolerance, language-models
High-Fidelity Speech Enhancement via Discrete Audio Tokens

ICASSP 2026
We introduce DAC-SE1, a simplified language-model framework that operates on high-resolution discrete audio tokens and achieves state-of-the-art speech enhancement without multi-stage pipelines.
speech-enhancement, audio-tokens, language-models, bandwidth-extension
Text-to-Scene with Large Reasoning Models

AAAI 2026
We introduce Reason-3D, a text-to-3D scene synthesis system that leverages large reasoning models for collision-aware spatial reasoning and semantic object retrieval.
text-to-scene, 3d-generation, reasoning-models, spatial-reasoning
Reasoning Boosts Opinion Alignment in LLMs

ICLR 2026
We show that reasoning boosts opinion alignment in LLMs, enabling models to produce profile-aligned opinions across political datasets and supporting faithful digital twins of democratic processes.
opinion-modeling, reinforcement-learning, political-alignment, language-models
Steering Pretrained Drafters during Speculative Decoding

AAAI 2026
We introduce a lightweight steering mechanism that injects verifier hidden states into pretrained drafters during speculative decoding, boosting acceptance rates by up to 35% with negligible overhead (the standard accept/reject loop it builds on is sketched below).
speculative-decoding, llm-inference, activation-steering, language-models
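
For context, the drafter/verifier loop this project builds on is standard speculative decoding: a small drafter proposes a few tokens, and the large verifier accepts each with probability min(1, p/q), resampling from the residual distribution on rejection. The minimal sketch below shows only that standard accept/reject step, not the paper's hidden-state steering mechanism; the function and argument names (`speculative_accept`, `drafter_probs`, `verifier_probs`) are illustrative.

```python
import numpy as np

def speculative_accept(draft_tokens, drafter_probs, verifier_probs, seed=0):
    """Standard speculative-decoding accept/reject step.

    draft_tokens:   K token ids proposed by the small drafter
    drafter_probs:  K vocabulary-sized arrays q_k (drafter distributions)
    verifier_probs: K vocabulary-sized arrays p_k (verifier distributions)
    Returns the accepted prefix, with one corrected token on rejection.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for k, tok in enumerate(draft_tokens):
        q, p = drafter_probs[k], verifier_probs[k]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / max(q[tok], 1e-12)):
            accepted.append(int(tok))
        else:
            # On rejection, resample from the residual max(p - q, 0), which
            # keeps the overall output distribution equal to the verifier's.
            residual = np.clip(p - q, 0.0, None)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(p), p=residual)))
            break
    return accepted
```
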
Reasoning Structure of Large Language Models

Logical Reasoning of LLMs Workshop @ ICLR 2026
We introduce a graph-based framework and benchmark for analyzing the reasoning structure of LLMs, revealing that token count is a poor proxy for reasoning quality.
reasoning, benchmarks, graph-analysis, language-models
Subliminal Signals in Preference Labels

Agents in the Wild Workshop @ ICLR 2026
We show that binary preference labels can function as a covert communication channel, enabling a biased judge to transmit hidden behavioral traits to a student model through alignment training.
ai-alignment, preference-learning, superalignment, llm-evaluation
Alignment-Aware Decoding

Preprint 2025
We introduce alignment-aware decoding (AAD), a training-free inference-time method that steers LLM decoding toward aligned outputs using the implicit reward signal from DPO (see the illustrative sketch below).
ai-alignment, decoding, preference-optimization, language-models
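
The implicit reward induced by DPO is r(x, y) = β · (log π_policy(y|x) - log π_ref(y|x)); one way to use it at inference time is to score or rerank candidate continuations. The sketch below illustrates only that general idea, not AAD's exact decoding procedure; the model names and the rerank-over-candidates usage are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_logprob(model, tok, prompt, continuation):
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`."""
    # Note: assumes the prompt's token ids form a prefix of the full sequence's ids.
    ids = tok(prompt + continuation, return_tensors="pt").input_ids
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)   # predict token t+1 from t
    targets = ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()

def implicit_reward(policy, reference, tok, prompt, continuation, beta=0.1):
    # DPO's implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x)).
    return beta * (sequence_logprob(policy, tok, prompt, continuation)
                   - sequence_logprob(reference, tok, prompt, continuation))

# Hypothetical usage (model names are placeholders): rerank sampled candidates.
# policy = AutoModelForCausalLM.from_pretrained("my-dpo-model")
# reference = AutoModelForCausalLM.from_pretrained("my-base-model")
# tok = AutoTokenizer.from_pretrained("my-base-model")
# best = max(candidates, key=lambda y: implicit_reward(policy, reference, tok, prompt, y))
```
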
Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications

IJCAI 2025
We expose 11 manipulation strategies in voting advice applications and propose robustness metrics and more resilient matching methods.
voting-advice, adversarial-robustness, computational-social-choice, democracy
Can an AI Agent Safely Run a Government? Existence of Probably Approximately Aligned Policies

NeurIPS 2024
We provide formal guarantees for AI alignment in social decision-making and introduce a practical safeguarding method that makes any autonomous agent provably safe.
ai-alignment, social-choice, formal-guarantees, autonomous-agents
Fundamentals of Task-Agnostic Data Valuation

AAAI 2023
We introduce a task-agnostic data valuation framework based on diversity and relevance metrics over second-moment statistics, without requiring a downstream task or validation set (an illustrative sketch of the idea follows below).
data-valuation, data-markets, federated-learning, privacy
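
As a rough illustration of valuing data from second-moment statistics alone: relevance can be read as agreement between a seller's and a buyer's uncentered covariance, and diversity as the volume that covariance spans. The log-det diversity and cosine relevance below are assumptions made for illustration, not the paper's actual metrics.

```python
import numpy as np

def second_moment(X):
    """Uncentered second-moment matrix (d x d) of a feature matrix X (n x d)."""
    return X.T @ X / len(X)

def diversity(X, eps=1e-6):
    # Illustrative diversity score: log-volume of the seller's second-moment matrix.
    M = second_moment(X)
    return np.linalg.slogdet(M + eps * np.eye(M.shape[0]))[1]

def relevance(X_seller, X_buyer):
    # Illustrative relevance score: cosine similarity between the two
    # second-moment matrices, treated as flattened vectors.
    A = second_moment(X_seller).ravel()
    B = second_moment(X_buyer).ravel()
    return float(A @ B / (np.linalg.norm(A) * np.linalg.norm(B) + 1e-12))
```
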
Scalable Collaborative Learning via Representation Sharing

Decentralization and Trustworthy ML Workshop @ NeurIPS 2022
We introduce a privacy-preserving collaborative learning framework in which clients share feature prototypes via contrastive knowledge distillation, achieving scalable learning with minimal communication (see the illustrative sketch below).
federated-learning, knowledge-distillation, privacy, collaborative-learning
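
To make the prototype-sharing idea concrete: each client can summarize its data as per-class mean embeddings and train against prototypes aggregated across clients with a contrastive objective. The sketch below is a generic version of that idea, not the paper's exact loss; `class_prototypes` and `prototype_contrastive_loss` are illustrative names.

```python
import torch
import torch.nn.functional as F

def class_prototypes(embeddings, labels, num_classes):
    """Per-class mean embeddings (num_classes x d), computed locally on one client."""
    d = embeddings.shape[1]
    protos = torch.zeros(num_classes, d)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = embeddings[mask].mean(dim=0)
    return F.normalize(protos, dim=1)

def prototype_contrastive_loss(embeddings, labels, shared_protos, temperature=0.1):
    # InfoNCE-style objective: each embedding should be most similar to the
    # shared prototype of its own class among all class prototypes.
    z = F.normalize(embeddings, dim=1)
    logits = z @ shared_protos.T / temperature   # (batch, num_classes)
    return F.cross_entropy(logits, labels)
```
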