Research Interests
My current research focuses mainly on LLM safety and evaluation.
LLM Safety
CoT monitoring, deceptive reasoning, code security, and realistic risks in autonomous agents.
CoT Monitoring
Agent Safety
Code Security
Deception
LLM Evaluation
Benchmarks and evaluation protocols for reasoning models and agents.
Agent Evaluation
Benchmark Contamination
Projects
(* denotes equal contribution)
-
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
Han Wang*, Yifan Sun*, Brian Ko*, Mann Talati, Jiawen Gong, Zimeng Li, Naicheng Yu, Xucheng Yu, Wei Shen, Vedant Jolly, Huan Zhang
Preprint
-
CoT-Guard: Small Models for Strong Monitoring
Nirav Diwan*, Han Wang*, Berkcan Kapusuzoglu, Ramin Moradi, Supriyo Chakraborty, Giri Iyengar, Sambit Sahu, Huan Zhang, Gang Wang
Preprint
-
FORTIS: Benchmarking Over-Privilege in Agent Skills
Li Li, Chenxiao Yu, Han Wang, Wei Yang, Ryan A. Rossi, Franck Dernoncourt, Xiyang Hu, Philip S. Yu, Chaowei Xiao, Huan Zhang, Yue Zhao
Preprint
-
DecepChain: Inducing Deceptive Reasoning from Large Language Model
Wei Shen*, Han Wang*, Haoyu Li*, Huan Zhang
ICML 2026
-
On The Fragility of Benchmark Contamination Detection in Reasoning Models
Han Wang*, Haoyu Li*, Brian Ko*, Huan Zhang
ICLR 2026
-
How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors
Kuai Yu*, Naicheng Yu*, Han Wang, Rui Yang, Huan Zhang
ACL 2026 Findings
-
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Junyu Zhang*, Runpei Dong*, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang
EMNLP 2025 Main && MATH-AI Workshop @ NeurIPS 2025
-
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun*, Han Wang*, Dongbai Li*, Gang Wang, Huan Zhang
ICML 2025 && Data Problems Workshop @ ICLR 2025
-
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Han Wang, Gang Wang, Huan Zhang
CVPR 2025
-
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng*, Han Wang*, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua
NeurIPS 2024
Industrial Experience
IBM Research, NY
Research Scientist Intern • May 2026 to Present
Education
University of Illinois Urbana-Champaign, IL
Ph.D. • Aug. 2024 to Present
Zhejiang University, China
B.Eng. • Aug. 2020 to June 2024
Services
Conference Reviewer: NeurIPS 2025 - 2026, ICLR 2026, ICML 2026, COLM 2026, ACL ARR 2025 - 2026, Trustworthy AI Workshop @ ICLR 2026, AISec Workshop @ ACM CCS 2025, MATH-AI Workshop @ NeurIPS 2025
Journal Reviewer: IEEE TNNLS 2025, Neural Computation 2026
Teaching Assistant: ECE 484 (Principles of Safe Autonomy), UIUC, Fall 2025