I am a fourth-year Ph.D. student in the joint program between the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), co-supervised by Prof. Qiang Huo at MSRA and Prof. Jun Du at USTC. My Ph.D. research focuses on Document Intelligence (including OCR, document layout analysis, and document understanding) and Large Language Models (including MLLM, Agent, and RAG). Prior to this, I received my B.S. degree from the School of the Gifted Young (a.k.a. 少年班) at the University of Science and Technology of China in 2021, majoring in Computer Science.

During my Ph.D. studies, I gained valuable industry experience through internships at MSRA, DeepSeek, and ByteDance. At DeepSeek, I contributed to the development of DeepSeek-VL2 and DeepSeek-V3. My internship at MSRA involved working on the Microsoft OneOCR project and the Microsoft Document Intelligence project under the guidance of researchers Qiang Huo and Lei Sun. Most recently, I began an internship with the ByteDance Seed team, where I am working on LLM/MLLM Agent projects. I have published over 10 papers (3200+ citations) in top-tier international AI journals and conferences.

I am currently seeking full-time job opportunities. If you are interested in my resume, please feel free to email me at jarvisustc@gmail.com. I am currently based in Beijing, China. If you would like to have a coffee chat, please feel free to reach out! ☕😊✨

🔥 News

2026.02: 🔥 We released a blog post about the Experience-Driven Learning paradigm and shared many interesting findings. 🚧 Please find all details in our Notion blog.
2025.12: 🎉 We are thrilled to announce that WideSearch has been selected as a key benchmark for evaluating agent capabilities in Seed 1.8. We are also honored to contribute to the Seed 1.8 API as core developers of the new context management feature.
2025.09: 🔥 We introduce EMPG: a new framework that solves the credit assignment bottleneck in long-horizon agent training by fixing a fundamental flaw in policy gradients. 🚧 Please find all details in our project page.
2025.09: 🎉 We're excited to have contributed to MCPMark, a solid benchmark for stress-testing comprehensive MCP use. We have open-sourced all details on GitHub. You are welcome to join us!
2025.08: 🔥 We introduce WideSearch: a new benchmark to test if AI agents can handle large-scale, repetitive information gathering — the real bottleneck in productivity. 🚧 Please find all details in our project page.
2025.06: 🎉 Our paper on VLM robustness, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks," has been accepted by ICCV 2025! See you in Hawaii!
2025.05: 🔥 Our latest research, DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue, is now available. This work highlights a crucial principle in Human-Agent Interaction: Agents must proactively request necessary information to excel, as humans may not always volunteer it. This "Agent-must-ask" paradigm is central to DoctorAgent-RL's ability to facilitate better task completion in complex multi-turn dialogues.
2025.04: 🎉 Thrilled to kick off my new internship with the ByteDance Seed Team!
2025.04: 🔥 Our latest work on VLM robustness, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks," has been released on arXiv. We've open-sourced the Robust-VLGuard dataset and DiffPure-VLM defense.
2025.03: 🎉 Our paper "UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis" has been accepted by Pattern Recognition Journal!
2024.12: 💻 We've launched a new GitHub project: Awesome-Multimodal-RAG! Check out the latest in multimodal RAG and contribute!
2024.12: 🤝 We're excited to have contributed to DeepSeek-VL2, an advanced Vision-Language Model with strong performance and fewer parameters.

🔥 More News

2024.08-09: 🗣️ Presented DLAFormer and DRFormer at ICDAR in Athens! Photos can be found here. A memorable experience meeting colleagues and exploring the city.
2024.08: ✍️ The complete version of DLAFormer, titled "UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis", has been submitted to Pattern Recognition Journal.
2024.07: 🎉 Our paper Detect-Order-Construct has been accepted by Pattern Recognition!
2024.06: 🗣️ Our DLAFormer, UniVIE, and DRFormer papers were selected for oral presentation at ICDAR 2024!
2024.03: 🚀 Azure AI Document Intelligence now supports Hierarchical Document Structure Analysis (HDSA), based on our "Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis" paper. Details on arXiv and the official announcement.
2024.03: 💻 Source code released for our Language-Enhanced Image New Category Discovery solution from the CVPR 2023 HIT Workshop.
2024.02: ✍️ Our new work on Document Layout Analysis, DLAFormer: An End-to-End Transformer for Document Layout Analysis, has been submitted to ICDAR 2024.
2024.01: 💡 Introduced UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents! Reframing VIE as relation prediction with a unified label space.
2024.01: 📄 New technical paper released: Dynamic Relation Transformer for Contextual Text Block Detection!
2023.12: 🏆 2nd Prize, 2023 International Algorithm Case Competition (Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop), 200,000 RMB bonus!
2023.11: ✍️ Our new progress on Hierarchical Document Structure Analysis submitted to Pattern Recognition Journal.
2023.07: 🎉 "Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer" accepted by Pattern Recognition Journal!
2023.04: 🎉 Two papers accepted by ICDAR 2023!
2023.03: 💡 Proposed a new Dynamic Queries based Detection Transformer for more robust table structure recognition!
2022.12: 🏆 2nd Prize, 2022 International Algorithm Case Competition (Panoptic Scene Graph Challenge @ ECCV 2022 SenseHuman Workshop), 100,000 RMB bonus!
2022.09: 🎉 One paper accepted by ACM MM 2022!

💻 Experiences

2025.04-Now: Research Intern, Seed Team, ByteDance, Beijing, China.
2024.09-2025.03: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
2024.06-2024.08: AGI Research Intern, Multimodal LLM Team, DeepSeek, Beijing, China.
2020.09-2024.05: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.

📖 Education

2021.09-2026.06: Ph.D. in Information and Communication Engineering, University of Science and Technology of China, Hefei, Anhui, China.
2017.09-2021.06: B.S. in the School of the Gifted Young (major in Computer Science), University of Science and Technology of China, Hefei, Anhui, China.

📝 Publications

✉️ means Corresponding Author; * means Equal Contribution

🤖 LLMs & MLLMs

[Submitted to ICLR 2026] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
Jiawei Wang, Jiacai Liu, Yuqian Fu, Yingru Li, Xintao Wang, Yuan Lin, Yu Yue, Lin Zhang, Yang Wang$^✉️$, Ke Wang$^✉️$
[Submitted to ICLR 2026] WideSearch: Benchmarking Agentic Broad Info-Seeking
Ryan Wong*, Jiawei Wang*, Junjie Zhao, Li Chen, Yan Gao, Long Zhang, Xuan Zhou, Zuo Wang, Kai Xiang, Ge Zhang, Wenhao Huang, Yang Wang$^✉️$, Ke Wang$^✉️$
[Submitted to ICLR 2026] MCPMark: Stress-Testing Comprehensive MCP Use
Zijian Wu, Xiangyan Liu, Xinyuan Zhang, Lingjun Chen, Fanqing Meng, Lingxiao Du, Yiran Zhao, Fanshi Zhang, Yaoqi Ye, Jiawei Wang, Zirui Wang, Jinjie Ni, Yufan Yang, Arvin Xu, Michael Qizhe Shieh
[Submitted to ICASSP 2026] DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue
Yichun Feng*, Jiawei Wang*, Lu Zhou, Yixue Li$^✉️$
[ICCV 2025] (CCF-A) Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang*$^✉️$, Yushen Zuo*, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng$^✉️$, Kin-man Lam$^✉️$
[GigaScience 2025] A Scalable Retrieval-Augmented Reasoning Framework Based on Large Language Models for Knowledge Mining in Biomedical Literature
Yichun Feng, Jiawei Wang, Lu Zhou, Yixue Li$^✉️$
[arXiv 2024] (Cutting-edge Project) DeepSeek-V3 Technical Report
DeepSeek-AI
[arXiv 2024] (Cutting-edge Project) DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Zhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu*, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan$^✉️$

📄 Document Intelligence

[Pattern Recognition 2025] (SCI Q1 Journal) UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang$^✉️$, Kai Hu, Qiang Huo
[ICDAR 2024] (Oral) DLAFormer: An End-to-End Transformer For Document Layout Analysis
Jiawei Wang*$^✉️$, Kai Hu*$^✉️$, Qiang Huo
[ICDAR 2024] (Oral) Dynamic Relation Transformer for Contextual Text Block Detection
Jiawei Wang*$^✉️$, Shunchi Zhang*$^✉️$, Kai Hu*$^✉️$, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
[ICDAR 2024] (Oral) UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-Like Documents
Kai Hu$^✉️$, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
[Pattern Recognition 2024] (SCI Q1 Journal) Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Jiawei Wang$^✉️$, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo
[ICDAR 2023] (Oral) DQ-DETR: Dynamic Queries Enhanced Detection Transformer for Arbitrary Shape Text Detection
Chixiang Ma$^✉️$, Lei Sun, Jiawei Wang, Qiang Huo
[ICDAR 2023] A Hybrid Approach to Document Layout Analysis for Heterogeneous Document Images
Zhuoyao Zhong$^✉️$, Jiawei Wang, Haiqing Sun, Kai Hu, Erhan Zhang, Lei Sun, Qiang Huo
[Pattern Recognition 2023] (SCI Q1 Journal) Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer
Jiawei Wang, Weihong Lin, Chixiang Ma, Mingze Li, Zheng Sun, Lei Sun$^✉️$, Qiang Huo
[ACM Multimedia 2022] (CCF-A) TSRFormer: Table Structure Recognition with Transformers
Weihong Lin, Zheng Sun, Chixiang Ma, Mingze Li, Jiawei Wang, Lei Sun, Qiang Huo$^✉️$

[The European Physical Journal Plus 2022] Study of nonequilibrium phase transitions mechanisms in exclusive network and node model of heterogeneous assignment based on real experimental data of KIF3AC and KIF3CC motors
Yuqing Wang$^✉️$, Chang Xu, Molin Fang, Tianze Li, Liwen Zhang, Dasen Wei, Kaichen Ouyang, Tunyu Zhang, Chuzhao Xu, Haosong Sun, Yunzhi Wang, Jiawei Wang
[International Journal of Modern Physics B 2019] Physical mechanisms in impacts of interaction factors on totally asymmetric simple exclusion processes
Yuqing Wang$^✉️$, Jiawei Wang, Binghong Wang
[International Journal of Modern Physics B 2019] Stochastic dynamics in nonequilibrium phase transitions of multiple totally asymmetric simple exclusion processes coupled with strong and weak interacting effects
Yuqing Wang*$^✉️$, Jiawei Wang*, Ziang Zhu*, Binghong Wang
[International Journal of Modern Physics B 2019] Evolvement laws and stability analyses of traffic network constituted by changing ramps and main road
Yuqing Wang$^✉️$, Chaofan Zhou, Jiawei Wang, Xinpeng Ni
[Modern Physics Letters B 2018] A macroscopic model for VOC emissions process complemented by real data
Yuqing Wang$^✉️$, Chaofan Zhou, Ziang Zhu, Jiawei Wang, Zimeng Wang, Chenhao Fang, Bin Jia
[ICMRA 2018] (Best Presentation Award) Control Strategies for Reducing VOCs Emission Process Based on Empirical Data
Yuqing Wang*$^✉️$, Jiawei Wang*, Ziang Zhu*, Chaofan Zhou, Yiyao Kou, Jing Sun, Zhengwei Mei, Ziwu Li, Peng Wu, Donghu Wang, Si Zhang, Wenli Zhang

📚 Academic Services

ICDAR Reviewer (2023, 2024)
IJDAR Reviewer (2024)
ACM MM Reviewer (2025)
AAAI Reviewer (2026)

🎖 Honors and Awards

2021‑2024: Core Contributor of Microsoft Azure AI Document Intelligence, Outstanding Contribution Award 📍 Microsoft
2023: 2nd Prize, Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop (CNY 200,000 bonus) (2/200+) 📍 China
2022: 2nd Prize, Panoptic Scene Graph Challenge @ ECCV 2022 SenseHuman Workshop (CNY 100,000 bonus) (2/100+) 📍 China
2021: Provincial Excellent Graduate (Top 1%) 📍 Anhui, China
2020: Outstanding Student Scholarship Gold Award 📍 USTC
2019: Tang Lixin Scholarship (Annual funding of CNY 10,000 until Ph.D., Top 1%) 📍 USTC
2019: Suzhou Yucai Scholarship (Top 10 undergraduates per year) 📍 USTC
2018: Outstanding Student Scholarship Gold Award 📍 USTC
2018: First Prize for freshman seminar papers 📍 USTC
2017‑2021: Cyrus Tang Scholarship (Awarded to college students who excel academically and are enthusiastic about social welfare) 📍 USTC

💬 Invited Talks

2024.10: Towards Universal Layout Analysis. Hosted by Microsoft.

Jiawei Wang