I am a fourth-year Ph.D. student in the joint program between the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), co-supervised by Prof. Qiang Huo at MSRA and Prof. Jun Du at USTC. My Ph.D. research focuses on Document Intelligence (including OCR, document layout analysis, and document understanding) and Large Language Models (including MLLM, Agent, and RAG). Prior to this, I received my B.S. degree from the School of the Gifted Young (a.k.a. 少年班) at the University of Science and Technology of China in 2021, majoring in Computer Science.
During my Ph.D. studies, I gained valuable industry experience through internships at MSRA, DeepSeek, and ByteDance. At DeepSeek, I contributed to the development of DeepSeek-VL2 and DeepSeek-V3. My internship at MSRA involved working on the Microsoft OneOCR project and the Microsoft Document Intelligence project under the guidance of Researchers Qiang Huo and Lei Sun. Most recently, I began an internship with the ByteDance Seed team, where I am working on LLM/MLLM Agent projects. I have published over 10 papers (3,200+ citations) in top-tier international AI journals and conferences.
I am currently seeking full-time job opportunities. If you are interested in my resume, please feel free to email me at jarvisustc@gmail.com. I am currently based in Beijing, China. If you would like to have a coffee chat, please feel free to reach out! ☕😊✨
🔥 News
- 2025.08: 🔥 We introduce WideSearch, a new benchmark that tests whether AI agents can handle large-scale, repetitive information gathering, the real bottleneck in productivity. 🚧 Please find all details on our project page.
- 2025.06: 🎉 Our paper on VLM robustness, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks," has been accepted by ICCV 2025! See you in Hawaii!
- 2025.05: 🔥 Our latest research, DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue, is now available. This work highlights a crucial principle in Human-Agent Interaction: Agents must proactively request necessary information to excel, as humans may not always volunteer it. This "Agent-must-ask" paradigm is central to DoctorAgent-RL's ability to facilitate better task completion in complex multi-turn dialogues.
- 2025.04: 🎉 Thrilled to kick off my new internship with the ByteDance Seed Team!
- 2025.04: 🔥 Our latest work on VLM robustness, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks," has been released on arXiv. We've open-sourced the Robust-VLGuard dataset and DiffPure-VLM defense.
- 2025.03: 🎉 Our paper "UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis" has been accepted by Pattern Recognition Journal!
- 2024.12: 💻 We've launched a new GitHub project: Awesome-Multimodal-RAG! Check out the latest in multimodal RAG and contribute!
- 2024.12: 🤝 We're excited to have contributed to DeepSeek-VL2, an advanced Vision-Language Model with strong performance and fewer parameters.
🔥 More News
- 2024.08-09: 🗣️ Presented DLAFormer and DRFormer at ICDAR in Athens! Photos can be found here. A memorable experience meeting colleagues and exploring the city.
- 2024.08: ✍️ The complete version of DLAFormer, titled "UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis", has been submitted to Pattern Recognition Journal.
- 2024.07: 🎉 Our Detect-Order-Construct paper has been accepted by Pattern Recognition!
- 2024.06: 🗣️ Our DLAFormer, UniVIE, and DRFormer papers were selected for oral presentation at ICDAR 2024!
- 2024.03: 🚀 Azure AI Document Intelligence now supports Hierarchical Document Structure Analysis (HDSA), based on our "Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis" paper. Details on arXiv and the official announcement.
- 2024.03: 💻 Source code released for our Language-Enhanced Image New Category Discovery solution from the CVPR 2023 HIT Workshop.
- 2024.02: ✍️ Our new work on Document Layout Analysis, DLAFormer: An End-to-End Transformer for Document Layout Analysis, has been submitted to ICDAR 2024.
- 2024.01: 💡 Introduced UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents! Reframing VIE as relation prediction with a unified label space.
- 2024.01: 📄 New technical paper released: Dynamic Relation Transformer for Contextual Text Block Detection!
- 2023.12: 🏆 2nd Prize, 2023 International Algorithm Case Competition (Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop), 200,000 RMB bonus!
- 2023.11: ✍️ Our new progress on Hierarchical Document Structure Analysis submitted to Pattern Recognition Journal.
- 2023.07: 🎉 "Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer" accepted by Pattern Recognition Journal!
- 2023.04: 🎉 Two papers accepted by ICDAR 2023!
- 2023.03: 💡 Proposed a new Dynamic Queries based Detection Transformer for more robust table structure recognition!
- 2022.12: 🏆 2nd Prize, 2022 International Algorithm Case Competition (Panoptic Scene Graph Challenge @ ECCV 2022 SenseHuman Workshop), 100,000 RMB bonus!
- 2022.09: 🎉 One paper accepted by ACM MM 2022!
💻 Experiences
- 2025.04-Now: Research Intern, Seed Team, ByteDance, Beijing, China.
- 2024.09-2025.03: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
- 2024.06-2024.08: AGI Research Intern, Multimodal LLM Team, DeepSeek, Beijing, China.
- 2020.09-2024.05: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
📖 Education
- 2021.09-2026.06: Ph.D. in Information and Communication Engineering, University of Science and Technology of China, Hefei, Anhui, China.
- 2017.09-2021.06: B.S. in the School of the Gifted Young (major in Computer Science), University of Science and Technology of China, Hefei, Anhui, China.
📝 Publications
- ✉️ means Corresponding Author; * means Equal Contribution
🤖 LLMs & MLLMs
ICCV 2025
(CCF-A) Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks, Jiawei Wang*$^✉️$, Yushen Zuo*, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng$^✉️$, Kin-man Lam$^✉️$
Submitted to Nature Communications (2025)
A Scalable Retrieval-Augmented Reasoning Framework Based on Large Language Models for Knowledge Mining in Biomedical Literature, Yichun Feng, Jiawei Wang, Lu Zhou, Yixue Li$^✉️$
arXiv 2025
(Cutting-edge Project) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, DeepSeek AI
arXiv 2024
(Cutting-edge Project) DeepSeek-V3 Technical Report, DeepSeek-AI
arXiv 2024
(Cutting-edge Project) DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding, Zhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu*, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan$^✉️$
Submitted to AAAI 2026
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue, Yichun Feng*, Jiawei Wang*, Lu Zhou, Yixue Li$^✉️$
arXiv 2025
WideSearch: Benchmarking Agentic Broad Info-Seeking, Ryan Wong*, Jiawei Wang*, Junjie Zhao, Li Chen, Yan Gao, Long Zhang, Xuan Zhou, Zuo Wang, Kai Xiang, Ge Zhang, Wenhao Huang, Yang Wang$^✉️$, Ke Wang$^✉️$
📄 Document Intelligence
Pattern Recognition 2025
(SCI Q1 Journal) UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis, Jiawei Wang$^✉️$, Kai Hu, Qiang Huo
ICDAR 2024
(Oral) DLAFormer: An End-to-End Transformer For Document Layout Analysis, Jiawei Wang*$^✉️$, Kai Hu*$^✉️$, Qiang Huo
ICDAR 2024
(Oral) Dynamic Relation Transformer for Contextual Text Block Detection, Jiawei Wang*$^✉️$, Shunchi Zhang*$^✉️$, Kai Hu*$^✉️$, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
ICDAR 2024
(Oral) UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-Like Documents, Kai Hu$^✉️$, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
Pattern Recognition 2024
(SCI Q1 Journal) Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis, Jiawei Wang$^✉️$, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo
ICDAR 2023
(Oral) DQ-DETR: Dynamic Queries Enhanced Detection Transformer for Arbitrary Shape Text Detection, Chixiang Ma$^✉️$, Lei Sun, Jiawei Wang, Qiang Huo
ICDAR 2023
A Hybrid Approach to Document Layout Analysis for Heterogeneous Document Images, Zhuoyao Zhong$^✉️$, Jiawei Wang, Haiqing Sun, Kai Hu, Erhan Zhang, Lei Sun, Qiang Huo
Pattern Recognition 2023
(SCI Q1 Journal) Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer, Jiawei Wang, Weihong Lin, Chixiang Ma, Mingze Li, Zheng Sun, Lei Sun$^✉️$, Qiang Huo
ACM Multimedia 2022
(CCF-A) TSRFormer: Table Structure Recognition with Transformers, Weihong Lin, Zheng Sun, Chixiang Ma, Mingze Li, Jiawei Wang, Lei Sun, Qiang Huo$^✉️$
🧪 Physics Related
The European Physical Journal Plus 2022
Study of nonequilibrium phase transitions mechanisms in exclusive network and node model of heterogeneous assignment based on real experimental data of KIF3AC and KIF3CC motors, Yuqing Wang$^✉️$, Chang Xu, Molin Fang, Tianze Li, Liwen Zhang, Dasen Wei, Kaichen Ouyang, Tunyu Zhang, Chuzhao Xu, Haosong Sun, Yunzhi Wang, Jiawei Wang
International Journal of Modern Physics B 2019
Physical mechanisms in impacts of interaction factors on totally asymmetric simple exclusion processes, Yuqing Wang$^✉️$, Jiawei Wang, Binghong Wang
International Journal of Modern Physics B 2019
Stochastic dynamics in nonequilibrium phase transitions of multiple totally asymmetric simple exclusion processes coupled with strong and weak interacting effects, Yuqing Wang*$^✉️$, Jiawei Wang*, Ziang Zhu*, Binghong Wang
International Journal of Modern Physics B 2019
Evolvement laws and stability analyses of traffic network constituted by changing ramps and main road, Yuqing Wang$^✉️$, Chaofan Zhou, Jiawei Wang, Xinpeng Ni
Modern Physics Letters B 2018
A macroscopic model for VOC emissions process complemented by real data, Yuqing Wang$^✉️$, Chaofan Zhou, Ziang Zhu, Jiawei Wang, Zimeng Wang, Chenhao Fang, Bin Jia
ICMRA 2018
(Best Presentation Award) Control Strategies for Reducing VOCs Emission Process Based on Empirical Data, Yuqing Wang*$^✉️$, Jiawei Wang*, Ziang Zhu*, Chaofan Zhou, Yiyao Kou, Jing Sun, Zhengwei Mei, Ziwu Li, Peng Wu, Donghu Wang, Si Zhang, Wenli Zhang
📚 Academic Services
- ICDAR Reviewer (2023, 2024)
- IJDAR Reviewer (2024)
- ACM MM Reviewer (2025)
- AAAI Reviewer (2026)
🎖 Honors and Awards
- 2021‑2024: Core Contributor of Microsoft Azure AI Document Intelligence, Outstanding Contribution Award 📍 Microsoft
- 2023: 2nd Prize, Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop (CNY 200,000 bonus) (2/200+) 📍 China
- 2022: 2nd Prize, Panoptic Scene Graph Challenge @ ECCV 2022 SenseHuman Workshop (CNY 100,000 bonus) (2/100+) 📍 China
- 2021: Provincial excellent graduate (Top 1%) 📍 Anhui, China
- 2020: Outstanding Student Scholarship Gold Award 📍 USTC
- 2019: Tang Lixin Scholarship (Annual funding of CNY 10,000 until Ph.D., Top 1%) 📍 USTC
- 2019: Suzhou Yucai Scholarship (Top 10 undergraduates per year) 📍 USTC
- 2018: Outstanding Student Scholarship Gold Award 📍 USTC
- 2018: First prize for freshman seminar papers 📍 USTC
- 2017‑2021: Cyrus Tang Scholarship (Awarded to college students who excel academically and are enthusiastic about social welfare) 📍 USTC
💬 Invited Talks
- 2024.10: Towards Universal Layout Analysis. Hosted by Microsoft.