I am a fourth-year Ph.D. student in the joint program between the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), co-supervised by Prof. Qiang Huo at MSRA and Prof. Jun Du at USTC. My Ph.D. research focuses on Document Intelligence (including OCR, document layout analysis, and document understanding) and Large Language Models (including MLLM, Agent, and RAG). Prior to this, I received my B.S. degree from the School of the Gifted Young (a.k.a. 少年班) at the University of Science and Technology of China in 2021, majoring in Computer Science.

During my Ph.D. studies, I gained valuable industry experience through internships at MSRA, DeepSeek, and ByteDance. At DeepSeek, I contributed to the development of DeepSeek VL2 and DeepSeek V3. At MSRA, I worked on the Microsoft OneOCR and Microsoft Document Intelligence projects under the guidance of researchers Qiang Huo and Lei Sun. Most recently, I joined the ByteDance Seed team as an intern, where I work on LLM/MLLM Agent projects. I have published over 10 papers (3,200+ citations) in top-tier international AI journals and conferences.

I am seeking full-time job opportunities. If you are interested in my resume, please email me at jarvisustc@gmail.com. I am currently based in Beijing, China; if you would like to have a coffee chat, feel free to reach out! ☕😊✨

🔥 News

💻 Experiences

  • 2025.04-Now: Research Intern, Seed Team, ByteDance, Beijing, China.
  • 2024.09-2025.03: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
  • 2024.06-2024.08: AGI Research Intern, Multimodal LLM Team, DeepSeek, Beijing, China.
  • 2020.09-2024.05: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.

📖 Education

  • 2021.09-2026.06: Ph.D. in Information and Communication Engineering, University of Science and Technology of China, Hefei, Anhui, China.
  • 2017.09-2021.06: B.S. in the School of the Gifted Young (major in Computer Science), University of Science and Technology of China, Hefei, Anhui, China.

📝 Publications

  • ✉️ means Corresponding Author; * means Equal Contribution

🤖 LLMs & MLLMs

📄 Document Intelligence

📚 Academic Services

  • ICDAR Reviewer (2023, 2024)
  • IJDAR Reviewer (2024)
  • ACM MM Reviewer (2025)
  • AAAI Reviewer (2026)

🎖 Honors and Awards

  • 2021‑2024: Core Contributor of Microsoft Azure AI Document Intelligence, Outstanding Contribution Award 📍 Microsoft
  • 2023: 2nd Prize, Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop (CNY 200,000 bonus) (2/200+) 📍 China
  • 2022: 2nd Prize, Panoptic Scene Graph Challenge @ ECCV 2022 SenseHuman Workshop (CNY 100,000 bonus) (2/100+) 📍 China
  • 2021: Provincial Outstanding Graduate (Top 1%) 📍 Anhui, China
  • 2020: Outstanding Student Scholarship Gold Award 📍 USTC
  • 2019: Tang Lixin Scholarship (Annual funding of CNY 10,000 until Ph.D., Top 1%) 📍 USTC
  • 2019: Suzhou Yucai Scholarship (Top 10 undergraduates per year) 📍 USTC
  • 2018: Outstanding Student Scholarship Gold Award 📍 USTC
  • 2018: First Prize, Freshman Seminar Papers 📍 USTC
  • 2017‑2021: Cyrus Tang Scholarship (Awarded to college students who excel academically and are committed to social welfare) 📍 USTC

💬 Invited Talks

  • 2024.10: Towards Universal Layout Analysis. Hosted by Microsoft.