About
I am a master’s student at the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, expected to graduate in March 2027. I am advised by Prof. Wenjie Pei. My research interests lie in multimodal large models and image generation, editing, and understanding.
From May 2025 to March 2026, I worked at Tencent WeChat AI on image generation and evaluation. Since March 2026, I have been with the Tencent Hunyuan team, focusing on GUI-related research. If you are interested in my research, feel free to get in touch!
Research
My current research interests include:
- GUI
- OCR and scene text understanding
- Image generation and editing
- Vision–language models (VLMs / VLLMs)
Publications
2026
- Zhengyao Fang, et al. “Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity” CVPR 2026 (Highlight).
ArXiv · Code
- Zhengyao Fang, et al. “Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity” ICLR 2026.
ArXiv · Code
2025
- Zhengyao Fang, et al. “Recognition-Synergistic Scene Text Editing” CVPR 2025.
ArXiv · Code
2024
- Jingjing Wu, Zhengyao Fang, et al. “WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting” ECCV 2024.
ArXiv · Code
Contact
- Email: zhengyaonineve@outlook.com
- GitHub: @ZhengyaoFang
- Google Scholar: Google Scholar