😁Hi there, I’m Xiaofu

👨🏻‍💻I’m a first-year NLP PhD student at MBZUAI, supervised by Prof. Yova Kementchedjhieva. I also work closely with my friend Yaxin Luo. I am glad to do research projects under the supervision of Prof. Dimitrios Papadopoulos and Prof. Weizhi Meng.

📔My research focuses on Multimodal Learning and Representation Learning, with a particular emphasis on fine-grained alignment and interpretable representations across images, text, and video. I aim to move beyond coarse understanding toward reliable distinction and verification of fine details, applying these capabilities to dense image/video captioning and temporal event understanding. I also focus on multimodal evaluation and factual consistency, developing detail-sensitive metrics and benchmarks.

Publications:

📄SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation (EMNLP 2025 Main)

Xiaofu Chen, Israfel Salazar, Yova Kementchedjhieva

📄DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension (CVPR 2025)

Xiaofu Chen, Yaxin Luo, Gen Luo, Jiayi Ji, Henghui Ding, Yiyi Zhou

📄APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension (ECCV 2024)

Yaxin Luo, Jiayi Ji, Xiaofu Chen, Yuxin Zhang, Tianhe Ren, Gen Luo