😁 Hi there, I’m Xiaofu

👨🏻‍💻 I am a first-year NLP PhD student at MBZUAI, supervised by Prof. Yova Kementchedjhieva. I also work closely with my friend Yaxin Luo, and I am glad to have worked on research projects under the supervision of Prof. Dimitrios Papadopoulos and Prof. Weizhi Meng.

📔 My current research interests lie in agents, especially agents designed for long-horizon tasks. I am particularly interested in memory systems for agents: how an agent can build, maintain, and use long-term memory across extended interactions, evolving tasks, and changing contexts. I hope to explore memory mechanisms that go beyond simply storing past context, and instead help agents remember what matters, update their understanding over time, and make better use of previous experience in future tasks.

At the same time, I remain interested in multimodal learning, especially vision-language models (VLMs). My work in this area focuses on fine-grained alignment across images, text, and video; interpretable multimodal representations; and the reliability of VLMs in detailed understanding, factual consistency, and long-form image/video description. In the long run, I am also interested in how multimodal understanding can be integrated into agent systems, enabling agents to better perceive, reason about, and interact with the visual world.

Publications:

📄 SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation (EMNLP 2025 Main)

Xiaofu Chen, Israfel Salazar, Yova Kementchedjhieva

📄 DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension (CVPR 2025)

Xiaofu Chen, Yaxin Luo, Gen Luo, Jiayi Ji, Henghui Ding, Yiyi Zhou

📄 APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension (ECCV 2024)

Yaxin Luo, Jiayi Ji, Xiaofu Chen, Yuxin Zhang, Tianhe Ren, Gen Luo