Comparison with other methods on the M-BEIR test set. R@K denotes the Recall@K metric. q_t, q_i, c_t and c_i denote the text query, image query, text candidates and image candidates, respectively. Retrv-R1 achieves state-of-the-art (SOTA) performance.
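For readers unfamiliar with R@K, the sketch below shows one common definition of Recall@K for retrieval: a query counts as a hit if at least one of its relevant candidates appears among the top K ranked candidates, and the score is the fraction of queries that are hits. This is an illustrative sketch, not the official M-BEIR evaluation code; the function name `recall_at_k` and the input layout (a ranked list of candidate IDs plus a set of relevant IDs per query) are hypothetical choices made for this example.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[Sequence[str]],
                relevant: Sequence[Set[str]],
                k: int) -> float:
    """Hypothetical helper: fraction of queries with a relevant candidate in the top k.

    ranked[i]   -- candidate IDs for query i, best first (assumed layout)
    relevant[i] -- set of ground-truth candidate IDs for query i
    """
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have one entry per query")
    if not ranked:
        raise ValueError("at least one query is required")
    if k < 1:
        raise ValueError("k must be >= 1")
    # A query is a hit if any of its top-k candidates is relevant.
    hits = sum(1 for cands, rel in zip(ranked, relevant) if rel.intersection(cands[:k]))
    return hits / len(ranked)


# Two toy queries: the first has its relevant item at rank 2, the second has none retrieved.
ranked = [["a", "b", "c"], ["d", "e", "f"]]
relevant = [{"b"}, {"z"}]
print(recall_at_k(ranked, relevant, k=1))  # 0.0
print(recall_at_k(ranked, relevant, k=2))  # 0.5
```

Benchmarks may differ in details (for example, how multiple ground-truth candidates or ties are handled), so the reported R@K values should be read according to the benchmark's own protocol.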
Example queries and retrieval results illustrating the effectiveness of Retrv-R1.
This research was partially supported by the RGC General Research Fund 11200323 and the NSFC/RGC JRS Project N_CityU198/24. We thank Mr. Liqun Liu and Mr. Peng Shu from Tencent for their collaboration, insightful discussions, and support with computational resources.
@article{zhu2025retrv,
title={Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval},
author={Zhu, Lanyun and Ji, Deyi and Chen, Tianrun and Wu, Haiyang and Wang, Shiqi},
journal={Advances in Neural Information Processing Systems},
year={2025}
}