Bin Zhu

I am an Assistant Professor of Computer Science at the School of Computing and Information Systems, Singapore Management University (SMU). Before joining SMU, I was a Postdoctoral Researcher working with Prof. Dima Damen at the University of Bristol, contributing to the EPSRC Visual AI Program Grant led by Prof. Andrew Zisserman. I earned my Ph.D. from the Department of Computer Science, City University of Hong Kong, in 2021, under the supervision of Prof. Chong-Wah Ngo and Dr. Wing-Kwong Chan. Earlier, I obtained my master's and bachelor's degrees from Zhejiang University and Southeast University, respectively.
My research interests lie in Human-Centered Multimedia Computing, including Cross-modal Retrieval, Multimodal Large Language Models, Egocentric Video Understanding, Generative Models, and AI for Healthcare. Specifically, my objective is to conduct frontier research and develop cutting-edge technologies for processing, modeling, analyzing, understanding, and generating multimedia content that facilitate natural and immersive human experiences and have a positive impact on society.
🔥🔥🔥Openings: I am actively looking for self-motivated Ph.D. students, CSC visiting students, and (remote) interns. If you are interested in working with me, please feel free to drop me an email with your CV and other supporting documents (if any).
news
Jun 26, 2025 | One paper on Multimodal Large Language Model is accepted by ICCV 2025. |
---|---|
Jun 19, 2025 | One paper on Recipe Progress Tracking in Non-Visual Cooking is accepted by ASSETS 2025. |
Jun 09, 2025 | One paper on Cooking Procedural Image Generation is accepted by ACM TOMM. |
Jun 06, 2025 | I will serve as the Program Co-Chair for ACM ICMR 2027, which will be held in Singapore! |
Apr 23, 2025 | One paper on Nutrition Estimation is accepted by ICMR 2025. |
Mar 21, 2025 | One paper on Ingredient Recognition is accepted by ICME 2025 (oral). |
Mar 08, 2025 | One paper on Egocentric Video Understanding is accepted by CVPR 2025. |
Feb 23, 2025 | One paper on Recipe Following in Cooking Video is accepted by CHI (LBW) 2025. |
Jan 16, 2025 | One paper on Large Multimodal Model in Food Domain is accepted by IEEE TMM. |
Dec 09, 2024 | Two papers on Grasp Generation and Text-to-Hand-Image Generation are accepted by AAAI 2025. |
Dec 05, 2024 | We are excited to announce our Special Session on Multimedia for Cooking and Eating Activities at ICME 2025. We warmly invite you to submit your papers! |
Oct 29, 2024 | One paper on Recipe Generation is accepted by WACV 2025. |
Jul 15, 2024 | One paper on Cross-modal Recipe Retrieval is accepted by ECCV 2024. |
Jul 15, 2024 | One paper on Time-series Weight Prediction is accepted by ACM MM 2024 and is further selected as an Oral presentation (3.97%). |
Jun 15, 2024 | One paper on Text-driven Video Prediction is accepted by ACM TOMM. |
Feb 15, 2024 | Two papers on Unsupervised Video Hashing and Generalizable Food Recognition are accepted by IEEE TMM. |
Jan 01, 2024 | I joined Singapore Management University as an Assistant Professor of Computer Science. |
selected publications
- AAAI. RAGG: Retrieval-Augmented Grasp Generation Model. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
- ArXiv. Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation. arXiv preprint arXiv:2501.19017, 2025
- ArXiv. Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs. arXiv preprint arXiv:2504.09456, 2025
- MM Oral