Bin Zhu

Google Scholar

E-mail

80 Stamford Road, Singapore 178902

I am an Assistant Professor of Computer Science at the School of Computing and Information Systems, Singapore Management University (SMU). Before joining SMU, I was a Postdoctoral Researcher working with Prof. Dima Damen at the University of Bristol, contributing to the EPSRC Visual AI Program Grant led by Prof. Andrew Zisserman. I earned my Ph.D. degree from the Department of Computer Science, City University of Hong Kong in 2021, under the supervision of Prof. Chong-Wah Ngo and Dr. Wing-Kwong Chan. Earlier, I obtained my master's and bachelor's degrees from Zhejiang University and Southeast University, respectively.

My research interests lie in Human-Centered Multimedia Computing, including Cross-modal Retrieval, Multimodal Large Language Models, Egocentric Video Understanding, Generative Models, and AI for Healthcare. Specifically, my objective is to conduct frontier research and develop cutting-edge technologies for processing, modeling, analyzing, understanding, and generating multimedia content that facilitate natural and immersive human experiences and have a positive impact on society.

🔥🔥🔥Openings: I am actively looking for self-motivated Ph.D. students, CSC visiting students, and (remote) interns. If you are interested in working with me, please feel free to drop me an email with your CV and other supporting documents (if any).

news

Jun 26, 2025 One paper on Multimodal Large Language Models is accepted by ICCV 2025.
Jun 19, 2025 One paper on Recipe Progress Tracking in Non-Visual Cooking is accepted by ASSETS 2025.
Jun 09, 2025 One paper on Cooking Procedural Image Generation is accepted by ACM TOMM.
Jun 06, 2025 I will serve as the Program Co-Chair for ACM ICMR 2027, which will be held in Singapore!
Apr 23, 2025 One paper on Nutrition Estimation is accepted by ICMR 2025.
Mar 21, 2025 One paper on Ingredient Recognition is accepted by ICME 2025 (oral).
Mar 08, 2025 One paper on Egocentric Video Understanding is accepted by CVPR 2025.
Feb 23, 2025 One paper on Recipe Following in Cooking Video is accepted by CHI (LBW) 2025.
Jan 16, 2025 One paper on Large Multimodal Models in the Food Domain is accepted by IEEE TMM.
Dec 09, 2024 Two papers on Grasp Generation and Text-to-Hand-Image Generation are accepted by AAAI 2025.
Dec 05, 2024 We are excited to announce our Special Session on Multimedia for Cooking and Eating Activities at ICME 2025. We warmly invite you to submit your papers!
Oct 29, 2024 One paper on Recipe Generation is accepted by WACV 2025.
Jul 15, 2024 One paper on Cross-modal Recipe Retrieval is accepted by ECCV 2024.
Jul 15, 2024 One paper on Time-series Weight Prediction is accepted by ACM MM 2024 and is further selected for Oral presentation (3.97%).
Jun 15, 2024 One paper on Text-driven Video Prediction is accepted by ACM TOMM.
Feb 15, 2024 Two papers on Unsupervised Video Hashing and Generalizable Food Recognition are accepted by IEEE TMM.
Jan 01, 2024 I joined Singapore Management University as an Assistant Professor of Computer Science. ✨ 😄

selected publications

  1. ICCV
    From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
    Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In International Conference on Computer Vision, 2025
  2. CVPR
    HD-EPIC: A highly-detailed egocentric video dataset
    Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, and Dima Damen
    In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025
  3. AAAI
    Hand1000: Generating realistic hands from text with only 1,000 images
    Haozhuo Zhang, Bin Zhu, Yu Cao, and Yanbin Hao
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  4. AAAI
    RAGG: Retrieval-Augmented Grasp Generation Model
    Zhenhua Tang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, and Richang Hong
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  5. TMM
    FoodLMM: A versatile food assistant using large multi-modal model
    Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, and Chong-Wah Ngo
    IEEE Transactions on Multimedia, 2025
  6. ArXiv
    Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
    Bin Zhu, Huiyan Qi, Yinxuan Gui, Jingjing Chen, Chong-Wah Ngo, and Ee-Peng Lim
    arXiv preprint arXiv:2501.19017, 2025
  7. ArXiv
    Reasoning Models Are More Easily Gaslighted Than You Think
    Bin Zhu, Hailong Yin, Jingjing Chen, and Yu-Gang Jiang
    arXiv preprint arXiv:2506.09677, 2025
  8. ArXiv
    Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
    Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    arXiv preprint arXiv:2504.09456, 2025
  9. MM Oral
    Navigating weight prediction with diet diary
    Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  10. ECCV
    Enhancing recipe retrieval with foundation models: A data augmentation perspective
    Fangzhou Song, Bin Zhu, Yanbin Hao, and Shuo Wang
    In European Conference on Computer Vision, 2024