Bin Zhu

Google Scholar

E-mail

80 Stamford Road, Singapore 178902

I am an Assistant Professor of Computer Science at the School of Computing and Information Systems, Singapore Management University (SMU). Before joining SMU, I was a Postdoctoral Researcher working with Prof. Dima Damen at the University of Bristol, contributing to the EPSRC Visual AI Program Grant led by Prof. Andrew Zisserman. I earned my Ph.D. degree from the Department of Computer Science, City University of Hong Kong in 2021, under the supervision of Prof. Chong-Wah Ngo and Dr. Wing-Kwong Chan. Earlier, I obtained my master's and bachelor's degrees from Zhejiang University and Southeast University, respectively.

My research interests lie in Human-Centered Multimedia Computing, including Cross-modal Retrieval, Multimodal Large Language Models, Egocentric Video Understanding, Generative Models, and AI for Healthcare. Specifically, my objective is to conduct frontier research and develop cutting-edge technologies for processing, modeling, analyzing, understanding, and generating multimedia content that facilitate natural and immersive human experiences and have a positive impact on society.

🔥🔥🔥Openings: I am actively looking for self-motivated Ph.D. students, CSC visiting students, and (remote) interns. If you are interested in working with me, please feel free to drop me an email with your CV and other supporting documents (if any).

news

Jul 19, 2025 One paper on a Large Lithium-ion Battery Model, for which I am a co-corresponding author, is accepted by Nature Communications.
Jul 05, 2025 One paper on Multimodal Large Language Model is accepted by ACM MM 2025.
Jun 26, 2025 One paper on Multimodal Large Language Model is accepted by ICCV 2025.
Jun 19, 2025 One paper on Recipe Progress Tracking in Non-Visual Cooking is accepted by ASSETS 2025.
Jun 09, 2025 One paper on Cooking Procedural Image Generation is accepted by ACM TOMM.
Jun 06, 2025 I will serve as the Program Co-Chair for ACM ICMR 2027, which will be held in Singapore!
Apr 23, 2025 One paper on Nutrition Estimation is accepted by ICMR 2025.
Mar 21, 2025 One paper on Ingredient Recognition is accepted by ICME 2025 (oral).
Mar 08, 2025 One paper on Egocentric Video Understanding is accepted by CVPR 2025.
Feb 23, 2025 One paper on Recipe Following in Cooking Video is accepted by CHI (LBW) 2025.
Jan 16, 2025 One paper on Large Multimodal Model in Food Domain is accepted by IEEE TMM.
Dec 09, 2024 Two papers on Grasp Generation and Text-to-Hand-Image Generation are accepted by AAAI 2025.
Dec 05, 2024 We are excited to announce our Special Session on Multimedia for Cooking and Eating Activities at ICME 2025. We warmly invite you to submit your papers!
Oct 29, 2024 One paper on Recipe Generation is accepted by WACV 2025.
Jul 15, 2024 One paper on Cross-modal Recipe Retrieval is accepted by ECCV 2024.
Jul 15, 2024 One paper on Time-series Weight Prediction is accepted by ACM MM 2024 and is further selected as an Oral presentation (3.97%).
Jun 15, 2024 One paper on Text-driven Video Prediction is accepted by ACM TOMM.
Feb 15, 2024 Two papers on Unsupervised Video Hashing and Generalizable Food Recognition are accepted by IEEE TMM.
Jan 01, 2024 I joined Singapore Management University as an Assistant Professor of Computer Science. ✨ 😄

selected publications

  1. ICCV
    From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
    Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In International Conference on Computer Vision, 2025
  2. MM
    Look before you decide: Prompting active deduction of MLLMs for assumptive reasoning
    Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Tianwen Qian, Bin Zhu, Na Zhao, and Yu-Gang Jiang
    In Proceedings of the 33rd ACM International Conference on Multimedia, 2025
  3. CVPR
    HD-EPIC: A highly-detailed egocentric video dataset
    Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, and Dima Damen
    In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025
  4. AAAI
    Hand1000: Generating realistic hands from text with only 1,000 images
    Haozhuo Zhang, Bin Zhu, Yu Cao, and Yanbin Hao
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  5. AAAI
    RAGG: Retrieval-Augmented Grasp Generation Model
    Zhenhua Tang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, and Richang Hong
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  6. TMM
    FoodLMM: A versatile food assistant using large multi-modal model
    Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, and Chong-Wah Ngo
    IEEE Transactions on Multimedia, 2025
  7. ArXiv
    Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
    Bin Zhu, Huiyan Qi, Yinxuan Gui, Jingjing Chen, Chong-Wah Ngo, and Ee-Peng Lim
    arXiv preprint arXiv:2501.19017, 2025
  8. ArXiv
    Reasoning Models Are More Easily Gaslighted Than You Think
    Bin Zhu, Hailong Yin, Jingjing Chen, and Yu-Gang Jiang
    arXiv preprint arXiv:2506.09677, 2025
  9. ArXiv
    Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
    Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    arXiv preprint arXiv:2504.09456, 2025
  10. MM Oral
    Navigating weight prediction with diet diary
    Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  11. ECCV
    Enhancing recipe retrieval with foundation models: A data augmentation perspective
    Fangzhou Song, Bin Zhu, Yanbin Hao, and Shuo Wang
    In European Conference on Computer Vision, 2024
  12. MM
    CgT-GAN: CLIP-guided text GAN for image captioning
    Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, and Xiangnan He
    In Proceedings of the 31st ACM International Conference on Multimedia, 2023
  13. NeurIPS
    EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations
    Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, and Dima Damen
    In Advances in Neural Information Processing Systems Track on Datasets and Benchmarks, 2022
  14. TIP
    A study of multi-task and region-wise deep learning for food ingredient recognition
    Jingjing Chen, Bin Zhu, Chong-Wah Ngo, Tat-Seng Chua, and Yu-Gang Jiang
    IEEE Transactions on Image Processing, 2020
  15. MM
    Cross-domain cross-modal food transfer
    Bin Zhu, Chong-Wah Ngo, and Jingjing Chen
    In Proceedings of the 28th ACM International Conference on Multimedia, 2020
  16. CVPR
    CookGAN: Causality based text-to-image synthesis
    Bin Zhu and Chong-Wah Ngo
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
  17. CVPR
    R2GAN: Cross-modal recipe retrieval with generative adversarial network
    Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Yanbin Hao
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019