Bin Zhu

80 Stamford Road, Singapore 178902

I am an Assistant Professor of Computer Science at the School of Computing and Information Systems, Singapore Management University (SMU). Before joining SMU, I was a Postdoctoral Researcher working with Prof. Dima Damen at the University of Bristol, contributing to the EPSRC Visual AI Program Grant led by Prof. Andrew Zisserman. I earned my Ph.D. degree from the Department of Computer Science, City University of Hong Kong, in 2021, under the supervision of Prof. Chong-Wah Ngo and Dr. Wing-Kwong Chan. Earlier, I obtained my master's and bachelor's degrees from Zhejiang University and Southeast University, respectively.

My research interests lie in Human-Centered Multimedia Computing, including Embodied Perception and Planning, Cross-modal Retrieval, Multimodal Large Language Models, Egocentric Video Understanding, Generative Models, and AI for Healthcare. Specifically, my goal is to conduct frontier research and develop cutting-edge technologies for processing, modeling, analyzing, understanding, and generating multimedia content that facilitate natural and immersive human experiences and have a positive impact on society.

🔥🔥🔥 Openings: I am actively looking for self-motivated Ph.D. students, CSC visiting students, and (remote) interns. In addition, I am looking for postdoctoral researchers and funded visiting students to work on embodied AI. If you are interested in working with me, please feel free to drop me an email with your CV and any other supporting documents.

news

Dec 22, 2025	One paper on Vision Language Navigation is accepted by TIP. Congrats to Guangzhao and all collaborators!
Dec 08, 2025	Our project “VISTA: A Value-Informed Safety & Trust Architecture for Autonomous Agents” has been awarded funding by AI Singapore (AISG), totaling more than SGD$1,000,000. I will serve as Co-Principal Investigator. Congrats to Zhiguang!
Nov 29, 2025	I will serve as Organization Chair for Pacific Graphics 2026.
Nov 21, 2025	I will serve as Area Chair for ICME 2026.
Nov 20, 2025	I have been invited to serve as Workshop Chair for ACM ICMR 2026.
Nov 08, 2025	One paper on Reinforcement Learning for Robotic Manipulation is accepted as an Oral presentation by AAAI 2026. Congrats to Jiarui and all collaborators!
Oct 30, 2025	Two papers on Large Reasoning Model Safety and Multimodal Large Language Models are accepted by MMM 2026.
Sep 08, 2025	I was invited to deliver a talk at the National University of Singapore NExT Research Centre on “Food Computing from an Egocentric Video Perspective”.
Sep 02, 2025	I was honored to serve on the Board of Examiners for Ph.D. candidate Gabriele Goletto, whose thesis focused on Egocentric Vision. Congratulations to Dr. Goletto on a successful defense! 🎓
Aug 26, 2025	One paper on a Large Lithium-ion Battery Model, for which I am a co-corresponding author, is accepted by Nature Communications.
Aug 22, 2025	I have been awarded a Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2 grant as Principal Investigator for my project “Self-Adaptive Planning with Environmental Awareness for Embodied Agents”, with total funding of SGD$959,166.
Jul 05, 2025	One paper on Multimodal Large Language Models is accepted by ACM MM 2025.
Jun 26, 2025	One paper on Multimodal Large Language Models is accepted by ICCV 2025.
Jun 19, 2025	One paper on Recipe Progress Tracking in Non-Visual Cooking is accepted by ASSETS 2025.
Jun 09, 2025	One paper on Cooking Procedural Image Generation is accepted by ACM TOMM.
Jun 06, 2025	I will serve as the Program Co-Chair for ACM ICMR 2027, which will be held in Singapore!
Apr 23, 2025	One paper on Nutrition Estimation is accepted by ICMR 2025.
Mar 21, 2025	One paper on Ingredient Recognition is accepted by ICME 2025 (oral).
Mar 08, 2025	One paper on Egocentric Video Understanding is accepted by CVPR 2025.
Feb 23, 2025	One paper on Recipe Following in Cooking Videos is accepted by CHI (LBW) 2025.
Jan 16, 2025	One paper on Large Multimodal Models in the Food Domain is accepted by IEEE TMM.
Dec 09, 2024	Two papers on Grasp Generation and Text-to-Hand-Image Generation are accepted by AAAI 2025.
Dec 05, 2024	We are excited to announce our Special Session on Multimedia for Cooking and Eating Activities at ICME 2025. We warmly invite you to submit your papers!
Oct 29, 2024	One paper on Recipe Generation is accepted by WACV 2025.
Jul 15, 2024	One paper on Cross-modal Recipe Retrieval is accepted by ECCV 2024.
Jul 15, 2024	One paper on Time-series Weight Prediction is accepted by ACM MM 2024 and is further selected as an Oral presentation (3.97%).
Jun 15, 2024	One paper on Text-driven Video Prediction is accepted by ACM TOMM.
Feb 15, 2024	Two papers on Unsupervised Video Hashing and Generalizable Food Recognition are accepted by IEEE TMM.
Jan 01, 2024	I joined Singapore Management University as an Assistant Professor of Computer Science.

selected publications

AAAI

Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward

Jiarui Yang, Bin Zhu, Jingjing Chen, and Yu-Gang Jiang

In Proceedings of the AAAI Conference on Artificial Intelligence, 2026

TIP

ThinkMatter: Panoramic-Aware Instructional Semantics for Monocular Vision-and-Language Navigation

Guangzhao Dai, Shuo Wang, Hao Zhao, Bin Zhu, Qianru Sun, and Xiangbo Shu

IEEE Transactions on Image Processing, 2026

MMM

Benchmarking Gaslighting Negation Attacks Against Reasoning Models

Bin Zhu, Hailong Yin, Jingjing Chen, and Yu-Gang Jiang

In International Conference on Multimedia Modeling, 2026

NC

LLiM: Large Lithium-ion Battery Model for Secure Shared E-bike Battery in Smart Cities

Donghui Ding, Zhao Li, Linhao Luo, Ming Jin, Bin Zhu, Yichen Zhong, Junhao Hu, Peng Cai, and Huiqi Hu

Nature Communications, 2025

ICCV

From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning

Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang

In International Conference on Computer Vision, 2025

MM

Look before you decide: Prompting active deduction of MLLMs for assumptive reasoning

Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Tianwen Qian, Bin Zhu, Na Zhao, and Yu-Gang Jiang

In Proceedings of the 33rd ACM International Conference on Multimedia, 2025

CVPR

HD-EPIC: A highly-detailed egocentric video dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, and Dima Damen

In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025

AAAI

Hand1000: Generating realistic hands from text with only 1,000 images

Haozhuo Zhang, Bin Zhu, Yu Cao, and Yanbin Hao

In Proceedings of the AAAI Conference on Artificial Intelligence, 2025

AAAI

RAGG: Retrieval-Augmented Grasp Generation Model

Zhenhua Tang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, and Richang Hong

In Proceedings of the AAAI Conference on Artificial Intelligence, 2025

TMM

FoodLMM: A versatile food assistant using large multi-modal model

Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, and Chong-Wah Ngo

IEEE Transactions on Multimedia, 2025

ArXiv

Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation

Bin Zhu, Huiyan Qi, Yinxuan Gui, Jingjing Chen, Chong-Wah Ngo, and Ee-Peng Lim

arXiv preprint arXiv:2501.19017, 2025

ArXiv

Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs

Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang

arXiv preprint arXiv:2504.09456, 2025

MM Oral

Navigating weight prediction with diet diary

Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang

In Proceedings of the 32nd ACM International Conference on Multimedia, 2024

ECCV

Enhancing recipe retrieval with foundation models: A data augmentation perspective

Fangzhou Song, Bin Zhu, Yanbin Hao, and Shuo Wang

In European Conference on Computer Vision, 2024

MM

CgT-GAN: CLIP-guided text GAN for image captioning

Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, and Xiangnan He

In Proceedings of the 31st ACM International Conference on Multimedia, 2023

NeurIPS

EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations

Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, and Dima Damen

In Advances in Neural Information Processing Systems Track on Datasets and Benchmarks, 2022

TIP

A study of multi-task and region-wise deep learning for food ingredient recognition

Jingjing Chen, Bin Zhu, Chong-Wah Ngo, Tat-Seng Chua, and Yu-Gang Jiang

IEEE Transactions on Image Processing, 2020

MM

Cross-domain cross-modal food transfer

Bin Zhu, Chong-Wah Ngo, and Jingjing Chen

In Proceedings of the 28th ACM International Conference on Multimedia, 2020

CVPR

CookGAN: Causality based text-to-image synthesis

Bin Zhu and Chong-Wah Ngo

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

CVPR

R2GAN: Cross-modal recipe retrieval with generative adversarial network

Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Yanbin Hao

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019
