publications

Please see Google Scholar for more recent works and arXiv papers.

2025

  1. ICCV
    From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
    Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In International Conference on Computer Vision, 2025
  2. CVPR
    HD-EPIC: A highly-detailed egocentric video dataset
    Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, and Dima Damen
    In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025
  3. AAAI
    Hand1000: Generating realistic hands from text with only 1,000 images
    Haozhuo Zhang, Bin Zhu, Yu Cao, and Yanbin Hao
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  4. AAAI
    RAGG: Retrieval-Augmented Grasp Generation Model
    Zhenhua Tang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, and Richang Hong
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  5. TMM
    FoodLMM: A versatile food assistant using large multi-modal model
    Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, and Chong-Wah Ngo
    IEEE Transactions on Multimedia, 2025
  6. arXiv
    Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
    Bin Zhu, Huiyan Qi, Yinxuan Gui, Jingjing Chen, Chong-Wah Ngo, and Ee-Peng Lim
    arXiv preprint arXiv:2501.19017, 2025
  7. arXiv
    Reasoning Models Are More Easily Gaslighted Than You Think
    Bin Zhu, Hailong Yin, Jingjing Chen, and Yu-Gang Jiang
    arXiv preprint arXiv:2506.09677, 2025
  8. arXiv
    Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
    Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    arXiv preprint arXiv:2504.09456, 2025
  9. WACV
    Retrieval augmented recipe generation
    Guoshan Liu, Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
  10. TOMM
    CookingDiffusion: Cooking procedural image generation with Stable Diffusion
    Yuan Wang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, Yi Tan, and Xiang Wang
    ACM Transactions on Multimedia Computing, Communications and Applications, 2025
  11. ICMR
    Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion
    Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Ee-Peng Lim
    In ACM International Conference on Multimedia Retrieval (ICMR), 2025
  12. ICME
    Efficient Prompt Tuning for Hierarchical Ingredient Recognition
    Yinxuan Gui, Bin Zhu, Jingjing Chen, and Chong-Wah Ngo
    In IEEE International Conference on Multimedia and Expo (ICME), 2025
  13. ASSETS
    Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking
    Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu, and Patrick Carrington
    In International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), 2025
  14. CHI-LBW
    OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking
    Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu, and Patrick Carrington
    In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025

2024

  1. MM Oral
    Navigating weight prediction with diet diary
    Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, and Yu-Gang Jiang
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  2. ECCV
    Enhancing recipe retrieval with foundation models: A data augmentation perspective
    Fangzhou Song, Bin Zhu, Yanbin Hao, and Shuo Wang
    In European Conference on Computer Vision, 2024
  3. TMM
    From canteen food to daily meals: Generalizing food recognition to more practical scenarios
    Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, and Yu-Gang Jiang
    IEEE Transactions on Multimedia, 2024
  4. TMM
    Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling
    Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, and Xiang Wang
    IEEE Transactions on Multimedia, 2024
  5. TOMM
    Text-driven video prediction
    Xue Song, Jingjing Chen, Bin Zhu, and Yu-Gang Jiang
    ACM Transactions on Multimedia Computing, Communications and Applications, 2024
  6. TOMM
    CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description
    Haoran Li, Yanbin Hao, Jiarui Yu, Bin Zhu, Shuo Wang, and Tong Xu
    ACM Transactions on Multimedia Computing, Communications and Applications, 2024
  7. MM Asia
    Active Object Segmentation: A New Modality for Egocentric Action Recognition
    Jian Ma, Bin Zhu, Kun Li, and Dima Damen
    In Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024
  8. ECCVW
    Video editing for video retrieval
    Bin Zhu, Kevin Flanagan, Adriano Fragomeni, Michael Wray, and Dima Damen
    In European Conference on Computer Vision Workshop, 2024