Publications

Please see Google Scholar for more recent works and arXiv papers.

2025

  1. TMM
    FoodLMM: A Versatile Food Assistant Using Large Multi-modal Model
    Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo
    IEEE Transactions on Multimedia (TMM), 2025
  2. AAAI
    Hand1000: Generating Realistic Hands from Text with Only 1,000 Images
    Haozhuo Zhang, Bin Zhu, Yu Cao, Yanbin Hao
    The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025
  3. AAAI
    RAGG: Retrieval-Augmented Grasp Generation Model
    Zhenhua Tang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, Richang Hong
    The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025
  4. WACV
    Retrieval Augmented Recipe Generation
    Guoshan Liu, Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang
    IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

2024

  1. ACM MM
    Navigating Weight Prediction with Diet Diary
    Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang
    Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM), 2024
  2. TOMM
    CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description
    Haoran Li, Yanbin Hao, Jiarui Yu, Bin Zhu, Shuo Wang, Tong Xu
    ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
  3. ECCV
    Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
    Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang
    European Conference on Computer Vision (ECCV), 2024
  4. MM Asia
    Active Object Segmentation: A New Modality for Egocentric Action Recognition
    Jian Ma, Bin Zhu, Kun Li, Dima Damen
    ACM Multimedia Asia (MM Asia), 2024
  5. TOMM
    Text-driven Video Prediction
    Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
    ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
  6. TMM
    From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios
    Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
    IEEE Transactions on Multimedia (TMM)
  7. TMM
    Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling
    Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, Xiang Wang
    IEEE Transactions on Multimedia (TMM)