Bin Zhu

I am currently an Assistant Professor of Computer Science in the School of Computing and Information Systems, Singapore Management University (SMU). My research interests lie in Human-Centered Multimedia Analysis, including Cross-modal Search and Creation, Egocentric Video Understanding, Multi-modal Large Language Models, and AI for Healthcare. Specifically, my objective is to conduct frontier research and develop cutting-edge technologies for processing, modeling, analyzing, and understanding multimedia content that facilitate natural and immersive human experiences and have a positive impact on society.

Before joining SMU, I was a Postdoctoral Researcher working with Prof. Dima Damen at the University of Bristol, as part of the EPSRC Visual AI Program Grant. Previously, I received my Ph.D. degree from the Department of Computer Science, City University of Hong Kong, in 2021, under the supervision of Prof. Chong-Wah Ngo and Dr. Wing-Kwong Chan. Prior to CityU, I obtained my master's and bachelor's degrees from Zhejiang University and Southeast University, respectively.

🔥🔥🔥Openings:

I am actively looking for multiple fully funded Ph.D. students. If you are interested in working with me, please feel free to drop me an email with your CV and other supporting documents (if any).

news

Jul 21, 2024 One paper on Time-series Weight Prediction has been accepted by ACM MM 2024 and selected for an Oral presentation (3.97%).
Jul 1, 2024 One paper on Cross-modal Recipe Retrieval has been accepted by ECCV 2024.
Jun 25, 2024 One paper on Text-driven Video Prediction has been accepted by TOMM.
Feb 14, 2024 Two papers on Unsupervised Video Hashing and Generalizable Food Recognition have been accepted by IEEE TMM.
Jan 1, 2024 I joined Singapore Management University as an Assistant Professor of Computer Science.

selected publications

2024

  1. ACM MM
    Navigating Weight Prediction with Diet Diary
    Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, and Yu-Gang Jiang
    In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM), 2024
  2. ECCV
    Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
    Fangzhou Song, Bin Zhu, Yanbin Hao, and Shuo Wang
    In European Conference on Computer Vision (ECCV), 2024
  3. TOMM
    Text-driven Video Prediction
    Xue Song, Jingjing Chen, Bin Zhu, and Yugang Jiang
    ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2024
  4. TMM
    Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling
    Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, and Xiang Wang
    IEEE Transactions on Multimedia (TMM), 2024
  5. TMM
    From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios
    Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, and Yu-Gang Jiang
    IEEE Transactions on Multimedia (TMM), 2024

2023

  1. ACM MM
    CgT-GAN: CLIP-guided Text GAN for Image Captioning
    Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, and Xiangnan He
    In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023

2022

  1. NeurIPS
    EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations
    Ahmad Darkhalil*, Dandan Shan*, Bin Zhu*, Jian Ma*, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, and Dima Damen
    Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. ACM MM
    Mix-DANN and Dynamic-Modal-Distillation for Video Domain Adaptation
    Yuehao Yin, Bin Zhu, Jingjing Chen, Lechao Cheng, and Yu-Gang Jiang
    In Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022
  3. ACM MM
    Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation
    Yanbin Hao, Jingru Duan, Hao Zhang, Bin Zhu, Pengyuan Zhou, and Xiangnan He
    In Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022
  4. ICMR
    Cross-lingual adaptation for recipe retrieval with mixup
    Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Wing-Kwong Chan
    In Proceedings of the 2022 International Conference on Multimedia Retrieval (ICMR), 2022

2021

  1. TMM
    Learning from web recipe-image pairs for food recognition: Problem, baselines and performance
    Bin Zhu, Chong-Wah Ngo, and Wing-Kwong Chan
    IEEE Transactions on Multimedia (TMM), 2021
  2. TIP
    Learning to match anchor-target video pairs with dual attentional holographic networks
    Yanbin Hao, Chong-Wah Ngo, and Bin Zhu
    IEEE Transactions on Image Processing (TIP), 2021

2020

  1. CVPR
    CookGAN: Causality based text-to-image synthesis
    Bin Zhu and Chong-Wah Ngo
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
  2. TIP
    A study of multi-task and region-wise deep learning for food ingredient recognition
    Jingjing Chen, Bin Zhu, Chong-Wah Ngo, Tat-Seng Chua, and Yu-Gang Jiang
    IEEE Transactions on Image Processing (TIP), 2020
  3. ACM MM
    Cross-domain cross-modal food transfer
    Bin Zhu, Chong-Wah Ngo, and Jing-jing Chen
    In Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 2020

2019

  1. CVPR
    R2GAN: Cross-modal recipe retrieval with generative adversarial network
    Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Yanbin Hao
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019