publications
Please see Google Scholar for more recent works and arXiv papers.
2025
- [AAAI] RAGG: Retrieval-Augmented Grasp Generation Model. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
- [ArXiv] Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation. arXiv preprint arXiv:2501.19017, 2025
- [ArXiv] Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs. arXiv preprint arXiv:2504.09456, 2025
- [WACV] Retrieval augmented recipe generation. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
- [ICME] Efficient Prompt Tuning for Hierarchical Ingredient Recognition. In IEEE International Conference on Multimedia and Expo (ICME), 2025
- [ASSETS] Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking. In International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), 2025
- [CHI-LBW] OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025
2024
- [MM Oral]
- [TMM] From canteen food to daily meals: Generalizing food recognition to more practical scenarios. IEEE Transactions on Multimedia, 2024
- [TMM] Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling. IEEE Transactions on Multimedia, 2024
- [TOMM] Text-driven video prediction. ACM Transactions on Multimedia Computing, Communications and Applications, 2024
- [TOMM] CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description. ACM Transactions on Multimedia Computing, Communications and Applications, 2024
- [MM Asia] Active Object Segmentation: A New Modality for Egocentric Action Recognition. In Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024
- [ECCVW]
2023
2022
- [MM Oral] Mix-dann and dynamic-modal-distillation for video domain adaptation. In Proceedings of the 30th ACM International Conference on Multimedia, 2022
- [ICMR] Cross-lingual adaptation for recipe retrieval with mixup. In Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022
2021
- [TMM] Learning from web recipe-image pairs for food recognition: Problem, baselines and performance. IEEE Transactions on Multimedia, 2021
- [TIP] Learning to match anchor-target video pairs with dual attentional holographic networks. IEEE Transactions on Image Processing, 2021
2020
- [TIP] A study of multi-task and region-wise deep learning for food ingredient recognition. IEEE Transactions on Image Processing, 2020
- [MM Grand Challenge] Person-level action recognition in complex events via tsd-tsm networks. In Proceedings of the 28th ACM International Conference on Multimedia Grand Challenge: Human Centric Analysis, 2020
- [MM] Cross-domain cross-modal food transfer. In Proceedings of the 28th ACM International Conference on Multimedia, 2020
- [CVPR] CookGAN: Causality based text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
- [CVPR] R2GAN: Cross-modal recipe retrieval with generative adversarial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019