publications
Please see Google Scholar for more recent works and arXiv papers.
2025
- [AAAI] RAGG: Retrieval-Augmented Grasp Generation Model. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
- [ArXiv] Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation. arXiv preprint arXiv:2501.19017, 2025
- [ArXiv] Don’t Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs. arXiv preprint arXiv:2504.09456, 2025
- [WACV] Retrieval augmented recipe generation. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
- [ICME] Efficient Prompt Tuning for Hierarchical Ingredient Recognition. In IEEE International Conference on Multimedia and Expo (ICME), 2025
- [ASSETS] Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking. In International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), 2025
- [CHI-LBW] OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025
2024
- MM Oral
- [TMM] From canteen food to daily meals: Generalizing food recognition to more practical scenarios. IEEE Transactions on Multimedia, 2024
- [TMM] Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling. IEEE Transactions on Multimedia, 2024
- [TOMM] Text-driven video prediction. ACM Transactions on Multimedia Computing, Communications and Applications, 2024
- [TOMM] CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description. ACM Transactions on Multimedia Computing, Communications and Applications, 2024
- [MM Asia] Active Object Segmentation: A New Modality for Egocentric Action Recognition. In Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024
- ECCVW