Work & Publications

May 2, 2026 · IEEE International Conference on Multimedia & Expo Workshops (ICMEW 2026), Bangkok ● Accepted

Cluster Erase: Zero‑Shot Mass‑Similar and Multi‑Object Removal in a Single Pass

A demonstration of a single‑pass removal approach for visually similar object clusters in natural imagery, extending zero‑shot inpainting toward the messy, repetitive scenes that one‑object methods choke on.

March 17, 2026 · IEEE International Conference on Multimedia & Expo (ICME 2026), Bangkok ● Accepted

PANDORA: Pixel‑wise Attention Dissolution and Latent Guidance for Zero‑Shot Object Removal

A zero‑shot object removal method that leverages pixel‑wise attention dissolution and latent guidance within a diffusion framework — achieving clean inpainting without task‑specific fine‑tuning on the target scene.

December 13, 2025 · The 14th International Symposium on Information and Communication Technology (SOICT 2025) ● Accepted

Visionary: Optimized Temporal Video Retrieval via Large Language Model‑Enhanced Query Processing

Addressing the Ho Chi Minh City AI Challenge 2025, Visionary introduces four key contributions: a novel adaptive keyframe extraction algorithm; an enhanced pre‑processing pipeline using Qwen3‑VL for metadata generation with integrated OCR; a flexible architecture supporting multiple embedding models; and the use of Reciprocal Rank Fusion to synthesise retrieval results for complex, large‑scale video retrieval tasks.
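
Reciprocal Rank Fusion, the fusion step named above, can be sketched as follows. This is a minimal illustration and not Visionary's actual implementation: the function name `reciprocal_rank_fusion`, the keyframe IDs, and the constant `k=60` (a commonly used default) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item IDs into one list.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical keyframe IDs returned by two different embedding models.
model_a = ["kf_12", "kf_07", "kf_33"]
model_b = ["kf_07", "kf_33", "kf_12"]
fused = reciprocal_rank_fusion([model_a, model_b])
```

Because only ranks enter the score, lists produced by embedding models with incomparable similarity scales can be combined without score normalisation.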

December 13, 2025 · The 14th International Symposium on Information and Communication Technology (SOICT 2025) ● Accepted

ENAug: ENT Endoscopy Images Classification Using Ensemble and Augmentation Methods

A robust classification framework for ENT endoscopy images based on an ensemble of deep learning models. A novel augmentation strategy combining symmetry‑based label flipping with Mixup, Mosaic, and other techniques addresses class imbalance. Evaluated on a curated ENT dataset covering seven anatomical categories, the framework achieves 95.82% accuracy.
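
Of the augmentations listed, Mixup is the simplest to illustrate: two samples and their labels are blended with a weight drawn from a Beta distribution. The sketch below is a generic Mixup example and not ENAug's code; the function name `mixup`, the `alpha=0.2` value, and the toy 4x4 images are assumptions.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha) weight."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


# Two hypothetical 4x4 grayscale images with one-hot labels over seven classes.
rng = np.random.default_rng(0)
img_a, img_b = np.zeros((4, 4)), np.ones((4, 4))
label_a, label_b = np.eye(7)[0], np.eye(7)[3]
mixed_img, mixed_label, lam = mixup(img_a, label_a, img_b, label_b, rng=rng)
```

The blended label remains a valid probability distribution, so it can be used directly with a soft-label cross-entropy loss.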

October 27, 2025 · The 33rd ACM International Conference on Multimedia (ACM MM 2025) ● Published

EVENT‑Retriever: Event‑Aware Multimodal Image Retrieval for Realistic Captions

A multi‑stage retrieval framework combining dense article retrieval, event‑aware language model reranking, and caption‑guided semantic matching. Leverages Qwen3 for article search, Qwen3‑Reranker for contextual alignment, and Qwen2‑VL for image scoring, fused via Reciprocal Rank Fusion (RRF). Achieved the top‑1 score on the private test set of Track 2 in the EVENTA 2025 Grand Challenge at ACM MM.

August 16, 2025 · The 2025 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) ● Published

SAMURAI: Shape‑Aware Multimodal Retrieval for 3D Object Identification

SAMURAI integrates CLIP‑based semantic matching with shape‑guided re‑ranking derived from binary silhouettes of masked regions, alongside a majority voting strategy. A preprocessing pipeline enhances mask quality by extracting the largest connected component and removing background noise, achieving competitive performance on the ROOMELSA private test set.
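
The mask clean-up step, keeping only the largest connected component, can be sketched in plain Python. This is an illustrative version and not SAMURAI's pipeline: the function name `largest_connected_component`, the choice of 4-connectivity, and the toy mask are assumptions.

```python
from collections import deque


def largest_connected_component(mask):
    """Return a copy of a binary mask keeping only its largest 4-connected region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one foreground region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


# Hypothetical mask: a 3-pixel object plus one isolated noise pixel.
noisy = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
cleaned = largest_connected_component(noisy)
```

On real masks a library routine for connected-component labelling would normally replace the hand-written search; the explicit version is shown only to make the step concrete.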

December 14, 2024 · The 13th International Symposium on Information and Communication Technology (SOICT 2024) ● Published

NewsInsight2.0: An Enhanced Version Integrating Large Language Model‑Based Query Optimisation with Advanced Temporal Mechanisms

Built for the Ho Chi Minh City AI Challenge 2024, NewsInsight2.0 leverages a CLIP model trained on DFN‑5B (a dataset of 5 billion image‑text pairs), a refined temporal query mechanism, and an automatic query generator powered by open‑source LLMs for streamlined query optimisation.

December 8, 2024 · The 1st Large Vision–Language Model Learning and Applications Workshop, ACCV 2024 ● Published

Systems built to solve real problems.

Peer‑reviewed research.

An Approach to Complex Visual Data Interpretation with Vision‑Language Models

AI‑Enhanced Photo Authenticity: A User‑Focused Approach to Detecting and Analysing Manipulated Images

A Hybrid Approach for Cheapfake Detection Using Reputation Checking and End‑To‑End Network

A Unified Network for Detecting Out‑Of‑Context Information Using Generative Synthetic Data

Transparent Tracking of Spermatozoa with YOLOv8