November 10, 2025 · The 14th International Symposium on Information and Communication Technology (SOICT 2025) · Accepted
Visionary: Optimized Temporal Video Retrieval via Large Language Model‑Enhanced Query Processing
Addressing the Ho Chi Minh City AI Challenge 2025, Visionary introduces four key contributions: a novel adaptive keyframe extraction algorithm; an enhanced pre‑processing pipeline using Qwen3‑VL for metadata generation with integrated OCR; a flexible architecture supporting multiple embedding models; and the use of Reciprocal Rank Fusion to synthesise retrieval results for complex, large‑scale video retrieval tasks.
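Reciprocal Rank Fusion merges ranked lists from different retrievers using only rank positions, so scores from differently scaled embedding models never have to be calibrated against each other. Below is a minimal Python sketch. The constant k = 60, the keyframe IDs and the two-model setup are illustrative assumptions, not Visionary's actual configuration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of item IDs with Reciprocal Rank Fusion (RRF).

    An item's fused score is the sum of 1 / (k + rank) over every list that
    contains it, using 1-based ranks. k = 60 is the constant suggested in the
    original RRF paper and is an assumed default here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item ID so output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical keyframe IDs returned by two different embedding models.
model_a_results = ["kf_12", "kf_07", "kf_33"]
model_b_results = ["kf_07", "kf_33", "kf_91"]
print(reciprocal_rank_fusion([model_a_results, model_b_results]))
# → ['kf_07', 'kf_33', 'kf_12', 'kf_91']
```

Items that both models rank reasonably well (kf_07, kf_33) overtake an item that only one model puts first, which is the behaviour that makes RRF useful for combining heterogeneous retrievers.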
December 13, 2025 · The 14th International Symposium on Information and Communication Technology (SOICT 2025) · Accepted
EnAug: ENT Endoscopy Images Classification Using Ensemble and Augmentation Methods
A robust classification framework for ENT endoscopy images based on an ensemble of deep learning models. A novel augmentation strategy combining symmetry‑based label flipping with Mixup, Mosaic, and other techniques addresses class imbalance. Evaluated on a curated ENT dataset covering seven anatomical categories, achieving 95.82% accuracy.
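The two augmentation ideas named above can be sketched in a few lines of NumPy. This is a hypothetical illustration rather than EnAug's code: it assumes that symmetry-based label flipping means mirroring an image horizontally and swapping a left/right label, and it uses the standard Mixup formulation with an assumed alpha of 0.2. The label names are invented placeholders.

```python
import numpy as np

# Hypothetical pairs of mirror-image classes; EnAug's real category names,
# and which of its seven categories are side-specific, are assumptions here.
MIRROR_LABEL = {"ear-left": "ear-right", "ear-right": "ear-left"}


def flip_with_label(image, label):
    """Horizontally flip an H x W x C image and swap a side-specific label."""
    flipped = image[:, ::-1, :]
    return flipped, MIRROR_LABEL.get(label, label)  # labels without a side are kept


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: blend two images and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))              # stand-in for an endoscopy frame
flipped, new_label = flip_with_label(image, "ear-left")
print(new_label)                           # → ear-right

one_hot = np.eye(7)                        # seven categories, as in the dataset
x, y, lam = mixup(image, one_hot[0], flipped, one_hot[3], rng=rng)
print(round(float(y.sum()), 6))            # the mixed label still sums to 1
```

Flipping with a label swap creates correctly labelled samples for the mirrored class, which is one way to top up under-represented categories; Mixup then softens decision boundaries by training on interpolated images and labels.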
October 27, 2025 · The 33rd ACM International Conference on Multimedia (ACM MM 2025) · Published
A multi‑stage retrieval framework combining dense article retrieval, event‑aware language model reranking, and caption‑guided semantic matching. It uses Qwen3 for article search, Qwen3‑Reranker for contextual alignment, and Qwen2‑VL for image scoring, with the resulting rankings fused via Reciprocal Rank Fusion (RRF). Achieved the top‑1 score on the private test set of Track 2 in the EVENTA 2025 Grand Challenge at ACM MM.
August 16, 2025 · The 2025 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) · Published
SAMURAI integrates CLIP‑based semantic matching with shape‑guided re‑ranking derived from binary silhouettes of masked regions, alongside a majority voting strategy. A preprocessing pipeline enhances mask quality by extracting the largest connected component and removing background noise, achieving competitive performance on the ROOMELSA private test set.
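The mask clean-up step, which keeps only the largest connected component, can be sketched without any imaging library. This pure-Python version assumes 4-connectivity and a mask given as nested lists of 0/1. SAMURAI's actual implementation is not described here, and in practice a library routine such as OpenCV's cv2.connectedComponentsWithStats or SciPy's scipy.ndimage.label would do the labelling much faster.

```python
from collections import deque


def largest_connected_component(mask):
    """Return a copy of a binary mask keeping only its largest 4-connected region."""
    if not mask or not mask[0]:
        return [row[:] for row in mask]
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search gathers every foreground pixel in this region.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Strictly larger only, so the first region found wins a tie.
            if len(component) > len(best):
                best = component
    # Everything outside the largest region (small blobs, noise) is zeroed.
    cleaned = [[0] * w for _ in range(h)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


noisy_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
for row in largest_connected_component(noisy_mask):
    print(row)
# → [1, 1, 0, 0, 0]
#   [1, 1, 0, 0, 0]
#   [0, 0, 0, 0, 0]
```

The isolated pixels at the right edge are treated as noise and removed, leaving a single clean silhouette that a shape-based re-ranker can compare.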
December 14, 2024 · The 13th International Symposium on Information and Communication Technology (SOICT 2024) · Published
Built for the Ho Chi Minh City AI Challenge 2024, NewsInsight2.0 combines a CLIP model trained on DFN‑5B (a dataset of 5 billion image‑text pairs), a refined temporal query mechanism, and an automatic query generator powered by open‑source LLMs that streamlines query optimisation.
December 8, 2024 · The 1st Large Vision‑Language Model Learning and Applications Workshop, ACCV 2024 · Published
Adapted MMMU benchmarks and applied prompt engineering with a voting‑based ensemble method to enhance Large Vision‑Language Models' performance on complex visual data interpretation, achieving a top score of 0.85 in the LAVA Workshop 2024 challenge.
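A voting-based ensemble of the kind described above reduces to picking the most frequent answer across several model or prompt outputs for the same question. The sketch below is a hypothetical illustration; the tie-breaking rule (the answer seen first wins) and the five prompt variants are assumptions, since the entry does not specify them.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical multiple-choice answers from five prompt variants for one question.
votes = ["B", "C", "B", "A", "C"]
print(majority_vote(votes))  # → B
```

Voting over several prompt formulations smooths out answers that flip with small wording changes, which is a common failure mode of vision-language models on multiple-choice questions.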
August 16, 2024 · The 2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) · Published
An AI‑assisted system that helps users identify photos and verify their authenticity, combining deep learning detection models with user‑facing interfaces for practical media literacy.
July 2, 2024 · The 1st Workshop on Security‑Centric Strategies for Combating Information Disorder, ACM AsiaCCS 2024 · Published
Proposes a hybrid method for cheapfake detection that combines online reputation verification with an end‑to‑end neural network, with the aim of preserving the integrity of multimedia content.
June 11, 2024 · The 2024 International Conference on Multimedia Retrieval (ACM ICMR 2024) · Published
Explores an end‑to‑end network for cheapfake detection trained on generated synthetic data, aimed at preserving the integrity of multimedia content at scale.
February 1, 2024 · The MediaEval 2023 Workshop, MMM 2024 · Published
An efficient method for detection and tracking of spermatozoa using YOLOv8 trained on a COCO‑format dataset, contributing a transparent and reproducible pipeline for biomedical video analysis.