Nguyễn Văn Lộc · Vol. 26
№ 026 · The Practitioner Issue
Established 2002

Work & Publications

A complete register of projects shipped, papers accepted, and systems built.

§ 01 Projects

Systems built to solve real problems.

01 / Doodle Duel (Jankenpon)
A real‑time competitive web party game where a vision‑LLM acts as referee on a shared canvas. Players draw objects under a time limit; the Gemini API classifies each drawing and decides which object wins. Imagen 4 then generates an enhanced version of the drawing for added fun.
● Live
React · TypeScript · Vite · Tailwind CSS · Node.js · Express · Socket.IO · Gemini API · Docker · Traefik

02 / ArcanaAI
AI‑powered tarot reading service using OpenAI function calling, real‑time chat, multiple decks, dual payment (Lemon Squeezy & MetaMask), Cloudflare R2 asset storage, and WebSocket notifications. Containerised with Docker, with CI/CD via GitHub Actions.
● Live
OpenAI · FastAPI · SQLAlchemy · PostgreSQL · Redis · Celery · Next.js · TypeScript · Cloudflare R2 · Docker

03 / SanghaGPT
Multilingual Buddhist Q&A chatbot (English & Vietnamese) powered by state‑of‑the‑art LLMs via MCP. Features a data pipeline for curating high‑quality question‑answer pairs, automated evaluation, and rigorous benchmarking against reference answers.
Python · HuggingFace · FastAPI · MCP · LangChain · RAG

04 / Eastern Religion Corpus
High‑quality structured corpus built from classical Vietnamese Buddhist texts (e.g. Thiền Uyển Tập Anh). The pipeline includes PDF preprocessing, automated sentence segmentation, header/footer and poem detection, and semantic entity recognition for digital humanities research.
Python · PyPDF · Underthesea · RegEx

05 / GraphRAG
RAG system integrating local LLMs with a knowledge graph. Automated triplet extraction and knowledge graph construction enable structured reasoning. Vector similarity search (Milvus) is combined with graph traversal for hybrid retrieval, and a ranking algorithm blends semantic and graph‑based results.
Python · Milvus · LLM · FastAPI · ReactJS · RAG

06 / Fake News Analysis & Detection
End‑to‑end neural networks for cheapfake detection (manipulated images/videos), online reputation verification, and real‑time authenticity guidance. Features credibility scoring algorithms and a Streamlit interface. Undergraduate thesis project.
Python · PyTorch · Streamlit · Selenium · Docker

07 / Connect
Feature‑rich chat application (real‑time messaging, group chat, file sharing) built for the Software Architecture course. ReactJS frontend and Django REST Framework backend, with PostgreSQL, Supabase, and CI/CD using Docker, Jenkins, and GitHub Actions.
ReactJS · Django REST Framework · PostgreSQL · Supabase · Docker · Jenkins

08 / MiniShopping
E‑commerce mobile app (Android) with user authentication, product catalogue, cart management, order processing, and payment integration. Django REST Framework backend with a NoSQL database. Mobile Development course project.
Android · Django REST Framework · NoSQL

09 / PC Control via Email
Remote computer control via specially formatted emails. A custom email protocol handler processes commands for system operations, file management, and monitoring tasks. Built with Python socket programming for the Computer Network course.
Python · Socket

10 / Chess Game
Full chess engine with standard rules, move validation, graphical two‑player gameplay, complete move history, and intelligent move suggestions. Built entirely in C++ for the OOP course, demonstrating inheritance, polymorphism, and encapsulation.
C++

11 / Search Engine
High‑performance text search engine with efficient string matching algorithms, data indexing, fuzzy search, and result ranking. Processes large datasets of text files with fast query response. Arts of Programming course project.
C++

12 / Big Integers
Arbitrary‑precision arithmetic library implementing addition, subtraction, multiplication, division, and logical operations on unbounded integers. Demonstrates algorithm optimisation, memory management, and number theory concepts in C++.
C++

§ 02 Publications

Peer‑reviewed research.

November 10, 2025 · The 14th International Symposium on Information and Communication Technology (SOICT 2025) · ● Accepted

Visionary: Optimized Temporal Video Retrieval via Large Language Model‑Enhanced Query Processing

Addressing the Ho Chi Minh City AI Challenge 2025, Visionary introduces four key contributions: a novel adaptive keyframe extraction algorithm; an enhanced pre‑processing pipeline using Qwen3‑VL for metadata generation with integrated OCR; a flexible architecture supporting multiple embedding models; and the use of Reciprocal Rank Fusion to synthesise retrieval results for complex, large‑scale video retrieval tasks.
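
For readers unfamiliar with Reciprocal Rank Fusion, a minimal sketch follows. It is not the Visionary implementation: the keyframe IDs, the two result lists, and the constant k = 60 (a commonly used default) are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item IDs into one ranked list.

    Each item scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by item ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical keyframe IDs returned by two retrieval models for one query.
model_a_results = ["kf_12", "kf_07", "kf_33"]
model_b_results = ["kf_07", "kf_40", "kf_12"]
fused = reciprocal_rank_fusion([model_a_results, model_b_results])
print(fused)
```

Because the score depends only on rank positions, the lists can come from models whose raw similarity scores are not comparable, which is presumably why the technique suits an architecture with multiple embedding models.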

December 13, 2025 · The 14th International Symposium on Information and Communication Technology (SOICT 2025) · ● Accepted

EnAug: ENT Endoscopy Images Classification Using Ensemble and Augmentation Methods

A robust classification framework for ENT endoscopy images based on an ensemble of deep learning models. A novel augmentation strategy combining symmetry‑based label flipping with Mixup, Mosaic, and other techniques addresses class imbalance. Evaluated on a curated ENT dataset covering seven anatomical categories, achieving 95.82% accuracy.
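
One plausible reading of symmetry‑based label flipping is sketched below: mirroring an image of a left‑sided structure yields a valid right‑sided training example, so the label is swapped along with the pixels. The label names and the mapping are hypothetical (the seven categories are not listed here), and this is not the EnAug code.

```python
# Hypothetical left/right label pairs; the real category names may differ.
MIRROR_LABEL = {
    "ear-left": "ear-right",
    "ear-right": "ear-left",
    "nose-left": "nose-right",
    "nose-right": "nose-left",
}


def flip_horizontal(image):
    """Mirror an image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment_with_symmetry(image, label):
    """Return the mirrored image and the label of the opposite side.

    Labels without a left/right counterpart are returned unchanged.
    """
    return flip_horizontal(image), MIRROR_LABEL.get(label, label)


image = [[1, 2, 3],
         [4, 5, 6]]
flipped, new_label = augment_with_symmetry(image, "ear-left")
print(flipped, new_label)
```

Under this reading, every left‑sided sample also contributes a right‑sided one (and vice versa), which is one way such a flip could help with class imbalance.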

October 27, 2025 · The 33rd ACM International Conference on Multimedia (ACM MM 2025) · ● Published

EVENT‑Retriever: Event‑Aware Multimodal Image Retrieval for Realistic Captions

A multi‑stage retrieval framework combining dense article retrieval, event‑aware language model reranking, and caption‑guided semantic matching. It uses Qwen3 for article search, Qwen3‑Reranker for contextual alignment, and Qwen2‑VL for image scoring, with the results fused via Reciprocal Rank Fusion (RRF). The system achieved the top score on the private test set of Track 2 in the EVENTA 2025 Grand Challenge at ACM MM.

August 16, 2025 · The 2025 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) · ● Published

SAMURAI: Shape‑Aware Multimodal Retrieval for 3D Object Identification

SAMURAI integrates CLIP‑based semantic matching with shape‑guided re‑ranking derived from binary silhouettes of masked regions, alongside a majority voting strategy. A preprocessing pipeline enhances mask quality by extracting the largest connected component and removing background noise, achieving competitive performance on the ROOMELSA private test set.
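
The mask clean‑up step can be illustrated with a small sketch that keeps only the largest connected region of a binary mask. This is an assumption‑laden illustration, not the SAMURAI pipeline: the mask is a plain list of 0/1 rows, 4‑connectivity is assumed, and the function name is hypothetical.

```python
from collections import deque


def largest_connected_component(mask):
    """Keep only the largest 4-connected region of 1s in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    # Rebuild the mask with only the winning component set to 1.
    cleaned = [[0] * width for _ in range(height)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


# A 2x2 object plus one stray noise pixel in the top-left corner.
noisy_mask = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
cleaned_mask = largest_connected_component(noisy_mask)
print(cleaned_mask)
```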

December 14, 2024 · The 13th International Symposium on Information and Communication Technology (SOICT 2024) · ● Published

NewsInsight2.0: An Enhanced Version Integrating Large Language Model‑Based Query Optimisation with Advanced Temporal Mechanisms

Built for the Ho Chi Minh City AI Challenge 2024, NewsInsight2.0 leverages a CLIP model trained on DFN‑5B (a dataset of 5 billion image‑text pairs), a refined temporal query mechanism, and an automatic query generator powered by open‑source LLMs for streamlined query optimisation.

December 8, 2024 · The 1st Large Vision–Language Model Learning and Applications Workshop, ACCV 2024 · ● Published

An Approach to Complex Visual Data Interpretation with Vision‑Language Models

Adapted MMMU benchmarks and applied prompt engineering with a voting‑based ensemble method to enhance Large Vision‑Language Models' performance on complex visual data interpretation, achieving a top score of 0.85 in the LAVA Workshop 2024 challenge.
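
A voting‑based ensemble can be as simple as the sketch below, which returns the answer given most often across several model runs. The answers, the number of runs, and the tie‑breaking rule (earliest answer wins) are assumptions for illustration; the actual method may weight or filter votes differently.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the earliest-seen answer."""
    counts = Counter(answers)
    top = max(counts.values())
    for answer in answers:
        if counts[answer] == top:
            return answer


# Hypothetical answers to one multiple-choice question from three prompt variants.
votes = ["B", "C", "B"]
final_answer = majority_vote(votes)
print(final_answer)
```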

February 1, 2024 · The MediaEval 2023 Workshop, MMM 2024 · ● Published

Transparent Tracking of Spermatozoa with YOLOv8

An efficient method for detection and tracking of spermatozoa using YOLOv8 trained on a COCO‑format dataset, contributing a transparent and reproducible pipeline for biomedical video analysis.