Browsing by Author "NETO, RUI JORGE MACHADO"
- Resumo estruturado de vídeo usando LLMs e MLLMs (Structured video summarization using LLMs and MLLMs). Publication. NETO, RUI JORGE MACHADO; Pereira, Nuno Alexandre Magalhães. In industrial environments, operational safety and efficiency depend heavily on the timely detection of anomalies. This dissertation presents a complete, structured video summarization pipeline tailored to identifying anomalies in industrial settings, using recent advances in Large Language Models (LLMs) and Multimodal LLMs (MLLMs). Beyond reviewing state-of-the-art methodologies in video captioning and anomaly detection, this work delivers a practical implementation that combines intelligent frame sampling, context-aware captioning with advanced MLLMs such as gpt-4.1-mini and gemini-2.5-pro, and object detection via YOLOv11. A custom benchmark dataset of 100 Image-Question-Answer (IQA) triplets was developed to evaluate the perceptual capabilities of various MLLMs in industrial scenarios. Additionally, a novel "Model-as-a-Judge" framework was used to assess the models' captioning quality and the pipeline's summarization quality beyond lexical metrics. The final pipeline achieved a summarization quality score of 0.72 and accurately detected five of six safety-critical anomalies in over an hour of our self-recorded, real-world CNC machine footage. The research was accepted for presentation at the SASYR Symposium. These contributions advance the field of applied Artificial Intelligence (AI) for industrial safety monitoring through a robust and efficient multimodal video analysis system.
