ISEP - DM – Engenharia de Inteligência Artificial
Browsing ISEP - DM – Engenharia de Inteligência Artificial by Sustainable Development Goals (SDG) "09: Indústria, Inovação e Infraestruturas"
Now showing 1 - 10 of 36
- Adversarial agent for synthetic data generation for phishing detection
  Publication. CARDOSO, FRANCISCO FONSECA FERREIRA; Pereira, Isabel Cecília Correia da Silva Praça Gomes; Maia, Eva Catarina Gomes
  Phishing attacks continue to be a significant security challenge, causing financial and reputational damage to organizations and individuals, with email being the primary vector for these attacks. While modern defenses continue to rely on phishing detection systems, their effectiveness is being challenged by the evolution of these attacks. Attackers are moving from generic emails to highly personalised, context-specific messages, which conventional models struggle to detect. The performance of these systems is limited above all by the scarcity of the specialised, domain-specific training data needed to recognise such threats. This thesis addresses this gap by introducing CANDACE, a modular framework designed to generate context-aware synthetic email messages to train and improve these detection systems. The main innovation of CANDACE comes from its dual Knowledge Graph (KG) architecture, which gives the generation process a contextual foundation. The first KG maps external, real-world information about an organization, while the second models its internal structure, such as employees and projects. A Small Language Model (SLM) then uses the information from these KGs, together with other key components, such as URLs, to generate an email message that is contextually relevant to the organization's domain. The contributions of this work include the complete design, end-to-end implementation, and validation of the CANDACE pipeline. A case study in the Public Administration sector demonstrates the framework's ability to produce convincing, context-aware synthetic messages. The findings confirm that contextual grounding is essential for creating better and more focused training data. This research shows the need to move beyond generic email datasets in order to build more resilient detection systems capable of catching increasingly sophisticated and personalised phishing attacks.
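  To make the dual-KG grounding concrete, here is a minimal sketch of how facts from an external and an internal knowledge graph might be flattened into a generation prompt. The fact representation, the function names, and the SLM call are illustrative assumptions, not the actual CANDACE API.

  ```python
  # A minimal sketch of dual-KG grounded prompt building, in the spirit of
  # CANDACE. All names here (KGFact, build_context, generate_email) are
  # hypothetical, introduced only for illustration.
  from dataclasses import dataclass

  @dataclass
  class KGFact:
      subject: str
      relation: str
      obj: str

  def build_context(external_kg: list, internal_kg: list, target: str) -> str:
      """Select facts about the target from both graphs and flatten them
      into a textual context for the language model."""
      relevant = [f for f in external_kg + internal_kg
                  if target in (f.subject, f.obj)]
      return "\n".join(f"{f.subject} {f.relation} {f.obj}" for f in relevant)

  def generate_email(slm, context: str, url: str) -> str:
      # The SLM call is a placeholder; any instruction-tuned small model fits.
      prompt = (
          "Write a work email consistent with the facts below that "
          f"references this link: {url}\n\nFacts:\n{context}"
      )
      return slm(prompt)

  # Usage with made-up facts:
  internal = [KGFact("Ana Silva", "works_on", "Project Atlas")]
  external = [KGFact("Project Atlas", "mentioned_in", "press release")]
  print(build_context(external, internal, "Project Atlas"))
  ```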
- AI-based synthesis of bacterial colony evolution images
  Publication. SILVA, MIGUEL ÂNGELO FERRAZ DA; Martinho, Diogo Emanuel Pereira; Marreiros, Maria Goreti Carvalho
  The growing demand for safety and efficiency in healthcare highlights the importance of optimising sterilisation procedures, where delays or errors can compromise patient outcomes. In this context, microbiological analysis of agar plates is a fundamental step, as it allows the identification of microbial growth that may compromise sterilisation quality. However, traditional inspection methods are time-consuming and rely heavily on manual observation, which limits their scalability in clinical environments. Meanwhile, Artificial Intelligence has demonstrated strong potential in image analysis and forecasting, offering opportunities to enhance microbiological analysis and support decision-making in healthcare workflows. This dissertation addresses the problem of detecting and predicting the growth of bacterial colonies on agar plates. Anticipating how colonies evolve is essential to evaluate contamination levels, yet this task remains challenging due to the natural variability of growth patterns, the occurrence of overlapping colonies, and the diversity of experimental conditions that affect microbial behaviour. To tackle this problem, an integrated application was developed and structured into three main modules. The first is a detection module that applies the YOLO object detection architecture to identify bacterial colonies from agar plate images. The second is a synthetic forecasting module based on convolutional autoencoders capable of predicting future colony states from early observations. The third is a contamination analysis module that translates predictions into interpretable indicators such as colony count, average size, growth rate, and coverage. Together, these modules form a complete pipeline designed to combine visual fidelity with biological relevance. The results show that the system can detect colonies with high accuracy, achieving a Precision of 99.1%, a Recall of 91.7%, and an F1 score of 95.3%. In addition, the forecasting module generated realistic predictions of colony growth, and the contamination analysis provided meaningful metrics across different experimental conditions. The exploration of different temporal intervals revealed complementary trade-offs between predictive detail and biological plausibility, reinforcing the flexibility of the proposed methodology. The main conclusion of this dissertation is that Artificial Intelligence can be effectively applied to predict microbial growth in laboratory settings. By integrating detection, forecasting, and contamination analysis within a single framework, this work establishes a technological foundation that supports the transition to more intelligent sterilisation workflows and contributes to the broader vision of safe, efficient, and smart healthcare environments.
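  As one concrete illustration of the contamination-analysis module's outputs, the sketch below derives colony count, average size, coverage, and a simple growth rate from detector boxes. The (x, y, w, h) box format and all function names are assumptions for illustration, not the dissertation's implementation.

  ```python
  # A minimal sketch of contamination indicators computed from detection
  # boxes, assuming each YOLO detection arrives as (x, y, w, h) in pixels.
  import numpy as np

  def contamination_metrics(boxes: np.ndarray, plate_area_px: float) -> dict:
      """boxes: array of shape (N, 4) with (x, y, w, h) per detected colony."""
      if len(boxes) == 0:
          return {"count": 0, "avg_size_px": 0.0, "coverage": 0.0}
      areas = boxes[:, 2] * boxes[:, 3]          # box area as a size proxy
      return {
          "count": int(len(boxes)),
          "avg_size_px": float(areas.mean()),
          "coverage": float(areas.sum() / plate_area_px),
      }

  def growth_rate(count_t0: int, count_t1: int, hours: float) -> float:
      # Simple linear growth rate in colonies per hour between two frames.
      return (count_t1 - count_t0) / hours

  boxes = np.array([[10, 12, 8, 8], [40, 35, 6, 7]])
  print(contamination_metrics(boxes, plate_area_px=90_000.0))
  print(growth_rate(2, 9, hours=12.0))
  ```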
- AI-driven emotion recognition for mental health diagnoses: Assessing mental health through emotional state evaluation
  Publication. PRETO, PEDRO MIGUEL PERES; Conceição, Luís Manuel Silva; Figueiredo, Ana Maria Neves Almeida Baptista
  Mental health conditions remain a concerning challenge across the globe, requiring timely and reliable approaches to support accurate diagnoses and effective interventions. Traditional assessment methods often rely on subjective self-reports and clinical interviews, which may not always capture the full spectrum of an individual's emotional state. In this context, computational techniques for emotion analysis provide a complementary perspective by identifying patterns in facial expressions, speech, and language. This dissertation evaluates the potential of multimodal emotional state analysis and its contribution to mental health assessment through the development of a computational application. A systematic review was conducted to evaluate existing methodologies and highlight their strengths, limitations, and applicability in clinical contexts. Building on this review, the present work explores the integration of visual, vocal, and textual patterns, assessing how their combination improves the consistency and depth of emotional interpretation. An analysis centered on methodological design was conducted by applying techniques such as preprocessing, fine-tuning, and data augmentation to the datasets to enhance the model's capacity. Ethical and security considerations were also incorporated to strengthen system robustness and ensure responsible deployment in the market. The proposed solution is an artificial-intelligence-based multimodal system that integrates the analysis of emotions in facial expressions, voice, and text to provide a comprehensive assessment of the user's emotional state. The application's modular architecture enables real-time processing and the generation of clinical reports. Experimental validation revealed promising results across several domains of the DSM-5, the clinical reference manual that defines diagnostic criteria for mental disorders. High F1-scores were recorded in domains such as Anger (0.84) and Personality Functioning (0.87), while subtler domains, such as Dissociation (0.43) and Repetitive Behaviors (0.52), showed more modest performance. The overall analysis yielded an observed agreement of 71.9% and a Cohen's Kappa of 0.42, indicating moderate agreement with DSM-5-based assessments. The findings underline the promise of computational emotion analysis as a supplementary tool for mental health professionals, while also emphasizing the importance of critically evaluating its limitations and integrating it carefully into clinical practice.
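  Since the evaluation is summarised by observed agreement and Cohen's Kappa, a short worked example of the statistic may help. The label arrays below are made up; `cohen_kappa_score` is the standard scikit-learn implementation of the metric, not code from the dissertation.

  ```python
  # Cohen's kappa corrects raw agreement p_o for chance agreement p_e:
  #     kappa = (p_o - p_e) / (1 - p_e)
  # so 71.9% observed agreement can still yield a moderate kappa of 0.42.
  from sklearn.metrics import cohen_kappa_score

  # Hypothetical per-case labels: system output vs. DSM-5-based reference.
  system_labels = ["anger", "none", "dissociation", "anger", "none"]
  reference     = ["anger", "none", "none",         "anger", "anger"]

  kappa = cohen_kappa_score(system_labels, reference)
  print(f"Cohen's kappa: {kappa:.2f}")
  ```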
- Application of NLP techniques for the optimization of SQL driven data analysis in ERP software
  Publication. VIOLANTE, DIOGO DE SÁ; Martinho, Diogo Emanuel Pereira
  Data management in Industry 4.0 has become an increasingly complex process: industries rely on ever larger datasets, and traditional analysis methods have become inefficient and even inaccessible for end users. Enterprise Resource Planning systems deal with heterogeneous data from multiple modules and processes, which creates a need for more accessible and sophisticated tools. Recently, the growth of Artificial Intelligence solutions has played a central role in addressing these challenges. Fields like Natural Language Processing, Computer Vision and Machine Learning have supported the development of systems that extract more value from complex datasets, making information more manageable across industrial environments. The objective of this thesis is the exploration, implementation and validation of NLP solutions with generative capabilities that can integrate into these systems, proposing a more efficient and optimized way of analyzing SQL data through a pipeline that transforms natural-language user queries into SQL queries used for data retrieval. A conversational chatbot capable of translating natural language queries into SQL statements was developed. The central feature of the project is a RAG component that searches files containing database table schemas to provide context to an LLM, so that it generates SQL statements that can retrieve information without compromising the user experience or the database itself. The user's intent is detected and the RAG component adapts accordingly. A mechanism to search the Web for information was also developed, to provide context when there is not enough to formulate a valid answer. The generated queries are analyzed to prevent potential threats to the integrity of the database and, if considered valid, are persisted by another component to be used as future context for formulating other queries. The chosen LLM, as the backbone of this pipeline, allows not only the generation of queries but also text answers on several matters, including user manuals or simple informal conversation, depending on the need. Its multi-language support also enhances overall user experience and accessibility. A test set with real-world examples was created to validate the system, using evaluation metrics such as Exact Match Accuracy, Execution Accuracy and Valid Efficiency Score. A manual validity test was also conducted to determine whether queries with a low Exact Match Accuracy score could still be considered valid, given the ambiguity of the SQL language. The results demonstrate that the system handles queries of simple to medium complexity but needs further optimization for more complex ones. This supports the conclusion that NLP-driven text-to-SQL solutions can enhance data accessibility for both technical and non-technical users while maintaining compliance with privacy and security requirements.
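  The safety analysis applied to generated queries can be pictured as a deny-list check before execution. Below is a minimal sketch under that assumption; the deny-list contents and the function name are illustrative, not the thesis's actual validator.

  ```python
  # A minimal sketch of a read-only guard for LLM-generated SQL: accept a
  # single SELECT statement and reject anything that could mutate the
  # database. Deny-list and name are hypothetical.
  import re

  DENY_LIST = ("insert", "update", "delete", "drop", "alter", "truncate",
               "grant", "revoke", "exec")

  def is_safe_select(sql: str) -> bool:
      stripped = sql.strip().rstrip(";")
      if ";" in stripped:                 # reject multi-statement payloads
          return False
      tokens = re.findall(r"[a-z_]+", stripped.lower())
      if not tokens or tokens[0] != "select":
          return False
      return not any(word in tokens for word in DENY_LIST)

  assert is_safe_select("SELECT name FROM customers WHERE id = 3")
  assert not is_safe_select("DROP TABLE customers")
  assert not is_safe_select("SELECT 1; DELETE FROM customers")
  ```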
- ARMS: Augmented Reasoning Multi-Agent System
  Publication. OLIVEIRA, FRANCISCO LUÍS PEREIRA; Gomes, Luís Filipe de Oliveira
  The developments brought by the transformer architecture have sparked a technological revolution and a wide range of use cases in which these models are employed as individuals responsible for handling diverse tasks, from chatbots to more deterministic duties such as API calls and control. However, owing to the novelty of these models, there has been a lack of standardization in developing proper, controlled real-world implementations. The present time can be considered an alchemy-like stage of large-language-model usage, with different innovations born almost every day and everywhere around the world. Great investments are also being made in the field, and there has never been a better time to dedicate efforts to discovering and exploring the limits and capacities of this technology of the future. Multi-agent systems belong to a domain of artificial intelligence that has been in development for many years, resulting in refined and mature architectures, communication protocols, and implementation paradigms. However, implementation can be difficult due to the overhead required in orchestrating proper communication protocols, decision engines, and agent architecture. Furthermore, agent-to-human communication is not always seamless, since most agents communicate in programmatic, machine-oriented language, which can be hard to interact with for actors who lack context or are not technically inclined. This dissertation proposes a system that fuses the ability of large-language models to communicate through natural language and rationalize inputs with the capabilities that distributed multi-agent systems offer to resolve tasks present in industrial and smart-building scenarios. Moreover, through the implementation of specific pieces of hardware, referred to below as tools, the proposed system increases the degree of impact that decisions made by large-language models have on the environment around them. The proposed system, named "Augmented Reasoning Multi-Agent System" (ARMS), also allows users to communicate directly with agents through natural-language conversations, facilitating the exchange of information and desires. Agent-to-agent communication is also deeply investigated and controlled using specific techniques to manage communication flow and objective-oriented exchanges. Besides a review of the state of the art on topics related to the solution, which culminates in a discussion of large-language-model-powered agents versus traditional agents, this thesis includes five different case studies that test the solution: basic task delegation, interconnected agents, a user registration system, a vacation system, and building control. These case studies were built incrementally: the most basic, core principles were tested in the first use cases, culminating in a final one that integrated multiple previously tested components at a larger scale. The results of the case studies were positive, demonstrating a multi-agent system that can manipulate the world around it and establish human communication as needed, leveraging large-language models' capabilities for decision-making and interconnection.
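  The coupling of natural-language reasoning with tool execution that ARMS describes can be sketched minimally as below. The `Agent` class, the tool registry, and the prompt wording are illustrative assumptions rather than the thesis's actual architecture.

  ```python
  # A minimal sketch of an LLM-backed agent that either answers in natural
  # language or invokes a registered hardware tool. All names hypothetical.
  from dataclasses import dataclass, field
  from typing import Callable

  @dataclass
  class Agent:
      name: str
      llm: Callable[[str], str]                 # any text-in/text-out model
      tools: dict = field(default_factory=dict)

      def handle(self, message: str) -> str:
          # Ask the model whether a registered tool should act on the message.
          decision = self.llm(
              f"Tools available: {list(self.tools)}. Message: {message}. "
              "Reply with a tool name or 'answer: <text>'."
          )
          if decision in self.tools:
              return self.tools[decision]()     # act on the environment
          return decision.removeprefix("answer: ")

  # Usage: a building-control agent with one tool; the lambda stands in
  # for a real model call.
  lights = Agent("lights", llm=lambda p: "toggle_lights",
                 tools={"toggle_lights": lambda: "lights toggled"})
  print(lights.handle("Please turn on the lab lights"))
  ```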
- Assistente virtual inteligente para acesso a dados de negócio
  Publication. FRANCO, GUILHERME LIMA; Conceição, Luís Manuel Silva
  Modern organisations increasingly struggle to access and interpret enterprise data that is dispersed across isolated Business Information Systems (BIS). These silos hinder the ability to obtain a unified view of information, which is essential for timely and informed decision-making. Advances in Large Language Models (LLMs) offer the possibility of querying such data in natural language, thereby lowering the technical barrier for business users. However, the adoption of these models in corporate environments is constrained by concerns over data privacy, regulatory compliance, and the high operational costs of cloud-based solutions. These challenges underline the need for on-premises, resource-efficient approaches that preserve control over sensitive information. This dissertation presents an intelligent virtual assistant that answers business questions by orchestrating Model Context Protocol (MCP) tools to inspect schemas, draft explicit-projection SQL, validate read-only execution, and ground responses in results from a local Microsoft SQL Server instance of AdventureWorksDW2022. No model fine-tuning is performed; instead, the approach combines runtime schema filtering, deny-list validation, and prompt scaffolding to minimise hallucinations and enforce governance. A controlled evaluation over 52 representative prompts compares three configurations: a prompt-only baseline (B0), MCP with unfiltered schemas (B1), and a curated setup with filtering and explicit projections (S). The curated configuration yields substantially higher execution accuracy and fewer schema-error incidents than both baselines, demonstrating that governed tool use materially increases correctness without relaxing the privacy posture on a single on-premises workstation. Latency observations are reported descriptively and are attributable primarily to model generation rather than orchestration. These findings support the feasibility of privacy-preserving, on-premises conversational analytics under the EU General Data Protection Regulation (GDPR) and the EU Artificial Intelligence Act (Regulation (EU) 2024/1689), and suggest practical next steps: broadening schema coverage, refining curation policies, and exploring lighter local models and decoding strategies to improve interactivity.
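  Runtime schema filtering, as described here, exposes to the model only the tables plausibly relevant to the question, shrinking the prompt and the room for hallucinated identifiers. The sketch below assumes a toy keyword-overlap heuristic and a simplified stand-in for an AdventureWorksDW-style schema; the function name and matching rule are illustrative, not the dissertation's method.

  ```python
  # A minimal sketch of runtime schema filtering by keyword overlap.
  def filter_schema(schema: dict, question: str) -> dict:
      """schema: table name -> column names. Keep tables whose name or
      columns overlap with words in the question."""
      words = {w.lower().strip("?.,") for w in question.split()}
      keep = {}
      for table, cols in schema.items():
          hits = {table.lower()} | {c.lower() for c in cols}
          if words & hits:
              keep[table] = cols
      return keep

  schema = {"DimCustomer": ["CustomerKey", "FirstName", "City"],
            "FactInternetSales": ["SalesAmount", "OrderDate", "CustomerKey"]}
  # Only DimCustomer survives: "city" matches one of its columns.
  print(filter_schema(schema, "Which city has the most customers?"))
  ```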
- Automatização da extração e normalização de custos aéreos no setor logístico: Um modelo de inteligência artificial baseado em NLP
  Publication. GONÇALVES, DANIEL CARVALHO; Gomes, Luís Filipe de Oliveira
  Air cargo is a critical pillar of modern supply chains, enabling short lead times and time-sensitive operations. However, the information required for quoting and selecting carriers reaches operators in heterogeneous, hard-to-compare formats, which increases operational effort and hinders consistent, auditable decisions. In this context, fast and reliable normalisation of tariff data becomes a source of competitiveness in the logistics sector. The specific problem addressed in this work is the extraction and consolidation of essential attributes (e.g., origin, destination, service type, quantity, and unit of measure) from carrier tender files received in multiple, non-standardised formats. The existing manual process is time-consuming, prone to human error, and limits comparative analyses and timely responses. As a proposed solution, we present a work model centred on clear, systematic instructions for reading and extraction, supported by validation rules and a canonical schema that standardises the critical fields. The approach prioritises robustness to document variability and decision traceability, reducing reliance on manual processes without resorting to technology-specific descriptions at the core of the proposal. The application was demonstrated in three representative case studies: (i) complete files (the "happy path"); (ii) documents with missing attributes; and (iii) scenarios with no relevant information. In each case, the solution performed extraction and normalisation to subsequently generate uniform, comparable files, enabling operational analysis and integration into existing workflows. The results show substantial gains: a 97–98% reduction in processing time compared with the manual method and per-file savings between €8.13 and €42.81, depending on case complexity. We conclude that the proposed approach improves the efficiency, consistency, and scalability of the air-carrier selection process, strengthening decision quality and data governance. Limitations include dependence on document quality and extreme format variability, which inform future work.
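  The canonical-schema idea can be sketched as a fixed record type plus validation of the critical fields named in the abstract (origin, destination, service type, quantity, unit of measure). Field names, the unit list, and the normalisation rules below are assumptions for illustration only.

  ```python
  # A minimal sketch: heterogeneous tariff rows are mapped into one fixed
  # record, with validation of the critical fields. All names hypothetical.
  from dataclasses import dataclass

  VALID_UNITS = {"kg", "cbm", "uld"}

  @dataclass
  class TariffRecord:
      origin: str
      destination: str
      service_type: str
      quantity: float
      unit: str

  def normalise(raw: dict) -> TariffRecord:
      unit = raw.get("unit", "").strip().lower()
      if unit not in VALID_UNITS:
          raise ValueError(f"unknown unit of measure: {unit!r}")
      return TariffRecord(
          origin=raw["origin"].strip().upper(),        # e.g. IATA code "OPO"
          destination=raw["destination"].strip().upper(),
          service_type=raw["service_type"].strip().lower(),
          quantity=float(str(raw["quantity"]).replace(",", ".")),
          unit=unit,
      )

  print(normalise({"origin": "opo", "destination": "FRA",
                   "service_type": "Express", "quantity": "120,5",
                   "unit": "KG"}))
  ```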
- Bridging automation and customization: MLOps in recommender system development
  Publication. JORDÃO, MIGUEL JOSÉ RIBEIRO; Pereira, Isabel Cecília Correia da Silva Praça Gomes
  Recommender systems have become essential in modern digital platforms, supporting decision making and personalization across domains such as e-commerce, media, and enterprise applications. At BMW Group, MyWorkplace (MWP) is a centralized hub managed by Critical Techworks (CTW) that provides access to hundreds of internal tools. Discoverability remains challenging given the size and heterogeneity of the tool catalog, creating inefficiencies and highlighting the need for a scalable, reliable, and auditable recommendation solution. This project presents an MLOps-first approach for a recommender grounded in the CRISP-ML(Q) process model. It characterizes the recommendation problem, available data sources, and success criteria, and proposes a reference architecture integrating automated ETL, feature preparation, containerized training and serving, and CI/CD for continuous delivery. Several content-based approaches are implemented and evaluated under realistic data constraints using established ranking metrics; collaborative and hybrid extensions are outlined for future phases once interaction feedback becomes available. The contributions of this work are both technical and methodological: the design and validation of a recommendation strategy for the hub platform; an assessment of operational and governance requirements, including security and compliance; and the demonstration of the system in a real-world industrial environment. In addition to the deployment within BMW Group, this project advances the understanding of how MLOps principles can be applied to balance automation and customization in recommender systems. Results indicate that an MLOps-first design improves scalability, maintainability, and auditability, and lays the groundwork for collaborative filtering, feedback loops, and, when governance permits, large language model components. The system and methodology are applicable to enterprise-scale recommendation scenarios with similar operational constraints.
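  One common shape for the content-based approaches mentioned here is TF-IDF over tool descriptions ranked by cosine similarity; the sketch below illustrates that pattern with a made-up catalogue, not MWP data, and is not claimed to be the project's actual model.

  ```python
  # A minimal sketch of a content-based recommender: tools are ranked by
  # cosine similarity between TF-IDF vectors of their descriptions.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  catalogue = {
      "TimeTracker": "log working hours and absences",
      "TravelDesk":  "book business trips and manage travel expenses",
      "ExpensePro":  "submit and approve expense reports",
  }

  names = list(catalogue)
  matrix = TfidfVectorizer().fit_transform(catalogue.values())

  def recommend(seed: str, k: int = 2) -> list:
      """Return the k catalogue entries most similar to the seed tool."""
      scores = cosine_similarity(matrix[names.index(seed)], matrix).ravel()
      ranked = sorted(zip(scores, names), reverse=True)
      return [n for s, n in ranked if n != seed][:k]

  print(recommend("ExpensePro"))
  ```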
- Deep learning for monocular visual odometry: From sequential pose regression to self-attention learning
  Publication. DATSENKO, DARYNA; Dias, André Miguel Pinheiro
  Monocular visual odometry (VO) estimates the position and orientation of a moving system using images from a single camera. It is widely used in robotics, autonomous driving, and UAVs. Compared to stereo or LiDAR systems, monocular VO avoids extra hardware, but it faces challenges such as scale ambiguity, sensitivity to lighting changes, and poor generalization to new environments. Deep learning has recently become a promising approach, as it allows networks to learn motion and geometry directly from images. This thesis studies deep learning methods for monocular VO. First, a simple CNN–LSTM baseline inspired by DeepVO is evaluated. This model works well on KITTI (Absolute Trajectory Error (ATE): 37.14 m; scale recovery: 0.998) and trains relatively fast, but it fails to converge on more dynamic or indoor datasets like TartanAir and EuRoC MAV, showing the limitations of learning pose from images alone. To improve performance, the model is gradually extended with self-attention and an auxiliary depth prediction branch, forming a multi-task framework that jointly learns pose and depth. This adds geometric constraints that reduce scale drift and improve trajectory consistency. The training strategy combines synthetic pretraining on TartanAir, using perfect depth supervision, with fine-tuning on EuRoC MAV using pseudo-depth maps. Experiments show significant improvements: on EuRoC V102, the multi-task model achieves an ATE of 0.825 m over a 42.53 m path, closely matching the ground truth (40.12 m) with a scale recovery of 1.059. These results outperform classical methods like ORB-SLAM3 and approach state-of-the-art learning-based approaches. The two main contributions of this work are: first, proposing and testing a framework that gradually moves from simple CNN–LSTM pose regression to a multi-task model with depth and self-attention; second, analyzing the benefits and limitations of this approach. The results show that depth supervision, even if imperfect, stabilizes motion estimation and improves consistency, pointing to promising directions for learning-based pose estimation in complex environments.
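  For reference, the two headline metrics quoted above can be computed as sketched below: ATE as the RMSE of point-wise distances between aligned trajectories, and scale recovery as the ratio of path lengths. Alignment here is reduced to removing the mean offset; full VO evaluations typically fit a Umeyama SE(3)/Sim(3) transform, so this is a simplified illustration, not the thesis's evaluation code.

  ```python
  # A minimal sketch of ATE (RMSE form) and scale recovery for trajectories
  # given as (N, 3) position arrays at matching timestamps.
  import numpy as np

  def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
      est_c = est - est.mean(axis=0)        # crude translational alignment
      gt_c = gt - gt.mean(axis=0)
      return float(np.sqrt(((est_c - gt_c) ** 2).sum(axis=1).mean()))

  def scale_recovery(est: np.ndarray, gt: np.ndarray) -> float:
      # Ratio of estimated to ground-truth path length (1.0 is ideal).
      length = lambda p: np.linalg.norm(np.diff(p, axis=0), axis=1).sum()
      return float(length(est) / length(gt))

  gt = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0]])
  est = np.array([[0.1, 0, 0], [1.0, 0.1, 0], [2.1, 0, 0]])
  print(ate_rmse(est, gt), scale_recovery(est, gt))
  ```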
- Desenvolvimento de um assistente robótico para localização e recuperação de objetos em ambientes domésticos
  Publication. MOREIRA, JOÃO PEDRO VIEIRA COELHO; Martinho, Diogo Emanuel Pereira
  The progressive ageing of the population poses growing challenges for promoting autonomy and maintaining the quality of life of the elderly. Assistive robotics has gained prominence as a response to these challenges, offering solutions that support everyday tasks and reduce dependence on third parties in domestic environments. This dissertation presents the development of a robotic assistant for locating and retrieving objects at home, designed to support users in activities that are simple yet relevant to daily life. The robot was designed to operate in structured domestic spaces, integrating navigation algorithms, deep-learning-based computer vision, and human-robot interaction mechanisms. The solution adapts to small variations in object placement and in the layout of the space, ensuring flexibility and practical usefulness. The developed solution was grounded in a comprehensive reading of the literature and the state of the art, organised and conducted according to the PRISMA methodology, which ensured a systematic review and a well-founded selection of the most suitable technologies. The system was designed and tested exclusively in real environments, allowing its performance to be assessed under practical conditions representative of domestic use. The results show the solution's potential to support user autonomy and reduce dependence on third parties, though they remain within the scope of an experimental prototype. The system demonstrated that a functional robotic assistant can be deployed in a real environment even with limited hardware resources, provided it is supported by a distributed processing architecture and by appropriate computer vision, navigation, and manipulation techniques. This proof of concept confirmed the viability of the approach and established a solid foundation for future work, in which improvements in perception, navigation robustness, and adaptation to more complex domestic scenarios may be explored.
