ISEP - DM – Engenharia de Inteligência Artificial


Recent Submissions

Now showing 1 - 10 of 80
  • Remaining useful life prediction on the NASA CMAPSS dataset comparing LSTM and transformer models
    Publication . GUILHERME, DAVID NUNO VILAS BOAS; Ramos, Carlos Fernando da Silva
Predictive maintenance has been gaining importance in industry, especially in complex and critical systems such as the turbofan engines used in aviation. The main objective of this dissertation is the prediction of the Remaining Useful Life (RUL) of jet engines, using the dataset provided by NASA known as Commercial Modular Aero-Propulsion System Simulation (CMAPSS). Accurate RUL estimation reduces maintenance costs, prevents unexpected failures and improves operational safety. This research began with a detailed dataset analysis, exploring its different subsets, each representing distinct operating conditions and fault modes. Data preprocessing was then performed, including normalization, feature selection, and construction of temporal sequences. Feature selection techniques such as low-variance and high-correlation filters, as well as the Boruta method, were applied to reduce the number of features; thus, only variables with a real impact on RUL were employed in model training. Subsequently, two models were implemented based on architectures studied in the literature. The first, based on Long Short-Term Memory (LSTM) networks, leverages their ability to capture long-term temporal dependencies. The second was a Transformer, whose main innovation lies in the attention mechanism. Experimental results were evaluated using the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) metrics. The LSTM model achieved competitive performance on FD001, confirming prior studies that highlight it as a robust baseline for simple scenarios. The Transformer showed an advantage on the FD002 subset. However, on more complex subsets such as FD004, the performance of both models converged, reflecting the remaining challenges in generalizing these models. The comparison between LSTM and Transformer revealed that LSTMs are more consistent in controlled scenarios with simple, well-defined operating conditions.
The Transformer demonstrated potential on datasets with greater variability, such as FD002, although its results were not consistent across all subsets. These contrasts reinforce the idea that, in their current state, LSTMs remain a dependable choice, while Transformers still face generalization challenges. Nevertheless, the literature points to future improvements, particularly through the implementation of hybrid architectures or specialized variants, which may overcome these limitations. In summary, this dissertation contributes to the advancement of knowledge in predictive maintenance by providing a comparative analysis between two of the most relevant architectures. The results reinforce the need to continue exploring innovative model and methodology combinations to develop prognostic systems that are increasingly accurate, interpretable and applicable in real industrial scenarios.
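The preprocessing and evaluation steps described above (sliding temporal windows, RMSE and MAE) can be sketched as follows. The window length, sensor count, and toy data are illustrative assumptions, not the dissertation's actual CMAPSS settings:

```python
import numpy as np

def make_sequences(sensor_data, rul, window=30):
    """Slice a unit's sensor history into fixed-length windows, pairing
    each window with the RUL at its final cycle (a common CMAPSS
    preprocessing pattern; the window length here is an assumption)."""
    X, y = [], []
    for end in range(window, len(sensor_data) + 1):
        X.append(sensor_data[end - window:end])
        y.append(rul[end - 1])
    return np.array(X), np.array(y)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Toy engine: 100 cycles, 14 selected sensors, linearly decreasing RUL.
cycles, n_sensors = 100, 14
data = np.random.rand(cycles, n_sensors)
rul = np.arange(cycles - 1, -1, -1)
X, y = make_sequences(data, rul, window=30)
print(X.shape, y.shape)  # (71, 30, 14) (71,)
```

Each window of shape (30, 14) would then feed the LSTM or Transformer regressor; the label is the RUL at the window's last cycle.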
  • Deep learning for monocular visual odometry: From sequential pose regression to self-attention learning
    Publication . DATSENKO, DARYNA; Dias, André Miguel Pinheiro
Monocular visual odometry (VO) estimates the position and orientation of a moving system using images from a single camera. It is widely used in robotics, autonomous driving, and UAVs. Compared to stereo or LiDAR systems, monocular VO avoids extra hardware, but it faces challenges such as scale ambiguity, sensitivity to lighting changes, and poor generalization to new environments. Deep learning has recently become a promising approach, as it allows networks to learn motion and geometry directly from images. This thesis studies deep learning methods for monocular VO. First, a simple CNN–LSTM baseline inspired by DeepVO is evaluated. This model works well on KITTI (Absolute Trajectory Error (ATE): 37.14 m; scale recovery: 0.998) and trains relatively fast, but it fails to converge on more dynamic or indoor datasets like TartanAir and EuRoC MAV, showing the limitations of learning pose from images alone. To improve performance, the model is gradually extended with self-attention and an auxiliary depth prediction branch, forming a multi-task framework that jointly learns pose and depth. This adds geometric constraints that reduce scale drift and improve trajectory consistency. The training strategy combines synthetic pretraining on TartanAir, using perfect depth supervision, with fine-tuning on EuRoC MAV using pseudo-depth maps. Experiments show significant improvements: on EuRoC V102, the multi-task model achieves an ATE of 0.825 m over a 42.53 m path, closely matching the ground truth (40.12 m) with a scale recovery of 1.059. These results outperform classical methods like ORB-SLAM3 and approach state-of-the-art learning-based approaches. The two main contributions of this work are: first, proposing and testing a framework that gradually moves from simple CNN–LSTM pose regression to a multi-task model with depth and self-attention; second, analyzing the benefits and limitations of this approach.
The results show that depth supervision, even if not perfect, stabilizes motion estimation and improves consistency, pointing to promising directions for learning-based pose estimation in complex environments.
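The two headline metrics above, ATE and scale recovery, can be sketched in simplified form. This version aligns only the starting positions; the full evaluation protocol normally uses SE(3)/Umeyama trajectory alignment, so the numbers here are illustrative:

```python
import numpy as np

def absolute_trajectory_error(gt, est):
    """RMSE of per-frame position error after aligning the first poses
    (a simplified ATE; standard tooling aligns the full trajectory)."""
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    est_aligned = est - est[0] + gt[0]  # align starting positions only
    return float(np.sqrt(np.mean(np.sum((gt - est_aligned) ** 2, axis=1))))

def scale_recovery(gt, est):
    """Ratio of estimated to ground-truth path length (1.0 = perfect scale)."""
    def path_len(t):
        t = np.asarray(t, dtype=float)
        return float(np.sum(np.linalg.norm(np.diff(t, axis=0), axis=1)))
    return path_len(est) / path_len(gt)

# Toy straight-line trajectory with 10% scale drift in the estimate.
gt  = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
est = [[0, 0, 0], [1.1, 0, 0], [2.2, 0, 0]]
print(round(scale_recovery(gt, est), 2))  # 1.1
```

A scale recovery of 1.059 on EuRoC V102, as reported above, means the estimated path length overshoots the ground truth by about 6%.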
Automating the extraction and normalisation of air freight costs in the logistics sector: An NLP-based artificial intelligence model
    Publication . GONÇALVES, DANIEL CARVALHO; Gomes, Luís Filipe de Oliveira
Air cargo is a critical pillar of modern supply chains, enabling short lead times and time-sensitive operations. However, the information required for quoting and selecting carriers reaches operators in heterogeneous, hard-to-compare formats, which increases operational effort and hinders consistent, auditable decisions. In this context, fast and reliable normalisation of tariff data becomes a source of competitiveness in the logistics sector. The specific problem addressed in this work is the extraction and consolidation of essential attributes (e.g., origin, destination, service type, quantity, and unit of measure) from carrier tender files received in multiple, non-standardised formats. The existing manual process is time-consuming, prone to human error, and limits comparative analyses and timely responses. As a proposed solution, we present a work model centred on clear, systematic instructions for reading and extraction, supported by validation rules and a canonical schema that standardises the critical fields. The approach prioritises robustness to document variability and decision traceability, reducing reliance on manual processes without resorting to technology-specific descriptions at the core of the proposal. The application was demonstrated in three representative case studies: (i) complete files (the “happy path”); (ii) documents with missing attributes; and (iii) scenarios with no relevant information. In each case, the solution performed extraction and normalisation to subsequently generate uniform, comparable files, enabling operational analysis and integration into existing workflows. The results show substantial gains: a 97–98% reduction in processing time compared with the manual method and per-file savings between €8.13 and €42.81, depending on case complexity. We conclude that the proposed approach improves the efficiency, consistency, and scalability of the air-carrier selection process, strengthening decision quality and data governance.
Limitations include dependence on document quality and extreme format variability, which inform future work.
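Mapping heterogeneous tariff rows onto a canonical schema, while flagging rather than guessing missing attributes (the "happy path" versus missing-attribute cases above), might be sketched like this. The field aliases are purely illustrative, not the dissertation's actual mapping rules:

```python
CANONICAL_FIELDS = ("origin", "destination", "service_type", "quantity", "unit")

def normalise_record(raw):
    """Map a heterogeneously named tariff row onto the canonical schema,
    recording which critical fields are missing instead of inventing them.
    The alias table below is a hypothetical example."""
    aliases = {
        "origin": ("origin", "from", "pol"),
        "destination": ("destination", "to", "pod"),
        "service_type": ("service_type", "service", "mode"),
        "quantity": ("quantity", "qty"),
        "unit": ("unit", "uom"),
    }
    lowered = {k.lower(): v for k, v in raw.items()}
    record, missing = {}, []
    for field in CANONICAL_FIELDS:
        for alias in aliases[field]:
            if alias in lowered:
                record[field] = lowered[alias]
                break
        else:  # no alias matched: flag it for validation downstream
            record[field] = None
            missing.append(field)
    return record, missing

# A "missing attribute" case: the service type is absent from the source file.
rec, missing = normalise_record({"From": "OPO", "To": "FRA", "Qty": 120, "UOM": "kg"})
print(missing)  # ['service_type']
```

Keeping the missing-field list explicit is what makes the downstream comparison auditable: a record is never silently completed.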
  • Hybrid sentiment-based recommender system for e-commerce
    Publication . MOREIRA, ANA PATRÍCIA MANTA; Santos, Joaquim Filipe Peixoto dos
This dissertation presents a personalized product recommendation system designed for use in e-commerce, where user-written reviews carry rich sentiment. Conventionally, most recommendation systems ignore the fine details and emotive sentiment in these user-contributed texts, relying instead on numerical ratings or demographic information such as gender or age. This can result in suggestions not entirely in line with a user's genuine interests or emotional reaction to products. A user could have assigned a high numerical rating yet still express discontent in the review with a particular aspect, such as durability or cost; this detail is often lost with most standard methods. This initiative was created in partnership with the Cognitively Smart Assistant in Physical-Digital Environment (CAPE) project. CAPE is a joint initiative, subsidized by the European Regional Development Fund (ERDF), dedicated to the triple transformation of the retail market: sustainability, digitalization, and the evolution of skills. In preparation for this system, a systematic literature review took place, which served as essential groundwork for the state of the art of current e-commerce recommendation systems, including those using emotions, Artificial Intelligence (AI), and Natural Language Processing (NLP). This review was structured around specific research questions about how emotions impact recommendation, the most efficient methods of segmenting users for the highest possible sales, and possible weaknesses and ethical issues in recommendation systems. The systematic review followed the PRISMA protocol. The dataset chosen for the project was the Amazon Electronics dataset, selected for its wide range of electronic products and large collection of reviews, both ideally suited to the needs of the CAPE project.
Because of its large size, the data underwent substantial preprocessing to improve its quality and accelerate model training. The proposed system combines the strengths of BERT for learning the context of words with LSTM networks for handling long-term dependencies in sequential information. A range of architectures (V1 to V6) was created to find the optimal balance between results and training time, measured by the Mean Squared Error (MSE) metric. Particular focus was also placed on ethical considerations, seeking to overcome common problems such as social biases. Data protection legislation such as the GDPR (General Data Protection Regulation) and the AI Act (Artificial Intelligence Act) is complied with through the use of an anonymized public dataset and pseudonymous user IDs, avoiding sensitive personal attributes in order to reduce bias. The ultimate goal of this research is to improve purchase probability and overall user satisfaction, fostering increased customer loyalty through a more accurate and ethically aware recommendation system.
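The core idea, that a rating alone can hide sentiment expressed in the review text, can be illustrated with a hand-set linear blend. This is only a toy stand-in: the dissertation learns the rating–sentiment interaction jointly with a BERT encoder feeding an LSTM, rather than fixing weights:

```python
def hybrid_score(rating_5star, sentiment_neg1_pos1, alpha=0.6):
    """Blend a star rating with a review-sentiment score on a common
    [0, 1] scale. The fixed weight alpha is an illustrative assumption;
    the actual system learns this interaction end to end."""
    rating = (rating_5star - 1) / 4             # 1..5 stars  -> [0, 1]
    sentiment = (sentiment_neg1_pos1 + 1) / 2   # [-1, 1]     -> [0, 1]
    return alpha * rating + (1 - alpha) * sentiment

# A 5-star rating paired with a strongly negative review is pulled down:
print(round(hybrid_score(5, -0.8), 2))  # 0.64
```

The example shows why text matters: the pure rating signal would score this item 1.0, while the blended score reflects the reviewer's written discontent.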
Intelligent and distributed management of citizen communities
    Publication . SILVA, RAFAEL DUARTE PEREIRA DA; Gomes, Luís Filipe de Oliveira; Vale, Zita Maria Almeida do
The use and integration of intelligent models in buildings can transform users' experiences inside the building, optimising spaces and providing efficient ways to use and interact with building resources. Intelligent solutions bring challenges that must be studied, such as the heterogeneity of resources and the need to adapt existing buildings to the smart-building concept. While smart buildings can revolutionise how people use and interact with spaces, grouping buildings into communities creates new opportunities for interconnected members to pursue common goals, modelling cooperative, collaborative and, at times, competitive roles. This new dynamic, in which organic systems can communicate and interact, also raises challenges regarding the modelling of users and their preferences, and the existence of common infrastructures to enable intelligent models at the community, building and user levels. This dissertation aims to design, implement, test and validate a container-based infrastructure, named Caravels, that combines the concepts of smart communities and smart buildings to develop a context-aware solution that considers different users and buildings. The proposed solution employs a distributed architecture for managing intelligent citizen communities, in which each member operates as an autonomous entity while remaining interconnected through a shared infrastructure. The architecture enables services at both the local and community levels: a member can integrate individual services, chosen specifically for that user, while contributing to and benefiting from community-level optimisations.
Central to the project is the modelling of user preferences in complex, dynamic, multi-user environments. The dissertation explores the psychological and cognitive challenges of representing preferences, recognising that users have difficulty articulating or prioritising their own preferences. The proposed models can adapt over time, incorporating feedback and behavioural data to support proactive, context-aware decision-making. Artificial intelligence techniques, including supervised, unsupervised and reinforcement learning, are integrated throughout the system to enable predictive analytics, optimisation and autonomous control. To validate the proposed architecture and methodologies, several case studies were conducted in realistic scenarios reflecting different user needs, energy demand and distributed resources. The results demonstrate that the system can model user behaviour, support community-level cooperation and improve overall smart-building efficiency and intelligence. The outcomes of this dissertation contributed to six scientific publications, including a journal article with an impact factor of 6.6.
MASterFLow: A chain of intelligent multi-agent systems for creating machine learning and federated learning pipelines
    Publication . BARBARROXA, RAFAEL ALEXANDRE SILVA; Gomes, Luís Filipe de Oliveira
    The growing demand for secure, privacy-preserving AI solutions is particularly noticeable in domains such as renewable energy or healthcare, where sensitive data is involved. As society continues to transition to AI-driven systems, the need for decentralized machine learning systems has become increasingly evident. Traditional machine learning methods rely heavily on centralized datasets, often compromising privacy and security. Although federated learning addresses these concerns by enabling decentralized model training while maintaining data privacy, several challenges remain. These include the complexity of creating, configuring, and managing federated learning models, particularly when dealing with a large number of clients and different configurations. As federated learning grows in popularity, there is also a need for more automated solutions that can simplify this process for users with varying levels of expertise. This dissertation presents MASterFLow, a novel system that combines multi-agent systems with large language models to intelligently create machine learning models and federated learning federations. By integrating LLMs and Retrieval-Augmented Generation, MASterFLow provides an efficient way to configure, execute, and analyze FL training simulations. The system streamlines the process by allowing users to interact with intelligent agents that manage different tasks, such as configuring machine learning models, setting up federated learning simulations, and analyzing training logs. MASterFLow is designed with a user-friendly web-based interface that allows users to engage with the system’s agents and configure simulations according to their needs. Through extensive case studies, the dissertation benchmarks various multi-agent frameworks and demonstrates the effectiveness of combining multi-agent systems and large language models to automate the creation of machine learning and federated learning pipelines. 
The results indicate that MASterFLow provides a more accessible, secure, and adaptable alternative to traditional machine learning methods, offering improved efficiency and usability for AI development.
Advanced artificial intelligence techniques for the detection and screening of gastrointestinal diseases
    Publication . PEREIRA, HUGO SIMÃO DA ROCHA; Martinho, Diogo Emanuel Pereira
Gastrointestinal diseases have been increasing due to several factors associated with the modern lifestyle, such as poor diet, sedentary behaviour and smoking. Colonoscopy remains the reference method for diagnosing intestinal pathologies, enabling the detection and treatment of lesions. However, its accuracy depends heavily on the physician's experience, resulting in diagnostic variability and potential delays in detecting critical conditions. Moreover, the colonoscopy procedure may require invasive biopsies which, although essential for a definitive diagnosis, carry risks and discomfort for patients. This dissertation explores the integration of computer vision and deep learning techniques to optimise colonoscopy analysis, aiming to improve lesion-detection accuracy and support clinical decision-making. Using convolutional neural networks (CNNs) and segmentation models such as ResNet, DenseNet and Inception, this research proposes the development of an artificial intelligence-based system capable of identifying and classifying colorectal lesions with greater accuracy and consistency. The proposed system is intended to complement medical expertise, reducing diagnostic variability and streamlining screening processes. The results of this dissertation show that hybrid models, which combine different convolutional architectures, outperformed models based solely on transfer learning, which performed unsatisfactorily. The best performance was achieved by the hybrid ResNet + EfficientNet + DenseNet model, with an accuracy of 86.67%. These results suggest that the hybrid approach is more effective for detecting gastrointestinal lesions, potentially contributing to faster and more accurate diagnoses while reducing the need for unnecessary biopsies.
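One common way to combine several convolutional backbones, as in the hybrid ResNet + EfficientNet + DenseNet model described above, is soft voting over their class probabilities. This is a generic sketch of that pattern; the dissertation's actual fusion mechanism may differ:

```python
import numpy as np

def ensemble_predict(prob_maps):
    """Soft voting: average per-class probabilities from several CNN
    backbones and take the argmax per sample. prob_maps is a list of
    (n_samples, n_classes) arrays, one per backbone."""
    stacked = np.stack(prob_maps)           # (n_models, n_samples, n_classes)
    return np.argmax(stacked.mean(axis=0), axis=1)

# Toy 2-class outputs from three hypothetical backbones on two images.
resnet   = np.array([[0.7, 0.3], [0.4, 0.6]])
effnet   = np.array([[0.6, 0.4], [0.2, 0.8]])
densenet = np.array([[0.8, 0.2], [0.5, 0.5]])
print(ensemble_predict([resnet, effnet, densenet]))  # [0 1]
```

Averaging probabilities rather than hard votes lets a confident backbone outweigh two uncertain ones, which is often why such hybrids beat any single transfer-learning model.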
  • From relational waters to intelligent oceans: A lakehouse-centric approach to conversational artificial intelligence
    Publication . FIGUEIREDO, JOANA RODRIGUES; Gomes, Luís Filipe de Oliveira
of handling large volumes of heterogeneous and unstructured data while enabling real-time intelligent decision-making. In the water management domain, where legacy systems and operational complexity often obstruct innovation, there is an increasing need to adopt artificial intelligence-powered solutions that promote efficiency, traceability, and accessibility. Responding to this challenge, this dissertation presents CLARA — a Conversational Lakehouse Architecture supported by Real-time Artificial intelligence. CLARA is a modular solution that integrates modern data infrastructures, artificial intelligence models, and natural language interaction to support intelligent management in water utility operations. CLARA was conceived and developed from scratch, following the data lakehouse paradigm to consolidate structured and unstructured data, such as field images. The infrastructure adopts a medallion architecture (Bronze, Silver, Gold) and includes pipelines for ingestion, loading, and transformation. Particular attention was given to the documentation of transformations and the integration of experiment-tracking flows, enabling a robust foundation for artificial intelligence development and data governance. The solution currently features two artificial intelligence models that demonstrate how the lakehouse paradigm can support intelligent reasoning beyond conventional structured data processing. The first is an optical character recognition model, which enables the automated interpretation of water meter readings directly from field images, a type of unstructured data typically excluded from traditional storage systems. This model exemplifies how AI can be embedded into the data architecture to support validation and data quality assurance workflows. The second is a predictive model based on neural networks, designed to anticipate the symptom of the next operational intervention by analyzing historical maintenance sequences.
Together, these models illustrate the potential of unifying data storage and artificial intelligence reasoning within a single environment. At the user interaction layer, a custom-built conversational assistant leverages a cascade of large language models to classify and respond to user queries in real-time. The system routes each input to one of four specialized modules: (1) to access structured data in real-time, (2) to execute and access artificial intelligence models, (3) to consult software support manuals, and (4) to provide fallback conversational support only on water-related topics. The assistant also integrates multilingual support and a semantic permission-verification mechanism that maps the user’s intent and role to the structure of the underlying database, preventing unauthorized actions. Developed in partnership with A2O – Água, Ambiente e Organização, Lda., and validated through four real-world case studies, CLARA demonstrated how a carefully orchestrated artificial intelligence pipeline, backed by an efficient data infrastructure, can modernize and improve decision-making, enhance transparency, and simplify access to complex systems through natural language.
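The four-way dispatch described above can be sketched with a toy router. CLARA uses a cascade of LLM classifiers for this step; the keyword rules and module names below are purely illustrative stand-ins:

```python
def route_query(query):
    """Toy four-way router mimicking CLARA's dispatch to specialized
    modules. Real routing is done by an LLM cascade; these keyword
    rules and module names are hypothetical."""
    q = query.lower()
    if any(w in q for w in ("predict", "model", "ocr", "reading")):
        return "ai_models"              # (2) execute/access AI models
    if any(w in q for w in ("manual", "how do i", "help")):
        return "support_manuals"        # (3) consult support manuals
    if any(w in q for w in ("consumption", "meter", "intervention", "database")):
        return "structured_data"        # (1) real-time structured data
    return "fallback_water_chat"        # (4) water-topic-only fallback

print(route_query("Show last meter consumption"))  # structured_data
```

In the real system each routed query would additionally pass the semantic permission check before any database access, mapping the user's role to the schema it may touch.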
  • Tetrahedron-Tetrahedron intersection and volume computation using neural networks
    Publication . PEDRO, ERENDIRO SANGUEVE NJUNJUVILI; Ramos, Carlos Fernando da Silva
This thesis introduces a framework for fast, learning-based analysis of tetrahedron-tetrahedron interactions, combining scalable dataset generation with an efficient neural model. At its core is TetrahedronPairDatasetV1, a curated collection of one million labeled tetrahedron pairs with ground truth intersection status and volumes, filling a longstanding gap in geometry learning. Built on this dataset, we present TetrahedronPairNet, a neural architecture that adapts PointNet and DeepSets for processing tetrahedron pairs. The model simultaneously predicts intersection classification and intersection volume, achieving real-time performance: over 98% classification accuracy and a mean absolute error of ≈ 0.0012 in volume estimation (R² = 0.68). It processes over 30,000 samples per second with full preprocessing—orders of magnitude faster than classical algorithms. Unlike traditional symbolic approaches, TetrahedronPairNet is robust to degenerate configurations and requires no handcrafted geometry logic. Its fully batched, differentiable design supports seamless integration into simulation pipelines, CAD tools, and learning-based physics engines. This work reframes geometric intersection as a data-driven inference task, laying the foundation for scalable, real-time, and intelligent geometry processing across computational design, simulation, AR/VR, and scientific computing.
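The analytic volume of a single tetrahedron, the kind of ground truth such a labeled dataset is built from, follows directly from the scalar triple product, V = |det([b−a, c−a, d−a])| / 6. A minimal sketch (the exact labeling pipeline of TetrahedronPairDatasetV1 is not shown here):

```python
import numpy as np

def tetra_volume(vertices):
    """Volume of a tetrahedron from its 4 vertices via the scalar
    triple product: V = |det([b-a, c-a, d-a])| / 6. Analytic values
    like this supply ground-truth volume labels for learned models."""
    a, b, c, d = (np.asarray(v, dtype=float) for v in vertices)
    return abs(np.linalg.det(np.stack([b - a, c - a, d - a]))) / 6.0

# Unit right tetrahedron: edge vectors form the identity, so V = 1/6.
unit = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(tetra_volume(unit))  # 1/6 ≈ 0.1667
```

Computing the intersection volume of two tetrahedra is the expensive symbolic step that the neural model replaces; the per-tetrahedron volume above is only the easy building block.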
Improving video anomaly detection with object detection
    Publication . PEREIRA, BRUNO ALVES; Soares, Pedro Miguel Machado
Video Anomaly Detection (VAD) is a critical task in video surveillance and security systems, aiming to automatically identify events that deviate from normal patterns. These systems enable real-time monitoring, offer scalability for processing large volumes of data across diverse environments, and help reduce human error. Despite recent advances, most VAD models rely solely on spatio-temporal features. This project investigates the impact of incorporating contextual information, specifically object-level features, into the pipeline of a State of The Art (SoTA) VAD model. To this end, we propose modifications to a SoTA model, presenting a new architecture that integrates object-detection features. Intermediate and late fusion techniques were explored to determine the most effective method for combining object-level features with the spatio-temporal features used by the model. The experiments were conducted on a modified version of a SoTA dataset, adapted for weakly supervised training. The findings indicate that integrating object-level features enhances the performance of the baseline model, with improvements observed across three key metrics: Area Under the Curve (AUC), Average Precision (AP), and F1-score, particularly in the late fusion models. Freezing the weights of the base model proved essential to achieving the best results. However, the inclusion of the new channel introduced additional computational costs during training and a slight increase in inference time. Although these factors can affect the scalability of the project, they are not prohibitive, since tasks can be parallelized or executed on better hardware infrastructures. This work demonstrates that incorporating contextual cues from object detection into existing VAD frameworks can lead to better anomaly discrimination, paving the way for more reliable and context-aware surveillance systems.
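The simplest form of the late fusion explored above is a weighted combination of per-snippet anomaly scores from the two branches. The fixed weight here is an illustrative assumption; the dissertation's late-fusion models learn how the branches combine:

```python
import numpy as np

def late_fusion(st_scores, obj_scores, w=0.7):
    """Weighted late fusion of anomaly scores from the spatio-temporal
    branch (st_scores) and the object-detection branch (obj_scores).
    The fixed weight w is a hand-set illustration, not a learned value."""
    st = np.asarray(st_scores, dtype=float)
    obj = np.asarray(obj_scores, dtype=float)
    return w * st + (1 - w) * obj

# Two snippets: the second is flagged by both branches, so its fused
# score stays high; the first stays low.
fused = late_fusion([0.2, 0.9], [0.1, 0.95])
print(np.round(fused, 3))
```

Because fusion happens on scores rather than features, the base model's weights can stay frozen, which is exactly the configuration the findings above identify as essential.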