ISEP - DM – Engenharia de Inteligência Artificial
Browsing ISEP - DM – Engenharia de Inteligência Artificial by advisor "Marreiros, Maria Goreti Carvalho"
- Anomaly behavior detection in web. David, Gabriel Henrique Ribeiro; Marreiros, Maria Goreti Carvalho. In the domain of web application development, JavaScript plays an important role in enhancing the productivity and interactivity of web applications. However, its flexibility and dynamic nature also introduce potential security risks. Attackers can exploit vulnerabilities in JavaScript to perform various malicious activities, such as data theft, injection attacks, and unauthorized web modifications, including data tampering. This work introduces a novel approach to enhancing the security of web applications by focusing on malicious behavior executed through client-side JavaScript. The core objective of this research is to develop a model capable of identifying anomalous behaviors caused by third-party scripts on web pages. To this end, the research conducts a comparative analysis of four distinct models: One-class SVM, Isolation Forest, Local Outlier Factor, and Autoencoders. To identify the most effective solution, these models are evaluated based on specific performance metrics, including Area Under the Curve (AUC) and F-score. The selected model is used to pinpoint irregularities indicative of potential security breaches or malicious activities. This research significantly advances the field of web application security by providing actionable insights to enhance real-time response capabilities. By addressing the growing threat posed by malicious JavaScript, this work contributes to the development of more robust security measures. The dissertation employs a multi-faceted methodology to ensure a comprehensive approach. Initially, a systematic review methodology is used for a structured and unbiased literature analysis, providing a thorough understanding of the current state of the art. The CRISP-DM framework is adopted for the development phase, facilitating continuous adaptation in response to evolving insights.
A Comparative Analysis methodology is used to evaluate the different anomaly detection algorithms rigorously and to assess their practical applicability in real-world scenarios. The findings demonstrate that the chosen model can identify anomalies with high accuracy and minimal false positives. This research highlights the importance of integrating anomaly detection with existing Data Loss Prevention (DLP) solutions to monitor and protect sensitive data against cyber-attacks.
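The two metrics named in this abstract, AUC and F-score, can be illustrated with a minimal sketch. It assumes that each detector outputs a numeric anomaly score (higher means more anomalous) and that the F-score in question is the balanced F1; the function names `roc_auc` and `f1_score` are hypothetical and are not taken from the dissertation.

```python
def roc_auc(anomaly_scores, normal_scores):
    """AUC as the probability that a random anomaly outscores a random normal sample.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(normal_scores))


def f1_score(tp, fp, fn):
    """Balanced F-score from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the dissertation.
    print(roc_auc([0.9, 0.8, 0.4], [0.1, 0.2, 0.5]))
    print(f1_score(tp=8, fp=2, fn=2))
```

Computing both metrics per model on the same held-out data is one way to rank candidates such as the four listed above.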
- Aplicação técnicas aprendizagem automática no cancro da mama [Application of machine learning techniques to breast cancer]. Santos, José Carlos Cordeiro Andrade; Marreiros, Maria Goreti Carvalho. Breast cancer remains a major public health problem both internationally and nationally, so how it is approached continues to be of great interest. In Portugal, about 7,000 new cases of breast cancer are detected each year, and 1,800 women die of the disease. According to the Directorate-General of Health (Direção-Geral da Saúde) standard for imaging of the female breast, all asymptomatic women aged between 50 and 69 should have a screening mammogram every two years. In the presence of morphological changes, or in women at moderate to high risk of breast cancer, the attending physician may suggest bringing the mammogram forward and complementing the diagnostic work-up with whatever methods are deemed necessary. If the cancer is detected early, the likelihood of effective and successful treatment is much higher. Magnetic resonance imaging is an examination with high sensitivity and moderate specificity, suggested for young patients at substantially increased risk, i.e., those with a genetic predisposition or a family history of the disease. This examination uses radiofrequency waves in a strong magnetic field to obtain more detailed images of the internal tissues of the breast; however, its use is limited by its lower (immediate) availability compared with other examinations and by its cost, and it is contraindicated in people with claustrophobia, metallic devices such as pacemakers or prostheses, or reactions to the contrast medium.
This thesis therefore aims to develop a machine learning tool based on Cycle Generative Adversarial Networks that can convert a mammography image into one resembling the output of magnetic resonance imaging, in order to provide a better perception of the surgical field and to increase health gains. The dataset was provided by the Centro Hospitalar Universitário de São João and contained volumes of successive cross-sectional slices of breasts. In this case, the slice with the maximum cross-sectional area was the only one of interest for the study, so we extracted all slice locations to obtain the respective medial slices of the breasts. Generative Adversarial Networks are pairs of Artificial Intelligence systems trained to create content and perform tasks more quickly than a single system. In this thesis, they perform unpaired image-to-image translation: they produce an image resembling the output of magnetic resonance imaging from a mammogram, without a corresponding magnetic resonance image. The Structural Similarity Index Measure and Peak Signal-to-Noise Ratio metrics were used to assess the quality of the synthesized image relative to the real image. The structural similarity index obtained, 0.69667, indicates high similarity between the generated image and the reference. The peak signal-to-noise ratio obtained, 31.805 dB, a metric used to quantify the quality of an image reconstructed from an original that underwent compression, lies within the range of typical values. Although the metrics provide a quantitative measure of performance, the best answer we obtained was visual.
The synthetic images obtained look visually realistic, although some artifacts can be detected in them, owing to the different ways in which the examinations capture images and to the lower definition of the original examinations used as a basis compared with magnetic resonance imaging. In conclusion, from a dataset of 57 mammography images in craniocaudal view, it was possible to generate synthetic images of the breast structure, based on mammography, that resemble the output of magnetic resonance imaging, by implementing and testing generative adversarial network models on unpaired data, as demonstrated by the various metrics and graphical checks.
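To make the peak signal-to-noise ratio used in this abstract concrete, the following is a minimal sketch. It assumes 8-bit grayscale images stored as nested lists of pixel values; the function name `psnr` and the sample pixels are illustrative and do not come from the thesis.

```python
import math


def psnr(reference, synthetic, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [pixel for row in reference for pixel in row]
    syn = [pixel for row in synthetic for pixel in row]
    if len(ref) != len(syn):
        raise ValueError("images must contain the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, syn)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    real = [[100, 100], [100, 100]]
    generated = [[110, 90], [110, 90]]
    print(f"{psnr(real, generated):.2f} dB")
```

A higher value means the synthetic image deviates less, pixel by pixel, from the reference. The measure is blind to structure, which is one reason it is usually reported alongside a structural metric such as SSIM.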
- Application of active learning on medical images to enhance machine learning models. Santos, Maria Inês Salvador dos; Marreiros, Maria Goreti Carvalho. Artificial intelligence has made major advances in the healthcare field, particularly in medical imaging. However, data and annotations in this area are often scarce and expensive to obtain. Labeling images, although essential for machine learning models, is a tedious and time-consuming task. Active learning addresses this challenge by selecting informative samples: it builds a subset of the unlabeled data for which the model is likely to have the most difficulty predicting labels, and this subset is then given to experts to annotate. The goal is to use less annotated data while still achieving good model performance. Breast cancer is one of the most common cancers in women. The proposed solution uses the PatchCamelyon dataset, a variation of the Camelyon16 dataset with patches from histopathologic scans of sentinel lymph node sections, for the detection of metastatic tissue in breast cancer patients. This work proposes an active learning approach in which the unlabeled data are divided into clusters, which are then classified by their level of informativeness (based on Shannon entropy). Then, from each cluster, several samples are selected according to the previously defined informativeness level, and each sample is scored with a formula that includes both entropy and Euclidean distance to the cluster centroid. Finally, samples with the lowest uncertainty score are added to the training dataset with the model's prediction. The proposed method thus accounts for both model uncertainty and data distribution. The solution showed promising results when compared with a random sampling approach.
To evaluate the proposed solution, greyscale and Macenko normalization techniques were applied in all three approaches (random sampling, a variation of the proposed solution without the pseudo-labeling task, and the proposed solution). In some iterations, the F1 score of the proposed active learning solution exceeded that of random sampling by more than 0.20. With this method, experts can spend less time annotating images while still obtaining a high-performance model.
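The abstract describes a per-sample score that combines entropy with Euclidean distance to the cluster centroid but does not give the formula. The sketch below assumes a simple weighted sum; the weight `alpha` and the function names are hypothetical stand-ins, not the dissertation's actual method.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def uncertainty_score(probabilities, features, centroid, alpha=0.5):
    """Assumed score: weighted sum of prediction entropy and distance to the centroid."""
    distance = math.dist(features, centroid)  # Euclidean distance
    return alpha * shannon_entropy(probabilities) + (1 - alpha) * distance


def lowest_scoring(samples, centroid, k):
    """Return the k samples with the lowest score (candidates for pseudo-labeling)."""
    ranked = sorted(
        samples,
        key=lambda s: uncertainty_score(s["probs"], s["features"], centroid),
    )
    return ranked[:k]


if __name__ == "__main__":
    centroid = [0.0, 0.0]
    samples = [
        {"id": "a", "probs": [0.5, 0.5], "features": [3.0, 4.0]},
        {"id": "b", "probs": [0.99, 0.01], "features": [0.1, 0.0]},
    ]
    print([s["id"] for s in lowest_scoring(samples, centroid, k=1)])
```

Under this assumption, a confident prediction close to its centroid gets a low score, which matches the abstract's step of adding the lowest-scoring samples to the training set with the model's own prediction.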
- Benchmark aplicado à Deteção de Objetos de Mamoas Arqueológicas a partir de dados LiDAR [Benchmark applied to object detection of archaeological mounds from LiDAR data]. Silva, Miguel Ribeiro Vilar Brás da; Marreiros, Maria Goreti Carvalho. Human history and its archaeological evidence are priceless and should be preserved, esteemed and respected. However, the traditional work of an archaeologist is mainly manual and slow, and it requires specialized knowledge as well as considerable experience, which is a significant limitation given the small community of archaeologists available. In addition, global warming, the general rise in sea levels and destruction due to human activities, among other concerns, contribute to a growing fear of losing archaeological sites, as the traditional methods of identifying and preserving these sites cannot keep up with the speed at which such problems spread. Because of this, there is growing interest in applying Artificial Intelligence techniques to help archaeologists in several tasks, with a particular focus on identifying archaeological sites through remote sensing. Currently, there are no applications or tools that can carry out such work; however, there has been a growing scientific and academic effort in this direction. This thesis aims to implement a tool that, from LiDAR data gathered over a geographical area, performs object detection for specific archaeological features (such as mounds), testing a variety of machine learning models that assign a classification to determine whether an archaeological mound is present. The input for this work consists of a Digital Terrain Model (DTM), a Local Relief Model (LRM) and a Slope image obtained from drone flights over Viana do Castelo using LiDAR sensors. These three images were combined into a single image in which certain features are easier to identify, for use in model training.
For comparison, two datasets were built with different margin sizes around each archaeological mound. The goal of the thesis is to test several object detection architectures, compare their evaluation results and determine which of the tested models best detects the presence of an archaeological mound. The study compared a total of nine Deep Learning (DL) architectures: four two-stage detectors and five one-stage detectors. As expected, most of the two-stage detectors outperformed the one-stage detectors in terms of mean average precision for the detection of archaeological mounds. The exception was the one-stage detector Fully Convolutional One-Stage (FCOS), which achieved the highest mean average precision of all, with results between 68.1% and 78.6% for both dataset sizes.
- Detection and Classification of Anomalies in Railway Tracks. Magalhães, José Pedro da Silva; Marreiros, Maria Goreti Carvalho. In Portugal, rail transport is heavily used. The companies that provide these services sometimes need to carry out maintenance on the railway tracks and infrastructure, which leads to services and machines being unavailable and/or delayed, and consequently to monetary losses. It is therefore necessary to prepare a maintenance plan and to predict when maintenance will be essential, so as to minimize losses. With a predictive maintenance system, maintenance can be carried out only when it is needed. This type of system continuously monitors machines and/or processes, making it possible to determine when maintenance should take place. One way of performing this analysis is to train machine learning algorithms on a large amount of data from the machines and/or processes. The goal of this dissertation is to contribute to the development of a predictive maintenance system for railway tracks. The specific contribution is to detect and classify anomalies. To that end, Machine Learning and Deep Learning techniques are used, more specifically unsupervised and semi-supervised algorithms, because the dataset provided contains a small number of anomalies. The algorithms were chosen based on what is currently most used and gives the best results. Accordingly, the first step of the dissertation was to investigate the most common implementations for detecting and classifying anomalies in predictive maintenance systems. After this investigation, the algorithms that at first sight appeared able to adapt to the scenario presented were trained, seeking the best hyperparameters for each.
By comparing performance, it was concluded that the best fit for the anomaly identification problem was an Autoencoder artificial neural network. From the results of this model, it was possible to define thresholds for subsequently classifying the anomaly.
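The abstract does not say how the thresholds were derived from the Autoencoder's results. One common approach, sketched here purely as an assumption, is to place thresholds on the reconstruction error at the mean plus a multiple of the standard deviation observed on normal data. The multipliers, the class labels and the function names below are all hypothetical.

```python
import statistics


def fit_thresholds(normal_errors, k_suspect=2.0, k_anomaly=3.0):
    """Derive two thresholds from reconstruction errors measured on normal data.

    Assumed rule: mean + k * standard deviation, with k chosen per severity level.
    """
    mean = statistics.fmean(normal_errors)
    std = statistics.pstdev(normal_errors)
    return mean + k_suspect * std, mean + k_anomaly * std


def classify(error, thresholds):
    """Map one reconstruction error to a hypothetical severity label."""
    suspect, anomaly = thresholds
    if error >= anomaly:
        return "anomaly"
    if error >= suspect:
        return "suspect"
    return "normal"


if __name__ == "__main__":
    # Illustrative reconstruction errors only; not data from the dissertation.
    thresholds = fit_thresholds([0.10, 0.12, 0.09, 0.11, 0.13])
    for error in (0.11, 0.14, 0.30):
        print(error, classify(error, thresholds))
```

The idea is that an autoencoder trained mostly on normal track data reconstructs normal samples well, so an unusually large reconstruction error signals a sample unlike the training data.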
- Diabetic-Friendly Multi-Agent Recommendation System for Restaurants based on Social Media Sentiment Analysis and Multi-Criteria Decision Making. Teixeira, Bruno César Jantarada; Marreiros, Maria Goreti Carvalho. Lifestyle, poor diet and stress, among other factors, strongly aggravate people's health problems, such as diabetes and high blood pressure. Some of these problems could be avoided if essential recommendations for a healthy lifestyle were followed. This dissertation proposes a solution designed to improve the quality of life of diabetic patients, specifically by helping them find nearby restaurants that better suit their health needs. It is a diabetic-friendly feature that combines multi-criteria decision making, built on a Multi-Agent System (MAS), with the user preferences initially recorded, and that provides the user with recommendations in three categories that may benefit the user's lifestyle and health. The solution uses a Case-Based Reasoning algorithm so that it can evolve and improve with each user interaction. Sentiment Analysis was also used to derive a score from restaurant reviews, since this is one of the criteria defined for the solution.
- Mineração Preditiva de Processos - Otimização de processos de negócio através de técnicas preditivas [Predictive Process Mining - Optimisation of business processes through predictive techniques]. Silva, Eduardo Coelho da; Marreiros, Maria Goreti Carvalho. The complexity of business processes has reached an all-time high and the environments in which organisations operate have never been so competitive and dynamic. This created the need for business processes to be continuously analysed, improved, and supported by an adequate set of tools and techniques, which led to the conception of Process Mining (PM). Predictive Process Mining (PPM) emerges as the integration of PM with predictive mechanisms, with the goal of enabling more proactive decision-making and problem-solving, compared to the reactive approach adopted with traditional PM. This dissertation aims to raise awareness of its benefits and increase its adoption, by studying real-world applications of PPM in a multinational organisation. During this work, interviews conducted with key business users led to the conclusion that PPM, from a management perspective, not only improves process transparency and understanding of its complexities, but also allows the future behaviour of ongoing processes to be predicted and actions taken to align them with business interests. Regarding operations, the interviewees expect the change to a more proactive approach to lead to an improvement in process efficiency, resource management, and performance metrics (e.g., user satisfaction, lead time), as a result of smoother process execution, with reduced delays and setbacks. To support these expectations and study the application of predictive techniques in PPM, two distinct solutions have been developed. The first use case, Next-Event Prediction, covers a specific sequence of steps from a purchasing process and aims to predict whether a purchase request will be rejected during the review stage, following its creation.
The second use case, Outcome Prediction, covers a complex multi-step change approval process and aims to make an early prediction on whether the decision will be delayed or not, based on a predefined deadline. In both cases, an early prediction allows users to make the necessary changes to avoid an undesirable outcome. During development, process analyses revealed significant potential for the process performance to be optimised and allowed the definition of Key Performance Indicators (KPIs) to measure the real impact of the use cases once they are deployed to production. When it comes to the implementation, several techniques have been studied, with particular emphasis on the analysis of different representations for the process data (e.g., aggregated vs. sequential), and the performance comparison between ensemble (e.g., eXtreme Gradient Boosting (XGBoost)) and deep learning models (e.g., Long Short Term Memory (LSTM)). In both use cases, XGBoost demonstrated notable performance and outperformed the other models, with F1-Scores ranging from 84% to 87%. In the end, not only have the initial expectations from stakeholders been met, but they have also gained a better understanding of their needs and PPM’s capabilities. By maintaining close communication with end users and stakeholders, addressing their needs and concerns, and building on top of the work from this thesis, PPM will surely be on the right track to realise its potential and thrive in the ever-evolving business landscape, helping organisations adapt to the challenges and opportunities that lie ahead.
- Transfer learning applied to government auditing: A focused approach on financial statements in Maranhão, Brazil. Coelho, Heloisa Guimarães; Marreiros, Maria Goreti Carvalho. Since Brazil’s return to democracy, dozens of laws, decrees and normative instructions have been drafted with the purpose of regulating and improving the mechanisms for controlling and monitoring municipal public resources. These regulations are specifically aimed at the process of accountability by elected officials, who currently rely on the help of accountants responsible for preparing and submitting financial statements to the Courts of Auditors. However, according to data from the TCU (Federal Court of Accounts), in 2023, Maranhão was the Brazilian state with the highest number of rejected accounts. Several reasons can lead to these processes being challenged, including incorrect application of resources, flaws in documentation and human errors, among others. In practice, the routine of accountants includes repetitive and mechanical activities that require considerable time to prepare and review documents, which often leads to errors in classifying and issuing documentation. In this context, this dissertation investigates the use of Transfer Learning (TL) to improve automation and accuracy in the classification of financial commitment notes, an initial document in the public expenditure cycle, with a specific focus on the context of the state of Maranhão. To this end, BERTimbau, a pre-trained language model for Brazilian Portuguese, was fine-tuned to assist government accountants in reducing classification errors and ensuring compliance with local and national financial regulations. The CRISP-DM methodology, widely used in data science, was adopted to structure the development of the project. The dataset used, consisting of several classifications of commitment notes for the year 2023, was thoroughly analyzed and pre-processed.
To fine-tune the model, two samples of similar size were selected, differing only in the number of possible classes, because of the high degree of imbalance between the classes. Even in a multiclass setting with a reduced number of classes, the results indicate that the BERTimbau model performs strongly in the classification task, achieving 98% accuracy with an error rate of 0.10 on the test set. These results highlight the effectiveness of BERTimbau for public financial auditing applications. It is therefore concluded that TL models have great potential to optimize and improve financial auditing processes, with positive implications for wider adoption in Brazil.
- Visão computacional para deteção de hábitos alimentares [Computer vision for detecting eating habits]. Antelo, Ana Catarina Lopes; Marreiros, Maria Goreti Carvalho. Excess weight and obesity are behavioral factors that have been causing a substantial increase in deaths in Portugal. These factors can bring musculoskeletal complications, metabolic effects such as diabetes, cardiovascular risks, effects on mental health, and the onset and/or worsening of cancer. Following a healthy diet is important for controlling not only sugar levels but also the lipid profile and blood pressure, thereby minimizing cardiovascular risk and the risk of microvascular complications. It is therefore crucial to implement solutions that guide users towards food choices that are more beneficial to their health, so that individuals prevent the onset of other diseases or exacerbations of diseases they may already have. These solutions may be manual, such as manual carbohydrate counting, or digital, such as the various mobile applications on the market that enable disease monitoring and nutritional control. Today, a large part of society owns a mobile device capable of taking photographs, and mobile phones are increasingly used as personal assistants, helping people to be more effective in their daily tasks. These devices are a versatile computational resource, with great sensing and inference capacity. Machine learning techniques applied in phone cameras enable image stabilization, automatic text translation, object detection and face recognition, among others. Phone sensors themselves are increasingly complex and can be used to detect movements and patterns, infer the user's stress and emotional levels, recognize places, estimate the depth of elements in a photograph, and so on.
These sensors make it possible to extract data without the user having to perform a specific task. The aim of this thesis was to implement and study innovative systems that, through computer vision, assist with nutritional control and enable disease monitoring. To this end, a food recognition system was developed using Detectron2 with the PointRend model which, with the help of a Linear Regression model capable of predicting an estimate of the weight of the foods present in an image, made nutritional control a simple task. The solution proposed in this dissertation will allow users to save time and effort and to make more conscious food decisions. In addition, the solution will also be prepared to assist diabetic patients, indicating, for example, the units of insulin they should inject, given the meal they are about to eat.