Machine Learning powered serverless fraud detection

Names: Costa, Ricardo André Fernandes; Chostak, Christian
Dates: 2021-03-11; 2020
Handle: http://hdl.handle.net/10400.22/17423
Description: Master's dissertation in Informatics Engineering (Dissertação de Mestrado em Engenharia Informática)
Language: eng
Keywords: Serverless; Machine Learning; Lambda; Fraud
Type: master thesis
Identifier: 202636267

Abstract:
There is increasing concern about fraud in all market sectors. Although fraud and fraud detection attract a great deal of attention, only a small fraction of that work has been fully incorporated into real-world applications. Counterfeit documents are reproductions or imitations of the originals. The present work aims to fill a gap in fraud analysis by automating the identification of such documents, within seconds. Generally speaking, a payload containing a suspected fraudulent document reaches an Application Programming Interface gateway, which redirects the request to Lambda functions; based on the event, these store it on SQS (Simple Queue Service). This queue triggers a fleet of microservices, also powered by Lambda functions. This non-exhaustive list of functions reads the queue and first creates the metadata of the received document, registering it in a serverless relational database while storing the document itself on S3 (Simple Storage Service). It then calls the second batch of functions, which starts the machine learning process on the already saved image. When that process finishes, a message is sent through SNS (Simple Notification Service) to alert the user. The output of the analysis contains a sample of the input document showing where the fraud is, if there is one. With the percentage and area given, the operator can see what portion of the image was considered fraudulent and, from that moment on, has a technical basis for accepting or rejecting the document.

Abstract (translated from Portuguese):
There is growing concern about fraud in all sectors of society. Although fraud and fraud detection attract a great deal of attention, only a small part of that work has been implemented in real applications, and even then mostly in sectors related to media streaming.
Forged documents are reproductions or imitations, whole or partial, of their originals. The present work attempts to fill a gap in fraud analysis by automating it and identifying fraud in seconds. In general terms, a payload containing a fraudulent document reaches an Application Programming Interface (API), which directs the requests to Lambda functions and, based on the event, stores it in SQS (Simple Queue Service). This queue triggers a fleet of microservices, also run on Lambda. This non-exhaustive list of functions reads the queue events, which at this stage contain only a unique identifier of the file and a brief description provided by the user, and first creates the metadata of the received document, recording it in a relational database while storing the document itself in S3 (Simple Storage Service). After that, the second batch of processing starts on the already saved image; at this point the Machine Learning algorithms begin, along with conventional image processing. When the process ends, a message passes through SNS (Simple Notification Service), alerting the end user. The analysis report contains a sample of the processed document indicating where the fraud is, if there is one. With the percentage and area indicated, the user can see which portions of the document were possibly altered and can decide whether or not to accept the document, with a technical basis for doing so.
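
The event flow described in the abstract (ingest, queue, persist, analyze, notify) can be illustrated with a minimal in-memory sketch. It is not the implementation from the thesis: plain Python containers stand in for SQS, the relational database, S3 and SNS, and every name here (Pipeline, ingest, process_next, analyze, the pending staging area) is hypothetical. In particular, analyze is a placeholder that always reports no fraud, since the abstract does not describe the model itself.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """In-memory stand-ins for the managed services named in the abstract."""
    pending: dict = field(default_factory=dict)        # hypothetical staging area for uploads
    queue: deque = field(default_factory=deque)        # stands in for SQS
    metadata_db: dict = field(default_factory=dict)    # stands in for the relational database
    object_store: dict = field(default_factory=dict)   # stands in for S3
    notifications: list = field(default_factory=list)  # stands in for SNS


def ingest(p, document, description):
    """Entry point: stage the upload and enqueue only an ID and a description."""
    doc_id = str(uuid.uuid4())
    p.pending[doc_id] = document
    p.queue.append({"document_id": doc_id, "description": description})
    return doc_id


def analyze(document):
    """Placeholder for the ML step (NOT the thesis model): always reports no fraud."""
    return {"fraud_fraction": 0.0, "region": None}


def process_next(p):
    """Worker: read one queue event, persist the document, analyze it, notify."""
    event = p.queue.popleft()
    doc_id = event["document_id"]
    document = p.pending.pop(doc_id)
    # Register the metadata first, then store the document itself.
    p.metadata_db[doc_id] = {
        "description": event["description"],
        "size_bytes": len(document),
    }
    p.object_store[doc_id] = document
    # Run the analysis on the already saved copy, then publish the report.
    report = {"document_id": doc_id, **analyze(p.object_store[doc_id])}
    p.notifications.append(report)
    return report


if __name__ == "__main__":
    pipeline = Pipeline()
    ingest(pipeline, b"scanned-id-card", "ID card, front")
    print(process_next(pipeline))
```

In a real deployment each function would be a separate Lambda handler and the containers would be replaced by calls to the corresponding services; the sketch only shows the order of the steps and what each message carries.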