Deployment of ML Mechanisms for Cybersecurity in Resource-Constrained Embedded Systems

Vicente, Pedro Miguel Casal

http://hdl.handle.net/10400.22/24060

Use this identifier to reference this record.

Name:	Size:	Format:
Tese_4972_v3.pdf	4.15 MB	Adobe PDF

Authors

Vicente, Pedro Miguel Casal

Advisor(s)

Santos, Pedro Miguel Salgueiro dos

Abstract(s)

The growing number of poorly secured devices on the Internet is being exploited by attackers to compromise data or to use those devices as external agents in further attacks. It is therefore crucial that networks have a system that correctly assesses the nature of incoming and outgoing packets, protecting both the local network and Internet-connected systems at large. Machine Learning is widely used for this purpose owing to its early success. However, these mechanisms are best placed at the entry point of the local network, an embedded system with limited resources for training machine learning models and/or performing inference. Since cybersecurity is a real-time problem, the embedded system must perform these tasks within a very short time interval. The time required to classify packets depends on the overall system load, the complexity of the machine learning models, and the desired accuracy. This thesis aims to assess the current support for ML in embedded systems, either through model interoperability or through development in low-level languages, and the relationship between the time required by different embedded systems and the different tools and models. It explored one transpilation tool, m2cgen; two interoperability formats, PMML and Open Neural Network Exchange (ONNX); and one runtime environment, ONNXRuntime, to deploy an already trained model on a device with limited resources. The results show that ONNXRuntime was the only tool whose sample classifications matched those of the original models perfectly. An analysis of the time required for this task revealed that ONNXRuntime is faster than Scikit-Learn with the Isolation Forest (ISO), One-Class Support Vector Machine (OCSVM), and Stochastic Gradient Descent One-Class Support Vector Machine (SGDOCSVM) models, and slower with the Local Outlier Factor (LOF) model.

The growing number of Internet-connected devices with weak security is being exploited by attackers to compromise data or to use those devices as external agents in new attacks. It is therefore crucial that networks have a system that correctly assesses the nature of incoming and outgoing packets, protecting the local network and Internet-connected systems in general. To achieve this, Machine Learning is being widely used owing to its early success. However, these mechanisms protect the local network best when placed at its entry point, an embedded device with limited resources for training Machine Learning models and/or running inference tasks. Since cybersecurity is a real-time problem, embedded systems have to perform these activities within a very short time interval. The time required to classify packets depends on the system load, the complexity of the models, and the desired accuracy. This thesis aims to assess the current support for this technology in embedded systems, either through model interoperability or through development in low-level languages, and the relationship between the time required by different embedded systems, different tools, and different models. It explored one transpilation tool, m2cgen; two interoperability formats, PMML and ONNX; and one runtime environment, ONNXRuntime, to deploy an already trained model on a device with limited resources. The results show that ONNXRuntime was the only Machine Learning tool whose sample classifications matched those of the original models perfectly. An analysis of the time required for this task revealed that ONNXRuntime is faster than Scikit-Learn with the ISO, OCSVM, and SGDOCSVM models, and slower with the LOF model.
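
The abstract compares deployment tools on two criteria: whether the converted model reproduces the original model's classifications, and how long inference takes. The following is a minimal sketch of how such a comparison could be set up, not the procedure used in the thesis. The helper names (agreement_rate, mean_inference_time, make_threshold_detector) and the toy threshold detectors are hypothetical stand-ins; in practice the two callables would wrap the original model and its converted counterpart. Labels follow the Scikit-Learn outlier-detection convention of +1 for inliers and -1 for outliers.

```python
import time
from typing import Callable, Sequence

Sample = Sequence[float]
Predictor = Callable[[Sample], int]


def agreement_rate(reference: Predictor, candidate: Predictor,
                   samples: Sequence[Sample]) -> float:
    """Fraction of samples on which both predictors return the same label."""
    if not samples:
        raise ValueError("samples must not be empty")
    matches = sum(1 for s in samples if reference(s) == candidate(s))
    return matches / len(samples)


def mean_inference_time(predict: Predictor, samples: Sequence[Sample],
                        repeats: int = 5) -> float:
    """Average wall-clock seconds per prediction over `repeats` full passes."""
    if not samples or repeats < 1:
        raise ValueError("need at least one sample and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for s in samples:
            predict(s)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))


def make_threshold_detector(threshold: float) -> Predictor:
    """Hypothetical toy detector: flags a sample as an outlier (-1)
    when the sum of its features exceeds `threshold`, else inlier (+1)."""
    def predict(sample: Sample) -> int:
        return -1 if sum(sample) > threshold else 1
    return predict


if __name__ == "__main__":
    # Assumed stand-ins for an original model and an imperfect conversion.
    original = make_threshold_detector(1.0)
    converted = make_threshold_detector(1.2)
    samples = [(0.2, 0.3), (0.9, 0.4), (0.6, 0.5)]

    print("agreement:", agreement_rate(original, converted, samples))
    print("s/prediction (original): ", mean_inference_time(original, samples))
    print("s/prediction (converted):", mean_inference_time(converted, samples))
```

An agreement rate of 1.0 corresponds to the "perfect match" the abstract reports for ONNXRuntime. Timing figures from a sketch like this depend on system load, so they are only meaningful when both predictors are measured on the same device under the same conditions.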