Browsing by Author "CARDOSO, FRANCISCO FONSECA FERREIRA"
- Adversarial agent for synthetic data generation for phishing detection
  Publication. CARDOSO, FRANCISCO FONSECA FERREIRA; Pereira, Isabel Cecília Correia da Silva Praça Gomes; Maia, Eva Catarina Gomes
  Phishing attacks remain a significant security challenge, causing financial and reputational damage to organizations and individuals, with email as the primary attack vector. Modern defenses still rely on phishing detection systems, but the evolution of these attacks is eroding their effectiveness. Attackers are moving from generic emails to highly personalised, context-specific messages, which conventional models struggle to detect. The performance of these systems is limited mainly by the scarcity of the specialised, domain-specific training data needed to recognise such threats. This thesis addresses this gap by introducing CANDACE, a modular framework designed to generate context-aware synthetic email messages for training and improving these detection systems. The main innovation of CANDACE is its dual Knowledge Graph (KG) architecture, which gives the generation process a contextual foundation. The first KG maps external, real-world information about an organization, while the second models its internal structure, such as employees and projects. A Small Language Model (SLM) then combines the information in these KGs with other components, such as a URL, to generate an email message that is contextually relevant to the organization's domain. The contributions of this work include the complete design, end-to-end implementation, and validation of the CANDACE pipeline. A case study in the Public Administration sector demonstrates the framework's ability to produce convincing, context-aware synthetic messages. The findings confirm that contextual grounding is essential for creating better and more focused training data.
  This research shows the need to move beyond generic email datasets in order to build more resilient detection systems capable of detecting more sophisticated and personalised phishing attacks.
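  The dual-KG idea described in the abstract can be illustrated with a minimal sketch. It is not the CANDACE implementation: the abstract does not specify data structures or prompts, so the `Triple` type, the `facts_about` and `build_prompt` helpers, the sample entities, and the placeholder URL are all hypothetical. The sketch only shows how facts from an external graph and an internal graph might be merged into one grounded prompt for a language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge: subject --relation--> obj (hypothetical schema)."""
    subject: str
    relation: str
    obj: str


def facts_about(graph, entity):
    """Return readable facts for every triple that mentions the entity."""
    return [
        f"{t.subject} {t.relation} {t.obj}"
        for t in graph
        if entity in (t.subject, t.obj)
    ]


def build_prompt(external_kg, internal_kg, organization, recipient, url):
    """Merge context from both graphs into a single prompt string.

    Assumption: the external KG is queried by organization and the
    internal KG by recipient; the real framework may select context differently.
    """
    lines = [
        "Write a synthetic training email for a phishing detector.",
        f"Organization: {organization}",
        f"Recipient: {recipient}",
        "External context:",
        *[f"- {fact}" for fact in facts_about(external_kg, organization)],
        "Internal context:",
        *[f"- {fact}" for fact in facts_about(internal_kg, recipient)],
        f"Embedded URL: {url}",
    ]
    return "\n".join(lines)


# Invented sample data, for illustration only.
external_kg = [Triple("City Council", "announced", "new tax portal")]
internal_kg = [
    Triple("Ana Silva", "works on", "Project Atlas"),
    Triple("Rui Costa", "manages", "Finance"),
]

prompt = build_prompt(
    external_kg, internal_kg,
    organization="City Council",
    recipient="Ana Silva",
    url="https://example.invalid/portal",
)
print(prompt)
```

  In this sketch only facts that mention the chosen organization or recipient reach the prompt, which is one simple way to keep the generated message focused on a specific context. The resulting string would then be passed to whichever SLM the pipeline uses.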
