| Name | Description | Size | Format |
|---|---|---|---|
| | | 5.99 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
The growing need to integrate transactional data scattered across modular systems raises challenges in data governance and in making data available in real time for analysis and reporting. This work addresses the implementation of a Data Mesh architecture in the Sifox system, which is being rewritten following the principles of Domain-Driven Design (DDD). The project aims to consolidate and relate data from transactional modules (OLTP) in an analytical layer (OLAP) using Change Data Capture (CDC) mechanisms, enabling near real-time integration. The Data Mesh architecture promotes the creation of reusable and accessible Data Products, decentralizing data governance and facilitating ad hoc consumption through tools such as Power BI and APIs. In addition, the project explores the use of Data Products for predictive analytics using Jupyter Notebooks. The study also defines governance guidelines and explores the benefits and challenges of adopting Data Mesh, comparing it with traditional data management approaches such as Data Warehouses and Data Lakes.
The demand for integrating transactional data from modular systems is growing, bringing significant challenges in data governance and in ensuring real-time availability for analytics and reporting. This thesis explores the implementation of a Data Mesh architecture in the Sifox system, a solution undergoing a rewrite based on Domain-Driven Design (DDD) principles. The primary objective is to consolidate and relate data from transactional modules (OLTP) into an analytical layer (OLAP) using Change Data Capture (CDC) mechanisms. This enables near real-time integration while maintaining the modularity and autonomy of the system's components. By adopting the Data Mesh paradigm, the project introduces reusable and accessible Data Products, decentralizing governance and enabling ad hoc data consumption through tools like Power BI and APIs. A key focus of this research is the use of Data Products for predictive analytics, leveraging machine learning techniques such as Federated Learning (FL). FL methodologies, including Horizontal, Vertical, and Split Learning, are explored for training models across decentralized domains. These approaches prioritize privacy by keeping raw data localized, facilitating tasks like fraud detection and personalized recommendations while addressing challenges of data heterogeneity across domains. Split Learning is particularly emphasized for its ability to balance data privacy and computational efficiency. The thesis also evaluates the core principles of Data Mesh: Data as a Product, Domain Ownership, Federated Governance, and Self-Serve Data Platform. These principles are compared with traditional centralized architectures like Data Warehouses and Data Lakes, highlighting differences in scalability, interoperability, and governance.
The research further investigates CDC strategies for synchronizing OLTP and OLAP systems, emphasizing the role of modular input/output ports, Service Level Objectives (SLOs), and automated data contracts to enhance data connectivity and reusability within the mesh. Despite its advantages, the adoption of Data Mesh is not without challenges. Predictive analytics in this decentralized setup can be limited by the complexity of coordinating data from independent domains, ensuring consistency, and maintaining compliance with global governance policies. This work presents a detailed analysis of these limitations while proposing strategies to overcome them through robust infrastructure and policy enforcement. By bridging theoretical insights with practical implementation guidelines, this research aims to provide a roadmap for organizations seeking to adopt Data Mesh architectures, addressing both immediate integration needs and long-term scalability.
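As a rough illustration of the CDC-plus-data-contract idea described above, the following minimal sketch replays change events from a transactional module into an in-memory analytical table and rejects rows that violate a contract. It is not taken from the thesis: the `ChangeEvent` type, the `REQUIRED_FIELDS` contract, and the function names are all hypothetical, and a real deployment would read events from a CDC tool rather than a Python list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event emitted by an OLTP module."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the changed row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


# Hypothetical data contract: fields every row must carry.
REQUIRED_FIELDS = {"customer_id", "amount"}


def satisfies_contract(row: Dict[str, Any]) -> bool:
    """Return True when the row contains every field the contract requires."""
    return REQUIRED_FIELDS.issubset(row)


def apply_change(table: Dict[int, Dict[str, Any]], event: ChangeEvent) -> bool:
    """Apply one event to the analytical table; return False if rejected."""
    if event.op == "delete":
        table.pop(event.key, None)  # deleting a missing key is a no-op
        return True
    if event.op in ("insert", "update"):
        if event.row is None or not satisfies_contract(event.row):
            return False            # contract violation: skip the row
        table[event.key] = dict(event.row)
        return True
    raise ValueError(f"unknown operation: {event.op}")


# Replay a small, made-up event stream in order.
events: List[ChangeEvent] = [
    ChangeEvent("insert", 1, {"customer_id": 10, "amount": 25.0}),
    ChangeEvent("insert", 2, {"customer_id": 11, "amount": 40.0}),
    ChangeEvent("update", 1, {"customer_id": 10, "amount": 30.0}),
    ChangeEvent("insert", 3, {"customer_id": 12}),  # missing "amount"
    ChangeEvent("delete", 2),
]

olap_orders: Dict[int, Dict[str, Any]] = {}
rejected_keys = [e.key for e in events if not apply_change(olap_orders, e)]
```

Applying events strictly in order keeps the analytical copy consistent with the source, and checking the contract at the input port means a malformed row is surfaced to the owning domain instead of silently reaching consumers.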
Description
Keywords
DataMesh; Change Data Capture (CDC); Data Products; Data Science; Data Decentralization; Data Management
