Browsing by Author "Andrade, Paulo Jorge Ricardo"
- Engenharia de Resiliência. Publication. Andrade, Paulo Jorge Ricardo; Azevedo, Isabel de Fátima Silva. This thesis presents a study of a new discipline called Chaos Engineering and its approaches, which help to verify the correct behavior of a system and to discover new information about it through chaos experiments such as shutting down a machine or simulating latency in the network connections between applications. The case study was carried out at the company Mindera to verify and improve the resilience to failures of a client's project. Initially, the project sat at the first levels of the Chaos Maturity Model, and it was necessary to increase its sophistication and adoption by conducting experiments to test and improve its resilience. The cloud environment that the project uses and its architecture are explained to contextualize the components that the experiments use and test. Different alternatives for testing disaster recovery plans are compared, as are the differences between using a test environment and the production environment. The value of carrying out experiments for the client project is described, and their value proposition is identified. The different chaos tools are then analyzed using the TOPSIS method. The four experiments performed test the system's resilience to the failure of a database's primary node, the impact of latency in the network connections between different components, the system's reaction to the exhaustion of a machine's physical resources, and finally the system's overall resilience in the face of a server failure. After execution, the experiments were evaluated by company experts. Finally, the conclusions about the work developed are presented. The experiments carried out were classified as important for the project.
A problem was found in the latency introduction experiment; after the application's code was changed, the system reacted positively and the number of responses increased.