Comparing time series forecasting models for health indicators: A clustering analysis approach

Cruz, Cláudia Beatriz Silva

Publication

2024-11-22 Master's dissertation

dc.contributor.advisor	Oliveira, Alexandra
dc.contributor.advisor	Faria, Brígida Mónica
dc.contributor.advisor	Pimenta, Rui
dc.contributor.author	Cruz, Cláudia Beatriz Silva
dc.date.accessioned	2025-02-13T10:48:28Z
dc.date.available	2025-02-13T10:48:28Z
dc.date.issued	2024-11-22
dc.date.submitted	2024-11-22
dc.description.abstract	Time series can be defined as sequences of observations ordered at equal time intervals, which makes them fundamental for addressing questions of causality, trends, and forecasting. Temporal data and its analysis can be applied in several areas, such as engineering, finance, and health. The ongoing study of time series raises several problems, one of which concerns clustering, which aims to identify similarities between series. This is particularly relevant when the time series are modeled by Autoregressive Integrated Moving Average (ARIMA) models, since understanding the model parameters then becomes essential for the analysis. Some of the main applications of time series in public health and biomedicine have been epidemiological studies of infectious and chronic diseases, studies predicting the demand for health services, and studies assessing health outcomes through mortality and morbidity data. These indicators are direct measures of health care needs that reflect the global burden of disease in the population, and they are therefore crucial for public health research and surveillance and for organizing health services and planning interventions. The sum of mortality and morbidity is referred to as the “Burden of Disease” and can be measured with a metric called the “Disability Adjusted Life Year” (DALYs). Analyzing this type of data is essential for identifying geographic patterns, which gives a clearer picture of health disparities in the population. The main objectives of this dissertation are to model health indicators with Moving Average (MA), Autoregressive Moving Average (ARMA), or Autoregressive Integrated Moving Average processes; to evaluate how well the models fit the data; and to compare distances between processes in terms of their effectiveness at identifying natural groups.
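For reference, the two quantities at the core of these objectives can be written out explicitly. The DALY decomposition below is the standard Global Burden of Disease definition. The ARIMA form uses one common sign convention (a plus sign on the MA terms, as in R's `arima`); this convention is an assumption, because this record does not state the notation used in the dissertation.

```latex
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}

\phi(B)\,(1 - B)^{d}\, X_t = \theta(B)\,\varepsilon_t,
\qquad \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

Here YLL is years of life lost to premature mortality, YLD is years lived with disability, $B$ is the backshift operator ($B X_t = X_{t-1}$), and $\varepsilon_t$ is white noise. The MA($q$) and ARMA($p$, $q$) processes named above are the special cases ARIMA(0, 0, $q$) and ARIMA($p$, 0, $q$).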
The study begins by exploring the temporal characteristics of DALYs for five non-communicable diseases (cardiovascular diseases, chronic respiratory diseases, neurological disorders, chronic kidney diseases, and diabetes), highlighting underlying patterns and trends. Then, using an automated algorithm, Autoregressive Integrated Moving Average models are fitted to represent and describe the time series. Model fit was assessed with forecast accuracy metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE). The Piccolo, Maharaj, and LPC distance measures were then computed on this model-based representation of the time series, and clustering techniques were applied to identify clusters. Six different hierarchical clustering methods were used: Ward, complete, average, single, median, and centroid linkage. The performance of the clustering was further assessed with evaluation metrics such as the Silhouette score, C-Index, McClain Index, and Dunn Index. The results on non-communicable disease DALYs data for 48 European countries show that the choice of distance measure strongly influences the clustering outcomes and the number of clusters formed. While certain methods revealed geographic patterns, other factors, such as cultural or economic similarities, may also influence cluster formation. Furthermore, some countries were frequently isolated in their own cluster across clustering methods and distance measures, suggesting that their Autoregressive Integrated Moving Average models differed significantly from the rest. For example, Latvia formed isolated clusters for cardiovascular diseases. Other countries, such as Albania, Belarus, Lithuania, and Sweden, were grouped into the same cluster across various clustering methods when the Piccolo distance was applied to neurological disorders.
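As a rough illustration of the fit metrics and of one of the model-based distances, the sketch below implements MAE, MSE, RMSE, and MAPE together with a Piccolo-style distance in plain Python. It is not the dissertation's code. It assumes that the Piccolo distance is the Euclidean distance between the AR(∞) (π) weights of two invertible ARIMA models, truncated after `n_terms` coefficients, and that the MA polynomial is written with a plus sign. The function names, the `(phi, d, theta)` model tuple, and the default truncation length are hypothetical choices. The Maharaj and LPC distances are not sketched.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean Squared Error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error (same units as the data)."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; undefined if an actual value is 0."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def ar_polynomial(phi, d):
    """Coefficients, in powers of B, of phi(B) * (1 - B)**d, with phi(B) = 1 - sum(phi_i B**i)."""
    poly = [1.0] + [-c for c in phi]
    for _ in range(d):
        # Multiply by (1 - B): subtract the polynomial shifted up by one power of B.
        poly = [x - y for x, y in zip(poly + [0.0], [0.0] + poly)]
    return poly


def pi_weights(phi, d, theta, n_terms):
    """First n_terms weights pi_j in X_t = sum(pi_j X_{t-j}) + e_t for an invertible
    ARIMA(p, d, q) model whose MA polynomial is theta(B) = 1 + sum(theta_j B**j)."""
    a = ar_polynomial(phi, d)
    c = [1.0]  # power-series coefficients of phi(B)(1 - B)**d / theta(B)
    for k in range(1, n_terms + 1):
        ck = a[k] if k < len(a) else 0.0
        for j in range(1, min(k, len(theta)) + 1):
            ck -= theta[j - 1] * c[k - j]
        c.append(ck)
    return [-ck for ck in c[1:]]


def piccolo_distance(model_a, model_b, n_terms=200):
    """Euclidean distance between truncated pi-weight sequences; models are (phi, d, theta)."""
    pa = pi_weights(*model_a, n_terms)
    pb = pi_weights(*model_b, n_terms)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

For example, `piccolo_distance(([0.5], 0, []), ([0.2], 0, []))` compares two AR(1) models whose π-weights differ only at lag 1, so it returns 0.3 up to floating-point rounding. Any MA part must be invertible for the truncated π-weights to converge.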
For chronic respiratory diseases, 15 clusters were formed with the LPC distance, between 8 and 15 clusters with the Piccolo distance, and between 9 and 15 clusters with the Maharaj distance. These insights not only contribute to advancing public health surveillance and intervention, ultimately aiming to alleviate the global burden of disease, but also deepen our understanding of clustering Autoregressive Integrated Moving Average models and of how the choice of distance measure influences clustering outcomes.	por
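The six linkages named in the abstract can all be expressed through the Lance-Williams update formula, which recomputes the dissimilarity between a newly merged cluster and every other cluster. The sketch below is a swapped-in, self-contained illustration of that technique, not the dissertation's code. It takes a precomputed dissimilarity matrix (in the study this would hold Piccolo, Maharaj, or LPC distances between fitted models; the matrix in the usage note is toy data), merges clusters until `k` remain, and scores the partition with the Dunn index. The names `agglomerate`, `dunn_index`, and `LANCE_WILLIAMS` are hypothetical. The Ward, centroid, and median coefficients are derived for squared Euclidean dissimilarities, so applying them directly to other distances is a simplification.

```python
import math
from itertools import combinations

# Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma), given the sizes n_i, n_j
# of the two clusters being merged and the size n_k of another existing cluster.
LANCE_WILLIAMS = {
    "single": lambda ni, nj, nk: (0.5, 0.5, 0.0, -0.5),
    "complete": lambda ni, nj, nk: (0.5, 0.5, 0.0, 0.5),
    "average": lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj), 0.0, 0.0),
    # The last three are derived for squared Euclidean dissimilarities.
    "centroid": lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj), -ni * nj / (ni + nj) ** 2, 0.0),
    "median": lambda ni, nj, nk: (0.5, 0.5, -0.25, 0.0),
    "ward": lambda ni, nj, nk: ((ni + nk) / (ni + nj + nk), (nj + nk) / (ni + nj + nk),
                                -nk / (ni + nj + nk), 0.0),
}


def agglomerate(dist, k, method="average"):
    """Repeatedly merge the closest pair of clusters until k remain; return lists of item indices."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    # Keys are (smaller id, larger id); new cluster ids always exceed existing ones.
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coef = LANCE_WILLIAMS[method]
    next_id = n
    while len(clusters) > k:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        del d[(i, j)]
        for m in clusters:
            dim = d.pop((min(i, m), max(i, m)))
            djm = d.pop((min(j, m), max(j, m)))
            ai, aj, beta, gamma = coef(ni, nj, len(clusters[m]))
            d[(m, next_id)] = ai * dim + aj * djm + beta * dij + gamma * abs(dim - djm)
        clusters[next_id] = merged
        next_id += 1
    return [sorted(c) for c in clusters.values()]


def dunn_index(dist, clusters):
    """Smallest between-cluster distance over largest cluster diameter (higher is better).
    Requires at least two clusters."""
    separation = min(dist[a][b] for c1, c2 in combinations(clusters, 2) for a in c1 for b in c2)
    diameter = max((dist[a][b] for c in clusters for a, b in combinations(c, 2)), default=0.0)
    return math.inf if diameter == 0 else separation / diameter
```

With the toy matrix `[[0, 1, 5, 6], [1, 0, 5, 5], [5, 5, 0, 1], [6, 5, 1, 0]]`, `agglomerate(dist, 2, "average")` returns `[[0, 1], [2, 3]]`, and the Dunn index of that partition is 5.0. Varying `k`, or cutting the merge sequence at a height threshold instead, is what produces the different cluster counts reported above.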
dc.identifier.tid	203852397
dc.identifier.uri	http://hdl.handle.net/10400.22/29497
dc.language.iso	eng
dc.rights.uri	N/A
dc.subject	Distance measures
dc.subject	Piccolo
dc.subject	LPC
dc.subject	Maharaj
dc.subject	Clustering
dc.subject	DALYs
dc.subject	ARIMA models
dc.title	Comparing time series forecasting models for health indicators: A clustering analysis approach	por
dc.type	master thesis
dspace.entity.type	Publication

Files

Primary

Name: Dissertação_MBBAS__CláudiaVinhal_V.Final.pdf
Size: 20.98 MB
Format: Adobe Portable Document Format

License

Name: license.txt
Size: 4.03 KB
Format: Item-specific license agreed upon at submission

Collections

ESS - DM - Bioestatística e Bioinformática Aplicadas à Saúde (Biostatistics and Bioinformatics Applied to Health)