Publication
Comparing time series forecasting models for health indicators: A clustering analysis approach
dc.contributor.advisor | Oliveira, Alexandra | |
dc.contributor.advisor | Faria, Brígida Mónica | |
dc.contributor.advisor | Pimenta, Rui | |
dc.contributor.author | Cruz, Cláudia Beatriz Silva | |
dc.date.accessioned | 2025-02-13T10:48:28Z | |
dc.date.available | 2025-02-13T10:48:28Z | |
dc.date.issued | 2024-11-22 | |
dc.date.submitted | 2024-11-22 | |
dc.description.abstract | A time series can be defined as a sequence of observations ordered at equal time intervals, which makes it fundamental for addressing questions of causality, trends, and forecasting. Temporal data and its analysis can be applied to several areas, such as engineering, finance, and health. The ongoing study of time series raises several problems, one of which is clustering, which aims to identify similarities between series. This aspect is particularly relevant when time series are modeled by Autoregressive Integrated Moving Average (ARIMA) models, which makes understanding their parameters essential for their analysis. One of the main applications of time series in public health and biomedicine has been in epidemiological studies of infectious and chronic diseases, studies on the prediction of demand for health services, and studies on the assessment of health outcomes through data on mortality and morbidity. These indicators are direct measures of health care needs, reflecting the global burden of disease in the population, and are therefore crucial for the study and surveillance of public health, and for the processes of organization and intervention of health services. The sum of mortality and morbidity is referred to as “Burden of Disease” and can be measured by a metric called “Disability-Adjusted Life Years” (DALYs). The analysis of this type of data is essential to identify geographic patterns, which allows a better perception of health disparities in the population. The main objectives of this dissertation are to model health indicators through Moving Average (MA), Autoregressive Moving Average (ARMA), or Autoregressive Integrated Moving Average (ARIMA) processes; evaluate the quality of fit of the models to the data; and compare the distances between processes regarding their effectiveness in identifying natural groups. 
The study begins by exploring the temporal characteristics of DALYs of five non-communicable diseases (cardiovascular diseases, chronic respiratory diseases, neurological disorders, chronic kidney diseases, and diabetes), highlighting underlying patterns and trends. Then, using an automated algorithm, Autoregressive Integrated Moving Average models were applied to represent and describe the time series. The fit of the models was assessed with forecast accuracy metrics, such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE). It is on this representation of the time series that the Piccolo, Maharaj, and LPC distance measures were applied, so that clustering techniques could be used to identify clusters. Six different hierarchical clustering methods were used: the Ward, Complete, Average, Single, Median, and Centroid linkages. Additionally, the performance of the clustering algorithm was assessed through evaluation metrics, such as the Silhouette score, C-index, McClain Index, and Dunn Index. The results on non-communicable disease DALYs data for 48 European countries show that the choice of distance measure greatly influences clustering outcomes and the number of clusters formed. While certain methods revealed geographic patterns, other factors, such as cultural or economic similarities, can also influence cluster formation. Furthermore, some countries were frequently isolated in their own cluster across clustering methods and distance measures, suggesting that their Autoregressive Integrated Moving Average model was significantly different from the rest. For example, Latvia formed isolated clusters in cardiovascular diseases. Other countries, such as Albania, Belarus, Lithuania, and Sweden, were grouped into the same cluster across various clustering methods when the Piccolo distance was applied to neurological disorders. 
For chronic respiratory diseases, 15 clusters were formed with the LPC distance, between 8 and 15 clusters with the Piccolo distance, and between 9 and 15 clusters with the Maharaj distance. These insights not only contribute to advancing the field of public health surveillance and intervention, ultimately aiming to alleviate the global burden of disease, but also contribute to our understanding of clustering Autoregressive Integrated Moving Average models and of how the use of different distance measures influences clustering outcomes. | por |
dc.identifier.tid | 203852397 | |
dc.identifier.uri | http://hdl.handle.net/10400.22/29497 | |
dc.language.iso | eng | |
dc.rights.uri | N/A | |
dc.subject | Distance measures | |
dc.subject | Piccolo | |
dc.subject | LPC | |
dc.subject | Maharaj | |
dc.subject | Clustering | |
dc.subject | DALYs | |
dc.subject | ARIMA models | |
dc.title | Comparing time series forecasting models for health indicators: A clustering analysis approach | por |
dc.type | master thesis | |
dspace.entity.type | Publication |
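The abstract above describes comparing fitted ARMA/ARIMA models through distance measures such as the Piccolo distance. As a rough illustration only, and not the dissertation's own code, the sketch below computes a Piccolo-style distance as the Euclidean distance between truncated AR(∞) weights of two ARMA models. The function names, the sign convention, the truncation length `k`, and the example coefficients are all assumptions made for this sketch; an integrated (ARIMA) model would additionally need its AR polynomial multiplied by (1 - B)^d before the expansion.

```python
import math


def pi_weights(ar, ma, k):
    """Return the first k AR(infinity) weights of an invertible ARMA model.

    Assumed convention (hypothetical, for this sketch):
    X_t = sum(ar[i] * X_{t-1-i}) + e_t + sum(ma[i] * e_{t-1-i})
    """
    # Coefficients of phi(B) = 1 - ar_1 B - ... and theta(B) = 1 + ma_1 B + ...
    a = [1.0] + [-c for c in ar]
    c = [1.0] + list(ma)
    # d(B) = phi(B) / theta(B), expanded term by term
    d = [1.0]
    for j in range(1, k + 1):
        a_j = a[j] if j < len(a) else 0.0
        conv = sum(c[i] * d[j - i] for i in range(1, min(j, len(c) - 1) + 1))
        d.append(a_j - conv)
    # pi(B) = 1 - pi_1 B - ..., so pi_j = -d_j
    return [-x for x in d[1:]]


def piccolo_distance(model_a, model_b, k=50):
    """Euclidean distance between the truncated AR(infinity) weights.

    Each model is an (ar, ma) pair of coefficient lists.
    """
    pa = pi_weights(model_a[0], model_a[1], k)
    pb = pi_weights(model_b[0], model_b[1], k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


if __name__ == "__main__":
    # Made-up coefficients for two illustrative series (not results from the thesis)
    series_a = ([0.5], [])       # AR(1)
    series_b = ([0.2], [0.3])    # ARMA(1, 1)
    print(piccolo_distance(series_a, series_b))
```

A matrix of such pairwise distances between the fitted models could then be passed to a hierarchical clustering routine with the linkage of choice.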
Files
Original bundle
- Name:
- Dissertação_MBBAS__CláudiaVinhal_V.Final.pdf
- Size:
- 20.98 MB
- Format:
- Adobe Portable Document Format
License bundle
- Name:
- license.txt
- Size:
- 4.03 KB
- Format:
- Item-specific license agreed upon to submission