Browsing by Issue Date, starting with "2025-09-10"
- Integration of multiple data sources for forecasting cyanobacterial harmful algal blooms using machine learning
  Publication. QUERIDO, MARCO ANDRÉ MORGADO; Pereira, Ivo André Soares; Cunha, Bruno Miguel Almeida; Amorim, Ivone de Fátima da Cruz; Barbosa, Hugo Fernando Azevedo
  This dissertation addresses the forecasting of cyanobacterial harmful algal blooms (cyanoHABs), which harm aquatic ecosystems and human health through toxin production and water quality degradation. Motivated by the limitations of traditional forecasting methods based on single-source data, this work explores multi-source integration to improve forecast precision. The conceptual model rests on the assumption that a combination of several biogeochemical, physical, and meteorological parameters better characterizes the multifaceted drivers of cyanobacterial bloom variability. Based on Copernicus Marine Service data, the model combines chlorophyll-a concentration, as a proxy for bloom magnitude, with parameters such as sea surface temperature (SST), significant wave height (SSWH), nutrient concentration (e.g., phosphate, ammonium), net primary production (nppv), phytoplankton biomass (phyc), and euphotic depth (zeu). The approach followed the CRISP-DM methodology. The importance of predicting cyanoHABs was established first; data understanding and preparation then involved collecting, cleaning, and preprocessing multi-source time-series data. In the modeling phase, ensemble classifiers (Random Forest, Bagging, XGBoost) were used for chlorophyll-a classification, and regression models (Random Forest Regressor, ARIMA, SARIMA, LSTM, GRU, CNN) were used for forecasting chlorophyll-a trends. Performance was compared using ROC AUC, precision, and recall for the classification tasks and R² and RMSE for regression.
Results show that the ensemble classifiers labeled chlorophyll-a with almost perfect accuracy, with ROC AUC values close to 1.00, and identified the biogeochemical features nppv, phyc, and zeu as the most predictive. Random Forest Regressor performed best for time-series regression (R² = 0.594), simulating short-term chlorophyll-a patterns accurately. The traditional models (ARIMA, SARIMA) and the deep learning models (LSTM, GRU, CNN) performed worse, suffering from oversmoothing or from instability when the data were noisy. These findings indicate that multi-source data integration enhances cyanoHAB forecasting and that ensemble machine learning models can make accurate and interpretable predictions. The dissertation concludes that the representation of environmental factors in prediction models needs to be improved and that explainable AI approaches should be incorporated to build confidence and improve decision-making for water quality management.
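The evaluation metrics named in the abstract (ROC AUC, precision, and recall for classification; R² and RMSE for regression) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names, the 0.5 decision threshold, and the toy values below are all assumptions made for illustration, and the numbers are not results from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical chlorophyll-a values (toy numbers, not data from the study).
chl_true = [1.0, 2.0, 3.0, 4.0]
chl_pred = [1.5, 2.0, 2.5, 4.0]
reg_r2 = r2(chl_true, chl_pred)
reg_rmse = rmse(chl_true, chl_pred)

# Hypothetical bloom labels and classifier scores (toy numbers).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
# Assumed decision threshold of 0.5 for turning scores into class labels.
predicted = [1 if s >= 0.5 else 0 for s in scores]
prec, rec = precision_recall(labels, predicted)
```

The threshold-free ROC AUC and the threshold-dependent precision and recall answer different questions, which is presumably why the abstract reports them together; in practice a library such as scikit-learn would normally supply these metrics.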