ESS - DM - Bioestatística e Bioinformática Aplicadas à Saúde

Recent Submissions

Now showing 1 - 10 of 23
  • Predicting treatment response in exudative age-related macular degeneration through OCT biomarkers
    Publication . Sousa, Vânia Guimarães de; Carneiro, Ângela Maria; Faria, Brígida Mónica
    Age-related Macular Degeneration (AMD) is a significant cause of vision loss, particularly in its exudative form, where abnormal blood vessel growth and fluid buildup in the retina occur. Anti-Vascular Endothelial Growth Factor (anti-VEGF) intravitreal injections have improved outcomes for exudative AMD, though patient responses vary, and the treatment burden is considerable due to frequent injections. This study aimed to identify Optical Coherence Tomography (OCT) biomarkers and clinical factors that predict treatment response in exudative AMD, analyzing data over three years. By applying statistical and machine learning methods, particularly supervised learning models like decision trees, biomarkers that significantly influenced outcomes were identified, such as choroidal thickness, neovascular membrane type, and fluid localization, among others. The decision tree model demonstrated good predictive accuracy (71.7%) and precision (75.8%). The findings suggest that OCT biomarkers can be instrumental in guiding personalized treatment strategies and optimizing anti-VEGF therapy to enhance patient outcomes while reducing the frequency of injections. This approach helps identify patients less likely to respond to standard treatments, facilitating more individualized care that improves clinical outcomes and quality of life for those with exudative AMD.
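As a reading aid, accuracy and precision figures like those quoted above are derived from a binary confusion matrix. The minimal Python sketch below shows the arithmetic only; the function names and the counts are hypothetical and are not taken from the study.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions (positive and negative) that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are true positives."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only (not the study's data).
tp, fp, tn, fn = 40, 10, 30, 20
print(f"accuracy  = {accuracy(tp, fp, tn, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
```

A model can score higher on precision than on accuracy, as reported above, when its positive predictions are reliable but it misses a share of true positives.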
  • Life cycle assessment using machine learning
    Publication . Gomes, Sofia Carolina Moura; Faria, Brígida Mónica; Oliveira, Alexandra Alves; Pinto, Edgar
    Life Cycle Assessment (LCA) is a scientific methodology for assessing the impact of a product or service on the environment throughout its life cycle. It comprises four phases: definition of objectives and context, inventory, impact assessment, and interpretation. Artificial Intelligence (AI) refers to computer systems capable of performing tasks that typically require human intelligence. Machine Learning (ML) is an area of AI that involves the development of algorithms capable of learning from data and making predictions or decisions based on data. LCA and ML have been combined to overcome LCA’s complexity at various stages and for different purposes, namely to develop surrogate LCA tools. This study focuses on the application of ML in the Life Cycle Inventory (LCI) phase to estimate the pollutant emissions released into the environment and thereby complete the LCI phase of the LCA. The present work seeks to answer the following question: “Can Machine Learning techniques be applied to predict outcome variables of the LCI phase of LCA?”. These variables include all the inputs and outputs throughout the life cycle of a product. The database used in this work comprises 865 observations containing agricultural input variables (e.g. chemical fertilizer, pesticides, human labor, diesel fuel) and production outputs (yield and environmental emissions). The data were collected from the literature and refer to kiwi, watermelon, citrus, tea, and hazelnut crops in Guilan province in northern Iran. An expert in the field validated the estimation of pollutant emissions, calculated using Agri-footprint 4.0 and the updated version Agri-footprint 6. Additional key methodologies, standards and reports were also consulted for this research. The Decision Tree and Neural Network models developed were able to estimate the pollutant emissions generated into the environment throughout the production process. 
The Absolute Normalized Error for the Decision Tree, Neural Network 1 and Neural Network 2 was 1124.79, 0.07 and 0.14, respectively. The Friedman test (p-value < 0.001, below α = 0.05) reveals statistically significant differences in the Absolute Normalized Error values, meaning that at least one of the models differs from the others. The Wilcoxon test (p-value < 0.001) indicates significant differences between all the models.
  • In-silico prediction of the complete ataxin-3 protein network relevant for Spinocerebellar Ataxia type 3 (SCA3)
    Publication . Batista, Paulo Jorge Canedo; Vieira, Cristina; Faria, Brígida Mónica; Vieira, Jorge
    Spinocerebellar Ataxia Type 3 (SCA3), also known as Machado-Joseph disease, is a neurodegenerative disorder caused by an expanded (exp) polyglutamine (polyQ) tract in ataxin-3. This study aims to characterize the ataxin-3 network using interactors identified in the main databases, as well as interactors inferred from interologs, cell models of the disease, and/or other animal models. In addition, in-silico analyses based on different 3D protein structure predictions were performed to identify interacting regions (IR), which were then used to confirm the predictions. The proteins were divided into two groups: those that bind mostly at the JD region and those that bind mostly at the C-terminal region. Five IR were identified in the first group and one in the second. The research focused on 355 proteins described in the main databases; for 323 of them, the evidence supports them as true ataxin-3 interactors. A further 42 proteins are predicted as new ataxin-3 interactors. Moreover, 60 ataxin-3 interactors that behave differently in the presence of an exp polyQ region were identified, and these may be key SCA3 factors. In conclusion, these findings contribute to a deeper understanding of SCA3 pathogenesis and offer potential targets for future experimental studies.
  • Fatores associados ao impacto da rinite alérgica na produtividade laboral [Factors associated with the impact of allergic rhinitis on work productivity]
    Publication . Ferreira, Laura de Melo; Pinto, Bernardo Sousa; Alves, Sandra Maria Ferreira; Amaral, Rita
    Allergic rhinitis is a prevalent health condition that affects both productivity and well-being in the workplace. This study aims to investigate the association between allergic rhinitis symptoms and work impact, and to identify the factors that contribute to a greater or lesser work impact. An observational study was conducted, analysing 260,378 observations from 20,724 unique users of the MASK-air mobile application in 30 countries, recorded from May 2015 to December 2023. Spearman's correlation coefficient was calculated to assess the correlation between the severity of rhinitis symptoms and the impact on work. A linear mixed-effects regression model was fitted to identify the factors that affect the impact on work. Spearman's correlation revealed a high positive correlation between the impact on work and the severity of global allergic symptoms (rs = 0.74, p < 0.001), nasal symptoms (rs = 0.70, p < 0.001) and ocular symptoms (rs = 0.67, p < 0.001), and a moderate positive correlation between asthma symptoms (rs = 0.45, p < 0.001) and the impact on work. The linear mixed-effects regression model identified that age (b = -0.07, p < 0.001), male sex (b = -2.71, p < 0.001), use of immunotherapy (b = -3.67, p < 0.01) and control of rhinitis symptoms (b = -2.35, p < 0.001) are associated with a decrease in the impact on work. In contrast, the presence of asthma (b = 2.47, p < 0.01) and, compared with taking no medication, the use of a single medication (b = 5.58, p < 0.001) and co-medication (b = 6.02, p < 0.001) are associated with an increased impact on work. The study highlights the usefulness of data collected by applications such as MASK-air, which allow continuous monitoring of symptoms and treatment.
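For readers unfamiliar with the statistic, Spearman's rs is the Pearson correlation computed on ranks rather than raw values. The sketch below is a minimal pure-Python illustration under that definition; the function names and the scores are hypothetical and do not come from the MASK-air data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rs: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for illustration only.
symptom_severity = [10, 35, 35, 60, 80, 95]
work_impact = [5, 20, 40, 45, 70, 90]
print(round(spearman(symptom_severity, work_impact), 3))
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship, not just a linear one.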
  • Bioestatística em contexto empresarial [Biostatistics in a business context]
    Publication . Coelho, Heitor Rafael Teixeira; Alves, Sandra; Albuquerque, João
    Biostatistics plays a crucial role in clinical trials, ensuring scientific integrity and data reliability from planning through to the reporting of results. This document describes the activities carried out and the knowledge acquired after joining the Clinical Data Programming and Statistics department of BlueClinical an Astrum Company. To illustrate the activities carried out, including the preparation of randomization plans and lists, the programming of tables and the descriptive analysis of datasets, a parallel, prospective, randomized study evaluating the safety and efficacy of a drug for chronic migraine is presented, using simulated data. One hundred individuals were simulated, of whom 60 were randomized to receive the drug or placebo. Regarding efficacy, the group treated with the drug showed a significant reduction in the number of migraines compared with placebo. However, there were no significant improvements in quality of life or participant satisfaction. Regarding safety, the drug was considered safe, with no relevant increase in adverse events relative to placebo. These activities enabled the development of technical skills such as programming in R and a deeper understanding of applied statistical analysis.
  • Comparing time series forecasting models for health indicators: A clustering analysis approach
    Publication . Cruz, Cláudia Beatriz Silva; Oliveira, Alexandra; Faria, Brígida Mónica; Pimenta, Rui
    A time series can be defined as a sequence of observations ordered at equal time intervals, and is therefore fundamental for addressing questions of causality, trends, and forecasting. Temporal data and their analysis can be applied to several areas, such as engineering, finance, and health. The ongoing study of time series raises several problems, one of which is clustering, which aims to identify similarities between series. This aspect is particularly relevant when time series are modeled by Autoregressive Integrated Moving Average (ARIMA) models, which makes understanding their parameters essential for the analysis. One of the main applications of time series in public health and biomedicine has been in epidemiological studies of infectious and chronic diseases, studies on the prediction of demand for health services, and studies on the assessment of health outcomes through data on mortality and morbidity. These indicators are direct measures of health care needs, reflecting the global burden of disease in the population, and are therefore crucial for the study and surveillance of public health, and for the processes of organization and intervention of health services. The sum of mortality and morbidity is referred to as “Burden of Disease” and can be measured by a metric called “Disability-Adjusted Life Years” (DALYs). The analysis of this type of data is essential to identify geographic patterns, which allows a better perception of health disparities in the population. The main objectives of this dissertation are to model health indicators through Moving Average (MA), Autoregressive Moving Average (ARMA) or Autoregressive Integrated Moving Average processes; to evaluate the quality of fit of the models to the data; and to compare the distances between processes regarding their effectiveness in identifying natural groups. 
The study begins by exploring the temporal characteristics of DALYs for five non-communicable diseases (cardiovascular diseases, chronic respiratory diseases, neurological disorders, chronic kidney diseases, and diabetes), highlighting underlying patterns and trends. Then, using an automated algorithm, Autoregressive Integrated Moving Average models are fitted to represent and describe the time series. The fit of the models was assessed with forecast accuracy metrics, such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE). The Piccolo, Maharaj, and LPC distance measures were then applied to this representation of the time series so that clustering techniques could be used to identify clusters. Six different hierarchical clustering methods were used: Ward, complete, average, single, median, and centroid linkage. Additionally, the performance of the clustering algorithms was assessed through evaluation metrics such as the Silhouette score, C-index, McClain index, and Dunn index. The results on non-communicable disease DALYs data for 48 European countries show that the choice of distance measure greatly influences clustering outcomes and the number of clusters formed. While certain methods revealed geographic patterns, other factors, such as cultural or economic similarities, can also influence cluster formation. Furthermore, some countries were frequently isolated in their own cluster across clustering methods and distance measures, suggesting that their Autoregressive Integrated Moving Average model was significantly different from the rest. One example is Latvia, which formed isolated clusters for cardiovascular diseases. Other countries, such as Albania, Belarus, Lithuania, and Sweden, were grouped into the same cluster across various clustering methods when the Piccolo distance was applied to neurological disorders. 
For chronic respiratory diseases, 15 clusters were formed with the LPC distance, between 8 and 15 clusters with the Piccolo distance, and between 9 and 15 clusters with the Maharaj distance. These insights not only contribute to advancing the field of public health surveillance and intervention, ultimately aiming to alleviate the global burden of disease, but also improve our understanding of clustering Autoregressive Integrated Moving Average models and of how the use of different distance measures influences clustering outcomes.
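To make the idea of a distance between fitted models concrete, the sketch below illustrates the Piccolo distance, which is commonly defined as the Euclidean distance between the AR(∞) coefficient sequences of two invertible models. It is a minimal pure-Python illustration, not the dissertation's implementation: the function names, the sign convention, the truncation at 50 lags, and the example coefficients are all assumptions.

```python
import math


def ar_infinity_weights(ar, ma, n_lags=50):
    """First n_lags pi-weights of an invertible ARMA model.

    Convention (an assumption of this sketch):
        (1 - ar[0] B - ar[1] B^2 - ...) X_t = (1 + ma[0] B + ma[1] B^2 + ...) e_t
    The AR(infinity) form is X_t = pi_1 X_{t-1} + pi_2 X_{t-2} + ... + e_t.
    For an ARIMA model, pass the coefficients of the AR polynomial after
    multiplying it by the differencing factor (1 - B)^d.
    """
    # c holds the coefficients of phi(B) / theta(B), with c[0] = 1.
    c = [1.0]
    for j in range(1, n_lags + 1):
        acc = -ar[j - 1] if j <= len(ar) else 0.0
        for i in range(1, min(j, len(ma)) + 1):
            acc -= ma[i - 1] * c[j - i]
        c.append(acc)
    # pi(B) = 1 - pi_1 B - pi_2 B^2 - ..., so pi_j = -c_j for j >= 1.
    return [-c_j for c_j in c[1:]]


def piccolo_distance(model_x, model_y, n_lags=50):
    """Euclidean distance between the truncated pi-weight sequences."""
    pi_x = ar_infinity_weights(*model_x, n_lags=n_lags)
    pi_y = ar_infinity_weights(*model_y, n_lags=n_lags)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pi_x, pi_y)))


# Hypothetical fitted models, given as (ar_coefficients, ma_coefficients).
series_a = ([0.5], [])   # AR(1)
series_b = ([0.2], [])   # AR(1)
series_c = ([], [0.5])   # MA(1)
print(piccolo_distance(series_a, series_b))
print(piccolo_distance(series_a, series_c))
```

A matrix of such pairwise distances can then be passed to any hierarchical clustering routine, which is how a distance between models becomes a grouping of countries.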
  • Contributions for the validation of the Portuguese version of the Vascular Quality of Life-6 questionnaire in peripheral artery disease patients
    Publication . Oliveira, Rafaela Monteiro; Silva, Ivone; Pedras, Susana; Pimenta, Rui
    Peripheral Arterial Disease (PAD) is an occlusive atherosclerotic disease that affects more than 230 million people worldwide. The most common symptom is intermittent claudication (IC), which leads to lower quality of life (QoL). This study therefore aimed to contribute to the validation of the VascuQol-6 questionnaire for the Portuguese population, in order to obtain a quick, sensitive, and easy-to-use way to assess QoL in PAD. The VascuQol-6 was adapted and translated into European Portuguese. A total of 115 patients with PAD and IC stable for more than 3 months were included, with a mean age of 65 years. Reliability, construct validity (through convergent and discriminant validity), known-group validity, and responsiveness were tested. The Average Variance Extracted for the latent construct was 0.40 and the Composite Reliability was 0.79, indicating strong internal consistency. VascuQol-6 was positively associated with SF-36 Physical Component Summary and Mental Component Summary scores (r = .64, p < .01 and r = .42, p < .01, respectively). In turn, there was no significant correlation between VascuQol-6 scores and the PADKQ or IPAQ. A statistically significant difference between groups according to IC severity (F(2,47) = 8.35, p < 0.001) was found. A paired samples t-test showed differences between VascuQol-6 scores before a walking program (M = 15.65, SD = 3.09) and after a walking program (M = 17.41, SD = 2.71), t(67) = 3.94, p < .001. The VascuQol-6 is a 6-item instrument for assessing the QoL associated with PAD, with good psychometric properties and with convergent and discriminant validity in relation to the SF-36, PADKQ and IPAQ. The instrument proved to have known-group validity and responsiveness.
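The Average Variance Extracted (AVE) and Composite Reliability (CR) mentioned above are usually computed from the standardized factor loadings of the items. The sketch below shows the standard formulas for a single-factor model with uncorrelated errors; the function names and the six loadings are hypothetical and are not the values estimated in the study.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    With standardized loadings, each item's error variance is 1 - loading^2.
    """
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings for a 6-item scale (illustration only).
loadings = [0.70, 0.65, 0.60, 0.72, 0.55, 0.58]
print(f"AVE = {average_variance_extracted(loadings):.2f}")
print(f"CR  = {composite_reliability(loadings):.2f}")
```

The two statistics can diverge: CR grows with the number of items, so a scale can reach an acceptable CR while its AVE stays modest.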
  • Application of machine learning techniques for a recommendation system in pharmacy
    Publication . Torres, Beatriz Freitas; Oliveira, Alexandra Alves; Faria, Brígida Mónica; Alves, Sandra Maria Ferreira
    Community Pharmacy (CP) plays a crucial role in the population, improving patients’ quality of life and minimising medication risks. In Portugal, CPs dispense prescription and non-prescription products. Pharmacy professionals have an added responsibility when advising on non-prescription products and should pay attention to self-medication and possible interactions. Therefore, a product recommendation system (RS) that incorporates relevant information about the products supports a more informed recommendation by the professional. Studies in the area of medication RS remain scarce and, to the best of our knowledge, no medication RS is applied in community pharmacies in Portugal. This work aims to develop a conceptual pharmaceutical product recommendation framework and identify relevant groups of products according to their characteristics and experts’ opinions. The specific objectives consist of describing recommendation systems in pharmacy, defining and comparing distance functions capable of creating groups of similar and clinically relevant products for pharmaceutical counselling, applying machine learning techniques and comparing them, and communicating the results. For this purpose, the background of pharmaceutical products counselling without a prescription was analysed. Public databases were selected to be included in the conceptual framework, and the data obtained was processed. As a result, a database was obtained with 1426 products (over-the-counter medication, homoeopathic medication, and dermocosmetics) and their clinical and scientific information. In order to identify relevant groups of products, seven clustering techniques were applied and evaluated: six hierarchical (single, complete, average, median, centroid, and Ward linkage) and one non-hierarchical (K-means). 
Dendrograms, the Calinski-Harabasz score, silhouette score, Davies-Bouldin score and the inflexion point method were used to determine the ideal number of clusters for each technique and evaluate its validity. An expert consultation was performed to define a distance function aligned with pharmaceutical counselling. This consultation allowed the identification of the importance of the variables in the distance function definition. The resulting data was analysed in Microsoft Excel, SPSS and Python with the libraries Pandas, Natural Language Toolkit (NLTK), Unidecode, Plotly, Matplotlib, NumPy, SciPy, and Scikit-learn, using Spyder IDE. Twenty-two groups of similar products were formed with K-means, the most effective clustering approach for forming pharmacologically homogeneous groups. However, the obtained clusters did not present enough clinical relevance to support professionals during counselling. Consequently, a new distance function was defined, enhancing the importance of the pharmacotherapeutic group of the products and aligned with the results obtained in the experts’ consultation. Twenty-four groups of similar products were formed with K-means, which was once again the technique that produced pharmacologically homogeneous groups, based mainly on safe use during pregnancy and breastfeeding and on pharmacotherapeutic group. The remaining clustering techniques, all hierarchical, did not produce pharmacologically homogeneous groups with any of the distance functions.
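The abstract does not specify the form of the distance function, so the following is only one possible shape for a distance in which expert-assigned weights control how much each product attribute counts. The function name, attribute names, weights, and products are all hypothetical.

```python
def weighted_mismatch_distance(a, b, weights):
    """Weighted share of attributes on which two products differ (0 to 1)."""
    total = sum(weights.values())
    differing = sum(w for key, w in weights.items() if a[key] != b[key])
    return differing / total


# Hypothetical attribute weights, e.g. as elicited from an expert panel.
weights = {
    "pharmacotherapeutic_group": 3.0,
    "safe_in_pregnancy": 2.0,
    "dosage_form": 1.0,
}

# Hypothetical products described by categorical attributes.
product_a = {"pharmacotherapeutic_group": "analgesic",
             "safe_in_pregnancy": "yes", "dosage_form": "tablet"}
product_b = {"pharmacotherapeutic_group": "analgesic",
             "safe_in_pregnancy": "yes", "dosage_form": "syrup"}
product_c = {"pharmacotherapeutic_group": "antihistamine",
             "safe_in_pregnancy": "no", "dosage_form": "tablet"}

print(weighted_mismatch_distance(product_a, product_b, weights))
print(weighted_mismatch_distance(product_a, product_c, weights))
```

Raising the weight of one attribute, as the abstract describes for the pharmacotherapeutic group, pulls products that share that attribute closer together relative to the rest.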
  • In silico dissection of the immunomodulatory effects of cholesterol on colorectal cancer
    Publication . Machado, Ana Luísa Marinho da Cunha; Fernandes, Verónica; Velho, Sérgia; Antunes, Luís
    Cholesterol plays a pivotal role in the progression of tumors, serving as a crucial component for cell membrane formation and the generation of specific proteins and enzymes that stimulate the growth and dissemination of tumor cells. Additionally, cholesterol levels within the tumor microenvironment exert influence over immune responses by hindering the activity of vital components like T-cells and NK-cells, which are indispensable for effective anti-cancer immunity. The primary objective of this research is to investigate whether it is possible to categorize colon cancer tumors based on disparities in cholesterol-related characteristics and whether these groupings correlate with distinct immune profiles. The Cancer Genome Atlas (TCGA) project is an open-access catalog aiming to comprehensively understand the genomic alterations responsible for various cancer types, by encompassing a vast array of molecular data from thousands of patient samples. One of the pivotal advantages of utilizing TCGA data lies in its sheer scale and diversity. By integrating genomic, transcriptomic, proteomic, and clinical data from a multitude of patients, researchers can identify patterns, mutations, and biomarkers associated with specific cancers. Taking advantage of this catalog, we selected the TCGA RNA-seq dataset from patients with colorectal cancer (480 tumor colon samples and 167 tumor rectum samples). First, we used the Gene Set Enrichment Analysis (GSEA) tool, widely employed in bioinformatics and computational biology, to determine the sets of genes and pathways that showed statistical significance. Upon comparing these samples with their corresponding normal adjacent tissues, notable disparities in lipid metabolism were discerned. 
While cholesterol-related pathways did not rank as the top differentially regulated pathways, we exclusively observed an upregulation of lipid-related pathways in normal adjacent tissue in comparison to tumor tissue within the colon samples. Subsequently, we conducted in-depth analyses to determine whether colon tumors can be stratified based on differences in cholesterol metabolism and whether these variations correlate with disparities in the tumor microenvironment. Using the ssGSEA scores of the pathways related to cholesterol metabolism, we employed the k-means method to cluster the samples. Remarkably, colon tumor samples naturally segregated into two distinct groups: one characterized by low expression of cholesterol-related genes and the other exhibiting increased expression. Notably, these groupings exhibited disparities in colon sample staging and the prevalence of molecular subtypes within each category. The group displaying enhanced cholesterol metabolism showcased reduced proliferation, underscoring the significance of tumor microenvironment remodeling. The top enriched pathways included pathways associated with modified antigen presentation, cytotoxic immune responses, and remodeling of the extracellular matrix. These observations were consistent with increased infiltration of immune cells driven by the activation of cholesterol metabolism. However, despite the higher quantity of these immune cells, their activation levels were lower in tumors characterized by upregulated cholesterol metabolism. Comparison of signaling pathways between these groups revealed significant differences in pathways linked to tumor malignancy. In summary, these findings underscore the role of cholesterol metabolism alterations in driving substantial adaptations within the tumor microenvironment. 
Stratifying colon tumors based on cholesterol metabolism presents a promising avenue, potentially benefiting patients through immunotherapy and cholesterol modulation as adjuvant therapy.
  • Automatic FoodEx2 classification system for food description
    Publication . Fonseca, João Emanuel Sousa; Faria, Brígida Mónica; Reis, Luís Paulo; Pimenta, Rui
    Food is an important factor in human health. Food safety protects consumers by providing a safety net that lets them trust the quality of the product. In Europe, entities such as the European Food Safety Authority (EFSA) act as risk assessors. They provide information used to shape food safety legislation. To collect data regarding food safety, EFSA developed a comprehensive food classification and description system called FoodEx2. Mapping food descriptions to FoodEx2 codes is currently a manual process. The motivation for this work is the time that could be saved by using an algorithm to automate code generation. Knowledge Discovery in Databases is already known to be a fundamental area for automatically extracting patterns from large quantities of data. The main objective of this project is to explore automatic approaches to classify food descriptions with FoodEx2 codes. In this work, several classic classifiers are compared on the prediction of FoodEx2 base codes, a multiclass classification task. Performance was explored on distinct datasets and with different levels of text preprocessing, using the exact match ratio and the F1-score as metrics and a Bag-of-Words document representation with TF-IDF weighting. All the datasets have imbalanced class distributions. The documents are composed of short texts describing ingredients, dishes, and animal sample details. Performance varied mainly between datasets and classifiers. The best performing classifiers were Random Forests, Decision Trees, and Linear Support Vector Machines. The results show that the creation of an automatic classifier is dependent on further exploration of the available data.
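To illustrate two of the building blocks named above, the sketch below computes Bag-of-Words TF-IDF weights for short descriptions and the exact match ratio of a set of predictions. It is a minimal pure-Python illustration under one common TF-IDF variant (raw term counts, idf = ln(N / df)), which may differ from the variant used in the project; the descriptions and the code labels are hypothetical placeholders, not real FoodEx2 codes.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Bag-of-Words TF-IDF using raw term counts and idf = ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def exact_match_ratio(true_codes, predicted_codes):
    """Share of descriptions whose predicted code equals the true code."""
    hits = sum(t == p for t, p in zip(true_codes, predicted_codes))
    return hits / len(true_codes)


# Hypothetical short food descriptions (illustration only).
descriptions = ["raw cow milk", "goat milk cheese", "raw beef sample"]
for description, vector in zip(descriptions, tfidf_vectors(descriptions)):
    print(description, {t: round(w, 3) for t, w in vector.items()})

# Placeholder labels standing in for base codes.
true_codes = ["CODE_A", "CODE_B", "CODE_C"]
predicted_codes = ["CODE_A", "CODE_B", "CODE_A"]
print(exact_match_ratio(true_codes, predicted_codes))
```

Terms that occur in every description receive a weight of zero under this variant, so the representation emphasizes the words that distinguish one description from another.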