ESS - BBB - Posters presented at scientific events
Recent Submissions
- Assessing the utility of the REVEL Score: A comprehensive evaluation across diverse genomic and clinical contexts
Publication. Ribeiro, Inês; Abreu, Maria; Leão, Marta; Abreu, Miguel; Faria, Brigida Monica
Interpreting germline variant pathogenicity is challenging, even with increased access to genomic data and in silico prediction tools. The REVEL score, an ensemble method combining 13 prediction tools, has become a key resource for classifying missense variants. This study evaluates REVEL's accuracy using gnomAD data, focusing on three aspects: its agreement with ClinVar classifications, its reliability with variants of moderate-to-high prevalence in gnomAD 4.0 (which are generally benign), and its effectiveness across gene pathogenicity mechanisms, such as gain of function and loss of function. This analysis will determine REVEL's utility in diverse clinical settings. Data processing was optimized by selecting 20 genes from the OMIM-morbid database, representing a variety of disorders and disease mechanisms. To test the accuracy of REVEL, genes with varying features were selected, focusing on pathogenicity mechanisms (such as gain of function, loss of function, or dominant negative), inheritance patterns (autosomal dominant, autosomal recessive, or X-linked), and disorder frequencies. This approach allowed us to evaluate REVEL's performance across diverse gene characteristics and clinical scenarios. For each gene, REVEL scores were mapped to gnomAD frequency, ClinVar classification, and canonical transcript position, and accuracy was tested using Python and Biopython. Our preliminary analysis showed that the REVEL score performed well for variants with medium-to-high prevalence in gnomAD. REVEL scores were generally consistent with ClinVar classifications, with high accuracy across most gene types. However, ClinVar classifications should be interpreted with care, as some may have been assigned using REVEL or some of its components. The tool was effective regardless of pathogenicity mechanisms, inheritance patterns, or disorder frequencies, suggesting broad utility in genomic analysis.
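The agreement check between REVEL and ClinVar described in the abstract above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the 0.5 cutoff, the record layout, and the variant records are all assumptions made up for the example.

```python
# Illustrative sketch only. The cutoff and the records below are assumptions,
# not values or data taken from the poster.
REVEL_THRESHOLD = 0.5  # hypothetical cutoff for calling a variant pathogenic


def revel_call(score, threshold=REVEL_THRESHOLD):
    """Turn a REVEL score (0 to 1) into a binary call."""
    return "pathogenic" if score >= threshold else "benign"


def concordance(variants, threshold=REVEL_THRESHOLD):
    """Fraction of variants whose REVEL call matches the ClinVar label."""
    if not variants:
        raise ValueError("no variants to evaluate")
    matches = sum(
        1 for v in variants if revel_call(v["revel"], threshold) == v["clinvar"]
    )
    return matches / len(variants)


# Made-up example records; the ids are placeholders, not real variants.
toy_variants = [
    {"id": "var1", "revel": 0.91, "clinvar": "pathogenic"},
    {"id": "var2", "revel": 0.12, "clinvar": "benign"},
    {"id": "var3", "revel": 0.64, "clinvar": "benign"},
    {"id": "var4", "revel": 0.08, "clinvar": "benign"},
]

print(concordance(toy_variants))  # 3 of the 4 toy calls agree with ClinVar
```

Grouping the records by gene, inheritance pattern, or pathogenicity mechanism before calling `concordance` would give the per-category comparison the abstract describes.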
- Sequence alignment: Comparative analysis of algorithms in KRAS genetic mutations
Publication. Pereira, Ana Rita; Lopes, Carlos; Oliveira, Catarina; Pereira, Gonçalo; Moreira, Rui; Faria, Brígida Mónica
For this study, a Python program for biological sequence alignment was developed, applying alignment algorithms to the genetic analysis of the Kirsten rat sarcoma viral oncogene (KRAS) and its main cancer-associated mutations. The KRAS gene, like other genes in the same family, is responsible for encoding proteins that regulate cell proliferation, differentiation, and apoptosis. The algorithms employed include the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, and the Basic Local Alignment Search Tool (BLAST). Sequence alignment algorithms help to understand the function, evolution, and variability of biological sequences, significantly contributing to advances in genomics and proteomics. The objectives of this study are to apply algorithms for the effective alignment of biological sequences, compare the non-mutated KRAS sequence with the principal mutations associated with cancer development, delineate and justify the selection of the algorithms used, assess their computational complexity, and facilitate 3D visualization of the sequences. A program, BioAlign, was developed in Python with functions including: upload and visualization of sequences; global and local alignment with different algorithms; BLAST search; analysis of algorithm complexity; retrieval of nucleotide positions; retrieval of a subsequence and its position; phylogenetic analysis; histogram visualization of sequence lengths; and 3D structure visualization. The program is capable of analyzing and comparing the provided sequences using both local and global algorithms. Execution time differs among the three main algorithms, with BLAST notably slower in returning results. This may be due to several factors, such as the complexity of the algorithm itself, the internet speed, and the response time of the NCBI website. The BioAlign program made it possible to meet the proposed objectives. Furthermore, completing this project improved the authors' proficiency in the Python programming language.
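To make the difference between global and local alignment in the abstract above concrete, here is a minimal score-only sketch of Needleman-Wunsch and Smith-Waterman. It is not the BioAlign code: the scoring values (match 1, mismatch -1, gap -2) and the toy strings are assumptions chosen for illustration, and the toy strings are not real KRAS sequences.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score, keeping only two rows of the DP table."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score; cells are floored at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy strings only (not real KRAS sequences): one substitution in the middle.
print(needleman_wunsch_score("GGT", "GAT"))
print(smith_waterman_score("TTGGTAA", "CCGGTCC"))
```

Both functions fill an (m+1) by (n+1) table, so they run in O(mn) time for sequences of length m and n. A BLAST search, by contrast, is typically sent to a remote service, which is consistent with the abstract's remark that network speed and server response time affect its runtime.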
- In-silico prediction of the complete ataxin-3 protein network relevant for Spinocerebellar Ataxia type 3 (SCA3)
Publication. Batista, Paulo; Vieira, Jorge; Vieira, Cristina P.; Faria, Brígida Mónica
Spinocerebellar ataxia type 3, also known as Machado-Joseph disease (SCA3/MJD), is the most common inherited ataxia worldwide and is caused by a pathogenic expansion of the polyglutamine (polyQ) tract, located at the C-terminal region of the ataxin-3 protein (1). The polyQ region is involved in the stabilization of protein-protein interactions (PPIs). Abnormal polyQ expansion results in structural changes in ataxin-3 (2,3), implying different accessibility at specific interacting residues needed for normal protein activity. PolyQ proteins have large protein networks. PPIs have been mapped using high-throughput methods, which are known to produce false interactions (4). Therefore, comparisons of multiple interactomes (conserved interactions between pairs of proteins that have interacting homologs in another organism, as well as proteomic data from cell lines, patients, mutants expressing a human protein, and cross-species genetic screens (modifier screens), available at EvoPPI3 (5)), together with in-silico analyses, can be used to support PPIs and to identify novel interactors. In this work we will: 1) characterize the ataxin-3 network (validating the proteins identified in the main databases and identifying new putative interactors); 2) identify the interactors that behave differently in the presence of an expanded polyQ, using different 3D structure prediction methods and protein docking methods. EvoPPI3 and protein expression data from tissues relevant to SCA3 are used for PPI retrieval and validation, as well as for the identification of new interactors. In-silico approaches for predicting protein binding differences between wild-type and expanded ataxin-3 forms will be performed, using different a) 3D protein structure prediction methods (namely I-TASSER (6), AlphaFold (7), and D-I-TASSER (8)) and b) protein docking methodologies (such as HADDOCK (9) and ClusPro (10)). According to EvoPPI3, there are 422 ataxin-3 interactors in the main human databases. Of these, 250 proteins have been previously studied. Of the remaining 172 proteins, 158 have been reported in proteomic analyses of human cell lines and ataxin-3 patients (H. sapiens polyQ_22 database), and these could be true interactors. 28 proteins are shared when considering the polyQ database, Mus musculus interologs, and Danio rerio interologs, and these could be novel interactors to study. Of the 158, 73 proteins bind more strongly to the expanded form of ataxin-3 according to AlphaFold. To confirm these results we used I-TASSER, which supported 46 of the 73 as binding more strongly to the expanded form. This study contributes significantly to understanding SCA3 pathology by delineating a network of ataxin-3 interactors and analysing their behaviour in the presence of an expanded polyQ stretch.
- Allergic rhinitis and work productivity: Preliminary analysis of data from the MASK-air application
Publication. Ferreira, Laura; Pinto, Bernardo Sousa; Alves, Sandra Maria; Amaral, Rita
Allergic rhinitis is a health condition, more prevalent in developed countries, that can impact the activities and quality of life of affected individuals (1). Although its impact on work productivity is recognized (2), there is still a need for a more detailed understanding and quantification. This cross-sectional observational study investigates the relationship between allergic rhinitis and work productivity, using data from MASK-air, a mobile application designed for monitoring allergic rhinitis and related respiratory conditions (3). The aim was to investigate the association between the severity of allergic rhinitis symptoms and the impact on work productivity. Data were collected through the MASK-air mobile application (4,5), which records demographic, environmental, and symptom variables on a daily basis, with users providing information on a scale of 0 to 100 each day. A sample of 1000 random observations of users from 30 countries, recorded between May 2015 and December 2023, was analysed. Participants were selected based on specific criteria, including a minimum age of 15 or 16 (depending on the digital consent age in each country) and a self-reported diagnosis of allergic rhinitis. Descriptive statistics and the Spearman correlation coefficient (6) between symptoms and impact on productivity were calculated. The sample showed a balanced distribution between sexes, with 435 individuals identified as female (53.5%) and 378 individuals as male (46.5%). The mean age of participants was 41.41 ± 14.50 years. The data included participants from various countries; the most frequent was Mexico with 141 participants (17.3%), followed by Lithuania with 91 participants (11.9%), and Germany with 79 participants (9.7%). Regarding comorbidities, 535 participants (65.6%) reported having conjunctivitis, and 310 participants (38.1%) reported being asthmatic. Additionally, 200 participants (20%) used immunotherapy. A strong positive correlation was observed between work impact and the severity of global allergic symptoms (ρs = 0.82, p < 0.0001) and of nasal symptoms (ρs = 0.77, p < 0.0001); a moderate correlation was observed between work impact and the severity of ocular symptoms (ρs = 0.69, p < 0.0001) and of asthma (ρs = 0.48, p < 0.0001). This study offers an initial understanding of how symptoms of allergic rhinitis affect work productivity. Identifying other associated factors will allow health interventions and policies to be targeted at improving the well-being and performance of workers affected by this condition.
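The Spearman coefficient reported in the abstract above is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following sketch shows that computation in plain Python. The function names and the toy scores are illustrative assumptions, not the study's code or MASK-air data, and no p-value is computed.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up daily scores on the 0-100 scale (not MASK-air data).
nasal_symptoms = [10, 35, 35, 60, 80]
work_impact = [5, 20, 30, 55, 90]
print(round(spearman_rho(nasal_symptoms, work_impact), 3))
```

Because it works on ranks, the coefficient captures any monotonic association, which suits bounded 0-100 self-reported scores that need not be linearly related.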
- Machine learning in tumor classification in breast cancer
Publication. Lima, Ana Sofia; Coutinho, Carolina; Machado, Raquel; Oliveira, Alexandra Alves; Faria, Brígida Mónica
Breast cancer is the primary cause of mortality among women worldwide (1). Discernible patterns can be found within the disease, presenting an opportunity for the application of machine learning (ML), which has produced effective results in screening and diagnosis. Different ML algorithms were tested (Decision Tree, Deep Learning (DL), k-Nearest Neighbors (k-NN), and Naïve Bayes) to construct a predictive model allowing the early classification of a breast tumor as benign or malignant, avoiding the need to proceed to a more invasive technique. The ML models were constructed and applied to a database of 201 individuals with breast cancer and descriptive attributes (e.g. age, tumor size, presence of invasive nodes) (2) using RapidMiner Studio. The models were evaluated by analyzing their accuracy, true negative rate (TNR), true positive rate (TPR), ROC (Receiver Operating Characteristic) curves, and AUC (Area Under the Curve). During a first exploratory phase, four clusters were detected: one with smaller tumor sizes, younger patients, and a benign diagnosis; one with older age, bigger tumor sizes, and a malignant diagnosis; and two more with the opposite characteristics. These characteristics were later found to be important factors in the construction of the Decision Tree. When comparing the models' accuracy, the best model was Naïve Bayes (91.04%), followed by the Decision Tree (90.55%), DL (90.02%), and k-NN (86.32%). There is a statistically significant difference between the performances of every pair of models (p<0.05) except between the DL and the Decision Tree models. Naïve Bayes presented the highest TPR (98.21%) while DL presented the highest TNR (83.15%). The Decision Tree model presented the highest AUC (0.976), followed by Naïve Bayes (0.961). The Decision Tree model best achieved our goal: it had the highest AUC, which indicates excellent discrimination between benign and malignant tumors, surpassing Naïve Bayes while maintaining a similar accuracy and TNR.
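The evaluation measures named in the abstract above (accuracy, TPR, TNR, AUC) can be computed from labels and model scores as in the sketch below. It is an illustration of the definitions only, not the RapidMiner Studio workflow; the function names, the 0.5 decision threshold, and the six toy cases are assumptions.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, true positive rate and true negative rate (1 = malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tp / (tp + fn),  # sensitivity
        "tnr": tn / (tn + fp),  # specificity
    }


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for six cases (not the study's data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_metrics(labels, predictions))
print(auc(labels, scores))
```

TPR and TNR depend on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is why one model can lead on TPR and another on AUC.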
- Development of a CellProfiler pipeline to evaluate adipocyte differentiation
Publication. Andrade, João; Torres, Sílvia; Coelho, Pedro
Obesity is a complex chronic disease characterized by excessive body fat accumulation. Its increasing prevalence worldwide burdens individuals and healthcare systems, making further research urgent (1,2). Adipocytes, the major cellular component of adipose tissue, are widely used by the scientific community for in vitro studies of obesity (3). Oil red O (ORO) staining and quantification is widely used for intracellular lipid staining and adipogenesis evaluation. Modern microscopy and image analysis software like CellProfiler enable efficient, high-throughput cellular image analysis, improving biological understanding and overcoming the limitations of manual microscopy processing (4). The present work aimed to develop an in silico image-based method to evaluate lipid accumulation during the differentiation and adipogenesis of adipocytes. Briefly, 3T3-L1 preadipocytes were differentiated with a cocktail of insulin (10 µg/mL), dexamethasone (1 µM), and 3-isobutyl-1-methylxanthine (0.25 mM) and maintained in culture for 12 days. Brightfield contrast phase images, before and after ORO staining, were captured every two days. Lipid-droplet accumulation was evaluated by both CellProfiler analysis and ORO quantification. Throughout differentiation, 3T3-L1 cells exhibited adipocyte-like morphological changes, with increasing lipid accumulation detected by ORO staining. CellProfiler automated image analysis was comparable to ORO staining quantification, with both detecting the presence and accumulation of lipid droplets from approximately day 4 onwards. The results showed that, during differentiation of 3T3-L1 cells into mature adipocytes, CellProfiler evaluation of lipid accumulation provided results similar to ORO staining. Altogether, automated in silico image-based protocols can be used to investigate adipogenic differentiation in vitro, overcoming the demands of conventional quantitative methods.
- ViruScopeDB: a comprehensive multi-omics database for highly infectious viruses
Publication. Lima, Ana; Carneiro, João; Sousa, Sérgio; Sá, Vítor; Pratas, Diogo
Highly infectious viruses such as HIV, Ebola, and SARS-CoV-2 have presented ongoing challenges to global health. Consequently, the optimization of rapid detection tests, including PCR, and the identification of new therapeutic targets remain of paramount importance. The development of genomic and proteomic databases like the HIV Oligonucleotide Database (HIVoligoDB) [1], EbolaID [2], and CoV2ID [3] has facilitated the accumulation and accessibility of knowledge through comprehensive, user-friendly, open-access platforms. This study aims to update, expand, and integrate these databases into a single resource, ViruScopeDB, while conducting thorough analyses of informative genomic regions with the goal of enhancing viral detection methods and treatment strategies. Complete genomic sequence variants for each virus were compiled using NCBI Virus, followed by multiple sequence alignment via MAFFT within the Galaxy platform. The alignments were subsequently uploaded to Geneious Prime for complete genome visualization and calculation of parameters such as percentage of pairwise identity. Primer data were extracted from open-access research articles available on PubMed using a newly built custom pipeline for PDF to plain text conversion followed by data mining of oligonucleotide sequences. A fully automated script for primer validation, parameter scoring, and calculation of the best primer pairs for PCR is being constructed for subsequent upload into the database. A total of 658 sequences with a mean length of 18,910 base pairs (bp) were collected for Ebolavirus, with a percentage of pairwise identity (PPI) of 91.7%. For HIV-1, 7,261 sequences with a mean length of 8,883 bp and a PPI of 80.8% were identified. For HIV-2, 43 sequences with a mean length of 10,108 bp and a PPI of 80.9% were analyzed. For Ebola, a total of 709 primers were scraped from 257 articles; for HIV, 10,290 primers were collected from 2,579 articles. Using a combination of preexisting and novel custom-built bioinformatics tools, it was possible to mine key information related to each virus and its variants, as well as to collect primer information for possible PCR optimizations. Further analysis will be conducted on the data collected, extending into phylogenetics and 3D modelling/viral protein docking, in order to construct a database that spans multiple omics.
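The percentage of pairwise identity mentioned in the abstract above can be pictured as the average identity over all pairs of sequences in a multiple alignment. The sketch below uses one possible convention, which is an assumption and not necessarily the exact definition used by Geneious Prime: columns where both sequences have a gap are skipped, and a gap against a base counts as a mismatch. The aligned strings are toy data.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identical columns between two aligned sequences.

    Assumed convention: gap/gap columns are ignored, gap/base is a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average of pairwise_identity over every pair of rows in the alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (not viral data): three rows of equal length.
toy_alignment = ["ACGT", "ACGA", "AC-T"]
print(round(mean_pairwise_identity(toy_alignment), 2))
```

The number of pairs grows quadratically with the number of sequences, so for thousands of genomes a column-wise counting approach would be a more practical way to obtain the same average.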
- Comprehensive multi-omics database for highly infectious viruses: a focus on HIV, Ebola and SARS-CoV-2
Publication. Lima, A. S.; Carneiro, J.; Sousa, S.; Sá, Vítor Júlio; Pratas, D.
Highly infectious viruses, such as HIV, Ebola, and SARS-CoV-2, continue to pose significant threats to global health, underlining the urgent need for new therapeutic approaches. Recent advancements in genomic and proteomic databases, along with 3D homology modelling, have enabled detailed simulations of virus-host interactions, providing insights into infection mechanisms and helping identify potential therapeutic targets. This study aimed to create a unified database of highly infectious viruses and to conduct structural analyses of key viral proteins to explore potential therapeutic strategies. Structural information for proteins involved in the infection process was sourced from the Protein Data Bank and UniProt, while 3D models for significant viral variants were generated using AlphaFold. The quality of these models was assessed using AlphaFold-specific metrics, including pLDDT (per-residue confidence scores) and PAE (predicted aligned error), to ensure structural reliability for further analyses. The most relevant structural changes were identified through alanine scanning in Schrödinger’s Biologic Suite, with subsequent studies on how those changes affected the infection process. Simulations of virus-host interactions were conducted using docking algorithms, namely HADDOCK, with visualizations performed using PyMOL. This integrative approach highlights high-confidence therapeutic targets and provides a foundation for developing novel, effective treatments for highly infectious diseases.
- A multi-omics and primer database for virus identification: Focus on HIV, Ebola, and SARS-CoV-2
Publication. Lima, A. S.; Carneiro, J.; Sousa, S.; Sá, Vítor Júlio; Pratas, D.
Highly infectious viruses such as HIV, Ebola, and SARS-CoV-2 have presented ongoing challenges to global health. Consequently, the optimization of rapid detection tests, including PCR, and the identification of new therapeutic targets remain of paramount importance. The development of genomic and proteomic databases like the HIV Oligonucleotide Database (HIVoligoDB), EbolaID, and CoV2ID has facilitated the accumulation and accessibility of knowledge through comprehensive, user-friendly, open-access platforms. This study aims to update, expand, and integrate these databases into a single resource, while conducting thorough analyses of informative genomic regions with the goal of enhancing viral detection methods and treatment strategies. Complete genomic sequence variants for each virus were compiled using Geneious Prime and NCBI Virus, followed by multiple sequence alignment via MAFFT within the Galaxy platform. The extraction of primers and probes from research articles was attempted using two approaches: Large Language Models (LLMs), specifically NotebookLM and DonutAI/OpenLLaMa-7b, and a classic method combining the Python package PyMuPDF4LLM for PDF data extraction with regular expressions (RegEx) for oligonucleotide identification. Preliminary testing revealed that DonutAI/OpenLLaMa-7b had the lowest accuracy, failing to correctly identify any primers. NotebookLM achieved an accuracy of 39%, while the PyMuPDF4LLM + RegEx method attained the highest accuracy at 71%, successfully identifying 85 out of 121 primers in the test batch of articles. Due to its superior performance and execution speed, the PyMuPDF4LLM + RegEx approach was selected for further refinement. This methodology improves upon previous RegEx-based techniques by eliminating the need for PDF preprocessing and by capturing relevant information more precisely while minimizing non-relevant captures. Future steps include cross-validation of the extracted primers against the reference genome to eliminate primers intended for other viruses and to accurately identify the binding regions of the identified oligonucleotides. Additionally, parameters such as percentage of identical sites and pairwise identity will be calculated to determine the optimal primer pairs for PCR optimization. Further structural analysis of the collected sequences will form the foundation for 3D modelling and molecular dynamics simulations.
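The RegEx step of the extraction approach described in the abstract above could look roughly like the sketch below, applied to text that has already been extracted from a PDF. The pattern, the 15 to 40 base length window, and the function name `extract_primers` are assumptions for illustration, not the authors' actual expression, and the sequences in the sample sentence are made up.

```python
import re

# Assumption: a primer appears as an uninterrupted run of 15 to 40 uppercase
# nucleotide letters (A, C, G, T, U plus IUPAC ambiguity codes) that is not
# embedded in a longer word.
PRIMER_PATTERN = re.compile(r"(?<![A-Za-z])([ACGTURYKMSWBDHVN]{15,40})(?![A-Za-z])")


def extract_primers(text):
    """Return candidate oligonucleotide sequences in order of appearance."""
    return PRIMER_PATTERN.findall(text)


# Made-up sentence in the style of a methods section; the sequences are
# arbitrary strings, not validated primers for any virus.
sample_text = (
    "The forward primer 5'-ACGTTGCAAGCTTGACCTAG-3' and the reverse primer "
    "5'-TTGACCGGTAACGTCATGCA-3' were used, and DNA was quantified by qPCR."
)

print(extract_primers(sample_text))
```

A length window like this trades recall for precision: short abbreviations such as DNA are ignored, but primers split across line breaks or written with spaces between base groups would be missed, which is the kind of case a cross-validation against the reference genome would help to catch.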
- P-588 The impact of display screen use on visual function at an early age
Publication. Mateus, Catarina; Dias, Libânia; Rodrigues, Matilde; Magalhães, Rúben; Ferreira, Simão; Rocha, Nuno
As the use of smartphones and other digital devices becomes an integral part of modern life, it is increasingly common to witness children engaging with these devices at younger ages and for extended periods. The outbreak of the COVID-19 pandemic further exacerbated this trend, significantly impacting the way children interact with technology. This study aims to evaluate visual function and lacrimal volume in preschool-aged children and explore possible correlations with the age of screen usage initiation and daily screen time.
