Browsing by Author "Lima, A. S."
- Comprehensive multi-omics database for highly infectious viruses: a focus on HIV, Ebola and SARS-CoV-2. Publication. Lima, A. S.; Carneiro, J.; Sousa, S.; Sá, Vítor Júlio; Pratas, D. Highly infectious viruses, such as HIV, Ebola, and SARS-CoV-2, continue to pose significant threats to global health, underlining the urgent need for new therapeutic approaches. Recent advancements in genomic and proteomic databases, along with 3D homology modelling, have enabled detailed simulations of virus-host interactions, providing insights into infection mechanisms and helping identify potential therapeutic targets. This study aimed to create a unified database of highly infectious viruses and to conduct structural analyses of key viral proteins to explore potential therapeutic strategies. Structural information for proteins involved in the infection process was sourced from the Protein Data Bank and UniProt, while 3D homology models for significant viral variants were generated using AlphaFold. The quality of these models was assessed using AlphaFold-specific metrics, including pLDDT (per-residue confidence scores) and PAE (predicted aligned error), to ensure structural reliability for further analyses. The most relevant structural changes were identified through alanine scanning in Schrödinger’s Biologic Suite, with subsequent studies on how those changes affected the infection process. Simulations of virus-host interactions were conducted using docking algorithms, namely HADDOCK, with visualizations performed using PyMOL. This integrative approach highlights high-confidence therapeutic targets and provides a foundation for developing novel, effective treatments for highly infectious diseases.
- A multi-omics and primer database for virus identification: Focus on HIV, Ebola, and SARS-CoV-2. Publication. Lima, A. S.; Carneiro, J.; Sousa, S.; Sá, Vítor Júlio; Pratas, D.; Sá, Vítor J. Highly infectious viruses such as HIV, Ebola, and SARS-CoV-2 have presented ongoing challenges to global health. Consequently, the optimization of rapid detection tests, including PCR, and the identification of new therapeutic targets remain of paramount importance. The development of genomic and proteomic databases like the HIV Oligonucleotide Database (HIVoligoDB), EbolaID, and CoV2ID has facilitated the accumulation and accessibility of knowledge through comprehensive, user-friendly, open-access platforms. This study aims to update, expand, and integrate these databases into a single resource, while conducting thorough analyses of informative genomic regions with the goal of enhancing viral detection methods and treatment strategies. Complete genomic sequence variants for each virus were compiled using Geneious Prime and NCBI Virus, followed by multiple sequence alignment via MAFFT within the Galaxy platform. The extraction of primers and probes from research articles was attempted using two approaches: Large Language Models (LLMs), specifically NotebookLM and DonutAI/OpenLLaMa-7b, and a classic method combining the Python package PyMuPDF4LLM for PDF data extraction with regular expressions (RegEx) for oligonucleotide identification. Preliminary testing revealed that DonutAI/OpenLLaMa-7b had the lowest accuracy, failing to correctly identify any primers. NotebookLM achieved an accuracy of 39%, while the PyMuPDF4LLM + RegEx method attained the highest accuracy at 71%, successfully identifying 85 out of 121 primers in the test batch of articles. Due to its superior performance and execution speed, the PyMuPDF4LLM + RegEx approach was selected for further refinement. This methodology improves upon previous RegEx-based techniques by eliminating the need for PDF preprocessing and refining the capture of relevant information while minimizing non-relevant captures. Future steps include cross-validation of the extracted primers against the reference genome to eliminate primers intended for other viruses and to accurately identify the binding regions of the identified oligonucleotides. Additionally, parameters such as percentage of identical sites and pairwise identity will be calculated to determine the optimal primer pairs for PCR optimization. Further structural analysis of the collected sequences will form the foundation for 3D modelling and molecular dynamics simulations.
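As an illustration of the RegEx step described in the second abstract, the following is a minimal sketch of how oligonucleotide candidates might be pulled out of text that has already been extracted from a PDF. It is not the authors' implementation: the pattern, the 15 to 40 nt length window, the `min_acgt_fraction` threshold, and the function name `extract_oligos` are all assumptions made for this example. In practice the input text could come from a call such as `pymupdf4llm.to_markdown("article.pdf")`, which is left out here so the block runs with the standard library alone.

```python
import re

# Assumed pattern: an uninterrupted run of 15-40 IUPAC nucleotide codes that
# is not embedded in a longer alphabetic word. The length window is a guess
# at typical primer/probe sizes, not a value taken from the study.
CANDIDATE_RE = re.compile(r"(?<![A-Za-z])[ACGTUNRYKMSWBDHV]{15,40}(?![A-Za-z])")


def extract_oligos(text, min_acgt_fraction=0.9):
    """Return unique oligonucleotide candidates in order of first appearance.

    Candidates made up mostly of ambiguity codes are discarded, which is a
    crude way to limit non-relevant captures such as upper-case headings.
    """
    found = []
    for match in CANDIDATE_RE.finditer(text):
        seq = match.group(0)
        # Count only the four unambiguous DNA bases.
        core = sum(seq.count(base) for base in "ACGT")
        if core / len(seq) >= min_acgt_fraction and seq not in found:
            found.append(seq)
    return found


if __name__ == "__main__":
    # Hypothetical sentence in the style of a methods section.
    sample = (
        "The forward primer 5'-GACCCCAAAATCAGCGAAAT-3' and the reverse primer "
        "5'-TCTGGTTACTGCCAGTTGAATCTG-3' were used to detect SARS-CoV-2 RNA."
    )
    for oligo in extract_oligos(sample):
        print(oligo)
```

A sketch like this would miss sequences that the PDF extraction breaks up with spaces or line breaks, and it cannot tell which virus a primer targets, which is why cross-validation of candidates against a reference genome, as the abstract proposes, remains a separate step.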
