Name | Description | Size | Format |
---|---|---|---|
 | | 3.51 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Parkinson’s disease is the second most common neurodegenerative disease, surpassed only by Alzheimer’s disease. It is currently estimated to affect between 7 and 10 million people, mostly at an advanced age, since it rarely occurs before the age of 50.
As the world’s population ages, its prevalence increases accordingly. No effective method for diagnosing Parkinson’s disease currently exists, and this study explores the possibility of an early diagnosis using Machine Learning algorithms applied to a voice dataset.
Because the acquired dataset is imbalanced and high-dimensional (it has a very large number of features), it was studied in 3 distinct ways: the complete dataset, the dataset divided by gender, and the dataset divided by feature set.
In each of the 3 divisions, several algorithms were evaluated individually, and an Ensemble combining the various classifiers was also used in order to make the model more robust.
The best metrics were obtained with the complete dataset, using a hybrid classification system that combined the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset, feature selection based on XGBoost feature importance to reduce dimensionality, and Ensemble Stacking with Random Forest, Gradient Boosting, Support Vector Machine and K-Nearest Neighbors as base classifiers and XGBoost as the meta classifier. This system reached 98.7% accuracy.
The results indicate that Machine Learning techniques applied to voice data may be a good option for the early detection of Parkinson’s disease, thereby allowing a more specialized and effective treatment for the patient.
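The balancing step named in the abstract, the Synthetic Minority Oversampling Technique (SMOTE), can be illustrated with a minimal sketch. It creates each synthetic minority sample by picking a minority point, choosing one of its k nearest minority neighbours, and interpolating at a random position on the segment between them. The function name `smote`, its parameters and the toy data below are hypothetical and are not taken from the thesis; the abstract does not say which implementation was used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; not the thesis code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 4 minority samples against a majority class of size 10.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_size = 10
new_points = smote(minority, majority_size - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already covered by the minority class. In a real study, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.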
Description
Keywords
Parkinson; Machine Learning; SMOTE; Feature Selection; Ensemble; Classification