| Name | Description | Size | Format |
|---|---|---|---|
| | | 4.94 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Atualmente, são geradas enormes quantidades de dados que, na maior parte das vezes, não
são devidamente analisados. Como tal, existe um fosso cada vez mais significativo entre os
dados existentes e a quantidade de dados que é realmente analisada. Esta situação verifica-se
com grande frequência na área da saúde. De forma a combater este problema foram criadas
técnicas que permitem efetuar uma análise de grandes massas de dados, retirando padrões e
conhecimento intrínseco dos dados.
A área da saúde é um exemplo de uma área que cria enormes quantidades de dados
diariamente, mas que na maior parte das vezes não é retirado conhecimento proveitoso dos
mesmos. Este novo conhecimento poderia ajudar os profissionais de saúde a obter resposta
para vários problemas.
Esta dissertação pretende apresentar todo o processo de descoberta de conhecimento:
análise dos dados, preparação dos dados, escolha dos atributos e dos algoritmos, aplicação de
técnicas de mineração de dados (classificação, segmentação e regras de associação), escolha
dos algoritmos (C5.0, CHAID, Kohonen, TwoSteps, K-means, Apriori) e avaliação dos modelos
criados. O projeto baseia-se na metodologia CRISP-DM e foi desenvolvido com a ferramenta
Clementine 12.0. O principal intuito deste projeto é retirar padrões e perfis de dadores que
possam vir a contrair determinadas doenças (anemia, doenças renais, hepatite, entre outras)
ou quais as doenças ou valores anormais de componentes sanguíneos que podem ser comuns
entre os dadores.
Enormous quantities of data are currently generated, yet most of the time they are not properly analyzed. As a result, there is a growing gap between the data that exist and the amount that is actually analyzed. This situation is very common in healthcare. To address this problem, techniques have been developed that make it possible to analyze large volumes of data, extracting patterns and the knowledge intrinsic to the data.
Healthcare is an example of a field that generates enormous quantities of data every day, but in most cases no useful knowledge is extracted from them. Such knowledge could help healthcare professionals find answers to a variety of problems.
This dissertation presents the entire knowledge discovery process: data analysis, data preparation, selection of attributes and algorithms, application of data mining techniques (classification, clustering and association rules), choice of algorithms (C5.0, CHAID, Kohonen, TwoStep, K-means, Apriori) and evaluation of the resulting models. The project follows the CRISP-DM methodology and was developed with the Clementine 12.0 tool. Its main objective is to extract patterns and profiles of donors who may come to develop certain diseases (anemia, renal diseases, hepatitis, among others), and to identify which diseases or abnormal values of blood components may be common among donors.
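As a rough illustration of the association-rule step named in the abstract, the sketch below computes support and confidence for rules between pairs of abnormal findings. It is a simplified pairwise version of the idea behind Apriori, not the Clementine 12.0 implementation used in the dissertation. The donor records, the finding names, the function names and the thresholds are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical donor records: each set lists abnormal findings flagged for one donor.
# These values are invented for illustration and are not taken from the dissertation.
records = [
    {"low_hemoglobin", "low_ferritin"},
    {"low_hemoglobin", "low_ferritin", "high_creatinine"},
    {"high_creatinine"},
    {"low_hemoglobin", "low_ferritin"},
    {"low_ferritin"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= r for r in records) / len(records)


def pair_rules(records, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for rules over item pairs."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, records)
        if s < min_support:
            continue  # infrequent pairs cannot yield rules above the support threshold
        for x, y in ((a, b), (b, a)):
            conf = s / support({x}, records)  # estimate of P(y | x)
            if conf >= min_confidence:
                rules.append((x, y, s, conf))
    return rules


rules = pair_rules(records)
for x, y, s, c in rules:
    print(f"{x} -> {y} (support={s:.2f}, confidence={c:.2f})")
```

A full Apriori implementation would extend frequent pairs to larger itemsets level by level, pruning any candidate that has an infrequent subset; the pairwise case is shown here only to make the support and confidence measures concrete.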
Description
Keywords
Mineração de dados; CRISP-DM; Saúde; Clementine 12.0; Classificação; Segmentação; Regras de Associação; Data Mining; Health; Classification; Clustering; Association Rules
