Name | Description | Size | Format
---|---|---|---
 | | 7.16 MB | Adobe PDF
Authors
Advisor(s)
Abstract(s)
Advances in Artificial Intelligence have fostered the launch of cars with increasingly innovative specifications and, consequently, higher prices. These price increases drive greater demand in the used-car market. That demand often leads to unrealistic asking prices, which increases the number of frauds in this sector and produces wide discrepancies in the prices charged.
Machine Learning can play an important role here, namely in building used-car price prediction models. The goal of this work was therefore to analyze the models already developed in this context and their accuracy, and to create a model that addresses the shortcomings of the existing ones in order to improve that accuracy.
To this end, the algorithms RF, XGBoost, LightGBM, RL, MLP, and CNN were tested on four datasets, A, B, C, and D. Dataset A has 50 features and 57038 cars, dataset B has 30 features and 70253 cars, dataset C has 10 features and 192799 cars, and dataset D has the 13 most important features and 144702 cars.
The algorithms applied to datasets A, B, and C were tested twice, once with default hyperparameters and once with modified hyperparameters. All algorithms on all four datasets were trained on 80% of the data and tested on the remaining 20%, and were evaluated mainly with the R2, MSE, RMSE, and MAE metrics.
On datasets A, B, and C, the algorithms performed better with modified hyperparameters than with the defaults, except for MLP on dataset A and RL on the four datasets.
Of the algorithms tested, XGBoost and LightGBM gave the best results, and their results were very similar to each other on all four datasets. Of the two, XGBoost performed better.
Finally, XGBoost on dataset A (MAE=0.12892, RMSE=0.18947, MSE=0.03590, R2=0.96432) and on dataset D (MAE=0.12389, RMSE=0.18913, MSE=0.03577, R2=0.96404) gave the best results among the algorithms tested, and also when compared with the algorithms studied in the state-of-the-art review.
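The evaluation protocol described in the abstract (an 80%/20% train/test split scored with MAE, MSE, RMSE, and R2) can be illustrated with a minimal sketch. This is not the thesis code: the helper names `train_test_split` and `regression_metrics`, the random shuffle, the fixed seed, and the toy values are all assumptions made for illustration, and the abstract does not say how the split was drawn or on what scale the target was measured.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Assumption: a random shuffle with a fixed seed; the abstract only
    states the 80%/20% proportions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Hypothetical toy data, not taken from the thesis datasets.
rows = list(range(100))
train, test = train_test_split(rows)

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])

# RMSE is the square root of MSE, so the reported pairs can be cross-checked.
reported = {"A": (0.03590, 0.18947), "D": (0.03577, 0.18913)}
consistent = {
    name: math.isclose(math.sqrt(mse), rmse, abs_tol=1e-4)
    for name, (mse, rmse) in reported.items()
}
```

Because RMSE is by definition the square root of MSE, the last step gives a quick sanity check on the reported XGBoost figures for datasets A and D.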
Description
Keywords
Artificial Intelligence, Machine Learning, Deep Learning, Prediction System, Used cars, RF, RL, XGBoost, LightGBM, MLP, CNN