| Name | Description | Size | Format |
|---|---|---|---|
| | | 3.05 MB | Adobe PDF |
Authors
Abstract(s)
This dissertation aims to investigate the relationship between different Natural Language Processing techniques and the performance of various sentiment analysis models in the context of the stock market, using comments collected from the StockTwits social network. The study also seeks to establish the relationship between the sentiment extracted from these comments and from financial news, provided by EODHD, and the daily variation in stock closing prices, obtained through Yahoo Finance. Different preprocessing techniques and textual representations were explored, with and without data balancing, comparing approaches based on lexicons, traditional machine learning models, neural network-based models, and transformer-based models. The temporal analysis included the construction of daily sentiment indices from the comments and the financial news. These indices, together with the stock closing prices, were organized into time series, which served as the basis for detecting lags with the Dynamic Time Warping algorithm and for verifying directional alignment using statistical significance tests.

The results indicate that transformer-based models achieved the best performance, albeit with higher computational requirements, with RoBERTaStockTwits standing out as the most effective. The impact of data balancing varied across the approaches: it was more pronounced in transformer-based and traditional machine learning models, and of little significance in neural network-based models. Less complex preprocessing techniques led to better results, particularly in transformer-based models. In traditional machine learning models, textual representations based on word frequency outperformed dense representations. The temporal analysis between the sentiment indices and the daily stock price variation revealed differences between social media comments and news, with more evident and statistically significant patterns in the comments, suggesting the possibility of inefficiencies when the Efficient Market Hypothesis is taken as a reference.
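As an illustration of the lag-detection step mentioned in the abstract, the following is a minimal sketch of classic Dynamic Time Warping applied to two short series. It is not the dissertation's implementation, which is not described on this page. The function names `dtw_path` and `median_lag`, the absolute-difference local cost, the median-offset summary, and the toy series are all assumptions made for the example.

```python
from math import inf


def dtw_path(a, b):
    """Return (total_cost, warping_path) for two numeric sequences.

    Hypothetical helper: textbook DTW with absolute difference as the
    local cost and no window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the alignment (0-based index pairs)
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def median_lag(path):
    """Median of (j - i) over the path; positive means b trails a."""
    lags = sorted(j - i for i, j in path)
    return lags[len(lags) // 2]


# Toy data (made up): a sentiment spike on day 2, a price move on day 4
sentiment = [0, 0, 1, 0, 0, 0, 0]
price_change = [0, 0, 0, 0, 1, 0, 0]
total_cost, path = dtw_path(sentiment, price_change)
print(total_cost, path, median_lag(path))
```

In this toy case the warping path pairs the sentiment spike with the later price move, so the index offsets along the path give a rough indication of how far one series trails the other. Because the path is pinned at both endpoints, a summary such as the median offset can understate the true shift on short series.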
Description
Keywords
Sentiment Analysis; Natural Language Processing; Financial Market; Dynamic Time Warping
