| Name | Description | Size | Format |
| --- | --- | --- | --- |
| | | 5.57 MB | Adobe PDF |
| | | 1.74 KB | License |
Authors
Advisor(s)
Abstract(s)
Sound accompanies most human activities, and the ability to perceive and distinguish sounds is fundamental. This distinction is not always easy, however, due to factors such as the noise that accompanies them. Sound classification algorithms have contributed strongly to mitigating this problem. Audioset is a dataset developed by Google containing more than 2 million sound clips, amounting to some 5,800 hours of audio. This work uses the subset of dog sounds present in this dataset, comparing them against random sounds also drawn from the dataset. The thesis evaluates the potential of neural networks for classifying Audioset sounds, using both pre-trained models and models trained from scratch, and then comparing their performance in the evaluation phase. During pre-processing the sounds were normalised, and Mel spectrograms and the MFCC method were used to extract features from them. The models used were LeNet-5 (trained from scratch) and EfficientNet (trained from scratch and pre-trained); the best-performing model was the pre-trained EfficientNet using Mel spectrograms as the feature-extraction method, with an accuracy of 83%.
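The abstract names two feature-extraction methods applied to the normalised audio: Mel spectrograms and MFCCs. The sketch below is not the thesis code; it is a minimal NumPy illustration of how those features are typically computed (amplitude normalisation, a power spectrogram, a Mel filterbank, and a DCT of the log-Mel energies for the MFCCs). All parameter values (frame size, hop, filter count) are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.fft import dct

def stft_power(y, n_fft=512, hop=256):
    """Power spectrogram via a Hann-windowed short-time FFT."""
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window
              for i in range(0, len(y) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters spaced evenly on the Mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def extract_features(y, sr=16000, n_mels=40, n_mfcc=13):
    """Return (log-Mel spectrogram, MFCCs) for a mono signal."""
    y = y / (np.max(np.abs(y)) + 1e-9)          # amplitude normalisation
    power = stft_power(y)
    mel = power @ mel_filterbank(sr, 512, n_mels).T
    log_mel = 10.0 * np.log10(mel + 1e-10)      # log-Mel spectrogram (CNN input)
    mfcc = dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
    return log_mel, mfcc
```

In a pipeline like the one described, the log-Mel spectrograms (or MFCC matrices) would then be fed as image-like inputs to the CNNs (LeNet-5 or EfficientNet).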
Description
Keywords
Neural networks; Sound classification; Machine learning; Audioset; MFCC; Mel spectrogram