ESS - DM - Bioestatística e Bioinformática Aplicadas à Saúde
Browsing ESS - DM - Bioestatística e Bioinformática Aplicadas à Saúde by Author "Fonseca, João Emanuel Sousa"
- Automatic FoodEx2 classification system for food description
  Publication. Fonseca, João Emanuel Sousa; Faria, Brígida Mónica; Reis, Luís Paulo; Pimenta, Rui
  Food is a major factor in human health. Food security protects consumers by offering a safety net that lets them trust the quality of a product. In Europe, entities such as the European Food Safety Authority (EFSA) act as risk assessors, providing the information used to shape laws around food security. To collect data on food safety, EFSA developed a comprehensive food classification and description system called FoodEx2. Mapping food descriptions to FoodEx2 codes currently relies on a manual process. The motivation for this work is the time that could be saved by using an algorithm to automate code generation. Knowledge Discovery in Databases is a well-established approach to extracting patterns automatically from large quantities of data. The main objective of this project is to explore automatic approaches to classifying food descriptions with FoodEx2 codes. Several classic classifiers are compared on the prediction of FoodEx2 base codes, a multiclass classification task. Performance was evaluated on distinct datasets and at different levels of text preprocessing, using exact match ratio and F1-score as metrics and a Bag-of-Words document representation with TF-IDF weighting. All the datasets have imbalanced class distributions. The documents are short texts describing ingredients, dishes, and animal sample details. Performance varied mainly between datasets and between classifiers. The best-performing classifiers were Random Forests, Decision Trees, and Linear Support Vector Machines. The results show that building an automatic classifier depends on further exploration of the available data.
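The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names and the labels `A`, `B`, `C` are hypothetical placeholders rather than real FoodEx2 base codes, and macro averaging of the F1-score is an assumption, since the abstract does not state which averaging was used. In a single-label multiclass task, exact match ratio is the fraction of predictions that equal the true code.

```python
from collections import Counter


def exact_match_ratio(y_true, y_pred):
    """Fraction of predictions that exactly equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical placeholder labels, deliberately imbalanced
y_true = ["A", "A", "A", "B", "C"]
y_pred = ["A", "A", "B", "B", "A"]

print(exact_match_ratio(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

On imbalanced data such as the datasets described above, the two numbers can diverge: a rare class that is never predicted correctly contributes an F1 of zero to the macro average, while barely affecting the exact match ratio.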