9 results
Search Results
- Online detection and infographic explanation of spam reviews with data drift adaptation (Publication). de Arriba Pérez, Francisco; García Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo, Juan C. Spam reviews are a pervasive problem on online platforms because of their significant impact on reputation. However, research into spam detection in data streams is scarce. Another concern is the need for transparency. Consequently, this paper addresses both problems by proposing an online solution for identifying and explaining spam reviews that incorporates data drift adaptation. It integrates (i) incremental profiling, (ii) data drift detection and adaptation, and (iii) identification of spam reviews using Machine Learning. The explainable mechanism displays a visual and textual explanation of each prediction in a dashboard. The best results reached a spam F-measure of up to 87%.
- Towards adaptive and transparent tourism recommendations: A survey (Publication). Leal, Fátima; Veloso, Bruno; Malheiro, Benedita; Burguillo, Juan C. Crowdsourced data streams are popular and extremely valuable in several domains, notably tourism. Tourism crowdsourcing platforms rely on past tourist and business inputs to provide tailored recommendations to current users in real time. The continuous, open, dynamic and non-curated nature of crowd-originated data demands specific stream mining techniques to support online profiling, recommendation, change detection and adaptation, explanation and evaluation. These techniques must not only continuously improve and adapt profiles and models, but also be transparent, overcome biases, prioritize preferences and handle huge data volumes, all in real time. This article surveys the state of the art in adaptive and explainable stream recommendation, extends the taxonomy of explainable recommendations from the offline to the stream-based scenario, and identifies future research opportunities.
- Balancing Plug-In for Stream-Based Classification (Publication). de Arriba-Pérez, Francisco; García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos. The latest technological advances drive the emergence of countless real-time data streams fed by users, sensors, and devices. These data sources can be mined with predictive and classification techniques to support decision-making in fields such as e-commerce, industry or health. In particular, stream-based classification is widely used to categorise incoming samples on the fly. However, the distribution of samples per class is often imbalanced, which affects the performance and fairness of machine learning models. To overcome this drawback, this paper proposes Bplug, a balancing plug-in for stream-based classification that minimises the bias introduced by data imbalance. First, the plug-in determines the degree of class imbalance; it then synthesises data statistically through non-parametric kernel density estimation. The experiments, performed with real data from Wikivoyage and Metro of Porto, show that Bplug maintains inter-feature correlation and improves classification accuracy. Moreover, it works both online and offline.
- Responsible processing of crowdsourced tourism data (Publication). Leal, Fátima; Malheiro, Benedita; Veloso, Bruno; Burguillo, Juan Carlos. Online tourism crowdsourcing platforms, such as Airbnb, Expedia or TripAdvisor, rely on continuous data sharing by tourists and businesses to provide free or paid value-added services. When adequately processed, these data streams can support businesses in the early identification of trends, and prospective tourists in obtaining tailored recommendations, with explanations in both cases. This increases confidence in the platform and further empowers end-users. However, existing platforms still do not embrace the desired accountability, responsibility and transparency (ART) design principles that underlie the concept of sustainable tourism. The objective of this work is to study this problem, identify the most promising techniques that follow these principles, and design a novel ART-compliant processing pipeline. To this end, this work surveys: (i) real-time data stream mining techniques for recommendation and trend identification; (ii) trust and reputation (T&R) modelling of data contributors; (iii) chain-based storage of trust models as smart contracts for traceability and authenticity; and (iv) trust- and reputation-based explanations for a transparent and satisfying user experience. The proposed pipeline redesign has implications for both digital and sustainable tourism, since it advances the current processing of tourism crowdsourcing platforms and impacts the three pillars of sustainable tourism.
- Exposing and explaining fake news on-the-fly (Publication). de Arriba Pérez, Francisco; García Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo, Juan C. Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, this crowdsourcing model is exposed to manipulation. This work contributes an explainable, online classification method that recognizes fake news in real time. The proposed method combines unsupervised and supervised Machine Learning approaches with lexica created online. The profiling is built from creator-, content- and context-based features extracted with Natural Language Processing techniques. The explainable classification mechanism displays, in a dashboard, the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter, and the results attain 80% accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increasing the quality and trustworthiness of social media content.
- Explainable Classification of Wiki Streams (Publication). García-Méndez, Silvia; Leal, Fátima; de Arriba-Pérez, Francisco; Malheiro, Benedita; Burguillo-Rial, Juan Carlos. Web 2.0 platforms, such as wikis and social networks, rely on crowdsourced data and are therefore prone to data manipulation by ill-intended contributors. This research proposes the transparent identification of wiki manipulators by classifying contributors as benevolent or malevolent humans or bots and explaining the attributed class labels. The system comprises: (i) stream-based data pre-processing; (ii) incremental profiling; and (iii) online classification, evaluation and explanation. In particular, the system profiles contributors and contributions by combining directly collected features with content- and side-based engineered features. The experimental results obtained with a real data set collected from Wikivoyage, a popular travel wiki, reached 98.52% classification accuracy and 91.34% macro F-measure. Ultimately, this work seeks to improve data reliability in order to prevent detrimental information and manipulation.
- Interpretable Classification of Wiki-Review Streams (Publication). García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos. Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation because neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real time. The goal of this work is to anticipate and explain which reviews to revert, so that editors are informed why their edits will be reverted. The proposed method employs stream-based processing and updates the profiling and classification models on each incoming event. The profiling uses side- and content-based features extracted with Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Because the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, which makes the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, balanced through the aforementioned synthetic data generation. The results reached values near 90% for all evaluation metrics (accuracy, precision, recall, and F-measure).
- Personalised Combination of Multi-Source Data for User Profiling (Publication). Veloso, Bruno; Leal, Fátima; Malheiro, Benedita. Human interaction with intelligent systems, services and devices generates large volumes of user-related data. This multi-source information can be used to build richer user profiles and improve personalisation. Our goal is to combine multi-source user-related data into user profiles by assigning dynamic individual weights to the different sources. This paper describes the proposed multi-source user profiling methodology and illustrates its application with a film recommendation system. The data sources considered include: (i) personal history; (ii) explicit preferences (ratings); and (iii) social activities (likes, comments or shares). The MovieLens dataset was selected and adapted to assess our approach by comparing the recommendations generated with the standard and the proposed methodologies. In the standard approach, we calculate the best global weights for the different profile sources and generate all user profiles accordingly. In the proposed approach, we determine individual weights for the different profile sources for each user, then combine the available data to build that user's profile. Overall, our approach proved to be an efficient solution to a complex problem: it continuously updates the individual data source weights and improves the accuracy of the generated personalised multimedia recommendations.
- Emotional evaluation of open-ended responses with transformer models (Publication). Pajón-Sanmartín, Alejandro; De Arriba Pérez, Francisco; García Méndez, Silvia; Burguillo, Juan C.; Leal, Fátima; Malheiro, Benedita. This work applies Natural Language Processing (NLP) techniques, specifically transformer models, to the emotional evaluation of open-ended responses. Recent advances in transformer-based models, such as ChatGPT, make it possible to capture complex emotional patterns in language. The proposed transformer-based system identifies the emotional features of various texts. The research uses prompt engineering and existing context to enhance the emotional expressiveness of the model. It also investigates spaCy's capabilities for linguistic analysis and how transformer models and spaCy can be used together. The results show a significant improvement in emotion detection compared to traditional methods and tools, highlighting the potential of transformer models in this domain. The method can be applied in various areas, such as emotion research or mental health monitoring, and yields a richer and more complete user profile.
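The Bplug entry describes balancing in two steps: measure the degree of class imbalance, then synthesise new samples through non-parametric kernel density estimation. The sketch below illustrates that idea for a batch of numeric features. It is not the authors' implementation. The following choices are all assumptions made for illustration: the Gaussian kernel, Scott's bandwidth rule, the majority/minority count ratio as the imbalance measure, and oversampling every class up to the majority count. The function names are hypothetical. Sampling from a Gaussian KDE is done by picking an observed row at random and adding correlated Gaussian noise. Because the noise uses the full data covariance, inter-feature correlation is roughly preserved.

```python
import numpy as np


def imbalance_ratio(labels):
    """Hypothetical imbalance measure: majority count / minority count (1.0 = balanced)."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.min()


def kde_oversample(X_minority, n_new, rng=None):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to X_minority.

    Assumes at least two rows. Each synthetic row = a randomly chosen observed
    row + Gaussian noise whose covariance is the data covariance scaled by
    Scott's factor squared (a full covariance keeps feature correlations).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X_minority, dtype=float)
    n, d = X.shape
    factor = n ** (-1.0 / (d + 4))  # Scott's rule bandwidth factor
    cov = np.atleast_2d(np.cov(X, rowvar=False)) * factor ** 2
    centres = X[rng.integers(0, n, size=n_new)]
    noise = rng.multivariate_normal(np.zeros(d), cov, size=n_new)
    return centres + noise


def balance(X, y, rng=None):
    """Oversample every non-majority class up to the majority class count."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, cnt in zip(classes, counts):
        if cnt < target:
            X_parts.append(kde_oversample(X[y == cls], target - cnt, rng))
            y_parts.append(np.full(target - cnt, cls))
    # Original rows come first, synthetic rows are appended after them.
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    print(imbalance_ratio(y))  # 9.0
    Xb, yb = balance(X, y, rng)
    print(imbalance_ratio(yb))  # 1.0
```

In a stream setting, the same sampler could be refitted periodically on a sliding window of recent minority-class samples. This is one possible design only; the abstract does not say how Bplug schedules synthesis online.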
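The multi-source profiling entry combines three sources (personal history, ratings and social activities) using weights chosen individually per user rather than one global set. The abstract does not state how the per-user weights are found. The sketch below therefore makes several assumptions, and all helper names are hypothetical:

- each source is summarised as a profile vector over item features (for example, genre affinities);
- the profile is the weighted average of the source vectors;
- a user's weights are picked by grid search over weights that sum to 1, minimising squared error on that user's held-out ratings;
- a rating is predicted as the dot product of the item and profile vectors.

```python
from itertools import product

import numpy as np

# The three sources named in the abstract; vector contents are assumed.
SOURCES = ("history", "ratings", "social")


def combine_profile(source_vectors, weights):
    """Weighted average of per-source profile vectors."""
    w = np.array([weights[s] for s in SOURCES], dtype=float)
    V = np.array([source_vectors[s] for s in SOURCES], dtype=float)
    return w @ V / w.sum()


def simplex_grid(step=0.1):
    """Yield every weight triple on a regular grid whose components sum to 1."""
    k = int(round(1 / step))
    for a, b in product(range(k + 1), repeat=2):
        if a + b <= k:
            yield dict(zip(SOURCES, (a / k, b / k, (k - a - b) / k)))


def fit_user_weights(source_vectors, holdout_items, holdout_ratings, step=0.1):
    """Hypothetical per-user fit: keep the weights whose combined profile best
    predicts this user's held-out ratings (dot-product predictor, MSE)."""
    items = np.asarray(holdout_items, dtype=float)
    ratings = np.asarray(holdout_ratings, dtype=float)
    best, best_err = None, float("inf")
    for w in simplex_grid(step):
        pred = items @ combine_profile(source_vectors, w)
        err = float(np.mean((pred - ratings) ** 2))
        if err < best_err:
            best, best_err = w, err
    return best, best_err


if __name__ == "__main__":
    user = {"history": [1, 0], "ratings": [0, 1], "social": [0.5, 0.5]}
    weights, err = fit_user_weights(user, [[1, 0], [0, 1]], [0, 1])
    print(weights, err)
```

Running the same search once on the held-out ratings of all users pooled together would give a single set of global weights. That corresponds to the standard approach the abstract compares against.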