ISEP – LSA – Artigos


Recent Submissions

Now showing 1 - 10 of 82
  • Identification and explanation of disinformation in wiki data streams
    Publication . Arriba-Pérez, Francisco de; García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo, Juan C.
    Social media platforms, increasingly used as news sources for varied data analytics, have transformed how information is generated and disseminated. However, the unverified nature of this content raises concerns about trustworthiness and accuracy, potentially negatively impacting readers’ critical judgment due to disinformation. This work aims to contribute to the automatic data quality validation field, addressing the rapid growth of online content on wiki pages. Our scalable solution includes stream-based data processing with feature engineering, feature analysis and selection, stream-based classification, and real-time explanation of prediction outcomes. The explainability dashboard is designed for the general public, who may lack the specialized knowledge needed to interpret the model’s predictions. Experimental results on two datasets attain approximately 90% across all evaluation metrics, demonstrating robust and competitive performance compared to related work in the literature. In summary, the system assists editors by reducing the effort and time needed to detect disinformation.
  • Online detection and infographic explanation of spam reviews with data drift adaptation
    Publication . de Arriba Pérez, Francisco; García Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo, Juan C.
    Spam reviews are a pervasive problem on online platforms due to their significant impact on reputation. However, research into spam detection in data streams is scarce. A further concern is the lack of transparency of existing detectors. Consequently, this paper addresses both problems by proposing an online solution for identifying and explaining spam reviews, incorporating data drift adaptation. It integrates (i) incremental profiling, (ii) data drift detection & adaptation, and (iii) identification of spam reviews employing Machine Learning. The explainable mechanism displays a visual and textual prediction explanation in a dashboard. The best results reached up to 87% spam F-measure.
  • Exposing and explaining fake news on-the-fly
    Publication . de Arriba Pérez, Francisco; García Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo, Juan C.
    Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, this crowdsourcing model is exposed to manipulation. This work contributes an explainable and online classification method to recognize fake news in real-time. The proposed method combines both unsupervised and supervised Machine Learning approaches with online created lexica. The profiling is built using creator-, content- and context-based features using Natural Language Processing techniques. The explainable classification mechanism displays in a dashboard the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter, and the results attain 80% accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increasing the quality and trustworthiness of social media content.
  • Interpretable Classification of Wiki-Review Streams
    Publication . García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos
    Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and F-measure).
  • Telco customer top‐ups: Stream‐based multi‐target regression
    Publication . Alves, Pedro Miguel; Filipe, Ricardo Ângelo; Malheiro, Benedita
    Telecommunication operators compete not only for new clients, but, above all, to maintain current ones. The modelling and prediction of the top-up behaviour of prepaid mobile subscribers allows operators to anticipate customer intentions and implement measures to strengthen customer relationships. This research explores a data set from a Portuguese operator, comprising 30 months of top-up events, to predict the top-up monthly frequency and average value of prepaid subscribers using offline and online multi-target regression algorithms. The offline techniques adopt a monthly sliding window, whereas the online techniques use an event sliding window. Experiments were performed to determine the most promising set of features, analyse the accuracy of the offline and online regressors, and assess the impact of the sliding window dimension. The results show that online regression outperforms its offline counterparts. The best accuracy was achieved with adaptive model rules and a sliding window of 500,000 events (approximately 5 months). Finally, the predicted top-up monthly frequency and average value of each subscriber were converted to individual date and value intervals, which can be used by the operator to identify early signs of subscriber disengagement and immediately take pre-emptive measures.
  • Towards adaptive and transparent tourism recommendations: A survey
    Publication . Leal, Fátima; Veloso, Bruno; Malheiro, Benedita; Burguillo, Juan C.
    Crowdsourced data streams are popular and extremely valuable in several domains, namely in tourism. Tourism crowdsourcing platforms rely on past tourist and business inputs to provide tailored recommendations to current users in real time. The continuous, open, dynamic and non-curated nature of the crowd-originated data demands specific stream mining techniques to support online profiling, recommendation, change detection and adaptation, explanation and evaluation. These techniques must not only continuously improve and adapt profiles and models, but also be transparent, overcome biases, prioritize preferences and master huge data volumes, all in real time. This article surveys the state-of-the-art of adaptive and explainable stream recommendation, extends the taxonomy of explainable recommendations from the offline to the stream-based scenario, and identifies future research opportunities.
  • Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
    Publication . García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio
    Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bot) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach significantly boosts the confidence and quality of the classifier by using a class-balanced data stream comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92%.
  • Real-time GNSS precise positioning: RTKLIB for ROS
    Publication . Ferreira, António; Matias, Bruno; Almeida, Jose Miguel; Silva, Eduardo
    The global navigation satellite system (GNSS) constitutes an effective and affordable solution to the outdoor positioning problem. When combined with precise positioning techniques, such as real time kinematic (RTK), centimeter-level positioning accuracy becomes a reality. Such performance is suitable for a whole new range of demanding applications, including high-accuracy field robotics operations. RTKRCV, part of the RTKLIB package, is one of the most popular open-source solutions for real-time GNSS precise positioning. Yet the lack of integration with the robot operating system (ROS) constitutes a limitation on its adoption by the robotics community. This article addresses this limitation, reporting a new implementation which brings the RTKRCV capabilities into ROS. New features, including ROS publishing and control over a ROS service, were introduced seamlessly to ensure full compatibility with all original options. Additionally, a new observation synchronization scheme improves solution consistency, particularly relevant for the moving-baseline positioning mode. Real application examples are presented to demonstrate the advantages of our rtkrcv_ros package. For community benefit, the software was released as an open-source package.
  • Stream-based explainable recommendations via blockchain profiling
    Publication . Leal, Fátima; Veloso, Bruno; Malheiro, Benedita; Burguillo, Juan Carlos; Chis, Adriana E.; González–Vélez, Horacio
    Explainable recommendations enable users to understand why certain items are suggested and, ultimately, nurture system transparency, trustworthiness, and confidence. Large crowdsourcing recommendation systems must crucially promote the authenticity and transparency of their recommendations. To address this challenge, this paper proposes stream-based explainable recommendations via blockchain profiling. Our contribution relies on chained historical data to improve the quality and transparency of online collaborative recommendation filters - Memory-based and Model-based - using, as use cases, data streamed from two large tourism crowdsourcing platforms, namely Expedia and TripAdvisor. Building historical trust-based models of raters, our method is implemented as an external module and integrated with the collaborative filter through a post-recommendation component. The inter-user trust profiling history, traceability and authenticity are ensured by blockchain, since these profiles are stored as a smart contract in a private Ethereum network. Our empirical evaluation with HotelExpedia and TripAdvisor has consistently shown the positive impact of blockchain-based profiling on the quality (measured as recall) and transparency (determined via explanations) of recommendations.
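
Several entries above (e.g. “Identification and explanation of disinformation in wiki data streams” and “Exposing and explaining fake news on-the-fly”) rely on stream-based classifiers that update their model on every incoming event rather than retraining in batch. As a minimal sketch of that idea — not the authors’ implementation, and with purely illustrative feature and class names — an incremental Gaussian Naive Bayes can be written as:

```python
import math
from collections import defaultdict

class IncrementalGaussianNB:
    """Minimal stream classifier: updates per-class running mean and
    variance (Welford's algorithm) on each incoming event, then predicts
    with Gaussian class-conditional log-likelihoods."""

    def __init__(self):
        self.counts = defaultdict(int)                     # events seen per class
        self.means = defaultdict(lambda: defaultdict(float))
        self.m2 = defaultdict(lambda: defaultdict(float))  # sum of squared deviations

    def learn_one(self, x, y):
        self.counts[y] += 1
        n = self.counts[y]
        for feat, value in x.items():
            delta = value - self.means[y][feat]
            self.means[y][feat] += delta / n
            self.m2[y][feat] += delta * (value - self.means[y][feat])

    def predict_one(self, x):
        total = sum(self.counts.values())
        best, best_score = None, -math.inf
        for cls, n in self.counts.items():
            score = math.log(n / total)                    # class prior
            for feat, value in x.items():
                var = self.m2[cls][feat] / n + 1e-9        # smoothed variance
                mean = self.means[cls][feat]
                score += (-0.5 * math.log(2 * math.pi * var)
                          - (value - mean) ** 2 / (2 * var))
            if score > best_score:
                best, best_score = cls, score
        return best
```

Because both `learn_one` and `predict_one` touch only running statistics, the model stays constant-memory per class and can keep up with an unbounded stream, which is the property these papers need.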
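
The spam-review paper above lists data drift detection and adaptation as a core component. One classic way to detect drift in a stream of model errors is the Page-Hinkley test; the following is a generic sketch of that detector (not the paper’s implementation, and the `delta`/`threshold` defaults are illustrative):

```python
class PageHinkley:
    """Minimal Page-Hinkley drift detector: raises an alarm when the
    cumulative deviation of a monitored error signal from its running
    mean exceeds a threshold."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # minimum of m_t seen so far

    def update(self, value):
        """Feed one error observation; return True when drift is detected."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold
```

On a drift alarm, a stream pipeline would typically reset or refit its classifier on recent events, which is the “adaptation” half of the mechanism the abstract describes.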
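
Two of the wiki papers above (“Interpretable Classification of Wiki-Review Streams” and “Simulation, modelling and classification of wiki contributors”) generate synthetic minority-class data to balance the stream before classification. A SMOTE-style interpolation between a minority sample and its nearest neighbour is one common way to do this; the sketch below is a generic illustration under that assumption, not the papers’ own algorithm:

```python
import random

def synthesize_minority(samples, n_new, seed=42):
    """SMOTE-style sketch: create synthetic minority-class points by
    interpolating between a randomly chosen sample and its nearest
    neighbour among the remaining minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(samples)
        # nearest neighbour of a (squared Euclidean distance)
        b = min((s for s in samples if s is not a),
                key=lambda s: sum((x - y) ** 2 for x, y in zip(a, s)))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the generated data stays inside the minority region instead of drifting into the majority class.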
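
The telco top-up paper above predicts two targets (monthly top-up frequency and average value) from an event sliding window. As a simple baseline illustrating that setup — not the adaptive model rules the paper reports as best — a window-mean multi-target regressor with running sums looks like this:

```python
from collections import deque

class WindowMultiTargetMean:
    """Baseline event-sliding-window multi-target regressor: predicts each
    target (e.g. top-up frequency and average value) as the mean over the
    last `window` events, keeping running sums so no rescan is needed."""

    def __init__(self, n_targets, window):
        self.window = window
        self.events = deque()
        self.sums = [0.0] * n_targets

    def learn_one(self, targets):
        self.events.append(targets)
        self.sums = [s + t for s, t in zip(self.sums, targets)]
        if len(self.events) > self.window:   # evict the oldest event
            old = self.events.popleft()
            self.sums = [s - t for s, t in zip(self.sums, old)]

    def predict_one(self):
        n = len(self.events) or 1
        return [s / n for s in self.sums]
```

The paper’s best window of 500,000 events would simply be `window=500_000` here; the eviction step is what makes the model forget behaviour older than roughly five months.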
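
The blockchain-profiling paper above stores trust profiles as a smart contract so their history is tamper-evident. The essential property can be illustrated without Ethereum: hash-chain each profile update so that altering any historical record invalidates every later hash. The field names below are hypothetical, and this sketch stands in for, rather than reproduces, the paper’s smart-contract storage:

```python
import hashlib
import json

GENESIS = "0" * 64

def chain_profile_updates(updates):
    """Build a tamper-evident chain: each block hashes the update payload
    together with the previous block's hash."""
    chain, prev_hash = [], GENESIS
    for update in updates:
        payload = json.dumps(update, sort_keys=True)
        block_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        chain.append({"update": update, "prev": prev_hash, "hash": block_hash})
        prev_hash = block_hash
    return chain

def verify(chain):
    """Recompute every hash; any edited historical record breaks the chain."""
    prev_hash = GENESIS
    for block in chain:
        payload = json.dumps(block["update"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if block["prev"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

In the paper this guarantee comes from a private Ethereum network, which additionally distributes the chain across nodes; the local version above only demonstrates the traceability and authenticity property itself.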