Publication

Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly

dc.contributor.author: García-Méndez, Silvia
dc.contributor.author: Leal, Fátima
dc.contributor.author: Malheiro, Benedita
dc.contributor.author: Burguillo-Rial, Juan Carlos
dc.contributor.author: Veloso, Bruno
dc.contributor.author: Chis, Adriana E.
dc.contributor.author: González-Vélez, Horacio
dc.date.accessioned: 2022-07-15T08:50:32Z
dc.date.available: 2022-07-15T08:50:32Z
dc.date.issued: 2022
dc.description.abstract: Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %. [pt_PT]
dc.description.sponsorship: This work has been supported by: (i) Xunta de Galicia, Spain grant ED481B-2021-118, Spain; (ii) National Funds through the FCT – Fundação para a Ciência e a Tecnologia, Portugal (Portuguese Foundation for Science and Technology) as part of project UIDB/50014/2020; (iii) CHIST-ERA, Ireland and the Irish Research Council, Ireland as part of the “Smart Pharmaceutical Manufacturing (SPuMoNI)” research project [Apr/2019–Dec/2022]; and (iv) University of Vigo, Spain/CISUG for open access charge. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.doi: 10.1016/j.simpat.2022.102616 [pt_PT]
dc.identifier.issn: 1569-190X
dc.identifier.uri: http://hdl.handle.net/10400.22/20675
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Elsevier [pt_PT]
dc.relation: ED481B-2021-118 [pt_PT]
dc.relation: INESC TEC- Institute for Systems and Computer Engineering, Technology and Science
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S1569190X22000971 [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [pt_PT]
dc.subject: Classification [pt_PT]
dc.subject: Data reliability [pt_PT]
dc.subject: Stream processing [pt_PT]
dc.subject: Synthetic data [pt_PT]
dc.subject: Data fabrication [pt_PT]
dc.subject: Wiki contributors [pt_PT]
dc.title: Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: INESC TEC- Institute for Systems and Computer Engineering, Technology and Science
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50014%2F2020/PT
oaire.citation.startPage: 102616 [pt_PT]
oaire.citation.title: Simulation Modelling Practice and Theory [pt_PT]
oaire.citation.volume: 120 [pt_PT]
oaire.fundingStream: 6817 - DCRRNI ID
person.familyName: BENEDITA CAMPOS NEVES MALHEIRO
person.givenName: MARIA
person.identifier.ciencia-id: 7A15-08FC-4430
person.identifier.orcid: 0000-0001-9083-4292
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: babd4fda-654a-4b59-952d-6113eebbb308
relation.isAuthorOfPublication.latestForDiscovery: babd4fda-654a-4b59-952d-6113eebbb308
relation.isProjectOfPublication: 7a2d9a82-ee07-4c57-bbbf-2d88b942688d
relation.isProjectOfPublication.latestForDiscovery: 7a2d9a82-ee07-4c57-bbbf-2d88b942688d

Files

Original bundle
Name: ART_LSA_MBM_2022.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission