Publication

Scalable data analytics using crowdsourced repositories and streams

dc.contributor.author: Veloso, Bruno
dc.contributor.author: Leal, Fátima
dc.contributor.author: González-Veléz, Horacio
dc.contributor.author: Malheiro, Benedita
dc.contributor.author: Burguillo, Juan Carlos
dc.date.accessioned: 2018-09-07T13:49:42Z
dc.date.embargo: 2119
dc.date.issued: 2018
dc.date.updated: 2018-09-05T11:17:51Z
dc.description.abstract: The scalable analysis of crowdsourced data repositories and streams has quickly become a critical experimental asset in multiple fields. It enables the systematic aggregation of otherwise disperse data sources and their efficient processing using significant amounts of computational resources. However, the considerable amount of crowdsourced social data and the numerous criteria to observe can limit analytical off-line and on-line processing due to the intrinsic computational complexity. This paper demonstrates the efficient parallelisation of profiling and recommendation algorithms using tourism crowdsourced data repositories and streams. Using the Yelp data set for restaurants, we have explored two different profiling approaches: entity-based and feature-based using ratings, comments, and location. Concerning recommendation, we use a collaborative recommendation filter employing singular value decomposition with stochastic gradient descent (SVD-SGD). To accurately compute the final recommendations, we have applied post-recommendation filters based on venue suitability, value for money, and sentiment. Additionally, we have built a social graph for enrichment. Our master–worker implementation shows super-linear scalability for 10, 20, 30, 40, 50, and 60 concurrent instances. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier: 07437315 [en_US]
dc.identifier.doi: 10.1016/j.jpdc.2018.06.013 [pt_PT]
dc.identifier.issn: 07437315
dc.identifier.uri: http://hdl.handle.net/10400.22/11910
dc.language.iso: eng [pt_PT]
dc.publisher: Elsevier [pt_PT]
dc.relation.ispartofseries: Journal of Parallel and Distributed Computing;Vol. 122
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0743731518304544?via%3Dihub [pt_PT]
dc.subject: High performance computing [pt_PT]
dc.subject: Crowdsourcing [pt_PT]
dc.subject: Recommender systems [pt_PT]
dc.subject: Big data [pt_PT]
dc.subject: Data analytics [pt_PT]
dc.subject: Parallel processing [pt_PT]
dc.subject: Distributed computing [pt_PT]
dc.subject: Smart tourism [pt_PT]
dc.title: Scalable data analytics using crowdsourced repositories and streams [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.endPage: 10 [pt_PT]
oaire.citation.startPage: 1 [pt_PT]
oaire.citation.title: Journal of Parallel and Distributed Computing [pt_PT]
oaire.citation.volume: 122 [pt_PT]
person.familyName: BENEDITA CAMPOS NEVES MALHEIRO
person.givenName: MARIA
person.identifier.ciencia-id: 7A15-08FC-4430
person.identifier.orcid: 0000-0001-9083-4292
rcaap.rights: closedAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: babd4fda-654a-4b59-952d-6113eebbb308
relation.isAuthorOfPublication.latestForDiscovery: babd4fda-654a-4b59-952d-6113eebbb308

Files

Original bundle
Name: ART_LSA_BeneditaMaleiro_ 2018.pdf
Size: 927.99 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission