Publication

Scalable data analytics using crowdsourced repositories and streams

Name: ART_LSA_BeneditaMaleiro_ 2018.pdf
Size: 927.99 KB
Format: Adobe PDF

Abstract(s)

The scalable analysis of crowdsourced data repositories and streams has quickly become a critical experimental asset in multiple fields. It enables the systematic aggregation of otherwise dispersed data sources and their efficient processing using significant amounts of computational resources. However, the considerable volume of crowdsourced social data and the numerous criteria to observe can limit off-line and on-line analytical processing because of the intrinsic computational complexity. This paper demonstrates the efficient parallelisation of profiling and recommendation algorithms using tourism crowdsourced data repositories and streams. Using the Yelp data set for restaurants, we have explored two different profiling approaches, entity-based and feature-based, using ratings, comments, and location. For recommendation, we use a collaborative recommendation filter employing singular value decomposition with stochastic gradient descent (SVD-SGD). To compute the final recommendations accurately, we have applied post-recommendation filters based on venue suitability, value for money, and sentiment. Additionally, we have built a social graph for enrichment. Our master–worker implementation shows super-linear scalability for 10, 20, 30, 40, 50, and 60 concurrent instances.
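As an illustration of the SVD-SGD collaborative filter named in the abstract, the following is a minimal single-process sketch of biased matrix factorisation trained with stochastic gradient descent. It is not the paper's implementation: the function names (`train_svd_sgd`, `rmse`), the hyperparameters, and the input format of (user, item, rating) triples are all assumptions made for this example, and it covers neither the master–worker parallelisation nor the post-recommendation filters.

```python
import random


def train_svd_sgd(ratings, n_factors=2, lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Fit a biased matrix factorisation model with plain SGD.

    `ratings` is a list of (user, item, rating) triples (assumed format).
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})

    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                        # user biases
    bi = {i: 0.0 for i in items}                        # item biases
    # Latent factor vectors, initialised with small random values.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    samples = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(samples)  # visit the ratings in random order each epoch
        for u, i, r in samples:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient steps with L2 regularisation.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)

    def predict(u, i):
        # Unknown users or items fall back to the mean plus any known bias.
        score = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
        if u in p and i in q:
            score += sum(a * b for a, b in zip(p[u], q[i]))
        return score

    return predict


def rmse(predict, ratings):
    """Root-mean-square error of `predict` over (user, item, rating) triples."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
```

With a handful of made-up triples such as `("u1", "a", 5)`, calling `predict = train_svd_sgd(triples)` and then `predict("u1", "c")` yields an estimated rating for an unseen user and venue pair. Each update touches only one user vector and one item vector, which is one reason this family of methods is often a candidate for partitioned, parallel training.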

Keywords

High performance computing; Crowdsourcing; Recommender systems; Big data; Data analytics; Parallel processing; Distributed computing; Smart tourism

Publisher

Elsevier
