Publication

Scalable data analytics using crowdsourced repositories and streams

Name: ART_LSA_BeneditaMaleiro_ 2018.pdf
Size: 927.99 KB
Format: Adobe PDF

Abstract

The scalable analysis of crowdsourced data repositories and streams has quickly become a critical experimental asset in multiple fields. It enables the systematic aggregation of otherwise dispersed data sources and their efficient processing using significant amounts of computational resources. However, the considerable amount of crowdsourced social data and the numerous criteria to observe can limit analytical off-line and on-line processing due to the intrinsic computational complexity. This paper demonstrates the efficient parallelisation of profiling and recommendation algorithms using tourism crowdsourced data repositories and streams. Using the Yelp data set for restaurants, we have explored two different profiling approaches: entity-based and feature-based using ratings, comments, and location. Concerning recommendation, we use a collaborative recommendation filter employing singular value decomposition with stochastic gradient descent (SVD-SGD). To accurately compute the final recommendations, we have applied post-recommendation filters based on venue suitability, value for money, and sentiment. Additionally, we have built a social graph for enrichment. Our master–worker implementation shows super-linear scalability for 10, 20, 30, 40, 50, and 60 concurrent instances.
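
For readers unfamiliar with the SVD-SGD technique named in the abstract, the following is a minimal, single-process sketch of rating prediction by matrix factorisation trained with stochastic gradient descent. It is not the authors' implementation: the function names (`train_svd_sgd`, `rmse`), the hyperparameters, and the toy ratings are all assumptions made for illustration, and the sketch omits the parallel master–worker layer, the profiling step, and the post-recommendation filters.

```python
import math
import random


def train_svd_sgd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors to (user, item, rating) triples with SGD.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient step on the regularised squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)

    def predict(u, i):
        # Unknown users or items fall back to the global mean plus known biases.
        dot = sum(a * b for a, b in zip(p[u], q[i])) if u in p and i in q else 0.0
        return mu + bu.get(u, 0.0) + bi.get(i, 0.0) + dot

    return predict


def rmse(predict, ratings):
    """Root-mean-square error of predict() over the given triples."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Toy data, invented for this sketch (not taken from the Yelp data set).
toy_ratings = [
    ("a", "x", 5), ("a", "y", 1), ("b", "x", 4), ("b", "y", 2),
    ("c", "x", 5), ("c", "z", 3), ("b", "z", 3),
]
toy_mean = sum(r for _, _, r in toy_ratings) / len(toy_ratings)
model = train_svd_sgd(toy_ratings)
```

Calling `model("c", "y")` then yields a predicted rating for a user and venue pair that does not appear in the toy data. In a parallel setting of the kind the abstract describes, work such as this would be divided among worker instances, but how the authors partition it is not stated on this page.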

Keywords

High performance computing; Crowdsourcing; Recommender systems; Big data; Data analytics; Parallel processing; Distributed computing; Smart tourism

Publisher

Elsevier
