Repository logo
 

Search Results

Now showing 1 - 10 of 11
  • Improving word embeddings in Portuguese: increasing accuracy while reducing the size of the corpus
    Publication . Pinto, José Pedro; Viana, Paula; Teixeira, Inês; Andrade, Maria
    The subjectiveness of multimedia content description has a strong negative impact on tag-based information retrieval. In our work, we propose enhancing available descriptions by adding semantically related tags. To cope with this objective, we use a word embedding technique based on the Word2Vec neural network parameterized and trained using a new dataset built from online newspapers. A large number of news stories was scraped and pre-processed to build a new dataset. Our target language is Portuguese, one of the most spoken languages worldwide. The results achieved significantly outperform similar existing solutions developed in the scope of different languages, including Portuguese. Contributions include also an online application and API available for external use. Although the presented work has been designed to enhance multimedia content annotation, it can be used in several other application areas.
  • A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition
    Publication . Guimarães, Vânia; Nascimento, Jéssica; Viana, Paula; Carvalho, Pedro
    When compared with traditional local shops where the customer has a personalised service, in large retail departments, the client has to make his purchase decisions independently, mostly supported by the information available in the package. Additionally, people are becoming more aware of the importance of the food ingredients and demanding about the type of products they buy and the information provided in the package, despite it often being hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous affluence and the daily needs of item repositioning. In this scenario, the automatic detection and recognition of products on the shelves or off the shelves has gained increased interest as the application of these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or even benefit stock management with real-time inventory, automatic shelf monitoring and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic and identifies their limitations, and discusses future research directions of related fields.
  • Deep Learning Approach for Seamless Navigation in Multi-View Streaming Applications
    Publication . Costa, Tiago S.; Viana, Paula; Andrade, Maria Teresa
    Quality of Experience (QoE) in multi-view streaming systems is known to be severely affected by the latency associated with view-switching procedures. Anticipating the navigation intentions of the viewer on the multi-view scene could provide the means to greatly reduce such latency. The research work presented in this article builds on this premise by proposing a new predictive view-selection mechanism. A VGG16-inspired Convolutional Neural Network (CNN) is used to identify the viewer’s focus of attention and determine which views would be most suited to be presented in the brief term, i.e., the near-term viewing intentions. This way, those views can be locally buffered before they are actually needed. To this aim, two datasets were used to evaluate the prediction performance and impact on latency, in particular when compared to the solution implemented in the previous version of our multi-view streaming system. Results obtained with this work translate into a generalized improvement in perceived QoE. A significant reduction in latency during view-switching procedures was effectively achieved. Moreover, results also demonstrated that the prediction of the user’s visual interest was achieved with a high level of accuracy. An experimental platform was also established on which future predictive models can be integrated and compared with previously implemented models.
  • Symbolic music generation conditioned on continuous-valued emotions
    Publication . Sulun, Serkan; Davies, Matthew E. P.; Viana, Paula
    In this paper we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer based on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach in a quantitative manner in two ways, first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens which is representative of the current state of the art.
  • From a Visual Scene to a Virtual Representation: A Cross-Domain Review
    Publication . Pereira, Américo; Carvalho, Pedro; Pereira, Nuno; Viana, Paula; Côrte-Real, Luís
    The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities, made visual data a must in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or addressing specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to set up a structure for discussing challenges and opportunities in different steps of the entire process, allowing to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an endto- end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.
  • Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content
    Publication . Viana, Paula; Andrade, Maria Teresa; Carvalho, Pedro; Vilaça, Luis; Teixeira, Inês N.; Costa, Tiago; Jonker, Pieter
    Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities are yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to influence automatically how new multimedia content is created. The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models in identifying new thematic concepts in static visual content and attaching meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects in a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering to the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches of creating random movements, by implementing an intelligent content- and context-aware video.
  • Data2MV - A user behaviour dataset for multi-view scenarios
    Publication . Soares da Costa, Tiago; Andrade, Maria Teresa; Viana, Paula; Silva, Nuno Castro
    The Data2MV dataset contains gaze fixation data obtained through experimental procedures from a total of 45 partic- ipants using an Intel RealSense F200 camera module and seven different video playlists. Each of the playlists had an approximate duration of 20 minutes and was viewed at least 17 times, with raw tracking data being recorded with a 0.05 second interval. The Data2MV dataset encompasses a total of 1.0 0 0.845 gaze fixations, gathered across a total of 128 exper- iments. It is also composed of 68.393 image frames, extracted from each of the 6 videos selected for these experiments, and an equal quantity of saliency maps, generated from aggregate fixation data. Software tools to obtain saliency maps and generate complementary plots are also provided as an open- source software package. The Data2MV dataset was publicly released to the research community on Mendeley Data and constitutes an important contribution to reduce the current scarcity of such data, particularly in immersive, multi-view streaming scenarios.
  • Video Description Schemes in Broadcasting
    Publication . Viana, Paula
    The large amount of information in audiovisual archives makes it quite difficult to efficiently locate a resource for re-use or re-purposing. In response to the needs of industries and users to solve this problem, different organisations have recently initiated active work in the definition of interoperable frameworks and representation for metadata. This paper presents recommendations given by user and standardisation organisations and addresses some of the main metadata initiatives that are relevant to broadcasting. It also presents some proposals to enable the interoperability between the different solutions.
  • Consumer Attitudes toward News Delivering: An Experimental Evaluation of the Use and Efficacy of Personalized Recommendations
    Publication . Viana, Paula; Soares, Márcio; Gaio, Rita; Correia, Amilcar
    This paper presents an experiment on newsreaders’ behavior and preferences on the interaction with online personalized news. Different recommendation approaches, based on consumption profiles and user location, and the impact of personalized news on several aspects of consumer decision-making are examined on a group of volunteers. Results show a significant preference for reading recommended news over other news presented on the screen, regardless of the chosen editorial layout. In addition, the study also provides support for the creation of profiles taking into consideration the evolution of user’s interests. The proposed solution is valid for users with different reading habits and can be successfully applied even to users with small consumption history. Our findings can be used by news providers to improve online services, thus increasing readers’ perceived satisfaction.
  • A Machine Learning App for Monitoring Physical Therapy at Home
    Publication . Pereira, Bruno; Cunha, Bruno; Viana, Paula; Lopes, Maria; Melo, Ana S. C.; Sousa, Andreia S. P.
    Shoulder rehabilitation is a process that requires physical therapy sessions to recover the mobility of the affected limbs. However, these sessions are often limited by the availability and cost of specialized technicians, as well as the patient’s travel to the session locations. This paper presents a novel smartphone-based approach using a pose estimation algorithm to evaluate the quality of the movements and provide feedback, allowing patients to perform autonomous recovery sessions. This paper reviews the state of the art in wearable devices and camera-based systems for human body detection and rehabilitation support and describes the system developed, which uses MediaPipe to extract the coordinates of 33 key points on the patient’s body and compares them with reference videos made by professional physiotherapists using cosine similarity and dynamic time warping. This paper also presents a clinical study that uses QTM, an optoelectronic system for motion capture, to validate the methods used by the smartphone application. The results show that there are statistically significant differences between the three methods for different exercises, highlighting the importance of selecting an appropriate method for specific exercises. This paper discusses the implications and limitations of the findings and suggests directions for future research.