Search Results

  • Adaptive modeling and high quality spectral estimation for speech enhancement
    Publication . Coelho, Luis; Braga, Daniela
    In this work, an adaptive modeling and spectral estimation scheme based on dual discrete Kalman filtering (DKF) is proposed for speech enhancement. Both the speech and the noise signals are modeled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged into a combined state-space model and are also used to calculate instantaneous power spectral density estimates. Speech enhancement is performed by a dual discrete Kalman filter that simultaneously estimates the models and the signals. This approach is particularly useful as a pre-processing module for parametric speech recognition systems that rely on time-dependent spectral models. System performance was evaluated both by a set of human listeners and by spectral distances. In both cases, the use of this pre-processing module led to improved results.
  • An automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference
    Publication . Coelho, Luis; Braga, Daniela; Sales-Dias, Miguel; Garcia-Mateo, Carmen
    In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For continued use of these systems, the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that try to evaluate how good a voice is when the application is a speech-based interface. In this paper we present a new automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference. Our study is based on a multi-language database composed of female voices. In the objective performance evaluation, the system achieved a 7.3% error rate.
  • ezGo: A voice operated wheelchair with biosignal monitoring for home environments
    Publication . Coelho, Luis; Braga, Daniela
    In this paper we present ezGo, an electric powered wheelchair with a speech-based interface and biosignal monitoring instrumentation. The user can use the voice, a natural communication method, to control the chair's movement and to obtain information about their health. Additionally, a set of semi-autonomous modes with macro recording enables the execution of navigation tasks with little effort and improved precision. The main purpose of the system is to provide severely disabled persons with an assistive device that can improve their confidence and daily independence. The results of usability tests showed that users consider ezGo a valuable help in their daily tasks and a very desirable addition to standard wheelchairs.
  • Homograph ambiguity resolution in front-end design for Portuguese TTS systems
    Publication . Braga, Daniela; Coelho, Luis; Resende Jr., Fernando Gil V.
    In this paper, a module for homograph disambiguation in Portuguese Text-to-Speech (TTS) is proposed. This module works with a part-of-speech (POS) parser, used to disambiguate homographs that belong to different parts of speech, and a semantic analyzer, used to disambiguate homographs that belong to the same part of speech. The proposed algorithms are meant to resolve a significant part of homograph ambiguity in European Portuguese (EP) (106 homograph pairs so far). The system is ready to be integrated into a Letter-to-Sound (LTS) converter. The algorithms were trained and tested with different corpora. The experiments yielded an accuracy rate of 97.8%. The methodology is also valid for Brazilian Portuguese (BP), since 95 homograph pairs are exactly the same as in EP. A comparison with a probabilistic approach was also carried out, and the results are discussed.
  • Speech as the basic interface for assistive technology
    Publication . Teixeira, António; Braga, Daniela; Coelho, Luis; Fonseca, José Alberto; Alvarelhão, Joaquim; Martim, Inácio; Queirós, Alexandra; Rocha, Nelson; Calado, António; Sales-Dias, Miguel
    Speech interfaces for Assistive Technologies are not common and are usually replaced by others. The market they target is not considered attractive, and speech technologies are still not widespread. Industry still thinks they present some performance risks, especially Speech Recognition systems. As speech is the most elemental and natural means of communication, it has strong potential for enhancing inclusion and quality of life for broader groups of users with special needs, such as people with cerebral palsy and elderly people living at home. This work is a position paper in which the authors argue for the need to make speech the basic interface in assistive technologies. The main arguments are: speech is the easiest way to interact with machines; there is a growing market for embedded speech in assistive technologies, since the number of disabled and elderly people is expanding; speech technology is already mature enough to be used but needs adaptation to people with special needs; and there is still a lot of R&D to be done in this area, especially for the Portuguese market. The main challenges are presented and future directions are proposed.
  • CardioML: integrating personal cardiac information for ubiquous diagnosis and analysis
    Publication . Coelho, Luis; Queirós, Ricardo
    The latest medical diagnosis devices enable e-diagnosis, making these services easier and faster to access and available in remote areas. However, this imposes new communication and data interchange challenges. In this paper, a new XML-based format for storing cardiac signals and related information is presented. The proposed structure encompasses data acquisition devices, patient information, data description, pathological diagnosis and waveform annotation. Compared with formats of similar purpose, it offers several advantages. Besides the fully integrated data model, these include geographical references for e-diagnosis, multi-stream data description, the ability to handle several simultaneous devices, the possibility of independent waveform annotation and an HL7-compliant structure for common contents. These features provide enhanced integration with existing systems and improved flexibility for cardiac data representation.
  • A rule-based grapheme-to-phone converter for TTS systems in European Portuguese
    Publication . Braga, Daniela; Coelho, Luis; Vianna Resende, Fernando Gil
    In this paper, a linguistically rule-based grapheme-to-phone (G2P) transcription algorithm is described for European Portuguese. A complete set of phonological and phonetic transcription rules for the European Portuguese standard variety is presented. The algorithm was implemented and tested using online newspaper articles. The experiments yielded an accuracy rate of 98.80%. Future developments to increase this value are foreseen. Our purpose with this work is to develop a module/tool that can improve synthetic speech naturalness in European Portuguese. Other applications of this system can be expected, such as language teaching/learning. These results, together with our perspectives for future improvements, demonstrate the importance of linguistic knowledge in the development of Text-to-Speech (TTS) systems.