ESEIG - FE - Comunicações em eventos científicos

Recent Submissions

Now showing 1 - 10 of 10
  • Adaptive filtering for high quality HMM based speech synthesis
    Publication . Coelho, Luís; Braga, Daniela
    In this work an adaptive filtering scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for Hidden Markov Model (HMM) based speech synthesis quality enhancement. The objective is to improve signal smoothness across HMMs and their related states and to reduce artifacts due to the acoustic model's limitations. Both speech and artifacts are modelled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The quality enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. The system's performance has been evaluated using mean opinion score tests and the proposed technique has led to improved results.
  • An automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference
    Publication . Coelho, Luis; Braga, Daniela; Sales-Dias, Miguel; Garcia-Mateo, Carmen
    In the last few years the number of systems and devices that use voice based interaction has grown significantly. For continued use of these systems the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that try to evaluate how good a voice is when the application is a speech based interface. In this paper we present a new automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference. Our study is based on a multi-language database composed of female voices. In the objective performance evaluation the system achieved a 7.3% error rate.
  • Kalman tracking linear predictor for vowel intelligibility enhancement on european portuguese HMM based speech synthesis
    Publication . Coelho, Luís; Braga, Daniela; Garcia-Mateo, Carmen
    The recent developments in Hidden Markov Model (HMM) based speech synthesis showed that this is a promising technology fully capable of competing with other established techniques. However, some issues still lack a solution. Several authors report an over-smoothing phenomenon in both time and frequency which decreases naturalness and sometimes intelligibility. In this work we present a new vowel intelligibility enhancement algorithm that uses a discrete Kalman filter (DKF) for tracking frame based parameters. The inter-frame correlations are modelled by an autoregressive structure which provides an underlying time frame dependency and can improve time-frequency resolution. The system’s performance has been evaluated using objective and subjective tests and the proposed methodology has led to improved results.
  • Automatic syllabification for danish text-to-speech systems
    Publication . Beck, Jeppe; Braga, Daniela; Nogueira, João; Sales-Dias, Miguel; Coelho, Luís
    In this paper, a rule-based automatic syllabifier for Danish is described using the Maximal Onset Principle. Prior success rates of rule-based methods applied to Portuguese and Catalan syllabification modules were the basis for this work. The system was implemented and tested using a very small set of rules. The system achieved word accuracy rates of 96.9% and 98.7%, contrary to our initial expectations, since Danish is a language with a complex syllabic structure and thus difficult to handle with rules. A comparison with a data-driven syllabification system using artificial neural networks showed a higher accuracy rate for the former system.
  • Speech as the basic interface for assistive technology
    Publication . Teixeira, António; Braga, Daniela; Coelho, Luis; Fonseca, José Alberto; Alvarelhão, Joaquim; Martim, Inácio; Queirós, Alexandra; Rocha, Nelson; Calado, António; Sales-Dias, Miguel
    Speech interfaces for Assistive Technologies are not common and are usually replaced by others. The market they are targeting is not considered attractive and speech technologies are not yet widespread. Industry still thinks they present some performance risks, especially Speech Recognition systems. As speech is the most elemental and natural way for communication, it has strong potential for enhancing inclusion and quality of life for broader groups of users with special needs, such as people with cerebral palsy and elderly people living at home. This work is a position paper in which the authors argue for the need to make speech the basic interface in assistive technologies. Among the main arguments, we can state: speech is the easiest way to interact with machines; there is a growing market for embedded speech in assistive technologies, since the number of disabled and elderly people is expanding; speech technology is already mature enough to be used but needs adaptation to people with special needs; there is still a lot of R&D to be done in this area, especially when thinking about the Portuguese market. The main challenges are presented and future directions are proposed.
  • ezGo: A voice operated wheelchair with biosignal monitoring for home environments
    Publication . Coelho, Luis; Braga, Daniela
    In this paper we present ezGo, an electric powered wheelchair with a speech based interface and biosignal monitoring instrumentation. The user can use the voice, a natural communication method, to control the chair movement and obtain information about their health. Additionally, a set of semi-autonomous modes with macro recording enables the execution of navigation tasks with little effort and improved precision. The main purpose of the system is to provide severely disabled persons with an assistive device that can improve their confidence and daily independence. The results of usability tests showed that users consider ezGo a valuable help in their daily tasks and a very desirable addition to standard wheelchairs.
  • Adaptive modeling and high quality spectral estimation for speech enhancement
    Publication . Coelho, Luis; Braga, Daniela
    In this work an adaptive modeling and spectral estimation scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric based speech recognition systems that rely on spectral time dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.
  • Homograph ambiguity resolution in front-end design for portuguese TTS systems
    Publication . Braga, Daniela; Coelho, Luis; Resende Jr., Fernando Gil V.
    In this paper, a module for homograph disambiguation in Portuguese Text-to-Speech (TTS) is proposed. This module works with a part-of-speech (POS) parser, used to disambiguate homographs that belong to different parts-of-speech, and a semantic analyzer, used to disambiguate homographs which belong to the same part-of-speech. The proposed algorithms are meant to solve a significant part of homograph ambiguity in European Portuguese (EP) (106 homograph pairs so far). This system is ready to be integrated in a Letter-to-Sound (LTS) converter. The algorithms were trained and tested with different corpora. The experimental results yielded an accuracy rate of 97.8%. This methodology is also valid for Brazilian Portuguese (BP), since 95 homograph pairs are exactly the same as in EP. A comparison with a probabilistic approach was also done and results were discussed.
  • A rule-based grapheme-to-phone converter for TTS systems in european portuguese
    Publication . Braga, Daniela; Coelho, Luis; Vianna Resende, Fernando Gil
    In this paper, a linguistically rule-based grapheme-to-phone (G2P) transcription algorithm is described for European Portuguese. A complete set of phonological and phonetic transcription rules regarding the European Portuguese standard variety is presented. This algorithm was implemented and tested by using online newspaper articles. The experimental results yielded an accuracy rate of 98.80%. Future developments in order to increase this value are foreseen. Our purpose with this work is to develop a module/tool that can improve synthetic speech naturalness in European Portuguese. Other applications of this system, such as language teaching/learning, can be expected. These results, together with our perspectives of future improvements, have proved the dramatic importance of linguistic knowledge on the development of Text-to-Speech systems (TTS).
  • CardioML: integrating personal cardiac information for ubiquous diagnosis and analysis
    Publication . Coelho, Luis; Queirós, Ricardo
    The latest medical diagnosis devices enable the performance of e-diagnosis, making access to these services easier, faster and available in remote areas. However, this imposes new communications and data interchange challenges. In this paper a new XML based format for storing cardiac signals and related information is presented. The proposed structure encompasses data acquisition devices, patient information, data description, pathological diagnosis and waveform annotation. When compared with similar purpose formats several advantages arise. Besides the fully integrated data model, these include the available geographical references for e-diagnosis, the multi-stream data description, the ability to handle several simultaneous devices, the possibility of independent waveform annotation and an HL7 compliant structure for common contents. These features represent an enhanced integration with existing systems and an improved flexibility for cardiac data representation.