Monday, July 16, 2018

Multilingual Text-to-Speech Software Component for
Dynamic Language Identification and Voice Switching


1 BAUM Engineering,
8, Str. Traian Moşoiu, Arad 310175, Romania

* Corresponding author

2 ICI Bucharest
(National Institute for R&D in Informatics)

8-10 Averescu Blvd.
011455 Bucharest 1, Romania

Abstract: Text-to-speech synthesis is a critical feature of applications developed for people with visual or reading disabilities. In recent years, there has been increasing interest in multilingual text-to-speech synthesis, which requires multilingual text analysis and language-specific speech synthesis. In this case, dynamic switching of the synthetic voice is needed to enhance usability and user experience. This paper presents a software component for multilingual text-to-speech synthesis. The software was developed and tested in four steps: alpha version (proof of concept), functional version (beta), commercial version, and implementation. The beta testing results showed a high accuracy of the language detection algorithms, which perform properly on texts with a variable degree of fragmentation. The commercial version was then successfully implemented in two applications for visually impaired people: an automatic reading machine and a personal organizer for blind and visually impaired users. Both implementations were tested with users for usability and acceptance. The evaluation results showed that a device with this component is easier for visually impaired people to use.

Keywords: multilingual text-to-speech, dynamic language identification, voice switching, accessibility, assistive technologies, visually impaired users, usability.

1. Introduction

Text-to-speech (TTS) synthesis converts a text written in a given language into speech signals. TTS synthesis is typically performed with a synthetic voice that is available for one specific language.

Text-to-speech synthesis is a critical feature of the assistive technology applications developed for people with visual or reading disabilities (e.g., dyslexic or illiterate users), since it makes electronic documents accessible to them. Examples of such assistive technologies include automatic reading machines, screen readers, portable computers with voice interfaces, and Braille displays.

In recent years, there has been a growing interest in the development of applications that are able to process texts written in two or more languages. Two examples of application areas that need multilingual text-to-speech synthesis are education for all and multicultural contexts (Carlson et al., 1990; Udvari-Solner & Thousand, 1996; Hasselbring & Glaser, 2000; Turunen & Hakulinen, 2001; Feraru et al., 2010; Bourlard et al., 2011; Tripathi & Shukla, 2014; Van Laere et al., 2016).

Multilingual (polyglot) text-to-speech synthesis requires dynamic language identification. The algorithms differ from those used for conventional automatic language identification, since in this case the recognition process is performed asynchronously, on a continuous text stream.
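
To make the idea of identification on a continuous text stream more concrete, the following is a minimal sketch in Python. It is not the algorithm used by the component described in this paper; it assumes a simple stopword-counting heuristic, and the names STOPWORDS, identify_language and tag_stream, as well as the tiny word lists, are hypothetical and chosen only for illustration. Each incoming fragment is tagged as it arrives, and a fragment with no recognizable words inherits the language of the preceding one.

```python
import re

# Tiny illustrative stopword lists (assumption); a real identifier
# would rely on much richer language models.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "for"},
    "ro": {"și", "este", "în", "la", "pe", "cu", "care", "un"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "mit", "ein"},
}


def identify_language(fragment, previous=None):
    """Return the language whose stopwords occur most often in the fragment.

    Falls back to `previous` when no stopword of any language is found.
    """
    words = re.findall(r"\w+", fragment.lower())
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else previous


def tag_stream(fragments, default="en"):
    """Yield (language, fragment) pairs for fragments arriving one by one."""
    current = default
    for fragment in fragments:
        current = identify_language(fragment, previous=current)
        yield current, fragment


if __name__ == "__main__":
    stream = [
        "The weather is nice in the city.",
        "Vremea este frumoasă în oraș.",
        "Das Wetter ist schön und warm.",
        "1234",
    ]
    for lang, text in tag_stream(stream):
        print(lang, "|", text)
```

Carrying the previous language forward is one possible way to handle short or ambiguous fragments such as numbers; other strategies are equally plausible.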

If the text is written in more than one language and no automatic switching is available, the user has to manually select the corresponding synthetic voice at each language change. Changing the synthetic voice while reading is uncomfortable for the user. Therefore, another requirement for a multilingual TTS is to automatically select and switch to the voice available for the language of the text.
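
The voice switching requirement can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation of the component: the voice table VOICES, the voice names and the function plan_speech are hypothetical, and the input is assumed to be a sequence of fragments already tagged with a language code. Consecutive fragments in the same language are grouped, so that a voice switch occurs only when the language actually changes, and a fallback voice is used for languages without an installed voice.

```python
# Hypothetical voice table (assumption): language code -> installed voice name.
VOICES = {"en": "voice_en", "ro": "voice_ro", "de": "voice_de"}


def plan_speech(tagged_fragments, voices=VOICES, fallback="en"):
    """Turn (language, text) pairs into (voice, text) utterances.

    Consecutive fragments that map to the same voice are merged, so the
    synthesizer switches voices only on a language change.
    """
    plan = []
    for lang, text in tagged_fragments:
        # Use the fallback voice when no voice exists for this language.
        voice = voices.get(lang, voices[fallback])
        if plan and plan[-1][0] == voice:
            plan[-1] = (voice, plan[-1][1] + " " + text)
        else:
            plan.append((voice, text))
    return plan


if __name__ == "__main__":
    tagged = [
        ("en", "Hello."),
        ("en", "How are you?"),
        ("ro", "Bună ziua."),
        ("fr", "Bonjour."),
    ]
    for voice, text in plan_speech(tagged):
        print(voice, "->", text)
```

In an actual device, each (voice, text) pair would be passed to the speech synthesizer in order; that step is omitted here because it depends on the TTS engine in use.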

In this paper, a multilingual text-to-speech software component capable of performing both dynamic language identification and synthetic voice switching is presented. The objective of this paper is to integrate the previous contributions (Fogarassy-Neszly & Gherhes, 2014; Fogarassy-Neszly et al., 2015) into a comprehensive framework.

The software component has been developed in the framework of the innovation project iT2V.



