Sunday, November 18, 2018

Multilingual Text-to-Speech Software Component for
Dynamic Language Identification and Voice Switching

Paul FOGARASSY-NESZLY¹, Costin PRIBEANU²*

1 BAUM Engineering,
8, Str. Traian Moşoiu, Arad 310175, Romania
pf@baum.ro

* Corresponding author

2 ICI Bucharest
(National Institute for R & D in Informatics)

8-10 Averescu Blvd.
011455 Bucharest 1, Romania
pribeanu@ici.ro

Abstract: Text-to-speech synthesis is a critical feature of applications developed for people with visual or reading disabilities. In recent years there has been increasing interest in multilingual text-to-speech synthesis, which requires multilingual text analysis and language-specific speech synthesis. In this case, dynamic switching of the synthetic voice is needed to enhance usability and the user experience. This paper presents a software component for multilingual text-to-speech synthesis. The software was developed and tested in four steps: alpha version (proof of concept), functional version (beta), commercial version, and implementation. The beta testing results showed high accuracy of the language detection algorithms, which perform correctly on texts with varying degrees of fragmentation. The commercial version was then successfully implemented in two applications for visually impaired people: an automatic reading machine and a personal organizer for blind and visually impaired users. Both implementations were tested with users for usability and acceptance. The evaluation results showed that a device with this component is easier for visually impaired people to use.

Keywords: multilingual text-to-speech, dynamic language identification, voice switching, accessibility, assistive technologies, visually impaired users, usability.

CITE THIS PAPER AS:
Paul FOGARASSY-NESZLY, Costin PRIBEANU*,
Multilingual Text-to-Speech Software Component for Dynamic Language Identification and Voice Switching, Studies in Informatics and Control, ISSN 1220-1766, vol. 25(3), pp. 335-342, 2016.

1. Introduction

Text-to-speech (TTS) synthesis converts text written in a given language into a speech signal. It is typically performed with a synthetic voice that is available for one specific language.

Text-to-speech synthesis is a critical feature of assistive technology applications developed for people with visual or reading disabilities (dyslexic or illiterate users), as it makes electronic documents accessible to them. Examples of such assistive technologies include automatic reading machines, screen readers, portable computers with a voice interface, and Braille displays.

In recent years, there has been growing interest in applications that can process texts written in two or more languages. Two application areas that need multilingual text-to-speech synthesis are education for all and multicultural contexts (Carlson et al., 1990; Udvari-Solner & Thousand, 1996; Hasselbring & Glaser, 2000; Turunen & Hakulinen, 2001; Feraru et al., 2010; Bourlard et al., 2011; Tripathi & Shukla, 2014; Van Laere et al., 2016).

Multilingual (polyglot) text-to-speech synthesis requires dynamic language identification. The algorithms differ from those used for automatic language identification, since here recognition is performed asynchronously, on a continuous text stream.
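To make the idea of identifying languages inside a running text more concrete, here is a minimal Python sketch. It is not the component's algorithm, whose details are not given in this excerpt. It applies the character n-gram "out-of-place" measure of Cavnar & Trenkle (1994, in the reference list) sentence by sentence. The sentence-level granularity, the n-gram range (1 to 3), the profile size of 300, the two toy training samples, and all function names (`ngram_profile`, `out_of_place`, `identify`, `tag_stream`) are assumptions chosen for illustration. Real language profiles are built from large corpora.

```python
import re
from collections import Counter

# Toy training samples (illustrative only; real profiles need large corpora).
TRAINING = {
    "en": ("the reading machine reads the text aloud and the user listens to "
           "the synthetic voice while the text is displayed on the screen"),
    "ro": ("aparatul de citit citește textul cu voce tare și utilizatorul "
           "ascultă vocea sintetică în timp ce textul este afișat pe ecran"),
}


def ngram_profile(text: str, n_max: int = 3, top_k: int = 300) -> list[str]:
    """Return the top_k character n-grams (lengths 1..n_max), most frequent first."""
    counts: Counter = Counter()
    # Letters only: punctuation and digits are ignored.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # underscores mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Break frequency ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc: list[str], lang: list[str], max_penalty: int = 300) -> int:
    """Sum of rank differences; n-grams missing from the language profile cost max_penalty."""
    rank = {gram: r for r, gram in enumerate(lang)}
    return sum(abs(r - rank[g]) if g in rank else max_penalty
               for r, g in enumerate(doc))


def identify(text: str, profiles: dict[str, list[str]]) -> str:
    """Return the language whose profile is closest to the profile of `text`."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


def tag_stream(text: str, profiles: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Split running text into sentences and tag each one with a language code."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(identify(s, profiles), s) for s in sentences]


PROFILES = {lang: ngram_profile(sample) for lang, sample in TRAINING.items()}
```

For example, `tag_stream("The user reads the text on the screen. Utilizatorul citește textul pe ecran.", PROFILES)` should tag the first sentence as `en` and the second as `ro`. Classifying each sentence separately lets the detected language change mid-document. Very short fragments, however, give sparse profiles and less reliable decisions.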

If a text is written in more than one language, the user has to change the synthetic voice manually at each language change. Changing the voice while reading is uncomfortable for the user. Therefore, another requirement for multilingual TTS is to automatically select and switch to the available voice that matches the language of the text.
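The following Python sketch shows one way automatic voice switching can be organised. It assumes the text has already been split into segments tagged with a language code, for example by a language identifier. Instead of calling a particular TTS engine, it produces a list of `switch`/`speak` commands that an engine wrapper could execute. The function name `plan_speech`, the voice names in `VOICES`, and the fallback rule are hypothetical. They are illustrative assumptions, not the component's interface.

```python
from typing import Iterable

# Hypothetical voice identifiers; real names depend on the installed TTS engine.
VOICES = {"en": "english-voice", "ro": "romanian-voice"}


def plan_speech(segments: Iterable[tuple[str, str]],
                voices: dict[str, str],
                fallback: str) -> list[tuple[str, str]]:
    """Turn (language, text) segments into ('switch', voice) and ('speak', text) commands.

    A 'switch' is emitted only when the required voice differs from the current
    one, so consecutive segments in the same language reuse the active voice.
    Languages with no installed voice are read with `fallback`.
    """
    commands: list[tuple[str, str]] = []
    current = None  # no voice selected yet
    for lang, text in segments:
        voice = voices.get(lang, fallback)
        if voice != current:
            commands.append(("switch", voice))
            current = voice
        commands.append(("speak", text))
    return commands
```

For instance, with segments `[("en", "Hello."), ("en", "Next page."), ("ro", "Bună ziua.")]` the plan selects the English voice once, speaks both English sentences, then switches to the Romanian voice. Switching only on an actual voice change avoids needless engine reconfiguration. The fallback keeps reading going when no voice is installed for a detected language, although pronunciation will then be poor.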

This paper presents a multilingual text-to-speech software component capable of both dynamic language identification and synthetic voice switching. Its objective is to integrate the authors' previous contributions (Fogarassy-Neszly & Gherhes, 2014; Fogarassy-Neszly et al., 2015) into a comprehensive framework.

The software component was developed within the iT2V innovation project.

[…]

REFERENCES

  1. BOURLARD, H., DINES, J., MAGIMAI-DOSS, M., GARNER, P. N., IMSENG, D., MOTLICEK, P., VALENTE, F. Current Trends in Multilingual Speech Processing. Sadhana, 36(5), 2011, 885-915.
  2. CARLSON, R., GRANSTROM, B., HELGASON, P., JENSEN, P., TRAINSSON, H. An Icelandic Text-To-Speech System For The Disabled. STL-QPRS 31(4), 1990, 55-56.
  3. CAVNAR, W., TRENKLE, J. N-Gram-Based Text Categorization. Proceedings of SDAIR-94, 1994, 161-176.
  4. CHEN, C.P., HUANG, Y.C., WU, C.H. & LEE, K. D. Polyglot Speech Synthesis Based on Cross-lingual Frame Selection using Auditory and Articulatory Features. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 2014, 1558-1570.
  5. DUNNING, T. Statistical Identification of Language. Technical Report MCCS 94-273, New Mexico State University, 1994.
  6. FERARU, S. M., TEODORESCU, H. N., ZBANCIOC, M. D. SroL – Web-based Resources for Languages and Language Technology e-Learning. International Journal of Computers, Communications & Control, 5(3), 2010, 301-313.
  7. FOGARASSY-NESZLY, P., GHERHES, V. Applications for Dynamic Language Identification. Proceedings of RoCHI 2014, Popovici, D. M., Iordache, D. D. (Eds.), Constanta, 4-5 Sept., 2014, 51-54.
  8. FOGARASSY-NESZLY, P., ZINVELIU, Z., PRIBEANU, C. A Software Component for Polyglot Text-to-Speech Synthesis: User Interface and Beta Testing Results. Proceedings of RoCHI 2015, Dardala, M., Rebedea, T.E. (Eds.), Bucharest, 24-25 Sept., 2015, 145-148.
  9. FOGARASSY-NESZLY, P., PATRU, A., IORDACHE, D. D., PRIBEANU, C. Implementation of a Polyglot Text-to-Speech Synthesis in Two Assistive Technologies. Proceedings of RoCHI 2016, Iftene, A., Vanderdonckt, J. (Eds.), Iasi, 8-9 Sept., 2016, in press.
  10. HASSELBRING, T.S., & GLASER, C.H.W. Use of Computer Technology to Help Students with Special Needs. The Future of Children, 2000, 102-122.
  11. LAERE, E. VAN, ROSIERS, K., VAN AVERMAET, P., SLEMBROUCK, S., & VAN BRAAK, J. What Can Technology Offer to Linguistically Diverse Classrooms? Using Multilingual Content in a Computer-based Learning Environment for Primary Education. Journal of Multilingual and Multicultural Development.
  12. LJUBEŠIĆ, N., MIKELIĆ, N. & BORAS, D. Language Identification: How to Distinguish Similar Languages. Proceedings of the 29th International Conference on Information Technology Interfaces, 2007, 541-546.
  13. MULYONO, H., VEBRIYANTI, D. N. Developing Native-Like Listening Comprehension Materials Perceptions of a Digital Approach. Journal of ELT Research, 1(1), 2016, 1-20.
  14. PRIBEANU, C., FOGARASSY-NESZLY, P. Beta Testing of a Dynamic Language Identification Software Component – Preliminary Results. Revista Romana de Interactiune Om-Calculator 7(3), 2014, 259-272.
  15. RAMANI, B., ACTLIN JEEVA, M.P., VIJAYALAKSHMI, P., NAGARAJAN, T. Cross-lingual Voice Conversion-based Polyglot Speech Synthesizer for Indian Languages. Proceedings of INTERSPEECH, 2014, 775-779.
  16. ROMSDORFER, H., PFISTER, B. Text Analysis and Language Identification for Polyglot Text-to-Speech Synthesis. Speech Communication 49, 2007, 697-724.
  17. SHIGA, Y. & KAWAI, H. Multilingual Speech Synthesis System. Journal of the National Institute of Information and Communication Technology 59 (3/4), 2012, 21-28.
  18. STEINBERGER, R., POULIQUEN, B., WIDIGER, A., IGNAT, C., ERJAVEC, T., TUFIS, D., VARGA, D. The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages. Proc. LREC’2006, Genoa, Italy, 2006, 2143-2146.
  19. TRABER, C., HUBER, K., NEDIR, K., PFISTER, B., KELLER, E., & ZELLNER, B. From Multilingual to Polyglot Speech Synthesis. Proceedings of EUROSPEECH, Budapest, Hungary, 1999, 835-838.
  20. TRIPATHI, M., & SHUKLA, A. Use of Assistive Technologies in Academic Libraries: A Survey. Assistive Technology, 26 (2), 2014, 105-118.
  21. TURUNEN, M., & HAKULINEN, J. Mailman – a Multilingual Speech-only e-mail Client based on an Adaptive Speech Application Framework. Proceedings of Workshop on Multi Lingual Speech Communication – MSC 2000, 2000, 7-12.
  22. UDVARI-SOLNER, A., & THOUSAND, J.S. Creating a Responsive Curriculum for Inclusive Schools. Remedial and Special Education, 17(3), 1996, 182-191.

https://doi.org/10.24846/v25i3y201607