Sarkhell Sirwan NAWROLY1*, Decebal POPESCU1, Mariya Celin THEKEKARA ANTONY2, Actlin Jeeva MUTHU PHILOMINAL3
1 Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, 060042, Romania
firstname.lastname@example.org (*Corresponding author), email@example.com
2 Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of
Science and Technology, Avadi, 600062, Tamil Nadu, India
3 Department of Electronics and Communication Engineering, Sri Sivasubramaniya Nadar College of Engineering,
Kelambakkam, 603110, Tamil Nadu, India
Abstract: Recent advances in Automatic Speech Recognition (ASR) systems have made everyday life more convenient for the general population. For people with speech disorders, however, such systems are of limited use, because they are not trained or modelled on speech data from medically impaired speakers. The main difficulty in training such ASR systems is the scarcity of data. To address this issue, this paper analyzes data augmentation for dysarthric speech recognition. Noise is freely available in abundance and has already been used in speech recognition to build robust ASR systems. This paper uses noise as a source for data augmentation, in order to increase the number of dysarthric speech samples and improve the performance of speech recognition systems. The core idea is that when one sound is combined with another, the impact is noticeable only if both sounds occupy the same frequency range. The proposed method therefore characterizes each noise sample and adds it appropriately to the dysarthric speech data to create new dysarthric speech samples. First, noise samples that do not affect the dysarthric speech frequency range were selected. The dysarthric speech examples, noise-augmented at a particular signal-to-noise ratio (SNR), were then used to train hybrid DNN-HMM-based recognition systems for isolated dysarthric speech examples. After noise selection-based data augmentation, the word error rate (WER) was reduced by 7% for all categories of dysarthric speakers, compared with the WER of the ASR system trained without data augmentation.
Since this approach used low-frequency noises as the source for data augmentation, the number of augmented examples was not restricted to a fixed limit: the more low-frequency noises available within the selected SNR range, the better the augmented examples. Furthermore, because the approach augmented the selected dysarthric speech examples themselves, the augmented examples retained the dysarthric speakers' identities.
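As an illustration of the augmentation step described in the abstract, the following is a minimal sketch of mixing a noise recording into a speech utterance at a target SNR. It is not the authors' implementation: the function names `mix_at_snr` and `augment` are hypothetical, the noise selection step (choosing noises outside the dysarthric speech frequency range) is assumed to have been done beforehand, and the SNR values actually used in the paper are not stated in the abstract.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration only; both inputs are 1-D sample
    arrays assumed to share the same sampling rate.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the noise if it is shorter than the utterance, then trim it.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent; cannot scale it to an SNR")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def augment(speech, noises, snr_levels_db):
    """Create one augmented copy of `speech` per (noise, SNR) combination."""
    return [mix_at_snr(speech, n, snr) for n in noises for snr in snr_levels_db]
```

Under these assumptions, each original utterance yields one new training example per noise and SNR pair, which is why the number of augmented examples grows with the number of suitable noises.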
Keywords: Data augmentation, Noise selection, SNR-based, Dysarthria, Speech recognition.
CITE THIS PAPER AS:
Sarkhell Sirwan NAWROLY, Decebal POPESCU, Mariya Celin THEKEKARA ANTONY, Actlin Jeeva MUTHU PHILOMINAL, SNR-Selection-Based-Data Augmentation for Dysarthric Speech Recognition, Studies in Informatics and Control, ISSN 1220-1766, vol. 32(4), pp. 129-140, 2023. https://doi.org/10.24846/v32i4y202312