Studies in Informatics and Control
Vol. 33, No. 4, 2024

Category-based and Target-based Data Augmentation for Dysarthric Speech Recognition Using Transfer Learning

Sarkhell Sirwan NAWROLY, Decebal POPESCU, Mariya Celin THEKEKARA ANTONY

Abstract

Dysarthric speech recognition poses unique challenges compared with normal speech recognition, chiefly because dysarthric speech data is scarce. To address this data sparsity, researchers have developed data augmentation techniques that generate new dysarthric speech data from either original dysarthric speech examples or speech data from normal speakers, thereby improving dysarthric speech recognition performance. This study creates augmented training examples from dysarthric speech examples so that the identity of the dysarthric speakers, in terms of their speech errors, is retained. A two-stage transfer learning strategy is employed: the first stage introduces a category-specific low-frequency noise augmentation method, and the second stage applies a dysarthric speaker-specific data augmentation approach. The proposed method combines the advantages of several data augmentation approaches from the literature into a two-stage model that handles data augmentation without compromising the quality of the target model. Compared with a transfer learning method that relies only on normal speech data for training, this two-stage approach achieved a notable Word Error Rate (WER) reduction of approximately 11.369%, especially among severely affected dysarthric speakers.

Keywords

Dysarthric speech recognition, Noise analysis, Transfer learning approach.