Improving Speech Recognition Rate through Analysis Parameters

Deividas Eringis; Gintautas Tamulevičius

2014

The speech signal is redundant and non-stationary by nature. Because of the inertia of the vocal tract, these variations are not very rapid, so the signal can be considered stationary over short segments. The most distinctive information in speech is presumed to be contained in the short-time magnitude spectrum. This is the main reason speech signals are analyzed frame by frame: the signal is divided into overlapping segments (so-called frames), typically 15–25 ms long with an overlap of 10–15 ms. In this paper we present the results of our investigation into how the analysis window length and frame shift influence the speech recognition rate. For this purpose we analyzed three cepstral analysis approaches: mel-frequency cepstral analysis (MFCC), linear prediction cepstral analysis (LPCC) and perceptual linear prediction cepstral analysis (PLPC). The highest speech recognition rate was obtained with a 10 ms analysis window and a frame shift between 7.5 and 10 ms, regardless of the analysis type. The largest increase in recognition rate was 2.5 %.
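The frame-by-frame segmentation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `frame_signal`, the 16 kHz sampling rate used in the example and the Hamming window are all assumptions, since the abstract does not specify them. It shows how the two parameters studied, analysis window length and frame shift, set the frame size and the number of frames produced (the overlap is simply window length minus frame shift).

```python
import numpy as np


def frame_signal(signal: np.ndarray, fs: int, win_ms: float, shift_ms: float) -> np.ndarray:
    """Split a 1-D signal into overlapping, windowed frames (hypothetical helper).

    fs       -- sampling rate in Hz
    win_ms   -- analysis window length in milliseconds
    shift_ms -- frame shift (hop) in milliseconds; overlap = win_ms - shift_ms
    Returns an array of shape (n_frames, win_len).
    """
    win_len = int(round(fs * win_ms / 1000.0))  # samples per frame
    hop = int(round(fs * shift_ms / 1000.0))    # samples between consecutive frame starts
    if win_len <= 0 or hop <= 0:
        raise ValueError("window length and frame shift must be positive")
    if len(signal) < win_len:
        return np.empty((0, win_len))
    # Keep only frames that fit entirely inside the signal
    n_frames = 1 + (len(signal) - win_len) // hop
    # Row i holds the sample indices of frame i
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # Assumption: a Hamming taper is applied to each frame before cepstral analysis
    return signal[idx] * np.hamming(win_len)


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    x = np.random.default_rng(0).standard_normal(fs)  # 1 s of placeholder signal
    for win_ms, shift_ms in [(25, 10), (10, 10), (10, 7.5)]:
        frames = frame_signal(x, fs, win_ms, shift_ms)
        print(f"window {win_ms} ms, shift {shift_ms} ms -> {frames.shape}")
```

For one second of 16 kHz audio, a 10 ms window with a 7.5 ms shift gives 133 frames of 160 samples, compared with 98 frames of 400 samples for a conventional 25 ms / 10 ms setting. Each frame would then be passed to the MFCC, LPCC or PLPC front end.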

Keywords
Computers and information processing; Speech analysis; Speech recognition; Speech enhancement
DOI
10.2478/ecce-2014-0009

Eringis, D., Tamulevičius, G. Improving Speech Recognition Rate through Analysis Parameters. Electrical, Control and Communication Engineering. Vol. 5, 2014, pp. 61–66. Available from: doi:10.2478/ecce-2014-0009

Publication language
English (en)