This article compares three processing techniques for the selection of speech features.
These features can be used for speaker recognition or speech recognition. The performance of three systems is compared: one based on the linear prediction code, one based on the cepstrum, and one based on the short-time energy. Feature selection strongly affects recognition accuracy. This work illustrates where each of these features is more efficient, for speaker recognition or for speech recognition.
The results show that the short-time energy in the time domain is very effective for speech recognition, where its accuracy is found to be 92%. In speaker identification, the accuracy for the features depending on the energy in each frame is found to be 60%. Features based on the cepstrum may be recommended, since they give accuracies of 94% and 96% for speech recognition and speaker identification, respectively. Linear prediction code features that depend on the cepstrum may also be recommended for speaker identification or speech recognition.
A recognition system for spoken digits is presented using the above features with neural networks. The neural network is used as the tool in this comparison.
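
As an illustration of the three feature types compared above, the following is a minimal sketch of how per-frame short-time energy, a real cepstrum, and linear prediction coefficients might be computed. It is not the article's implementation: the function names, the framing parameters (frame_len, hop), the model order, and the choice of the autocorrelation method with the Levinson-Durbin recursion for linear prediction are all assumptions made for this example, and a plain DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Time-domain feature: sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def dft(x):
    """Direct O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    eps (an assumed safeguard) avoids log(0) for spectral zeros.
    """
    n = len(frame)
    log_mag = [math.log(abs(c) + eps) for c in dft(frame)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def lpc(frame, order):
    """Linear prediction via autocorrelation and Levinson-Durbin.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[n] = x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order].
    """
    n = len(frame)
    r = [sum(frame[t] * frame[t + lag] for t in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

In a setup like the one described, each frame's feature vector (energies, low-order cepstral coefficients, or prediction coefficients) would then be passed to the neural network; how the article assembles these vectors is not stated here, so that step is left out of the sketch.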