Abstract:
In today's technologically advanced world, speech recognition systems have gained significant
importance in various applications. These systems are designed to convert spoken language into
written text. Several speech recognition systems exist for major global languages with a large
user base. However, a dedicated speech recognition model is lacking for the Kambaatissa language,
which is spoken only by a small community in a particular region of Ethiopia. Moreover, the lack of
suitable speech recognition tools hampers communication and access to technology for native
Kambaatissa speakers. This limits their participation in digital platforms, information retrieval,
and other language-dependent services. By leveraging hidden Markov models (HMMs), this study aimed to bridge this gap
by developing a speaker-dependent speech recognition system specifically tailored to the
Kambaatissa language community. Such a system can aid in creating language resources,
dictionaries, and educational materials for future generations. A voice corpus was generated using
4820 distinct words selected in consultation with subject-matter experts and recorded twice,
yielding audio from four speakers with both male and female voices. This corpus was
divided into four vocabulary sets, each recorded by a different native speaker of the
Kambaatissa language. In the first experiment, using the first vocabulary set (male voice),
844 words were used to train the model and 361 words to test its performance;
the results were WER = 1.1% and WAR = 98.9% with 8 states and 10
observables. In the second experiment, using the second vocabulary set (female voice), 844
words were again used for training and 361 for testing;
the results were WER = 0.8% and WAR = 99.2% with 8 states and 10 observables. The third
and fourth experiments used the third and fourth vocabulary sets, each with 844 unique words for
training and 361 for testing, with male and female voices respectively, using 10 states and 12
observables. Recognition performance was WER = 0.41% and WAR = 99.59% for the third experiment,
and WER = 0.37% and WAR = 99.63% for the fourth. The average
WER was 0.67% and the average WAR was 99.33%, which indicates good performance. It was
concluded that HMMs with a greater number of states and observables achieve higher recognition accuracy.
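The reported averages follow directly from the four per-experiment scores, and WAR is simply the complement of WER (word accuracy rate = 100% - word error rate). A quick sanity check of that arithmetic, using the values quoted in this abstract (variable names are illustrative, not from the study):

```python
# Per-experiment scores as reported in the abstract, in percent
wers = [1.1, 0.8, 0.41, 0.37]    # word error rates, experiments 1-4
wars = [98.9, 99.2, 99.59, 99.63]  # word accuracy rates, experiments 1-4

# Averages over the four experiments, rounded to two decimals
avg_wer = round(sum(wers) / len(wers), 2)  # -> 0.67
avg_war = round(sum(wars) / len(wars), 2)  # -> 99.33

# Each experiment's WAR should equal 100% minus its WER
for wer, war in zip(wers, wars):
    assert abs((wer + war) - 100.0) < 1e-9

print(avg_wer, avg_war)  # prints: 0.67 99.33
```

This confirms the averages of 0.67% WER and 99.33% WAR stated above.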