Abstract:
In today's technologically advanced world, speech recognition systems have gained significant
importance in various applications. These systems are designed to convert spoken language into
written text. Several speech recognition systems exist for major global languages with a large
user base. However, a dedicated speech recognition model is lacking for the Kambaatissa language,
which is spoken only by a small community in a particular region of Ethiopia. Moreover, the lack of
suitable speech recognition tools hampers communication and access to technology for native
Kambaatissa speakers. This limits their participation in digital platforms, information retrieval,
and other language-dependent services. By leveraging hidden Markov models (HMMs), this study aimed to bridge this gap
by developing a speaker-dependent speech recognition system specifically tailored to the
Kambaatissa language community. Such a system can aid in creating language resources,
dictionaries, and educational materials for future generations. A voice corpus was generated using
4820 distinct words selected in consultation with subject-matter experts and recorded twice,
yielding audio from four speakers with both male and female voices. This corpus was
divided into four vocabulary sets, each recorded by a different native speaker of the
Kambaatissa language. In the first experiment, using the first vocabulary set (male voice),
844 words were used to train the model and 361 words to test its performance;
the results were WER = 1.1% and WAR = 98.9% with 8 states and 10
observables. In the second experiment, using the second vocabulary set (female voice), 844
words were again used for training and 361 for testing;
the results were WER = 0.8% and WAR = 99.2% with 8 states and 10 observables. The third
and fourth experiments used the third and fourth vocabulary sets, each with 844 unique words for
training and 361 for testing, with male and female voices respectively, using 10 states and 12
observables. Recognition performance was WER = 0.41% and WAR = 99.59% for the third experiment,
and WER = 0.37% and WAR = 99.63% for the fourth. The average
WER was 0.67% and the average WAR was 99.33%, which indicates good performance. It was
concluded that HMMs with a greater number of states and observables achieve higher recognition accuracy.
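The reported averages follow directly from the four per-experiment scores, and WAR is simply the complement of WER (word accuracy rate = 100% - word error rate). A quick sanity check of that arithmetic, using the values quoted in this abstract (variable names are illustrative, not from the study):

```python
# Per-experiment scores as reported in the abstract, in percent
wers = [1.1, 0.8, 0.41, 0.37]    # word error rates, experiments 1-4
wars = [98.9, 99.2, 99.59, 99.63]  # word accuracy rates, experiments 1-4

# Averages over the four experiments, rounded to two decimals
avg_wer = round(sum(wers) / len(wers), 2)  # -> 0.67
avg_war = round(sum(wars) / len(wars), 2)  # -> 99.33

# Each experiment's WAR should equal 100% minus its WER
for wer, war in zip(wers, wars):
    assert abs((wer + war) - 100.0) < 1e-9

print(avg_wer, avg_war)  # prints: 0.67 99.33
```

This confirms the averages of 0.67% WER and 99.33% WAR stated above.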