dc.description.abstract |
A speech recognition system is sometimes mistaken for a voice or speaker recognition system.
However, they are different technologies. Speech recognition aims at understanding
what was spoken; it is used in hands-free computing and in map or menu navigation.
The objective of voice or speaker recognition, by contrast, is to recognize who is speaking; it is used
to identify a person by analyzing their tone, voice pitch, and accent. Speech recognition systems have
been developed for many foreign languages, and for English in particular a number of papers have been
published.
For local languages like Afan Oromo, on the other hand, the work is still in its infancy. Although Afan
Oromo may benefit from research conducted on other languages, it also needs its own specific
research, since there are many grammatical and syntactical differences between languages. This
thesis explored speech recognition for Afan Oromo and its applicability.
To lay the groundwork for the thesis, the 29 Afan Oromo phonemes and 5 loan phonemes were collected.
The phonemes were then grouped into 9 sentences, which were either spoken into a computer through a
microphone and stored, or synthesized with the Praat software and then stored on the
computer. The system comprises several stages: receiving the Afan Oromo speech signal,
preprocessing it, feature extraction, speech classification, and recognizing the speech. These
stages were implemented with artificial neural network toolboxes and some MATLAB
scripts.
To develop the system, a 21144 × 45 input dataset and a 9 × 45 target dataset were prepared. Of the
input data, 70% was used for training, while the remaining 30% was shared between validation
and testing. A confusion matrix was then produced, showing the correctly and incorrectly
classified samples.
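The partitioning described above can be sketched as follows. This is a minimal illustration, not the thesis code: the abstract does not say how the remaining 30% was divided, so the equal validation/test fractions, the function name `split_indices`, and the fixed seed are all assumptions made here.

```python
import random

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly partition sample indices into train/validation/test lists.

    val_frac=0.15 is an assumption: the abstract only says that 30% of the
    data was shared between validation and testing.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_train = int(n_samples * train_frac)  # floor, so 31 of 45
    n_val = int(n_samples * val_frac)      # floor, so 6 of 45
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # whatever remains
    return train, val, test

# 45 samples, matching the 21144 x 45 input dataset in the abstract
train_idx, val_idx, test_idx = split_indices(45)
print(len(train_idx), len(val_idx), len(test_idx))
```

With only 45 samples the fractions cannot be met exactly, so the sketch floors the training and validation counts and leaves the remainder for testing.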
Of the total samples, 91.1% were correctly classified into their corresponding classes, whereas the
remaining 8.9% were misclassified, that is, assigned to other classes.
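As a rough check on these figures, the sketch below computes overall accuracy from a confusion matrix. The matrix is hypothetical: it assumes all 45 samples were scored, 5 per class across the 9 classes, with 4 errors placed at arbitrary positions. Under that assumption the reported percentages correspond to 41 correct and 4 misclassified samples; the actual matrix from the thesis is not given in the abstract.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: diagonal (correct) counts divided by all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical 9-class matrix, rows = true class, columns = predicted class.
# Assumption: 5 samples per class (45 / 9), all initially correct.
n_classes = 9
matrix = [[5 if i == j else 0 for j in range(n_classes)]
          for i in range(n_classes)]

# Move four samples off the diagonal (positions are arbitrary, illustration only).
for true_cls, wrong_cls in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    matrix[true_cls][true_cls] -= 1
    matrix[true_cls][wrong_cls] += 1

acc = accuracy_from_confusion(matrix)
print(f"{acc:.1%} correct, {1 - acc:.1%} misclassified")
# prints "91.1% correct, 8.9% misclassified"
```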
Finally, the recognition ability of the system was tested with one sample of the MFCC training dataset at a
time, and the corresponding text form of the recognized sample was displayed. |
en_US |