Abstract:
The focus of this thesis is the design of a dictionary-based spelling checker for the Afaan Oromoo
language. We propose a dictionary lookup and n-gram approach to detect misspelled
words and provide correction suggestions. Accordingly, we built a dictionary from different
domains consisting of correctly spelled words. The prepared dictionary is used as a point of
reference in error detection and correction. Each input word from the user is checked against
this dictionary using a dictionary lookup approach. If the input word is found in the
dictionary, the spelling checker considers it
correctly spelled. Otherwise, the word is flagged as misspelled and the next task is to
generate possible corrections for it. To do so, we use an n-gram
approach that retrieves from the dictionary, as suggestions, those words that share bigrams
with the misspelled word. The generated suggestions are ranked
by their similarity to the misspelled word, so that the most likely
correction appears at the top of the suggestion list.
For this ranking, we employ Dice's coefficient as the similarity measure:
the candidate word with the highest Dice score is the most likely to
replace the misspelled word. To evaluate the performance of the spelling checker, we
used precision and recall as performance measures. A test dictionary
of 3000 words collected from different domains of the Afaan Oromoo language was used to test the
capability of the spelling checker to detect errors and provide corrections for the detected
words. The evaluation results show that the designed system achieved a
precision of 100% and a recall of 83.33%. Additionally, we conducted an experiment to
demonstrate the factors that could affect the performance of the designed spelling
checker under three selected circumstances.
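
The detection and ranking steps described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis. It assumes that bigrams are adjacent character pairs taken without padding, that each word is represented as a set of bigrams, and that Dice's coefficient is computed as 2|A ∩ B| / (|A| + |B|). The function names and the five-word toy dictionary are hypothetical and serve only as an illustration.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in a word (no padding; an assumption)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice(a, b):
    """Dice's coefficient over bigram sets: 2|A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # words shorter than two characters have no bigrams
    return 2 * len(x & y) / (len(x) + len(y))


def check(word, dictionary):
    """Return (is_correct, suggestions) for a single input word.

    Detection is a plain dictionary lookup. For a misspelled word, the
    candidates are the dictionary words sharing at least one bigram with it,
    ranked by descending Dice score (ties broken alphabetically).
    """
    if word.lower() in dictionary:
        return True, []
    target = bigrams(word)
    candidates = [w for w in dictionary if target & bigrams(w)]
    ranked = sorted(candidates, key=lambda w: (-dice(word, w), w))
    return False, ranked


# Hypothetical toy dictionary, stored in lowercase; the thesis dictionary is far larger.
TOY_DICTIONARY = {"barumsa", "barataa", "bishaan", "mana", "nama"}

print(check("barumsa", TOY_DICTIONARY))   # found in the dictionary, so no suggestions
print(check("barumsaa", TOY_DICTIONARY))  # not found, so ranked suggestions are returned
```

In this sketch, the input "barumsaa" shares six of its seven bigrams with "barumsa", giving a Dice score of 12/13, so "barumsa" is ranked first. Words with no bigram in common, such as "mana", are not suggested at all.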