dc.description.abstract |
Spellchecking is the process of detecting incorrectly spelled words in a text and suggesting
corrections for them. It is embedded in several applications, such as post-correction of digitized
handwritten text and correction of user-entered words in the retrieval process. This thesis describes the design
architecture, implementation, and testing of a model developed to detect and correct
both non-word and real-word spelling errors. The main focus of this study is to design a context-based spell checker
for Afan Oromo writing that relies on the spelling error patterns of the language and on the sequence
of words in the input sentence. The technique used for this spelling correction is
an unsupervised statistical approach. This approach helps to prepare manually
tagged data sets from the collected corpus for an under-resourced language such as Afan Oromo. The
process of spelling correction proceeds through three major phases: error detection,
candidate suggestion, and candidate ranking. Error detection is based on dictionary
lookup and bigram analysis. The researcher collected data from different sources
and prepared the dictionary and bigram model for error detection and correction. The non-word
error candidate generation is based on the similarity between the misspelled word and
the tokens in the dictionary; similarity is measured using the Levenshtein distance to each dictionary token,
and candidates are ranked accordingly. For real-word errors, bigram frequency was used to detect the error,
and bigram probability was computed to correct the misspelled word. To conduct the experiment,
14,896 and 3,231 words were used to train and test the model, respectively. The experimental results
show that the spell checker achieves a recall of 93.7% and an accuracy of 93.9% for both non-word and
real-word spelling errors. According to the obtained results, the accuracy of the system is 93.9%, which
indicates that the model is promising for correcting misspelled Afan Oromo words. We recommend
improving the quality of the designed model through a mixed approach (a rule-based
approach combined with N-grams). |
en_US |