Abstract:
Developing language applications or localizing software is a resource-intensive task that
requires the active participation of stakeholders with various backgrounds. Spell checking is
one significant application of computational linguistics. It is the process of
detecting incorrectly spelled words in a text and, in some cases, suggesting corrections for them.
The volume of text data in local languages is also growing fast, which calls for text-processing tools
that work on documents in those languages. A spell checker is vital for detecting and correcting
spelling errors in under-resourced languages like Amharic. This thesis describes the development,
implementation and testing of a model that detects and corrects non-word
and real-word typing errors made by writers of Amharic. The aim of this study is to
develop a context-based spell checker and corrector for Amharic that relies on the spelling error
patterns of the language and on the sequence of words in the input sentences.
Training and testing data sets were collected from various sources covering different topics to
balance the inclusiveness of the corpus. The texts were prepared and cleaned manually to remove
content that is not needed for detection and correction, such as numbers and
punctuation. An experimental research design was used to evaluate the performance of the developed
prototype system. In the experiments, 10,000 sentences were used to train the model and 500 sentences
to test it. According to the experimental results, the spell checker correctly classifies
Amharic words with a prediction accuracy of 95.62%, a lexical recall of 95.52% and a lexical precision
of 35.18% for non-word spelling errors. The performance of the context-sensitive spell checker
on real-word errors was also measured, giving a prediction accuracy of 64.93%, a lexical recall of 63.42% and an error
precision of 5.49%. Finally, since a comprehensive spell checker system has
to be capable of detecting errors and of resolving and ranking correction candidates using complementary
contextual and linguistic knowledge, we plan to extend the coverage of the system
by incorporating more syntactic and semantic knowledge through rule-based approaches, in order to improve
the quality of the developed system.
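
The idea of detecting a non-word and ranking its corrections by the surrounding words can be illustrated with a minimal sketch. The abstract does not state which model the thesis uses, so everything below is an assumption made for illustration: a bigram count model, candidate generation by edit distance 1, and the hypothetical names `train`, `candidates` and `correct`. The toy corpus uses Latin placeholder tokens rather than Amharic text; since the code works on Unicode strings, Ethiopic script input should behave the same way.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def train(sentences):
    """Count unigrams and bigrams from tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def candidates(word, unigrams, max_dist=1):
    """Vocabulary words within max_dist edits of the given word."""
    return [w for w in unigrams if edit_distance(word, w) <= max_dist]


def correct(tokens, unigrams, bigrams):
    """Replace each non-word with the candidate that best follows the previous word."""
    out = []
    for tok in tokens:
        if tok in unigrams:            # known word: accepted as is
            out.append(tok)
            continue
        cands = candidates(tok, unigrams)
        if not cands:                  # nothing close enough: leave unchanged
            out.append(tok)
            continue
        prev = out[-1] if out else None
        # Rank by bigram count with the previous word, then by word frequency.
        out.append(max(cands, key=lambda c: (bigrams[(prev, c)], unigrams[c])))
    return out


# Toy placeholder corpus (hypothetical data, not from the thesis).
corpus = [["the", "car", "stopped"], ["the", "car", "left"], ["a", "cat", "sat"]]
unigrams, bigrams = train(corpus)
print(correct(["a", "caz", "sat"], unigrams, bigrams))   # ['a', 'cat', 'sat']
print(correct(["the", "caz"], unigrams, bigrams))        # ['the', 'car']
```

The same misspelling is corrected differently depending on the preceding word, which is the sense in which the correction is context based. This sketch covers only non-word errors. Real-word errors would additionally require checking known words against their context, which is not shown here.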