Abstract:
Word Sense Disambiguation (WSD) is a task in Natural Language Processing whose main goal is to
identify the sense in which an ambiguous word is used in a particular context.
It is a fundamental problem for many natural language technology applications, such as Machine
Translation, Text Summarization, Question Answering, Information Extraction, Text Mining and
Information Retrieval. Because a word may have multiple senses, the problem is to determine which
sense is appropriate in a given context. Ambiguity is one cause of poor performance in search and
retrieval systems. The objective of this work is to develop a hybrid word sense
disambiguation system that determines the sense of a word from its surrounding context. To this
end, this study presents a WSD strategy that combines an unsupervised approach, which exploits
sense information in a corpus, with manually crafted rules. The idea behind the approach is to
overcome the training-data bottleneck of machine learning approaches: a hybrid method can improve
accuracy and remains suitable when training data are scarce. This makes our approach suitable
for disambiguation when resources and sense definitions are lacking. In this study, the context of
a given word is captured using term co-occurrences within a fixed-size window of words. The
optimal window sizes for extracting semantic contexts are ±1 and ±2, that is, one and two words to
the left and right of the ambiguous word. Similar contexts of an ambiguous word are grouped using
hierarchical and partitional clustering, with each cluster representing a unique sense. Some
ambiguous words have only two senses, while others have as many as five. The results show that
WSD achieves an accuracy of 70% with the unsupervised machine learning approach and 81.1% with
the hybrid approach. Machine learning is a useful information source for disambiguation, but it is
not as robust as a linguistic (rule-based) approach [89]. Accordingly, integrating deep linguistic
knowledge with machine learning improves disambiguation accuracy. For Afan Oromo, the study
therefore concludes that word senses are closely connected to the statistics of word usage.
The results achieved are encouraging, particularly given the approach's low resource requirements.
However, further experiments with different approaches that extend this work are needed to improve
performance.