Abstract:
Word Sense Disambiguation (WSD) is a technique in the field of Natural Language Processing whose main task is to find the appropriate sense of an ambiguous word in a particular context. It is a fundamental problem for many natural language technology applications (Machine Translation, Text Summarization, Question Answering, Information Extraction and Text Mining, and Information Retrieval). A word may have multiple senses, and the problem is to determine which sense is appropriate in a given context. Ambiguity is a cause of poor performance in search and retrieval systems. The objective of this work is to develop a hybrid word sense disambiguation system that finds the sense of a word based on its surrounding context. To this end, this study presents a Word Sense Disambiguation strategy that combines an unsupervised approach, which exploits senses in a corpus, with manually crafted rules. The idea behind the approach is to overcome the bottleneck that limits machine learning approaches; the hybrid method can improve accuracy and is suitable when training data is scarce. This makes our approach suitable for disambiguation when there is a lack of resources and sense definitions. In this study, the context of a given word is captured using term co-occurrences within a defined window of words. The optimal window sizes for extracting semantic context were one and two words to the left and right of the ambiguous word. Similar contexts of a given sense of an ambiguous word are clustered using hierarchical and partitional clustering, with each cluster representing a unique sense. Some ambiguous words have two senses, others up to five. The results show that WSD achieves an accuracy of 70% with the unsupervised machine learning approach and 81.1% with the hybrid approach. Machine learning is a useful information source for disambiguation, but it is not as robust as a linguistic (rule-based) approach [89].
Based on this, integrating deep linguistic knowledge with machine learning improves disambiguation accuracy. For Afan Oromo, the study therefore concludes that the senses of words are closely connected to the statistics of word usage. The result achieved is encouraging given the approach's low resource requirements. However, further experiments using different approaches that extend this work are needed for better performance.
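
As an illustration of the window-based context capture summarized above, the following is a minimal sketch and not the implementation used in this study. The function names `context_window` and `cooccurrence_counts`, the whitespace tokenization, and the English sample sentence are all assumptions made for illustration only.

```python
from collections import Counter

def context_window(tokens, target, window=2):
    """Return, for each occurrence of `target`, the words found up to
    `window` positions to its left and right (hypothetical helper)."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append(left + right)
    return contexts

def cooccurrence_counts(contexts):
    """Count how often each word co-occurs with the target across all contexts."""
    counts = Counter()
    for ctx in contexts:
        counts.update(ctx)
    return counts

# Illustrative English sentence (assumed data, not from the study's corpus).
tokens = "the bank approved the loan near the river bank today".split()
contexts = context_window(tokens, "bank", window=2)
counts = cooccurrence_counts(contexts)
```

Each context list could then be turned into a vector and grouped by a clustering algorithm, so that contexts sharing similar co-occurring words fall into the same sense cluster.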