Abstract:
In the 21st century, improving natural language applications can make communication between humans and computers easier. Words in a language may have two
or more meanings depending on the context in which they are used. Such words make communication between computers and humans difficult, because computers cannot identify
the proper meaning of an ambiguous word on their own. Word Sense Disambiguation (WSD) helps computers identify the intended meaning of an ambiguous word depending on the surrounding
context. In this study, we aimed to build a prototype WSD model for the Geez language, because
WSD is an intermediate task for other NLP tasks such as information extraction, machine translation, speech recognition, and information retrieval. There are three types of WSD approaches:
hybrid, knowledge-based, and corpus-based. Of these, we used a corpus-based approach to build the WSD model; corpus-based approaches can be further classified into supervised, semi-supervised, and unsupervised machine learning methods. We conducted our experiments on six
ambiguous words of the Geez language, collecting a total of 2119 sentences (instances) of the
language. The six ambiguous words are ሀለፈ (Halafe), ቆመ (ḱome), ባረከ (bareke),
አስተርዓየ (astaraya), ገብረ (gebira), and ሰዓለ (Se’ale). We applied four clustering algorithms (EM, Simple
K-Means, Farthest First, and Hierarchical Clusterer) and five classification algorithms
(ADTree, AdaBoostM1, SMO, Bagging, and Naïve Bayes) to cluster and classify
the sentences. We compared the corpus-based machine learning approaches, and we
found that the semi-supervised machine learning approach achieved the best performance. Using
the ADTree algorithm, the proposed method achieved average Precision, Recall, F1-score, and
Accuracy of 92.1%, 91.3%, 91%, and 91.1%, respectively. A window size
of 4-4 was optimal for identifying the meaning of the selected ambiguous
words of the Geez language using the ADTree algorithm.
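
To illustrate the window-size setting, the following is a minimal sketch of context-window extraction. It assumes that "4-4" means up to four tokens to the left and four to the right of the ambiguous word; the function name `context_window` and the placeholder tokens are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def context_window(
    tokens: List[str], target_index: int, left: int = 4, right: int = 4
) -> Tuple[List[str], List[str]]:
    """Return up to `left` tokens before and `right` tokens after the target.

    Assumption: a "4-4" window means four words on each side of the
    ambiguous word. Windows are truncated at sentence boundaries.
    """
    start = max(0, target_index - left)
    left_context = tokens[start:target_index]
    right_context = tokens[target_index + 1 : target_index + 1 + right]
    return left_context, right_context


# Placeholder tokens; a real input would be a tokenized Geez sentence
# containing one of the six ambiguous words.
tokens = ["w1", "w2", "w3", "w4", "w5", "TARGET", "w6", "w7", "w8", "w9", "w10"]
left_ctx, right_ctx = context_window(tokens, tokens.index("TARGET"))
print(left_ctx)   # ['w2', 'w3', 'w4', 'w5']
print(right_ctx)  # ['w6', 'w7', 'w8', 'w9']
```

The words in the two returned lists would then serve as features for the clustering and classification algorithms; how the study actually encodes them is not stated here.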