Jimma University Open access Institutional Repository

Corpus Based Word Sense Disambiguation For Geez Language

dc.contributor.author Aschale, Amlakie
dc.contributor.author Anlay, Kinde
dc.contributor.author Abdurahman, Fetulhak
dc.date.accessioned 2022-04-06T08:13:43Z
dc.date.available 2022-04-06T08:13:43Z
dc.date.issued 2020-03-30
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/6924
dc.description.abstract In the 21st century, improving natural language applications can make communication between humans and computers easier. A word in a language may have two or more meanings depending on the context in which it is used. Such words make communication between computers and humans difficult because computers cannot identify the proper meaning of an ambiguous word. Word Sense Disambiguation (WSD) helps computers identify the intended meaning of an ambiguous word from its surrounding context. In this study, we built a prototype WSD model for the Geez language, because WSD is an intermediate task for other NLP tasks such as information extraction, machine translation, speech recognition, and information retrieval. There are three types of WSD approaches: hybrid, knowledge-based, and corpus-based. We used the corpus-based approach, which can be further classified into supervised, semi-supervised, and unsupervised machine learning methods. We conducted our experiments on six ambiguous Geez words, collecting a total of 2119 sentences (instances). The six ambiguous words are ሀለፈ (Halafe), ቆመ (ḱome), ባረከ (bareke), አስተርዓየ (astaraya), ገብረ (gebira), and ሰዓለ (Se’ale). We applied four clustering algorithms (EM, Simple K-Means, Farthest First, and Hierarchical Clusterer) and five classification algorithms (ADTree, AdaBoostM1, SMO, Bagging, and Naïve Bayes) to cluster and classify the sentences. We compared the corpus-based machine learning approaches and found that the semi-supervised approach achieved the best performance. The proposed method achieved an average precision, recall, F1-score, and accuracy of 92.1%, 91.3%, 91%, and 91.1%, respectively, using the ADTree algorithm. A window size of 4-4 was optimal for identifying the meaning of the selected ambiguous Geez words using the ADTree algorithm. en_US
dc.language.iso en_US en_US
dc.subject Word Sense Disambiguation en_US
dc.subject Semi-supervised en_US
dc.subject ADTree en_US
dc.subject Geez Language en_US
dc.title Corpus Based Word Sense Disambiguation For Geez Language en_US
dc.type Thesis en_US

