Jimma University Open access Institutional Repository

Developing a Stemmer for Dawurootsuwa Language Using a Rule-Based Approach


dc.contributor.author Habtamu Dubale
dc.contributor.author Getachew Mamo
dc.contributor.author Zerihun Olana
dc.date.accessioned 2023-10-13T13:48:07Z
dc.date.available 2023-10-13T13:48:07Z
dc.date.issued 2023-06
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/8633
dc.description.abstract A stemmer is typically used as a standalone module in the architecture of NLP systems. It is particularly important for the development of search engines, machine translation (MT), speech recognition, text categorization, information extraction (IE), and text summarization. In the study of language morphology, stemming is the reduction of inflected (or occasionally derived) words to their stem, base, or root form. Despite this importance, no stemming algorithm has yet been developed for Dawurootsuwa, which hinders applying NLP applications to the language. Hence, the language needs an automatic word conflation system. This thesis describes a rule-based stemming system for the Dawuro language (Dawurootsuwa). It is based on the Porter stemmer, the most popular stemmer for English. The system takes a word as input and executes an algorithm organized as a sequence of steps, each made up of several rules. Every stemming rule in Dawurootsuwa applies in several contexts, and these contexts were taken into account when designing the stemmer. This kind of work requires a thorough understanding of the language's morphology, so Dawurootsuwa morphology was studied and described in detail in order to model the language and develop an automatic conflation procedure. The stemmer was designed by categorizing words according to their affixes. The outcome of the study is a rule-based, context-sensitive, iterative stemmer for Dawurootsuwa. Its performance was assessed mainly using the error counting technique. A test set of 3000 words, distinct from the training set, was collected mainly from several published papers on Dawurootsuwa morphology, religious books (such as the Bible), and a Dawurootsuwa-Amharic-English dictionary so that it covers a variety of topics. The evaluation shows that the algorithm produces 93.96 percent accurate outputs. A 6.02 percent error rate is considered to be typical, and other evaluation techniques are briefly explored at the end. en_US
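The general shape of a rule-based, context-sensitive, iterative stemmer evaluated by error counting, as described in the abstract, might look like the following minimal Python sketch. It is not the thesis's algorithm: the suffixes (`zozo`, `zo`, `xa`), the example words, the minimum-stem-length context condition, and all function names are hypothetical placeholders chosen for illustration, not actual Dawurootsuwa morphology.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    suffix: str        # suffix to strip (placeholder, NOT a real Dawurootsuwa affix)
    replacement: str   # text appended after the suffix is removed
    min_stem_len: int  # context condition: minimum length of the remaining stem


# Ordered steps, each holding several rules (Porter-style organization).
# All suffixes here are invented for illustration only.
STEPS: List[List[Rule]] = [
    [Rule("zozo", "", 3), Rule("zo", "", 3)],
    [Rule("xa", "", 3)],
]


def apply_step(word: str, rules: List[Rule]) -> str:
    """Apply at most one rule from a step, trying longer suffixes first."""
    for rule in sorted(rules, key=lambda r: len(r.suffix), reverse=True):
        if word.endswith(rule.suffix):
            stem_part = word[: len(word) - len(rule.suffix)]
            # The rule fires only when its context condition holds.
            if len(stem_part) >= rule.min_stem_len:
                return stem_part + rule.replacement
    return word


def stem(word: str) -> str:
    """Run all steps repeatedly until a full pass leaves the word unchanged."""
    current = word.lower()
    while True:
        before = current
        for rules in STEPS:
            current = apply_step(current, rules)
        if current == before:
            return current


def error_rate(pairs: List[Tuple[str, str]]) -> float:
    """Error counting: fraction of words whose output differs from the expected stem."""
    errors = sum(1 for word, expected in pairs if stem(word) != expected)
    return errors / len(pairs)
```

With these placeholder rules, `stem("tamaxazo")` strips `zo` and then `xa`, while `stem("maxa")` is left unchanged because removing `xa` would leave a stem shorter than the required three characters. That minimum-length check stands in for the context sensitivity mentioned in the abstract; a real rule set would encode conditions derived from the language's morphology.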
dc.language.iso en_US en_US
dc.subject NLP Applications, Stemming, Rule Based Stemming Algorithms, Porter Stemmer, Dawurootsuwa Rule Based Stemmer en_US
dc.title Developing a Stemmer for Dawurootsuwa Language Using a Rule-Based Approach en_US
dc.type Thesis en_US

