Automatic Afan Oromo Sentence Identification and Simplification Using Rule Based Approach

JUNDA, ABDUREHMAN MAHMUD

URI: https://repository.ju.edu.et//handle/123456789/6155

Date: 2021-12-20

Abstract:

In NLP, sentence identification and simplification are necessary for machine translation, parsing, question generation, information extraction, summarization, semantic role labeling, opinion mining, and similar applications. Most of these applications use simple sentences as a preprocessing step to improve their performance. Sentence simplification also serves a wide range of readers with language difficulties, such as people with aphasia, children, and adults learning the language (non-native speakers). This study presents a new automatic syntactic method for Afan Oromo sentence identification and simplification, using a rule-based approach that operates on POS tags. The work is divided into two tasks. The first task is the identification and separation of Afan Oromo declarative sentences into simple, compound, complex, and compound-complex sentences. The second task is the simplification of compound sentences into simple, self-contained sentences while preserving the original meaning as much as possible. Sentence identification and separation were performed to improve the performance of sentence simplification. A recursive algorithm was developed for both sentence identification and simplification, based on the syntactic structure of the sentences. To determine the syntactic structure of a sentence, a POS tagger is used as a preprocessor, after which the sentence type indicators and sentence simplification features are handled. To evaluate the algorithms, a dataset of 480 sentences was collected from an Afan Oromo textbook and annotated with the help of an expert. The performance of the sentence identification and compound sentence simplification algorithms was evaluated separately in terms of precision and recall, using the results of expert judgments.
The expert classified the identified and simplified sentences as correct or incorrect by comparing the system's output with the gold standard produced by the language expert. The sentence simplification evaluation criteria include the grammar and fluency of the simplified sentence and the retention of the original meaning. The overall F-scores for sentence identification and compound sentence simplification are 90% and 84.4%, respectively. The evaluation results indicate that the proposed algorithm is promising, as it is an early study that requires few resources.
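
To illustrate the general idea of recursively simplifying a compound sentence from its POS tags, here is a minimal Python sketch. It is not the algorithm from the thesis: the tag names `CC` (coordinating conjunction) and `VB` (verb), the function names, and the single splitting rule (split at a conjunction only when both sides contain a verb) are assumptions made for illustration, and the tokens are placeholders rather than real Afan Oromo words.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Hypothetical tag names; the abstract does not specify the actual tagset.
CONJ_TAG = "CC"
VERB_TAG = "VB"


def has_verb(tokens: List[Token]) -> bool:
    """Return True if any token carries the (assumed) verb tag."""
    return any(tag == VERB_TAG for _, tag in tokens)


def split_compound(tokens: List[Token]) -> List[List[Token]]:
    """Recursively split a tagged sentence at clause-level conjunctions.

    A conjunction is treated as clause-level only if both the left and
    the right side contain a verb; otherwise it is assumed to join
    smaller units (e.g. two nouns) and is left in place.
    """
    for i, (_, tag) in enumerate(tokens):
        if tag == CONJ_TAG:
            left, right = tokens[:i], tokens[i + 1:]
            if has_verb(left) and has_verb(right):
                # Keep the left clause and recurse on the remainder.
                return [left] + split_compound(right)
    return [tokens]


# Placeholder tokens (w1, w2, ...) stand in for real words.
tagged = [
    ("w1", "NN"), ("w2", "VB"), ("c1", "CC"),
    ("w3", "NN"), ("w4", "VB"), ("c2", "CC"),
    ("w5", "NN"), ("w6", "VB"),
]
clauses = split_compound(tagged)
```

A real system would also need rules for restoring omitted subjects and for the other sentence types named in the abstract; this sketch covers only the splitting step.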
