Jimma University Open access Institutional Repository

Development of dependency parser for Amharic sentences

dc.contributor.author Degu, Mizanu Zelalem
dc.contributor.author Gebeyehu, Worku Birhanie
dc.date.accessioned 2022-04-11T12:37:02Z
dc.date.available 2022-04-11T12:37:02Z
dc.date.issued 2022-01-29
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/6993
dc.description.abstract Dependency parsing provides information about the relationships between words in a sentence and has many applications in natural language processing. Several dependency parsing methods have been proposed in the literature for English and other European languages. No adequate dependency parser is available for Amharic, a Semitic language and a national language of Ethiopia. Because of its morphological structure and the scarcity of resources, adapting existing dependency parsers to Amharic is not effective. In this paper, a novel dependency parser for Amharic is proposed, based on a long short-term memory (LSTM) classifier and working in two steps: unlabeled dependency parsing and relation-label assignment. First, an arc-eager transition-action classifier was designed and trained on transition configurations generated from an Amharic treebank to predict the next transition action; the arc-eager transition algorithm then uses the classifier's output to produce an unlabeled dependency tree. Second, a relation-label classifier was designed and trained on pairs of part-of-speech tags of the head and dependent words from the treebank to assign an appropriate label to each dependency relation. Experiments were conducted on 1574 annotated sentences: 1074 from the Universal Dependencies Amharic treebank and 500 from a treebank prepared during this study. Both classifiers were tested on 30% of the dataset, with accuracies of 92% for the transition-action classifier and 81% for the relation-label classifier. The proposed system was also evaluated on 30% of the dataset, where it reached an unlabeled attachment score of 91.54% and a labeled attachment score of 86%. These experimental results demonstrate that the proposed system can be used to parse Amharic sentences and as a preprocessing tool in the development of other natural language processing tools. en_US
dc.language.iso en_US en_US
dc.subject Dependency parsing en_US
dc.subject Amharic en_US
dc.subject Under-resourced en_US
dc.subject LSTM en_US
dc.subject Arc-eager transition en_US
dc.title Development of dependency parser for Amharic sentences en_US
dc.type Article en_US
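
The pipeline summarized in the abstract (transition configurations derived from a treebank, an arc-eager parser driven by a predicted action, and unlabeled/labeled attachment scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`oracle_action`, `parse`, `attachment_scores`), the dictionary-based tree encoding, the fallback to SHIFT on an invalid prediction, and the attachment of leftover tokens to the root are all assumptions made for illustration, and the `predict` callable merely stands in for the trained LSTM transition-action classifier.

```python
from typing import Callable, Dict, List, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC, REDUCE = "SHIFT", "LEFT-ARC", "RIGHT-ARC", "REDUCE"
Heads = Dict[int, int]  # token index (1..n) -> head index (0 = artificial root)


def oracle_action(stack: List[int], buffer: List[int],
                  heads: Heads, gold: Heads) -> str:
    """Static arc-eager oracle: the action a gold tree prescribes for a
    configuration. Such (configuration, action) pairs are the kind of
    training data a transition-action classifier would learn from."""
    s, b = stack[-1], buffer[0]
    if s != 0 and gold[s] == b:
        return LEFT_ARC                      # buffer front heads stack top
    if gold[b] == s:
        return RIGHT_ARC                     # stack top heads buffer front
    # REDUCE when s already has its head and b still needs an arc to a
    # token lying below s on the stack (index smaller than s).
    if s in heads and any(gold[b] == k or gold.get(k) == b for k in range(s)):
        return REDUCE
    return SHIFT


def parse(n: int, predict: Callable[[List[int], List[int], Heads], str]
          ) -> Tuple[Heads, List[str]]:
    """Arc-eager parsing of n tokens; `predict` is a hypothetical stand-in
    for the trained classifier. Invalid predictions fall back to SHIFT."""
    stack, buffer = [0], list(range(1, n + 1))
    heads: Heads = {}
    actions: List[str] = []
    while buffer:
        s, b = stack[-1], buffer[0]
        action = predict(stack, buffer, heads)
        if action == LEFT_ARC and s != 0 and s not in heads:
            heads[s] = b
            stack.pop()
        elif action == RIGHT_ARC:
            heads[b] = s
            stack.append(buffer.pop(0))
        elif action == REDUCE and s in heads:
            stack.pop()
        else:
            action = SHIFT
            stack.append(buffer.pop(0))
        actions.append(action)
    for token in range(1, n + 1):            # assumption: attach leftovers to root
        heads.setdefault(token, 0)
    return heads, actions


def attachment_scores(pred_heads: Heads, pred_labels: Dict[int, str],
                      gold_heads: Heads, gold_labels: Dict[int, str]
                      ) -> Tuple[float, float]:
    """Return (UAS, LAS): the fraction of tokens with the correct head, and
    with the correct head and the correct relation label."""
    total = len(gold_heads)
    head_ok = [t for t in gold_heads if pred_heads.get(t) == gold_heads[t]]
    both_ok = [t for t in head_ok if pred_labels.get(t) == gold_labels[t]]
    return len(head_ok) / total, len(both_ok) / total


# Toy three-token tree: token 2 is the root, tokens 1 and 3 depend on it.
gold_heads = {1: 2, 2: 0, 3: 2}
gold_labels = {1: "nsubj", 2: "root", 3: "obj"}

if __name__ == "__main__":
    heads, actions = parse(
        3, lambda st, bu, hd: oracle_action(st, bu, hd, gold_heads))
    print(actions, heads)
    print(attachment_scores(heads, gold_labels, gold_heads, gold_labels))
```

Using the oracle as the predictor reproduces the gold tree for projective sentences, which is a convenient sanity check before substituting a learned classifier. In the system the abstract describes, relation labels come from a separate classifier over the part-of-speech tags of head and dependent, so here they are passed to the scoring function as a separate mapping.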

