Abstract:
Dependency parsing provides information regarding word relationships and has many applications in natural
language processing. Several dependency parsing methods have been proposed in the literature for English and
other European languages. However, no adequate dependency parser is available for Amharic, a Semitic language and a
national language of Ethiopia. Because of its morphological structure and limited resources, adapting
existing dependency parsers to Amharic is not efficient. In this paper, a novel dependency parser
system is proposed for the Amharic language, based on a long short-term memory (LSTM) classifier and working in two
steps: unlabeled dependency parsing and relation label assignment. First, an arc-eager transition-action classifier
was designed and trained on transition configurations generated from the Amharic treebank to predict the transition
action for each configuration. The output of the classifier is then used by the arc-eager transition algorithm to
produce an unlabeled dependency tree. Second, a
relation-label classifier was designed and trained on pairs of part-of-speech tags of the head and dependent
words from the treebank to assign an appropriate label to each dependency relation. Experiments were conducted on
1,574 annotated sentences: 1,074 from the Universal Dependencies Amharic treebank and 500 from a treebank
prepared during this study. Both classifiers were tested on 30% of the dataset, achieving accuracies of 92% and 81%
for the transition-action classifier and the relation-label classifier, respectively. The proposed system was
also evaluated on 30% of the dataset, reaching an unlabeled attachment score of 91.54% and a labeled attachment
score of 86%. Our experimental results demonstrate that the proposed system can be
used for parsing Amharic sentences and as a preprocessing component in the development of other natural language
processing tools.
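
The two-step pipeline and the attachment scores described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names arc_eager_parse and attachment_scores are hypothetical, the action sequence is hard-coded where the proposed system would use the predictions of the LSTM transition-action classifier, the relation labels are written by hand where the system would use the relation-label classifier, and the three-word English sentence is a toy stand-in for a treebank sentence.

```python
# Illustrative sketch only; all names and data here are assumptions.
# Words are numbered from 1, and index 0 is an artificial ROOT node.

def arc_eager_parse(n_words, actions):
    """Apply arc-eager actions to a sentence; return {dependent: head}."""
    stack = [0]                             # the stack starts with ROOT
    buffer = list(range(1, n_words + 1))    # words still to be processed
    heads = {}
    for action in actions:
        if action == "SHIFT":               # move buffer front onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":          # buffer front heads the stack top
            if stack[-1] == 0 or stack[-1] in heads:
                raise ValueError("LEFT-ARC precondition violated")
            heads[stack.pop()] = buffer[0]
        elif action == "RIGHT-ARC":         # stack top heads the buffer front
            dependent = buffer.pop(0)
            heads[dependent] = stack[-1]
            stack.append(dependent)
        elif action == "REDUCE":            # pop a word that already has a head
            if stack[-1] not in heads:
                raise ValueError("REDUCE precondition violated")
            stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


def attachment_scores(gold, pred):
    """gold and pred map dependent -> (head, label). Returns (UAS, LAS)."""
    total = len(gold)
    correct_heads = sum(
        1 for d, (h, _) in gold.items() if d in pred and pred[d][0] == h
    )
    correct_both = sum(1 for d, hl in gold.items() if pred.get(d) == hl)
    return correct_heads / total, correct_both / total


# Toy sentence "She reads books": 1 = She, 2 = reads, 3 = books.
# Step 1: unlabeled tree from a hand-written action sequence.
heads = arc_eager_parse(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"])

# Step 2: labels attached to each arc (hand-written, with one deliberate error).
predicted_labels = {1: "nsubj", 2: "root", 3: "nmod"}
pred = {d: (h, predicted_labels[d]) for d, h in heads.items()}
gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj")}

uas, las = attachment_scores(gold, pred)
print(heads, uas, las)
```

In this toy case every head is correct, so the unlabeled score is 1.0, while the single wrong label lowers the labeled score to 2/3. This shows why the labeled attachment score can never exceed the unlabeled one for the same parse.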