Abstract:
Dependency parsing extracts the relations between words or morphemes and uses dependency
types to resolve ambiguities between a head and its modifiers. Humans rely on facial
expressions, tone of voice, body language, and other cues to make natural language clearer
and easier to understand. Machines lack these cues and instead need well-formed, well-studied
language structures for both natural language understanding and generation. This need is
addressed by developing natural language processing applications. Accordingly, an Afaan Oromoo
dependency parser was developed to resolve ambiguities in the relations among Afaan Oromoo morphemes.
Although constituency parsers and universal dependency parsers exist, they do not handle
morpheme-level information effectively. Among dependency parsing approaches, a data-driven
approach was selected so that the parser can capture the morpheme and word-order features of
Afaan Oromoo. Within the data-driven family, a transition-based system was chosen because it is
simpler and faster than graph-based dependency parsers; specifically, the arc-standard system is
used to build an unlabeled dependency graph. The parser consists of two sub-models that work
independently. The first predicts parser transitions and thereby builds an unlabeled dependency
graph (tree). The second predicts relation types and thereby produces a labeled dependency
graph. A recurrent neural network (RNN) was selected to process sequences of Afaan Oromoo
morphemes and learn the patterns of the language.
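To make the arc-standard system concrete, the following Python sketch (standard library only) implements its three transitions (SHIFT, LEFT-ARC, RIGHT-ARC) over a configuration of stack, buffer, and arcs, together with a static oracle that turns a gold tree into (configuration, transition) pairs. It assumes that the training configurations for the first sub-model were derived from the treebank in roughly this way and that gold trees are projective; the names `Configuration` and `static_oracle` and the three-token example are hypothetical and not taken from the thesis. At parse time, a trained transition classifier (in this work, the LSTM or BiLSTM model) would choose each transition in place of the oracle.

```python
from typing import Dict, List, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"
ROOT = 0  # artificial root token; real tokens are numbered 1..n

# One training instance: (stack snapshot, buffer snapshot, gold transition).
Instance = Tuple[List[int], List[int], str]


class Configuration:
    """Arc-standard parser state: a stack, a buffer, and the arcs built so far."""

    def __init__(self, n_tokens: int) -> None:
        self.stack: List[int] = [ROOT]
        self.buffer: List[int] = list(range(1, n_tokens + 1))
        self.heads: Dict[int, int] = {}  # dependent -> head (unlabeled arcs)

    def is_terminal(self) -> bool:
        return not self.buffer and self.stack == [ROOT]

    def apply(self, transition: str) -> None:
        if transition == SHIFT:
            # Move the front of the buffer onto the stack.
            self.stack.append(self.buffer.pop(0))
        elif transition == LEFT_ARC:
            # s0 (top) becomes the head of s1 (second); s1 leaves the stack.
            dependent = self.stack.pop(-2)
            self.heads[dependent] = self.stack[-1]
        elif transition == RIGHT_ARC:
            # s1 becomes the head of s0; s0 leaves the stack.
            dependent = self.stack.pop()
            self.heads[dependent] = self.stack[-1]
        else:
            raise ValueError(f"unknown transition: {transition}")


def static_oracle(gold_heads: List[int]) -> Tuple[List[Instance], Dict[int, int]]:
    """Turn a projective gold tree into training instances.

    gold_heads[i] is the head of token i; index 0 is a placeholder for ROOT.
    """
    n = len(gold_heads) - 1
    config = Configuration(n)
    pairs: List[Instance] = []
    while not config.is_terminal():
        transition = SHIFT
        if len(config.stack) >= 2:
            s0, s1 = config.stack[-1], config.stack[-2]
            # RIGHT-ARC is only safe once s0 has collected all its own dependents.
            s0_done = all(config.heads.get(d) == s0
                          for d in range(1, n + 1) if gold_heads[d] == s0)
            if s1 != ROOT and gold_heads[s1] == s0:
                transition = LEFT_ARC
            elif gold_heads[s0] == s1 and s0_done:
                transition = RIGHT_ARC
        if transition == SHIFT and not config.buffer:
            raise ValueError("gold tree is not projective")
        pairs.append((list(config.stack), list(config.buffer), transition))
        config.apply(transition)
    return pairs, config.heads


# Hypothetical 3-token SOV sentence: tokens 1 and 2 depend on the verb (3),
# and the verb depends on ROOT.
pairs, heads = static_oracle([-1, 3, 3, 0])
print([t for _, _, t in pairs])
# ['SHIFT', 'SHIFT', 'SHIFT', 'LEFT-ARC', 'LEFT-ARC', 'RIGHT-ARC']
```

For a sentence of n tokens, the arc-standard system always performs exactly 2n transitions (n SHIFTs and n arc transitions), so under this setup each sentence contributes 2n configuration instances to the transition classifier.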
A treebank of 500 sentences was constructed. For the first model, 3,480 configuration instances
were used for training and 1,740 for testing; for the second model, 1,000 head-dependent pairs
were used for training and 415 for testing. Both LSTM and BiLSTM networks were evaluated, and
the BiLSTM achieved higher accuracy in classifying both transitions and relations. The first
model reached 90% accuracy with the BiLSTM and 89% with the LSTM; the second model reached 71%
with the BiLSTM and 69% with the LSTM.
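The raw counts above translate into the following train/test proportions; this is only arithmetic on the reported numbers, not an additional result:

```latex
\begin{align*}
\text{Model 1: } & \frac{3480}{3480 + 1740} = \frac{3480}{5220} = \frac{2}{3} \approx 66.7\%\ \text{(train)},
  & \frac{1740}{5220} = \frac{1}{3} \approx 33.3\%\ \text{(test)} \\
\text{Model 2: } & \frac{1000}{1000 + 415} = \frac{1000}{1415} \approx 70.7\%\ \text{(train)},
  & \frac{415}{1415} \approx 29.3\%\ \text{(test)}
\end{align*}
```

The first model therefore uses an exact 2:1 split, while the second uses roughly a 71:29 split.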
With the BiLSTM, the complete parser achieved an unlabeled attachment score (UAS) of 60% and a
labeled attachment score (LAS) of 40%. In summary, the performance of the deep learning models
increases with corpus size, and a larger set of dependency labels better clarifies the relations
between morphemes.
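The two reported evaluation metrics follow their standard definitions: UAS is the fraction of tokens whose predicted head is correct, and LAS is the fraction whose predicted head and relation label are both correct, so LAS can never exceed UAS. The Python sketch below computes both over a flat list of aligned tokens; the function name `attachment_scores`, the label strings, and the three-token example are hypothetical illustrations, not the thesis's evaluation script or label set.

```python
from typing import Sequence, Tuple

Token = Tuple[int, str]  # (head index, relation label) for one token


def attachment_scores(gold: Sequence[Token], pred: Sequence[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) over aligned gold and predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    if not gold:
        raise ValueError("cannot score an empty token list")
    # UAS counts tokens whose predicted head matches the gold head.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head AND relation label both match.
    correct_labeled = sum(g == p for g, p in zip(gold, pred))
    return correct_heads / len(gold), correct_labeled / len(gold)


# Hypothetical gold and predicted analyses of one 3-token sentence.
gold = [(3, "subj"), (3, "obj"), (0, "root")]
pred = [(3, "subj"), (3, "subj"), (1, "root")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.67 LAS=0.33
```

Assuming the relation sub-model labels the arcs produced by the transition sub-model, a token only counts toward LAS when both independent sub-models are right for it, so their errors compound; this is consistent with the gap between the reported 60% UAS and 40% LAS.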