Abstract:
This study focuses on building a Named Entity Recognition (NER) system for the Kafi Noonoo
language. NER is frequently used in Information Extraction (IE), with the goal of classifying
the tokens of a given sentence into predefined Named Entity categories. The approach we
followed is an ensemble method that combines four techniques: a Hidden Markov Model (HMM)
machine learning algorithm, rule-based, pattern matching, and dictionary-based techniques.
We used a voting and priority technique to select the final Named Entity from the candidate
Named Entities recognized by each model. By employing these methods, we make use of the
strengths of each technique; in the end, combining the different approaches increases the
efficiency of our NER system.
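As an illustration of how voting and priority can be combined, the following is a minimal sketch and not the implementation used in this study. It assumes that each of the four models proposes one label per token, that the majority label wins, and that ties are broken by a fixed model priority. The function name `select_label`, the model keys, the label names, and the priority order are all hypothetical; the abstract does not specify them.

```python
from collections import Counter

# Hypothetical priority order (highest first); the actual order is not
# stated in the abstract and is assumed here only for illustration.
PRIORITY = ["rule", "pattern", "dictionary", "hmm"]


def select_label(candidates, priority=PRIORITY):
    """Pick one final label for a token from per-model candidate labels.

    candidates maps a model name to the label that model proposed,
    e.g. {"hmm": "PER", "rule": "PER", "pattern": "O", "dictionary": "LOC"}.
    """
    if not candidates:
        return "O"  # no model produced a candidate: treat as non-entity

    # Step 1: voting. Count how many models proposed each label.
    votes = Counter(candidates.values())
    top = max(votes.values())
    winners = {label for label, count in votes.items() if count == top}
    if len(winners) == 1:
        return winners.pop()

    # Step 2: priority. On a tie, take the label proposed by the
    # highest-priority model among the tied labels.
    for model in priority:
        if candidates.get(model) in winners:
            return candidates[model]
    return "O"
```

With this sketch, a token labeled `PER` by two models wins by vote, while a two-against-two tie is resolved in favor of whichever tied label comes from the model listed first in `PRIORITY`.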
We collected the data from three main sources, namely Kaffa TV (70% of the data), the
Kaffa Zone Administration bureau (11% of the data), and elementary school books (13% of
the data). The corpus contains a total of 18,090 words. The experiment was conducted using
the Jupyter Notebook computing platform. Unlabeled Kafi Noonoo sentences were given to the
system for evaluation. Comparing the output of the proposed model (actual output) with the
human-annotated one (expected output), we obtained a Precision of 87.54%, a Recall of
86.85%, and an F1 measure of 87.19%.
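For readers unfamiliar with these measures, the sketch below shows one common way to compute them by comparing predicted labels with human-annotated labels token by token. It is an assumption-laden illustration, not the evaluation script of this study: the function names, the `"O"` non-entity label, and token-level (rather than entity-level) matching are all assumed.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, expected, outside="O"):
    """Token-level scores; `outside` marks tokens that are not entities."""
    if len(predicted) != len(expected):
        raise ValueError("label sequences must have the same length")

    # A true positive is an entity token whose predicted label matches
    # the human-annotated label.
    true_pos = sum(
        1 for p, e in zip(predicted, expected) if p == e and e != outside
    )
    n_predicted = sum(1 for p in predicted if p != outside)
    n_expected = sum(1 for e in expected if e != outside)

    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_expected if n_expected else 0.0
    return precision, recall, f1_score(precision, recall)
```

As a consistency check, `f1_score(87.54, 86.85)` rounds to 87.19, which agrees with the F1 measure reported above.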
Our model is relatively effective at recognizing miscellaneous named entities
(DateTime, currency, and percentage values). The machine learning and dictionary-based
techniques are highly dependent on the training data and the gazetteer, respectively.