Abstract:
Named Entity Recognition (NER) is an essential and challenging task in Natural Language Processing (NLP), particularly for resource-limited languages
like Afaan Oromo (AO). NER is a sequence tagging task that assigns
a label to each token, where a single entity may span multiple tokens. In this research, we
propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM
(BiLSTM) networks, LSTM with a Conditional Random Field layer (LSTM-CRF), and
bidirectional LSTM with a CRF layer (BiLSTM-CRF). We show that the
BiLSTM-CRF model can efficiently use both past and future input features
through its bidirectional LSTM layer. The proposed approach aims to automate feature design, which is otherwise done manually, and to avoid dependence on other natural language processing
tasks for classification features. In this paper, candidate feature information,
represented as words with their indices, is generated from text files by the neural network. These generated features are then used for Afaan
Oromo named entity classification. An Afaan Oromo NE corpus has been developed based on the CoNLL-2002 BIO tagging scheme. Four NE categories
have been identified and used in this research work: person, location, organization, and miscellaneous. The miscellaneous category includes date/time,
monetary value, and percentage.
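
To illustrate the BIO tagging scheme with the four categories above, the following minimal Python sketch groups a tag sequence into entity spans. The label names PER, LOC, ORG, and MISC follow the common CoNLL convention and are an assumption here, since the exact label strings used in the corpus are not given; the function name and the sample tag sequence are hypothetical.

```python
# Assumed label set (CoNLL-style); the corpus's exact label strings are not stated.
LABELS = ("PER", "LOC", "ORG", "MISC")


def bio_to_spans(tags):
    """Group a BIO tag sequence into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open entity continues that entity.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the entity that is currently open, if there is one.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # A B- tag opens a new entity. A stray I- tag is treated leniently
        # as the start of a new entity (a design choice of this sketch).
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tag sequence for a seven-token sentence.
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-MISC", "I-MISC"]
spans = bio_to_spans(tags)
print(spans)
```

Here the first two tokens form one person entity, which shows how a single entity can cover multiple tokens while each token still receives its own tag.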
We used the AONER corpus of around 700 Afaan Oromo sentences:
567 sentences for training, 67 sentences for validation, and
70 sentences for testing. We obtained an accuracy
of 74.14% using the bidirectional LSTM with a CRF layer (BiLSTM-CRF) for
the AONER model.
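
As a quick check of the data split reported above, the short Python sketch below sums the three subsets and computes each one's share of the corpus. Only the three sentence counts come from the text; the variable names are illustrative.

```python
# Sentence counts as reported in the abstract.
split = {"train": 567, "validation": 67, "test": 70}

# Total corpus size implied by the three subsets.
total = sum(split.values())

# Percentage share of each subset, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in split.items()}

print(total)
print(shares)
```

The subsets sum to 704 sentences, consistent with "around 700", and correspond roughly to an 80/10/10 split.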