Enhancing performance of Kafinoonoo Named Entity Recognition Using Machine and Deep Learning

Emiru, Mulugeta

URI: https://repository.ju.edu.et//handle/123456789/9437

Date: 2024

Abstract:

Named Entity Recognition (NER) is an essential task for many applications in Natural Language Processing (NLP), the field that aims to make computers capable of comprehending and processing human language. This study addresses the challenge of developing an NER system for Kafinoonoo, a low-resource language spoken in Ethiopia with limited linguistic resources and NLP infrastructure. The main objective was to develop and assess a deep learning-based NER system tailored to the characteristics of the Kafinoonoo language. We explored combinations of input representation methods, context encoders, and tag decoders to determine the most effective architecture. The input representations included traditional word embeddings such as Word2Vec and FastText, as well as transformer-based language models such as BERT and RoBERTa, which provide richer, context-sensitive representations. Recurrent neural networks, specifically BiLSTM and BiGRU, were used as context encoders to capture language patterns, while Softmax and Conditional Random Fields (CRF) served as tag decoders for classifying named entities. The results showed that models combining transformer-based representations with recurrent encoders and CRF decoders consistently outperformed other configurations. Notably, the combination of RoBERTa with BiLSTM and CRF achieved the highest F1 score of 0.90, underscoring the effectiveness of advanced architectures and CRF decoding in improving tagging accuracy. This research contributes to Kafinoonoo language preservation, search engine efficiency, and information extraction, promoting digital literacy and cultural preservation within the Kafecho community. It addresses the unique linguistic difficulties of Kafinoonoo and offers a strong basis for developing NER systems for other low-resource languages. Despite these contributions, the work has limitations that may affect generalizability, such as a limited dataset size and the lack of comprehensively annotated corpora. To improve performance, future work should concentrate on growing annotated datasets, applying semi-supervised and unsupervised learning approaches, and investigating data augmentation strategies.
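
As a rough illustration of why a CRF tag decoder can outperform a Softmax decoder, the sketch below contrasts per-token argmax decoding with Viterbi decoding over transition scores. It is not the study's implementation: the tag set, the emission scores, the transition scores, and the function names are all hypothetical values chosen only to show the mechanism.

```python
# Illustrative sketch only: tags and scores are invented, not taken from the study.
TAGS = ["O", "B-PER", "I-PER"]

# Hypothetical per-token tag scores (one row per token), as an encoder might emit.
EMISSIONS = [
    [1.0, 0.9, 0.1],  # token 0: "O" scores slightly higher than "B-PER"
    [0.2, 0.3, 1.5],  # token 1: "I-PER" scores highest
]

# Hypothetical transition scores, TRANSITIONS[prev][cur].
# "O" -> "I-PER" is heavily penalised because it is not a valid BIO sequence.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-PER
    [0.0, 0.0, 0.5],    # from I-PER
]


def softmax_decode(emissions):
    """Choose the best tag for each token independently (argmax per row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """Return the tag sequence with the highest total emission + transition score."""
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(n_tags):
            # Best previous tag for reaching `cur` at this position.
            best_prev = max(
                range(n_tags), key=lambda p: score[p] + transitions[p][cur]
            )
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur] + row[cur])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(n_tags), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


independent = [TAGS[i] for i in softmax_decode(EMISSIONS)]
structured = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(independent)  # ['O', 'I-PER']
print(structured)   # ['B-PER', 'I-PER']
```

With these made-up scores, independent decoding yields an entity that starts with an inside tag, while the sequence-level decoder prefers the well-formed alternative. In a trained CRF the transition scores are learned from annotated data rather than set by hand.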
