Abstract:
Natural language refers to human languages such as Awngi and Amharic, as opposed to artificial or
programming languages such as C++, Java, and Pascal.
Natural language processing (NLP) is a major field of study in computer science and related
disciplines. NLP increases the ability of computers to understand, interpret, and communicate
using human languages. It is a branch of computational linguistics concerned with the
automated computer processing of natural language, such as speech or text.
Part-of-speech tagging, one of the major tasks of NLP, automatically assigns each word of a text
a label that can be used to determine the structure of a sentence. The part-of-speech tagger
developed in this thesis is based on probability theory.
The purpose of this thesis is to develop a part-of-speech tagger for the Awngi language using a
Hidden Markov Model (HMM). Most natural language processing systems use a part-of-speech (POS)
tagger as one of their components.
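As an illustration of the general technique, the following is a minimal sketch of a bigram HMM
tagger with Viterbi decoding. It is not the implementation used in this thesis. The add-one
smoothing, the function names, and the English toy corpus with its three tags are all
assumptions made for the example; they stand in for the Awngi data and tagset.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag (assumed convention)
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][t] = (log probability of the best path ending in t at i, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions["<s>"], t, n_tags)
                 + log_prob(emissions[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag to recover the path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy corpus; a real run would use the annotated Awngi sentences.
toy_corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
model = train_hmm(toy_corpus)
predicted = viterbi(["the", "cat", "runs"], *model)
print(predicted)
```

The sketch works in log space so that products of many small probabilities do not underflow on
longer sentences.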
The literature on Awngi grammar and morphology is reviewed to understand the nature of the
language and to identify possible tags. Based on this review, a tagset of 23 tags is identified.
For the study, we collected 350 sentences (3,760 words in total, covering both the training and
test sets).
The performance of the Awngi HMM POS tagger is tested using tenfold cross-validation. The
experimental results indicate that the unigram and bigram taggers tag words with 85.16% and
87.84% accuracy, respectively. Based on these results, conclusions and recommendations are
presented.
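The evaluation procedure can be sketched as follows, assuming a simple most-frequent-tag unigram
baseline and folds formed by taking every tenth sentence. How the folds were actually formed in
this study is not stated here, so the splitting scheme, the function names, and the toy corpus
are assumptions for illustration only. With 350 sentences and equal folds, each round would hold
out 35 sentences for testing and train on the remaining 315.

```python
from collections import Counter, defaultdict

def k_fold_splits(sentences, k=10):
    """Yield (train, test) pairs; fold i holds out every k-th sentence starting at i."""
    for i in range(k):
        test = sentences[i::k]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, test

def train_unigram(tagged_sentences):
    """Map each word to its most frequent tag; unknown words get the overall top tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag

def accuracy(lexicon, default_tag, test_sentences):
    """Fraction of test words whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in test_sentences:
        for word, gold in sentence:
            correct += lexicon.get(word, default_tag) == gold
            total += 1
    return correct / total if total else 0.0

def cross_validate(sentences, k=10):
    """Average word-level accuracy over the k held-out folds."""
    scores = [
        accuracy(*train_unigram(train), test)
        for train, test in k_fold_splits(sentences, k)
    ]
    return sum(scores) / len(scores)

# Hypothetical toy corpus of 20 sentences; not the Awngi data.
toy_corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
] * 10
mean_accuracy = cross_validate(toy_corpus, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.4f}")
```

Averaging over ten held-out folds uses every sentence for testing exactly once, which matters
for a corpus of this size, where a single fixed test split would be small.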