Parts of Speech Tagging for Awngi Language

Wubetu Barud

URI: https://repository.ju.edu.et//handle/123456789/4643

Date: 2016-11

Abstract:

Natural language refers to human languages such as Awngi and Amharic, as opposed to artificial or programming languages such as C++, Java, and Pascal. Natural language processing (NLP) is a major field of study in computer science and related disciplines. NLP increases the ability of computers to understand, interpret, and communicate using human languages. It is a branch of computational linguistics concerned with the automated processing of natural language, such as speech acts or texts. Part-of-speech (POS) tagging, one of the major tasks of NLP, automatically assigns each word of a text a label that can be used to determine the structure of a sentence. Most natural language processing systems use a POS tagger as one of their components. The purpose of this thesis is to develop a POS tagger for the Awngi language using a Hidden Markov Model (HMM); the tagger is therefore based on probability theory. The literature on Awngi grammar and morphology was reviewed to understand the nature of the language and to identify a possible tagset. Based on this review, 23 tags were identified, and 350 sentences (3,760 words in total, covering both the training and test sets) were collected for the study. The performance of the Awngi HMM POS tagger was tested using tenfold cross-validation. The experimental results indicate that the unigram and bigram taggers tag words with 85.16% and 87.84% accuracy, respectively. Based on these results, conclusions and recommendations are put forward.
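
As an illustration of the general technique named in the abstract, the following is a minimal sketch of a bigram HMM tagger decoded with the Viterbi algorithm. It is not the thesis implementation: the toy English corpus, the three-tag tagset, the add-one smoothing, and the function names (train_hmm, log_prob, viterbi) are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (English placeholders, NOT the Awngi data of the thesis).
TRAIN = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("the", "DET"), ("cat", "N"), ("runs", "V")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"              # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1      # one extra outcome for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

With this toy model, viterbi(["the", "cat", "sleeps"], train_hmm(TRAIN)) tags the three words as DET, N, V. A tenfold cross-validation like the one described above would repeat training and tagging over ten train/test splits and average the per-word accuracy.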
