Abstract:
Automatic text categorization is a supervised learning task, defined as assigning category labels
to new documents based on the likelihood suggested by a training set of labeled documents. The
world is changing rapidly, and the impact of the technology and communications revolution
has grown accordingly. People have long recognized the importance of archiving and finding
information, but only with the advent of computers and the progress of information
technology has it become possible to store and share large amounts of information. Finding useful
information in such collections has in turn become a necessity.
Currently, the Oromia Radio and Television Organization categorizes its news items manually
in its day-to-day activities. Although it uses database systems to store and dispatch
information, the stored information is not organized.
The objective of this research is to apply machine learning techniques to
Afan Oromo news text categorization using the Naïve Bayes, Sequential Minimal Optimization and
J48 classifier algorithms, and to recommend the best one for the problem at hand. The classifiers
are trained and tested on Afan Oromo news items of five classes, collected from the Oromia Radio
and Television Organization and the Voice of America Afaan Oromoo program. Before
the classifiers are applied, the prepared documents are preprocessed.
In the preprocessing steps, digits, punctuation marks and extra characters are removed. Following this,
compound words are merged and stop words are removed. Finally, the documents are
transformed into a term matrix with weighted values to perform the categorization.
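
The preprocessing steps described above can be illustrated with a minimal sketch in Python. It is not the implementation used in this research. The stop-word list, the compound-word table, the choice to keep the apostrophe inside words, and the use of TF-IDF as the weighting scheme are all assumptions made for illustration, and the function names `preprocess` and `tfidf_matrix` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder resources (assumption): a real system would use
# curated Afan Oromo stop-word and compound-word lists.
STOP_WORDS = {"fi", "kan", "akka"}
COMPOUND_WORDS = {("mana", "barumsaa"): "manabarumsaa"}


def preprocess(text):
    """Clean one news item and return its list of tokens."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)        # remove digits
    text = re.sub(r"[^a-z'\s]", " ", text)  # remove punctuation and extra characters
    tokens = text.split()

    # Merge adjacent tokens that form a known compound word.
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUND_WORDS:
            merged.append(COMPOUND_WORDS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Remove stop words.
    return [t for t in merged if t not in STOP_WORDS]


def tfidf_matrix(documents):
    """Build a document-term matrix weighted by TF-IDF (assumed weighting)."""
    tokenized = [preprocess(d) for d in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for doc in tokenized for t in set(doc))

    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append(
            [counts[t] * math.log(n_docs / doc_freq[t]) for t in vocabulary]
        )
    return vocabulary, matrix


if __name__ == "__main__":
    news = [
        "Mana barumsaa 12 haaraa banameera!",
        "Kubbaa miilaa fi mana haaraa.",
    ]
    vocabulary, matrix = tfidf_matrix(news)
    print(vocabulary)
    for row in matrix:
        print([round(w, 3) for w in row])
```

Each row of the resulting matrix corresponds to one news item and each column to one term. Under this weighting, a term that occurs in every document receives a weight of zero, so it does not help to distinguish between categories.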