dc.description.abstract |
With the rapid development of information technology, the world is flooded with information.
Information has also become the most valuable and important resource of this fast-growing
information society. Today, with digitally stored information available in abundance, even for
many minor languages, this information must be filtered and extracted by some means to avoid
drowning in it. Automatic text summarization is one such technique, in which a computer
condenses a longer text into a shorter, non-redundant form. This thesis therefore focuses on
automatic text summarization for the Afan Oromo language by extracting sentences from the
original source documents, with the summaries evaluated by five human evaluators. The
data used for this study are nine corpora collected from different websites. The field of
automatic text summarization began with classical approaches that extracted sentences from the
original document, attempting to identify its most salient sentences using thematic
features. This research was intended to develop an extraction-based automatic text
summarizer for the Afan Oromo language using two different features, namely term frequency
and title words, to achieve accurate summaries. The proposed method was evaluated
by comparing the machine-generated summaries with human summaries. Results show that the title
word feature is the best individual feature for extracting the most informative sentences from
Afan Oromo text.
In the experiments, the system achieved a recall of 0.37 (37%), a precision of 0.33 (33%),
and an F-score of 0.35 (35%) with the term frequency method. With the title word method, it
achieved a recall of 0.52 (52%), a precision of 0.39 (39%), and an F-score of 0.44 (44%), an
improvement over the term frequency method. Overall, the experimental results show that the
title word feature outperforms term frequency in both subjective and objective
evaluations. |
en_US |