Automatic Text Summarization For Afan Oromo

Gemechu Kena

URI: https://repository.ju.edu.et//handle/123456789/4575

Date: 2014-06

Abstract:

With the rapid development of information technology, the world is flooded with information, and information has become the most valuable resource of this fast-growing information society. Today, digitally stored information is available in abundance even for many minor languages, and it must be filtered and extracted by some means in order to avoid drowning in it. Automatic text summarization is one such technique: a computer condenses a longer text into a shorter, non-redundant form. This thesis focuses on automatic text summarization for the Afan Oromo language by extracting sentences from the original source documents, and on the evaluation of the resulting summaries by five human evaluators. The data used for this study are nine corpora collected from different websites. The field of automatic text summarization began with classical approaches that extract sentences from the original document, attempting to identify its most salient sentences using thematic features. This research set out to develop an extraction-based automatic text summarizer for Afan Oromo using two features, term frequency and title words, to achieve accurate summaries. The proposed method was evaluated by comparing the machine-generated summaries with human summaries. The results show that the title word feature is the best individual feature for extracting the most informative sentences from Afan Oromo text. In the experiments, the term frequency method achieved a recall of 0.37 (37%), a precision of 0.33 (33%), and an F-score of 0.35 (35%). The title word method achieved a recall of 0.52 (52%), a precision of 0.39 (39%), and an F-score of 0.44 (44%), an improvement over term frequency. Overall, the experimental results show that the title word feature outperforms term frequency in both subjective and objective evaluations.
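The two features and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the tokenizer, the sentence splitter, the normalization of each score, the absence of stop-word removal and stemming, and the sentence-level comparison against a human summary are all assumptions made for illustration, and every function name is hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase Latin-script words, keeping internal apostrophes.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def split_sentences(text):
    # Assumed splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_frequency_scores(sentences):
    # Score each sentence by the average document-wide frequency of its words.
    freq = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores


def title_word_scores(sentences, title):
    # Score each sentence by the fraction of title words it contains.
    title_words = set(tokenize(title))
    if not title_words:
        return [0.0] * len(sentences)
    return [len(title_words & set(tokenize(s))) / len(title_words) for s in sentences]


def extract(scores, k):
    # Pick the k highest-scoring sentence indices (ties go to the earlier
    # sentence) and return them in document order.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def precision_recall_f(system, reference):
    # Compare selected sentence indices against a human-selected set.
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy English data, used only to exercise the functions.
title = "Solar power"
text = "Solar power is growing. Rain fell on the town. The power grid uses solar panels."
sentences = split_sentences(text)
selected = extract(title_word_scores(sentences, title), k=2)
print(selected, precision_recall_f(selected, reference=[0, 1]))
```

The F-score here is the harmonic mean of precision and recall; applied to the reported term frequency figures (precision 0.33, recall 0.37), it gives approximately 0.35, consistent with the F-score stated in the abstract.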
