Jimma University Open access Institutional Repository

Amharic Text Summarization for News Items posted on Social Media

dc.contributor.author Abaynew Guadie
dc.contributor.author Debela Tesfaye
dc.contributor.author Teferi Kebebew
dc.date.accessioned 2021-02-11T12:53:38Z
dc.date.available 2021-02-11T12:53:38Z
dc.date.issued 2017
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/5555
dc.description.abstract Information overload on social media is growing: as natural language applications develop for speakers of the local Amharic language, the volume of Amharic text posted on Facebook and Twitter has increased rapidly. Users therefore need a text summarization system for news items that can summarize the Amharic texts posted on Twitter and Facebook over a given period (by date, month, or year). Summarization deals with information overload by presenting a condensed, current representation of the posted documents [1]. The purpose of this paper is to summarize Amharic news items posted over time on Twitter and Facebook: first, find the similarity between posts; then cluster similar posts; and finally summarize each cluster by ranking its sentences and selecting the highest-scoring, most important ones, so that readers receive a summary without duplicated sentences. The main problem with social media posts is that most readers encounter Amharic texts that include duplicate posts. The posts will also most likely contain articles or other news items that are not relevant to the post in question. To find the information they are looking for, users have to locate summaries and read the important portions of the Amharic posts to extract what they need.
Because the same information is often posted again on Twitter and Facebook after the original text, the number of posts to be summarized can be reduced by condensing duplicate posts. The main objective of the study is to investigate the development of an Amharic text summarization system for texts posted on social media (Twitter and Facebook). The corpus contains 4951 posted documents of news items (protests, droughts, sports, and floods) from Twitter and Facebook, prepared at the sentence level for experimentation; the implementation platform is Java. The proposed approach has three components. First, it calculates the similarity between posted documents by comparing pairs of sentences. Second, it groups the documents with the K-means algorithm, based on the similarity results. Third, it summarizes each cluster of posted documents using TF-IDF, a statistical weighting of frequent terms, to rank the content. The summarization technique is extractive: the top-ranked sentences in the posted documents are extracted to form the summary, and the size of the summary can be set by the user. The performance of the system is assessed with both subjective and objective evaluations. In the subjective evaluation, the linguistic quality of the system summaries is assessed by the people who prepared the manual summaries. In the objective evaluation, the summaries generated automatically in the experiments are compared with ideal manual summaries using the F-measure. Measured by F-measure, the system performed very well on news items posted on social media, for both single- and multiple-document summarization tasks.
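The ranking and extraction steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis's Java implementation: whitespace tokenization, scoring a sentence by the sum of its TF-IDF weights, and the 0.9 cosine threshold for skipping near-duplicates are all assumptions made for the example, and the K-means clustering step is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized):
    """Return one {term: tf-idf weight} dict per tokenized sentence."""
    n = len(tokenized)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for sent in tokenized for term in set(sent))
    vectors = []
    for sent in tokenized:
        tf = Counter(sent)
        vectors.append(
            {t: (c / len(sent)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarize(sentences, rate=0.3, dup_threshold=0.9):
    """Extract the top-ranked sentences at the given extraction rate.

    Sentences are ranked by the sum of their TF-IDF weights (an assumed
    scoring rule); a candidate is skipped when it is nearly identical to a
    sentence already selected (assumed threshold).
    """
    vectors = tfidf_vectors([s.split() for s in sentences])
    scores = [sum(v.values()) for v in vectors]
    k = max(1, math.ceil(rate * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(cosine(vectors[i], vectors[j]) < dup_threshold for j in chosen):
            chosen.append(i)
    # Present the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(posts, rate=0.1)`, `rate=0.2`, or `rate=0.3` on the sentences of one cluster would correspond to the three extraction rates used in the experiments; `posts` here is a hypothetical list of sentence strings.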
For the experiments, manual and automatic summaries were prepared for a test set of 30 posted texts in each cluster, with summaries extracted at 10%, 20%, and 30% extraction rates. In the first experiment, the highest F-measure score is 87.07%, at an extraction rate of 30%, for the protests cluster. In the second experiment, the highest F-measure score is 84%, at an extraction rate of 30%, for the droughts cluster. In the third experiment, the highest F-measure score is 91.37%, at an extraction rate of 30%, for the sports cluster. In the fourth experiment, the highest F-measure score is 93.52%, at an extraction rate of 30%, for the floods cluster. As the size of the summary the system generates increases, the extraction rate over the posted texts increases as well. en_US
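For reference, the F-measure used in the objective evaluation is the harmonic mean of precision and recall. The sketch below assumes the comparison is made at the sentence level, between the set of sentences in a system summary and the set in the manual summary; the abstract does not state the unit of comparison, so this choice and the function name are illustrative only.

```python
def f_measure(system_summary, reference_summary):
    """F-measure between two summaries, each given as a list of sentences.

    Assumption: overlap is counted on exact sentence matches.
    """
    system = set(system_summary)
    reference = set(reference_summary)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)   # share of extracted sentences that are correct
    recall = overlap / len(reference)   # share of reference sentences that were extracted
    return 2 * precision * recall / (precision + recall)
```

With a system summary of three sentences and a manual summary of four, two of them shared, precision is 2/3, recall is 1/2, and the F-measure is 4/7.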
dc.language.iso en en_US
dc.subject Amharic language en_US
dc.subject Similarity measure en_US
dc.subject Text Summarization en_US
dc.subject Clustering en_US
dc.subject Tf-IDF algorithm en_US
dc.subject Social media en_US
dc.subject Facebook en_US
dc.subject Twitter en_US
dc.subject News posted texts en_US
dc.title Amharic Text Summarization for News Items posted on Social Media en_US
dc.type Thesis en_US

