Abstract:
Information overload on social media is growing: as natural language applications for Amharic
speakers develop, the volume of Amharic text posted on Facebook and Twitter, and hence the amount of
data a reader has to deal with, has increased rapidly. Users therefore need a text summarization
system that can summarize the Amharic news items posted on Twitter and Facebook over a given period
(daily, monthly, or yearly). Summarization addresses this information overload by presenting a
condensed, up-to-date representation of the posted documents [1]. The purpose of this paper is to summarize the
Amharic news items posted over time on Twitter and Facebook: first, to find the similarity between
posts; then, to cluster similar posts; and finally, to summarize each cluster by ranking its
sentences, so that the highest-scoring, most important sentences are presented to readers without
duplicated sentences. The main problem with texts posted on social media is that most people
would probably read Amharic posts that duplicate one another. The posts will also most likely
contain articles or other news items that are not relevant to the post in question. To find the
information they are looking for, users have to locate summaries of the posts and read the important
portions of the Amharic documents on social media. Because information is often duplicated once the
original text has been posted on Twitter and Facebook, the number of posts to be summarized can be
reduced by condensing the duplicates. The main objective
of the study is to investigate the development of an Amharic text summarization system for texts
posted on social media (Twitter and Facebook). The corpus consists of news items (protests, droughts,
sports, and floods) posted on Twitter and Facebook, a total of 4951 posted documents prepared as
sentences for the experiments; the system is implemented on the Java platform. Our proposed approach has
three components. First, the similarity between posted documents is calculated for each pair of
sentences. Second, the documents are clustered into groups based on the similarity results using the
K-means algorithm. Third, each cluster of posted documents is summarized using the TF-IDF algorithm,
a statistical method that ranks the documents by their frequent terms. The summarization technique is
extractive: the highest-ranked sentences in the posted documents are extracted to form the summary,
and the size of the summary can be set by the user. The performance of the system is evaluated using
both subjective
and objective evaluations. In the subjective evaluation, the linguistic quality of the system
summaries is assessed by the people who prepared the manual summaries. In the objective evaluation,
the automatic summaries generated in the experiments are compared with ideal manual summaries using
the F-measure. The F-measure results indicate that our system performed well on Amharic news items
posted on social media, for both single- and multiple-document summarization tasks. The experiments
prepared manual and automatic summaries, using both systems, for the
30 posted texts in each cluster's test file, with summaries extracted at rates of 10%, 20%, and 30%.
In the first experiment, on the protests cluster, the highest F-measure is 87.07%, at an extraction
rate of 30%. In the second experiment, on the droughts cluster, the highest F-measure is 84%, at an
extraction rate of 30%. In the third experiment, on the sports cluster, the highest F-measure is
91.37%, at an extraction rate of 30%, and in the fourth experiment, on the floods cluster, the
highest F-measure is 93.52%, also at an extraction rate of 30%. As the size of the summary that the
system generates increases, the extraction rate of the posted texts increases accordingly.
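
The similarity, ranking, and objective-evaluation steps described above can be sketched in Java, the
platform named in the abstract. This is a minimal illustration under stated assumptions, not the
system's actual implementation: the class name SummarizerSketch and all method names are
hypothetical, tokenization is plain whitespace splitting (no Amharic stemming or stop-word removal),
TF-IDF treats each sentence as a document with idf = ln(n / df), the F-measure is the balanced F1
over sets of selected sentences, and the K-means clustering step is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and formulas are assumptions, not the paper's code.
class SummarizerSketch {

    // Term frequencies of one whitespace-tokenized sentence.
    static Map<String, Integer> termFreq(String sentence) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : sentence.trim().split("\\s+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors (similarity step).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Score each sentence by its summed TF-IDF, keep the top `rate` fraction,
    // and return the kept sentences in their original order (extractive step).
    static List<String> summarize(List<String> sentences, double rate) {
        int n = sentences.size();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String s : sentences) {
            Map<String, Integer> tf = termFreq(s);
            tfs.add(tf);
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
        }
        double[] score = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (Map.Entry<String, Integer> e : tfs.get(i).entrySet()) {
                score[i] += e.getValue() * Math.log((double) n / df.get(e.getKey()));
            }
            order.add(i);
        }
        // Highest score first; the sort is stable, so ties keep original order.
        order.sort(Comparator.comparingDouble((Integer i) -> -score[i]));
        int keep = Math.max(1, (int) Math.round(rate * n));
        List<Integer> chosen = new ArrayList<>(order.subList(0, Math.min(keep, n)));
        chosen.sort(Comparator.naturalOrder());
        List<String> summary = new ArrayList<>();
        for (int i : chosen) summary.add(sentences.get(i));
        return summary;
    }

    // Balanced F-measure between system and reference sentence sets.
    static double fMeasure(Set<String> system, Set<String> reference) {
        Set<String> overlap = new HashSet<>(system);
        overlap.retainAll(reference);
        if (overlap.isEmpty()) return 0.0;
        double p = (double) overlap.size() / system.size();
        double r = (double) overlap.size() / reference.size();
        return 2 * p * r / (p + r);
    }
}
```

Under these assumptions, summarize(sentences, 0.30) keeps roughly 30% of a cluster's sentences, and
fMeasure compares the extracted sentence set with a manually selected one.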