Context-sensitive sentence auto-completion for Amharic text

Mohammed Nuru; Debela Tesfaye; Seid Yesuf

URI: https://repository.ju.edu.et//handle/123456789/5369

Date: 2016

Abstract:

Sentence completion remains an unsolved problem in Natural Language Processing and Information Retrieval. The number of electronic device users is growing rapidly; these users need to write reports and search files in large-scale datasets, yet many have difficulty writing for various reasons. Auto-completion, as both a general-purpose and a specialized application, addresses such problems. Its main objectives are to reduce spelling errors for poor spellers, preserve the syntactic structure of the language, and save the user's keystrokes along with the time and effort involved in typing. This paper presents context-sensitive sentence auto-completion for Amharic text that combines features learned from part-of-speech tagging, used to extract syntactic information, with features learned from frequencies, which include the distance, similarity and length between the input word and the possible recommendations, computed with techniques such as tf-idf. The work completes the missing part of a sentence: when the user enters a portion of a sentence, the system suggests the top five ranked sentences. The researcher designed and implemented prototypes for three systems (distance similarity, part-of-speech tag information, and tf-idf) and for a hybrid of them, and evaluated their performance in four phases using prepared training and test sets. Based on the observed errors, the hybrid sentence auto-completion system reached 81.82% completion accuracy. The individual prototypes were tested in separate experiments on the same input: probabilistic distance similarity, part-of-speech tag information and tf-idf achieved 21.21%, 31.82% and 80.03%, respectively.
These methods rely on length, tf-idf and syntactic information to predict the most likely sentences. Finally, the paper offers recommendations for improving the performance of sentence auto-completion for Amharic, so that current sentence completion techniques can be applied to the language in future work.
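
The ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace tokenization, uses a smoothed tf-idf weighting and the standard library's character-level similarity ratio as a stand-in for the distance measure, and leaves out the part-of-speech component entirely. The function names, the mixing weight `alpha`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(sentences):
    """Return (per-sentence tf-idf dicts, idf table) for whitespace-tokenized text."""
    docs = [s.split() for s in sentences]
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed idf (an assumption; the paper does not specify its variant).
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vectors = [
        {t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest(prefix, corpus, k=5, alpha=0.5):
    """Rank corpus sentences as completions of `prefix`; return the top k.

    Score = alpha * string similarity + (1 - alpha) * tf-idf cosine.
    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    vectors, idf = tfidf_vectors(corpus)
    tokens = prefix.split()
    # Tokens unseen in the corpus get weight 0 and so do not affect the cosine.
    p_vec = {t: (c / len(tokens)) * idf.get(t, 0.0)
             for t, c in Counter(tokens).items()}
    scored = []
    for sentence, vec in zip(corpus, vectors):
        # Compare the typed prefix with the equally long head of the candidate.
        head = sentence[:len(prefix)]
        string_sim = SequenceMatcher(None, prefix, head).ratio()
        score = alpha * string_sim + (1 - alpha) * cosine(p_vec, vec)
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


# Toy placeholder corpus; a real system would use Amharic training sentences.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog ran in the park",
    "the dog sat on the rug",
    "birds fly over the lake",
    "the cat sat by the door",
    "fish swim in the lake",
]
top_five = suggest("the cat sat", corpus)
```

A hybrid in the spirit of the abstract would add a third score derived from part-of-speech tag sequences; that step depends on an Amharic tagger and is not sketched here.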
