Afaan Oromo Sentence Based Plagiarism Detection: A Semantic Similarity Approach

Gizaw Tadele; Million Meshesha; Admas Abtew

URI: https://repository.ju.edu.et//handle/123456789/5395

Date: 2019

Abstract:

Digital technology now lets people around the world communicate and exchange information easily. As a result, we live in an era of information overload, in which many types of information are collected from different sources. As the amount of available digital information grows, it becomes harder to access information efficiently across those sources. Machine learning based NLP contributes greatly to addressing this problem. In this work, we focus on a semantic similarity measure for detecting plagiarism in Afaan Oromo documents. To support the semantic approach, we built a sample dictionary that represents synonymous terms. The study used latent semantic indexing (LSI) to decompose sentences into a term matrix for similarity calculation. We collected three documents containing 15, 14, and 11 sentences: two from published Afaan Oromo fiction and one personal bibliography document from Afaan Oromo FBC. The dataset was preprocessed. A prototype of the proposed model was developed in Java, and SQL was used to build the sample dictionary. The prototype was tested on a suspicious query of 10 sentences against the 3 source documents, which contain 275 key terms. The accuracy achieved in detecting plagiarism in the suspicious query was 53.02%. The accuracy was limited by the small dataset; in addition, stemming and POS tagging were not applied in this work. For future work, we recommend a larger dataset and the application of stemming and POS tagging to improve accuracy.
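
The following is a minimal illustrative sketch, not the thesis prototype. It shows one way a synonym dictionary can feed a sentence similarity score: each token is mapped to a canonical synonym, the sentences are turned into term-count vectors, and the vectors are compared with cosine similarity. The class name `SentenceSimilarity`, its methods, and the in-memory `Map` standing in for the SQL dictionary are all assumptions made for illustration. The sketch also deliberately omits the LSI step (the decomposition of the term matrix), and the tokens in `main` are placeholders rather than real Afaan Oromo synonym pairs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: synonym-normalized bag-of-words cosine similarity.
// This is NOT LSI; no matrix decomposition is performed.
class SentenceSimilarity {

    // Count terms in a sentence after replacing each token with its
    // canonical synonym (tokens missing from the dictionary are kept as-is).
    static Map<String, Integer> termCounts(String sentence, Map<String, String> synonyms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : sentence.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by a blank sentence
            }
            String canonical = synonyms.getOrDefault(token, token);
            counts.merge(canonical, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two sparse term-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty sentence is treated as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static double similarity(String s1, String s2, Map<String, String> synonyms) {
        return cosine(termCounts(s1, synonyms), termCounts(s2, synonyms));
    }

    public static void main(String[] args) {
        // Placeholder dictionary: "beta" is treated as a synonym of "alpha".
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("beta", "alpha");
        System.out.println(similarity("alpha gamma", "beta gamma", synonyms));
    }
}
```

In a sketch like this, a suspicious sentence would be scored against every source sentence, and pairs above a chosen threshold would be flagged; the threshold itself is a design choice the abstract does not specify.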
