Afaan Oromo Sentence Based Plagiarism Detection: A Semantic Similarity Approach

Gizaw Tadele; Million Meshesha; Admas Abtew

dc.contributor.author	Gizaw Tadele
dc.contributor.author	Million Meshesha
dc.contributor.author	Admas Abtew
dc.date.accessioned	2021-02-05T12:24:33Z
dc.date.available	2021-02-05T12:24:33Z
dc.date.issued	2019
dc.identifier.uri	https://repository.ju.edu.et//handle/123456789/5395
dc.description.abstract	Today, the availability of digital technology enables the world community to communicate and exchange information easily. As a result, we are in an era of information overload, in which various types of information are collected from different sources. As the amount of available digital information increases, it becomes difficult to access information efficiently from these sources. Machine learning based NLP makes a great contribution to addressing this problem. In this work, we focus on a semantic similarity measure for plagiarism detection in Afaan Oromo documents. To use the semantic approach, we built a sample dictionary to represent synonym terms. The study used the LSI approach to decompose sentences into a term matrix for similarity calculation. We collected 3 documents containing 15, 14, and 11 sentences, respectively. The documents come from different sources: two from published Afaan Oromo fiction and one personal bibliography from Afaan Oromo FBC. Text preprocessing was applied to the dataset. A prototype of the proposed model was developed in Java, and SQL was used to build the sample dictionary. Performance was tested on a suspicious query of 10 sentences and 3 source documents with 275 key terms. The accuracy achieved in detecting plagiarism in the suspicious query was 53.02%. The result was not high because of the small dataset. In addition, stemming and POS tagging were not applied in this work. For future work, we recommend using a larger dataset and applying stemming and POS tagging to improve accuracy.	en_US
dc.language.iso	en	en_US
dc.subject	Afaan Oromo	en_US
dc.subject	Semantic similarity	en_US
dc.subject	Machine learning	en_US
dc.subject	LSI	en_US
dc.subject	Plagiarism detection	en_US
dc.title	Afaan Oromo Sentence Based Plagiarism Detection: A Semantic Similarity Approach	en_US
dc.type	Thesis	en_US