Jimma University Open access Institutional Repository

Afaan Oromo Sentence Based Plagiarism Detection: A Semantic Similarity Approach


dc.contributor.author Gizaw Tadele
dc.contributor.author Million Meshesha
dc.contributor.author Admas Abtew
dc.date.accessioned 2021-02-05T12:24:33Z
dc.date.available 2021-02-05T12:24:33Z
dc.date.issued 2019
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/5395
dc.description.abstract Digital technology now lets people around the world communicate and exchange information easily. As a result, we are in an era of information overload, in which many types of information are collected from different sources. As the amount of available digital information increases, it becomes difficult to access information from these sources efficiently. Machine learning based NLP contributes greatly to addressing this problem. In this work, we focus on a semantic similarity measure for plagiarism detection in Afaan Oromo documents. To support the semantic approach, we built a sample dictionary to represent synonym terms. The study used an LSI approach to decompose sentences into a term matrix for similarity calculation. We collected three documents containing 15, 14, and 11 sentences: two from published Afaan Oromo fiction and one personal bibliography from Afaan Oromo FBC. Text preprocessing was applied to the dataset. A prototype of the proposed model was developed in Java, and SQL was used to build the sample dictionary. The prototype was tested on a suspicious query of 10 sentences and 3 source documents with 275 key terms. The accuracy achieved in detecting plagiarism in the suspicious query was 53.02%. The result was not high because the dataset was small and because stemming and POS tagging were not applied in this work. For future work, we recommend using a larger dataset and applying stemming and POS tagging to improve accuracy. en_US
dc.language.iso en en_US
dc.subject Afaan Oromo en_US
dc.subject Semantic similarity en_US
dc.subject Machine learning en_US
dc.subject LSI en_US
dc.subject Plagiarism detection en_US
dc.title Afaan Oromo Sentence Based Plagiarism Detection: A Semantic Similarity Approach en_US
dc.type Thesis en_US
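
The abstract describes mapping synonym terms through a dictionary and then comparing sentences by their term representations. The thesis prototype itself is not part of this record, so the following is only a minimal Java sketch of that general idea, under stated assumptions: the class name, method names, and dictionary entries are hypothetical, the sample tokens are English placeholders rather than Afaan Oromo data, and plain cosine similarity over term counts stands in for the LSI step (no matrix decomposition is performed here).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. All names and dictionary entries are hypothetical,
// and cosine similarity over raw term counts replaces the LSI step of the thesis.
class SentenceSimilaritySketch {

    // Lowercase the sentence, split on anything that is not a letter or an
    // apostrophe, and replace each token by its canonical synonym if one exists.
    static Map<String, Integer> termCounts(String sentence, Map<String, String> synonyms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : sentence.toLowerCase().split("[^\\p{L}']+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            String term = synonyms.getOrDefault(token, token);
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two sparse term-count vectors.
    // Returns 0.0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;
            }
        }
        for (int vb : b.values()) {
            normB += vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Convenience wrapper: synonym-normalized similarity of two sentences.
    static double similarity(String s1, String s2, Map<String, String> synonyms) {
        return cosine(termCounts(s1, synonyms), termCounts(s2, synonyms));
    }

    public static void main(String[] args) {
        // Placeholder dictionary: maps a variant term to a canonical term.
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("big", "large");

        double score = similarity("the big house", "the large house", synonyms);
        System.out.println("similarity = " + score);
    }
}
```

In a setup like the one the abstract outlines, each sentence of the suspicious query would be scored against every source sentence; how scores are turned into a plagiarism decision is not stated in this record and is left out of the sketch.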

