Jimma University Open access Institutional Repository

Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification

dc.contributor.author Endalie, Demeke
dc.contributor.author Haile, Getamesay
dc.contributor.author Abebe, Wondmagegn Taye
dc.date.accessioned 2023-06-09T12:32:14Z
dc.date.available 2023-06-09T12:32:14Z
dc.date.issued 2022-04-25
dc.identifier.other http://dx.doi.org/10.7717
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/8185
dc.description.abstract Text classification is the process of categorizing documents, based on their content, into a predefined set of categories. Text classification algorithms typically represent documents as collections of words, which produces a large number of features. Selecting appropriate features becomes important when the initial feature set is large. In this paper, we present a feature selection method for Amharic text classification that combines document frequency (DF) with a genetic algorithm (GA). We evaluate the method on Amharic news documents obtained from the Ethiopian News Agency (ENA), grouped into 13 categories. Our experimental results show that the proposed method outperforms other feature selection methods used for Amharic news document classification. Combined with the Extra Tree Classifier (ETC), it improves classification accuracy by up to 1% over a hybrid of DF, information gain (IG), chi-square (CHI), and principal component analysis (PCA), 2.47% over GA, and 3.86% over a hybrid of DF, IG, and CHI. en_US
dc.language.iso en_US en_US
dc.subject Chi-square en_US
dc.subject Document frequency en_US
dc.subject Extra tree classifier en_US
dc.subject Feature selection en_US
dc.subject Genetic algorithm en_US
dc.subject Information gain en_US
dc.subject Text classification en_US
dc.title Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification en_US
dc.type Article en_US
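
The abstract describes a two-stage selection: a document frequency (DF) filter followed by a genetic algorithm (GA) search over the remaining terms. The Python sketch below illustrates that general idea only. It is not the authors' implementation: the function names, the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values, the toy corpus, and the toy fitness function are assumptions made for illustration. In a real setting the fitness would more plausibly be a classifier's validation accuracy on the selected features.

```python
import random
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def df_filter(docs, min_df=2):
    """Stage 1: keep terms that occur in at least min_df documents."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def ga_select(terms, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Stage 2: search for a term subset with high fitness(subset).

    Each chromosome is a 0/1 mask over `terms`. All operators and
    parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(terms)

    def decode(mask):
        return [term for term, bit in zip(terms, mask) if bit]

    def score(mask):
        return fitness(decode(mask))

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            child = p1[:]
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=score)
    return decode(best)


# Toy corpus of tokenized documents (hypothetical English stand-ins).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["bank", "loan", "rate"],
    ["rate", "bank", "market"],
    ["team", "market"],
]


def toy_fitness(subset):
    """Hypothetical fitness: reward two 'informative' terms, penalize size."""
    informative = {"goal", "bank"}
    return len(informative & set(subset)) - 0.1 * len(subset)


candidates = df_filter(docs, min_df=2)
selected = ga_select(candidates, toy_fitness, seed=0)
print("DF-filtered candidates:", candidates)
print("GA-selected subset:", selected)
```

The DF stage shrinks the search space cheaply before the GA runs, which matters because each fitness evaluation in practice involves training and scoring a classifier.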

