This paper compares two feature representation methods for Arabic text classification: bag of words (BOW), that is, word-level unigrams, and mixed words. The mixed-words representation combines bag-of-words features with pairs of adjacent words in different proportions. The main objective is to measure the accuracy of each method and to determine which is more accurate for Arabic text classification under each representation mode. Each method is used with normalization and stemming. The results show that the mixed-words representation achieves the highest accuracy, 98.61%, when normalization is used.
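The difference between the two representations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `size` and `bigram_share` parameters, and the choice to select features by raw frequency are assumptions made only for illustration, and the abstract does not say how the proportions were actually defined. Documents are assumed to be already tokenized, normalized, and stemmed.

```python
from collections import Counter


def unigrams(tokens):
    """Word-level unigrams: the plain bag-of-words features."""
    return list(tokens)


def bigrams(tokens):
    """Pairs of adjacent words, joined by a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_mixed_vocabulary(docs, size, bigram_share):
    """Build a vocabulary of `size` features (hypothetical helper).

    `bigram_share` is the assumed fraction of the vocabulary reserved for
    adjacent-word pairs; the rest is filled with single words.
    Setting bigram_share=0 gives a pure bag-of-words vocabulary.
    """
    n_bi = round(size * bigram_share)
    n_uni = size - n_bi
    uni_counts = Counter(t for d in docs for t in unigrams(d))
    bi_counts = Counter(t for d in docs for t in bigrams(d))
    vocab = [t for t, _ in uni_counts.most_common(n_uni)]
    vocab += [t for t, _ in bi_counts.most_common(n_bi)]
    return vocab


def vectorize(tokens, vocab):
    """Map one tokenized document to a count vector over `vocab`."""
    counts = Counter(unigrams(tokens) + bigrams(tokens))
    return [counts[t] for t in vocab]
```

Calling `build_mixed_vocabulary(docs, size, bigram_share)` with several values of `bigram_share` and feeding the resulting vectors to the same classifier is one way to compare the representations on equal terms.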
Sallam, R., Mousa, H., & Hussien, M. (2016). A Comparative Study for Arabic Text Classification Based on BOW and Mixed Words Representations. IJCI. International Journal of Computers and Information, 5(1), 24-34. doi: 10.21608/ijci.2016.33954