Classification and Prediction of Opinion Mining in Social Networks Data

Mohamed, Shaimaa Mahmoud; Hussien, Mahmoud; Keshk, Arabi

doi:10.21608/ijci.2020.26841.1015

Document Type : Original Article

Authors

¹ Computer Science Department, Faculty of Computers and Information, Menoufia University, Shebin Elkom 32511, Egypt

² Faculty of Computers and Information, Menofia University, Egypt

³ Faculty of Computer and Information, Menoufia University

Abstract

Opinion mining in social network data is one of the most significant and challenging tasks today because of the huge volume of information posted each day. These opinions can be exploited through two important processes: classification and prediction. Although many researchers have worked on this problem, the results still need improvement. In this paper, we therefore present a method that improves the accuracy of both processes. The improvement comes from cleaning the data set: converting all words to lower case; removing usernames, mentions, links, repeated characters, numbers, extra spaces between words, empty tweets, punctuation, and stop words; and expanding contractions such as “isn't” to “is not”. We use both unigrams and bigrams as features. Our data set contains users' feelings about distributed products, with tweets labeled positive or negative and each product rated from one to five. We implemented this work using several supervised machine learning algorithms: Naïve Bayes, Support Vector Machine, and Maximum Entropy (MaxEnt) for the classification process, and Random Forest Regression, Logistic Regression, and Support Vector Regression for the prediction process. Both processes achieve better accuracy than existing work. In classification, we achieved an accuracy of 90%. In prediction, the Support Vector Regression model predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Logistic Regression model with an MSE of 0.4986, and the Random Forest Regression model with an MSE of 0.4770.
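The cleaning steps and n-gram features listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (clean_tweet, clean_corpus, ngram_features), the tiny stop-word list, the contraction map, the regular expressions, and the order of the steps are all assumptions made for illustration, and the sketch assumes English-only tweets.

```python
import re

# Assumed, deliberately tiny resources (hypothetical; a real run would use
# a full stop-word list and a complete contraction dictionary).
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "it", "to", "of", "and"}
CONTRACTIONS = {"isn't": "is not", "don't": "do not", "won't": "will not"}


def clean_tweet(text):
    """Apply the cleaning steps named in the abstract to one tweet."""
    text = text.lower()                                  # lower-case everything
    text = text.replace("\u2019", "'")                   # normalise curly apostrophes
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)                    # remove usernames / mentions
    for short, full in CONTRACTIONS.items():             # "isn't" -> "is not"
        text = text.replace(short, full)
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)       # "loooove" -> "loove"
    text = re.sub(r"[^a-z\s]", " ", text)                # drop numbers and punctuation
    # split() also collapses any run of whitespace into single separators
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def clean_corpus(tweets):
    """Clean every tweet and drop those that end up empty."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [c for c in cleaned if c]


def ngram_features(tokens):
    """Return unigrams followed by bigrams for a list of tokens."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


if __name__ == "__main__":
    tweet = "@bob I loooove this phone!!! It isn't bad 100% http://t.co/xyz"
    cleaned = clean_tweet(tweet)
    print(cleaned)
    print(ngram_features(cleaned.split()))
```

In this sketch the contraction is expanded before stop words are removed, so the negation word "not" survives even though "is" is filtered out. Whether the original pipeline orders the steps this way is not stated in the abstract.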

Keywords