Performance Investigation of Features Extraction and Classification Approaches for Sentiment Analysis Systems

Document Type: Original Article

Authors

1 Computer Science Department, Faculty of Computers and Information, Menoufia University

2 Faculty of Computers and Information, Menoufia University

3 Computer Science Department, Faculty of Computers and Information, Menoufia University

Abstract

Pre-processing and feature extraction for micro-blogging data have become an active area of research in sentiment analysis. Object identification, negation expressions, sarcasm, outlines, and misspellings are the major issues faced during sentiment analysis. Data pre-processing is therefore a decisive step in a sentiment analysis system: it improves data quality and, in turn, the extraction and classification of meaningful data. This paper presents a sentiment analysis system for performance investigation. Several pre-processing and feature extraction techniques are applied to optimize the sentiment analysis. Our system comprises three components: data pre-processing, feature extraction, and sentiment analysis. The pre-processing and feature extraction approaches enhance the performance of the sentiment analysis system. We compare different sentiment analysis approaches on a Twitter dataset of tweets about US airlines. The results show that the Word2Vec approach combined with the XGBoost and random forest classification algorithms achieves high performance, while the Naive Bayes classifier yields the lowest performance.
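As an illustration of the pre-processing component only, the following is a minimal Python sketch of tweet cleaning with simple negation marking. The abstract does not list the exact pre-processing steps, so every step here (lowercasing, removing URLs and user mentions, keeping hashtag words, and prefixing the token after a negation with `not_`) is an assumption, and the names `preprocess` and `NEGATIONS` are hypothetical rather than taken from the paper.

```python
import re

# Patterns for tweet noise; all of them are illustrative assumptions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical, deliberately small negation list.
NEGATIONS = {"not", "no", "never", "cannot"}


def preprocess(tweet: str) -> list:
    """Clean a tweet and return its tokens, marking the word after a negation."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user mentions
    text = text.replace("#", " ")     # keep the hashtag word, drop the symbol
    tokens = TOKEN_RE.findall(text)

    out = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS or tok.endswith("n't"):
            out.append(tok)
            negate = True             # mark only the next token
        elif negate:
            out.append("not_" + tok)
            negate = False
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    print(preprocess("@united My flight was NOT good http://t.co/abc #fail"))
```

The resulting token lists could then be passed to a feature extraction step such as Word2Vec and on to a classifier; those stages are outside this sketch.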

Keywords