Evaluating the Performance of Five Classifiers for Twitter Sentiment Analysis Using Bag of Words

Authors

  • Qasim Munir, Student, Superior University

DOI:

https://doi.org/10.56536/jicet.v4i2.153

Abstract

Twitter is a social medium on which millions of people express their opinions on different subjects, and Natural Language Processing is widely used to analyze the sentiment of such text. This research paper compares the performance of five machine learning classifiers using a bag of words technique on a Kaggle dataset. The classifiers are Logistic Regression, XGBoost, Naïve Bayes, Random Forest, and Support Vector Machine. The Naïve Bayes classifier gave the best predictions, achieving an accuracy of 91.59 percent across four sentiment classes: Negative, Positive, Neutral, and Irrelevant. A bag of words built from n-grams of up to four words with all stop words retained gave better accuracy than one built from single-word n-grams with all stop words removed. To improve the results further, Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformers (GPT) should be considered. Deep learning models are more likely to give better accuracy than traditional machine learning models, and they use word embeddings rather than a bag of words technique, which lets them capture semantics better.
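
The two bag of words configurations compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. The function name `bag_of_words`, the tiny `STOP_WORDS` set, and the example tweet are all hypothetical and chosen only to show how the feature sets differ.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for
# illustration; the paper's actual list is not given in the abstract).
STOP_WORDS = {"the", "is", "a", "not", "this", "i", "do"}


def bag_of_words(text, max_n=1, remove_stop_words=False):
    """Count every word n-gram of length 1..max_n in `text`."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


tweet = "this game is not good"  # made-up example, not from the dataset

# Configuration A: single words only, stop words removed.
unigram_no_stop = bag_of_words(tweet, max_n=1, remove_stop_words=True)

# Configuration B: n-grams of up to four words, stop words retained.
fourgram_with_stop = bag_of_words(tweet, max_n=4, remove_stop_words=False)
```

With this toy stop-word list, configuration A reduces the tweet to the two features "game" and "good", so the negation disappears. Configuration B yields 14 n-gram features, including "not good" and "is not good". Whether this effect explains the accuracy difference reported in the paper is an assumption, not something the abstract states.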

Published

2024-10-08