Sentiment Analysis using Unlabeled Email data

EasyChair Preprint 2080

6 pages • Date: December 1, 2019

Abstract

Sentiment Analysis (SA) in the context of text mining is an automated process for detecting subjective information, such as opinions, attitudes, emotions, and feelings. Most prior work in SA treats it as a text classification problem, which requires labeled data to train the model. However, labeled datasets are hard to obtain, and most of the time the labeling must be done by hand. Another issue is that the lack of portability across domains makes it hard to reuse the same labeled data in different applications, so labeled data must be created manually for each domain. In this paper, we apply sentiment analysis to the Enron email dataset. This work aims to find the best technique for labeling the dataset automatically and avoiding manual labeling. The automatically labeled data is then used to build a classifier with a supervised machine learning algorithm. In the labeling phase, we compare lexicon labeling with k-means labeling. Lexicon labeling gave better and more reliable results, so we used the lexicon-labeled dataset to train the classifier. We used TF-IDF for feature extraction to train Naïve Bayes and Support Vector Machine (SVM) classifiers.
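The two steps named in the abstract, lexicon labeling and TF-IDF feature extraction, can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny word lists, the three-way positive/negative/neutral rule, the tf * log(N / df) weighting variant, and the sample emails are all assumptions made for illustration. The abstract does not say which lexicon or TF-IDF formula was used.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (assumption); a real run would use a published
# sentiment lexicon rather than these hand-picked words.
POSITIVE = {"good", "great", "thanks", "pleased", "excellent"}
NEGATIVE = {"bad", "problem", "angry", "delay", "poor"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_label(text):
    """Label a text by comparing positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed fallback when the hits cancel out or are absent


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Made-up sample emails (assumption), not taken from the Enron dataset.
emails = [
    "Thanks for the great report",
    "There is a problem with the delay",
    "Meeting moved to Monday",
]
labels = [lexicon_label(e) for e in emails]
features = tfidf(emails)
```

The resulting `labels` stand in for hand annotations, and `features` are the per-email vectors that a supervised classifier such as Naïve Bayes or an SVM would be trained on. A term that appears in only one email, such as "thanks", gets a higher weight than one shared across emails, such as "the".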

Keyphrases: Chi-square, Email data, K-means, Semantic Orientation, Sentiment Analysis, Support Vector Machine, TFIDF, Target Label, email dataset, enron email dataset, feature extraction, frequency inverse document frequency, k-mean labeling, labeled dataset, lexicon labeling, negative email, sentiment classification, stop word

Links:

https://easychair.org/publications/preprint/HkRB

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:2080,
  author    = {Rayan Salah and Neamat El Gayar},
  title     = {Sentiment Analysis using Unlabeled Email data},
  howpublished = {EasyChair Preprint 2080},
  year      = {EasyChair, 2019}}
