About this R notebook

This R Notebook contains Part 2 of my analysis of one of the first widely available sentiment analysis datasets: the movie review dataset (movie-pang02.csv), which I obtained from http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW3/. More information about the dataset can be found at http://www.cs.cornell.edu/people/pabo/movie-review-data/. The paper associated with the dataset can be found at https://www.cs.cornell.edu/home/llee/papers/cutsent.pdf.

Looking at the dataset classes

Classification Models

I built two kinds of classification models on the movie review dataset. One is a Naive Bayes model made with the textmodel_nb() function from library(quanteda); the other is a k-nearest neighbors (KNN) model made with the knn() function from library(class). I followed Text Message Classification, a tutorial by Anish Singh Walia on R-bloggers, located at https://www.r-bloggers.com/text-message-classification/. Please see this tutorial if you are interested in the code that I used. After following the tutorial with the movie review dataset, I used the same techniques to make models using knn() from library(class) with k=3 and k=5.
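To give a feel for how knn() from library(class) is called, here is a minimal sketch on a tiny made-up document-term matrix. The data, the two word columns, and all variable names are hypothetical and are not taken from the movie review dataset or from the tutorial; only the knn() call itself mirrors what the models in this notebook use.

```r
library(class)  # provides knn()

# Toy document-term counts (hypothetical data, not the movie reviews).
# Each row is a document; columns count the words "great" and "awful".
train <- matrix(c(3, 0,
                  2, 0,
                  4, 1,
                  0, 3,
                  1, 4,
                  0, 2),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("great", "awful")))
train_labels <- factor(c("Pos", "Pos", "Pos", "Neg", "Neg", "Neg"))

test <- matrix(c(3, 1,
                 0, 4),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("great", "awful")))
test_labels <- factor(c("Pos", "Neg"), levels = levels(train_labels))

# Each test document gets the majority label of its k = 3 nearest
# training documents (Euclidean distance on the word counts).
predicted <- knn(train = train, test = test, cl = train_labels, k = 3)

# Cross-tabulate predictions against the true labels.
toy_confusion <- table(predicted, actual = test_labels)
toy_confusion
```

Changing k in the call is all it takes to go from the k=3 model to the k=5 model.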

Note: I am still learning about text analysis and sentiment analysis using R and Python.

Results of Naive Bayes classifier

print(paste0("Confusion Matrix for the Naive Bayes classifier" ))
## [1] "Confusion Matrix for the Naive Bayes classifier"
ConTable
##          actual
## predicted Neg Pos
##       Neg 227  68
##       Pos  64 242
print(paste0("Accuracy of Naive Bayes classifier: ",AccuracyPercentNB ))
## [1] "Accuracy of Naive Bayes classifier: 78.0366056572379"
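The accuracy figure follows directly from the confusion matrix: the diagonal holds the correctly classified reviews, and the sum of all cells is the size of the test set. The sketch below rebuilds the Naive Bayes table by hand from the numbers printed above and recomputes the percentage. The names con_table and accuracy_percent are my own for this illustration and are not necessarily how ConTable and AccuracyPercentNB were computed in the notebook.

```r
# Confusion matrix copied from the Naive Bayes output above
# (rows = predicted, columns = actual).
con_table <- matrix(c(227,  68,
                       64, 242),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(predicted = c("Neg", "Pos"),
                                    actual    = c("Neg", "Pos")))

# Accuracy = correctly classified reviews / all test reviews, as a percentage.
correct <- sum(diag(con_table))   # 227 + 242
total   <- sum(con_table)         # all test reviews
accuracy_percent <- correct / total * 100
accuracy_percent
```

The same calculation can be applied to the KNN confusion matrices to check their reported accuracies.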

Results of KNN model with k=3

print(paste0("Confusion Matrix for KNN with k = 3" ))
## [1] "Confusion Matrix for KNN with k = 3"
conKNN3
##          actual
## predicted Neg Pos
##       Neg 197 148
##       Pos  94 162
print(paste0("Accuracy of KNN (k=3): ",AccuracyPercentKNN3 ))
## [1] "Accuracy of KNN (k=3): 59.7337770382696"

Results of KNN model with k=5

print(paste0("Confusion Matrix for KNN with k = 5" ))
## [1] "Confusion Matrix for KNN with k = 5"
conKNN5
##          actual
## predicted Neg Pos
##       Neg 231 185
##       Pos  60 125
print(paste0("Accuracy of KNN (k=5): ",AccuracyPercentKNN5 ))
## [1] "Accuracy of KNN (k=5): 59.234608985025"

References

Links for the dataset that was used.

The following libraries and blog were used to make the graph and models above.

I also read the following websites, blogs, and book.