This R Notebook contains Part 2 of my analysis of one of the first widely available sentiment analysis datasets: the movie review dataset (movie-pang02.csv), obtained from http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW3/. More information about the dataset can be found at http://www.cs.cornell.edu/people/pabo/movie-review-data/, and the paper associated with the dataset can be found at https://www.cs.cornell.edu/home/llee/papers/cutsent.pdf.
I built two kinds of classification models with the movie review dataset: a Naive Bayes model made with the textmodel_nb() function from library(quanteda), and k-nearest neighbors (KNN) models made with the knn() function from library(class). I followed "Text Message Classification," a tutorial by Anish Singh Walial on R-bloggers (https://www.r-bloggers.com/text-message-classification/). Please see that tutorial if you are interested in the code I used. After following the tutorial with the movie review dataset, I used the same techniques to build models with knn() from library(class), using k = 3 and k = 5.
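The KNN step can be illustrated with a small, self-contained sketch. This is not the tutorial's code or the code behind the results below: the toy reviews, the whitespace tokenizer, and the helpers tokenize() and to_dtm() are hypothetical, and the document-term matrix is built with base R instead of quanteda. It only shows the inputs knn() expects: a numeric training matrix, a test matrix with the same columns, and a factor of training labels.

```r
library(class)  # recommended package shipped with R; provides knn()

# Hypothetical toy data: four labeled training reviews and two unlabeled test reviews
train_text <- c("great acting and a great story",
                "wonderful film with great music",
                "boring plot and terrible acting",
                "terrible dialogue and a boring ending")
train_labels <- factor(c("Pos", "Pos", "Neg", "Neg"))
test_text <- c("a great story with wonderful music",
               "boring and terrible")

# Hypothetical helper: lowercase, then split on runs of whitespace
tokenize <- function(x) strsplit(tolower(x), "\\s+")

# Vocabulary comes from the training reviews only
vocab <- sort(unique(unlist(tokenize(train_text))))

# Hypothetical helper: count each vocabulary term per document
# (words outside the vocabulary become NA in factor() and are dropped by table())
to_dtm <- function(texts, vocab) {
  counts <- vapply(tokenize(texts),
                   function(tk) as.numeric(table(factor(tk, levels = vocab))),
                   numeric(length(vocab)))
  m <- t(counts)  # rows = documents, columns = terms
  colnames(m) <- vocab
  m
}

train_dtm <- to_dtm(train_text, vocab)
test_dtm  <- to_dtm(test_text, vocab)

# knn() uses Euclidean distance and breaks voting ties at random,
# so fixing the seed keeps repeated runs reproducible
set.seed(1)
pred3 <- knn(train = train_dtm, test = test_dtm, cl = train_labels, k = 3)
print(pred3)
```

With these toy data the first test review is closest to the two positive training reviews and the second is closest to the two negative ones, so pred3 should come out as Pos, then Neg. Building the test matrix from the training vocabulary matters: knn() compares rows column by column, so both matrices need identical term columns.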
Note: I am still learning about text analysis and sentiment analysis using R and Python.
print(paste0("Confusion Matrix for the Naive Bayes classifier" ))
## [1] "Confusion Matrix for the Naive Bayes classifier"
ConTable
##          actual
## predicted Neg Pos
##       Neg 227  68
##       Pos  64 242
print(paste0("Accuracy of Naive Bayes classifier: ",AccuracyPercentNB ))
## [1] "Accuracy of Naive Bayes classifier: 78.0366056572379"
print(paste0("Confusion Matrix for KNN with k = 3" ))
## [1] "Confusion Matrix for KNN with k = 3"
conKNN3
##          actual
## predicted Neg Pos
##       Neg 197 148
##       Pos  94 162
print(paste0("Accuracy of KNN (k=3): ",AccuracyPercentKNN3 ))
## [1] "Accuracy of KNN (k=3): 59.7337770382696"
print(paste0("Confusion Matrix for KNN with k = 5" ))
## [1] "Confusion Matrix for KNN with k = 5"
conKNN5
##          actual
## predicted Neg Pos
##       Neg 231 185
##       Pos  60 125
print(paste0("Accuracy of KNN (k=5): ",AccuracyPercentKNN5 ))
## [1] "Accuracy of KNN (k=5): 59.234608985025"
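The accuracy figures above follow directly from the confusion matrices: the correct predictions (the diagonal) divided by the total number of test reviews. As a quick check, this base R sketch re-enters the three printed matrices by hand and recomputes the percentages. The names dn, cms, and accuracy_pct() are hypothetical and do not come from the notebook's code.

```r
# Rows = predicted class, columns = actual class, matching the printed layout.
# In a pipeline, a matrix with this layout is what table(predicted = pred, actual = truth) returns.
dn <- list(predicted = c("Neg", "Pos"), actual = c("Neg", "Pos"))

# Counts copied from the confusion matrices printed above (entered row by row)
cms <- list(
  "Naive Bayes" = matrix(c(227,  68,  64, 242), nrow = 2, byrow = TRUE, dimnames = dn),
  "KNN (k=3)"   = matrix(c(197, 148,  94, 162), nrow = 2, byrow = TRUE, dimnames = dn),
  "KNN (k=5)"   = matrix(c(231, 185,  60, 125), nrow = 2, byrow = TRUE, dimnames = dn)
)

# Accuracy (%) = correctly classified reviews / all test reviews * 100
accuracy_pct <- function(cm) 100 * sum(diag(cm)) / sum(cm)

# Every matrix covers the same test set
print(vapply(cms, sum, numeric(1)))

acc <- vapply(cms, accuracy_pct, numeric(1))
print(acc)
```

Each matrix sums to 601 test reviews. For the Naive Bayes classifier, for example, (227 + 242) / 601 = 469 / 601, or about 78.04%, which matches the printed value.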
References:
Links to the dataset that was used.
The following libraries and blog were used to make the graph and models above.
I also read the following websites, blogs, and book.