Matthew Landowski
November 2015
The Question
Can a model be developed to predict sentiment from a review text?
Example text of a review
“Mad Max Pizza was amazing!!!”
| stars | count | percentage of data |
|---|---|---|
| 1 | 159811 | 10.18% |
| 2 | 140608 | 8.96% |
| 3 | 222719 | 14.19% |
| 4 | 466599 | 29.73% |
| 5 | 579527 | 36.93% |
Summary of star ratings in the data set
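The percentage column follows directly from the counts. A minimal sketch, assuming the five counts above make up the whole data set (the name `rating_shares` is illustrative, not from the original work):

```python
# Star-rating counts copied from the summary table.
star_counts = {1: 159811, 2: 140608, 3: 222719, 4: 466599, 5: 579527}


def rating_shares(counts):
    """Return each rating's share of all reviews, as a percentage."""
    total = sum(counts.values())
    return {stars: 100.0 * n / total for stars, n in counts.items()}


shares = rating_shares(star_counts)
for stars, pct in sorted(shares.items()):
    print(f"{stars} stars: {pct:.2f}%")
```

The shares show how skewed the data is toward 4 and 5 stars, which is worth keeping in mind when reading per-class results.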
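Each review is reduced to a list of lowercase words, with punctuation and common words such as “was” dropped,
e.g. “Mad Max Pizza was amazing!!!” => [mad, max, pizza, amazing]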
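One way this kind of preprocessing might look is sketched below. The stop-word list and the `tokenize` function are hypothetical, chosen only to reproduce the example; the original pipeline and its word list are not given.

```python
import re

# Hypothetical stop-word list for illustration; the original list is not known.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "were", "it", "of", "to"}


def tokenize(review):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    words = re.findall(r"[a-z]+", review.lower())
    return [w for w in words if w not in STOP_WORDS]


tokens = tokenize("Mad Max Pizza was amazing!!!")
print(tokens)  # ['mad', 'max', 'pizza', 'amazing']
```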
| Pred/Ref | 5 | 4 | 2 | 1 | Total | F-score | Accuracy (fraction of row total) |
|---|---|---|---|---|---|---|---|
| 5 | 14203 | 2612 | 118 | 313 | 17246 | 0.7087325 | 0.8235533 |
| 4 | 6666 | 6632 | 402 | 306 | 14006 | 0.5347525 | 0.4735114 |
| 2 | 894 | 1156 | 1234 | 958 | 4242 | 0.3790508 | 0.2909005 |
| 1 | 1071 | 398 | 515 | 2871 | 4855 | 0.6172203 | 0.5913491 |
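The F-score column can be recomputed from the counts in the table. The sketch below assumes rows are predicted ratings and columns are reference ratings, as the "Pred/Ref" header suggests, and uses the usual F1 definition (harmonic mean of precision and recall). The function name `per_class_scores` is illustrative.

```python
labels = [5, 4, 2, 1]
# Rows are predicted ratings, columns are reference ratings, as in the table.
confusion = [
    [14203, 2612, 118, 313],
    [6666, 6632, 402, 306],
    [894, 1156, 1234, 958],
    [1071, 398, 515, 2871],
]


def per_class_scores(matrix):
    """Return a (precision, recall, f_score) tuple for each class index."""
    scores = []
    for i, row in enumerate(matrix):
        correct = row[i]                             # diagonal cell
        predicted_total = sum(row)                   # row total
        reference_total = sum(r[i] for r in matrix)  # column total
        precision = correct / predicted_total
        recall = correct / reference_total
        f_score = 2 * precision * recall / (precision + recall)
        scores.append((precision, recall, f_score))
    return scores


scores = per_class_scores(confusion)
for label, (p, r, f) in zip(labels, scores):
    print(f"{label} stars: precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

Under these assumptions the computed F-scores agree with the F-score column for all four classes.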
| Pred/Ref | Good.Rating | Bad.Rating | Total | F-score | Accuracy (fraction of row total) |
|---|---|---|---|---|---|
| Good.Rating | 30016 | 1236 | 31252 | 0.9294894 | 0.9604505 |
| Bad.Rating | 3318 | 5779 | 9097 | 0.7173535 | 0.6352644 |
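A single overall figure can be read off the two-class table by dividing the diagonal counts by the grand total. A minimal sketch, with the counts copied from the table and the helper name `overall_accuracy` chosen for illustration:

```python
# Counts copied from the two-class table (rows: predicted, columns: reference).
binary_confusion = {
    "Good.Rating": {"Good.Rating": 30016, "Bad.Rating": 1236},
    "Bad.Rating": {"Good.Rating": 3318, "Bad.Rating": 5779},
}


def overall_accuracy(confusion):
    """Share of all reviews whose predicted class equals the reference class."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


accuracy = overall_accuracy(binary_confusion)
print(f"overall accuracy: {accuracy:.4f}")
```

Because most reviews are good ratings, this overall figure is best read alongside the per-class F-scores rather than on its own.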
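Good and bad ratings can be predicted with good accuracy: about 88.7% of the 40,349 reviews in the two-class table are classified correctly, compared with about 61.8% in the four-class table. Bad ratings (F-score 0.72) remain harder to identify than good ratings (F-score 0.93).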