I was exceedingly bored over Thanksgiving, so I decided to analyze a data set of the movies I have watched in the past two years. I have a habit of rating each movie right after I watch it, on a scale from 1 to 10. I wanted to compare my ratings with those on Rotten Tomatoes to see how our preferences line up.

While I do not look at Rotten Tomatoes when assigning my ratings, I do often use it to choose which movies to watch, which introduces selection bias.

Some General Statistics

First takeaway: while my average score is lower, I select into movies that are already very highly rated on Rotten Tomatoes (RT). My grades may seem “deflated” only because my selection is biased upward. Unlike RT, I don’t have to watch all movies :).

My Average Score

cat("My Average Score:",mean(movies$My.Rating), "\n")
## My Average Score: 7.75

Average Rotten Tomatoes Score of Movies I Watched

cat("Average RT Score of Movies I Watched:",mean(movies$RT.Score), "\n")
## Average RT Score of Movies I Watched: 8.343548

Average Difference (My Score - Rotten Tomatoes)

cat("Average Difference:", mean(movies$diff), "\n")
## Average Difference: -0.5935484
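
The post never shows how the diff column is built. The averages above suggest it is my rating minus the RT score (7.75 - 8.34 ≈ -0.59), with the RT score on a 0 to 10 scale. Here is a minimal sketch under those assumptions. The toy data and the RT.Pct column are hypothetical and are not part of the original data set.

```r
# Hypothetical toy data; the titles and scores are made up for illustration.
toy <- data.frame(
  Movie     = c("Film A", "Film B", "Film C"),
  My.Rating = c(8, 6, 9),
  RT.Pct    = c(95, 88, 72),  # assumed: RT score stored as a percentage
  stringsAsFactors = FALSE
)

# Assumption: RT.Score is the percentage rescaled to a 0-10 scale.
toy$RT.Score <- toy$RT.Pct / 10

# Assumption: diff = my rating minus the RT score, so a negative value
# means I rated the movie lower than Rotten Tomatoes did.
toy$diff <- toy$My.Rating - toy$RT.Score

mean_diff <- mean(toy$diff)
cat("Average Difference:", mean_diff, "\n")
```

Defined this way, the mean of diff always equals the difference between the two average scores, which is a quick sanity check on the numbers reported above.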

Top 5 Lists

My Top Five Movies

library(DT)  # provides datatable() and formatRound()
d=datatable(movies[order(-movies$My.Rating),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d

Rotten Tomatoes’ Top Five Movies (of this data set)

d=datatable(movies[order(-movies$RT.Score),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d

My Biggest “Undervaluations” (i.e., where I liked a movie a lot less than Rotten Tomatoes did)

d=datatable(movies[order(movies$diff),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d

My Biggest “Overvaluations” (i.e., where I liked a movie a lot more than Rotten Tomatoes did)

d=datatable(movies[order(-movies$diff),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d
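
The four tables above repeat one pattern: sort by a column, then keep the first five rows. It can be wrapped in a small helper, sketched here in base R. The function name top_n_by and the toy data are hypothetical and do not come from the original analysis.

```r
# Hypothetical helper: return the n rows with the largest (or smallest)
# values of one column. Not part of the original analysis.
top_n_by <- function(df, column, n = 5, decreasing = TRUE) {
  idx <- order(df[[column]], decreasing = decreasing)
  # head() returns fewer rows when the data has fewer than n,
  # whereas indexing with [1:n, ] would pad the result with NA rows.
  head(df[idx, , drop = FALSE], n)
}

# Made-up data for illustration only.
toy <- data.frame(
  Movie = c("Film A", "Film B", "Film C", "Film D"),
  diff  = c(-1.5, 0.4, 2.1, -3.0),
  stringsAsFactors = FALSE
)

under <- top_n_by(toy, "diff", n = 2, decreasing = FALSE)  # most negative diff
over  <- top_n_by(toy, "diff", n = 2, decreasing = TRUE)   # most positive diff
print(under$Movie)
print(over$Movie)
```

The result can be passed to datatable() as before, for example datatable(top_n_by(movies, "diff", decreasing = FALSE), options = list(dom = 't')).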

Graph

library(plotly)
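
# Use formulas (~) so every column is looked up in `movies`; this avoids attach().
p = plot_ly(movies, x = ~My.Rating, y = ~RT.Score, text = ~Movie, color = ~My.Rating, type = "scatter", mode = "markers", size = ~My.Rating, name = "data")
# Dashed 45-degree reference line (y = x): points above it are movies RT rated higher than I did.
p = add_lines(p, x = ~My.Rating, y = ~My.Rating, name = "45-degree", line = list(color = "black", dash = "dash", width = 1))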
p