I was exceedingly bored over Thanksgiving, so I decided to analyze a data set of the movies I have watched in the past two years. I have a habit of rating each movie right after I watch it, on a scale from 1 to 10. I wanted to compare my ratings with those of Rotten Tomatoes to see how our preferences line up.
While I do not refer to Rotten Tomatoes when assigning my ratings, I often use it to select which movies to watch (selection bias).
First takeaway: while my average score is lower, I select into movies that are already very highly rated on Rotten Tomatoes (RT). My grades may seem “deflated” only because my selection is biased upward: unlike RT, I don’t have to watch all movies :).
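To make the selection effect concrete, here is a small simulation. It is only a sketch: the scores are randomly generated, and the rule "watch only movies scoring 8 or higher" is a made-up stand-in for how I actually pick movies.

```r
# Simulated RT scores on a 0-10 scale (NOT real data; purely illustrative).
set.seed(1)
rt_all <- pmin(pmax(rnorm(1000, mean = 6, sd = 2), 0), 10)

# Hypothetical selection rule: only watch movies that RT scores 8 or higher.
watched <- rt_all[rt_all >= 8]

cat("Mean RT score, all simulated movies:", mean(rt_all), "\n")
cat("Mean RT score, watched subset:", mean(watched), "\n")
```

Because the watched subset is filtered on the RT score itself, its average RT score is necessarily higher than the average over all movies, which is the same mechanism that pushes up the RT average in my data set.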
My Average Score
cat("My Average Score:",mean(movies$My.Rating), "\n")
## My Average Score: 7.75
Average Rotten Tomatoes Score of Movies I Watched
cat("Average RT Score of Movies I Watched:",mean(movies$RT.Score), "\n")
## Average RT Score of Movies I Watched: 8.343548
Average Difference (My Score - Rotten Tomatoes)
cat("Average Difference:",mean(movies$diff))
## Average Difference: -0.5935484
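The `diff` column is not constructed in the code shown here. The averages above (7.75 - 8.343548 = -0.593548) are consistent with it being my rating minus the RT score, so a minimal sketch under that assumption looks like this. The `movies_demo` data frame and its titles and scores are invented for illustration.

```r
# Hypothetical stand-in for the real `movies` data; titles and scores are made up.
movies_demo <- data.frame(
  Movie     = c("Film A", "Film B", "Film C"),
  My.Rating = c(8, 6, 9),
  RT.Score  = c(9.1, 7.5, 8.4)
)

# Assumed definition: my rating minus the Rotten Tomatoes score,
# so negative values mean I rated the movie lower than RT did.
movies_demo$diff <- movies_demo$My.Rating - movies_demo$RT.Score

cat("Average Difference:", mean(movies_demo$diff), "\n")
```

With this definition the average difference always equals the difference between the two averages, which is why the three numbers above line up.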
My Top Five Movies
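library(DT)  # provides datatable() and formatRound()
# Sort by my rating (descending) and keep the first five rows
d=datatable(movies[order(-movies$My.Rating),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d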
Rotten Tomatoes’ Top Five Movies (of this data set)
d=datatable(movies[order(-movies$RT.Score),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d
My Biggest “Undervaluations”, i.e. where I liked a movie a lot less than Rotten Tomatoes
d=datatable(movies[order(movies$diff),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d
My Biggest “Overvaluations”, i.e. where I liked a movie a lot more than Rotten Tomatoes
d=datatable(movies[order(-movies$diff),][1:5,], options = list(dom = 't'))
d=formatRound(d,columns='diff',digits=1)
d
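The four tables above repeat the same sort-and-take-five pattern. One way to factor it out is a small helper in base R. This is a sketch, not the code used for this post: `top_n_by` and the `movies_demo` data frame are hypothetical names introduced here.

```r
# Return the n rows with the largest (or smallest) values in column `col`.
top_n_by <- function(df, col, n = 5, decreasing = TRUE) {
  idx <- order(df[[col]], decreasing = decreasing)
  # head() returns fewer rows when df has fewer than n rows,
  # whereas df[1:5, ] would pad the result with NA rows.
  head(df[idx, , drop = FALSE], n)
}

# Made-up data for illustration only.
movies_demo <- data.frame(
  Movie     = c("Film A", "Film B", "Film C"),
  My.Rating = c(8, 6, 9),
  RT.Score  = c(9.1, 7.5, 8.4)
)
movies_demo$diff <- movies_demo$My.Rating - movies_demo$RT.Score

top_n_by(movies_demo, "My.Rating", n = 2)                  # my top rated
top_n_by(movies_demo, "diff", n = 2, decreasing = FALSE)   # biggest "undervaluations"
```

The result is an ordinary data frame, so it can be passed straight to `datatable()` and `formatRound()` as in the tables above.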
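library(plotly)
# Scatter of my rating against the RT score; formulas (~) look columns up in `movies`, so attach() is not needed
p = plot_ly(movies, x = ~My.Rating, y = ~RT.Score, text = ~Movie, color = ~My.Rating, type = "scatter", mode = "markers", size = ~My.Rating, name = "data")
# Dashed 45-degree line: points below it are movies I rated higher than RT did
p = add_lines(p, x = ~My.Rating, y = ~My.Rating, name = "45-degree", line = list(color = "black", dash = 'dash', width = 1))
p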