As always, we start by loading the relevant packages as well as our data:
movies <- read.csv("/home/rstudio/data/movie_ratings2015.csv")
View(movies)
library(ggplot2)
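# Scatterplot of critic score vs. user score
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()
# Add a smoothed trend (loess by default when there are fewer than 1,000 points)
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()+geom_smooth()
# Use a straight least-squares line instead
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()+geom_smooth(method="lm")
# Same line without the shaded confidence band
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()+geom_smooth(method="lm", se=FALSE)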
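Next, we measure the strength of the linear relationship with the correlation. Where values are missing, cor() returns NA, so you may need to add use="complete.obs" (or its abbreviation "complete"), which keeps only the rows where both ratings are present, as shown below: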
library(dplyr)
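# returns NA if either column has missing values
movies %>% summarize(cor(RottenTomatoes, RottenTomatoes_User))
# use only the rows where both ratings are present
movies %>% summarize(cor(RottenTomatoes, RottenTomatoes_User, use="complete.obs"))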
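To see what use="complete.obs" changes, here is a minimal base-R sketch on a few made-up ratings. The vectors critic and user are hypothetical and are not taken from movies. It compares the default behavior, the complete.obs option, and a by-hand version that drops incomplete pairs first.

```r
# Hypothetical ratings (made up for illustration); the third critic score is missing
critic <- c(90, 75, NA, 40, 60)
user   <- c(85, 70, 65, 50, 55)

# Default use = "everything": any NA makes the correlation NA
r_default <- cor(critic, user)

# use = "complete.obs" drops every pair in which either value is missing
r_complete <- cor(critic, user, use = "complete.obs")

# Equivalent by hand: keep only the complete pairs, then correlate
keep <- complete.cases(critic, user)
r_manual <- cor(critic[keep], user[keep])

print(c(default = r_default, complete = r_complete, manual = r_manual))
```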
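# mean() also returns NA when there are missing values...
movies %>% summarize(mean(RottenTomatoes), mean(RottenTomatoes_User))
# ...so set na.rm=TRUE to ignore them
movies %>% summarize(mean(RottenTomatoes, na.rm=TRUE), mean(RottenTomatoes_User, na.rm=TRUE))
# naming the summaries gives readable column names in the output
movies %>% summarize(RT = mean(RottenTomatoes, na.rm=TRUE), RT_user = mean(RottenTomatoes_User, na.rm=TRUE))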
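The same missing-value issue affects mean(). This minimal sketch uses base R so it runs on its own; the four-row data frame toy is hypothetical, with made-up numbers. It shows the difference na.rm=TRUE makes and builds a named summary analogous to the RT and RT_user columns above.

```r
# Hypothetical data frame (made-up numbers) with one missing user rating
toy <- data.frame(
  RottenTomatoes      = c(90, 75, 40, 60),
  RottenTomatoes_User = c(85, NA, 50, 55)
)

# mean() propagates NA unless na.rm = TRUE
m_default <- mean(toy$RottenTomatoes_User)
m_narm    <- mean(toy$RottenTomatoes_User, na.rm = TRUE)  # (85 + 50 + 55) / 3

# Named summaries, a base-R analogue of summarize(RT = ..., RT_user = ...)
summary_row <- data.frame(
  RT      = mean(toy$RottenTomatoes, na.rm = TRUE),
  RT_user = mean(toy$RottenTomatoes_User, na.rm = TRUE)
)
print(summary_row)
```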
# Convert a variable to z-scores: subtract its mean, then divide by its standard deviation
Standardize <- function(x){
  (x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE)
}
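A quick sanity check of Standardize(), assuming a made-up vector x that includes one NA: the z-scores should have mean 0 and standard deviation 1, and they should agree with base R's scale(), which also skips NA values when centering and scaling.

```r
Standardize <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)  # hypothetical values, one missing
z <- Standardize(x)

# Mean 0 and SD 1 among the non-missing values (rounded to hide floating-point noise)
print(round(c(mean = mean(z, na.rm = TRUE), sd = sd(z, na.rm = TRUE)), 10))

# scale() returns a one-column matrix; as.vector() makes it comparable to z
z_scale <- as.vector(scale(x))
```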
movies %>% mutate(zRT = Standardize(RottenTomatoes), zRT_User = Standardize(RottenTomatoes_User)) %>%
ggplot(aes(zRT, zRT_User))+geom_point()
## or, save the z-score columns in movies (the standardized regression below uses them) ##
movies <- movies %>% mutate(zRT = Standardize(RottenTomatoes), zRT_User = Standardize(RottenTomatoes_User))
ggplot(movies, aes(zRT, zRT_User))+geom_point()
We can get the equation of the best-fit (least-squares) line for predicting the Rotten Tomatoes user rating from the critic rating as follows:
lm(RottenTomatoes_User ~ RottenTomatoes, data=movies)
# the same regression on the standardized (z-score) columns
lm(zRT_User ~ zRT, data=movies)
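Why fit the model on z-scores too? In simple linear regression, the slope on standardized variables equals the correlation r and the intercept is 0; equivalently, the raw slope is r times sd(y) / sd(x). The sketch below checks both facts on simulated data, where critic and user are hypothetical stand-ins for the two rating columns. With the real movies data the match is only approximate if the two columns are missing in different rows, because lm() then drops rows that Standardize() has already used.

```r
set.seed(1)  # simulated data, not the movie ratings
critic <- runif(100, 0, 100)
user   <- 30 + 0.5 * critic + rnorm(100, sd = 10)

Standardize <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

fit_raw <- lm(user ~ critic)
fit_z   <- lm(Standardize(user) ~ Standardize(critic))

r <- cor(critic, user)
print(coef(fit_z))  # intercept essentially 0, slope equal to r
print(r)

# The raw slope is r rescaled by the ratio of standard deviations
slope_check <- r * sd(user) / sd(critic)
```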
What does this mean? We can plug in possible Rotten Tomatoes ratings and predict Rotten Tomatoes user ratings.
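If a movie had a critic rating of \(0\), our linear prediction is that it would have a user rating of about 32 (the intercept). Each additional point of critic rating adds about 0.525 to the predicted user rating (the slope), so for a critic rating of \(80\) we would expect a user rating of about 32 + 80 * 0.525 = 74.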
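The prediction above is just intercept + slope * critic rating. The sketch below redoes that arithmetic with the rounded coefficients quoted in the text, then shows predict() on a model fitted to simulated data; the data frame sim is hypothetical, generated to resemble the two movie columns. With the real data you would pass the lm object fitted on movies and a data frame of new RottenTomatoes values to predict() in the same way.

```r
# Rounded coefficients quoted in the text
b0 <- 32     # intercept
b1 <- 0.525  # slope
by_hand <- b0 + b1 * 80
print(by_hand)  # 74

# Hypothetical simulated data with the same column names as movies
set.seed(2)
sim <- data.frame(RottenTomatoes = runif(50, 0, 100))
sim$RottenTomatoes_User <- 32 + 0.525 * sim$RottenTomatoes + rnorm(50, sd = 8)

fit <- lm(RottenTomatoes_User ~ RottenTomatoes, data = sim)

# predict() plugs new x values into the fitted line
new_ratings <- data.frame(RottenTomatoes = c(0, 80))
pred   <- predict(fit, newdata = new_ratings)
manual <- coef(fit)[1] + coef(fit)[2] * new_ratings$RottenTomatoes
print(pred)
```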
Which mammals have the most REM sleep (dream sleep)? Does it depend on the size of the animal? Does it depend on the size of the animal’s brain?
Try to answer these questions by making scatterplots, finding correlations, and finding and interpreting best-fit lines.
You can find out more about this data set, which comes with the ggplot2 package, by running ?msleep.
data(msleep)
View(msleep)
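One possible starting point for the exercise, as a sketch rather than an answer: msleep's columns include sleep_rem (hours of REM sleep), bodywt and brainwt (both in kilograms). Body weights range from grams to tonnes, so this sketch assumes a log10 scale for weight; that is a modeling choice, not something the exercise prescribes. Repeating the same steps with brainwt addresses the brain-size question.

```r
library(ggplot2)  # provides ggplot() and the msleep data

# REM sleep against body weight, with weight on a log10 axis (an assumption)
p <- ggplot(msleep, aes(bodywt, sleep_rem)) +
  geom_point() +
  scale_x_log10() +
  geom_smooth(method = "lm", se = FALSE)
print(p)

# sleep_rem contains NAs, so use only complete pairs
r_body <- cor(log10(msleep$bodywt), msleep$sleep_rem, use = "complete.obs")
print(r_body)

# Best-fit line for REM sleep from log10 body weight (lm drops rows with NA)
fit_rem <- lm(sleep_rem ~ log10(bodywt), data = msleep)
print(coef(fit_rem))
```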