I. Example Analysis: Exploring the Movie Dataset

As always, we start by loading the relevant packages as well as our data:

movies <- read.csv("/home/rstudio/data/movie_ratings2015.csv")

View(movies)

Making Scatterplots

library(ggplot2)

ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()

ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()+geom_smooth()

ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()+geom_smooth(method="lm")

ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User))+geom_point()+geom_smooth(method="lm", se=FALSE)

Finding Correlations and Means

Where values are missing, you may need to add use="complete" to cor() and na.rm=TRUE to mean(), as shown below:

library(dplyr)

movies %>% summarize(cor(RottenTomatoes, RottenTomatoes_User))

movies %>% summarize(cor(RottenTomatoes, RottenTomatoes_User, use="complete"))

movies %>% summarize(mean(RottenTomatoes), mean(RottenTomatoes_User))

movies %>% summarize(mean(RottenTomatoes,na.rm=TRUE), mean(RottenTomatoes_User,na.rm=TRUE))

movies %>% summarize(RT = mean(RottenTomatoes,na.rm=TRUE), RT_user = mean(RottenTomatoes_User,na.rm=TRUE))
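To see why these extra arguments matter, here is a minimal sketch on a small made-up data frame. The `toy` values are invented for illustration and are not taken from the movie data, and the sketch uses base R rather than dplyr so that it runs on its own. By default a single NA makes both mean() and cor() return NA; na.rm=TRUE and use="complete" drop the missing values first.

```r
# Made-up ratings with one missing value in each column (illustration only)
toy <- data.frame(critic = c(10, 40, NA, 80, 95),
                  user   = c(30, 50, 60, NA, 90))

# Default behavior: any NA in the input gives NA as the result
mean_default <- mean(toy$critic)
cor_default  <- cor(toy$critic, toy$user)

# Dropping missing values first
mean_clean <- mean(toy$critic, na.rm = TRUE)               # averages 10, 40, 80, 95
cor_clean  <- cor(toy$critic, toy$user, use = "complete")  # uses rows 1, 2 and 5 only

print(c(mean_default = mean_default, mean_clean = mean_clean))
print(c(cor_default = cor_default, cor_clean = cor_clean))
```

Note that the two functions drop different things: mean() with na.rm=TRUE ignores only the values missing in that one column, while cor() with use="complete" ignores any row that is missing either value.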

Computing and Graphing z-scores

Standardize <- function(x){
  (x-mean(x, na.rm=TRUE))/sd(x, na.rm=TRUE)}

movies %>% mutate(zRT = Standardize(RottenTomatoes), zRT_User = Standardize(RottenTomatoes_User)) %>%
  ggplot(aes(zRT, zRT_User))+geom_point()

## or ##

movies <- movies %>% mutate(zRT = Standardize(RottenTomatoes), zRT_User = Standardize(RottenTomatoes_User))

ggplot(movies, aes(zRT, zRT_User))+geom_point()
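As a quick check of what Standardize does, the sketch below applies it to a few made-up ratings (the `critic` and `user` vectors are invented for illustration, not drawn from the movie data). It suggests two things to expect in the plots above: each z-score column has mean 0 and standard deviation 1, and standardizing changes the axis scales but not the correlation.

```r
# Same definition as in the lab
Standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Made-up ratings (illustration only)
critic <- c(10, 40, 55, 80, 95)
user   <- c(35, 50, 65, 70, 85)

z_critic <- Standardize(critic)
z_user   <- Standardize(user)

# z-scores have mean 0 and standard deviation 1 (up to rounding error)
print(round(c(mean = mean(z_critic), sd = sd(z_critic)), 10))

# Standardizing rescales the axes but leaves the correlation unchanged
print(all.equal(cor(critic, user), cor(z_critic, z_user)))
```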

Finding and Interpreting Equations for Best Fit Lines

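We can get the equation for the best-fit line for predicting the Rotten Tomatoes user rating from the critic rating, first on the original scale and then on the z-score scale, as follows. The second model uses the zRT and zRT_User columns added to movies above.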

lm( RottenTomatoes_User~RottenTomatoes, data=movies)

lm( zRT_User~zRT, data=movies)
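One reason to fit the second model is that, when both variables are standardized and no values are missing, the slope of the best-fit line equals the correlation and the intercept is zero. The sketch below checks this on made-up data. The `toy` data frame and its column names are hypothetical, and with missing values in the real data the match may only be approximate.

```r
# Same definition as in the lab
Standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Made-up ratings with no missing values (illustration only)
toy <- data.frame(critic = c(10, 40, 55, 80, 95),
                  user   = c(35, 50, 65, 70, 85))
toy$z_critic <- Standardize(toy$critic)
toy$z_user   <- Standardize(toy$user)

# Best-fit line on the z-score scale
fit_z <- lm(z_user ~ z_critic, data = toy)
slope_z     <- unname(coef(fit_z)["z_critic"])
intercept_z <- unname(coef(fit_z)["(Intercept)"])

# Correlation on the original scale
r <- cor(toy$critic, toy$user)

print(c(slope_z = slope_z, r = r, intercept_z = round(intercept_z, 10)))
```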

What does this mean? We can plug in possible Rotten Tomatoes critic ratings and predict the corresponding Rotten Tomatoes user ratings.

If a movie has a critic rating of \(0\), our linear prediction is a user rating of about 32. If it has a critic rating of \(80\), we would expect a user rating of about 32 + 80 * 0.525 = 74.
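The arithmetic above can be wrapped in a small function. In this sketch the intercept (32) and slope (0.525) are the rounded values quoted in the text, `predict_user` is a hypothetical helper name, and the `toy` data frame is made up to show that predict() on a fitted model does the same calculation.

```r
# Rounded coefficients as quoted in the text
intercept <- 32
slope     <- 0.525

# Hypothetical helper: predicted user rating for a given critic rating
predict_user <- function(critic) intercept + slope * critic
by_hand <- predict_user(c(0, 80))
print(by_hand)

# With a fitted model, predict() applies intercept + slope * x for us.
# Made-up data (illustration only), so these numbers differ from the ones above.
toy <- data.frame(critic = c(10, 40, 55, 80, 95),
                  user   = c(35, 50, 65, 70, 85))
fit <- lm(user ~ critic, data = toy)
from_predict <- predict(fit, newdata = data.frame(critic = c(0, 80)))
from_coefs   <- coef(fit)[1] + coef(fit)[2] * c(0, 80)
print(from_predict)
```

Replacing `toy` with movies and the column names with RottenTomatoes_User and RottenTomatoes should give predictions from the actual fitted line rather than from rounded coefficients.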

Correlation and Best Fit Line Lab

Jared Cross

II. Why do some mammals sleep more than others?