I. Example Analysis: Exploring the Movie Dataset

As always, we start by loading our data (the packages we need are loaded as we go):

# read the 2015 movie ratings and open them in RStudio's data viewer
movies <- read.csv("/home/rstudio/data/movie_ratings2015.csv")
View(movies)
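Before computing any statistics, it can help to check which columns contain missing values, since that decides whether the na.rm and use arguments used later are needed. For the movie data this would be colSums(is.na(movies)). The movie CSV is not bundled here, so this minimal sketch uses R's built-in airquality data as a stand-in.

```r
# Count missing values (NA) in each column of a data frame.
# airquality is a built-in stand-in for the movies data frame.
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Keep only the names of columns that contain at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```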

Making Scatterplots

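library(ggplot2)

# basic scatterplot: critic rating (x) vs. user rating (y)
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User)) + geom_point()
# add a smoothed trend curve (by default a loess curve when there are fewer than 1,000 points)
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User)) + geom_point() + geom_smooth()

# fit a straight least-squares line instead
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User)) + geom_point() + geom_smooth(method = "lm")

# same line without the shaded confidence band
ggplot(movies, aes(RottenTomatoes, RottenTomatoes_User)) + geom_point() + geom_smooth(method = "lm", se = FALSE)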
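The straight line drawn by geom_smooth(method = "lm") is the least-squares line that lm() computes, which ties these plots to the equations later in this section. A minimal sketch, using the built-in mtcars data as a stand-in for movies: fit the line with lm(), then draw it with geom_abline() on top of geom_smooth(). The red dashed line should sit exactly on the blue one.

```r
library(ggplot2)

# Fit the least-squares line of mpg on weight (mtcars stands in for movies)
fit <- lm(mpg ~ wt, data = mtcars)
b <- coef(fit)  # b[1] = intercept, b[2] = slope

# geom_smooth(method = "lm") and geom_abline() should draw the same line
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  geom_abline(intercept = unname(b[1]), slope = unname(b[2]),
              colour = "red", linetype = "dashed")

print(p)
```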

Finding Correlations and Means

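If some values are missing (NA), cor() and mean() return NA. For cor(), add use = "complete.obs" so that only movies with both ratings present are used; for mean(), add na.rm = TRUE so that missing values are dropped:

library(dplyr)

# returns NA if either column contains missing values
movies %>% summarize(cor(RottenTomatoes, RottenTomatoes_User))

# use only rows where both ratings are present
movies %>% summarize(cor(RottenTomatoes, RottenTomatoes_User, use = "complete.obs"))

# returns NA for any column that contains missing values
movies %>% summarize(mean(RottenTomatoes), mean(RottenTomatoes_User))

# drop missing values before averaging
movies %>% summarize(mean(RottenTomatoes, na.rm = TRUE), mean(RottenTomatoes_User, na.rm = TRUE))

# same, with readable names for the output columns
movies %>% summarize(RT = mean(RottenTomatoes, na.rm = TRUE), RT_user = mean(RottenTomatoes_User, na.rm = TRUE))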
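To see what use = "complete.obs" does, this sketch checks that it gives the same answer as first dropping every row with a missing value in either column. It uses the built-in airquality data as a stand-in for movies, because its Ozone and Solar.R columns contain NAs. The same idea applies to mean(): na.rm = TRUE simply averages the non-missing values.

```r
x <- airquality$Ozone
y <- airquality$Solar.R

# Default use = "everything": any NA makes the result NA
r_default <- cor(x, y)

# use = "complete.obs": keep only rows where both x and y are present
r_complete <- cor(x, y, use = "complete.obs")

# The same thing done by hand with complete.cases()
keep <- complete.cases(x, y)
r_manual <- cor(x[keep], y[keep])

# mean(na.rm = TRUE) equals the mean of the non-missing values
m_narm <- mean(x, na.rm = TRUE)
m_manual <- mean(x[!is.na(x)])

print(c(default = r_default, complete = r_complete, manual = r_manual))
```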

Computing and Graphing z-scores

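# convert x to z-scores: subtract the mean, then divide by the standard deviation (ignoring NAs)
Standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# compute the z-scores and plot them in one pipeline (the new columns are not saved)
movies %>%
  mutate(zRT = Standardize(RottenTomatoes), zRT_User = Standardize(RottenTomatoes_User)) %>%
  ggplot(aes(zRT, zRT_User)) + geom_point()

## or, saving the z-score columns in movies so they can be reused (e.g. in lm() below) ##

movies <- movies %>% mutate(zRT = Standardize(RottenTomatoes), zRT_User = Standardize(RottenTomatoes_User))

ggplot(movies, aes(zRT, zRT_User)) + geom_point()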
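A quick check that Standardize() really produces z-scores: the result should have mean 0 and standard deviation 1, and it should match base R's scale(). This sketch repeats the function so it runs on its own and uses the built-in airquality$Ozone column (which contains NAs) as a stand-in for a rating column.

```r
# Same Standardize() function as above, repeated so this block runs on its own
Standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# airquality$Ozone (built in, contains NAs) stands in for a movie rating column
z <- Standardize(airquality$Ozone)

# z-scores have mean 0 and standard deviation 1; NAs stay NA
print(round(mean(z, na.rm = TRUE), 10))
print(sd(z, na.rm = TRUE))

# Base R's scale() computes the same z-scores, returned as a one-column matrix
z_scale <- as.numeric(scale(airquality$Ozone))
```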

Finding and Interpreting Equations for Best Fit Lines

We can get the equation for the best-fit line for predicting the Rotten Tomatoes user rating from the critics' Rotten Tomatoes rating as follows:

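# best-fit line for predicting user ratings from critic ratings
lm(RottenTomatoes_User ~ RottenTomatoes, data = movies)

# the same fit on the z-scores (uses the zRT columns saved above)
lm(zRT_User ~ zRT, data = movies)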

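What does this mean? The first model gives an intercept of about 32 and a slope of about 0.525, so we can plug in a possible Rotten Tomatoes critic rating and predict the Rotten Tomatoes user rating.

If a movie had a critic rating of \(0\), our linear prediction is that it would have a user rating of about 32. Each additional point of critic rating raises the predicted user rating by about 0.525, so a movie with a critic rating of \(80\) would be expected to have a user rating of about \(32 + 0.525 \times 80 = 74\).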
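The prediction above is just the intercept plus the slope times the critic rating. This sketch first repeats that arithmetic with the rounded coefficients quoted in the text (32 and 0.525) in a hypothetical helper, predict_user_rating(). Because the movie file is not bundled here, it then uses the built-in mtcars data to check two things: predict() gives the same number as plugging into the equation by hand, and the slope of the z-score regression equals the correlation coefficient.

```r
# Hypothetical helper using the rounded coefficients quoted in the text
predict_user_rating <- function(critic) 32 + 0.525 * critic
print(predict_user_rating(c(0, 80)))

# With a fitted model, predict() does the same plug-in arithmetic.
# mtcars stands in for movies: predict mpg from weight (in 1000 lb).
fit <- lm(mpg ~ wt, data = mtcars)
b <- coef(fit)
by_hand <- unname(b[1] + b[2] * 3)  # a car weighing 3000 lb
by_predict <- unname(predict(fit, newdata = data.frame(wt = 3)))

# On z-scores the intercept is 0 and the slope equals the correlation r
zx <- as.numeric(scale(mtcars$wt))
zy <- as.numeric(scale(mtcars$mpg))
z_fit <- lm(zy ~ zx)
z_slope <- unname(coef(z_fit)[2])
r <- cor(mtcars$wt, mtcars$mpg)
print(c(slope = z_slope, r = r))
```

When a data set has missing values, the z-score slope is only approximately equal to the correlation, since Standardize() and lm() may not drop exactly the same rows.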

II. Why do some mammals sleep more than others?

Which mammals have the most REM sleep (dream sleep)? Does it depend on the size of the animal? Does it depend on the size of the animal’s brain?

Try to answer these questions by making scatterplots, finding correlations, and finding and interpreting best-fit lines.

You can find out more about this dataset, which is included in the ggplot2 package, on its help page (run ?msleep in the console).

library(ggplot2)  # msleep is included in ggplot2
data(msleep)
View(msleep)
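One possible starting point, applying the same steps to msleep (a sketch, not a full answer): count the animals with both REM sleep and body weight recorded, compare correlations on the raw scale and with body weight on a log10 scale, and fit a line. The log10 transformation is a suggestion added here, not part of the assignment; body weights in msleep span several orders of magnitude, from a few grams to several tonnes, so a few very large animals can dominate a raw-scale plot.

```r
library(ggplot2)  # provides the msleep data and plotting functions
library(dplyr)

data(msleep)

# How many animals have both REM sleep and body weight recorded?
n_complete <- sum(complete.cases(msleep$sleep_rem, msleep$bodywt))

# Correlation on the raw scale and with body weight on a log10 scale
sleep_cor <- msleep %>%
  summarize(
    r_raw = cor(sleep_rem, bodywt, use = "complete.obs"),
    r_log = cor(sleep_rem, log10(bodywt), use = "complete.obs")
  )
print(sleep_cor)

# Best-fit line for REM sleep from log10(body weight); lm() drops rows with NA
rem_fit <- lm(sleep_rem ~ log10(bodywt), data = msleep)
print(coef(rem_fit))

# Scatterplot with the least-squares line (rows with NA are dropped with a warning)
p <- ggplot(msleep, aes(log10(bodywt), sleep_rem)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

The same pattern works for brain size: swap bodywt for brainwt, which also has missing values.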