Explore the movies data set below using the View() command, the glimpse() function from the dplyr package, or its help file by typing ?movies.
The movies data set in the ggplot2movies package has information and ratings on 28,819 movies. This many data points is a bit unwieldy, so let’s take a random sample of 1000 of these movies. Furthermore, let’s take the variable Comedy and convert it to a yes vs no (binary) categorical variable. Note: you don’t need to understand this code for now; we’ll see it when we study data manipulation.
# Do not edit this section
data(movies)
movies <- movies %>%
  sample_n(1000) %>%
  mutate(Comedy = ifelse(Comedy == 1, "yes", "no"))
You want to know for these 1000 randomly chosen movies: What is the relationship between the year the movie was made and the IMDB rating? Furthermore, you want to distinguish between comedies and non-comedies. In the code block below, write the code that generates a graphic that will answer this for you:
# Write your code here:
ggplot(data=movies, aes(x = year, y = rating, color = Comedy)) + geom_point(alpha = 0.6)
As best you can, answer this question: Within these 1000 movies, do comedies get rated higher than non-comedies?
The spread of ratings for the two groups appears indistinguishable, and the groups do not show dissimilar trends over time.
So no, within these 1000 movies, comedies are not consistently rated higher than non-comedies.
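A visual impression like this can be cross-checked with a per-group numeric summary. The sketch below is one possible way to do that, not part of the assignment: it rebuilds a sample the same way as the setup section, then compares the rating distributions of the two groups. The seed value and the names `movies_sample` and `rating_summary` are assumptions made for illustration; the original sample was drawn without a seed, so the exact numbers will differ from any particular run of the exercise.

```r
library(dplyr)
library(ggplot2movies)

# Assumed seed, chosen arbitrarily so this sketch is reproducible.
# The setup section above does not set one.
set.seed(76)

# Rebuild a 1000-movie sample the same way as the setup section
data(movies)
movies_sample <- movies %>%
  sample_n(1000) %>%
  mutate(Comedy = ifelse(Comedy == 1, "yes", "no"))

# Summarize ratings separately for comedies and non-comedies
rating_summary <- movies_sample %>%
  group_by(Comedy) %>%
  summarize(
    n_movies      = n(),
    mean_rating   = mean(rating),
    median_rating = median(rating),
    sd_rating     = sd(rating)
  )

rating_summary
```

If the means and medians of the two groups are close relative to their standard deviations, that supports the reading of the scatterplot given above.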
Considering the babynames data set in the babynames package again, we will limit consideration to only the name “Casey”.
# Do not edit this section
data(babynames)
babynames <- babynames %>%
  filter(name == "Casey")
I want to know about popularity trends of the name “Casey” as a male name and as a female name over the years. In the code block below, write the code that generates a graphic that will answer this for you:
# Write your code here:
ggplot(data=babynames, aes(x=year, y=n, color=sex)) + geom_line()
Given this graphic, what can you say about the name “Casey”? Don’t merely describe what is already apparent on the graphic, but make a broader statement.
The name Casey became dramatically more popular from the ’60s to the mid-2000s as both a male and a female name. It is slightly more common as a male name, but its popularity has not historically differed much by sex.
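A statement like this can be backed up with a short summary by sex. The following is a minimal sketch, assuming the babynames package is installed; the name `casey_summary` and the choice of summary columns are illustrative rather than part of the assignment. It reads the data directly from the package, so it does not depend on the filtered object created above.

```r
library(dplyr)
library(babynames)

# Total number of babies named Casey per sex, plus the year in which
# the yearly count was highest for each sex
casey_summary <- babynames::babynames %>%
  filter(name == "Casey") %>%
  group_by(sex) %>%
  summarize(
    total_babies = sum(n),
    peak_year    = year[which.max(n)],
    peak_count   = max(n)
  )

casey_summary
```

Comparing `total_babies` across the two rows shows how much more common the name is for one sex, and `peak_year` gives a concrete date to check against the line plot.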