This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(dplyr)
library(readr)
# 1. Load the movies dataset
movies <- read_csv("https://gist.githubusercontent.com/tiangechen/b68782efa49a16edaf07dc2cdaa855ea/raw/0c794a9717f18b094eabab2cd6a6b9a226903577/movies.csv")

# Rename the "Film" column to "movie_title" and "Year" to "release_year".
q1 <- movies %>%
  rename(movie_title = Film, release_year = Year)
print(head(q1))
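To make the argument order of rename() concrete (new name on the left, old name on the right), here is a minimal sketch on a made-up three-row table. The `toy` data and its values are hypothetical; `tibble::tibble()` is used only because the tibble package is installed alongside dplyr.

```r
library(dplyr)

# Hypothetical stand-in for the movies data (values are made up)
toy <- tibble::tibble(
  Film = c("Film A", "Film B", "Film C"),
  Year = c(1999, 2005, 2010)
)

# rename() takes new_name = old_name pairs; all other columns are kept unchanged
renamed <- toy %>%
  rename(movie_title = Film, release_year = Year)

print(names(renamed))
```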
# 2. select(): (4 points)
# Create a new dataframe with only the columns: movie_title, release_year, Genre, Profitability,
q2 <- q1 %>% select(movie_title, release_year, Genre, Profitability)
print(head(q2))
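select() drops every column that is not listed and returns the kept ones in the order given. A small sketch, assuming a hypothetical extra column `Studio` that should be discarded:

```r
library(dplyr)

# Hypothetical table with one column (Studio) that the selection leaves out
toy <- tibble::tibble(
  movie_title   = c("Film A", "Film B"),
  release_year  = c(2005, 2010),
  Genre         = c("Comedy", "Drama"),
  Profitability = c(1.5, 3.2),
  Studio        = c("Studio X", "Studio Y")
)

# Keep only the four listed columns, in this order
slim <- toy %>% select(movie_title, release_year, Genre, Profitability)

print(names(slim))
```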
# 3. filter(): (4 points)
# Filter the dataset to include only movies released after 2000 with a Rotten Tomatoes % higher than 80.
# Column names containing spaces or symbols such as % must be wrapped in backticks.
q3 <- q1 %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80)
head(q3)
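Because "Rotten Tomatoes %" contains spaces and a percent sign, R only accepts it as a column reference when it is wrapped in backticks. The sketch below, on hypothetical data, also shows that comma-separated conditions in filter() are combined with AND, so a row must pass both tests to survive.

```r
library(dplyr)

# Hypothetical rows: one too old, one rated too low, one that passes both tests
toy <- tibble::tibble(
  movie_title         = c("Film A", "Film B", "Film C"),
  release_year        = c(1999, 2005, 2010),
  `Rotten Tomatoes %` = c(95, 60, 88)
)

# Backticks make the non-syntactic name usable; the comma acts as logical AND
recent_hits <- toy %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80)

print(recent_hits$movie_title)
```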
# 4. mutate(): (4 points)
# Add a new column called "Profitability_millions" that converts Profitability to millions of dollars.
q4 <- q1 %>% mutate(Profitability_millions = Profitability / 1000000)
head(q4)
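mutate() appends a column computed from existing ones and leaves the original column in place. A minimal sketch with made-up dollar amounts; `1e6` is simply R's scientific notation for 1000000.

```r
library(dplyr)

# Hypothetical profit figures in plain dollars
toy <- tibble::tibble(
  movie_title   = c("Film A", "Film B"),
  Profitability = c(2500000, 750000)
)

# New column in millions; Profitability itself is kept
with_millions <- toy %>%
  mutate(Profitability_millions = Profitability / 1e6)

print(with_millions)
```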
# 5. arrange(): (3 points)
# Sort the filtered dataset by Rotten Tomatoes % in descending order, and then by Profitability in descending order.
# Use backticks, not quotes: desc('Rotten Tomatoes %') would sort by a constant string, not the column.
q5 <- q3 %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))
head(q5)
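The difference between quoting and backticking a column name inside arrange() is easy to miss, so here is a sketch on hypothetical data. With backticks the first key sorts the rows and the second key breaks ties; with quotes the first key is the same constant for every row, so only the second key decides the order.

```r
library(dplyr)

# Hypothetical rows; Film B and Film C tie on the rating
toy <- tibble::tibble(
  movie_title         = c("Film A", "Film B", "Film C"),
  `Rotten Tomatoes %` = c(80, 95, 95),
  Profitability       = c(5.0, 1.2, 3.4)
)

# Correct: sort by the rating column, then break ties by Profitability
by_column <- toy %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))

# Pitfall: the quoted name is a constant, so every row ties on the first key
by_constant <- toy %>%
  arrange(desc("Rotten Tomatoes %"), desc(Profitability))

print(by_column$movie_title)
print(by_constant$movie_title)
```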
# 6. Combining functions: (3 points)
# Use the pipe operator (%>%) to chain these operations together, starting with the original dataset and ending with a final dataframe that incorporates all the above transformations.
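# Start from the original movies data (q5 no longer has Film or Year) and join every step with %>%.
# `Rotten Tomatoes %` is kept in select() because filter() and arrange() need it afterwards.
q6 <- movies %>%
  rename(movie_title = Film, release_year = Year) %>%
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80) %>%
  mutate(Profitability_millions = Profitability / 1000000) %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability_millions))
head(q6)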
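A single pipeline like the one in step 6 can be sanity-checked on a tiny made-up table before running it on the real data. This sketch assumes the raw column names Film, Year, Genre, Profitability and Rotten Tomatoes % from the steps above; all values are hypothetical. Note that select() must keep the rating column, otherwise the later filter() and arrange() calls cannot find it.

```r
library(dplyr)

# Hypothetical raw data using the original column names
raw <- tibble::tibble(
  Film                = c("Film A", "Film B", "Film C", "Film D"),
  Year                = c(1999, 2005, 2010, 2012),
  Genre               = c("Drama", "Comedy", "Romance", "Comedy"),
  Profitability       = c(4000000, 2000000, 6000000, 1000000),
  `Rotten Tomatoes %` = c(99, 85, 90, 90)
)

# rename -> select -> filter -> mutate -> arrange, all in one chain
result <- raw %>%
  rename(movie_title = Film, release_year = Year) %>%
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80) %>%
  mutate(Profitability_millions = Profitability / 1e6) %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability_millions))

print(result)
```

Film A is dropped by the year filter, and Film C sorts ahead of Film D because they tie on the rating and Film C has the larger Profitability_millions.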
# 7. Interpret question 6 (1 point)
# From the resulting data, are the best movies the most popular?
Yes. The resulting data shows that the movies rated highest on Rotten Tomatoes are also the ones the audience rated as most popular.
# EXTRA CREDIT (4 points)
# Create a summary dataframe that shows the average rating and Profitability_millions for movies by Genre. Hint: You'll need to use group_by() and summarize().
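The group_by() plus summarize() pattern from the hint can be sketched as follows. This assumes that "average rating" means the Rotten Tomatoes % column (the task does not say which rating), and the `toy` rows and the output column names `avg_rating` and `avg_profitability_millions` are hypothetical choices.

```r
library(dplyr)

# Hypothetical rows; two Comedy films and one Drama
toy <- tibble::tibble(
  Genre                  = c("Comedy", "Comedy", "Drama"),
  `Rotten Tomatoes %`    = c(80, 90, 70),
  Profitability_millions = c(2, 4, 1)
)

# One output row per Genre; na.rm = TRUE skips missing values in each mean
genre_summary <- toy %>%
  group_by(Genre) %>%
  summarize(
    avg_rating                 = mean(`Rotten Tomatoes %`, na.rm = TRUE),
    avg_profitability_millions = mean(Profitability_millions, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped result
  )

print(genre_summary)
```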