1. rename(): (4 points)

Rename the “Film” column to “movie_title” and “Year” to “release_year”.

q1 <- movies %>%
  rename(movie_title = Film, release_year = Year) 

2. select(): (4 points)

Create a new dataframe with only the columns movie_title, release_year, Genre, and Profitability.

q2 <- q1 %>%
  select(movie_title, release_year, Genre, Profitability)

3. filter(): (4 points)

Filter the dataset to include only movies released after 2000 with a Rotten Tomatoes % higher than 80.

q3 <- q1 %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80)
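
As a quick check of how the conditions in filter() combine, here is a minimal sketch on a small made-up tibble. The `toy` data and its values are hypothetical; only the column names mirror the ones used above. Conditions separated by commas are combined with AND, so the comma form and the explicit `&` form should keep the same rows.

```r
library(dplyr)

# Hypothetical toy data; only the column names match the assignment.
toy <- tibble(
  movie_title = c("A", "B", "C", "D"),
  release_year = c(1999, 2005, 2010, 2012),
  `Rotten Tomatoes %` = c(95, 85, 60, 81)
)

# Comma-separated conditions: a row is kept only if every condition is TRUE.
with_comma <- toy %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80)

# The same filter written with an explicit logical AND.
with_and <- toy %>%
  filter(release_year > 2000 & `Rotten Tomatoes %` > 80)

print(with_comma)
```

The backticks are needed because `Rotten Tomatoes %` contains spaces and a percent sign, so it is not a syntactically valid R name on its own.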

4. mutate(): (4 points)

Add a new column called “Profitability_millions” that converts the Profitability to millions of dollars.

q4 <- movies %>%
  mutate(Profitability_millions = Profitability / 1e6)

5. arrange(): (3 points)

Sort the filtered dataset by Rotten Tomatoes % in descending order, and then by Profitability in descending order.

q5 <- q3 %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))
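
To illustrate what "and then by" means for arrange(), the sketch below uses a small hypothetical tibble (`toy`, with made-up values). When both keys are passed to a single arrange() call, the second key only breaks ties in the first. Sorting by each key in a separate call produces two unrelated orderings instead.

```r
library(dplyr)

# Hypothetical toy data with ties in the first sort key.
toy <- tibble(
  movie_title = c("A", "B", "C", "D"),
  `Rotten Tomatoes %` = c(90, 85, 90, 85),
  Profitability = c(2.0, 5.0, 4.0, 1.0)
)

# One call, two keys: Profitability decides the order within equal ratings.
sorted <- toy %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))

# A separate single-key sort ignores the rating entirely.
by_profit_only <- toy %>%
  arrange(desc(Profitability))

print(sorted$movie_title)
print(by_profit_only$movie_title)
```

In `sorted`, the two movies rated 90 come first, ordered by Profitability, followed by the two rated 85. `by_profit_only` is ordered purely by Profitability.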

6. Combining functions: (3 points)

Use the pipe operator (%>%) to chain these operations together, starting with the original dataset and ending with a final dataframe that incorporates all the above transformations.

q6 <- movies %>%
  rename(
    movie_title = Film,
    release_year = Year
  ) %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80) %>%  # Filter first
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>%  # Select after filtering
  mutate(Profitability_millions = Profitability / 1e6) %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability_millions))  # Sort after selecting

7. Interpret question 6 (1 point)

EXTRA CREDIT (4 points)

Create a summary dataframe that shows the average rating and Profitability_millions for movies by Genre. Hint: You’ll need to use group_by() and summarize().

summary_df <- movies %>%
  group_by(Genre) %>%
  summarize(
    average_audience_score = mean(`Audience score %`, na.rm = TRUE),  # Mean audience score per genre
    average_profitability = mean(Profitability, na.rm = TRUE)  # Mean of the raw Profitability column (not converted to millions)
  )
# Display the summary dataframe
print(summary_df)
## # A tibble: 10 × 3
##    Genre     average_audience_score average_profitability
##    <chr>                      <dbl>                 <dbl>
##  1 Action                      45                   1.25 
##  2 Animation                   70.2                 3.76 
##  3 Comdy                       61                   2.65 
##  4 Comedy                      61.0                 3.78 
##  5 Drama                       67.2                 8.41 
##  6 Fantasy                     81                   1.78 
##  7 Romance                     62.8                 3.98 
##  8 Romence                     84                   8.74 
##  9 comedy                      81                   8.10 
## 10 romance                     84                   0.653
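
The output above lists "Comdy", "Comedy", and "comedy" as separate groups, and likewise "Romance", "Romence", and "romance". A possible cleanup step is sketched below on a hypothetical tibble (`toy`, with made-up scores). It assumes that "Comdy" and "Romence" are misspellings of "Comedy" and "Romance", which the source data does not confirm. It lowercases the labels and then maps the assumed misspellings with recode() before grouping.

```r
library(dplyr)

# Hypothetical toy data reproducing the inconsistent Genre labels.
toy <- tibble(
  Genre = c("Comedy", "Comdy", "comedy", "Romance", "Romence", "romance"),
  `Audience score %` = c(60, 70, 80, 50, 60, 70)
)

cleaned <- toy %>%
  mutate(
    Genre = tolower(Genre),  # Collapse case differences first
    # Assumed misspellings; adjust this mapping to the real data.
    Genre = recode(Genre, "comdy" = "comedy", "romence" = "romance")
  ) %>%
  group_by(Genre) %>%
  summarize(average_audience_score = mean(`Audience score %`, na.rm = TRUE))

print(cleaned)
```

With the labels normalized, the toy data collapses from six labels to two groups. The same mutate() step could be placed before group_by(Genre) in the summary pipeline.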