1. rename(): (4 points)

Rename the “Film” column to “movie_title” and “Year” to “release_year”.

one <- movies %>% 
  rename(movie_title = Film, release_year = Year)

head(one)
## # A tibble: 6 × 8
##   movie_title               Genre `Lead Studio` `Audience score %` Profitability
##   <chr>                     <chr> <chr>                      <dbl>         <dbl>
## 1 Zack and Miri Make a Por… Roma… The Weinstei…                 70          1.75
## 2 Youth in Revolt           Come… The Weinstei…                 52          1.09
## 3 You Will Meet a Tall Dar… Come… Independent                   35          1.21
## 4 When in Rome              Come… Disney                        44          0   
## 5 What Happens in Vegas     Come… Fox                           72          6.27
## 6 Water For Elephants       Drama 20th Century…                 72          3.08
## # ℹ 3 more variables: `Rotten Tomatoes %` <dbl>, `Worldwide Gross` <chr>,
## #   release_year <dbl>
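
As a small aside on the syntax: rename() takes pairs written as new_name = old_name, and column names containing spaces or symbols need backticks. The sketch below assumes a hypothetical two-row tibble, toy_movies, built only for illustration from values shown in the output above; the audience_score name is likewise an assumption and not part of the assignment.

```r
library(dplyr)

# Hypothetical toy data; values copied from the first two rows printed above
toy_movies <- tibble(
  Film = c("Zack and Miri Make a Porno", "Youth in Revolt"),
  Year = c(2008, 2010),
  `Audience score %` = c(70, 52)
)

# new_name = old_name; backticks are required for non-syntactic names
toy_renamed <- toy_movies %>%
  rename(
    movie_title = Film,
    release_year = Year,
    audience_score = `Audience score %`
  )

names(toy_renamed)
```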

2. select(): (4 points)

Create a new dataframe with only the columns movie_title, release_year, Genre, and Profitability.

two <- one %>% 
  select(movie_title, release_year, Genre, Profitability)

head(two)
## # A tibble: 6 × 4
##   movie_title                        release_year Genre   Profitability
##   <chr>                                     <dbl> <chr>           <dbl>
## 1 Zack and Miri Make a Porno                 2008 Romance          1.75
## 2 Youth in Revolt                            2010 Comedy           1.09
## 3 You Will Meet a Tall Dark Stranger         2010 Comedy           1.21
## 4 When in Rome                               2010 Comedy           0   
## 5 What Happens in Vegas                      2008 Comedy           6.27
## 6 Water For Elephants                        2011 Drama            3.08

3. filter(): (4 points)

Filter the dataset to include only movies released after 2000 with a Rotten Tomatoes % higher than 80.

three <- one %>% 
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>% 
  filter(release_year > 2000, `Rotten Tomatoes %` > 80)

head(three)
## # A tibble: 6 × 5
##   movie_title            release_year Genre    Profitability `Rotten Tomatoes %`
##   <chr>                         <dbl> <chr>            <dbl>               <dbl>
## 1 WALL-E                         2008 Animati…         2.90                   96
## 2 Waitress                       2007 Romance         11.1                    89
## 3 Tangled                        2010 Animati…         1.37                   89
## 4 Rachel Getting Married         2008 Drama            1.38                   85
## 5 My Week with Marilyn           2011 Drama            0.826                  83
## 6 Midnight in Paris              2011 Romence          8.74                   93
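
To make the filter logic explicit: conditions separated by commas in filter() are combined with AND, and the strict > excludes a rating of exactly 80. The sketch below uses a hypothetical toy tibble in which the "Boundary Case" row and the rating column name are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data; the last row is invented to show the boundary case
toy <- tibble(
  movie_title = c("WALL-E", "Waitress", "Boundary Case"),
  release_year = c(2008, 2007, 2005),
  rating = c(96, 89, 80)
)

# Comma-separated conditions are combined with AND
with_comma <- toy %>% filter(release_year > 2000, rating > 80)

# Equivalent form using an explicit &
with_and <- toy %>% filter(release_year > 2000 & rating > 80)

# Both forms keep the same movies; the row with rating == 80 is dropped
identical(with_comma$movie_title, with_and$movie_title)
with_comma$movie_title
```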

4. mutate(): (4 points)

Add a new column called “Profitability_millions” that converts the Profitability to millions of dollars.

four <- three %>% 
  mutate(Profitability_millions = Profitability * 1000000)

head(four)
## # A tibble: 6 × 6
##   movie_title            release_year Genre    Profitability `Rotten Tomatoes %`
##   <chr>                         <dbl> <chr>            <dbl>               <dbl>
## 1 WALL-E                         2008 Animati…         2.90                   96
## 2 Waitress                       2007 Romance         11.1                    89
## 3 Tangled                        2010 Animati…         1.37                   89
## 4 Rachel Getting Married         2008 Drama            1.38                   85
## 5 My Week with Marilyn           2011 Drama            0.826                  83
## 6 Midnight in Paris              2011 Romence          8.74                   93
## # ℹ 1 more variable: Profitability_millions <dbl>

5. arrange(): (3 points)

Sort the filtered dataset by Rotten Tomatoes % in descending order, and then by Profitability in descending order.

five <- four %>% 
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability_millions))

head(five)
## # A tibble: 6 × 6
##   movie_title       release_year Genre     Profitability `Rotten Tomatoes %`
##   <chr>                    <dbl> <chr>             <dbl>               <dbl>
## 1 WALL-E                    2008 Animation          2.90                  96
## 2 Midnight in Paris         2011 Romence            8.74                  93
## 3 Enchanted                 2007 Comedy             4.01                  93
## 4 Knocked Up                2007 Comedy             6.64                  91
## 5 Waitress                  2007 Romance           11.1                   89
## 6 A Serious Man             2009 Drama              4.38                  89
## # ℹ 1 more variable: Profitability_millions <dbl>
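
The code sorts on Profitability_millions rather than Profitability. Because the new column is Profitability multiplied by a positive constant, either key should give the same order, and the second key only matters where ratings tie. A minimal sketch, using three rows from the output above and a hypothetical rating column name:

```r
library(dplyr)

# Three rows from the output above; two of them share a rating of 93
toy <- tibble(
  movie_title = c("Enchanted", "WALL-E", "Midnight in Paris"),
  rating = c(93, 96, 93),
  Profitability = c(4.01, 2.90, 8.74)
) %>%
  mutate(Profitability_millions = Profitability * 1000000)

# The second key breaks ties in the first key
by_ratio <- toy %>% arrange(desc(rating), desc(Profitability))
by_millions <- toy %>% arrange(desc(rating), desc(Profitability_millions))

by_ratio$movie_title
identical(by_ratio$movie_title, by_millions$movie_title)
```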

6. Combining functions: (3 points)

Use the pipe operator (%>%) to chain these operations together, starting with the original dataset and ending with a final dataframe that incorporates all the above transformations.

six <- movies %>% 
  rename(movie_title = Film, release_year = Year) %>% 
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>% 
  filter(release_year > 2000, `Rotten Tomatoes %` > 80) %>% 
  mutate(Profitability_millions = Profitability * 1000000) %>% 
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability_millions))

head(six)
## # A tibble: 6 × 6
##   movie_title       release_year Genre     Profitability `Rotten Tomatoes %`
##   <chr>                    <dbl> <chr>             <dbl>               <dbl>
## 1 WALL-E                    2008 Animation          2.90                  96
## 2 Midnight in Paris         2011 Romence            8.74                  93
## 3 Enchanted                 2007 Comedy             4.01                  93
## 4 Knocked Up                2007 Comedy             6.64                  91
## 5 Waitress                  2007 Romance           11.1                   89
## 6 A Serious Man             2009 Drama              4.38                  89
## # ℹ 1 more variable: Profitability_millions <dbl>
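
For comparison, the pipe passes the left-hand result as the first argument of the next call, so a chain can also be written as nested calls. The sketch below uses a hypothetical two-row tibble and a cutoff year chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  Film = c("WALL-E", "Waitress"),
  Year = c(2008, 2007)
)

# Piped form: read top to bottom
piped <- toy %>%
  rename(movie_title = Film, release_year = Year) %>%
  filter(release_year > 2007)

# Nested form: the same calls, read inside out
nested <- filter(
  rename(toy, movie_title = Film, release_year = Year),
  release_year > 2007
)

identical(piped$movie_title, nested$movie_title)
```

The piped version tends to be easier to read as the number of steps grows, which is why the answer above chains five verbs that way.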

7. Interpret question 6 (1 point)

From the resulting data, are the best movies the most popular?

If we measure the best movies by financial success (profitability) and popularity by Rotten Tomatoes rating, the answer is no: the best movies are not always the most popular. Many of the most financially successful movies have a high Rotten Tomatoes rating, but financial success does not guarantee popularity, and a high rating does not guarantee financial success. For instance, Waitress, with a Rotten Tomatoes rating of 89%, has a Profitability_millions value of about 11.1 million dollars, whereas WALL-E, with a higher rating of 96%, has only about 2.9 million dollars.
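
One rough way to check this beyond a single pair of movies is a correlation between the two columns. The sketch below uses only the six rows printed by head(six), so it is illustrative rather than a result for the full filtered dataset; the data frame name top_six and the rating column name are assumptions.

```r
# The six rows printed by head(six) above; not the full filtered dataset
top_six <- data.frame(
  movie_title = c("WALL-E", "Midnight in Paris", "Enchanted",
                  "Knocked Up", "Waitress", "A Serious Man"),
  rating = c(96, 93, 93, 91, 89, 89),
  Profitability = c(2.90, 8.74, 4.01, 6.64, 11.1, 4.38)
)

# Pearson correlation between rating and profitability;
# a value at or below 0 means no positive relationship in these rows
r <- cor(top_six$rating, top_six$Profitability)
r
```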

EXTRA CREDIT (4 points)

Create a summary dataframe that shows the average rating and Profitability_millions for movies by Genre. Hint: You’ll need to use group_by() and summarize().

five1 <- five %>%
  mutate(Genre = case_when(
    Genre == "Romence" ~ "Romance",
    Genre == "comedy" ~ "Comedy",
    TRUE ~ Genre))

genre_summary <- five1 %>%
  group_by(Genre) %>%
  summarize(
    avg_rating = mean(`Rotten Tomatoes %`, na.rm = TRUE),
    avg_profitability_millions = mean(Profitability_millions, na.rm = TRUE))

head(genre_summary)
## # A tibble: 4 × 3
##   Genre     avg_rating avg_profitability_millions
##   <chr>          <dbl>                      <dbl>
## 1 Animation       92.5                   2130856.
## 2 Comedy          88.8                   5802503.
## 3 Drama           85.7                   2197608.
## 4 Romance         89                     6611482.
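
Since each genre average may rest on only a few movies, it can help to report the group size alongside the means. A minimal sketch, assuming a toy tibble built from the six rows printed by head(five) with "Romence" already corrected; the n_movies and rating column names are hypothetical.

```r
library(dplyr)

# Six rows from head(five) above, with the Genre typo already fixed
toy <- tibble(
  Genre = c("Animation", "Romance", "Comedy", "Comedy", "Romance", "Drama"),
  rating = c(96, 93, 93, 91, 89, 89),
  Profitability = c(2.90, 8.74, 4.01, 6.64, 11.1, 4.38)
)

toy_summary <- toy %>%
  group_by(Genre) %>%
  summarize(
    n_movies = n(),  # number of movies behind each average
    avg_rating = mean(rating, na.rm = TRUE),
    avg_profitability = mean(Profitability, na.rm = TRUE)
  )

toy_summary
```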