library(dplyr)  # provides %>%, rename(), select(), filter(), mutate() and arrange()
q1 <- movies %>%
  rename(movie_title = Film, release_year = Year)
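`rename()` takes `new_name = old_name` pairs and leaves every other column, and the column order, untouched. The sketch below runs the same call on a hypothetical three-row stand-in for `movies` (`toy_movies` is an illustrative name, and its values are rounded from the printed `q6` output). It assumes the raw columns are called `Film`, `Year`, `Genre`, `Profitability` and `Rotten Tomatoes %`, as the code in this section implies.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for `movies`; values rounded from the printed q6 output
toy_movies <- tibble(
  Film = c("WALL-E", "Waitress", "Jane Eyre"),
  Year = c(2008, 2007, 2011),
  Genre = c("Animation", "Romance", "Romance"),
  Profitability = c(2.90, 11.1, 0),
  `Rotten Tomatoes %` = c(96, 89, 85)  # backticks allow a non-syntactic column name
)

# rename() uses new_name = old_name; all other columns keep their names and positions
renamed <- toy_movies %>%
  rename(movie_title = Film, release_year = Year)

names(renamed)
```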
q2 <- q1 %>%
  select(movie_title, release_year, Genre, Profitability)
q3 <- q1 %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80)
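Comma-separated conditions in `filter()` are combined with a logical AND, so `q3` keeps only films that pass both tests. Rows where a condition evaluates to `NA` are dropped too. A minimal sketch on hypothetical rows (`Film A` to `Film D` are made-up titles, and the column names match `q1` after the rename):

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical rows; Film D has a missing critic score
toy <- tibble(
  movie_title = c("Film A", "Film B", "Film C", "Film D"),
  release_year = c(2008, 2007, 1999, 2010),
  `Rotten Tomatoes %` = c(96, 89, 95, NA)
)

# Commas inside filter() behave like &
comma_version <- toy %>% filter(release_year > 2000, `Rotten Tomatoes %` > 80)
and_version <- toy %>% filter(release_year > 2000 & `Rotten Tomatoes %` > 80)

# Film C fails the year test; Film D's NA comparison is treated as "drop"
comma_version$movie_title
```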
q4 <- movies %>%
  mutate(Profitability_millions = Profitability / 1e6)
q5 <- q4 %>%
  arrange(desc(`Rotten Tomatoes %`))
q5_2 <- q4 %>%
  arrange(desc(Profitability_millions))
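`desc()` reverses the sort order, but dplyr's `arrange()` always places missing values last, even inside `desc()`. That matters if `Rotten Tomatoes %` or `Profitability` contain `NA`s; the later `summarize()` call uses `na.rm = TRUE`, which hints that they might. A small sketch with hypothetical scores:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical scores with one missing value
scores <- tibble(score = c(2, NA, 5, 1))

# Descending sort; the NA still ends up at the bottom
sorted <- scores %>% arrange(desc(score))

sorted$score
```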
q6 <- movies %>%
  rename(
    movie_title = Film,
    release_year = Year
  ) %>%
  filter(release_year > 2000, `Rotten Tomatoes %` > 80) %>% ### Filter first to drop unwanted rows early
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>% ### Then keep only the columns of interest
  mutate(Profitability_millions = Profitability / 1e6) %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability_millions)) ### Sort by critic score; profitability breaks ties
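Listing two keys in `arrange()` sorts by the first and uses the second only to order rows that tie on it. Because `Profitability_millions` is just `Profitability / 1e6`, it gives the same ordering as `Profitability`. The sketch below shuffles five rows copied (rounded) from the printed `q6` output and sorts them the same way; `tied` is a hypothetical name.

```r
suppressPackageStartupMessages(library(dplyr))

# Five rows from the printed q6 output (rounded), deliberately out of order
tied <- tibble(
  movie_title = c("Tangled", "Enchanted", "Waitress", "Midnight in Paris", "A Serious Man"),
  Profitability = c(1.37, 4.01, 11.1, 8.74, 4.38),
  `Rotten Tomatoes %` = c(89, 93, 89, 93, 89)
)

# Primary key: critic score (descending); secondary key: profitability (descending)
sorted <- tied %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))

sorted$movie_title
```

The result follows the same order as rows 2 to 7 of the printed table: the two 93% films first, then the three 89% films from most to least profitable.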
#### From the resulting data, the best-reviewed movies are not necessarily the most profitable.
#### The highest-rated film, WALL-E (96% on Rotten Tomatoes), has a profitability of only 2.90,
#### while the most profitable, Waitress (11.1), scores 89%. (500) Days of Summer (8.10) also scores
#### below 90%, although Midnight in Paris (8.74) reaches 93%. The three highest scores (93% to 96%)
#### go with profitability values between 2.90 and 8.74.
print(q6)
## # A tibble: 12 × 6
##    movie_title            release_year Genre   Profitability `Rotten Tomatoes %`
##    <chr>                         <dbl> <chr>           <dbl>               <dbl>
##  1 WALL-E                         2008 Animat…         2.90                   96
##  2 Midnight in Paris              2011 Romence         8.74                   93
##  3 Enchanted                      2007 Comedy          4.01                   93
##  4 Knocked Up                     2007 Comedy          6.64                   91
##  5 Waitress                       2007 Romance        11.1                    89
##  6 A Serious Man                  2009 Drama           4.38                   89
##  7 Tangled                        2010 Animat…         1.37                   89
##  8 (500) Days of Summer           2009 comedy          8.10                   87
##  9 Rachel Getting Married         2008 Drama           1.38                   85
## 10 Jane Eyre                      2011 Romance         0                      85
## 11 Beginners                      2011 Comedy          4.47                   84
## 12 My Week with Marilyn           2011 Drama           0.826                  83
## # ℹ 1 more variable: Profitability_millions <dbl>
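The commentary above can be checked directly against the printed table. This sketch re-enters the 12 rows of `q6` (profitability rounded as printed; `q6_printed` is a hypothetical name), picks the top-rated and the most profitable film with `slice_max()`, and computes a Spearman rank correlation between the two measures. Because the correlation uses ranks and the rounded values keep the same order, rounding does not change it; on these rows it works out to roughly 0.37, a weak positive association, so high ratings and high profitability only loosely go together in this subset.

```r
suppressPackageStartupMessages(library(dplyr))

# The 12 rows of q6, with Profitability rounded as printed
q6_printed <- tibble(
  movie_title = c("WALL-E", "Midnight in Paris", "Enchanted", "Knocked Up", "Waitress",
                  "A Serious Man", "Tangled", "(500) Days of Summer",
                  "Rachel Getting Married", "Jane Eyre", "Beginners",
                  "My Week with Marilyn"),
  Profitability = c(2.90, 8.74, 4.01, 6.64, 11.1, 4.38, 1.37, 8.10, 1.38, 0, 4.47, 0.826),
  `Rotten Tomatoes %` = c(96, 93, 93, 91, 89, 89, 89, 87, 85, 85, 84, 83)
)

# Row with the highest critic score, and row with the highest profitability
top_rated <- q6_printed %>% slice_max(`Rotten Tomatoes %`, n = 1)
most_profitable <- q6_printed %>% slice_max(Profitability, n = 1)

# Spearman correlation works on ranks, so it is insensitive to the rounding above
rho <- cor(q6_printed$`Rotten Tomatoes %`, q6_printed$Profitability, method = "spearman")

top_rated$movie_title
most_profitable$movie_title
rho
```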
summary_df <- movies %>%
  group_by(Genre) %>%
  summarize(
    average_audience_score = mean(`Audience score %`, na.rm = TRUE), # mean audience score per genre, ignoring NAs
    average_profitability = mean(Profitability, na.rm = TRUE)        # mean of the raw Profitability column, not Profitability_millions
  )
# Display the summary dataframe
print(summary_df)
## # A tibble: 10 × 3
##    Genre     average_audience_score average_profitability
##    <chr>                      <dbl>                 <dbl>
##  1 Action                      45                   1.25
##  2 Animation                   70.2                 3.76
##  3 Comdy                       61                   2.65
##  4 Comedy                      61.0                 3.78
##  5 Drama                       67.2                 8.41
##  6 Fantasy                     81                   1.78
##  7 Romance                     62.8                 3.98
##  8 Romence                     84                   8.74
##  9 comedy                      81                   8.10
## 10 romance                     84                   0.653
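The summary above splits what look like the same genre across several groups: `Comdy`, `Comedy` and `comedy`, and likewise `Romance`, `Romence` and `romance`. Below is a sketch of one way to clean the labels before grouping, assuming these spellings are typos or case differences rather than distinct genres: lowercase everything with `tolower()`, then map the misspellings with `recode()`. The rows and numbers are hypothetical, chosen only so the group means are easy to check by hand.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical rows reusing the inconsistent labels seen in the summary
toy <- tibble(
  Genre = c("Comedy", "Comdy", "comedy", "Romance", "Romence", "romance", "Drama"),
  Profitability = c(2, 4, 6, 1, 3, 5, 8),
  `Audience score %` = c(60, 70, 80, 50, 60, 70, 65)
)

clean <- toy %>%
  mutate(
    Genre = tolower(Genre),                                      # "Comedy" and "comedy" now match
    Genre = recode(Genre, comdy = "comedy", romence = "romance") # fix the two misspellings
  ) %>%
  group_by(Genre) %>%
  summarize(
    average_audience_score = mean(`Audience score %`, na.rm = TRUE),
    average_profitability = mean(Profitability, na.rm = TRUE)
  )

clean
```

Seven raw labels collapse into three groups. In dplyr 1.1.0 and later, `case_match()` is the recommended replacement for `recode()`, which still works but is superseded.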