We’ll first load the data used for all the exercises
library(tidyverse)
library(magrittr)
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/box_office_jan_00_sep_23.rds" %>%
url %>%
readRDS
Generate a new variable,
wday, which indicates the day of the week for each row. Report the mean inflation adjusted box office (currently in therev_adjvariable) for each year and day of the week, and then print a summary table where the days are indicated in the columns, and the rows are separate by years.
t1 %>%
mutate(
wday = date %>%
wday(label = T),
year = date %>%
year
) %>%
group_by(wday, year) %>%
summarize(
mu = rev_adj %>%
mean %>%
divide_by(1e5) %>%
round(1)
) %>%
pivot_wider(
names_from = "wday",
values_from = "mu"
) %>%
print(
n = Inf
)
## # A tibble: 24 × 8
## year Sun Mon Tue Wed Thu Fri Sat
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000 24 9.8 8.9 8.4 8.4 29.2 40.5
## 2 2001 27.1 11.3 9.9 10.5 9.5 30.5 39.1
## 3 2002 17.8 7.5 6.7 6.7 7 20.5 26.7
## 4 2003 11.9 5.1 4.7 5.2 5.1 13.7 18
## 5 2004 13 6 4.9 5.6 5.7 15 19.6
## 6 2005 12.4 5.8 4.6 4.9 5.2 14.6 18.4
## 7 2006 11.6 5.3 4.4 4.1 4.1 13.6 17.3
## 8 2007 10.3 4.6 3.9 4.2 3.8 12.2 15.1
## 9 2008 9.1 3.9 3.6 3.3 3.7 11 13.5
## 10 2009 10.3 4.5 4.1 4.7 4.4 13 15.4
## 11 2010 10.9 5 4.5 4.6 4.5 13.2 16.1
## 12 2011 9.6 4.3 3.9 3.6 3.7 11.8 14
## 13 2012 10 4.5 4.7 3.8 3.7 12.2 14.8
## 14 2013 9.7 4.3 4.2 4 4.1 11.8 14.4
## 15 2014 9.2 3.9 4 3.5 3.3 11.5 14
## 16 2015 10.9 4.5 4.7 4 3.7 13.5 15.8
## 17 2016 9.9 4.1 4.3 3.4 3.2 11.8 13.9
## 18 2017 9.1 3.8 4.3 3.2 3 10.8 12.9
## 19 2018 9 3.8 4.5 3.5 3 10.3 12.5
## 20 2019 8.2 3.4 4.4 3.1 2.9 9.8 11.5
## 21 2020 3.9 1.7 1.8 1.7 1.4 4.8 6.1
## 22 2021 6.6 3.8 3.7 2.9 2.6 11.9 9.9
## 23 2022 9.2 5.3 5.6 3.3 2.9 15.4 12.9
## 24 2023 11.1 6.6 7.4 4 3.6 16.4 15.1
Estimate films’ adjusted box office for their first weekend in release. By year, report the film with the highest grossing opening weekend.
Apologies, this one was harder than I anticipated! You needed to know
a little bit of lubridate to get this one…
t1 %>%
mutate(
week = date %>%
week,
year = date %>%
year,
day_type = date %>%
wday(label = T) %>%
is_in(
c("Sat",
"Sun")
) %>%
ifelse(
"weekend",
"weekday"
)
) %>%
group_by(
year, title, week, day_type
) %>%
summarize(
tote = rev_adj %>%
sum %>%
divide_by(1e6) %>%
round(1)
) %>%
filter(
day_type == "weekend"
) %>%
arrange(
year, desc(tote)
) %>%
group_by(year) %>%
slice(1) %>%
print(n = Inf)
## # A tibble: 24 × 5
## # Groups: year [24]
## year title week day_type tote
## <dbl> <chr> <dbl> <chr> <dbl>
## 1 2000 Mission: Impossible II 22 weekend 70.3
## 2 2001 Harry Potter and the Sorcerer's Stone 46 weekend 95.8
## 3 2002 Spider-Man 18 weekend 123.
## 4 2003 The Matrix Reloaded 20 weekend 96.2
## 5 2004 Shrek 2 21 weekend 124.
## 6 2005 Star Wars: Episode III - Revenge of the Sith 21 weekend 112.
## 7 2006 Pirates of the Caribbean: Dead Man's Chest 28 weekend 87.4
## 8 2007 Spider-Man 3 18 weekend 129.
## 9 2008 The Dark Knight 29 weekend 124
## 10 2009 Transformers: Revenge of the Fallen 26 weekend 98.5
## 11 2010 Iron Man 2 19 weekend 103.
## 12 2011 Harry Potter and the Deathly Hallows: Part 2 29 weekend 102.
## 13 2012 The Avengers 19 weekend 127.
## 14 2013 Iron Man 3 18 weekend 132.
## 15 2014 The Hunger Games: Mockingjay - Part 1 47 weekend 82.5
## 16 2015 Star Wars: Episode VII - The Force Awakens 51 weekend 159.
## 17 2016 Captain America: Civil War 19 weekend 126.
## 18 2017 Beauty and the Beast 12 weekend 103.
## 19 2018 Avengers: Infinity War 17 weekend 176.
## 20 2019 Avengers: Endgame 17 weekend 228.
## 21 2020 Bad Boys for Life 3 weekend 44
## 22 2021 Spider-Man: No Way Home 51 weekend 149.
## 23 2022 Black Panther: Wakanda Forever 46 weekend 97.1
## 24 2023 Barbie 30 weekend 72.5
By year, report the film which most increased their second weekend gross.
t1 %>%
mutate(
week = date %>%
week,
year = date %>%
year,
day_type = date %>%
wday(label = T) %>%
is_in(
c("Sat",
"Sun")
) %>%
ifelse(
"weekend",
"weekday"
)
) %>%
group_by(
year, title, week, day_type
) %>%
summarize(
tote = rev_adj %>% sum %>% divide_by(1e6) %>% round(1)
) %>%
filter(
day_type == "weekend"
) %>%
group_by(
year, title
) %>%
mutate(
t2 = c(NA, tote %>% diff)
) %>%
na.omit %>%
slice(1) %>%
arrange(
year, desc(t2)
) %>%
group_by(year) %>%
slice(1) %>%
print(n = Inf)
## # A tibble: 24 × 6
## # Groups: year [24]
## year title week day_type tote t2
## <dbl> <chr> <dbl> <chr> <dbl> <dbl>
## 1 2000 Miss Congeniality 53 weekend 15.8 5.4
## 2 2001 A Beautiful Mind 52 weekend 9.1 8.7
## 3 2002 The Ring 43 weekend 20.2 3.7
## 4 2003 Brother Bear 44 weekend 30.9 30.6
## 5 2004 The Aviator 52 weekend 13.1 12.2
## 6 2005 Corpse Bride 39 weekend 18.6 18.2
## 7 2006 Pirates of the Caribbean: Dead Man's Chest 28 weekend 87.4 22.9
## 8 2007 Michael Clayton 41 weekend 10 9.3
## 9 2008 The Bucket List 2 weekend 17.7 17.4
## 10 2009 Gran Torino 2 weekend 27.1 24.5
## 11 2010 The Fighter 51 weekend 11 10.7
## 12 2011 Mission: Impossible - Ghost Protocol 52 weekend 25.9 14.6
## 13 2012 The Avengers 19 weekend 127. 38.7
## 14 2013 Frozen 48 weekend 50.9 50.7
## 15 2014 Lone Survivor 2 weekend 29 28.9
## 16 2015 Selma 2 weekend 9.3 8.8
## 17 2016 The Revenant 2 weekend 31.1 30.7
## 18 2017 Beauty and the Beast 12 weekend 103. 28.4
## 19 2018 Aquaman 51 weekend 46.8 43.4
## 20 2019 Fighting with My Family 8 weekend 6 5.9
## 21 2020 1917 2 weekend 26.1 25.6
## 22 2021 The French Dispatch 44 weekend 1.6 0.7
## 23 2022 Bodies Bodies Bodies 33 weekend 1.9 1.8
## 24 2023 Barbie 30 weekend 72.5 28.2