Assignment 2

Author

Emma Whipkey

BAN 350 Assignment 2

Distribution and Temporal Patterns in Icelandic Movie Attendance

The Movie Attendance in Iceland dataset, created by Þorsteinn Aðalsteinsson, contains weekly Icelandic movie attendance and box office data from 2017 to 2021.

Icelandic Box Office Monthly Averages by Year

Code
library(tidyverse)
ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1")
)

After viewing the dataset, I wanted to see if one or more years in the dataset’s time range outperformed the others in average gross box office weekend earnings. I first used mutate() to split months and years from the blog.date column, then calculated the average box office gross per month using summarize(). I also concatenated the month and year columns into a Month/Year format using paste(), as this label could be useful in my visualization.

Code
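# Extract month and year from blog.date, then average the weekend gross per month
ice_movies_mean <- ice_movies |>
  mutate(month = month(blog.date, label = TRUE), year = year(blog.date)) |>
  group_by(month, year) |>
  drop_na() |>  # drops rows with an NA in any column
  summarize(gross.box.o.weekend = mean(gross.box.o.weekend))
# Build a "Month Year" label for use in the plot
ice_movies_mean$monthyear <- paste(ice_movies_mean$month, ice_movies_mean$year, sep = " ")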

I used a box-and-whisker plot with jittered points to visualize the average box office gross weekends per month. A box-and-whisker plot is useful for visualizing multiple data points in a specific time interval: in this case, up to twelve monthly box office weekend averages for each year. As the ice_movies_mean data frame contains only 44 observations, I added the jittered points to show the underlying distribution of the data and identify potential outliers. I labeled the five months with the highest average box office gross weekends. I also specified in the y-axis label formatting that the currency used in the dataset is the Icelandic króna.

Code
ggplot(ice_movies_mean, aes(year, gross.box.o.weekend, group = year)) +
  geom_boxplot(width = 0.5, fill = "aquamarine", alpha = 0.2, outlier.shape = NA) +
  geom_jitter(position = position_jitter(seed = 1), color = "black", size = 0.8, alpha = 0.9) +
  geom_text(
    aes(label = ifelse(gross.box.o.weekend > 800000, as.character(monthyear), "")),
    position = position_jitter(seed = 1), hjust = 0.5, vjust = 2, size = 2
  ) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "kr")) +
  labs(title = "Average Monthly Gross Box Office Weekends") +
  xlab("Year") +
  ylab("Average Monthly Gross Box Office Weekend Earnings")
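The plot above picks its labels with a fixed cutoff of 800,000 kr. As an alternative, the sketch below ranks the monthly averages and labels exactly the five highest, so the labels do not depend on a hand-tuned threshold. The toy_mean data frame and its values are hypothetical stand-ins for ice_movies_mean, not figures from the dataset.

```r
library(dplyr)

# Hypothetical monthly averages standing in for ice_movies_mean
toy_mean <- tibble(
  monthyear = c("Jan 2018", "Jul 2018", "Oct 2018", "Nov 2018",
                "Jan 2020", "Mar 2019", "Jun 2017"),
  gross.box.o.weekend = c(900000, 950000, 870000, 990000,
                          820000, 500000, 450000)
)

# Keep a label only for the five highest averages; all other rows get ""
toy_labeled <- toy_mean |>
  mutate(
    label = if_else(
      min_rank(desc(gross.box.o.weekend)) <= 5,
      monthyear,
      ""
    )
  )
```

The resulting label column could then be passed to geom_text(aes(label = label)) in place of the ifelse() expression.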

With four of the five best-performing box office monthly averages, 2018 stands out as Iceland’s top box office year in the dataset’s time range. January 2020 is a notable outlier; social distancing during the COVID-19 pandemic, which reached Iceland in February 2020, could explain the depressed box office earnings for the rest of that year. The dataset also contains relatively few observations for 2021, as the dataset creator stopped updating it in January 2021. A tidier visualization could omit the 2021 data entirely.
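One way to omit 2021, sketched here on hypothetical data, is to count how many monthly averages each year contributes and then filter out the sparse year before plotting. The toy_mean values are invented for illustration only.

```r
library(dplyr)

# Hypothetical monthly averages; 2021 contributes a single month
toy_mean <- tibble(
  year = c(2017, 2018, 2018, 2019, 2020, 2021),
  gross.box.o.weekend = c(450000, 900000, 950000, 500000, 820000, 100000)
)

# How many monthly averages does each year contribute?
toy_counts <- toy_mean |> count(year, name = "n_months")

# Drop the partial year before plotting
toy_no_2021 <- toy_mean |> filter(year != 2021)
```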

2018’s Highest Grossing Box Office Weekends by Month

Was there a clear box office winner, or winners, that put four months in 2018 at the top of the average monthly gross box office weekends?

I used mutate() again to split months and years from the blog.date column, then used subset() to select only observations from 2018, and slice_max() to return the highest gross.box.o.weekend value for each month.

Code
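# Extract month and year, grouped so later steps operate per month
ice_movies_monthyear <- ice_movies |>
  mutate(month = month(blog.date, label = TRUE), year = year(blog.date)) |>
  group_by(month, year) |>
  drop_na()
# year() returns a number, so compare against 2018 rather than the string '2018'
ice_movies_2018 <- subset(ice_movies_monthyear, year == 2018)
# Highest-grossing weekend row within each month
ice_movies_2018_top12 <- ice_movies_2018 |> slice_max(gross.box.o.weekend, n = 1)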

I chose a simpler bar chart to visualize each month’s highest-grossing box office weekend; the reader can easily determine whether there are any extreme or unusual values across the year. Each bar is labeled with the highest-earning film’s name.

Code
ice_movies_2018_top12 |>
  ungroup() |>  # month is a grouping variable; ungroup before re-leveling it
  arrange(month) |>
  # Reverse the month order so January appears at the top after coord_flip()
  mutate(month = factor(month, levels = c("Dec", "Nov", "Oct", "Sep", "Aug", "Jul", "Jun", "May", "Apr", "Mar", "Feb", "Jan"))) |>
  ggplot(aes(x = month, y = gross.box.o.weekend, fill = film.title)) +
  geom_col() +
  coord_flip() +
  # size and hjust are constants, so they belong outside aes(); adjust size to taste
  geom_text(aes(label = film.title), size = 3, hjust = 1.02) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "kr")) +
  labs(
    x = "Month",
    y = "Gross Box Office Weekend Earnings",
    title = "2018 Highest Grossing Box Office Weekends by Month"
  ) +
  theme(legend.position = "none")

Iceland’s highest-grossing box office weekend in 2018 belongs to Lof mér að falla (Let Me Fall), one of two domestically produced films to appear in the top twelve. At six appearances, Disney-produced films dominate the rankings; Black Panther even appears twice, in February and March. However, none of the top four average-grossing months (January, July, October, and November) has the highest-grossing box office weekend. It is likely that the cumulative weekend gross across all films in theatres put these months ahead, which is not shown in this visualization.
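To check the idea that cumulative gross, rather than a single standout weekend, puts a month ahead, one could compare each month's total across all films with its single best weekend. The sketch below does this on a small hypothetical data frame; toy_movies, its film titles, and its values are invented and only mirror the column names used in ice_movies.

```r
library(dplyr)
library(lubridate)

# Hypothetical weekly rows standing in for ice_movies
toy_movies <- tibble(
  blog.date = as.Date(c("2018-01-07", "2018-01-07", "2018-01-14",
                        "2018-02-04", "2018-02-11")),
  film.title = c("Film A", "Film B", "Film A", "Film C", "Film C"),
  gross.box.o.weekend = c(300, 200, 150, 400, 50)
)

# Per month: total gross across all films vs. the single best weekend
toy_totals <- toy_movies |>
  mutate(month = month(blog.date, label = TRUE), year = year(blog.date)) |>
  group_by(year, month) |>
  summarize(
    total_gross = sum(gross.box.o.weekend),
    top_weekend = max(gross.box.o.weekend),
    .groups = "drop"
  )
```

In this toy data, January has the larger total even though February has the larger single weekend, which is the pattern suggested above for 2018.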

Conclusion

My first visualization helped me identify 2018 as the best-performing box office year by monthly average gross box office weekends, and my second visualization drilled down on which 2018 films topped the box office weekend earnings. As noted above, merely looking at the films with the monthly top weekend gross did not reveal why 2018 had the top four average-grossing months in the first visualization. A further analysis could explore the monthly gross of all films in 2018 with a visualization such as a stacked bar chart or streamgraph.
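As a rough starting point for the stacked bar chart mentioned above, the sketch below stacks each film's gross within its month. It uses a small hypothetical data frame (toy_movies, with invented titles and values) rather than the real 2018 data, and relies on geom_col() stacking bars by default.

```r
library(ggplot2)

# Hypothetical per-film monthly gross standing in for the 2018 data
toy_movies <- data.frame(
  month = factor(c("Jan", "Jan", "Feb", "Feb"), levels = c("Jan", "Feb")),
  film.title = c("Film A", "Film B", "Film A", "Film C"),
  gross.box.o.weekend = c(450, 200, 100, 350)
)

# geom_col() stacks the fill groups within each month by default
p <- ggplot(toy_movies, aes(x = month, y = gross.box.o.weekend, fill = film.title)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar_format(prefix = "kr")) +
  labs(
    x = "Month",
    y = "Gross Box Office Weekend Earnings",
    fill = "Film"
  )
```

With many films per month, the legend would likely become crowded; grouping smaller films into an "Other" category is one option.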