The global box office industry is a multibillion-dollar sector that shapes entertainment trends and economic landscapes. This report examines box office revenue data for movies released between 2010 and 2024 (the loaded data runs through 2023). The dataset includes worldwide, domestic, and foreign revenues, along with their respective percentage shares. By analyzing this dataset, we can identify patterns in box office success, the dominance of franchise films, and the importance of international markets.
This analysis will provide:
- Descriptive statistics of box office revenue trends.
- Five unique visualizations to explore revenue distributions and correlations.
- Key insights into factors influencing box office performance.
# Load necessary libraries
library(readr)
library(ggplot2)
library(dplyr)
library(knitr)
library(kableExtra)
library(plotly)
library(tidyr)
# Load the dataset
df <- read_csv("/Users/jasoncherubini/Desktop/2010-2024 Movies Box Ofice Collection.csv")
## Rows: 2800 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Release Group, Domestic_percent, Foreign_percent
## dbl (2): Rank, year
## num (3): Worldwide, Domestic, Foreign
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert revenue columns to numeric
df <- df %>%
  mutate(
    Worldwide = as.numeric(gsub(",", "", Worldwide)),
    Domestic = as.numeric(gsub(",", "", Domestic)),
    Foreign = as.numeric(gsub(",", "", Foreign)),
    Domestic_percent = as.numeric(gsub("%", "", Domestic_percent)) / 100,
    Foreign_percent = as.numeric(gsub("%", "", Foreign_percent)) / 100
  )
# Remove any NA values that may cause visualization issues
df <- na.omit(df)
# Display basic summary of the data
summary(df)
## Rank Release Group Worldwide Domestic
## Min. : 0.00 Length:2800 Min. :4.811e+06 Min. : 0
## 1st Qu.: 49.75 Class :character 1st Qu.:2.739e+07 1st Qu.: 0
## Median : 99.50 Mode :character Median :5.213e+07 Median : 14232540
## Mean : 99.50 Mean :1.353e+08 Mean : 45280768
## 3rd Qu.:149.25 3rd Qu.:1.363e+08 3rd Qu.: 53351402
## Max. :199.00 Max. :2.799e+09 Max. :936662225
## Domestic_percent Foreign Foreign_percent year
## Min. :0.000 Min. :0.000e+00 Min. :0.000 Min. :2010
## 1st Qu.:0.000 1st Qu.:1.797e+07 1st Qu.:0.502 1st Qu.:2013
## Median :0.270 Median :3.566e+07 Median :0.730 Median :2016
## Mean :0.287 Mean :8.997e+07 Mean :0.713 Mean :2016
## 3rd Qu.:0.498 3rd Qu.:8.793e+07 3rd Qu.:1.000 3rd Qu.:2020
## Max. :1.000 Max. :1.941e+09 Max. :1.000 Max. :2023
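The summary above shows a first quartile of 0 for both Domestic and Domestic_percent, so at least a quarter of the films report no domestic revenue. A minimal sketch of how that share could be counted follows. The `toy` data frame and its values are made up for illustration; the same pipeline could be run on `df`.

```r
library(dplyr)

# Hypothetical stand-in for the report's `df`; these values are made up.
toy <- data.frame(
  Domestic  = c(0, 0, 14e6, 53e6),
  Worldwide = c(20e6, 35e6, 60e6, 150e6)
)

# Count the films with zero domestic revenue and their share of all rows
zero_domestic <- toy %>%
  summarise(
    Zero_Domestic = sum(Domestic == 0),
    Share = mean(Domestic == 0)
  )

print(zero_domestic)
```

Whether a zero means "no domestic release" or "not reported" cannot be determined from the dataset alone, so the result is best read as a data-quality flag rather than a finding.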
# Summary statistics
descriptive_stats <- df %>%
  summarise(
    Mean_Worldwide = mean(Worldwide, na.rm = TRUE),
    Median_Worldwide = median(Worldwide, na.rm = TRUE),
    SD_Worldwide = sd(Worldwide, na.rm = TRUE),
    Min_Worldwide = min(Worldwide, na.rm = TRUE),
    Max_Worldwide = max(Worldwide, na.rm = TRUE),
    Mean_Domestic = mean(Domestic, na.rm = TRUE),
    SD_Domestic = sd(Domestic, na.rm = TRUE),
    Mean_Foreign = mean(Foreign, na.rm = TRUE),
    SD_Foreign = sd(Foreign, na.rm = TRUE)
  )
kable(descriptive_stats, caption = "Descriptive Statistics for Worldwide, Domestic, and Foreign Revenue") %>%
  kable_styling()
| Mean_Worldwide | Median_Worldwide | SD_Worldwide | Min_Worldwide | Max_Worldwide | Mean_Domestic | SD_Domestic | Mean_Foreign | SD_Foreign |
|---|---|---|---|---|---|---|---|---|
| 135252367 | 52132482 | 226425143 | 4810790 | 2799439100 | 45280768 | 85158784 | 89970983 | 152146157 |
Insight: The scatter plot reveals a strong correlation between domestic and worldwide revenue. However, some movies earn far more abroad than at home, indicating the influence of global distribution and marketing strategies. On average, foreign markets account for about 71% of a film's worldwide gross in this dataset. The growth of the global box office has reshaped marketing and distribution plans for films both large and small.
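The code behind the scatter plot is not shown in this report. One way such a plot and the correlation it illustrates might be produced is sketched below. The `toy` data frame is a made-up stand-in for `df`, and the object names `domestic_worldwide_cor` and `scatter_plot` are assumptions, not the original code.

```r
library(ggplot2)

# Hypothetical stand-in for the report's `df`; these values are made up.
toy <- data.frame(
  Domestic  = c(10e6, 50e6, 120e6, 300e6, 0),
  Worldwide = c(40e6, 130e6, 310e6, 900e6, 25e6)
)

# Pearson correlation between domestic and worldwide revenue
domestic_worldwide_cor <- cor(toy$Domestic, toy$Worldwide)

# Scatter plot in millions of dollars with a linear trend line
scatter_plot <- ggplot(toy, aes(x = Domestic / 1e6, y = Worldwide / 1e6)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(
    title = "Domestic vs. Worldwide Revenue",
    x = "Domestic Revenue ($ millions)",
    y = "Worldwide Revenue ($ millions)"
  ) +
  theme_minimal()

print(domestic_worldwide_cor)
```

Because worldwide revenue is the sum of domestic and foreign revenue, some positive correlation is built in; comparing Domestic against Foreign would be a stricter check of the same claim.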
trend_data <- df %>%
  group_by(year) %>%
  summarise(Total_Revenue = sum(Worldwide, na.rm = TRUE))
ggplot(trend_data, aes(x = year, y = Total_Revenue)) +
  geom_line(color = "red", linewidth = 1) +
  geom_point(color = "black") +
  labs(title = "Total Box Office Revenue Trends (2010-2024)", x = "Year", y = "Total Worldwide Revenue ($)") +
  theme_minimal()
Insight: Revenue trends fluctuate, with notable peaks likely corresponding to blockbuster releases and franchise films. External shocks such as the COVID-19 pandemic also show up as revenue dips. Post-pandemic box office revenue is expected to be driven by large tentpole movies, with mid-budget movies underperforming, which may leave an opening for small-budget independent films.
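To put a number on the peaks and dips described here, the yearly totals can be turned into year-over-year percentage changes. The sketch below uses a made-up `toy_trend` table in place of `trend_data`; the column name `YoY_Change` is an assumption introduced for illustration.

```r
library(dplyr)

# Hypothetical yearly totals standing in for `trend_data`; values are made up.
toy_trend <- data.frame(
  year = 2018:2021,
  Total_Revenue = c(100, 110, 44, 66)
)

# Year-over-year change as a fraction of the previous year's total
yoy <- toy_trend %>%
  arrange(year) %>%
  mutate(
    YoY_Change = (Total_Revenue - dplyr::lag(Total_Revenue)) / dplyr::lag(Total_Revenue)
  )

print(yoy)
```

The first year has no predecessor, so its change is `NA`. `dplyr::lag` is written out in full because `plotly` and `stats` also define functions that mask or are masked in this session.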
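Insight: The density plot shows how worldwide revenue is distributed among movies, with peaks marking the most common revenue ranges. The distribution is strongly right-skewed: the median film grossed about $52 million worldwide while the mean is about $135 million, so a small number of very large hits pull the average up.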
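The density plot's code is not included in this report. A minimal sketch of one way to draw it is shown below, assuming a log-scaled x-axis (an assumption, chosen because revenue spans several orders of magnitude). The `toy_revenue` values are made up, and `skew_ratio` is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical stand-in for the Worldwide column of `df`; values are made up.
toy_revenue <- data.frame(
  Worldwide = c(5e6, 12e6, 27e6, 52e6, 90e6, 136e6, 400e6, 2.8e9)
)

# Density of worldwide revenue on a log10 axis
density_plot <- ggplot(toy_revenue, aes(x = Worldwide)) +
  geom_density(fill = "steelblue", alpha = 0.5) +
  scale_x_log10() +
  labs(
    title = "Distribution of Worldwide Revenue",
    x = "Worldwide Revenue ($, log scale)",
    y = "Density"
  ) +
  theme_minimal()

# A mean well above the median points to a long right tail
skew_ratio <- mean(toy_revenue$Worldwide) / median(toy_revenue$Worldwide)
print(skew_ratio)
```

On a linear axis the same data would be squeezed against the left edge by a few very large values, which is why the log scale is used in this sketch.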
Insight: The top-grossing movies tend to be franchise films, sequels, or superhero movies, reinforcing the financial dominance of established intellectual properties. While individual moviegoers often say they want new and original films, the revenue data suggest that this is not where they spend their money.
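The ranking behind this observation can be sketched as a simple top-N query on worldwide revenue. The `toy_films` table and its titles are made up; only the column names `Release Group`, `Worldwide`, and `year` come from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for the report's `df`; titles and values are made up.
toy_films <- tibble(
  `Release Group` = c("Film A", "Film B", "Film C", "Film D"),
  Worldwide = c(300e6, 2.1e9, 95e6, 1.2e9),
  year = c(2012, 2019, 2015, 2022)
)

# Keep the three highest-grossing films, largest first
top_films <- toy_films %>%
  slice_max(Worldwide, n = 3) %>%
  select(`Release Group`, year, Worldwide)

print(top_films)
```

The dataset has no column marking a film as a franchise entry or sequel, so that classification would have to be added by hand after inspecting the titles.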
This report analyzed box office trends using five unique visualizations. The findings emphasize:
- The high correlation between domestic and global revenue.
- The dominance of franchise films in revenue generation.
- The importance of foreign markets in shaping success.
End of Report