Introduction

The global box office industry is a multi-billion-dollar sector that shapes entertainment trends and economic landscapes. This report examines box office revenue data for 2,800 movies released from 2010 through 2023 (the source file is labeled 2010-2024, but the latest release year in the data is 2023). For each film, the dataset records worldwide, domestic, and foreign revenue, along with the domestic and foreign percentage shares of the worldwide total. By analyzing this dataset, we can identify patterns in box office success, the dominance of franchise films, and the importance of international markets.

This analysis will provide:
- Descriptive statistics of box office revenue.
- Four visualizations exploring revenue distributions and relationships.
- Key insights into factors associated with box office performance.

Loading and Exploring the Data

# Load necessary libraries
library(readr)
library(ggplot2)
library(dplyr)
library(knitr)
library(kableExtra)
library(plotly)
library(tidyr)

# Load the dataset (absolute path on the author's machine; point it at your own copy of the file)
df <- read_csv("/Users/jasoncherubini/Desktop/2010-2024 Movies Box Ofice Collection.csv")
## Rows: 2800 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Release Group, Domestic_percent, Foreign_percent
## dbl (2): Rank, year
## num (3): Worldwide, Domestic, Foreign
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
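# Clean the revenue and share columns. readr already parsed Worldwide, Domestic,
# and Foreign as numbers, so stripping commas is only a safeguard; the percentage
# strings (e.g., "73%") are converted to proportions (0.73)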
df <- df %>% mutate(
  Worldwide = as.numeric(gsub(",", "", Worldwide)),
  Domestic = as.numeric(gsub(",", "", Domestic)),
  Foreign = as.numeric(gsub(",", "", Foreign)),
  Domestic_percent = as.numeric(gsub("%", "", Domestic_percent)) / 100,
  Foreign_percent = as.numeric(gsub("%", "", Foreign_percent)) / 100
)

# Drop rows with any missing value (e.g., a share that failed to parse) so they cannot break plots or summaries
df <- na.omit(df)

# Display basic summary of the data
summary(df)
##       Rank        Release Group        Worldwide            Domestic        
##  Min.   :  0.00   Length:2800        Min.   :4.811e+06   Min.   :        0  
##  1st Qu.: 49.75   Class :character   1st Qu.:2.739e+07   1st Qu.:        0  
##  Median : 99.50   Mode  :character   Median :5.213e+07   Median : 14232540  
##  Mean   : 99.50                      Mean   :1.353e+08   Mean   : 45280768  
##  3rd Qu.:149.25                      3rd Qu.:1.363e+08   3rd Qu.: 53351402  
##  Max.   :199.00                      Max.   :2.799e+09   Max.   :936662225  
##  Domestic_percent    Foreign          Foreign_percent      year     
##  Min.   :0.000    Min.   :0.000e+00   Min.   :0.000   Min.   :2010  
##  1st Qu.:0.000    1st Qu.:1.797e+07   1st Qu.:0.502   1st Qu.:2013  
##  Median :0.270    Median :3.566e+07   Median :0.730   Median :2016  
##  Mean   :0.287    Mean   :8.997e+07   Mean   :0.713   Mean   :2016  
##  3rd Qu.:0.498    3rd Qu.:8.793e+07   3rd Qu.:1.000   3rd Qu.:2020  
##  Max.   :1.000    Max.   :1.941e+09   Max.   :1.000   Max.   :2023
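The cleaning step implicitly assumes that Domestic and Foreign add up to Worldwide and that the two percentage shares sum to 1. A minimal sketch of a sanity check for those assumptions follows; `check_revenue_consistency()` is a hypothetical helper, and its tolerances (`dollar_tol`, `share_tol`) are illustrative choices, not values taken from the source data.

```r
# Hypothetical helper (not part of any package): sanity-check the cleaned data.
# Assumptions: Worldwide should equal Domestic + Foreign, and the two shares
# should sum to about 1; dollar_tol and share_tol are illustrative tolerances.
check_revenue_consistency <- function(data, dollar_tol = 1, share_tol = 0.01) {
  # Absolute gap between the reported worldwide total and its two components
  gap <- abs(data$Worldwide - (data$Domestic + data$Foreign))
  # Sum of the domestic and foreign proportions for each film
  share_sum <- data$Domestic_percent + data$Foreign_percent
  list(
    rows = nrow(data),
    revenue_mismatches = sum(gap > dollar_tol),
    share_mismatches = sum(abs(share_sum - 1) > share_tol),
    films_per_year = table(data$year)  # number of films per release year
  )
}
```

After cleaning, `check_revenue_consistency(df)` would report how many rows break each rule. The exact means in the descriptive statistics table (mean Domestic plus mean Foreign is a few hundred dollars below mean Worldwide) indicate that at least some rows do not add up exactly.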

Descriptive Statistics

To provide an overview of the dataset, we examine key descriptive statistics, including summary measures and the top-grossing movies.

# Summary statistics
descriptive_stats <- df %>% summarise(
  Mean_Worldwide = mean(Worldwide, na.rm = TRUE),
  Median_Worldwide = median(Worldwide, na.rm = TRUE),
  SD_Worldwide = sd(Worldwide, na.rm = TRUE),
  Min_Worldwide = min(Worldwide, na.rm = TRUE),
  Max_Worldwide = max(Worldwide, na.rm = TRUE),
  Mean_Domestic = mean(Domestic, na.rm = TRUE),
  SD_Domestic = sd(Domestic, na.rm = TRUE),
  Mean_Foreign = mean(Foreign, na.rm = TRUE),
  SD_Foreign = sd(Foreign, na.rm = TRUE)
)

kable(descriptive_stats, caption = "Descriptive Statistics for Worldwide, Domestic, and Foreign Revenue") %>% kable_styling()
Descriptive Statistics for Worldwide, Domestic, and Foreign Revenue
Statistic             Value ($)
Mean_Worldwide      135,252,367
Median_Worldwide     52,132,482
SD_Worldwide        226,425,143
Min_Worldwide         4,810,790
Max_Worldwide     2,799,439,100
Mean_Domestic        45,280,768
SD_Domestic          85,158,784
Mean_Foreign         89,970,983
SD_Foreign          152,146,157
# Top 10 Movies by Worldwide Revenue
top_movies <- df %>% arrange(desc(Worldwide)) %>% head(10) %>% select(Rank, `Release Group`, Worldwide)
kable(top_movies, caption = "Top 10 Movies by Worldwide Revenue") %>% kable_styling()
Top 10 Movies by Worldwide Revenue
Rank  Release Group                                Worldwide ($)
0     Avengers: Endgame                            2,799,439,100
0     Avatar: The Way of Water                     2,320,250,281
0     Star Wars: Episode VII - The Force Awakens   2,068,223,624
0     Avengers: Infinity War                       2,048,359,754
0     Spider-Man: No Way Home                      1,912,233,593
1     Jurassic World                               1,670,400,637
1     The Lion King                                1,656,943,394
0     The Avengers                                 1,518,812,988
2     Furious 7                                    1,515,047,671
1     Top Gun: Maverick                            1,495,696,292

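The table above lists the ten highest-grossing films worldwide, a list dominated by major franchises and sequels. Note that Rank is not the position in this table: it appears to be each film's zero-based rank within its release year (Rank runs from 0 to 199 across 14 release years and 2,800 films), so Furious 7, for example, has Rank 2 as the third-highest-grossing film of 2015.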
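To put the top-10 list in context, one option is to measure what fraction of total worldwide revenue the n highest-grossing films capture. The sketch below uses a hypothetical base-R helper, `top_n_share()`; it is one simple concentration measure, not a method used elsewhere in this report.

```r
# Hypothetical helper (not part of any package): fraction of total revenue
# earned by the n highest-grossing films.
top_n_share <- function(revenue, n = 10) {
  revenue <- revenue[!is.na(revenue)]   # ignore missing values
  n <- min(n, length(revenue))          # cannot take more films than exist
  top <- sort(revenue, decreasing = TRUE)[seq_len(n)]
  sum(top) / sum(revenue)
}
```

`top_n_share(df$Worldwide, n = 10)` would give the top-10 share; raising `n` (for example to 28, roughly the top 1% of 2,800 films) shows how quickly the share accumulates.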

Visualization 1: Scatter Plot - Worldwide vs. Domestic Revenue

ggplot(df, aes(x = Domestic, y = Worldwide)) +
  geom_point(color = "blue", alpha = 0.5) +
  labs(title = "Worldwide vs. Domestic Revenue", x = "Domestic Revenue ($)", y = "Worldwide Revenue ($)") +
  theme_minimal()

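Insight: The scatter plot suggests a strong positive relationship between domestic and worldwide revenue: films that earn more domestically generally earn more worldwide. Points lying well above the main trend are films whose foreign earnings far exceed their domestic gross, likely reflecting global distribution and marketing strategies. The column of points at a domestic revenue of zero corresponds to films with no recorded domestic gross, which make up at least a quarter of the dataset (the first quartile of Domestic is 0).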
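The strength of this relationship can be checked numerically rather than by eye. The sketch below, built around a hypothetical helper `revenue_correlation()`, computes Pearson and Spearman correlations and, by default, excludes films with zero domestic revenue; treating those films separately is an assumption, not something the original analysis specifies.

```r
# Hypothetical helper (not part of any package): quantify the
# Domestic-Worldwide relationship shown in the scatter plot.
# Assumption: films with Domestic == 0 are excluded by default because they
# form a separate vertical band at x = 0.
revenue_correlation <- function(data, exclude_zero_domestic = TRUE) {
  if (exclude_zero_domestic) {
    data <- data[data$Domestic > 0, , drop = FALSE]
  }
  c(
    n = nrow(data),
    pearson = cor(data$Domestic, data$Worldwide, method = "pearson"),
    spearman = cor(data$Domestic, data$Worldwide, method = "spearman")
  )
}
```

Spearman's rank correlation is less sensitive than Pearson's to the handful of multi-billion-dollar outliers. Comparing `revenue_correlation(df)` with `revenue_correlation(df, exclude_zero_domestic = FALSE)` shows how much the zero-domestic films affect the result.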

Visualization 3: Density Plot - Distribution of Worldwide Revenue

ggplot(df, aes(x = Worldwide)) +
  geom_density(fill = "blue", alpha = 0.4) +
  labs(title = "Density Distribution of Worldwide Revenue",
       x = "Worldwide Revenue ($)",
       y = "Density") +
  theme_minimal()

Insight: The density plot shows that worldwide revenue is strongly right-skewed. Most films are concentrated at the low end (three quarters earn less than about $136.3 million), while a long tail of blockbusters extends to nearly $2.8 billion. This skew is also why the mean worldwide gross ($135.3 million) is more than 2.5 times the median ($52.1 million).
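Because worldwide revenue spans almost three orders of magnitude (about $4.8 million to $2.8 billion), a log-scaled x-axis can make the shape of the bulk of the distribution easier to read. This is a sketch assuming ggplot2 and its scales dependency are installed; `plot_worldwide_density_log()` is a hypothetical wrapper around the density plot code above.

```r
library(ggplot2)

# Hypothetical wrapper (not part of any package): the density plot above,
# redrawn on a log10 x-axis so low-revenue films are not compressed near zero.
# Assumes a data frame with a strictly positive numeric Worldwide column.
plot_worldwide_density_log <- function(data) {
  ggplot(data, aes(x = Worldwide)) +
    geom_density(fill = "blue", alpha = 0.4) +
    scale_x_log10(labels = scales::label_dollar()) +
    labs(title = "Density of Worldwide Revenue (log scale)",
         x = "Worldwide Revenue ($, log10 scale)",
         y = "Density") +
    theme_minimal()
}
```

In the report this would be called as `plot_worldwide_density_log(df)`; the log scale is safe here because the minimum worldwide gross in the summary is above zero.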

Visualization 4: Bar Chart - Top 10 Movies by Worldwide Revenue

top_10 <- df %>% arrange(desc(Worldwide)) %>% head(10)

ggplot(top_10, aes(x = reorder(`Release Group`, Worldwide), y = Worldwide)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Movies by Worldwide Revenue", x = "Movie Title", y = "Worldwide Revenue ($)") +
  theme_minimal()

Insight: Every film in the top 10 is a sequel, a franchise installment, or a remake of an existing property (The Lion King), and four are superhero films, reinforcing the financial dominance of established intellectual properties.

Visualization 5: Pie Chart - Domestic vs. Foreign Revenue Share

revenue_distribution <- df %>% summarise(Domestic = sum(Domestic, na.rm = TRUE), Foreign = sum(Foreign, na.rm = TRUE))

df_pie <- data.frame(
  Category = c("Domestic Revenue", "Foreign Revenue"),
  Revenue = c(revenue_distribution$Domestic, revenue_distribution$Foreign)
)

ggplot(df_pie, aes(x = "", y = Revenue, fill = Category)) +
  geom_col(width = 1) +
  coord_polar("y") +
  labs(title = "Proportion of Domestic vs. Foreign Revenue") +
  theme_minimal()

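Insight: Foreign markets generate most of the revenue. Using the means in the descriptive statistics table, foreign grosses make up about two-thirds of combined domestic and foreign revenue (89,970,983 / (45,280,768 + 89,970,983) ≈ 66.5%), highlighting the importance of international markets to modern box office success.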
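The pie chart shows a pooled share, in which the biggest films carry the most weight. That differs from the average per-film share reported by `summary(df)` (mean Foreign_percent 0.713, median 0.730). A minimal sketch that computes both side by side, using a hypothetical helper `foreign_share_summary()`:

```r
# Hypothetical helper (not part of any package): two ways to summarise the
# foreign share of revenue.
# pooled: total foreign revenue / total (domestic + foreign) revenue, as in the
#   pie chart, so high-grossing films dominate the result;
# per_film_mean / per_film_median: averages of each film's Foreign_percent.
foreign_share_summary <- function(data) {
  c(
    pooled = sum(data$Foreign) / (sum(data$Domestic) + sum(data$Foreign)),
    per_film_mean = mean(data$Foreign_percent),
    per_film_median = median(data$Foreign_percent)
  )
}
```

Calling `foreign_share_summary(df)` would make it explicit which of the two measures a given "foreign share" figure refers to.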

Conclusion

This report analyzed box office trends using four visualizations. The findings emphasize:
- The strong positive relationship between domestic and worldwide revenue.
- The dominance of franchise films among the highest-grossing releases.
- The importance of foreign markets in shaping success.

These patterns are relevant to investors, movie studios, and distributors looking to maximize box office performance.


End of Report