Introduction

The global box office industry is a multi-billion-dollar sector that shapes entertainment trends and economic landscapes. This report examines box office revenue data for movies released from 2010 through 2023 (the source file is labeled 2010-2024, but the latest release year in the data is 2023). The dataset includes worldwide, domestic, and foreign revenues, along with their respective percentage shares. By analyzing this dataset, we can identify patterns in box office success, the dominance of franchise films, and the importance of international markets.

This analysis will provide:
- Descriptive statistics of box office revenue trends.
- Five unique visualizations to explore revenue distributions and correlations.
- Key insights into factors influencing box office performance.


Loading and Exploring the Data

# Load necessary libraries
library(readr)
library(ggplot2)
library(dplyr)
library(knitr)
library(kableExtra)
library(plotly)
library(tidyr)

# Load the dataset (this absolute path points to the author's machine; adjust it to your local copy of the CSV)
df <- read_csv("/Users/jasoncherubini/Desktop/2010-2024 Movies Box Ofice Collection.csv")
## Rows: 2800 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Release Group, Domestic_percent, Foreign_percent
## dbl (2): Rank, year
## num (3): Worldwide, Domestic, Foreign
# Convert revenue columns to numeric and percentage strings to proportions (0-1)
df <- df %>% mutate(
  Worldwide = as.numeric(gsub(",", "", Worldwide)),
  Domestic = as.numeric(gsub(",", "", Domestic)),
  Foreign = as.numeric(gsub(",", "", Foreign)),
  Domestic_percent = as.numeric(gsub("%", "", Domestic_percent)) / 100,
  Foreign_percent = as.numeric(gsub("%", "", Foreign_percent)) / 100
)

# Drop rows that contain any NA (e.g., values that failed to parse above), which could otherwise cause visualization issues
df <- na.omit(df)

# Display basic summary of the data
summary(df)
##       Rank        Release Group        Worldwide            Domestic        
##  Min.   :  0.00   Length:2800        Min.   :4.811e+06   Min.   :        0  
##  1st Qu.: 49.75   Class :character   1st Qu.:2.739e+07   1st Qu.:        0  
##  Median : 99.50   Mode  :character   Median :5.213e+07   Median : 14232540  
##  Mean   : 99.50                      Mean   :1.353e+08   Mean   : 45280768  
##  3rd Qu.:149.25                      3rd Qu.:1.363e+08   3rd Qu.: 53351402  
##  Max.   :199.00                      Max.   :2.799e+09   Max.   :936662225  
##  Domestic_percent    Foreign          Foreign_percent      year     
##  Min.   :0.000    Min.   :0.000e+00   Min.   :0.000   Min.   :2010  
##  1st Qu.:0.000    1st Qu.:1.797e+07   1st Qu.:0.502   1st Qu.:2013  
##  Median :0.270    Median :3.566e+07   Median :0.730   Median :2016  
##  Mean   :0.287    Mean   :8.997e+07   Mean   :0.713   Mean   :2016  
##  3rd Qu.:0.498    3rd Qu.:8.793e+07   3rd Qu.:1.000   3rd Qu.:2020  
##  Max.   :1.000    Max.   :1.941e+09   Max.   :1.000   Max.   :2023
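The cleaning step above strips thousands separators and percent signs before converting to numbers. Because read_csv() already parsed Worldwide, Domestic, and Foreign as numeric (see the column specification), the gsub() calls on those three columns are effectively no-ops; the conversion mainly matters for the two percent columns. The sketch below repeats the same transformations in base R on a few hypothetical raw strings (the `raw` data frame and its values are illustrative, not taken from the dataset), including a placeholder "-" to show that an unparseable entry becomes NA and is then dropped by na.omit().

```r
# Hypothetical raw values in the formats the cleaning step handles:
# thousands separators in revenue and a "%" suffix on shares.
# The "-" entries are illustrative placeholders that cannot be parsed.
raw <- data.frame(
  Worldwide        = c("1,234,567", "52,132,482", "-"),
  Domestic_percent = c("27%", "100%", "-"),
  stringsAsFactors = FALSE
)

# Same transformations as the report's mutate() call, written in base R.
# suppressWarnings() hides the "NAs introduced by coercion" warning for "-".
clean <- data.frame(
  Worldwide        = suppressWarnings(as.numeric(gsub(",", "", raw$Worldwide))),
  Domestic_percent = suppressWarnings(as.numeric(gsub("%", "", raw$Domestic_percent))) / 100
)
print(clean)

# na.omit() drops every row that contains at least one NA.
clean_complete <- na.omit(clean)
print(nrow(clean_complete))
```

Note that the report's own mutate() call does not suppress the coercion warning, so any such entries in the real file would surface as warnings when the chunk runs.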

Descriptive Statistics

# Summary statistics
descriptive_stats <- df %>% summarise(
  Mean_Worldwide = mean(Worldwide, na.rm = TRUE),
  Median_Worldwide = median(Worldwide, na.rm = TRUE),
  SD_Worldwide = sd(Worldwide, na.rm = TRUE),
  Min_Worldwide = min(Worldwide, na.rm = TRUE),
  Max_Worldwide = max(Worldwide, na.rm = TRUE),
  Mean_Domestic = mean(Domestic, na.rm = TRUE),
  SD_Domestic = sd(Domestic, na.rm = TRUE),
  Mean_Foreign = mean(Foreign, na.rm = TRUE),
  SD_Foreign = sd(Foreign, na.rm = TRUE)
)

kable(descriptive_stats, caption = "Descriptive Statistics for Worldwide, Domestic, and Foreign Revenue") %>% kable_styling()
Descriptive Statistics for Worldwide, Domestic, and Foreign Revenue
Statistic                 Value ($)
Mean_Worldwide          135,252,367
Median_Worldwide         52,132,482
SD_Worldwide            226,425,143
Min_Worldwide             4,810,790
Max_Worldwide         2,799,439,100
Mean_Domestic            45,280,768
SD_Domestic              85,158,784
Mean_Foreign             89,970,983
SD_Foreign              152,146,157
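The table shows the mean worldwide gross sitting far above the median, a sign of right skew. As a quick check on the size of that skew, the sketch below recomputes three simple measures directly from the table's worldwide figures: the mean-to-median ratio, Pearson's median skewness coefficient, 3 * (mean - median) / SD, and the coefficient of variation, SD / mean. The variable names are illustrative, not part of the report's code.

```r
# Worldwide revenue statistics copied from the descriptive statistics table ($).
mean_ww   <- 135252367
median_ww <- 52132482
sd_ww     <- 226425143

# Ratio of mean to median: values well above 1 indicate a long upper tail.
mean_to_median <- mean_ww / median_ww

# Pearson's median skewness coefficient: positive values indicate right skew.
pearson_skew <- 3 * (mean_ww - median_ww) / sd_ww

# Coefficient of variation: spread relative to the mean.
cv_ww <- sd_ww / mean_ww

cat(sprintf("Mean/median: %.2f\nPearson skewness: %.2f\nCV: %.2f\n",
            mean_to_median, pearson_skew, cv_ww))
```

With these inputs the mean is roughly 2.6 times the median and the skewness coefficient is about 1.1, consistent with a long right tail of very high grosses.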

Visualization 1: Scatter Plot - Worldwide vs. Domestic Revenue

ggplot(df, aes(x = Domestic, y = Worldwide)) +
  geom_point(color = "blue", alpha = 0.5) +
  labs(title = "Worldwide vs. Domestic Revenue",
       x = "Domestic Revenue ($)", y = "Worldwide Revenue ($)") +
  theme_minimal()

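Insight: The scatter plot shows a strong positive relationship between domestic and worldwide revenue; part of this is built in, because worldwide revenue is the sum of domestic and foreign revenue. Points well above the main trend are films that earned far more abroad than at home, and at least a quarter of the films have no domestic revenue at all (the first quartile of Domestic is 0), so they sit along the vertical axis. These patterns point to the influence of global distribution and marketing strategies. The growth of the global box office has reshaped marketing and distribution plans for films both large and small.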
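The relationship in the scatter plot can be quantified with cor(). The sketch below uses a small hypothetical data frame (`toy`, with made-up revenues) so it runs on its own; because the column names match, the same calls should work on the report's df. It reports both Pearson correlation, which is sensitive to the handful of very large blockbusters, and Spearman rank correlation, which is not, and flags films that earn more than 75% of their gross abroad (the 75% cutoff is an arbitrary choice for illustration).

```r
# Hypothetical toy data (dollars); not taken from the report's dataset.
toy <- data.frame(
  Domestic = c(0, 0, 10e6, 40e6, 90e6, 300e6),
  Foreign  = c(15e6, 60e6, 20e6, 50e6, 250e6, 400e6)
)
# Worldwide revenue is the sum of domestic and foreign revenue.
toy$Worldwide <- toy$Domestic + toy$Foreign

# Pearson: linear association, strongly influenced by the largest films.
pearson  <- cor(toy$Domestic, toy$Worldwide, method = "pearson")
# Spearman: correlation of ranks, less sensitive to extreme values.
spearman <- cor(toy$Domestic, toy$Worldwide, method = "spearman")
cat(sprintf("Pearson: %.3f  Spearman: %.3f\n", pearson, spearman))

# Films earning more than 75% of their worldwide gross abroad
# (the 0.75 cutoff is an illustrative choice, not from the report).
foreign_heavy <- toy[toy$Foreign / toy$Worldwide > 0.75, ]
print(foreign_heavy)
```

Comparing the two coefficients on the real data would show how much of the reported correlation is driven by the biggest hits.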


Visualization 3: Density Plot - Distribution of Worldwide Revenue

ggplot(df, aes(x = Worldwide)) +
  geom_density(fill = "blue", alpha = 0.4) +
  labs(title = "Density Distribution of Worldwide Revenue",
       x = "Worldwide Revenue ($)", y = "Density") +
  theme_minimal()

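Insight: The density plot shows that worldwide revenue is heavily right-skewed. Most films cluster at the low end of the scale (the median is about $52.1 million), while a long tail of blockbusters reaching about $2.8 billion pulls the mean up to about $135.3 million. The tallest peak marks the most common revenue range, and the long, thin tail shows how few films reach the very largest grosses.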
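Because the distribution is so skewed, a density curve on a dollar scale squeezes most films into one narrow spike near zero. A common alternative is to estimate the density of log10(revenue); in the ggplot2 code above, adding scale_x_log10() is one way to get a comparable view. The sketch below illustrates the difference in base R on simulated log-normal revenues; the distribution and its parameters are assumptions chosen only to mimic a long right tail, not estimates from the report's data.

```r
set.seed(42)
# Hypothetical right-skewed revenues (log-normal, median about $50M);
# simulated for illustration, not the report's data.
revenue <- rlnorm(500, meanlog = log(5e7), sdlog = 1.2)

# Kernel density estimates on the dollar scale and on the log10 scale.
dens_raw <- density(revenue)
dens_log <- density(log10(revenue))

# Revenue at the highest point of each curve (log result converted back to dollars).
peak_raw <- dens_raw$x[which.max(dens_raw$y)]
peak_log <- 10^dens_log$x[which.max(dens_log$y)]

cat(sprintf("Mean: %.0f  Median: %.0f\n", mean(revenue), median(revenue)))
cat(sprintf("Peak (dollar scale): %.0f  Peak (log scale): %.0f\n", peak_raw, peak_log))

# Side-by-side plots: the log-scale curve spreads out the crowded low end.
par(mfrow = c(1, 2))
plot(dens_raw, main = "Dollar scale", xlab = "Worldwide revenue ($)")
plot(dens_log, main = "Log10 scale", xlab = "log10(worldwide revenue)")
```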


Visualization 4: Bar Chart - Top 10 Movies by Worldwide Revenue


Insight: The top-grossing movies tend to be franchise films, sequels, or superhero movies, reinforcing the financial dominance of established intellectual properties. While moviegoers often say they want new and original films, these box office results suggest that this is not where they spend most of their money.
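The code for this chart does not appear in the report. One way to build it, sketched below in base R on a hypothetical `films` data frame with made-up titles and revenues, is to sort by Worldwide in decreasing order and keep the first ten rows; the resulting top10 table could instead be passed to ggplot2's geom_col() to match the report's other charts. On the report's df, the title column would be `Release Group`.

```r
# Hypothetical toy data standing in for the report's df;
# titles and revenues (in dollars) are made up.
films <- data.frame(
  `Release Group` = paste("Film", LETTERS[1:12]),
  Worldwide = c(120, 950, 310, 45, 2800, 780, 60, 1500, 230, 410, 88, 1020) * 1e6,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Sort by worldwide revenue, largest first, and keep the top 10 rows.
top10 <- head(films[order(films$Worldwide, decreasing = TRUE), ], 10)
print(top10)

# Horizontal bar chart; rev() places the highest-grossing film at the top.
par(mar = c(5, 8, 4, 2))
barplot(rev(top10$Worldwide) / 1e6,
        names.arg = rev(top10$`Release Group`),
        horiz = TRUE, las = 1,
        xlab = "Worldwide Revenue ($ millions)",
        main = "Top 10 Movies by Worldwide Revenue")
```

Sorting on Worldwide across all years, rather than using the per-year Rank column, is what makes the selection an all-time top 10 for the dataset.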


Visualization 5: Pie Chart - Domestic vs. Foreign Revenue Share


revenue_distribution <- df %>% summarise(Domestic = sum(Domestic, na.rm = TRUE), Foreign = sum(Foreign, na.rm = TRUE))

df_pie <- data.frame(
  Category = c("Domestic Revenue", "Foreign Revenue"),
  Revenue = c(revenue_distribution$Domestic, revenue_distribution$Foreign)
)

ggplot(df_pie, aes(x = "", y = Revenue, fill = Category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  labs(title = "Proportion of Domestic vs. Foreign Revenue") +
  theme_minimal()

Insight: Roughly two-thirds of combined revenue (about 66.5%, based on the mean domestic and foreign figures in the descriptive statistics table) is generated outside the domestic market, highlighting the importance of international markets in shaping modern box office success. At every budget level, international revenue has become important enough to change which movies get greenlit.
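The pie chart weights each film by its revenue, so it answers what share of all dollars came from abroad. The summary output earlier reports a mean Foreign_percent of 0.713, which instead gives every film equal weight; the two measures differ whenever smaller films earn a different fraction abroad than blockbusters do. The sketch below computes both on a hypothetical four-film data frame (all values made up) to show the distinction.

```r
# Hypothetical toy data (dollars); not the report's figures.
films <- data.frame(
  Domestic = c(300e6, 50e6, 0, 10e6),
  Foreign  = c(400e6, 150e6, 30e6, 40e6)
)
films$Worldwide <- films$Domestic + films$Foreign

# Revenue-weighted share (what the pie chart shows): total foreign / total revenue.
aggregate_foreign_share <- sum(films$Foreign) / sum(films$Worldwide)

# Film-weighted share: the average of each film's own foreign percentage.
mean_film_foreign_share <- mean(films$Foreign / films$Worldwide)

cat(sprintf("Revenue-weighted: %.3f  Film-weighted: %.3f\n",
            aggregate_foreign_share, mean_film_foreign_share))
```

In this toy example the film-weighted share is higher because the two smallest films earn most or all of their gross abroad, while the largest film earns a smaller fraction there.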


Conclusion

This report analyzed box office trends using five unique visualizations. The findings emphasize:
- The high correlation between domestic and global revenue.
- The dominance of franchise films in revenue generation.
- The importance of foreign markets in shaping success.


End of Report