1. Box office:

Box office reports describe the financial performance of movies, typically measured by the total revenue generated from ticket sales during a movie’s theatrical run.

1.1. Aim of the box office analysis:

The aim of box office analysis is to understand the factors that contribute to a movie’s financial success. This involves exploring how a movie’s budget, release date, genre, runtime, ratings, and other variables relate to its box office performance.

2. Methodology:

This study uses a public box office dataset that is available on Kaggle. The first rows of the dataset are shown below:

# load the packages used for data handling and table formatting
library(dplyr)
library(kableExtra)
# ensure the results are repeatable
set.seed(7)
# load the data
df <- read.csv('blockbusters.csv')
# show the first rows, most recent release year first
df %>% arrange(desc(year)) %>% head() %>%
  kbl(caption = "Head of data sorted by release year in descending order") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Table: Head of data sorted by release year in descending order
Main_Genre	Genre_2	Genre_3	imdb_rating	length	rank_in_year	rating	studio	title	worldwide_gross	year
Action	Adventure	Drama	7.4	135	1	PG-13	Walt Disney Pictures	Black Panther	$700,059,566	2018
Action	Adventure	Sci-Fi	8.5	156	2	PG-13	Walt Disney Pictures	Avengers: Infinity War	$678,815,482	2018
Animation	Action	Adventure	7.8	118	3	PG	Pixar	Incredibles 2	$608,581,744	2018
Action	Adventure	Drama	6.2	129	4	PG-13	Universal Pictures	Jurassic World: Fallen Kingdom	$416,769,345	2018
Action	Comedy		7.8	119	5	R	20th Century Fox	Deadpool 2	$318,491,426	2018
Action	Adventure	Drama	7.9	147	6	PG-13	Paramount Pictures	Mission: Impossible - Fallout	$220,159,104	2018
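
The preview lists worldwide_gross as a formatted currency string (for example $700,059,566), whereas the code in the following sections refers to a numeric worldwide_sales column. The conversion step is not shown in this report. A minimal sketch of one way to do it, assuming the only cleaning needed is to strip the dollar sign and the thousands separators, and assuming the numeric column is named worldwide_sales:

```r
# Toy data frame reusing three values from the preview above
df_demo <- data.frame(
  title = c("Black Panther", "Avengers: Infinity War", "Incredibles 2"),
  worldwide_gross = c("$700,059,566", "$678,815,482", "$608,581,744"),
  stringsAsFactors = FALSE
)

# Assumption: remove "$" and "," and store the result as worldwide_sales
df_demo$worldwide_sales <- as.numeric(gsub("[$,]", "", df_demo$worldwide_gross))

# Inspect the type and values of the new column
str(df_demo$worldwide_sales)
```

On the full data the same gsub() call could be applied to df$worldwide_gross right after read.csv().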

3. Results:

3.1. Correlation between worldwide sales and other features

A correlation matrix is computed for the numeric columns of the dataset to look for patterns among them.

The following figure shows that worldwide sales has a correlation of 0.68 with release year, which means that more recent movies generally had higher sales. Part of this trend may be due to inflation, that is, the loss in the value of money over time, but the upward trend is still present overall.
Worldwide sales is also positively correlated with the length of the movie and with the IMDB rating, with correlations of 0.34 and 0.21, respectively.

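library(ggcorrplot)
# keep the numeric columns and compute their pairwise Pearson correlations
df_num <- select(df, c("year", "imdb_rating", "length", "worldwide_sales"))
df_cormatrix <- cor(df_num)

# plot the lower triangle with the coefficient printed in each cell
ggcorrplot(df_cormatrix, type = "lower", hc.order = TRUE, lab = TRUE)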
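
A correlation coefficient on its own does not indicate whether the association could plausibly be due to chance. A minimal sketch of checking a single pair with cor.test(), using the five rows from the data table at the end of this report as toy input. Five rows are far too few for any real conclusion; on the full data the same calls would use df instead of toy.

```r
# Toy input: five rows copied from the data table at the end of this report
toy <- data.frame(
  year            = c(2009, 1997, 2012, 2011, 2013),
  imdb_rating     = c(7.9, 7.7, 8.2, 8.1, 7.7),
  length          = c(162, 194, 143, 130, 102),
  worldwide_sales = c(2749064328, 1843201268, 1518594910, 1341511219, 1274219009)
)

# Pearson correlation matrix, the same quantity shown in the figure
round(cor(toy), 2)

# Test whether the length/sales correlation differs from zero
ct <- cor.test(toy$length, toy$worldwide_sales, method = "pearson")
ct$estimate  # sample correlation coefficient
ct$p.value   # p-value under the null hypothesis of zero correlation
```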

3.2. Relationship between the worldwide sales and other features

3.2.1. Worldwide sales and IMDB rating:

The scatter plot shows that movies with higher IMDB ratings tend to have higher worldwide sales. The regression line reflects this pattern as well.

# load the plotly and RColorBrewer packages
library(plotly)
library(RColorBrewer)

# create a color palette with Set1 colors, interpolated to one color per genre
# (brewer.pal alone returns at most 9 colors for Set1)
n_colors <- length(unique(df$Main_Genre))
colors <- colorRampPalette(brewer.pal(9, "Set1"))(n_colors)

# fit the regression; na.exclude keeps the fitted values aligned with the rows of df
fit <- lm(worldwide_sales ~ imdb_rating, data = df, na.action = na.exclude)

# create a scatter plot with plotly and overlay the regression line
p <- plot_ly(df, x = ~imdb_rating, y = ~worldwide_sales, color = ~Main_Genre,
             colors = colors, type = 'scatter', mode = 'markers') %>%
  layout(title = 'Scatter plot of worldwide sales and IMDB rating',
         xaxis = list(title = 'IMDB rating'),
         yaxis = list(title = 'Worldwide sales [$]')) %>%
  add_lines(x = ~imdb_rating, y = ~fitted(fit),
            line = list(dash = 'dash'), color = I("blue"), name = "regression line")

# display the plot
p
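
The dashed line in the plot comes from a simple linear regression of sales on rating. To quantify the relationship rather than read it off the plot, the fitted coefficients and R² can be inspected directly. A minimal sketch, again using five rows from the data table at the end of this report as toy input (the numbers it produces are illustrative only and are not the results for the full dataset):

```r
# Toy input: five rows copied from the data table at the end of this report
toy <- data.frame(
  imdb_rating     = c(7.9, 7.7, 8.2, 8.1, 7.7),
  worldwide_sales = c(2749064328, 1843201268, 1518594910, 1341511219, 1274219009)
)

# Simple linear regression of sales on rating
fit_toy <- lm(worldwide_sales ~ imdb_rating, data = toy)

# Intercept and slope (change in sales per one rating point)
coef(fit_toy)

# Share of the variance in sales explained by rating
summary(fit_toy)$r.squared
```

For a regression with a single predictor, R² equals the squared Pearson correlation, so a correlation of 0.21 corresponds to only about 4% of the variance explained.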

3.2.2. Worldwide sales and year of release:

The following plot shows worldwide sales against the year of release. In general, sales increase as the release year becomes more recent. However, there is a sudden drop in worldwide sales after 2015.
The highest sales occurred approximately between 2005 and 2015.

# load the plotly and RColorBrewer packages
library(plotly)
library(RColorBrewer)

# create a color palette with Set1 colors, interpolated to one color per genre
# (brewer.pal alone returns at most 9 colors for Set1)
n_colors <- length(unique(df$Main_Genre))
colors <- colorRampPalette(brewer.pal(9, "Set1"))(n_colors)

# fit the regression; na.exclude keeps the fitted values aligned with the rows of df
fit <- lm(worldwide_sales ~ year, data = df, na.action = na.exclude)

# create a scatter plot with plotly and overlay the regression line
p <- plot_ly(df, x = ~year, y = ~worldwide_sales, color = ~Main_Genre,
             colors = colors, type = 'scatter', mode = 'markers') %>%
  layout(title = 'Scatter plot of worldwide sales and year of the movie release',
         xaxis = list(title = 'year'),
         yaxis = list(title = 'Worldwide sales [$]')) %>%
  add_lines(x = ~year, y = ~fitted(fit),
            line = list(dash = 'dash'), color = I("blue"), name = "regression line")

# display the plot
p
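
One way to examine the apparent drop after 2015 is to summarise sales per release year instead of reading individual points off the scatter plot. A minimal sketch with base R's aggregate(), using a few rows copied from the two tables in this report as toy input. The 2018 figures come from the worldwide_gross column of the preview and are treated as sales here purely for illustration, which is an assumption.

```r
# Toy input: rows copied from the tables in this report (illustration only)
toy <- data.frame(
  year            = c(2018, 2018, 2018, 2009, 1997, 2012),
  worldwide_sales = c(700059566, 678815482, 608581744,
                      2749064328, 1843201268, 1518594910)
)

# Median sales per release year (rows come back sorted by year)
by_year <- aggregate(worldwide_sales ~ year, data = toy, FUN = median)

# Number of movies behind each yearly median
by_year$n_movies <- as.vector(table(toy$year)[as.character(by_year$year)])

by_year
```

Reporting the count next to the median helps to spot years whose summary rests on only a handful of movies.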

3.2.3. Worldwide sales and length of the movie:

The length of a movie is positively related to its worldwide sales. This is illustrated in the following plot, where sales increase with the length of the movie.

A regression analysis confirms the positive relationship between these two variables: longer movies tend to have higher sales.

# load the plotly and RColorBrewer packages
library(plotly)
library(RColorBrewer)

# create a color palette with Set1 colors, interpolated to one color per genre
# (brewer.pal alone returns at most 9 colors for Set1)
n_colors <- length(unique(df$Main_Genre))
colors <- colorRampPalette(brewer.pal(9, "Set1"))(n_colors)

# fit the regression; na.exclude keeps the fitted values aligned with the rows of df
fit <- lm(worldwide_sales ~ length, data = df, na.action = na.exclude)

# create a scatter plot with plotly and overlay the regression line
p <- plot_ly(df, x = ~length, y = ~worldwide_sales, color = ~Main_Genre,
             colors = colors, type = 'scatter', mode = 'markers') %>%
  layout(title = 'Scatter plot of worldwide sales and length of the movie',
         xaxis = list(title = 'length [min]'),
         yaxis = list(title = 'Worldwide sales [$]')) %>%
  add_lines(x = ~length, y = ~fitted(fit),
            line = list(dash = 'dash'), color = I("blue"), name = "regression line")

# display the plot
p
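
Box office figures tend to be dominated by a few very large releases, and such values can pull a linear fit strongly. A rank-based (Spearman) correlation is a simple complementary check of whether longer movies tend to sell more. A minimal sketch on five rows from the data table at the end of this report, used as toy input only:

```r
# Toy input: five rows copied from the data table at the end of this report
toy <- data.frame(
  length          = c(162, 194, 143, 130, 102),
  worldwide_sales = c(2749064328, 1843201268, 1518594910, 1341511219, 1274219009)
)

# Pearson correlation: sensitive to extreme values
pearson_r <- cor(toy$length, toy$worldwide_sales, method = "pearson")

# Spearman correlation: uses ranks only, so extreme values carry less weight
spearman_rho <- cor(toy$length, toy$worldwide_sales, method = "spearman")

pearson_r
spearman_rho  # 0.9 for these five rows
```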

Boxoffice Data Analysis

Main_Genre	Genre_2	Genre_3	imdb_rating	length	rank_in_year	rating	studio	title	worldwide_sales	year
Fantasy	Adventure	Action	7.9	162	1	PG-13	20th Century Fox	Avatar	2749064328	2009
Romance	Drama		7.7	194	1	PG-13	Paramount Pictures	Titanic	1843201268	1997
Sci-Fi	Adventure	Action	8.2	143	1	PG-13	Walt Disney Pictures	The Avengers	1518594910	2012
Thriller	Fantasy	Adventure	8.1	130	1	PG-13	Warner Bros	Harry Potter and the Deathly Hallows - Part 2	1341511219	2011
Comedy	Animation	Adventure	7.7	102	1	PG	Walt Disney Pictures	Frozen	1274219009	2013