INTRODUCTION

Going to the movies has been one of my favorite pastimes since I was young, so in this project I explore the factors that define a box office hit. The US domestic box office is a multi-billion-dollar industry; its highest annual total gross came in 2018, at almost $11.9 billion (Box Office Mojo). However, several factors have contributed to the industry’s decline in recent years: the rise of streaming platforms and ticket price inflation have kept Americans from attending more films. The COVID-19 pandemic shutdown in 2020 accelerated this downward trend, and though revenue has increased in the following years, it has not recovered to its previous level. Many movies have still found success at the box office, and with this project I look into the factors related to a movie’s success and whether they have changed in recent years.

The data set I used covers the 500 movies with the highest production budgets and includes each movie’s domestic and worldwide gross, genre, release year, runtime, MPAA rating, and other attributes. For consistency, I used domestic gross for most of the graphs and color palettes from the “viridis” package. All of the visualizations are designed to be visually accessible and to highlight the most important comparison.

REVENUE TRENDS

Using the Top 500 movies data set, the double line graph highlights the general trends in movie revenue and production cost over the past 35 years. These are not the industry-wide totals for each year, only the totals within the 500 most expensive movies; however, trends within this data set likely reflect overall box office trends. The animation reveals that both the total domestic gross and the total production cost for these top movies increased steadily until 2020, when both values dropped sharply during the COVID-19 pandemic. During this time, most theaters had limited or no in-person showings of current movies, leading to a stark decrease in revenue. At the same time, the amount of money that could be spent on producing movies declined as regulations on in-person contact increased. The red dotted line separates pre- and post-2020 results, illustrating this steep decline and how both metrics have failed to recover to their pre-COVID levels. Interestingly, in the years before COVID, the growth in these values had already slowed and begun to plateau.

This pattern suggests that the shutdowns during the COVID pandemic were responsible for a very steep decline in both the amount of money spent on making movies and the amount of money they made. The visualization also demonstrates the close relationship between domestic gross and production cost, which the next set of visualizations illustrates further.
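
The yearly totals behind a line graph like this one can be sketched as a simple group-and-sum. The Python snippet below is only an illustration under stated assumptions: it is not the code used for the project, the function name is hypothetical, and the rows are made-up numbers rather than values from the data set.

```python
from collections import defaultdict

# Made-up rows for illustration only; not values from the Top 500 data set.
# Each tuple is (release_year, domestic_gross, production_cost) in millions of USD.
movies = [
    (2018, 700.0, 300.0),
    (2018, 200.0, 150.0),
    (2019, 850.0, 350.0),
    (2020, 50.0, 200.0),
]


def yearly_totals(rows):
    """Sum domestic gross and production cost for each release year."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for year, gross, cost in rows:
        sums[year][0] += gross
        sums[year][1] += cost
    # Return a plain dict ordered by year: {year: (total_gross, total_cost)}.
    return {year: tuple(values) for year, values in sorted(sums.items())}


totals = yearly_totals(movies)
for year, (gross, cost) in totals.items():
    print(year, gross, cost)
```

Plotting the two totals against year gives the two lines; because only the 500 most expensive movies are included, each total covers those movies and not the whole industry.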

INTERACTIVE VISUALIZATIONS

The next set of visualizations allows the viewer to interact meaningfully with the data and directly view the relationships between its numerical variables. The first interactive graph is a scatterplot in which the viewer picks from three variables (opening weekend revenue, domestic gross, and production cost) and chooses which to place on the x- and y-axes. The Shiny app includes the option to choose one of three regression models: linear, LOESS, or polynomial. Another part of the input section offers the option to facet the graph by genre, which helps show whether these trends hold across film genres. From the scatterplot, the viewer can easily see that there is a positive relationship between all of these variables. The default output displays the relationship between production cost and domestic gross, which is moderately strong and positive. This visualization leads to the conclusion that more expensive movies generally earn more revenue at the box office.
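
To make the linear option concrete, here is a minimal sketch of an ordinary least-squares line fit in Python. It is an illustration only, not the code behind the Shiny app; the function name and the numbers are assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values in millions of USD; not taken from the data set.
production_cost = [100.0, 150.0, 200.0, 250.0]
domestic_gross = [120.0, 200.0, 260.0, 340.0]

slope, intercept = fit_line(production_cost, domestic_gross)
print(f"domestic gross is roughly {slope:.2f} * cost + {intercept:.2f}")
```

A positive slope corresponds to the upward trend described above. LOESS and polynomial fits relax the straight-line assumption and are not covered by this sketch.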

The second visualization looks at the total number of top films by MPAA rating and genre. The default view, a stacked bar graph, guides the viewer to the most important comparison by splitting the number of movies with each MPAA rating into genres. From this graph, the viewer can see that PG and PG-13 movies make up the substantial majority of these top films, and that most of those films fall within the action and adventure genres. From this, we can conclude that these are common attributes of the films that cost the most to produce. The viewer can also switch the bar plot to either a box plot or a ridge plot. For the box plot and the ridge plot, I removed the “Romantic Comedy” genre to reduce visual clutter, since it had only one data point. These plots show the distribution of domestic gross for movies in each genre. Musicals have by far the highest median earnings, followed by action and adventure movies. Though there are fewer musicals in this data set, the ones included were profitable, and action and adventure movies may be the most common of these top movies because of their high earning potential.

Link to shiny app: https://lvy4o6-carolina-turner.shinyapps.io/Final_Project2/

RUNTIME AND OTHER TRENDS

After creating the scatterplot, I looked further into the relationships among the numerical variables. My next graph is a correlation heatmap of the numeric variables in the data set, excluding year. In the heatmap, warmer colors indicate a strong positive correlation, green-blue indicates low to no correlation, and blue to deep purple indicates a negative correlation. The strongest correlations were among opening weekend revenue, worldwide gross, and domestic gross. These results show that high earnings in one area go along with high earnings in the others. Production cost was also correlated with revenue, though less strongly. One surprising result was the positive relationship between runtime and all of the revenue variables; though the correlation was small, I had expected runtime to be negatively related to earnings, or possibly to have a quadratic relationship with them.
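
Each cell of a heatmap like this is a pairwise correlation coefficient. Assuming the Pearson coefficient was used (the text does not say which one), it can be sketched in Python as follows; the column names and numbers are hypothetical and not drawn from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a in columns
        for b in columns
    }


# Made-up values for illustration; hypothetical column names.
columns = {
    "opening_weekend": [50.0, 80.0, 120.0, 200.0],
    "domestic_gross": [150.0, 260.0, 330.0, 620.0],
    "runtime": [95.0, 130.0, 110.0, 140.0],
}

matrix = correlation_matrix(columns)
for (a, b), r in matrix.items():
    print(f"{a:>16} vs {b:<16} r = {r:+.2f}")
```

Values near +1 map to the warm end of the color scale and values near -1 to the deep purple end. A quadratic relationship would not be captured well by this coefficient, since it measures only linear association.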

Turning to yearly trends, I wanted to see whether the average length of movies has changed and whether any changes were related to COVID. I created a set of histograms, grouped into five-year periods. The graphs show a clear, gradual trend in movie runtimes: the top movies of the 1990s and early 2000s generally had shorter and more varied runtimes, which shifted toward longer and more standardized runtimes. This may be related to greater franchise dominance at the box office, or to larger budgets that justified longer runtimes (though the correlation between budget and runtime is relatively weak). From 2015 to 2021, movie runtimes clustered around 120 minutes, signaling greater standardization. In general, two-hour runtimes have gradually become the norm at the box office. The years from 2021 onward may show a reversal of this trend and a move toward greater diversity in movie runtimes, though this may also be due to the small sample size.
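
The grouping step behind histograms like these can be sketched in Python as below. The bin boundaries are an assumption (five-year bins counted from 1990); the exact periods used in the graphs may differ, and the runtimes shown are made up for the example.

```python
from collections import defaultdict


def five_year_bin(year, start=1990):
    """Label the five-year period containing `year`, counting bins from `start` (an assumed origin)."""
    lower = start + ((year - start) // 5) * 5
    return f"{lower}-{lower + 4}"


def runtimes_by_period(rows, start=1990):
    """Group runtimes (in minutes) by five-year period; one histogram would be drawn per key."""
    groups = defaultdict(list)
    for year, runtime in rows:
        groups[five_year_bin(year, start)].append(runtime)
    return dict(groups)


# Made-up (release_year, runtime_minutes) pairs; not from the data set.
rows = [(1993, 98), (1994, 141), (2016, 121), (2018, 119), (2022, 150)]

groups = runtimes_by_period(rows)
for period, runtimes in sorted(groups.items()):
    print(period, runtimes)
```

A narrower spread of values within a period corresponds to the standardization described above, and a period with few movies is where a small sample can mislead.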

MPAA RATING

One of the most important factors in determining a film’s audience is its MPAA rating, which indicates its appropriateness for a general audience. For this section, I wanted to visualize MPAA rating as a factor in movie profitability. I created a Shiny app with three visualizations, each giving the viewer the opportunity to interact meaningfully with the data. Of these movies with top production costs, the vast majority are rated PG or PG-13, and the area graph shows how the movie industry has moved toward PG-13 dominance among top movies. The interactive feature for this graph is a toggle between the proportion of films with each rating and the count (the total number in that year). With this, the graph gives the viewer a closer look at the decline of top R-rated movies and the industry’s move toward PG-13 movies; R-rated movies go from making up 100% of these movies in the early 90s to about 20% by the late 90s.
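
The count and proportion views differ only in whether each year's counts are divided by that year's total. A minimal Python sketch, with a hypothetical function name and made-up rows rather than the app's actual code or data:

```python
from collections import Counter, defaultdict


def rating_by_year(rows, as_proportion=False):
    """Count films per MPAA rating in each year, optionally as a share of that year's films."""
    counts = defaultdict(Counter)
    for year, rating in rows:
        counts[year][rating] += 1
    result = {}
    for year, counter in sorted(counts.items()):
        if as_proportion:
            total = sum(counter.values())
            result[year] = {rating: n / total for rating, n in counter.items()}
        else:
            result[year] = dict(counter)
    return result


# Made-up (release_year, rating) pairs; not from the data set.
rows = [(1991, "R"), (1998, "PG-13"), (1998, "PG-13"), (1998, "PG"), (1998, "R"), (1998, "PG-13")]

print(rating_by_year(rows))                      # counts per year
print(rating_by_year(rows, as_proportion=True))  # shares per year
```

The proportion view makes years with very few top movies comparable to busier years, while the count view shows how many films sit behind each share.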

There are many potential explanations for why top films have shifted toward PG-13 ratings, and the shift may relate to their ability to appeal to a wider audience. R-rated and G-rated movies, by contrast, are much less common among these top films, and the next few graphs take a closer look at potential explanatory factors. The next visual is a bar plot of the average domestic revenue of a film by its MPAA rating. As the graph shows, PG-13 and G-rated films earned the highest average domestic gross, while R-rated movies earned the least. In the Shiny app, I included a toggle between mean and median revenue to see whether the disparity was due to a few outliers. The other interaction is a toggle between per-year and aggregated revenue, which tests whether these differences are influenced by overall trends in movie revenue. Because the per-year view weights each year equally, it shows that the differences by MPAA rating are consistent rather than driven by a few high-revenue years. With these interactive elements, viewers can see that the pattern holds, as there are no significant shifts when switching between mean and median or between year-normalized and aggregate revenue. From this, we can assume that the decline of G-rated films is not related to their profitability but instead reflects a general decline in the number of films with that rating.
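
The reason for offering both statistics is that a single very high-grossing film pulls the mean upward but barely moves the median. The Python sketch below illustrates this with made-up numbers; the function name is hypothetical and this is not the app's code.

```python
from collections import defaultdict
from statistics import mean, median


def revenue_by_rating(rows, stat="mean"):
    """Summarize domestic gross per MPAA rating using the mean or the median."""
    summarize = mean if stat == "mean" else median
    groups = defaultdict(list)
    for rating, gross in rows:
        groups[rating].append(gross)
    return {rating: summarize(values) for rating, values in groups.items()}


# Made-up (rating, domestic_gross) pairs in millions of USD; not from the data set.
rows = [("PG-13", 100.0), ("PG-13", 200.0), ("PG-13", 900.0), ("R", 50.0), ("R", 150.0)]

means = revenue_by_rating(rows, stat="mean")
medians = revenue_by_rating(rows, stat="median")
print(means)
print(medians)
```

In this toy example the PG-13 mean is twice the PG-13 median because of one outlier. When the ordering of ratings stays the same under both statistics, as the text reports for the real data, outliers are unlikely to explain the differences.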

The layered box and violin plots examine the number of theaters showing a movie by its rating, and R-rated movies again stand out with a visibly lower median number of theaters. The interactive feature allows the viewer to switch between a log-10 and a linear scale to determine whether this discrepancy arises because fewer theaters are willing to show R-rated movies or because the other ratings include more films with a very wide release. With this toggle applied, the discrepancy in theater count across ratings disappears, and we can assume that it arises because there are fewer R-rated movies with a very wide release. This may be due to many factors, but since children are one of the major customer bases for movies, R-rated movies may be less appealing to a general audience, leading to fewer wide releases compared with other ratings. The increase in PG-13 movies, however, may indicate a shift toward teen rather than early-childhood audiences.

Second Shiny app link: https://lvy4o6-carolina-turner.shinyapps.io/Final_Project/

GENRE

The next two visualizations look into genre; specifically, which genres make the most money in total, and which make the most on average? In Tableau, I created a treemap, which shows which genres of these top movies have contributed the most to total domestic revenue. I used a colorblindness color filter to maximize visual accessibility. The treemap illustrates that the vast majority of domestic box office revenue comes from action and adventure movies, which the previous graphs showed to be the most frequent genres within these top 500 movies. It may be that action and adventure movies require more monetary input (for expenses such as special effects or stunt doubles), which is why they are the most frequent in this data set, but they also generate a significant return. The next graph is a bar plot of the average domestic gross by genre. The log-10 scale distorts the apparent size of the bars, so I included a color gradient to indicate average domestic gross. These results reinforce the information from the earlier box and ridge plots: musicals generate the highest average domestic gross, with action and adventure movies second and third. This suggests that action and adventure movies are not only very common among expensive films but also high earners. While musicals have the highest average domestic gross, this may be an artifact of the small sample; they make up a very small share of the total.

Link to graphic: https://public.tableau.com/views/FinalProjectTableauGraphic/Sheet13?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

CONCLUSIONS

In conclusion, while the movie industry as a whole may be declining, there are still blockbuster hits coming out every year. Moviegoing as an American pastime is not dead, and while it’s still alive, we can look into what makes up the most successful movies. The reasons why the most expensive movies turn into flops or major hits may be complex, but we can examine a few patterns from this data set.

Generally, the strongest predictors of how much a movie will earn domestically are other earnings variables. The highest-grossing films domestically also tend to gross highly worldwide and to earn a lot during their opening weekend.

Most of the most expensive movies are rated PG-13 or PG, though PG-13 movies have become much more dominant in recent years. R-rated and G-rated movies are less common for complex reasons, but we can guess from the data that they are less appealing to a general audience. Action and adventure movies make up the lion’s share of this data set, and what they require may simply make their production costs higher. However, they are also quite profitable, and the trade-off in producing them appears to be worth it for movie executives.

At a time when the box office industry is in decline for numerous reasons, there are still many blockbuster hits, and the films that succeed in this era will shape the movies produced in the coming years. Viewers looking to understand what expensive and profitable movies have in common can turn to these graphs, which clearly outline the trends and patterns.

SOURCES

https://www.kaggle.com/datasets/mitchellharrison/top-500-movies-budget
https://chatgpt.com/share/692e6de9-be60-8004-8769-86e53771a898
https://chatgpt.com/share/692e6e02-50fc-8004-91bf-b73ab9fd8ecd
https://chatgpt.com/share/692e6e20-64dc-8004-9b40-3e7799eb7ab7
https://chatgpt.com/c/693f456d-e670-832e-ae8d-2bb6e068cb19
https://www.boxofficemojo.com/year/