Introduction

Throughout the history of movies and analytics, many analyses have examined the most successful movies at the box office. I have not seen many that discuss movies that do poorly at the box office and do not generate much revenue. Throughout this document, I will be looking at data related to movies, focusing primarily on genre.

Question

Which movies have failed the most at the box office, and which genre has the most failures?

Hypothesis

I believe that a movie with a high budget is more likely to have a higher gross. I also think that comedy will have the most failed movies. I think this because comedies have spawned so many sequels to their original hits, and the sequels are never as good. This would limit their revenue and cause more failures in the comedy genre.

My Interest in This Topic

I have been interested in movie success ever since Avatar became the highest-grossing movie, and again with the recent success of the Marvel Avengers movies. This made me curious about which movies made little or no money and failed at the box office.

What I am using

I am using data found on Kaggle, published by Daniel Grijalva and updated annually. The data was scraped from the IMDb website and is publicly accessible on Kaggle. I have also scraped reviews by proficient users from IMDb for two movies, and I look at the positive and negative words used in each review. IMDb is a credible source of user reviews on movies. This data primarily covers gross at the box office and does not take into account how other products made for the movie affected the profit.

Data Dictionary

If you are curious and want to access the data, here is a dictionary of each variable and its definition.

  • budget: the budget of a movie. Some movies don’t have this, so it appears as 0
  • company: the production company
  • country: country of origin
  • director: the director
  • genre: main genre of the movie
  • gross: revenue of the movie
  • name: name of the movie
  • rating: rating of the movie (R, PG, etc.)
  • released: release date (YYYY-MM-DD)
  • runtime: duration of the movie
  • score: IMDb user rating
  • votes: number of user votes
  • star: main actor/actress
  • writer: writer of the movie
  • year: year of release

Cleaning the data

Before starting any manipulation, I had to clean out the unusable data so the analysis would run smoothly. I also added a revenue variable, calculated as Revenue = Gross - Budget.
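A minimal sketch of this cleaning step, in plain Python. It assumes the rows are dictionaries keyed by the column names in the data dictionary, and that revenue is gross minus budget, which matches the negative values reported later for movies that cost more than they earned. The helper name `clean_movies`, the sample rows, and the rule of dropping rows with a missing or 0 budget are illustrative assumptions, not the exact cleaning used for this report.

```python
def clean_movies(rows):
    """Drop rows without a usable budget or gross, then add revenue = gross - budget."""
    cleaned = []
    for row in rows:
        budget = row.get("budget")
        gross = row.get("gross")
        # A budget of 0 means "not recorded" in this data, so treat it as missing.
        if not budget or gross is None:
            continue
        cleaned.append({**row, "revenue": gross - budget})
    return cleaned


# Hypothetical sample rows, for illustration only.
sample = [
    {"name": "Movie A", "genre": "Action", "budget": 100_000_000, "gross": 40_000_000},
    {"name": "Movie B", "genre": "Comedy", "budget": 0, "gross": 5_000_000},
    {"name": "Movie C", "genre": "Comedy", "budget": 20_000_000, "gross": 65_000_000},
]

movies = clean_movies(sample)
for movie in movies:
    print(movie["name"], movie["revenue"])
```

Movie B is dropped because its budget is recorded as 0, so a revenue figure for it would be misleading.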

Summary

First, I ran some summary statistics to get a sense of what the data is and how it can be used. I looked at how many movies are in the data, the average budget, and the average gross. Then I turned to the failed movies and found the number of failed movies and their average gross.

  • Total Movies: 5436
  • Average Budget For Movies: 35938638
  • Average Gross for Movies: 103004458
  • Number of Failed Movies: 1752
  • Average Gross of Failed Movies: 13178534
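The five figures above could be computed along these lines. This is only a sketch: the function name `summarize` and the sample rows are hypothetical, and it assumes a movie counts as "failed" when it grossed less than its budget, which the report does not state explicitly.

```python
from statistics import mean


def summarize(movies):
    """Return the five summary figures for a list of movie rows."""
    # Assumption: a movie "failed" when it grossed less than its budget.
    failed = [m for m in movies if m["gross"] < m["budget"]]
    return {
        "total_movies": len(movies),
        "average_budget": mean(m["budget"] for m in movies),
        "average_gross": mean(m["gross"] for m in movies),
        "failed_movies": len(failed),
        "average_gross_failed": mean(m["gross"] for m in failed) if failed else None,
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"budget": 100_000_000, "gross": 40_000_000},
    {"budget": 20_000_000, "gross": 65_000_000},
    {"budget": 60_000_000, "gross": 15_000_000},
]

summary = summarize(sample)
for label, value in summary.items():
    print(label, value)
```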

Visualizations

I ran a series of graphs and tables to look further into the question of failed movies.

Budget vs Revenue Scatterplot

To start, I wanted to see the relationship between the gross and the budget of movies.

The scatter plot works best for this situation because we can see which movies underperformed. The blue line indicates the average relationship between revenue and budget. Movies above the line did better than the average for their budget. The movies on the far bottom right had a large budget but did not gross nearly enough to make a profit. That is the area we will be targeting throughout the rest of the analysis.
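One way to make "below the line" concrete is to fit a line and compare each movie with it. The sketch below assumes the blue line is a straight least-squares fit of gross on budget; the report does not say how the line was drawn, so this is an assumption, and the function name `fit_line` and the sample figures (in millions of dollars) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical movies: (name, budget, gross), in millions of dollars.
movies = [("A", 10, 30), ("B", 20, 50), ("C", 30, 20), ("D", 40, 120)]

budgets = [budget for _, budget, _ in movies]
grosses = [gross for _, _, gross in movies]
slope, intercept = fit_line(budgets, grosses)

# A movie underperforms when its gross falls below the fitted value for its budget.
below_line = [
    name for name, budget, gross in movies
    if gross < slope * budget + intercept
]
print(below_line)
```

Falling below the line is a weaker condition than losing money: a movie can earn more than its budget and still sit under the average trend.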

Top 10 failed movies

Before building this table, I had to add the revenue variable, as previously mentioned. This table shows the 10 movies that failed the most and includes the movie name, revenue, and year released.

name | revenue | year
The Irishman | -158031147 | 2019
The 13th Warrior | -98301101 | 1999
The Adventures of Pluto Nash | -92896027 | 2002
Jin ling shi san chai | -91144356 | 2011
Cutthroat Island | -87982678 | 1995
Live by Night | -85321445 | 2016
The Alamo | -81180039 | 2004
Supernova | -75171919 | 2000
Missing Link | -73434290 | 2019
How Do You Know | -71331093 | 2010

From this data, we can conclude that The Irishman was the biggest failure and did not bring in nearly enough at the box office. After doing research, I found that much of the production budget went into the cameras and visual effects because the director wanted the actors to look younger.
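A ranking like the one above can be produced by taking the rows with the smallest revenue. The sketch below uses four rows copied from the table and a hypothetical helper name, `top_failures`; it illustrates the idea and is not the code behind the table.

```python
import heapq


def top_failures(movies, n=10):
    """Return the n movies with the lowest (most negative) revenue."""
    return heapq.nsmallest(n, movies, key=lambda m: m["revenue"])


# Four rows taken from the table above, deliberately out of order.
movies = [
    {"name": "The Alamo", "revenue": -81180039, "year": 2004},
    {"name": "The Irishman", "revenue": -158031147, "year": 2019},
    {"name": "Supernova", "revenue": -75171919, "year": 2000},
    {"name": "The 13th Warrior", "revenue": -98301101, "year": 1999},
]

worst_three = top_failures(movies, n=3)
for movie in worst_three:
    print(movie["name"], movie["revenue"], movie["year"])
```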

Graph of Top 10 Failures

The vertical axis shows the revenue lost and the horizontal axis shows the movie title. The top of the vertical axis marks the largest loss, about 150 million dollars, and the bottom starts at 0. The key on the right lists the movie names along with the color associated with each. This gives us a better look at the distribution of the data. Now we can easily see that The Irishman is the biggest failure. After the huge drop-off, The 13th Warrior and The Adventures of Pluto Nash are the next two biggest failures.

Most Failed Genres

I wanted to take a deeper dive and look at how much each genre has lost on unsuccessful movies. As I stated in my hypothesis, I thought that comedy would have the most unsuccessful movies. Here we can see that Action movies have had the biggest failures in terms of revenue.

I limited the number of genres shown because there are many genres and some movies can fit more than one. The height of each bar indicates how much was lost, with the largest loss at the top. The total for each bar is the sum of revenue from movies that are predominantly in that genre.

One thing that I found interesting in this graph is that Biography made the list. Biographical movies have been a recent trend and tend to draw a big audience, so we can assume that audience feedback on the failed ones was poor, which led to few viewers.
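The per-genre totals can be sketched as a grouped sum over the failed movies, using each movie's main genre. The function name `total_loss_by_genre` and the sample rows are hypothetical, and treating a negative revenue as a failure is an assumption.

```python
from collections import defaultdict


def total_loss_by_genre(movies):
    """Sum the negative revenue of each main genre, largest loss first."""
    totals = defaultdict(int)
    for movie in movies:
        if movie["revenue"] < 0:  # only failed movies contribute
            totals[movie["genre"]] += movie["revenue"]
    # Sorting ascending puts the most negative total first.
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical rows, for illustration only.
movies = [
    {"genre": "Action", "revenue": -90_000_000},
    {"genre": "Comedy", "revenue": -70_000_000},
    {"genre": "Action", "revenue": -30_000_000},
    {"genre": "Biography", "revenue": -20_000_000},
    {"genre": "Comedy", "revenue": 50_000_000},
]

ranking = total_loss_by_genre(movies)
for genre, total in ranking:
    print(genre, total)
```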

Deeper Look At Comedy and Action Revenues

Let's take a deeper look at Comedy and Action because they were the top two genres in total revenue lost. Here we are going to look at the average revenue lost in each genre.

The vertical axis shows the average revenue lost and the horizontal axis shows the genre. The vertical scale runs from 0 at the bottom to 15 million at the top.

After analyzing this data further, I began to think about what is causing Action movies to have such large losses. I believe that more movies can claim the genre of Action than Comedy, so there can be more failures in that case. This is something that I will look into in further research.
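Whether a larger number of movies explains a larger total loss can be checked by putting the count, the total, and the average side by side. In this sketch the function name `loss_profile` and the figures are hypothetical; they are chosen only to show how a genre can lead in total loss while trailing in average loss.

```python
from statistics import mean


def loss_profile(movies, genres=("Action", "Comedy")):
    """Count, total, and average loss of failed movies for each genre."""
    profile = {}
    for genre in genres:
        losses = [
            m["revenue"] for m in movies
            if m["genre"] == genre and m["revenue"] < 0
        ]
        profile[genre] = {
            "failed_movies": len(losses),
            "total_loss": sum(losses),
            "average_loss": mean(losses) if losses else 0,
        }
    return profile


# Hypothetical rows, for illustration only.
movies = [
    {"genre": "Action", "revenue": -10_000_000},
    {"genre": "Action", "revenue": -20_000_000},
    {"genre": "Action", "revenue": -30_000_000},
    {"genre": "Comedy", "revenue": -25_000_000},
    {"genre": "Comedy", "revenue": -25_000_000},
]

profile = loss_profile(movies)
for genre, stats in profile.items():
    print(genre, stats)
```

In this made-up sample Action has the larger total loss only because it has more failed movies; its average loss per failed movie is smaller than Comedy's.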

Do These Failures Lead to a Lower Movie Score?

I began to think about how failure at the box office compares with the quality of the movie. Plenty of movies failed at the box office but turned out to be genuinely good films. There is a whole category of movies called “Cult Classics”: movies with a strong fan base who still watch and quote them every day.

In these graphs and tables, I compare failed movies with their IMDb scores for comedy and action movies.

The horizontal axis shows the IMDb score, which reflects how IMDb users rate the movie on a scale from 1 (worst) to 10 (best). The vertical axis shows the revenue lost.

The goal of this scatter plot is to look at the outliers and see what explains them. Many more Action movies than Comedy movies lost more than 75 million dollars and also scored low. This suggests that they were bad movies overall.

Failed Movies with High IMDb Score

name | genre | revenue | score
Ran | Action | -7335717 | 8.2
Underground | Comedy | -13828918 | 8.1
Warrior | Action | -1691385 | 8.1
Crimes and Misdemeanors | Comedy | -745298 | 7.9
Marriage Story | Comedy | -18266314 | 7.9
The King of Comedy | Comedy | -17463758 | 7.8
The Boondock Saints | Action | -5969529 | 7.8
The Purple Rose of Cairo | Comedy | -4368667 | 7.7
Empire of the Sun | Action | -12761304 | 7.7
Barton Fink | Comedy | -2846061 | 7.7

This table looks at the movies that failed at the box office but still score high on the IMDb scale. They are limited to the same Comedy and Action categories. Comedy has more movies in the top ten (six versus four), but the scores are fairly even, with several movies tied at 7.7.
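A table like this one can be sketched as a filter followed by a sort on score. The first four rows below are copied from the table; the last two rows and the helper name `top_scored_failures` are hypothetical, added only to show that profitable movies and other genres are excluded.

```python
def top_scored_failures(movies, genres=("Action", "Comedy"), n=10):
    """Failed movies in the given genres, highest IMDb score first."""
    failed = [
        m for m in movies
        if m["genre"] in genres and m["revenue"] < 0
    ]
    return sorted(failed, key=lambda m: m["score"], reverse=True)[:n]


movies = [
    # Rows taken from the table above.
    {"name": "Warrior", "genre": "Action", "revenue": -1691385, "score": 8.1},
    {"name": "Barton Fink", "genre": "Comedy", "revenue": -2846061, "score": 7.7},
    {"name": "Ran", "genre": "Action", "revenue": -7335717, "score": 8.2},
    {"name": "Marriage Story", "genre": "Comedy", "revenue": -18266314, "score": 7.9},
    # Hypothetical rows: a profitable comedy and a failed movie in another genre.
    {"name": "Hypothetical Hit", "genre": "Comedy", "revenue": 50_000_000, "score": 9.0},
    {"name": "Hypothetical Drama", "genre": "Drama", "revenue": -1_000_000, "score": 8.5},
]

best_three = top_scored_failures(movies, n=3)
for movie in best_three:
    print(movie["name"], movie["genre"], movie["revenue"], movie["score"])
```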

Web Scraping IMDb Reviews

To get a further look at the reviews, I wanted to see which negative words appear in them. I chose to use reviews from proficient reviewers who have credibility on the website as impartial viewers.

Determining The Movie Range

I wanted to look at the movies that lost the most revenue because I expected them to have the most negative reviews. This leads to a better understanding of why each movie was disliked.

name | genre | revenue
The 13th Warrior | Action | -98301101
How Do You Know | Comedy | -71331093

The two results were The 13th Warrior for Action and How Do You Know for Comedy. These movies had the worst revenue in their genres, so I figured they must have bad reviews.

After scraping the website, I looked at the most common negative words, which helped identify key words that lead to a bad rating. The graph displays the number of times each word is used across all 50 reviews. This data is very interesting, as some of the words are expected and some are not. The words Bad, Wrong, Dark, Fraud, Mess, and Weak are all expected, as they describe the movie.

The word that stood out to me was “Funny”, which piqued my curiosity. I can only assume that it refers to How Do You Know because that movie is a comedy. I can also assume that words like “not” were used before it, because the movie must not have been funny to the audience, causing the bad reviews. “Hard” also caught my attention, because it is not obvious what it means here. It most likely means that the movie was hard to understand or hard to watch. These are only assumptions, since all we have are the most common negative words in the data.
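The word counts, and the guess that “not” often comes before “funny”, can both be checked with a few lines of code. This is a sketch under stated assumptions: the lexicon is a hypothetical mini-list built from the words named above (a real analysis would use a full sentiment lexicon), the sample reviews are invented, and the function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built from the words discussed above.
NEGATIVE_WORDS = {"bad", "wrong", "dark", "fraud", "mess", "weak", "funny", "hard"}


def tokenize(text):
    """Lowercase a review and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def count_negative_words(reviews, lexicon=NEGATIVE_WORDS):
    """Count how often each lexicon word appears across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(word for word in tokenize(review) if word in lexicon)
    return counts


def count_negated(reviews, word):
    """Count how often `word` is directly preceded by "not"."""
    total = 0
    for review in reviews:
        words = tokenize(review)
        total += sum(
            1 for previous, current in zip(words, words[1:])
            if previous == "not" and current == word
        )
    return total


# Invented reviews, for illustration only.
reviews = [
    "The plot was a mess and the jokes were not funny.",
    "Weak script, bad pacing, and hard to watch.",
    "Not funny at all. Just bad.",
]

counts = count_negative_words(reviews)
negated_funny = count_negated(reviews, "funny")
print(counts.most_common())
print(negated_funny)
```

Comparing the count for a word with its negated count shows how much of its frequency comes from phrases like "not funny" rather than from the word on its own.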

Conclusion

Overall, this data was very interesting and could be valuable to directors, writers, and producers in the future. I have come to the conclusion that my hypothesis was wrong, but not entirely. Comedy has the second-worst total loss in revenue, but not the worst. I believe that a further analysis on a larger set of data with more sequels would have a different outcome.