Analysis of Movie Box Office Markets

Author

Aaryan Bhatta

Introduction

From a young age, movies were a defining source of excitement and joy for me. I always felt lucky to be able to watch the Disney classics and superhero movies. Whether the stories were happy, sad, scary, or thrilling, movies were fun and even inspiring. This lifelong passion has sparked my curiosity about the movie industry and what happens behind the scenes. In particular, as movies have become more accessible around the world over the past couple of decades, I have grown increasingly interested in how they perform globally. I wish to explore what factors influence how films produced domestically (USA) perform against films produced internationally.

Research Questions

To guide this analysis, I propose the following research questions:

  1. Do movies with higher star ratings perform better at the box office in terms of worldwide revenue?
  2. Does a higher star rating lead to greater worldwide revenue for domestic and foreign films?
  3. How has the worldwide revenue of domestic and foreign films changed over time?
  4. What genres are common in the domestic vs. foreign film industries?
  5. Which genres earn higher average revenue in domestic and foreign markets?

Data set

To answer these questions, I need some data to work with, so I have chosen a data set from Kaggle, available here: https://myxavier-my.sharepoint.com/:x:/g/personal/bhattaa_xavier_edu/EZmHDgEmPxxDmR9Rr51NC7kBoipKbGRZF8sJQMQOirQ03A?download=1

Here is some information about the data set:
This data set contains 5000 observations of movies that were released globally from 2000 to 2024, and it captures a broad array of information related to the global film industry. It contains the data needed for this analysis, such as revenue, genres, ratings, vote count, and revenue percentages. Analyzing this data set allows me not only to answer my research questions but also to identify trends and insights across the global film industry.

Here is a link to the data dictionary so that you can familiarize yourself with the variables in this data set:
https://myxavier-my.sharepoint.com/:x:/g/personal/bhattaa_xavier_edu/EX0Ll7vi5VJBsfX98B8uAv0BuF4uMbpb_rsuy0gwIcpY0w?download=1

Data Wrangling

  1. The first thing to wrangle in this data set is the ratings column, which is stored as text such as "6.2/10". I need to remove the "/10" and change the data type from character to numeric. Additionally, I do not want the rating to be a decimal, so I will round the value to the nearest whole number.
  2. I want to create a variable that indicates whether a movie is a foreign movie (a movie that is not from the USA). I will do this by checking whether the foreign revenue percentage column is 100%, which I take to mean that the movie is a foreign film.
  3. Look at the summary statistics of the worldwide revenue variable:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
1.666e+06 2.466e+07 4.845e+07 1.192e+08 1.198e+08 2.799e+09 
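The first two wrangling steps can be sketched in base R as follows. This is only a minimal illustration on toy rows: the column names `rating` and `foreign_pct` are hypothetical stand-ins, and I am assuming the foreign percentage is stored as text such as "100%".

```r
# Toy rows standing in for the real data; all column names are hypothetical.
movies <- data.frame(
  rating      = c("6.2/10", "7.8/10", "5.4/10"),
  foreign_pct = c("100%", "45.3%", "62%"),
  stringsAsFactors = FALSE
)

# Step 1: drop the "/10" suffix, convert to numeric, round to a whole number.
movies$rating_num <- as.numeric(sub("/10$", "", movies$rating))
movies$stars      <- round(movies$rating_num)

# Step 2: treat a film as foreign when all of its revenue is foreign.
movies$foreign_share <- as.numeric(sub("%$", "", movies$foreign_pct))
movies$is_foreign    <- movies$foreign_share == 100

print(movies[, c("stars", "is_foreign")])
```

One detail worth knowing: R's `round()` sends exact halves to the nearest even number, so a rating of 6.5 would become 6 rather than 7.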

The distribution of worldwide revenue makes sense: most movies earn moderate amounts, while a small number of hits earn far more. The mean (1.192e+08) sits well above the median (4.845e+07), which is what we expect from a right-skewed distribution.

summary(Worldwide)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
1.666e+06 2.466e+07 4.845e+07 1.192e+08 1.198e+08 2.799e+09 
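As a quick check on the right skew, the numbers from the summary output can be plugged into two simple measures. This is a small sketch rather than part of the original analysis; Bowley's quartile skewness is a measure I am adding here for illustration.

```r
# Values copied from the summary output above.
q1               <- 2.466e+07
worldwide_median <- 4.845e+07
worldwide_mean   <- 1.192e+08
q3               <- 1.198e+08

# A mean well above the median points to a long right tail.
mean_to_median <- worldwide_mean / worldwide_median

# Bowley's quartile skewness: positive when the upper half of the
# interquartile range is wider than the lower half.
bowley <- (q3 + q1 - 2 * worldwide_median) / (q3 - q1)

print(round(c(mean_to_median = mean_to_median, bowley = bowley), 2))
```

Both measures agree with the reading above: the mean is more than twice the median, and the quartile skewness is positive.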

Descriptive Analysis

Foreign Market and Domestic Market vs Worldwide

The first question we want to look at is how movies perform around the world. This visualization compares foreign and domestic movies based on the worldwide revenue they earned.

From this visualization, we can see an overall difference between foreign and domestic films. The domestic box sits higher than the foreign one, which tells us that the middle 50% of domestic films earn more worldwide than the middle 50% of foreign films. So overall, domestic films typically seem to make more money than foreign films. Although some foreign outliers fall well above the rest of the foreign films, the domestic range is still greater.
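The box edges and centre lines described here are just per-group quartiles, which can be computed directly. The sketch below uses made-up rows and hypothetical column names (`origin`, `worldwide`), so the numbers are illustrative only.

```r
# Toy data; the column names and values are made up for illustration.
movies <- data.frame(
  origin    = c("Domestic", "Domestic", "Domestic", "Domestic",
                "Foreign", "Foreign", "Foreign", "Foreign"),
  worldwide = c(40e6, 90e6, 150e6, 600e6, 10e6, 25e6, 50e6, 300e6)
)

# Quartiles per group: the edges and centre line of each box.
box_stats <- tapply(movies$worldwide, movies$origin, quantile,
                    probs = c(0.25, 0.5, 0.75))
print(box_stats)

# Revenue is right skewed, so a log scale keeps the boxes readable.
# boxplot(worldwide ~ origin, data = movies, log = "y",
#         ylab = "Worldwide revenue (USD, log scale)")
```

Comparing the medians and quartiles numerically is a useful companion to the plot, since outliers can make the boxes hard to read on a linear axis.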

Worldwide Revenue vs Star Ratings

A basic assumption when it comes to movies is that if a movie has a high star rating, it will naturally perform well. However, that is not always the case these days: critics may rate a movie low while the audience thoroughly enjoys it, making it a hit at the box office. So let’s explore whether this assumption holds.

Based on the two visualizations, there are stark differences between domestic films and international films. For domestic films, we can see a positive relationship: as ratings increase, so does the worldwide revenue earned. However, that is not the case for foreign-produced movies. Once ratings reach 3-4, revenue stops changing much and stays fairly consistent from then on. This suggests that star ratings may not be as influential in the foreign market.
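One way to put a number on these two patterns is a rank correlation computed separately for each market. The sketch below is an illustration on invented values, not the result from the real data; the column names are hypothetical, and Spearman's correlation is my choice here because it is less sensitive to a few very large revenues than Pearson's.

```r
# Toy data: revenue climbs with stars for domestic films, stays flat for foreign.
movies <- data.frame(
  origin    = rep(c("Domestic", "Foreign"), each = 5),
  stars     = c(4, 5, 6, 7, 8, 4, 5, 6, 7, 8),
  worldwide = c(30e6, 60e6, 90e6, 200e6, 400e6,
                20e6, 35e6, 30e6, 34e6, 31e6)
)

# Spearman rank correlation between stars and revenue within each market.
rho <- sapply(split(movies, movies$origin), function(d) {
  cor(d$stars, d$worldwide, method = "spearman")
})
print(round(rho, 2))
```

A value near 1 means revenue rises steadily with rating, while a value near 0 means rating tells us little about revenue.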

World Wide Revenue Over Time

Another thing we want to look at is how films produced domestically and internationally have performed over the years, so this visualization covers 2000 to 2024.

Based on this visualization, we can see that domestic films show a clear increasing trend in revenue from 2000 onward. This could reflect global box office expansion, more successful franchises, and similar factors. There is a drop in 2020, which is most likely because of Covid; the 2023 writers' strike may also have affected the most recent years. Foreign films show less pronounced growth than domestic films, but this is only because foreign films have not yet had the same amount of success. Otherwise, the foreign films visualization also has a clear upward trend.
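A trend line like this needs one value per year and market. The sketch below shows one way to build that table, assuming the yearly value is the mean worldwide revenue (the total would work the same way with `sum`); the rows and column names are hypothetical.

```r
# Toy data with a dip in 2020; values are invented for illustration.
movies <- data.frame(
  year      = c(2019, 2019, 2020, 2020, 2021, 2021, 2019, 2020, 2021),
  origin    = c(rep("Domestic", 6), rep("Foreign", 3)),
  worldwide = c(200e6, 400e6, 50e6, 70e6, 150e6, 250e6, 40e6, 20e6, 45e6)
)

# Mean worldwide revenue for every year/origin pair.
yearly <- aggregate(worldwide ~ year + origin, data = movies, FUN = mean)
print(yearly)
```

Each row of `yearly` is one point on the line for its market, so a dip such as the one in 2020 shows up as a lower value in that year's row.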

Genres in Foreign and Domestic Markets

When it comes to movies, an important factor we need to look at is the role genres play. First we will look at the distribution of genres in each market to see whether there are any differences in what is popular, and then we will compare the average revenue earned by each genre across the two markets.

When we look at the distributions, we can see some similarities between the two markets. Drama, action, and comedy are the three front runners in both. At the other end, western, tv_movie, and documentary have the lowest counts in both markets.
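Counting genres per market can be sketched as below. I am assuming here that a movie's genres are stored in one comma-separated string, which may not match the real file; the column names and rows are hypothetical.

```r
# Toy data; assumes genres are stored as one comma-separated string per movie.
movies <- data.frame(
  origin = c("Domestic", "Domestic", "Foreign", "Foreign"),
  genres = c("Action, Adventure", "Drama", "Drama, Comedy", "Action, Drama"),
  stringsAsFactors = FALSE
)

# Split each genre string and repeat the origin once per genre.
genre_list <- strsplit(movies$genres, ",\\s*")
long <- data.frame(
  origin = rep(movies$origin, lengths(genre_list)),
  genre  = unlist(genre_list)
)

# Number of films per genre within each market.
counts <- table(long$genre, long$origin)
print(counts)
```

With this layout a movie with several genres is counted once under each of them, so the column totals can exceed the number of movies.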

Revenue vs Genres

From this visualization, it is clear that the domestic market still outperforms the foreign market by a lot. Genres such as science fiction, adventure, and animation lead the charge and make the domestic market as successful as it is. Foreign films, on the other hand, are not as dominant, so the average revenues we see for them in this visualization are not as strong as those of the domestic market. This is a strong indicator that the foreign market has not yet caught up with the domestic market's growth.
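The average revenue per genre and market can be laid out as a small table, which also makes the size of the gap easy to compute. This is a sketch on invented rows with hypothetical column names, assuming one genre per row.

```r
# Toy data; one genre per row, values invented for illustration.
movies <- data.frame(
  origin    = c("Domestic", "Domestic", "Domestic",
                "Foreign", "Foreign", "Foreign"),
  genre     = c("Adventure", "Adventure", "Drama",
                "Adventure", "Drama", "Drama"),
  worldwide = c(300e6, 500e6, 80e6, 120e6, 30e6, 50e6)
)

# Mean revenue in a genre-by-market table.
avg_rev <- tapply(movies$worldwide, list(movies$genre, movies$origin), mean)
print(avg_rev)

# How many times larger the domestic average is, per genre.
print(round(avg_rev[, "Domestic"] / avg_rev[, "Foreign"], 2))
```

A ratio above 1 for a genre means domestic films in that genre earn more on average than foreign ones.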

Secondary Data Analysis

For further analysis, I used data that I web scraped from IMDb's Top 250 Movies page. Here is the link to the website: https://www.imdb.com/chart/top/

The data set has the following variables:
Title, Release Year, Runtime, Star Ratings, Vote Counts, Directors, Genres, Budget, Domestic Gross, and Worldwide Gross

I decided to use this data so that I could narrow the comparison of international and domestic movies down to the best of the best. Here is the link to my version:

World Wide Revenue Over Time Comparison

Now let’s compare our previous graph of worldwide revenue over time with the same graph for this data set and see whether there are any insights we can draw.

This visualization covers 1925 to 2025. While the domestic revenue line continues to rise, it has become noticeably more volatile over the past few decades compared to the trend in the previous data set. This may reflect the evolving nature of the film industry and how movies have grown more complex, high-budget, and globally marketed over time. In contrast, foreign-produced films show minimal presence until around the year 2000. This pattern mirrors what we observed in the Kaggle data set, likely due to increased global accessibility and distribution in the 21st century, as international markets opened and gave people more opportunities to be exposed to foreign films.

Conclusion

Overall, this project gave me a deeper understanding of how the global film industry works. I was able to explore how different types of movies actually perform on a worldwide scale.

One of the main takeaways from this project was how dominant domestic movies are in terms of revenue. Domestic films tend to bring in higher earnings, show more consistency over time, and seem to benefit more from higher star ratings.

On the other hand, foreign films have started gaining traction mainly since the 2000s. This is probably because international films only became widely accessible around that time, and from what I have analyzed, their presence has grown in recent years with the help of streaming services and global distribution. It was also interesting to see that while foreign films might not always earn as much, the patterns suggest that their revenue is growing at a steady rate, and who knows whether there might be a huge increase in foreign films as we continue on.

To conclude, I believe more research is needed to learn about movies produced domestically and internationally. The next things to look into are the country of origin, profit, and budget. Through this, we can gather more data and learn more about movies!