For this project I used the “All Time Worldwide Box Office” dataset from Kaggle: https://www.kaggle.com/kkhandekar/all-time-worldwide-box-office . The dataset contains the 595 movies with the highest worldwide box office of all time. It has six columns: rank, year, movie title, worldwide box office, domestic box office, and international box office. It was gathered by Kaggle user koustubhk, a data brewer from New Zealand, who scraped it from www.thenumbers.com, a site that tracks movie box office data. For this project I used the domestic box office, international box office, worldwide box office, and rank columns.
I chose this dataset because I have always been very interested in movies, and I was curious whether I could find any trends among the highest-grossing films. There has been a general push in the movie industry to appeal to international audiences, since there is so much money to be made there. I wanted to see whether the domestic or the international box office plays a larger role in a movie earning as much as possible. To do this, I first created histograms to observe the distributions of the domestic and international box office, then plotted each of them against the worldwide box office. Finally, I created bar plots of the domestic and international box office, arranged by worldwide rank.
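The question above comes down to how much of each film's worldwide gross came from outside the domestic market. Below is a minimal Python sketch of that calculation; it is not the code used to make the plots in this project. The `Film` class and its field names are hypothetical stand-ins for the dataset's columns, and the dollar amounts are assumed to have already been parsed into numbers.

```python
from dataclasses import dataclass


@dataclass
class Film:
    """One row of the dataset (hypothetical field names, not the CSV headers)."""
    rank: int             # worldwide rank
    title: str
    domestic: float       # domestic box office, in dollars
    international: float  # international box office, in dollars
    worldwide: float      # worldwide box office, in dollars


def international_share(film: Film) -> float:
    """Fraction of a film's worldwide gross earned outside the domestic market."""
    return film.international / film.worldwide


def shares_by_rank(films: list[Film], top_n: int = 100) -> list[tuple[int, str, float]]:
    """Sort films by worldwide rank, keep the first top_n, and return (rank, title, share)."""
    ranked = sorted(films, key=lambda f: f.rank)[:top_n]
    return [(f.rank, f.title, international_share(f)) for f in ranked]
```

A share above 0.5 means a film earned more internationally than domestically. The `top_n` argument mirrors trimming the list to the top 100 films, and the rank ordering matches the bar plots.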
What does this look like?
I was expecting both the domestic and international box office figures to be roughly normally distributed, but both are skewed, with a long tail stretching toward the largest grosses (a right, or positive, skew). The international box office is skewed even more strongly than the domestic.
The first scatter plot compares worldwide box office against domestic box office. I expected a linear relationship, but the points appear to follow an exponential curve. The scatter plot of worldwide against international box office is much closer to linear. I found this interesting, as the domestic plot showed more outliers.
I was surprised to see that Star Wars: The Force Awakens has by far the highest domestic box office, yet ranks only fourth worldwide. On the flip side, Titanic grossed much less domestically than the other films in the top 5, but stays near the top of the worldwide list thanks to its large international revenue.
I trimmed the dataset from the top 595 films to the top 100 because it made the scatter plots and bar plots easier to interpret, and I found that the trends remained the same with either number of observations. I was expecting to find that either the domestic or the international box office played a larger role in a movie's success, but there is a great deal of variation from film to film.
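The observations about skew and about linear versus curved relationships can also be put into numbers. The sketch below uses only Python's standard library to compute a skewness coefficient (positive values indicate a long right tail) and the Pearson correlation coefficient. The helper `linear_vs_log` is a hypothetical, rough heuristic for the exponential-looking curve, not a formal test: if y grows exponentially with x, then log(y) is linear in x, so its correlation with x should be higher than that of y itself.

```python
import math
from statistics import fmean


def skewness(values: list[float]) -> float:
    """Population (Fisher-Pearson) skewness; positive values mean a long right tail."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; 1.0 means the points lie exactly on a rising straight line."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def linear_vs_log(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Hypothetical heuristic: correlation of x with y, and of x with log(y).

    If y grows exponentially with x, log(y) is linear in x, so the second
    value should be the higher of the two. Every y must be positive.
    """
    return pearson_r(xs, ys), pearson_r(xs, [math.log(y) for y in ys])
```

For example, `linear_vs_log(domestic, worldwide)` could be run on both the top-100 subset and the full 595 films to check whether the pattern holds in each.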