The movie business is a highly lucrative industry that draws movie lovers around the world to the cinema. In this video presentation I'll endeavour to answer the question, "Do big budget movies make the most money at the box office?" Data was sourced from The Numbers: Where Data and the Movie Business Meet website, which lists 6,033 movies. From this list, a subset of 100 movies was selected, comprising the top 50 and the bottom 50 movies ranked by Production Budget. Production Budgets ranged from $1,100 to $400,000,000, with a median of $100,035,000. Because the sample contains only the two extremes of the list, this median falls in the gap between the two groups rather than describing a typical film. The new James Bond film No Time to Die, originally due for release in April 2020 and now scheduled for November, and Christopher Nolan's film Tenet, due for release in July 2020, were ranked 21st and 32nd respectively. Both were omitted from the study because their Worldwide Gross is $0.
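The sampling rule and the median described above can be sketched in R. This is a minimal sketch under stated assumptions, not the author's code. `select_extremes()` is a hypothetical helper. It assumes the full list is a data frame with `Production_Budget` and `Worldwide_Gross` columns, and it drops zero-gross films before ranking, as was done for No Time to Die and Tenet. The toy data are invented for illustration.

```r
# Hypothetical helper: drop films with no box office takings yet (gross of 0),
# then keep the n highest- and n lowest-budget films.
select_extremes <- function(df, n = 50) {
  released <- df[df$Worldwide_Gross > 0, ]
  ranked <- released[order(released$Production_Budget, decreasing = TRUE), ]
  rbind(head(ranked, n), tail(ranked, n))
}

# Invented toy data, not taken from The Numbers
toy <- data.frame(
  Movie = c("A", "B", "C", "D", "E", "F"),
  Production_Budget = c(300, 250, 200, 50, 20, 10),
  Worldwide_Gross = c(900, 0, 600, 80, 0, 30),
  stringsAsFactors = FALSE
)
picked <- select_extremes(toy, n = 2)
picked$Movie  # "A" "C" "D" "F" (B and E are dropped for having a gross of 0)

# With 100 films, the median is the mean of the 50th and 51st smallest budgets:
# the largest bottom-50 budget ($70,000) and the smallest top-50 budget ($200,000,000).
median(c(70000, 200000000))  # 100035000
```

The median therefore reflects the gap between the two groups rather than any film actually in the sample.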
Green was chosen for the scatterplot points because it is symbolically associated with money. A Pearson's product-moment correlation revealed a strong positive correlation (r = .80) between Production Budget and Worldwide Gross. Squaring r gives the coefficient of determination (r² ≈ .65). This means that roughly 65% of the variance in Worldwide Gross can be explained by Production Budget in this sample. However, these results should be interpreted with caution. Correlation does not imply causation, and because the sample contains only the highest- and lowest-budget movies, the correlation is likely to be stronger than it would be across all 6,033 movies. From these results, we can infer that a higher Production Budget is associated with a higher Worldwide Gross. Other factors still need to be considered when explaining what drives box office takings. For instance, reviews in newspapers or on websites such as IMDb and Rotten Tomatoes can persuade or discourage someone from seeing a movie.
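The link between r and the proportion of variance explained can be checked directly in R. This is a minimal sketch with several assumptions. r = .8042 is assumed to be the unrounded correlation behind the reported .80. `variance_explained()` is a hypothetical helper. The second half uses simulated normal data, not the movie data, to illustrate why sampling only the extremes of one variable can inflate r.

```r
# Hypothetical helper: proportion of variance in y accounted for by a
# straight-line fit on x, which for one predictor equals Pearson's r squared
variance_explained <- function(x, y) cor(x, y, method = "pearson")^2

# Assumed unrounded r behind the reported .80
round(0.8042^2, 4)  # 0.6467: about 65%, since r itself is not the proportion of variance

# Simulated illustration (not movie data): a population with r of about .5
set.seed(2020)
n <- 6000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = sqrt(1 - 0.5^2))
full_r <- cor(x, y)

# Keep only the 50 lowest and 50 highest x values, mirroring the movie sample
idx <- order(x)
extremes <- c(head(idx, 50), tail(idx, 50))
extreme_r <- cor(x[extremes], y[extremes])

round(c(full = full_r, extremes = extreme_r), 2)

# For one predictor, lm()'s R-squared matches the squared correlation
all.equal(summary(lm(y ~ x))$r.squared, variance_explained(x, y))
```

In the simulation, the extreme-groups correlation comes out noticeably higher than the population value, even though the underlying relationship is unchanged.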
Notably, Avengers: Endgame ranks first for Production Budget, costing $400,000,000 to make, and it also has the highest Worldwide Gross in the sample, generating $2,797,800,564 at the box office. This is consistent with the idea that big budget movies make the most money at the box office, although a single film cannot establish the trend on its own.
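The film that tops each ranking can be checked with `which.max()`. This small sketch uses four rows copied from the printed `movies1` table. The gross-to-budget multiple is an extra measure added here for illustration and is not part of the original analysis.

```r
# Four rows copied from the printed movies1 table
films <- data.frame(
  Movie = c("Avengers: Endgame", "Pirates of the Caribbean: On Stranger Tides",
            "Avatar", "Titanic"),
  Production_Budget = c(400000000, 379000000, 237000000, 200000000),
  Worldwide_Gross = c(2797800564, 1045663875, 2788701337, 2208208395),
  stringsAsFactors = FALSE
)

# Film with the largest budget and film with the largest gross
top_budget <- films$Movie[which.max(films$Production_Budget)]
top_gross <- films$Movie[which.max(films$Worldwide_Gross)]
c(top_budget = top_budget, top_gross = top_gross)

# Gross as a multiple of budget, to compare return rather than raw takings
films$Multiple <- round(films$Worldwide_Gross / films$Production_Budget, 2)
films[order(-films$Multiple), c("Movie", "Multiple")]
```

On these rows, Avatar and Titanic earn a higher multiple of their budget than Avengers: Endgame, even though Endgame earns the most in absolute terms.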
```r
library(readxl)   # reading Excel files
library(readr)    # reading delimited text files
library(dplyr)    # data manipulation
library(tidyr)    # reshaping data
library(ggplot2)  # plotting
```
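```r
# Read the 100-movie subset, exported from The Numbers as a CSV file
movies.xlsx <- read.csv("~/Master of Data Science/Year 1/Data Visualisation Communication/movies.xlsx.csv")
movies1 <- movies.xlsx  # working copy used for plotting
movies1
```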
## Rank Year Movie
## 1 1 2019 Avengers:_Endgame
## 2 2 2011 Pirates_of_the_Carribean:On_Stranger_Tides
## 3 3 2015 Avengers:_Age_of_Ultron
## 4 4 2015 Star_Wars_Ep._VII:_The_Force_Awakens
## 5 5 2018 Avengers:_Infinity_War
## 6 6 2007 Pirates_of_the_Carribean:_At_World's_End
## 7 7 2017 Justice_League
## 8 8 2015 Spectre
## 9 9 2019 Star_Wars:_The_Rise_of_Skywalker
## 10 10 2018 Solo:_A_Star_Wars_Story
## 11 11 2012 John_Carter
## 12 12 2016 Batman_v_Superman:_Dawn_of_Justice
## 13 13 2019 The_Lion_King
## 14 14 2010 Tangled
## 15 15 2007 Spider-Man_3
## 16 16 2016 Captain_America:_Civil_War
## 17 17 2009 Harry_Potter_and_the_Half-Blood_Prince
## 18 18 2013 The_Hobbit:_The_Desolation_of_Smaug
## 19 19 2014 The_Hobbit:_The_Battle_of_the_Five_Armies
## 20 20 2017 The_Fate_of_the_Furious
## 21 22 2009 Avatar
## 22 23 2006 Superman_Returns
## 23 24 2012 The_Dark_Knight_Rises
## 24 25 2017 Pirates_of_the Caribbean:_Dead_Men_Tell_No_Tales
## 25 26 2008 Quantum_of_Solace
## 26 27 2012 The_Avengers
## 27 28 2006 Pirates_of_the_Caribbean:_Dead_Man's_Chest
## 28 29 2013 Man_of_Steel
## 29 30 2008 The_Chronicles_of_Narnia:_Prince_Caspian
## 30 31 2013 The_Lone_Ranger
## 31 33 2012 The_Amazing_Spider-Man
## 32 34 2012 Battleship
## 33 35 2017 Transformers:_The_Last_Knight
## 34 36 2015 Jurassic_World
## 35 37 2012 Men_in_Black_3
## 36 38 2009 Transformers:_Revenge_of_the_Fallen
## 37 39 2014 Transformers:_Age_of_Extinction
## 38 40 2006 X-Men:_The_Last_Stand
## 39 41 2010 Robin_Hood
## 40 42 2005 King_Kong
## 41 43 2007 The_Golden_Compass
## 42 44 2018 Black_Panther
## 43 45 1997 Titanic
## 44 46 2017 Star_Wars_Ep._VIII:_The_Last_Jedi
## 45 47 2018 Incredibles_2
## 46 48 2016 Rogue_One:_A_Star_Wars_Story
## 47 49 2016 Finding_Dory
## 48 50 2019 Toy_Story_4
## 49 51 2010 Toy_Story_3
## 50 52 2013 Iron_Man_3
## 51 5,951 2014 Happy_Christmas
## 52 5,952 2005 Peace,_Propoganda_and_the_Promised_Land
## 53 5,953 2013 Absentia
## 54 5,954 1998 Pi
## 55 5,955 1998 I_Love_You_..._Don't_Touch_Me!
## 56 5,956 1999 20_Dates
## 57 5,957 2004 Super_Size_Me
## 58 5,958 2013 Supporting_Characters
## 59 5,964 1995 The_Brothers_McMullen
## 60 5,965 2001 Gabriela
## 61 5,966 2010 Tiny_Furniture
## 62 5,967 2008 The_Signal
## 63 5,968 2015 Counting
## 64 5,976 2000 George_Washington
## 65 5,978 2000 Smiling_Fish_and_Goat_on_Fire
## 66 5,979 2010 The_Exploding_Girl
## 67 5,980 2011 Raymond_Did_It
## 68 5,982 1991 The_Last_Waltz
## 69 5,986 2008 The_Legend_of_God's_Gun
## 70 5,987 2016 Krisha
## 71 5,988 2006 Mutual_Appreciation
## 72 5,989 2005 Funny_Ha_Ha
## 73 5,990 2010 Down_Terrace
## 74 5,993 1994 Clerks
## 75 5,994 1999 Pink_Narcissus
## 76 5,995 2017 Emily
## 77 5,996 1972 Deep_Throat
## 78 5,997 1997 In_the_Company_of_Men
## 79 5,998 2000 The_Terrorist
## 80 5,999 2015 Exeter
## 81 6,003 1991 Slacker
## 82 6,005 2002 Steel_Spirit
## 83 6,011 2006 The_Puffy_Chair
## 84 6,012 2010 Breaking_Upwards
## 85 6,014 1997 Pink_Flamingos
## 86 6,015 2006 Grip:_A_Criminal's_Story
## 87 6,017 2001 Dayereh
## 88 6,018 2006 Clean
## 89 6,019 2001 Cure
## 90 6,020 2004 On_the_Downlow
## 91 6,021 1996 Bang
## 92 6,022 2008 The_Rise_and_Fall_of_Miss_Thang
## 93 6,024 2012 Newlyweds
## 94 6,025 1993 El_Mariachi
## 95 6,026 2004 Primer
## 96 6,027 2006 Cavite
## 97 6,028 Unknown The_Mongol_King
## 98 6,030 1999 Following
## 99 6,031 2005 Return_to_the_Land_of_Wonders
## 100 6,033 2005 My_Date_with_Drew
## Worldwide_Gross Production_Budget
## 1 2797800564 400000000
## 2 1045663875 379000000
## 3 1396099202 365000000
## 4 2068223624 306000000
## 5 2048359754 300000000
## 6 963420425 300000000
## 7 655945209 300000000
## 8 879620923 300000000
## 9 1074141030 275000000
## 10 393151347 275000000
## 11 282778100 263700000
## 12 873634919 263000000
## 13 1656943394 260000000
## 14 585727091 260000000
## 15 894860230 258000000
## 16 1153284349 250000000
## 17 935213767 250000000
## 18 960241522 250000000
## 19 945577621 250000000
## 20 1238764765 250000000
## 21 2788701337 237000000
## 22 391081192 232000000
## 23 1084439099 230000000
## 24 788241137 230000000
## 25 591692078 230000000
## 26 1515100211 225000000
## 27 1066215812 225000000
## 28 667999518 225000000
## 29 417341288 225000000
## 30 260002115 225000000
## 31 757890267 220000000
## 32 313477717 220000000
## 33 602893340 217000000
## 34 1670400637 215000000
## 35 654213485 215000000
## 36 836519699 210000000
## 37 1104054072 210000000
## 38 459260946 210000000
## 39 322459006 210000000
## 40 550517357 207000000
## 41 367262558 205000000
## 42 1346913161 200000000
## 43 2208208395 200000000
## 44 1332539889 200000000
## 45 1242805359 200000000
## 46 1056057273 200000000
## 47 1028570889 200000000
## 48 1073394813 200000000
## 49 1448203157 200000000
## 50 1215392272 200000000
## 51 30312 70000
## 52 4930 70000
## 53 8555 70000
## 54 4678513 68000
## 55 33598 68000
## 56 602920 66000
## 57 22233808 65000
## 58 4917 60000
## 59 10426506 50000
## 60 2335352 50000
## 61 424149 50000
## 62 406299 50000
## 63 8374 50000
## 64 342722 42000
## 65 277233 40000
## 66 25572 40000
## 67 3632 40000
## 68 322563 35000
## 69 243768 30000
## 70 144822 30000
## 71 103509 30000
## 72 82698 30000
## 73 9812 30000
## 74 3894240 27000
## 75 8231 27000
## 76 3547 27000
## 77 45000000 25000
## 78 2883661 25000
## 79 195043 25000
## 80 489792 25000
## 81 1227508 23000
## 82 1860 20000
## 83 195254 15000
## 84 115592 15000
## 85 413802 12000
## 86 1336 12000
## 87 683509 10000
## 88 138711 10000
## 89 94596 10000
## 90 1987 10000
## 91 527 10000
## 92 401 10000
## 93 4584 9000
## 94 2041928 7000
## 95 841926 7000
## 96 71644 7000
## 97 900 7000
## 98 240495 6000
## 99 1338 5000
## 100 181041 1100
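In the printed table, Rank contains thousands separators (for example "5,951") and Year contains the value "Unknown". As a result, `read.csv()` is likely to import both columns as text (factors in R versions before 4.0) rather than as numbers. This is a minimal sketch of converting them, assuming the column names shown above. The three-row data frame only mimics values from the table.

```r
# Three rows mimicking the Rank and Year values printed above
raw <- data.frame(
  Rank = c("1", "5,951", "6,028"),
  Year = c("2019", "2014", "Unknown"),
  stringsAsFactors = FALSE
)

# Strip thousands separators before converting Rank to a number
raw$Rank <- as.numeric(gsub(",", "", raw$Rank))

# as.character() guards against factor codes; "Unknown" becomes NA (warning suppressed)
raw$Year <- suppressWarnings(as.integer(as.character(raw$Year)))

str(raw)
```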
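```r
# Basic scatterplot built from explicit grammar-of-graphics components:
# a Cartesian coordinate system, two continuous scales and one point layer
ggplot() +
  coord_cartesian() +
  scale_x_continuous(name = "Production_Budget") +
  scale_y_continuous(name = "Worldwide_Gross") +
  layer(
    data = movies1,
    mapping = aes(x = Production_Budget, y = Worldwide_Gross),
    stat = "identity",
    geom = "point",
    position = position_identity()
  )
```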
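```r
# Green points, partly transparent to show overlap. The y limits must extend past the
# largest gross ($2,797,800,564, Avengers: Endgame), or ggplot2 drops that point.
ggplot(movies1, aes(x = Production_Budget, y = Worldwide_Gross)) +
  geom_point(color = "green", alpha = .6) +
  scale_x_continuous(name = "Production Budget",
                     breaks = seq(0, 400000000, 100000000),
                     limits = c(0, 400000000),
                     labels = scales::dollar) +
  scale_y_continuous(name = "Worldwide Gross",
                     breaks = seq(0, 3000000000, 500000000),
                     limits = c(0, 3000000000),
                     labels = scales::dollar)
```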
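On linear axes, the 50 low-budget films (budgets of $70,000 or less) sit almost on top of one another near the origin. One option is to put both axes on a log10 scale. This is sketched here as an alternative, not as the chart used in the presentation. The sketch uses a handful of rows copied from the printed table so that it runs on its own. With the full data, `movies1` would be passed instead.

```r
library(ggplot2)

# A few rows copied from the printed movies1 table (top and bottom of the budget range)
sample_rows <- data.frame(
  Movie = c("Avengers: Endgame", "Avatar", "Titanic", "Super Size Me",
            "Clerks", "My Date with Drew"),
  Production_Budget = c(400000000, 237000000, 200000000, 65000, 27000, 1100),
  Worldwide_Gross = c(2797800564, 2788701337, 2208208395, 22233808, 3894240, 181041),
  stringsAsFactors = FALSE
)

# Log10 axes spread the low-budget films out instead of stacking them near zero
p <- ggplot(sample_rows, aes(x = Production_Budget, y = Worldwide_Gross)) +
  geom_point(color = "green", alpha = .6) +
  scale_x_log10(name = "Production Budget (log scale)", labels = scales::dollar) +
  scale_y_log10(name = "Worldwide Gross (log scale)", labels = scales::dollar)
p
```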
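```r
summary(movies.xlsx)  # min, median, max etc. for each column
# Pearson correlation test; squaring the estimate gives r^2 (variance explained)
cor.test(movies.xlsx$Production_Budget, movies.xlsx$Worldwide_Gross)
```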