R Markdown

Vodcast Transcript

The movie business is a highly lucrative industry that attracts movie lovers to the cinema around the world. In this video presentation I'll endeavour to answer the question: do big budget movies make the most money at the box office? Data was sourced from The Numbers: Where Data and the Movie Business Meet website, which lists 6,033 movies. A subset of 100 movies was chosen from this list: the top 50 and the bottom 50 movies ranked by Production Budget. Production Budget values ranged from $1,100 to $400,000,000 with a median of $100,035,000. The new James Bond film No Time to Die, which was originally due for release in April 2020 and is now scheduled for release in November, and Christopher Nolan's film Tenet, which is due for release in July 2020, ranked 21 and 32 respectively. Both were omitted from the study as their Worldwide Gross is equal to $0.
Green was chosen for the scatterplot because it is symbolically associated with money. A Pearson's product-moment correlation revealed a correlation of r = .80 between Production Budget and Worldwide Gross. Squaring this coefficient gives the share of variance explained, so roughly 65% of the variance in Worldwide Gross can be explained by Production Budget. However, these results should be interpreted with caution, as correlation does not equal causation. From these results, we can infer that an increase in Production Budget is related to an increase in Worldwide Gross, but other factors need to be taken into consideration when determining what Worldwide Gross at the Box Office can be attributed to. For instance, a review in a newspaper or on a site such as IMDb or Rotten Tomatoes can persuade or discourage someone from seeing a movie.
What is noteworthy is that Avengers: Endgame is ranked 1 for Production Budget, costing $400,000,000 to make, and also has the highest Worldwide Gross in the sample, generating $2,797,800,564 in Box Office revenue. This is an indication that big budget movies do make the most money at the Box Office.
library(readxl)
## Warning: package 'readxl' was built under R version 3.5.3
library(readr)
## Warning: package 'readr' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.5.3
library(ggplot2)


movies.xlsx <- read.csv("~/Master of Data Science/Year 1/Data Visualisation Communication/movies.xlsx.csv")
movies1 <- movies.xlsx
movies1
##      Rank    Year                                            Movie
## 1       1    2019                                Avengers:_Endgame
## 2       2    2011       Pirates_of_the_Carribean:On_Stranger_Tides
## 3       3    2015                          Avengers:_Age_of_Ultron
## 4       4    2015             Star_Wars_Ep._VII:_The_Force_Awakens
## 5       5    2018                           Avengers:_Infinity_War
## 6       6    2007         Pirates_of_the_Carribean:_At_World's_End
## 7       7    2017                                   Justice_League
## 8       8    2015                                          Spectre
## 9       9    2019                 Star_Wars:_The_Rise_of_Skywalker
## 10     10    2018                          Solo:_A_Star_Wars_Story
## 11     11    2012                                      John_Carter
## 12     12    2016               Batman_v_Superman:_Dawn_of_Justice
## 13     13    2019                                    The_Lion_King
## 14     14    2010                                          Tangled
## 15     15    2007                                     Spider-Man_3
## 16     16    2016                       Captain_America:_Civil_War
## 17     17    2009           Harry_Potter_and_the_Half-Blood_Prince
## 18     18    2013              The_Hobbit:_The_Desolation_of_Smaug
## 19     19    2014        The_Hobbit:_The_Battle_of_the_Five_Armies
## 20     20    2017                          The_Fate_of_the_Furious
## 21     22    2009                                           Avatar
## 22     23    2006                                 Superman_Returns
## 23     24    2012                            The_Dark_Knight_Rises
## 24     25    2017 Pirates_of_the Caribbean:_Dead_Men_Tell_No_Tales
## 25     26    2008                                Quantum_of_Solace
## 26     27    2012                                     The_Avengers
## 27     28    2006       Pirates_of_the_Caribbean:_Dead_Man's_Chest
## 28     29    2013                                     Man_of_Steel
## 29     30    2008         The_Chronicles_of_Narnia:_Prince_Caspian
## 30     31    2013                                  The_Lone_Ranger
## 31     33    2012                           The_Amazing_Spider-Man
## 32     34    2012                                       Battleship
## 33     35    2017                    Transformers:_The_Last_Knight
## 34     36    2015                                   Jurassic_World
## 35     37    2012                                   Men_in_Black_3
## 36     38    2009              Transformers:_Revenge_of_the_Fallen
## 37     39    2014                  Transformers:_Age_of_Extinction
## 38     40    2006                            X-Men:_The_Last_Stand
## 39     41    2010                                       Robin_Hood
## 40     42    2005                                        King_Kong
## 41     43    2007                               The_Golden_Compass
## 42     44    2018                                    Black_Panther
## 43     45    1997                                          Titanic
## 44     46    2017                Star_Wars_Ep._VIII:_The_Last_Jedi
## 45     47    2018                                    Incredibles_2
## 46     48    2016                     Rogue_One:_A_Star_Wars_Story
## 47     49    2016                                     Finding_Dory
## 48     50    2019                                      Toy_Story_4
## 49     51    2010                                      Toy_Story_3
## 50     52    2013                                       Iron_Man_3
## 51  5,951    2014                                  Happy_Christmas
## 52  5,952    2005          Peace,_Propoganda_and_the_Promised_Land
## 53  5,953    2013                                         Absentia
## 54  5,954    1998                                               Pi
## 55  5,955    1998                   I_Love_You_..._Don't_Touch_Me!
## 56  5,956    1999                                         20_Dates
## 57  5,957    2004                                    Super_Size_Me
## 58  5,958    2013                            Supporting_Characters
## 59  5,964    1995                            The_Brothers_McMullen
## 60  5,965    2001                                         Gabriela
## 61  5,966    2010                                   Tiny_Furniture
## 62  5,967    2008                                       The_Signal
## 63  5,968    2015                                         Counting
## 64  5,976    2000                                George_Washington
## 65  5,978    2000                    Smiling_Fish_and_Goat_on_Fire
## 66  5,979    2010                               The_Exploding_Girl
## 67  5,980    2011                                   Raymond_Did_It
## 68  5,982    1991                                   The_Last_Waltz
## 69  5,986    2008                          The_Legend_of_God's_Gun
## 70  5,987    2016                                           Krisha
## 71  5,988    2006                              Mutual_Appreciation
## 72  5,989    2005                                      Funny_Ha_Ha
## 73  5,990    2010                                     Down_Terrace
## 74  5,993    1994                                           Clerks
## 75  5,994    1999                                   Pink_Narcissus
## 76  5,995    2017                                            Emily
## 77  5,996    1972                                      Deep_Throat
## 78  5,997    1997                            In_the_Company_of_Men
## 79  5,998    2000                                    The_Terrorist
## 80  5,999    2015                                           Exeter
## 81  6,003    1991                                          Slacker
## 82  6,005    2002                                     Steel_Spirit
## 83  6,011    2006                                  The_Puffy_Chair
## 84  6,012    2010                                 Breaking_Upwards
## 85  6,014    1997                                   Pink_Flamingos
## 86  6,015    2006                         Grip:_A_Criminal's_Story
## 87  6,017    2001                                          Dayereh
## 88  6,018    2006                                            Clean
## 89  6,019    2001                                             Cure
## 90  6,020    2004                                   On_the_Downlow
## 91  6,021    1996                                             Bang
## 92  6,022    2008                  The_Rise_and_Fall_of_Miss_Thang
## 93  6,024    2012                                        Newlyweds
## 94  6,025    1993                                      El_Mariachi
## 95  6,026    2004                                           Primer
## 96  6,027    2006                                           Cavite
## 97  6,028 Unknown                                  The_Mongol_King
## 98  6,030    1999                                        Following
## 99  6,031    2005                    Return_to_the_Land_of_Wonders
## 100 6,033    2005                                My_Date_with_Drew
##     Worldwide_Gross Production_Budget
## 1        2797800564         400000000
## 2        1045663875         379000000
## 3        1396099202         365000000
## 4        2068223624         306000000
## 5        2048359754         300000000
## 6         963420425         300000000
## 7         655945209         300000000
## 8         879620923         300000000
## 9        1074141030         275000000
## 10        393151347         275000000
## 11        282778100         263700000
## 12        873634919         263000000
## 13       1656943394         260000000
## 14        585727091         260000000
## 15        894860230         258000000
## 16       1153284349         250000000
## 17        935213767         250000000
## 18        960241522         250000000
## 19        945577621         250000000
## 20       1238764765         250000000
## 21       2788701337         237000000
## 22        391081192         232000000
## 23       1084439099         230000000
## 24        788241137         230000000
## 25        591692078         230000000
## 26       1515100211         225000000
## 27       1066215812         225000000
## 28        667999518         225000000
## 29        417341288         225000000
## 30        260002115         225000000
## 31        757890267         220000000
## 32        313477717         220000000
## 33        602893340         217000000
## 34       1670400637         215000000
## 35        654213485         215000000
## 36        836519699         210000000
## 37       1104054072         210000000
## 38        459260946         210000000
## 39        322459006         210000000
## 40        550517357         207000000
## 41        367262558         205000000
## 42       1346913161         200000000
## 43       2208208395         200000000
## 44       1332539889         200000000
## 45       1242805359         200000000
## 46       1056057273         200000000
## 47       1028570889         200000000
## 48       1073394813         200000000
## 49       1448203157         200000000
## 50       1215392272         200000000
## 51            30312             70000
## 52             4930             70000
## 53             8555             70000
## 54          4678513             68000
## 55            33598             68000
## 56           602920             66000
## 57         22233808             65000
## 58             4917             60000
## 59         10426506             50000
## 60          2335352             50000
## 61           424149             50000
## 62           406299             50000
## 63             8374             50000
## 64           342722             42000
## 65           277233             40000
## 66            25572             40000
## 67             3632             40000
## 68           322563             35000
## 69           243768             30000
## 70           144822             30000
## 71           103509             30000
## 72            82698             30000
## 73             9812             30000
## 74          3894240             27000
## 75             8231             27000
## 76             3547             27000
## 77         45000000             25000
## 78          2883661             25000
## 79           195043             25000
## 80           489792             25000
## 81          1227508             23000
## 82             1860             20000
## 83           195254             15000
## 84           115592             15000
## 85           413802             12000
## 86             1336             12000
## 87           683509             10000
## 88           138711             10000
## 89            94596             10000
## 90             1987             10000
## 91              527             10000
## 92              401             10000
## 93             4584              9000
## 94          2041928              7000
## 95           841926              7000
## 96            71644              7000
## 97              900              7000
## 98           240495              6000
## 99             1338              5000
## 100          181041              1100
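The printed table shows Rank values such as 5,951 and a Year of Unknown, so read.csv will not have parsed those two columns as numbers. A minimal sketch of one way to clean them is shown below. It uses a three-row excerpt of the table above rather than the full file, and the names movies_excerpt and movies_clean are illustrative, not part of the original analysis.

```r
# Three rows copied from the printed table above; stringsAsFactors = FALSE
# keeps Rank and Year as character vectors regardless of R version
movies_excerpt <- data.frame(
  Rank = c("1", "5,951", "6,028"),
  Year = c("2019", "2014", "Unknown"),
  Movie = c("Avengers:_Endgame", "Happy_Christmas", "The_Mongol_King"),
  stringsAsFactors = FALSE
)

movies_clean <- movies_excerpt

# Strip the thousands separator before converting Rank to integer
movies_clean$Rank <- as.integer(gsub(",", "", movies_clean$Rank))

# Map "Unknown" to NA explicitly so the conversion does not emit a warning
movies_clean$Year[movies_clean$Year == "Unknown"] <- NA
movies_clean$Year <- as.integer(movies_clean$Year)

str(movies_clean)
```

Neither column is used in the scatterplot or the correlation test, so this step only matters if the analysis is later extended to sort by Rank or group by Year.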
ggplot() +
  coord_cartesian() +
  scale_x_continuous(name = "Production_Budget") +
  scale_y_continuous(name = "Worldwide_Gross") +
  layer(
    data = movies1,
    mapping = aes(x = Production_Budget, y = Worldwide_Gross),
    stat = "identity",
    geom = "point",
    position = position_identity()
  ) 

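ggplot(movies1, aes(x = Production_Budget, y = Worldwide_Gross)) +
  # Green points, partly transparent so overlapping movies stay visible
  geom_point(color = "green", alpha = .6) +
  scale_x_continuous(breaks = seq(1100, 400000000, 100000000),
                     limits = c(1100, 400000000)) +
  # The upper y limit (2,797,800,000) is below the Worldwide Gross of
  # Avengers: Endgame (2,797,800,564), so that point is dropped from the plot,
  # which triggers the warning below
  scale_y_continuous(breaks = seq(401, 2797800000, 100000000),
                     limits = c(101, 2797800000))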
## Warning: Removed 1 rows containing missing values (geom_point).
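The warning above can be traced by checking which rows fall outside the y-axis limits. The sketch below uses four rows copied from the printed table (the names movies_excerpt, y_limits and outside are illustrative). If ggplot2 is installed, it then draws the plot without hard-coded limits, which is one way, though not the only one, to keep every point.

```r
# Four rows copied from the printed table above
movies_excerpt <- data.frame(
  Movie = c("Avengers:_Endgame", "Avatar", "Titanic", "My_Date_with_Drew"),
  Production_Budget = c(400000000, 237000000, 200000000, 1100),
  Worldwide_Gross = c(2797800564, 2788701337, 2208208395, 181041),
  stringsAsFactors = FALSE
)

# The y limits used in the second plot
y_limits <- c(101, 2797800000)

# Rows whose Worldwide_Gross lies outside the limits are dropped by ggplot2
outside <- movies_excerpt[
  movies_excerpt$Worldwide_Gross < y_limits[1] |
    movies_excerpt$Worldwide_Gross > y_limits[2],
]
print(outside$Movie)

# Letting ggplot2 choose the axis range avoids dropping any point
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(movies_excerpt,
              aes(x = Production_Budget, y = Worldwide_Gross)) +
    geom_point(color = "green", alpha = .6) +
    labs(x = "Production Budget ($)", y = "Worldwide Gross ($)")
  print(p)
}
```

The dropped point is the highest-grossing movie in the sample, so the plot as rendered hides the very observation the transcript highlights.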

summary(movies.xlsx)
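
Because the sample consists of the top 50 and bottom 50 movies by budget, the median of its 100 budgets is the mean of the 50th and 51st values once sorted. A quick check using the two boundary budgets from the printed table ($70,000, the largest in the low-budget group, and $200,000,000, the smallest in the high-budget group) reproduces the median quoted in the transcript. The variable names are illustrative.

```r
# Largest budget in the bottom 50 and smallest budget in the top 50,
# both taken from the printed table
boundary_budgets <- c(70000, 200000000)

# With 100 observations the median is the mean of the two middle values
median_budget <- mean(boundary_budgets)
print(format(median_budget, big.mark = ","))
```

The median therefore describes neither group: it falls in the gap between them.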

cor.test(movies.xlsx$Production_Budget, movies.xlsx$Worldwide_Gross)
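
To connect the correlation coefficient to the share of variance explained, the estimate returned by cor.test can be squared. The sketch below runs on a six-row excerpt of the printed table, so its numbers will differ from the full 100-movie result. The names movies_excerpt, test, r and r_squared are illustrative.

```r
# Six rows copied from the printed table: three high-budget, three low-budget
movies_excerpt <- data.frame(
  Production_Budget = c(400000000, 379000000, 263700000, 70000, 25000, 1100),
  Worldwide_Gross = c(2797800564, 1045663875, 282778100, 30312, 45000000, 181041)
)

# Pearson's product-moment correlation (the default method of cor.test)
test <- cor.test(movies_excerpt$Production_Budget,
                 movies_excerpt$Worldwide_Gross)

# The estimate is r; squaring it gives the proportion of variance explained
r <- unname(test$estimate)
r_squared <- r^2

print(c(r = r, r_squared = r_squared))
```

An r of .80 corresponds to an r squared of .64, which is why the coefficient itself should not be read as a percentage of variance.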
