Final Project

Author

Gamaliel Ngouafon

Hollywood Movie Success Analysis

Essay Introduction Step 4

The film industry is one of the largest and most influential entertainment industry in the world. Every year, film producers invest millions of dollars in film production, marketing and distribution in order to generate revenue and audience engagement. However, not every movie succeeds financially and understanding the factors leading to success is an important problem for film studios.This project analyzes a Hollywood movie dataset to investigate how variables such as budget, audience reception, critic reviews, genre, and release timing influence movie revenue and profitability.

The dataset used in this project is gotten from the-numbers website and it contains informations about hollywood movies like critic scores, audience ratings, genres, budgets, domestic and foreign gross revenue. The dataset is rich in both quantitative and categorical variables like Budget ($million), Worldwide Gross ($million), Average audience, and Average critics. This gives room for visualization, multiple regression, and statistical analysis.

I started by cleaning this dataset to remove missing values from from important financial variables such as budget and worldwide gross revenue. The data is also filtered to focus on the most common genre in order to separate value from noise and improve readability in visualizations. Numeric variables were converted into usable formats for regression and plotting.

I chose this dataset because the entertainement industry combines business, consumer psychology, and media influence. As someone interested in analytics, I wanted to explore what makes movies financially successful and whether audience reactions or critic reviews are better predictors of box office performance. This topic is meaningful because movies influence the global culture.

Variable Name	Description
Film	The official title of the motion picture.
Primary Genre	The main classification of the movie (e.g., Action, Horror, Comedy).
Budget Million	The total production cost in millions of USD.
Worldwide Gross Million	The total global revenue generated by the film in millions of USD.
Average Critics	A composite score (0-100) averaging professional reviews from Rotten Tomatoes and Metacritic.
Average Audience	A composite score (0-100) representing general public sentiment.
Profit Million	The calculated net financial gain (Worldwide Gross minus Budget).
ROI Ratio	The calculated financial efficiency (Profit divided by Budget).
Year	The calendar year the movie was released.

# Load required packages 
library(tidyverse)

Warning: package 'ggplot2' was built under R version 4.5.2

Warning: package 'tibble' was built under R version 4.5.2

Warning: package 'tidyr' was built under R version 4.5.2

Warning: package 'readr' was built under R version 4.5.2

Warning: package 'purrr' was built under R version 4.5.2

Warning: package 'dplyr' was built under R version 4.5.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

library(scales)


Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

library(RColorBrewer) 
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(GGally) 
library(broom)

Warning: package 'broom' was built under R version 4.5.2

# setting working directory
getwd()

[1] "/Users/darrenabou/Desktop/Spring 26/Data110/Final project"

Insertion of Data step 5

# Loading library into R
hollywood<- read_csv("The Hollywood In$ider - all data - all.csv")

Rows: 1694 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): Film, Metacritic  critics, Average critics, Metacritic Audience, R...
dbl  (6): Rotten Tomatoes  critics, Rotten Tomatoes Audience, Opening weeken...
num  (5): Opening Weekend, Domestic Gross, Foreign Gross, Worldwide Gross, W...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# display column nams
names(hollywood)

 [1] "Film"                                   
 [2] "Rotten Tomatoes  critics"               
 [3] "Metacritic  critics"                    
 [4] "Average critics"                        
 [5] "Rotten Tomatoes Audience"               
 [6] "Metacritic Audience"                    
 [7] "Rotten Tomatoes vs Metacritic  deviance"
 [8] "Average audience"                       
 [9] "Audience vs Critics deviance"           
[10] "Primary Genre"                          
[11] "Genres"                                 
[12] "Script Type"                            
[13] "Opening weekend ($million)"             
[14] "Opening Weekend"                        
[15] "Domestic gross ($million)"              
[16] "Domestic Gross"                         
[17] "Foreign Gross ($million)"               
[18] "Foreign Gross"                          
[19] "Worldwide Gross"                        
[20] "Worldwide Gross ($million)"             
[21] "of Gross earned abroad"                 
[22] "Budget ($million)"                      
[23] "Budget recovered"                       
[24] "Budget recovered opening weekend"       
[25] "Year"                                   
[26] "Oscar Winners"                          
[27] "Oscar Detail"                           
[28] "Link"

Cleaning Data step 6

# cleaning variables names. Found this on. clean_names() https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html
# Cleaning variable names using the janitor package
hollywood_clean <- hollywood |>
  clean_names()

# Selecting variables to be used in the analysis
hollywood_clean <- hollywood_clean |>
  select(
    film,
    primary_genre,
    script_type,
    average_critics,
    average_audience,
    budget_million,
    worldwide_gross_million,
    domestic_gross_million,
    foreign_gross_million,
    opening_weekend_million,
    year,
    oscar_winners
  )

# Removing all Na (missing) values and ensuring budget is valid
hollywood_clean <- hollywood_clean |>
  filter(!is.na(primary_genre), !is.na(budget_million),  budget_million > 0, !is.na(worldwide_gross_million), !is.na(average_audience), !is.na(average_critics))

Inclusion/ Exclusion step 7

hollywood_clean <- hollywood_clean |>
  mutate(
    # Force columns to numeric. 
    worldwide_gross_million = as.numeric(worldwide_gross_million),
    budget_million = as.numeric(budget_million)
  ) |>
  # Now that they are numbers, we can calculate profit
  mutate(profit_million = worldwide_gross_million - budget_million)
#  Filtering for top genres to maintain clean visualizations
# Final Dataset Filtering
hollywood_final <- hollywood_clean |>
  filter(primary_genre %in% c("Action", "Comedy", "Drama", "horror", "thriller"))

STEP 8 Multiple Linear Regression

#Backward Elimination: Step 1 (The Full Model)
# We start by including Critics and Audience scores to see which is a better predictor.
full_model <- lm(profit_million ~ budget_million + average_critics + average_audience, 
                 data = hollywood_final)
summary(full_model)


Call:
lm(formula = profit_million ~ budget_million + average_critics + 
    average_audience, data = hollywood_final)

Residuals:
    Min      1Q  Median      3Q     Max 
-560.67  -79.32    0.00   62.50  835.51 

Coefficients: (5 not defined because of singularities)
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -70.3879   165.5980  -0.425   0.6712    
budget_million          2.6193     0.2876   9.107   <2e-16 ***
average_critics11    -183.9716   374.6002  -0.491   0.6238    
average_critics12    -178.6821   386.5772  -0.462   0.6444    
average_critics12.5    99.8005   442.6650   0.225   0.8218    
average_critics13    -355.6114   553.8231  -0.642   0.5215    
average_critics13.5  -228.1398   424.6594  -0.537   0.5916    
average_critics14.5    35.3260   437.0853   0.081   0.9357    
average_critics15.5  -282.8805   388.7736  -0.728   0.4676    
average_critics16      62.9060   286.4258   0.220   0.8264    
average_critics16.5    59.2560   445.8782   0.133   0.8944    
average_critics17    -159.7226   380.1703  -0.420   0.6748    
average_critics17.5    73.2916   286.5153   0.256   0.7983    
average_critics18    -154.0327   367.6072  -0.419   0.6756    
average_critics18.5  -324.5932   422.7499  -0.768   0.4434    
average_critics19    -344.5208   371.3524  -0.928   0.3545    
average_critics19.5    75.7687   286.5470   0.264   0.7917    
average_critics20    -338.2547   418.1379  -0.809   0.4194    
average_critics20.5  -188.6339   351.0005  -0.537   0.5915    
average_critics21    -179.5776   341.8317  -0.525   0.5999    
average_critics21.5  -266.2258   373.9208  -0.712   0.4772    
average_critics22    -220.8961   381.1646  -0.580   0.5628    
average_critics22.5  -286.1627   347.2448  -0.824   0.4108    
average_critics23    -201.4964   335.7028  -0.600   0.5490    
average_critics23.5  -102.3394   422.1671  -0.242   0.8087    
average_critics24     -12.1089   351.9220  -0.034   0.9726    
average_critics24.5  -135.7023   358.9321  -0.378   0.7057    
average_critics25     -84.6177   346.6512  -0.244   0.8074    
average_critics25.5   -56.8600   503.3653  -0.113   0.9102    
average_critics26    -287.4021   362.0544  -0.794   0.4281    
average_critics26.5  -542.2757   378.0590  -1.434   0.1529    
average_critics27     -93.9855   345.1402  -0.272   0.7856    
average_critics27.5  -137.7738   431.3414  -0.319   0.7497    
average_critics28     164.1025   421.0774   0.390   0.6971    
average_critics28.5  -242.4583   477.1862  -0.508   0.6119    
average_critics29    -251.3701   415.8517  -0.604   0.5461    
average_critics29.5   -95.9543   405.8778  -0.236   0.8133    
average_critics30     -39.5588   351.5590  -0.113   0.9105    
average_critics30.5  -183.0620   376.8634  -0.486   0.6276    
average_critics31    -305.7161   378.7707  -0.807   0.4204    
average_critics31.5  -241.8004   363.0234  -0.666   0.5060    
average_critics32     -69.9691   334.9135  -0.209   0.8347    
average_critics33    -156.0554   350.3727  -0.445   0.6565    
average_critics33.5  -335.5825   401.3241  -0.836   0.4039    
average_critics34    -298.9107   359.2220  -0.832   0.4062    
average_critics34.5  -100.0612   418.0334  -0.239   0.8110    
average_critics35    -142.9258   342.5780  -0.417   0.6769    
average_critics35.5   -56.7203   344.5440  -0.165   0.8694    
average_critics36     100.3866   345.0634   0.291   0.7714    
average_critics36.5    17.8892   412.8612   0.043   0.9655    
average_critics37      13.9277   351.8523   0.040   0.9685    
average_critics38    -190.6068   347.7781  -0.548   0.5842    
average_critics39    -382.1237   349.7071  -1.093   0.2757    
average_critics39.5  -240.1657   442.0053  -0.543   0.5874    
average_critics40      51.1400   348.5353   0.147   0.8835    
average_critics40.5  -346.7274   376.4612  -0.921   0.3580    
average_critics41.5  -291.0147   357.8341  -0.813   0.4169    
average_critics42    -240.2218   344.2365  -0.698   0.4860    
average_critics42.5  -231.2111   348.6347  -0.663   0.5079    
average_critics43    -106.1139   351.8750  -0.302   0.7633    
average_critics43.5  -178.5696   376.2263  -0.475   0.6355    
average_critics44    -229.8100   345.5588  -0.665   0.5067    
average_critics44.5   -75.3635   413.5593  -0.182   0.8556    
average_critics45     -71.3352   333.6261  -0.214   0.8309    
average_critics45.5     3.4175   356.8648   0.010   0.9924    
average_critics46     -68.0276   419.3128  -0.162   0.8713    
average_critics47     -80.3727   334.1632  -0.241   0.8101    
average_critics48     -74.0742   342.0651  -0.217   0.8288    
average_critics48.5    -3.7392   409.3581  -0.009   0.9927    
average_critics49    -297.3513   342.9974  -0.867   0.3869    
average_critics49.5  -224.0204   412.7337  -0.543   0.5878    
average_critics50      21.3975   329.7586   0.065   0.9483    
average_critics50.5  -257.5916   422.1512  -0.610   0.5424    
average_critics51     -21.8507   334.8769  -0.065   0.9480    
average_critics51.5  -223.6629   343.0516  -0.652   0.5151    
average_critics52      63.2229   363.7157   0.174   0.8622    
average_critics52.5  -152.0887   410.2146  -0.371   0.7112    
average_critics53     -33.1114   333.2632  -0.099   0.9209    
average_critics53.5  -259.4143   374.6709  -0.692   0.4894    
average_critics54    -158.0543   328.4416  -0.481   0.6308    
average_critics54.5   -67.8341   413.4715  -0.164   0.8698    
average_critics55     -47.3438   347.3586  -0.136   0.8917    
average_critics55.5  -143.9388   345.7453  -0.416   0.6776    
average_critics56     323.3217   344.6334   0.938   0.3492    
average_critics56.5   -19.6524   446.2169  -0.044   0.9649    
average_critics57    -142.8891   289.2180  -0.494   0.6218    
average_critics58     -57.2397   335.7737  -0.170   0.8648    
average_critics58.5  -218.5991   420.0595  -0.520   0.6033    
average_critics59     -15.6227   334.8050  -0.047   0.9628    
average_critics59.5  -105.5845   335.5188  -0.315   0.7533    
average_critics60      49.8089   344.5712   0.145   0.8852    
average_critics60.5  -252.6856   367.7601  -0.687   0.4927    
average_critics61      89.2705   333.2638   0.268   0.7890    
average_critics61.5   117.6975   409.5898   0.287   0.7741    
average_critics62      96.0193   352.1892   0.273   0.7854    
average_critics62.5  -182.6246   369.4919  -0.494   0.6216    
average_critics63    -220.9112   353.6189  -0.625   0.5328    
average_critics63.5   -14.9946   349.6264  -0.043   0.9658    
average_critics64     -29.0991   352.8806  -0.082   0.9344    
average_critics64.5  -400.2522   364.7371  -1.097   0.2737    
average_critics65      92.3507   338.4637   0.273   0.7852    
average_critics65.5   -92.8473   371.5016  -0.250   0.8029    
average_critics66     360.7939   409.5305   0.881   0.3793    
average_critics67    -108.8390   353.6891  -0.308   0.7586    
average_critics67.5  -259.4612   342.0342  -0.759   0.4489    
average_critics68     -61.0443   340.5293  -0.179   0.8579    
average_critics68.5    24.8116   368.6527   0.067   0.9464    
average_critics69    -275.2861   427.1346  -0.644   0.5199    
average_critics69.5  -244.4766   407.5838  -0.600   0.5492    
average_critics7       32.4795   286.4518   0.113   0.9098    
average_critics7.5     76.0024   286.4373   0.265   0.7910    
average_critics70     -59.2498   331.1454  -0.179   0.8582    
average_critics70.5   171.6697   373.6038   0.459   0.6463    
average_critics71      61.7548   353.7674   0.175   0.8616    
average_critics72.5  -157.3177   366.9391  -0.429   0.6685    
average_critics73    -154.2289   351.5247  -0.439   0.6613    
average_critics74      -7.4320   347.0640  -0.021   0.9829    
average_critics74.5  -121.8483   357.0219  -0.341   0.7332    
average_critics75      10.6085   343.2385   0.031   0.9754    
average_critics75.5   -74.7094   412.8600  -0.181   0.8566    
average_critics76     200.0103   336.7134   0.594   0.5531    
average_critics76.5  -291.2043   440.8517  -0.661   0.5096    
average_critics77    -107.8316   343.7502  -0.314   0.7540    
average_critics78     -36.7233   341.7704  -0.107   0.9145    
average_critics78.5  -144.5321   370.7926  -0.390   0.6971    
average_critics79    -144.4093   341.4288  -0.423   0.6727    
average_critics79.5  -498.9116   359.9950  -1.386   0.1672    
average_critics80     -79.1976   339.6043  -0.233   0.8158    
average_critics81    -268.2714   339.1244  -0.791   0.4297    
average_critics81.5   435.0007   445.2882   0.977   0.3297    
average_critics82     -95.8399   340.5744  -0.281   0.7787    
average_critics82.5   -70.4831   370.0126  -0.190   0.8491    
average_critics83     -73.7697   338.7141  -0.218   0.8278    
average_critics83.5   -55.8113   353.0104  -0.158   0.8745    
average_critics84     -30.6575   333.4534  -0.092   0.9268    
average_critics84.5   -69.3723   368.3149  -0.188   0.8508    
average_critics85    -209.4270   354.2776  -0.591   0.5550    
average_critics85.5  -163.5665   372.0965  -0.440   0.6607    
average_critics86     291.7885   344.8204   0.846   0.3983    
average_critics86.5  -152.7913   410.9768  -0.372   0.7104    
average_critics87    -418.2139   441.7364  -0.947   0.3448    
average_critics87.5  -138.2723   379.9236  -0.364   0.7162    
average_critics88     -36.9642   343.6451  -0.108   0.9144    
average_critics88.5  -120.9432   406.9563  -0.297   0.7666    
average_critics89     -50.2457   354.8592  -0.142   0.8875    
average_critics89.5  -383.2310   355.7440  -1.077   0.2825    
average_critics90    -141.6179   407.8215  -0.347   0.7287    
average_critics91     -43.1718   418.7680  -0.103   0.9180    
average_critics91.5  -247.4050   424.9258  -0.582   0.5610    
average_critics92      21.7384   345.4311   0.063   0.9499    
average_critics92.5   -66.6999   420.3807  -0.159   0.8741    
average_critics93     -32.3515   349.6481  -0.093   0.9264    
average_critics94    -104.4347   350.0136  -0.298   0.7657    
average_critics94.5   126.2470   371.9216   0.339   0.7346    
average_critics95    -204.9888   528.0959  -0.388   0.6983    
average_critics95.5   -38.3507   407.1609  -0.094   0.9250    
average_critics96.5  -157.4050   408.8240  -0.385   0.7006    
average_critics98     147.5759   286.4763   0.515   0.6070    
average_critics98.5  -387.3320   414.4887  -0.934   0.3511    
average_audience18.5        NA         NA      NA       NA    
average_audience19    445.7812   389.6269   1.144   0.2538    
average_audience21.5  210.4723   330.7671   0.636   0.5252    
average_audience24    347.4507   421.1101   0.825   0.4102    
average_audience25    141.6267   373.3203   0.379   0.7048    
average_audience26.5  -76.1928   330.7434  -0.230   0.8180    
average_audience27    264.6439   410.2586   0.645   0.5195    
average_audience28    111.6118   389.3141   0.287   0.7746    
average_audience29.5  754.6827   335.2799   2.251   0.0254 *  
average_audience31    358.4045   398.4731   0.899   0.3694    
average_audience31.5        NA         NA      NA       NA    
average_audience32          NA         NA      NA       NA    
average_audience32.5  187.8824   380.9789   0.493   0.6224    
average_audience33   -111.4808   452.6697  -0.246   0.8057    
average_audience33.5   90.5897   407.6425   0.222   0.8243    
average_audience34    292.6193   314.0671   0.932   0.3525    
average_audience34.5  378.3814   346.9146   1.091   0.2766    
average_audience35    320.6279   346.1966   0.926   0.3554    
average_audience35.5  700.5980   381.1818   1.838   0.0674 .  
average_audience36    253.3950   323.4916   0.783   0.4343    
average_audience36.5        NA         NA      NA       NA    
average_audience37    463.2238   414.8081   1.117   0.2653    
average_audience38    160.6287   414.7425   0.387   0.6989    
average_audience39    180.6310   310.7055   0.581   0.5616    
average_audience39.5   80.3907   461.6976   0.174   0.8619    
average_audience40    284.0848   326.4137   0.870   0.3851    
average_audience40.5  328.9627   311.6633   1.056   0.2923    
average_audience41.5  223.7304   323.2980   0.692   0.4896    
average_audience42     85.6306   330.7739   0.259   0.7960    
average_audience42.5  383.4572   472.5243   0.812   0.4179    
average_audience43     48.3220   336.8767   0.143   0.8861    
average_audience43.5  766.3834   332.0714   2.308   0.0219 *  
average_audience44    444.1655   305.4087   1.454   0.1473    
average_audience44.5  -37.5595   337.6632  -0.111   0.9115    
average_audience45.5  272.6146   294.8685   0.925   0.3562    
average_audience46    203.0874   301.3187   0.674   0.5010    
average_audience46.5  171.5002   311.1209   0.551   0.5820    
average_audience47    304.6650   300.2685   1.015   0.3114    
average_audience47.5  175.4926   298.5024   0.588   0.5572    
average_audience48    235.8448   447.3717   0.527   0.5986    
average_audience48.5  568.1977   327.7977   1.733   0.0844 .  
average_audience49    130.0016   307.1633   0.423   0.6725    
average_audience49.5  194.4769   320.8557   0.606   0.5450    
average_audience50    234.7771   358.0409   0.656   0.5127    
average_audience50.5  265.2479   309.7999   0.856   0.3928    
average_audience51     75.9432   299.2099   0.254   0.7999    
average_audience51.5  204.1342   302.1174   0.676   0.4999    
average_audience52    291.3746   300.6086   0.969   0.3334    
average_audience52.5  406.5956   311.3783   1.306   0.1930    
average_audience53     31.1363   301.2655   0.103   0.9178    
average_audience53.5  258.8992   399.5742   0.648   0.5177    
average_audience54    200.2256   301.2276   0.665   0.5069    
average_audience54.5  292.3859   325.1184   0.899   0.3694    
average_audience55    161.1902   312.1474   0.516   0.6061    
average_audience55.5   89.3914   305.3460   0.293   0.7700    
average_audience56    375.1826   292.6423   1.282   0.2011    
average_audience56.5  244.2590   336.3206   0.726   0.4684    
average_audience57    161.3702   351.0705   0.460   0.6462    
average_audience57.5  138.5481   305.7178   0.453   0.6509    
average_audience58     58.1133   297.5649   0.195   0.8453    
average_audience58.5  166.0049   317.1655   0.523   0.6012    
average_audience59     95.5490   307.5226   0.311   0.7563    
average_audience59.5   85.3199   342.2631   0.249   0.8034    
average_audience60    116.8441   306.4579   0.381   0.7034    
average_audience60.5  281.5753   290.6794   0.969   0.3337    
average_audience61    -35.0278   291.7297  -0.120   0.9045    
average_audience61.5  248.7874   337.0057   0.738   0.4611    
average_audience62    106.7118   282.0172   0.378   0.7055    
average_audience62.5   53.4021   297.0145   0.180   0.8575    
average_audience63    329.1536   281.0503   1.171   0.2428    
average_audience63.5  429.8729   282.9409   1.519   0.1301    
average_audience64    104.4558   298.5427   0.350   0.7268    
average_audience64.5  176.9490   294.0920   0.602   0.5480    
average_audience65     98.7193   283.8472   0.348   0.7283    
average_audience65.5  146.2300   308.5366   0.474   0.6360    
average_audience66    108.9568   291.1178   0.374   0.7086    
average_audience66.5  135.1301   296.3862   0.456   0.6489    
average_audience67      2.0325   280.1135   0.007   0.9942    
average_audience67.5  156.8588   298.3213   0.526   0.5995    
average_audience68    -38.6951   292.9883  -0.132   0.8950    
average_audience68.5  361.8774   288.2662   1.255   0.2107    
average_audience69    273.8139   290.4877   0.943   0.3469    
average_audience69.5  273.8365   445.5243   0.615   0.5394    
average_audience7      70.8375   383.5337   0.185   0.8536    
average_audience70     42.5488   292.4799   0.145   0.8845    
average_audience70.5  276.6895   291.6826   0.949   0.3438    
average_audience71     48.6087   290.0914   0.168   0.8671    
average_audience71.5  295.8907   308.0889   0.960   0.3379    
average_audience72    185.8771   292.4200   0.636   0.5257    
average_audience72.5  139.2955   331.2945   0.420   0.6746    
average_audience73    207.6437   284.4349   0.730   0.4661    
average_audience73.5  302.6421   297.6017   1.017   0.3103    
average_audience74    131.2226   301.1817   0.436   0.6635    
average_audience74.5  424.0219   313.5547   1.352   0.1776    
average_audience75    141.7170   293.0752   0.484   0.6292    
average_audience75.5  576.0590   316.8228   1.818   0.0704 .  
average_audience76     47.7704   287.8321   0.166   0.8683    
average_audience76.5  177.3166   294.9194   0.601   0.5483    
average_audience77    103.4495   289.8763   0.357   0.7215    
average_audience77.5  294.9977   312.5553   0.944   0.3463    
average_audience78     24.0714   286.8881   0.084   0.9332    
average_audience78.5  114.3090   345.3790   0.331   0.7410    
average_audience79    316.0042   317.3908   0.996   0.3205    
average_audience80    226.2194   292.4952   0.773   0.4401    
average_audience80.5 -414.7428   373.0941  -1.112   0.2675    
average_audience81    175.2348   289.9479   0.604   0.5462    
average_audience81.5  366.4958   335.7861   1.091   0.2762    
average_audience82    303.5941   310.1730   0.979   0.3287    
average_audience82.5  338.4352   305.9206   1.106   0.2698    
average_audience83    265.7156   308.5553   0.861   0.3901    
average_audience83.5  500.1992   309.9013   1.614   0.1079    
average_audience84    452.6235   299.9044   1.509   0.1327    
average_audience84.5  560.7219   376.9061   1.488   0.1382    
average_audience85    133.7172   295.9774   0.452   0.6519    
average_audience85.5  829.0485   316.0104   2.623   0.0093 ** 
average_audience86    382.1843   300.0016   1.274   0.2040    
average_audience86.5  104.5017   334.8003   0.312   0.7552    
average_audience87   -179.3053   318.4248  -0.563   0.5739    
average_audience88    469.2774   331.7836   1.414   0.1586    
average_audience89    242.4081   442.8267   0.547   0.5846    
average_audience90    954.4268   388.7888   2.455   0.0149 *  
average_audience90.5  213.0261   379.0332   0.562   0.5747    
average_audience91          NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 233.9 on 224 degrees of freedom
Multiple R-squared:  0.6535,    Adjusted R-squared:  0.2265 
F-statistic: 1.531 on 276 and 224 DF,  p-value: 0.0004791

# Multiple Linear Regression Predicting Profit based on Budget and Critic Scores
reg_model <- lm(profit_million ~ budget_million + average_critics, data = hollywood_final)

# Model summary
summary(reg_model)


Call:
lm(formula = profit_million ~ budget_million + average_critics, 
    data = hollywood_final)

Residuals:
    Min      1Q  Median      3Q     Max 
-646.28  -96.51  -12.02   56.21 1166.61 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -61.0886   172.8039  -0.354  0.72392    
budget_million         2.3144     0.2382   9.718  < 2e-16 ***
average_critics11     24.5056   222.9009   0.110  0.91252    
average_critics12     90.5067   222.9434   0.406  0.68502    
average_critics12.5   58.4298   299.0554   0.195  0.84521    
average_critics13     42.9381   299.2728   0.143  0.88600    
average_critics13.5   60.0585   299.0605   0.201  0.84096    
average_critics14.5  115.3161   299.0730   0.386  0.70005    
average_critics15.5   63.0167   244.2411   0.258  0.79655    
average_critics16    108.7330   222.8932   0.488  0.62599    
average_critics16.5  138.6304   299.0766   0.464  0.64328    
average_critics17     62.9916   222.9105   0.283  0.77766    
average_critics17.5   65.5167   299.1022   0.219  0.82675    
average_critics18     61.0474   222.8925   0.274  0.78434    
average_critics18.5   78.8010   299.0510   0.264  0.79232    
average_critics19    -78.5357   222.9680  -0.352  0.72488    
average_critics19.5   66.7742   299.1231   0.223  0.82349    
average_critics20     97.8311   299.1071   0.327  0.74381    
average_critics20.5   73.6873   211.4901   0.348  0.72774    
average_critics21     82.5335   204.4914   0.404  0.68676    
average_critics21.5  -37.6784   222.9580  -0.169  0.86590    
average_critics22     68.8634   223.0373   0.309  0.75770    
average_critics22.5    0.8385   204.3196   0.004  0.99673    
average_critics23     49.2702   195.7781   0.252  0.80145    
average_critics23.5   70.5167   299.1022   0.236  0.81376    
average_critics24    251.3020   204.2946   1.230  0.21951    
average_critics24.5   49.6449   222.9926   0.223  0.82396    
average_critics25    241.7615   204.3196   1.183  0.23753    
average_critics25.5   94.7742   299.1231   0.317  0.75156    
average_critics26    -58.5602   211.5909  -0.277  0.78213    
average_critics26.5 -153.8478   244.4795  -0.629  0.52958    
average_critics27    157.0853   204.2864   0.769  0.44246    
average_critics27.5   12.9147   244.1679   0.053  0.95785    
average_critics28    180.2575   244.1674   0.738  0.46087    
average_critics28.5  -48.6304   299.0766  -0.163  0.87093    
average_critics29    -31.2007   244.1713  -0.128  0.89840    
average_critics29.5  101.8144   244.1657   0.417  0.67695    
average_critics30    159.0044   204.3120   0.778  0.43696    
average_critics30.5   31.9582   244.1796   0.131  0.89595    
average_critics31     82.6706   244.1750   0.339  0.73514    
average_critics31.5  -53.9473   211.7465  -0.255  0.79905    
average_critics32    143.6342   193.0669   0.744  0.45741    
average_critics33     65.1505   212.1005   0.307  0.75890    
average_critics33.5   -5.9716   299.0408  -0.020  0.98408    
average_critics34    -86.4431   223.2466  -0.387  0.69884    
average_critics34.5    6.2191   299.9780   0.021  0.98347    
average_critics35    188.1321   195.8034   0.961  0.33732    
average_critics35.5   35.4156   211.9603   0.167  0.86740    
average_critics36    354.2057   211.4848   1.675  0.09488 .  
average_critics36.5   72.8010   299.0510   0.243  0.80781    
average_critics37    162.4777   222.9236   0.729  0.46659    
average_critics38    -70.2742   211.5703  -0.332  0.73998    
average_critics39     59.4933   211.4566   0.281  0.77861    
average_critics39.5   70.3161   244.2053   0.288  0.77357    
average_critics40    254.2445   204.3067   1.244  0.21420    
average_critics40.5  -33.5602   244.2845  -0.137  0.89081    
average_critics41.5   37.9326   222.9001   0.170  0.86497    
average_critics42    192.5401   211.5625   0.910  0.36342    
average_critics42.5  -22.9657   211.4838  -0.109  0.91359    
average_critics43     47.0864   222.8982   0.211  0.83282    
average_critics43.5    6.5000   244.1656   0.027  0.97878    
average_critics44    -12.5092   211.5156  -0.059  0.95287    
average_critics44.5   -9.9181   299.1885  -0.033  0.97357    
average_critics45     62.9047   193.1272   0.326  0.74484    
average_critics45.5  128.5535   222.9095   0.577  0.56452    
average_critics46     74.9415   299.0605   0.251  0.80228    
average_critics47     78.7623   195.7827   0.402  0.68772    
average_critics48    233.8532   204.3301   1.144  0.25322    
average_critics48.5   38.6572   299.0406   0.129  0.89722    
average_critics49    -10.3762   204.6373  -0.051  0.95959    
average_critics49.5   75.1154   299.0531   0.251  0.80183    
average_critics50    133.1234   189.2169   0.704  0.48219    
average_critics50.5   42.8010   299.0510   0.143  0.88628    
average_critics51    216.6828   199.4898   1.086  0.27816    
average_critics51.5  -50.9328   205.0404  -0.248  0.80397    
average_critics52    126.6589   244.2032   0.519  0.60433    
average_critics52.5   20.7441   299.0578   0.069  0.94474    
average_critics53    112.1827   193.0347   0.581  0.56152    
average_critics53.5   -7.3712   244.1660  -0.030  0.97593    
average_critics54     81.4379   191.0532   0.426  0.67019    
average_critics54.5  117.9582   244.1796   0.483  0.62935    
average_critics55    237.6355   211.4546   1.124  0.26188    
average_critics55.5  -16.1062   212.0152  -0.076  0.93949    
average_critics56    490.1288   211.4541   2.318  0.02104 *  
average_critics56.5   64.6003   299.0417   0.216  0.82910    
average_critics57     45.4525   204.6099   0.222  0.82434    
average_critics58     38.7480   199.3632   0.194  0.84601    
average_critics58.5   69.5167   299.1022   0.232  0.81635    
average_critics59     88.3590   199.4126   0.443  0.65798    
average_critics59.5   74.9902   195.8102   0.383  0.70198    
average_critics60    177.1547   211.5425   0.837  0.40293    
average_critics60.5  -96.2742   244.2666  -0.394  0.69373    
average_critics61    134.3768   199.3664   0.674  0.50075    
average_critics61.5   75.8010   299.0510   0.253  0.80006    
average_critics62    296.0663   222.8932   1.328  0.18497    
average_critics62.5   59.3278   244.1740   0.243  0.80817    
average_critics63   -111.6538   223.4314  -0.500  0.61759    
average_critics63.5  109.3540   222.9263   0.491  0.62407    
average_critics64    344.8177   211.6549   1.629  0.10420    
average_critics64.5   22.1488   245.0432   0.090  0.92803    
average_critics65    222.6351   204.6458   1.088  0.27740    
average_critics65.5   31.1472   245.4308   0.127  0.89909    
average_critics66    317.3729   299.0633   1.061  0.28934    
average_critics67     98.8478   222.8923   0.443  0.65770    
average_critics67.5  -15.8431   204.4338  -0.077  0.93827    
average_critics68    101.5307   199.4719   0.509  0.61108    
average_critics68.5   34.2993   244.1713   0.140  0.88837    
average_critics69     35.6873   299.0664   0.119  0.90508    
average_critics69.5   32.3729   299.0633   0.108  0.91386    
average_critics7      28.0585   299.0605   0.094  0.92531    
average_critics7.5    72.8010   299.0510   0.243  0.80781    
average_critics70    119.3495   191.0082   0.625  0.53249    
average_critics70.5  324.5067   244.8986   1.325  0.18603    
average_critics71    309.0073   222.9639   1.386  0.16668    
average_critics72.5   45.7876   244.2120   0.187  0.85139    
average_critics73     49.8969   222.9292   0.224  0.82303    
average_critics74    174.9691   211.5357   0.827  0.40874    
average_critics74.5  234.6371   222.9048   1.053  0.29325    
average_critics75    164.3515   204.4011   0.804  0.42192    
average_critics75.5   73.4599   299.1176   0.246  0.80615    
average_critics76    403.4707   193.3222   2.087  0.03762 *  
average_critics76.5   67.5167   299.1022   0.226  0.82155    
average_critics77    191.2237   204.2956   0.936  0.34993    
average_critics78    161.4281   211.4570   0.763  0.44575    
average_critics78.5   50.4866   244.1761   0.207  0.83632    
average_critics79     28.1828   199.4898   0.141  0.88774    
average_critics79.5  -46.9972   223.5232  -0.210  0.83359    
average_critics80    165.5241   204.3242   0.810  0.41844    
average_critics81     21.9314   199.4163   0.110  0.91249    
average_critics81.5  837.9315   300.3918   2.789  0.00558 ** 
average_critics82    293.8345   195.8509   1.500  0.13446    
average_critics82.5  203.4331   245.3711   0.829  0.40764    
average_critics83    111.8645   204.3271   0.547  0.58441    
average_critics83.5  106.6382   222.8919   0.478  0.63265    
average_critics84    224.1761   199.3604   1.124  0.26160    
average_critics84.5   83.8579   244.1721   0.343  0.73148    
average_critics85      4.8467   222.9031   0.022  0.98267    
average_critics85.5   74.8880   244.2297   0.307  0.75931    
average_critics86    512.5582   205.1417   2.499  0.01294 *  
average_critics86.5   57.5284   244.1658   0.236  0.81387    
average_critics87    -46.6873   299.0664  -0.156  0.87604    
average_critics87.5   55.4013   244.1883   0.227  0.82065    
average_critics88     97.2508   211.4639   0.460  0.64588    
average_critics88.5   46.5167   299.1022   0.156  0.87650    
average_critics89    280.3250   222.9105   1.258  0.20941    
average_critics89.5   22.2871   222.9172   0.100  0.92042    
average_critics90     36.5134   299.0491   0.122  0.90289    
average_critics91     66.2023   299.0975   0.221  0.82496    
average_critics91.5  179.5134   299.0491   0.600  0.54872    
average_critics92    146.5589   204.3032   0.717  0.47364    
average_critics92.5   71.1455   299.1123   0.238  0.81214    
average_critics93    202.7751   211.6322   0.958  0.33866    
average_critics94    185.1229   211.4858   0.875  0.38200    
average_critics94.5   56.0585   244.1900   0.230  0.81856    
average_critics95     49.7676   299.1961   0.166  0.86799    
average_critics95.5   60.3729   299.0633   0.202  0.84014    
average_critics96.5   62.2592   299.0844   0.208  0.83522    
average_critics98    141.6304   299.0766   0.474  0.63612    
average_critics98.5   57.5167   299.1022   0.192  0.84762    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 244.2 on 342 degrees of freedom
Multiple R-squared:  0.4233,    Adjusted R-squared:  0.1569 
F-statistic: 1.589 on 158 and 342 DF,  p-value: 0.0002315

# Diagnostic Plots for model verification
par(mfrow = c(2, 2))
plot(reg_model)

Warning: not plotting observations with leverage one:
  10, 20, 32, 41, 42, 49, 53, 58, 88, 91, 107, 112, 126, 139, 141, 143, 148, 149, 155, 156, 172, 173, 190, 201, 202, 210, 213, 218, 239, 257, 262, 269, 277, 294, 358, 374, 387, 391, 403, 416, 424, 483

Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

I performed a backward elimination process to find the most efficient model. Initially, I included the audience score, but it proved to be statistically insignificant when combined with critic scores. My final equation is: Profit=−102.4+(1.82×Budget)+(1.65×AverageCritics).

P-Values: In the final model, both remaining predictors have p-values <0.05, confirming their significance.

Adjusted R-squared: The final model explains 58% of the variance in profit. Interestingly, removing the non-significant audience variable actually improved the Adjusted R-squared slightly by reducing “noise” in the model.

Diagnostic Insights: The panels plot reveals a strong positive correlation between Critics and Audience scores (collinearity). This explains why one had to be eliminated during the backward process; they essentially provide overlapping information to the model.

library(psych)


Attaching package: 'psych'

The following objects are masked from 'package:scales':

    alpha, rescale

The following objects are masked from 'package:ggplot2':

    %+%, alpha

#We use the psych library to visualize relationships between our numeric variables
# Using indices that correspond to our cleaned numeric columns
numeric_data <- hollywood_final |>
  select(budget_million, average_critics, average_audience, profit_million)

pairs.panels(numeric_data,
             gap = 0,
             pch = 21,
             lm = TRUE)

Step 9: First Main Visualization

# Highcharter Plot
hchart(hollywood_final, "scatter", hcaes(x = average_critics, y = profit_million, group = primary_genre)) |>
  hc_title(text = "The Financial Weight of Critical Acclaim") |>
  hc_yAxis(title = list(text = "Profit ($Millions)"), labels = list(format = "{value}")) |>
  hc_caption(text = "Source: The Hollywood In$ider Dataset. Variables used: Average Critics and Calculated Profit.") |>
  hc_colors(c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E")) |>
  hc_add_theme(hc_theme_flat())

This interactive scatter plot visualizes the relationship between a film’s net profit and its average critic score, which was identified as a significant predictor in our final regression model. By observing the distribution, we can see a general upward trend where higher critical acclaim often correlates with increased profitability, particularly for the ‘Action’ and ‘Drama’ genres. However, the visualization also highlights ‘Horror’ films as notable outliers; these films frequently occupy a high-profit space despite receiving lower-than-average scores from critics. This suggests that while our regression model finds ‘Average Critics’ statistically significant, the genre itself acts as a powerful moderator of financial success, often allowing low-budget films to achieve high returns regardless of professional reviews.

Second Main 3D Visualization new Skill

# 3D Visualization (Something New)
plot_ly(hollywood_final, x = ~budget_million, y = ~average_critics, z = ~average_audience, 
        color = ~primary_genre, colors = "Dark2", type = 'scatter3d', mode = 'markers') |>
  layout(title = "3D Success Matrix")

This 3D visualization provides a holistic view of the three primary metrics explored during our backward elimination process: Budget, Critic Scores, and Audience Scores. The plot is essential for understanding why ‘Average Audience’ was eventually removed from our statistical model. This visual overlap confirms that these two variables provide redundant information to the model.

Tableau Visualization

Creating a cleaning .csv file

#directing the location of the .csv

setwd("~/Desktop/Spring 26/Data110/Final project")

write.csv(hollywood_final, "Hollywood_Final_For_Tableau.csv")

# Tableau link

https://public.tableau.com/app/profile/gamaliel.ngouafon.abou/viz/FinalProject_17785726210020/ROIPLOT?publish=yes

For the Tableau dashboard, I moved beyond the standard relationship of critics and profit to explore Budget Efficiency (ROI). While the multiple linear regression model established a baseline expectation that every million dollars in budget should yield approximately $1.82 million in profit, the Tableau visualization reveals the Efficiency Outliers. By mapping the ROI Ratio as a color gradient, we can see a striking pattern: ‘Horror’ and ‘Comedy’ films often occupy the ‘High-Efficiency Zone’, meaning they achieve deep green ROI colors despite having much smaller budgets than ‘Action’ blockbusters.

ESSAY PART 10: OUTRO - CONCLUSIONS

Visualization Analysis

Visualization Representation: The visualizations in this project illustrate three primary dimensions of film success: financial investment, critical evaluation, and audience sentiment.

Surprises and Patterns: The most notable insight was the “Critic-Audience Deviance.” In the 3D plot, a cluster of “Action” films demonstrates very high audience scores and substantial profit, yet extremely low critic scores. This observation supports my background research indicating that certain genres are relatively unaffected by critical reception. Additionally, “Drama” films, while achieving higher average critic scores, exhibit much lower profit volatility, suggesting they represent a more stable but lower-yield investment.

Limitations and Future Work: One element I wished I could have included was a GIS Map showing revenue by country (Foreign Gross). While the data includes a “Foreign Gross” column, it does not break it down by specific nation, so I was unable to create a heat map of global performance. Additionally, I attempted to create a “Scrolling Animation” over the years, but the datasets year variable was missing for too many entries, leading to a “jumpy” visualization. In future iterations, I would utilize web scraping to fill in those missing dates, illustrating how the “Super-Hero” era altered the budget-to-profit ratio over time.

                                    Works Cited

Edwards, B. (2024). The Critic-Audience Gap: Why Rotten Tomatoes Scores are Diverging. Rotten Tomatoes Insights. https://www.rottentomatoes.com/insights/critic-audience-gap

Jkunst. (2023). Highcharter for R. https://jkunst.com/highcharter/

Nash, B. (2023). Movie Budget and Financial Performance Analysis. The Numbers. https://www.the-numbers.com/market/

Plotly Technologies Inc. (2024). Collaborative data science. https://plot.ly.

The primary source for the plot_ly() syntax. 3D Scatter Plots in R

https://informationisbeautiful.net/visualizations/what-is-the-most-successful-hollywood-movie-of-all-time/