Build a linear (or generalized linear) model as you like

Use whatever response variable and explanatory variables you prefer

# Loading necessary libraries
library(dplyr)

# Building a linear regression model with 'budget_x' as the response variable
model <- lm(budget_x ~ country + orig_lang, data = data)

# Displaying model summary
summary(model)
## 
## Call:
## lm(formula = budget_x ~ country + orig_lang, data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -175288571  -43494336  -12800673   36543751  406543751 
## 
## Coefficients: (2 not defined because of singularities)
##                                                 Estimate Std. Error t value
## (Intercept)                                     91029829   30467017   2.988
## countryAT                                      -28996397   40518239  -0.716
## countryAU                                      -34672933   10435693  -3.323
## countryBE                                      -41062564   24029895  -1.709
## countryBO                                       19547417   55736563   0.351
## countryBR                                        4695662   16993522   0.276
## countryBY                                       89511657   57745155   1.550
## countryCA                                      -24776289   12368861  -2.003
## countryCH                                        8996383   26953295   0.334
## countryCL                                       -1952205   20281814  -0.096
## countryCN                                      -15109433   13541329  -1.116
## countryCO                                      -14976723   17046408  -0.879
## countryCZ                                       12011342   56048680   0.214
## countryDE                                       -8989760   13397361  -0.671
## countryDK                                      -31803777   17658111  -1.801
## countryDO                                       -9992583   55736563  -0.179
## countryES                                      -12514448    9711528  -1.289
## countryFI                                      -63889690   33047862  -1.933
## countryFR                                       -7561669   12001347  -0.630
## countryGB                                      -11785870   11198501  -1.052
## countryGR                                      -85166806   48817114  -1.745
## countryGT                                       -7192583   39878129  -0.180
## countryHK                                        -333757   13314251  -0.025
## countryHU                                        4070819   56048680   0.073
## countryID                                       70542857   24479312   2.882
## countryIE                                        8941728   19609861   0.456
## countryIL                                        4070819   56048680   0.073
## countryIN                                       56737280   22502728   2.521
## countryIR                                      -52421796   46701992  -1.122
## countryIS                                       15582067   68248089   0.228
## countryIT                                        2647019   14472586   0.183
## countryJP                                        2586855   11452975   0.226
## countryKH                                       78490568   79047864   0.993
## countryKR                                       18976481   14480396   1.310
## countryLV                                     -148834846   78577179  -1.894
## countryMU                                       12842419   56048680   0.229
## countryMX                                       -7646608   10143527  -0.754
## countryMY                                      -82137498   49442274  -1.661
## countryNL                                      -57691052   29211217  -1.975
## countryNO                                      -35090384   23057845  -1.522
## countryPE                                      -13483526   22536775  -0.598
## countryPH                                       20079747   31603238   0.635
## countryPL                                       -3661151   20485205  -0.179
## countryPR                                      -13992583   39878129  -0.351
## countryPT                                      -88109182   56048680  -1.572
## countryPY                                       60607417   55736563   1.087
## countryRU                                       -9168907   18210729  -0.503
## countrySE                                      -22476173   26055452  -0.863
## countrySG                                       70121670   33849773   2.072
## countrySK                                       82207417   55736563   1.475
## countrySU                                      -53617307   30140073  -1.779
## countryTH                                        5168109   31110069   0.166
## countryTR                                       34970678   42446853   0.824
## countryTW                                       11093668   19622109   0.565
## countryUA                                      -47690260   33542101  -1.422
## countryUS                                      -21634846   10448723  -2.071
## countryUY                                       73109118   39987344   1.828
## countryVN                                       44921677   72036602   0.624
## countryXC                                       11460632   95951299   0.119
## countryZA                                       -4595848   33462248  -0.137
## orig_lang Basque                                37084620   62475174   0.594
## orig_lang Bengali                               32972892   52309513   0.630
## orig_lang Bokmål, Norwegian, Norwegian Bokmål  -52439445   65387629  -0.802
## orig_lang Cantonese                            -10517185   29614854  -0.355
## orig_lang Catalan, Valencian                     8984620   48855379   0.184
## orig_lang Central Khmer                         25479604   62670901   0.407
## orig_lang Chinese                               -4612052   29512856  -0.156
## orig_lang Czech                               -101910461   82977376  -1.228
## orig_lang Danish                               -10211247   32056795  -0.319
## orig_lang Dutch, Flemish                        32120604   37126460   0.865
## orig_lang Dzongkha                              11443104   62070893   0.184
## orig_lang English                               -2900647   28636475  -0.101
## orig_lang Finnish                              -32346581   37876133  -0.854
## orig_lang French                               -14222411   29156569  -0.488
## orig_lang Galician                              29931841   62381101   0.480
## orig_lang German                               -10166217   29827458  -0.341
## orig_lang Greek                                 44758080   53893966   0.830
## orig_lang Gujarati                             -54356896   62070893  -0.876
## orig_lang Hindi                                -66805198   34367180  -1.944
## orig_lang Hungarian                             62099353   73273090   0.848
## orig_lang Icelandic                            -18761896   62070893  -0.302
## orig_lang Indonesian                           -63718389   36809428  -1.731
## orig_lang Irish                                -50967296   62070893  -0.821
## orig_lang Italian                              -13262580   30183133  -0.439
## orig_lang Japanese                               2342673   28929689   0.081
## orig_lang Kannada                              -84467108   52309513  -1.615
## orig_lang Korean                               -12861839   30231409  -0.425
## orig_lang Latin                                -65802047   62915644  -1.046
## orig_lang Latvian                               96405017   62068885   1.553
## orig_lang Macedonian                           -16679767   62070893  -0.269
## orig_lang Malay                                       NA         NA      NA
## orig_lang Malayalam                             30321463   40659737   0.746
## orig_lang Marathi                               19772892   65211843   0.303
## orig_lang No Language                           22231422   45428406   0.489
## orig_lang Norwegian                              -926312   32260970  -0.029
## orig_lang Oriya                               -147663263   65211843  -2.264
## orig_lang Persian                              -40777288   39006332  -1.045
## orig_lang Polish                                13222603   32998474   0.401
## orig_lang Portuguese                            11906208   32149420   0.370
## orig_lang Romanian                              69490041   53894833   1.289
## orig_lang Russian                              -13001486   31829347  -0.408
## orig_lang Serbian                              -65554983   62068885  -1.056
## orig_lang Serbo-Croatian                        10605017   62068885   0.171
## orig_lang Slovak                                 1362754   83183355   0.016
## orig_lang Spanish, Castilian                     1362754   29227954   0.047
## orig_lang Swedish                               12095311   32451568   0.373
## orig_lang Tagalog                                     NA         NA      NA
## orig_lang Tamil                                 41566615   38768769   1.072
## orig_lang Telugu                               -30620335   40040183  -0.765
## orig_lang Thai                                  -8825939   39721311  -0.222
## orig_lang Turkish                              -23614256   48592071  -0.486
## orig_lang Ukrainian                            -32611357   42805644  -0.762
## orig_lang Vietnamese                           -68551499   70098145  -0.978
##                                               Pr(>|t|)    
## (Intercept)                                   0.002817 ** 
## countryAT                                     0.474231    
## countryAU                                     0.000895 ***
## countryBE                                     0.087517 .  
## countryBO                                     0.725813    
## countryBR                                     0.782307    
## countryBY                                     0.121145    
## countryCA                                     0.045191 *  
## countryCH                                     0.738555    
## countryCL                                     0.923321    
## countryCN                                     0.264534    
## countryCO                                     0.379647    
## countryCZ                                     0.830316    
## countryDE                                     0.502230    
## countryDK                                     0.071719 .  
## countryDO                                     0.857720    
## countryES                                     0.197561    
## countryFI                                     0.053234 .  
## countryFR                                     0.528664    
## countryGB                                     0.292618    
## countryGR                                     0.081083 .  
## countryGT                                     0.856870    
## countryHK                                     0.980001    
## countryHU                                     0.942102    
## countryID                                     0.003963 ** 
## countryIE                                     0.648413    
## countryIL                                     0.942102    
## countryIN                                     0.011706 *  
## countryIR                                     0.261688    
## countryIS                                     0.819406    
## countryIT                                     0.854881    
## countryJP                                     0.821309    
## countryKH                                     0.320758    
## countryKR                                     0.190058    
## countryLV                                     0.058237 .  
## countryMU                                     0.818773    
## countryMX                                     0.450962    
## countryMY                                     0.096688 .  
## countryNL                                     0.048300 *  
## countryNO                                     0.128080    
## countryPE                                     0.549660    
## countryPH                                     0.525202    
## countryPL                                     0.858160    
## countryPR                                     0.725683    
## countryPT                                     0.115979    
## countryPY                                     0.276890    
## countryRU                                     0.614631    
## countrySE                                     0.388362    
## countrySG                                     0.038332 *  
## countrySK                                     0.140263    
## countrySU                                     0.075280 .  
## countryTH                                     0.868063    
## countryTR                                     0.410033    
## countryTW                                     0.571838    
## countryUA                                     0.155114    
## countryUS                                     0.038424 *  
## countryUY                                     0.067533 .  
## countryVN                                     0.532908    
## countryXC                                     0.904927    
## countryZA                                     0.890761    
## orig_lang Basque                              0.552800    
## orig_lang Bengali                             0.528485    
## orig_lang Bokmål, Norwegian, Norwegian Bokmål 0.422585    
## orig_lang Cantonese                           0.722498    
## orig_lang Catalan, Valencian                  0.854094    
## orig_lang Central Khmer                       0.684338    
## orig_lang Chinese                             0.875821    
## orig_lang Czech                               0.219411    
## orig_lang Danish                              0.750085    
## orig_lang Dutch, Flemish                      0.386968    
## orig_lang Dzongkha                            0.853738    
## orig_lang English                             0.919321    
## orig_lang Finnish                             0.393120    
## orig_lang French                              0.625706    
## orig_lang Galician                            0.631364    
## orig_lang German                              0.733236    
## orig_lang Greek                               0.406285    
## orig_lang Gujarati                            0.381202    
## orig_lang Hindi                               0.051939 .  
## orig_lang Hungarian                           0.396734    
## orig_lang Icelandic                           0.762456    
## orig_lang Indonesian                          0.083476 .  
## orig_lang Irish                               0.411601    
## orig_lang Italian                             0.660378    
## orig_lang Japanese                            0.935461    
## orig_lang Kannada                             0.106395    
## orig_lang Korean                              0.670521    
## orig_lang Latin                               0.295643    
## orig_lang Latvian                             0.120408    
## orig_lang Macedonian                          0.788150    
## orig_lang Malay                                     NA    
## orig_lang Malayalam                           0.455844    
## orig_lang Marathi                             0.761736    
## orig_lang No Language                         0.624589    
## orig_lang Norwegian                           0.977094    
## orig_lang Oriya                               0.023573 *  
## orig_lang Persian                             0.295862    
## orig_lang Polish                              0.688647    
## orig_lang Portuguese                          0.711137    
## orig_lang Romanian                            0.197301    
## orig_lang Russian                             0.682934    
## orig_lang Serbian                             0.290918    
## orig_lang Serbo-Croatian                      0.864338    
## orig_lang Slovak                              0.986930    
## orig_lang Spanish, Castilian                  0.962813    
## orig_lang Swedish                             0.709366    
## orig_lang Tagalog                                   NA    
## orig_lang Tamil                               0.283671    
## orig_lang Telugu                              0.444444    
## orig_lang Thai                                0.824165    
## orig_lang Turkish                             0.626999    
## orig_lang Ukrainian                           0.446169    
## orig_lang Vietnamese                          0.328129    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 55070000 on 10067 degrees of freedom
## Multiple R-squared:  0.07914,    Adjusted R-squared:  0.06908 
## F-statistic: 7.865 on 110 and 10067 DF,  p-value: < 2.2e-16

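The linear regression model predicts the film budget (budget_x) from two categorical explanatory variables, country and orig_lang. It explains approximately 7.9% of the variation in the budget (multiple R-squared = 0.079, adjusted R-squared = 0.069), so most of the variation is left unexplained even though the overall F-test is significant (p < 2.2e-16).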
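As a quick sanity check on those fit statistics, the adjusted R-squared and the F-statistic can be recomputed by hand from the numbers printed in the summary. The sketch below only uses values copied from the output above; the variable names are my own.

```r
# Values copied from the summary output above
r2       <- 0.07914   # Multiple R-squared
df_model <- 110       # numerator DF: estimated coefficients, excluding the intercept
df_resid <- 10067     # residual degrees of freedom

# Number of observations used in the fit
n <- df_resid + df_model + 1   # 10178

# Adjusted R-squared penalizes R-squared for the number of estimated terms
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic expressed through R-squared
f_stat <- (r2 / df_model) / ((1 - r2) / df_resid)

round(adj_r2, 5)   # 0.06908, matching the summary
round(f_stat, 2)   # about 7.87, matching the reported 7.865 up to rounding of r2
```

The small gap between the two R-squared values reflects the penalty for estimating 110 coefficients from roughly ten thousand observations.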

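A handful of levels have coefficients that differ significantly from the reference level at the 5% level: the countries AU, CA, ID, IN, NL, SG, and US, and the original language Oriya. For instance, the coefficient for ‘AU’ is about -34.7 million (p = 0.0009), meaning that films from AU have a significantly lower expected budget than films from the reference country with the same original language.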
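With more than a hundred coefficients, it is easier to filter the coefficient table than to scan it by eye. The following is a minimal sketch on simulated data: `toy`, `toy_model`, and the numbers used to generate the budgets are hypothetical stand-ins, not the real data set. The same filtering step should work on `summary(model)$coefficients`.

```r
set.seed(1)

# Hypothetical toy data standing in for the real data set
toy <- data.frame(
  country   = factor(sample(c("AR", "AU", "US"), 300, replace = TRUE)),
  orig_lang = factor(sample(c("English", "Spanish"), 300, replace = TRUE))
)
# Simulated budgets: US films get a built-in shift so one effect is clearly non-zero
toy$budget_x <- 5e7 + 2e7 * (toy$country == "US") + rnorm(300, sd = 1e7)

toy_model <- lm(budget_x ~ country + orig_lang, data = toy)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(toy_model)$coefficients

# Keep rows with p < 0.05, then sort them by p-value
sig <- coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
sig <- sig[order(sig[, "Pr(>|t|)"]), , drop = FALSE]
sig
```

Keep in mind that when this many levels are tested at once, a few p-values below 0.05 are expected by chance alone.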

Use the tools from previous weeks to diagnose the model

Highlight any issues with the model

# Residuals vs. Fitted Values Plot
plot(model, which = 1)

# Histogram of residuals (reusing the model fitted above instead of refitting it)
hist(residuals(model))

# Normal Q-Q Plot
plot(model, which = 2)
## Warning: not plotting observations with leverage one:
##   573, 624, 1349, 2378, 2679, 2808, 4397, 4917, 5288, 5422, 5582, 6470, 6949, 7045, 7218, 7220, 7576, 8259, 8371, 8639, 9315

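The linear regression model used for budget prediction based on the “country” and “orig_lang” predictors exhibits several issues. First, two coefficients (orig_lang Malay and orig_lang Tagalog) could not be estimated because of singularities: these language indicators are perfectly collinear with other predictor levels, most likely because each of these languages occurs only within a single country, so the two effects cannot be separated. Second, the Q-Q plot warning reports 21 observations with leverage one, which typically happens when a factor level occurs only once in the data; the model fits such points exactly, so they provide no check on the fit. Third, the residual summary suggests a right-skewed distribution (median about -12.8 million, maximum about 406.5 million against a minimum of about -175.3 million), which is at odds with the normality assumption. Finally, most individual coefficients are not statistically significant and the R-squared is low, so country and orig_lang contribute little predictive power.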
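To see how a singularity and a leverage-one observation can arise, here is a small sketch on made-up data. The data frame `toy` and its values are hypothetical: Malay is spoken only in MY and MY films are only in Malay, and KH appears exactly once. The same checks (`is.na(coef(...))`, `hatvalues()`, and a cross-tabulation) could be run on the real model.

```r
# Hypothetical toy data: Malay occurs only with MY, and KH occurs only once
toy <- data.frame(
  country   = c(rep("AR", 4), rep("US", 4), rep("MY", 3), "KH"),
  orig_lang = c("Spanish", "Spanish", "English", "English",
                "English", "English", "Spanish", "Spanish",
                "Malay", "Malay", "Malay", "English"),
  budget_x  = c(10, 12, 11, 14, 30, 28, 33, 31, 20, 22, 19, 25) * 1e6
)

toy_model <- lm(budget_x ~ country + orig_lang, data = toy)

# Coefficients that lm() could not estimate (reported as NA in the summary)
aliased <- names(coef(toy_model))[is.na(coef(toy_model))]
aliased   # "orig_langMalay"

# The cross-tabulation shows why: the Malay column is non-zero only in the MY row
table(toy$country, toy$orig_lang)

# Observations with leverage (hat value) equal to one
lev_one <- which(hatvalues(toy_model) > 1 - 1e-8)
unname(lev_one)   # 12, the single KH film
```

In the toy data the indicator for MY and the indicator for Malay are the same column, so only one of them can get a coefficient; the lone KH film has its own dummy variable and is therefore fitted exactly.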

Interpret at least one of the coefficients

countryUS: -21,634,846

This coefficient is the estimated effect of the “countryUS” indicator on the response variable “budget_x” in my model. Here’s the interpretation:

The “countryUS” coefficient of approximately -21,634,846 indicates that, holding the original language fixed, a movie produced in the United States (US) has an expected budget about $21,634,846 lower than a movie from the reference country (the first level of “country”, which is absorbed into the intercept and therefore has no row in the coefficient table).

In practical terms, US movies tend to have lower budgets than reference-country movies in the same original language. The difference is statistically significant at the 5% level (p = 0.038), but with a standard error of about 10.4 million the estimate is imprecise.
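The "compared to the baseline" reading can be checked directly: in an additive model, the country coefficient equals the difference between two predictions that differ only in country. This is a minimal sketch on simulated data; `toy`, `toy_model`, `new_films`, and the generating numbers are hypothetical and only illustrate the mechanics.

```r
set.seed(42)

# Hypothetical toy data; "AR" sorts first and so becomes the reference country
toy <- data.frame(
  country   = factor(sample(c("AR", "AU", "US"), 200, replace = TRUE)),
  orig_lang = factor(sample(c("English", "Spanish"), 200, replace = TRUE))
)
toy$budget_x <- 6e7 - 2e7 * (toy$country == "US") + rnorm(200, sd = 1e7)

toy_model <- lm(budget_x ~ country + orig_lang, data = toy)

# The reference level is the first factor level
levels(toy$country)[1]   # "AR"

# Two films that differ only in country
new_films <- data.frame(
  country   = factor(c("AR", "US"), levels = levels(toy$country)),
  orig_lang = factor(c("English", "English"), levels = levels(toy$orig_lang))
)
pred <- predict(toy_model, newdata = new_films)

# Difference in predicted budget (US minus reference) equals the countryUS coefficient
diff_pred <- unname(pred[2] - pred[1])
all.equal(diff_pred, unname(coef(toy_model)["countryUS"]))   # TRUE
```

To compare against a different baseline, the factor can be releveled with `relevel()` before fitting; every country coefficient is then measured relative to the new reference level.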