Use whatever response variable and explanatory variables you prefer
# Loading necessary libraries
library(dplyr)
# Building a linear regression model with 'budget_x' as the response variable
model <- lm(budget_x ~ country + orig_lang, data = data)
# Displaying model summary
summary(model)
##
## Call:
## lm(formula = budget_x ~ country + orig_lang, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -175288571 -43494336 -12800673 36543751 406543751
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) 91029829 30467017 2.988
## countryAT -28996397 40518239 -0.716
## countryAU -34672933 10435693 -3.323
## countryBE -41062564 24029895 -1.709
## countryBO 19547417 55736563 0.351
## countryBR 4695662 16993522 0.276
## countryBY 89511657 57745155 1.550
## countryCA -24776289 12368861 -2.003
## countryCH 8996383 26953295 0.334
## countryCL -1952205 20281814 -0.096
## countryCN -15109433 13541329 -1.116
## countryCO -14976723 17046408 -0.879
## countryCZ 12011342 56048680 0.214
## countryDE -8989760 13397361 -0.671
## countryDK -31803777 17658111 -1.801
## countryDO -9992583 55736563 -0.179
## countryES -12514448 9711528 -1.289
## countryFI -63889690 33047862 -1.933
## countryFR -7561669 12001347 -0.630
## countryGB -11785870 11198501 -1.052
## countryGR -85166806 48817114 -1.745
## countryGT -7192583 39878129 -0.180
## countryHK -333757 13314251 -0.025
## countryHU 4070819 56048680 0.073
## countryID 70542857 24479312 2.882
## countryIE 8941728 19609861 0.456
## countryIL 4070819 56048680 0.073
## countryIN 56737280 22502728 2.521
## countryIR -52421796 46701992 -1.122
## countryIS 15582067 68248089 0.228
## countryIT 2647019 14472586 0.183
## countryJP 2586855 11452975 0.226
## countryKH 78490568 79047864 0.993
## countryKR 18976481 14480396 1.310
## countryLV -148834846 78577179 -1.894
## countryMU 12842419 56048680 0.229
## countryMX -7646608 10143527 -0.754
## countryMY -82137498 49442274 -1.661
## countryNL -57691052 29211217 -1.975
## countryNO -35090384 23057845 -1.522
## countryPE -13483526 22536775 -0.598
## countryPH 20079747 31603238 0.635
## countryPL -3661151 20485205 -0.179
## countryPR -13992583 39878129 -0.351
## countryPT -88109182 56048680 -1.572
## countryPY 60607417 55736563 1.087
## countryRU -9168907 18210729 -0.503
## countrySE -22476173 26055452 -0.863
## countrySG 70121670 33849773 2.072
## countrySK 82207417 55736563 1.475
## countrySU -53617307 30140073 -1.779
## countryTH 5168109 31110069 0.166
## countryTR 34970678 42446853 0.824
## countryTW 11093668 19622109 0.565
## countryUA -47690260 33542101 -1.422
## countryUS -21634846 10448723 -2.071
## countryUY 73109118 39987344 1.828
## countryVN 44921677 72036602 0.624
## countryXC 11460632 95951299 0.119
## countryZA -4595848 33462248 -0.137
## orig_lang Basque 37084620 62475174 0.594
## orig_lang Bengali 32972892 52309513 0.630
## orig_lang Bokmål, Norwegian, Norwegian Bokmål -52439445 65387629 -0.802
## orig_lang Cantonese -10517185 29614854 -0.355
## orig_lang Catalan, Valencian 8984620 48855379 0.184
## orig_lang Central Khmer 25479604 62670901 0.407
## orig_lang Chinese -4612052 29512856 -0.156
## orig_lang Czech -101910461 82977376 -1.228
## orig_lang Danish -10211247 32056795 -0.319
## orig_lang Dutch, Flemish 32120604 37126460 0.865
## orig_lang Dzongkha 11443104 62070893 0.184
## orig_lang English -2900647 28636475 -0.101
## orig_lang Finnish -32346581 37876133 -0.854
## orig_lang French -14222411 29156569 -0.488
## orig_lang Galician 29931841 62381101 0.480
## orig_lang German -10166217 29827458 -0.341
## orig_lang Greek 44758080 53893966 0.830
## orig_lang Gujarati -54356896 62070893 -0.876
## orig_lang Hindi -66805198 34367180 -1.944
## orig_lang Hungarian 62099353 73273090 0.848
## orig_lang Icelandic -18761896 62070893 -0.302
## orig_lang Indonesian -63718389 36809428 -1.731
## orig_lang Irish -50967296 62070893 -0.821
## orig_lang Italian -13262580 30183133 -0.439
## orig_lang Japanese 2342673 28929689 0.081
## orig_lang Kannada -84467108 52309513 -1.615
## orig_lang Korean -12861839 30231409 -0.425
## orig_lang Latin -65802047 62915644 -1.046
## orig_lang Latvian 96405017 62068885 1.553
## orig_lang Macedonian -16679767 62070893 -0.269
## orig_lang Malay NA NA NA
## orig_lang Malayalam 30321463 40659737 0.746
## orig_lang Marathi 19772892 65211843 0.303
## orig_lang No Language 22231422 45428406 0.489
## orig_lang Norwegian -926312 32260970 -0.029
## orig_lang Oriya -147663263 65211843 -2.264
## orig_lang Persian -40777288 39006332 -1.045
## orig_lang Polish 13222603 32998474 0.401
## orig_lang Portuguese 11906208 32149420 0.370
## orig_lang Romanian 69490041 53894833 1.289
## orig_lang Russian -13001486 31829347 -0.408
## orig_lang Serbian -65554983 62068885 -1.056
## orig_lang Serbo-Croatian 10605017 62068885 0.171
## orig_lang Slovak 1362754 83183355 0.016
## orig_lang Spanish, Castilian 1362754 29227954 0.047
## orig_lang Swedish 12095311 32451568 0.373
## orig_lang Tagalog NA NA NA
## orig_lang Tamil 41566615 38768769 1.072
## orig_lang Telugu -30620335 40040183 -0.765
## orig_lang Thai -8825939 39721311 -0.222
## orig_lang Turkish -23614256 48592071 -0.486
## orig_lang Ukrainian -32611357 42805644 -0.762
## orig_lang Vietnamese -68551499 70098145 -0.978
## Pr(>|t|)
## (Intercept) 0.002817 **
## countryAT 0.474231
## countryAU 0.000895 ***
## countryBE 0.087517 .
## countryBO 0.725813
## countryBR 0.782307
## countryBY 0.121145
## countryCA 0.045191 *
## countryCH 0.738555
## countryCL 0.923321
## countryCN 0.264534
## countryCO 0.379647
## countryCZ 0.830316
## countryDE 0.502230
## countryDK 0.071719 .
## countryDO 0.857720
## countryES 0.197561
## countryFI 0.053234 .
## countryFR 0.528664
## countryGB 0.292618
## countryGR 0.081083 .
## countryGT 0.856870
## countryHK 0.980001
## countryHU 0.942102
## countryID 0.003963 **
## countryIE 0.648413
## countryIL 0.942102
## countryIN 0.011706 *
## countryIR 0.261688
## countryIS 0.819406
## countryIT 0.854881
## countryJP 0.821309
## countryKH 0.320758
## countryKR 0.190058
## countryLV 0.058237 .
## countryMU 0.818773
## countryMX 0.450962
## countryMY 0.096688 .
## countryNL 0.048300 *
## countryNO 0.128080
## countryPE 0.549660
## countryPH 0.525202
## countryPL 0.858160
## countryPR 0.725683
## countryPT 0.115979
## countryPY 0.276890
## countryRU 0.614631
## countrySE 0.388362
## countrySG 0.038332 *
## countrySK 0.140263
## countrySU 0.075280 .
## countryTH 0.868063
## countryTR 0.410033
## countryTW 0.571838
## countryUA 0.155114
## countryUS 0.038424 *
## countryUY 0.067533 .
## countryVN 0.532908
## countryXC 0.904927
## countryZA 0.890761
## orig_lang Basque 0.552800
## orig_lang Bengali 0.528485
## orig_lang Bokmål, Norwegian, Norwegian Bokmål 0.422585
## orig_lang Cantonese 0.722498
## orig_lang Catalan, Valencian 0.854094
## orig_lang Central Khmer 0.684338
## orig_lang Chinese 0.875821
## orig_lang Czech 0.219411
## orig_lang Danish 0.750085
## orig_lang Dutch, Flemish 0.386968
## orig_lang Dzongkha 0.853738
## orig_lang English 0.919321
## orig_lang Finnish 0.393120
## orig_lang French 0.625706
## orig_lang Galician 0.631364
## orig_lang German 0.733236
## orig_lang Greek 0.406285
## orig_lang Gujarati 0.381202
## orig_lang Hindi 0.051939 .
## orig_lang Hungarian 0.396734
## orig_lang Icelandic 0.762456
## orig_lang Indonesian 0.083476 .
## orig_lang Irish 0.411601
## orig_lang Italian 0.660378
## orig_lang Japanese 0.935461
## orig_lang Kannada 0.106395
## orig_lang Korean 0.670521
## orig_lang Latin 0.295643
## orig_lang Latvian 0.120408
## orig_lang Macedonian 0.788150
## orig_lang Malay NA
## orig_lang Malayalam 0.455844
## orig_lang Marathi 0.761736
## orig_lang No Language 0.624589
## orig_lang Norwegian 0.977094
## orig_lang Oriya 0.023573 *
## orig_lang Persian 0.295862
## orig_lang Polish 0.688647
## orig_lang Portuguese 0.711137
## orig_lang Romanian 0.197301
## orig_lang Russian 0.682934
## orig_lang Serbian 0.290918
## orig_lang Serbo-Croatian 0.864338
## orig_lang Slovak 0.986930
## orig_lang Spanish, Castilian 0.962813
## orig_lang Swedish 0.709366
## orig_lang Tagalog NA
## orig_lang Tamil 0.283671
## orig_lang Telugu 0.444444
## orig_lang Thai 0.824165
## orig_lang Turkish 0.626999
## orig_lang Ukrainian 0.446169
## orig_lang Vietnamese 0.328129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 55070000 on 10067 degrees of freedom
## Multiple R-squared: 0.07914, Adjusted R-squared: 0.06908
## F-statistic: 7.865 on 110 and 10067 DF, p-value: < 2.2e-16
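The linear regression model predicts the film budget (budget_x) from two categorical explanatory variables, country and orig_lang. It explains only about 7.9% of the variation in budget (multiple R-squared 0.079, adjusted R-squared 0.069), although the overall F-test (p < 2.2e-16) indicates that the predictors jointly explain more variation than an intercept-only model.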
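At the 5% level, a handful of country coefficients are significant: AU, CA, NL and US have negative estimates, while ID, IN and SG have positive ones. Among the original languages, only Oriya is significant. Each coefficient is a difference from the reference level of its factor (the level that has no row in the table), holding the other factor fixed. For instance, films from ‘AU’ have an estimated budget about 34.7 million lower than films from the reference country with the same original language (p = 0.0009).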
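The reference levels and the significant coefficients can be read off the fitted object rather than scanned by eye. The following is a minimal sketch on made-up data: the `toy` data frame, its values, and the names `toy_model`, `ref_levels` and `significant` are assumptions for illustration and are not part of the original analysis.

```r
# Toy data standing in for the real `data` frame (hypothetical values).
set.seed(1)
toy <- data.frame(
  country   = factor(rep(c("AR", "AU", "US"), each = 20)),
  orig_lang = factor(rep(c("English", "Spanish"), times = 30))
)
# Build in a clear negative effect for AU so that something is significant.
toy$budget_x <- 9e7 - 3e7 * (toy$country == "AU") + rnorm(60, sd = 1e7)

toy_model <- lm(budget_x ~ country + orig_lang, data = toy)

# The reference level of each factor is its first level; it gets no
# coefficient row because it is absorbed into the intercept.
ref_levels <- sapply(toy_model$xlevels, `[`, 1)

# Keep only the rows of the coefficient table with p < 0.05.
coefs <- coef(summary(toy_model))
significant <- coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]

print(ref_levels)
print(significant)
```

Applied to the real model, the same two steps would use `model` in place of `toy_model`.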
Highlight any issues with the model
# Residuals vs. Fitted Values Plot
plot(model, which = 1)
# Histogram of residuals (reuses the model fitted above instead of refitting it)
hist(residuals(model))
# Normal Q-Q Plot
plot(model, which = 2)
## Warning: not plotting observations with leverage one:
## 573, 624, 1349, 2378, 2679, 2808, 4397, 4917, 5288, 5422, 5582, 6470, 6949, 7045, 7218, 7220, 7576, 8259, 8371, 8639, 9315
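The linear regression model used for budget prediction based on the “country” and “orig_lang” predictors exhibits several issues. First, two coefficients (orig_lang Malay and orig_lang Tagalog) could not be estimated because of singularities: their dummy variables are exact linear combinations of other columns, which typically happens when a language occurs only in films from a single country. Second, the Q-Q plot warning lists 21 observations with leverage one, meaning each of those fitted values is determined entirely by its own observation, as happens when a factor level contains a single film. Third, the residuals are right-skewed (median about -12.8 million, maximum about 406.5 million), which casts doubt on the normality assumption. Finally, most of the estimated coefficients are not statistically significant and the model explains only about 8% of the variance, so its predictive power is weak.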
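Both the singularities and the leverage-one observations can be traced to specific factor levels. The sketch below uses a small made-up data frame in which one language occurs in only one country and one country has a single film; the data and the names `toy`, `toy_model`, `aliased` and `lev` are hypothetical and serve only to illustrate the checks.

```r
# Toy data (hypothetical): Malay occurs only with country MY and MY only with
# Malay, so their dummy columns are identical. XC has a single film.
toy <- data.frame(
  country   = factor(c("AU", "AU", "AU", "US", "US", "US", "MY", "MY", "XC")),
  orig_lang = factor(c("English", "French", "English", "English", "French",
                       "French", "Malay", "Malay", "English")),
  budget_x  = c(50, 60, 55, 90, 80, 85, 20, 25, 40) * 1e6
)
toy_model <- lm(budget_x ~ country + orig_lang, data = toy)

# Coefficients reported as NA are the ones dropped because of singularity.
aliased <- names(coef(toy_model))[is.na(coef(toy_model))]

# The cross-tabulation shows which level combinations never occur together.
print(table(toy$country, toy$orig_lang))

# A hat value of 1 means the observation alone determines its fitted value.
lev <- hatvalues(toy_model)
print(aliased)
print(round(lev, 3))
```

On the real data, running the same checks on `model` and `data` would show which countries and languages are too sparse to support separate coefficients. Merging rare levels into an "Other" category is one possible remedy.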
countryUS: -21,634,846
This coefficient represents the estimated effect of the “countryUS” predictor variable on the response variable “budget_x” in my model. Here’s the interpretation:
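The “countryUS” coefficient of approximately -21,634,846 indicates that, holding the original language fixed, a movie produced in the United States (US) has an expected budget (“budget_x”) about $21,634,846 lower than a movie produced in the reference country, that is, the country level that has no row of its own in the coefficient table. The comparison is with that reference country, not with the intercept value itself: the intercept is the expected budget for a film at the reference levels of both country and orig_lang.
In practical terms, among films with the same original language, US productions in this dataset have a lower estimated budget on average than productions from the reference country. The effect is significant at the 5% level (p = 0.038), but its standard error of about 10.4 million is large relative to the estimate, so the size of the difference is imprecisely estimated.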
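The comparison behind this coefficient can be written out explicitly. The notation is introduced here for illustration: $c_0$ stands for the reference country, $\ell$ for any fixed original language, and $\hat{y}$ for the fitted budget.

```latex
\[
\hat{y}(\text{country} = \text{US},\ \text{orig\_lang} = \ell)
\;-\;
\hat{y}(\text{country} = c_0,\ \text{orig\_lang} = \ell)
\;=\; \hat{\beta}_{\text{countryUS}}
\;\approx\; -21{,}634{,}846
\]
```

Because the model contains no interaction between country and orig_lang, this difference is the same for every language $\ell$.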