Enter the model R^2 (the “Multiple R-squared” value):
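# Load the data; years up to and including 2006 form the training set, later years the testing set
C = read.csv("Unit2/climate_change.csv")
training = subset(C, Year <= 2006)
testing = subset(C, Year > 2006)
# Full model: regress Temp on all eight predictors, using the training set only
reg1 = lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data = training)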
summary(reg1)
##
## Call:
## lm(formula = Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 +
## TSI + Aerosols, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.25888 -0.05913 -0.00082 0.05649 0.32433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.246e+02 1.989e+01 -6.265 1.43e-09 ***
## MEI 6.421e-02 6.470e-03 9.923 < 2e-16 ***
## CO2 6.457e-03 2.285e-03 2.826 0.00505 **
## CH4 1.240e-04 5.158e-04 0.240 0.81015
## N2O -1.653e-02 8.565e-03 -1.930 0.05467 .
## CFC.11 -6.631e-03 1.626e-03 -4.078 5.96e-05 ***
## CFC.12 3.808e-03 1.014e-03 3.757 0.00021 ***
## TSI 9.314e-02 1.475e-02 6.313 1.10e-09 ***
## Aerosols -1.538e+00 2.133e-01 -7.210 5.41e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09171 on 275 degrees of freedom
## Multiple R-squared: 0.7509, Adjusted R-squared: 0.7436
## F-statistic: 103.6 on 8 and 275 DF, p-value: < 2.2e-16
#0.7509
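The adjusted R-squared on the same line of the summary can be recovered from the multiple R-squared. A minimal sketch, assuming the usual adjustment formula for a linear model with an intercept; the helper name adj_r2 is hypothetical, and n is inferred from the summary (275 residual degrees of freedom plus 9 estimated coefficients).

```r
# Adjusted R^2 from multiple R^2, sample size n, and number of predictors p
# (adj_r2 is an illustrative helper, not part of the original analysis)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 275 + 9                          # residual df + coefficients = 284 observations
adj_full <- adj_r2(0.7509, n, p = 8)  # full model: 8 predictors
adj_full                              # close to the reported 0.7436 (inputs are rounded)
```

The adjustment penalizes each extra predictor, which is why it sits slightly below the multiple R-squared.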
Which variables are significant in the model? We will consider a variable significant only if the p-value is below 0.05. (Select all that apply.)
#MEI, CO2, CFC.11, CFC.12, TSI, Aerosols
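The coefficients of N2O and CFC.11 are negative in this model, even though both are greenhouse gases and would be expected to be associated with higher temperatures. Which of the following is the simplest correct explanation for this contradiction?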
#3 All of the gas concentration variables reflect human development - N2O and CFC.11 are correlated with other variables in the data set.
Compute the correlations between all the variables in the training set. Which of the following independent variables is N2O highly correlated with (absolute correlation greater than 0.7)? Select all that apply.
cor(training)
## Year Month MEI CO2 CH4
## Year 1.00000000 -0.0279419602 -0.0369876842 0.98274939 0.91565945
## Month -0.02794196 1.0000000000 0.0008846905 -0.10673246 0.01856866
## MEI -0.03698768 0.0008846905 1.0000000000 -0.04114717 -0.03341930
## CO2 0.98274939 -0.1067324607 -0.0411471651 1.00000000 0.87727963
## CH4 0.91565945 0.0185686624 -0.0334193014 0.87727963 1.00000000
## N2O 0.99384523 0.0136315303 -0.0508197755 0.97671982 0.89983864
## CFC.11 0.56910643 -0.0131112236 0.0690004387 0.51405975 0.77990402
## CFC.12 0.89701166 0.0006751102 0.0082855443 0.85268963 0.96361625
## TSI 0.17030201 -0.0346061935 -0.1544919227 0.17742893 0.24552844
## Aerosols -0.34524670 0.0148895406 0.3402377871 -0.35615480 -0.26780919
## Temp 0.78679714 -0.0998567411 0.1724707512 0.78852921 0.70325502
## N2O CFC.11 CFC.12 TSI Aerosols
## Year 0.99384523 0.56910643 0.8970116635 0.17030201 -0.34524670
## Month 0.01363153 -0.01311122 0.0006751102 -0.03460619 0.01488954
## MEI -0.05081978 0.06900044 0.0082855443 -0.15449192 0.34023779
## CO2 0.97671982 0.51405975 0.8526896272 0.17742893 -0.35615480
## CH4 0.89983864 0.77990402 0.9636162478 0.24552844 -0.26780919
## N2O 1.00000000 0.52247732 0.8679307757 0.19975668 -0.33705457
## CFC.11 0.52247732 1.00000000 0.8689851828 0.27204596 -0.04392120
## CFC.12 0.86793078 0.86898518 1.0000000000 0.25530281 -0.22513124
## TSI 0.19975668 0.27204596 0.2553028138 1.00000000 0.05211651
## Aerosols -0.33705457 -0.04392120 -0.2251312440 0.05211651 1.00000000
## Temp 0.77863893 0.40771029 0.6875575483 0.24338269 -0.38491375
## Temp
## Year 0.78679714
## Month -0.09985674
## MEI 0.17247075
## CO2 0.78852921
## CH4 0.70325502
## N2O 0.77863893
## CFC.11 0.40771029
## CFC.12 0.68755755
## TSI 0.24338269
## Aerosols -0.38491375
## Temp 1.00000000
#CO2, CH4, CFC.12
Which of the following independent variables is CFC.11 highly correlated with? Select all that apply.
#CH4, CFC.12
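The two correlation questions can also be answered by filtering rather than by reading the matrix. A minimal sketch: the N2O and CFC.11 rows below are copied by hand from the cor(training) output and rounded to four decimals, restricted to the eight model predictors; the helper high_cor is hypothetical. With the data loaded, cor(training)["N2O", ] would supply the same values.

```r
# Correlations copied (rounded) from the cor(training) output, predictors only
n2o_cor <- c(MEI = -0.0508, CO2 = 0.9767, CH4 = 0.8998, N2O = 1.0000,
             CFC.11 = 0.5225, CFC.12 = 0.8679, TSI = 0.1998, Aerosols = -0.3371)
cfc11_cor <- c(MEI = 0.0690, CO2 = 0.5141, CH4 = 0.7799, N2O = 0.5225,
               CFC.11 = 1.0000, CFC.12 = 0.8690, TSI = 0.2720, Aerosols = -0.0439)

# Names of variables whose absolute correlation exceeds the cutoff,
# excluding the variable's correlation with itself
high_cor <- function(r, self, cutoff = 0.7) {
  r <- r[names(r) != self]
  names(r)[abs(r) > cutoff]
}

high_cor(n2o_cor, "N2O")       # "CO2" "CH4" "CFC.12"
high_cor(cfc11_cor, "CFC.11")  # "CH4" "CFC.12"
```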
Given that the correlations are so high, let us focus on the N2O variable and build a model with only MEI, TSI, Aerosols and N2O as independent variables. Remember to use the training set to build the model.
Enter the coefficient of N2O in this reduced model:
reg2= lm(Temp~ N2O+MEI+TSI+Aerosols, training)
summary(reg2)
##
## Call:
## lm(formula = Temp ~ N2O + MEI + TSI + Aerosols, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.27916 -0.05975 -0.00595 0.05672 0.34195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.162e+02 2.022e+01 -5.747 2.37e-08 ***
## N2O 2.532e-02 1.311e-03 19.307 < 2e-16 ***
## MEI 6.419e-02 6.652e-03 9.649 < 2e-16 ***
## TSI 7.949e-02 1.487e-02 5.344 1.89e-07 ***
## Aerosols -1.702e+00 2.180e-01 -7.806 1.19e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09547 on 279 degrees of freedom
## Multiple R-squared: 0.7261, Adjusted R-squared: 0.7222
## F-statistic: 184.9 on 4 and 279 DF, p-value: < 2.2e-16
#0.0253
Enter the model R^2:
#0.7261
Enter the R^2 value of the model produced by the step function:
reg1_step=step(reg1)
## Start: AIC=-1348.16
## Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols
##
## Df Sum of Sq RSS AIC
## - CH4 1 0.00049 2.3135 -1350.1
## <none> 2.3130 -1348.2
## - N2O 1 0.03132 2.3443 -1346.3
## - CO2 1 0.06719 2.3802 -1342.0
## - CFC.12 1 0.11874 2.4318 -1335.9
## - CFC.11 1 0.13986 2.4529 -1333.5
## - TSI 1 0.33516 2.6482 -1311.7
## - Aerosols 1 0.43727 2.7503 -1301.0
## - MEI 1 0.82823 3.1412 -1263.2
##
## Step: AIC=-1350.1
## Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + Aerosols
##
## Df Sum of Sq RSS AIC
## <none> 2.3135 -1350.1
## - N2O 1 0.03133 2.3448 -1348.3
## - CO2 1 0.06672 2.3802 -1344.0
## - CFC.12 1 0.13023 2.4437 -1336.5
## - CFC.11 1 0.13938 2.4529 -1335.5
## - TSI 1 0.33500 2.6485 -1313.7
## - Aerosols 1 0.43987 2.7534 -1302.7
## - MEI 1 0.83118 3.1447 -1264.9
summary(reg1_step)
##
## Call:
## lm(formula = Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI +
## Aerosols, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.25770 -0.05994 -0.00104 0.05588 0.32203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.245e+02 1.985e+01 -6.273 1.37e-09 ***
## MEI 6.407e-02 6.434e-03 9.958 < 2e-16 ***
## CO2 6.402e-03 2.269e-03 2.821 0.005129 **
## N2O -1.602e-02 8.287e-03 -1.933 0.054234 .
## CFC.11 -6.609e-03 1.621e-03 -4.078 5.95e-05 ***
## CFC.12 3.868e-03 9.812e-04 3.942 0.000103 ***
## TSI 9.312e-02 1.473e-02 6.322 1.04e-09 ***
## Aerosols -1.540e+00 2.126e-01 -7.244 4.36e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09155 on 276 degrees of freedom
## Multiple R-squared: 0.7508, Adjusted R-squared: 0.7445
## F-statistic: 118.8 on 7 and 276 DF, p-value: < 2.2e-16
#0.7508
Which of the following variable(s) were eliminated from the full model by the step function? Select all that apply.
#CH4
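The AIC values in the step trace can be reproduced from the printed RSS, which shows why dropping CH4 was the only accepted move. A minimal sketch, assuming the form that step uses for lm fits via extractAIC, n * log(RSS / n) + 2 * k, where k counts the estimated coefficients including the intercept; the helper name aic_lm is hypothetical and the RSS inputs are the rounded values from the trace.

```r
# AIC as reported in the step() trace for a linear model
# (aic_lm is an illustrative helper; rss values are copied from the trace)
aic_lm <- function(rss, n, k) {
  n * log(rss / n) + 2 * k
}

n <- 284                                   # training observations
aic_full     <- aic_lm(2.3130, n, k = 9)   # all 8 predictors + intercept
aic_drop_ch4 <- aic_lm(2.3135, n, k = 8)   # CH4 removed

aic_full       # about -1348.16, the "Start" value in the trace
aic_drop_ch4   # about -1350.1, the value after the first step
```

Removing CH4 raises the RSS only slightly while saving one parameter, so the AIC falls. In the second pass every further removal would raise the AIC, so the procedure stops.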
Using the model produced by the step function, calculate temperature predictions for the testing data set with the predict function. Enter the testing set R^2:
P= predict(reg1_step, newdata= testing)
P
## 285 286 287 288 289 290 291
## 0.4677808 0.4435404 0.4265541 0.4299162 0.4455113 0.4151422 0.4097367
## 292 293 294 295 296 297 298
## 0.3839390 0.3255595 0.3274147 0.3231401 0.3316704 0.3522134 0.3313129
## 299 300 301 302 303 304 305
## 0.3142112 0.3703410 0.4162213 0.4391458 0.4237965 0.3913679 0.3587615
## 306 307 308
## 0.3451991 0.3607087 0.3638076
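# Out-of-sample R^2: compare the model's squared error with that of a
# baseline that always predicts the training-set mean of Temp
SSE = sum((P - testing$Temp)^2)
SST = sum((mean(training$Temp) - testing$Temp)^2)
R2 = 1 - SSE/SST
R2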
## [1] 0.6286051
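The testing set calculation can be wrapped in a small function to make the role of the baseline explicit. A minimal sketch on made-up numbers (not the climate data); the function name test_r2 and its arguments are hypothetical.

```r
# Out-of-sample R^2 relative to a constant baseline prediction
# (illustrative helper; baseline_mean would be mean(training$Temp) in the analysis)
test_r2 <- function(pred, actual, baseline_mean) {
  sse <- sum((pred - actual)^2)            # model's squared error
  sst <- sum((baseline_mean - actual)^2)   # baseline's squared error
  1 - sse / sst
}

actual <- c(1, 2, 3)                       # made-up test outcomes
r2_good     <- test_r2(c(1, 2, 4), actual, baseline_mean = 2)  # 0.5
r2_baseline <- test_r2(c(2, 2, 2), actual, baseline_mean = 2)  # 0
r2_bad      <- test_r2(c(3, 2, 1), actual, baseline_mean = 2)  # -3
```

Unlike the training R^2, this quantity can be negative: a model that predicts worse than the training mean scores below zero on the testing set.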