First, make sure the working directory is the local folder where the data resides. getwd() prints the current directory; use setwd() to change it if needed
getwd()
Read the Data into R
climate<-read.csv("climate_change.csv")
str(climate)
'data.frame': 308 obs. of 11 variables:
$ Year : int 1983 1983 1983 1983 1983 1983 1983 1983 1984 1984 ...
$ Month : int 5 6 7 8 9 10 11 12 1 2 ...
$ MEI : num 2.556 2.167 1.741 1.13 0.428 ...
$ CO2 : num 346 346 344 342 340 ...
$ CH4 : num 1639 1634 1633 1631 1648 ...
$ N2O : num 304 304 304 304 304 ...
$ CFC.11 : num 191 192 193 194 194 ...
$ CFC.12 : num 350 352 354 356 357 ...
$ TSI : num 1366 1366 1366 1366 1366 ...
$ Aerosols: num 0.0863 0.0794 0.0731 0.0673 0.0619 0.0569 0.0524 0.0486 0.0451 0.0416 ...
$ Temp : num 0.109 0.118 0.137 0.176 0.149 0.093 0.232 0.078 0.089 0.013 ...
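When read.csv cannot find the file, the working directory is the usual culprit. Below is a minimal sketch of a guard that fails early with a readable message. The helper name read_data_checked and the throwaway file demo_climate.csv are assumptions made only so the example runs on its own; they are not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): stop with a clear
# message when the CSV is not where we expect it to be.
read_data_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory is ", getwd(), ")")
  }
  read.csv(path)
}

# Self-contained demo: write a tiny stand-in file to a temp folder, read it back.
demo_path <- file.path(tempdir(), "demo_climate.csv")
write.csv(data.frame(Year = c(1983, 1983), Month = c(5, 6)),
          demo_path, row.names = FALSE)
demo <- read_data_checked(demo_path)
str(demo)
```

With the real data the call would simply be read_data_checked("climate_change.csv").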
Split the data into training and test sets. The training set includes all data up to and including 2006
train<-subset(climate,Year<=2006)
str(train)
'data.frame': 284 obs. of 11 variables:
$ Year : int 1983 1983 1983 1983 1983 1983 1983 1983 1984 1984 ...
$ Month : int 5 6 7 8 9 10 11 12 1 2 ...
$ MEI : num 2.556 2.167 1.741 1.13 0.428 ...
$ CO2 : num 346 346 344 342 340 ...
$ CH4 : num 1639 1634 1633 1631 1648 ...
$ N2O : num 304 304 304 304 304 ...
$ CFC.11 : num 191 192 193 194 194 ...
$ CFC.12 : num 350 352 354 356 357 ...
$ TSI : num 1366 1366 1366 1366 1366 ...
$ Aerosols: num 0.0863 0.0794 0.0731 0.0673 0.0619 0.0569 0.0524 0.0486 0.0451 0.0416 ...
$ Temp : num 0.109 0.118 0.137 0.176 0.149 0.093 0.232 0.078 0.089 0.013 ...
The test set includes the remaining data, from 2007 onward
test<-subset(climate,Year>2006)
str(test)
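A quick sanity check on a split like this is that the two subsets together account for every row. The sketch below rebuilds only the Year and Month columns, assuming the data runs monthly from May 1983 through December 2008 (an assumption that is consistent with the 308 observations reported above, but not stated in the source). The names calendar, train_rows and test_rows are illustrative.

```r
# Hypothetical stand-in for the Year/Month columns of the data set:
# one row per month from May 1983 through December 2008 (assumed range).
calendar <- expand.grid(Month = 1:12, Year = 1983:2008)
calendar <- subset(calendar, !(Year == 1983 & Month < 5))

# Same split rule as in the analysis.
train_rows <- subset(calendar, Year <= 2006)
test_rows  <- subset(calendar, Year > 2006)

# The two pieces must cover every row exactly once.
stopifnot(nrow(train_rows) + nrow(test_rows) == nrow(calendar))
c(total = nrow(calendar), train = nrow(train_rows), test = nrow(test_rows))
# total 308, train 284, test 24
```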
Build a linear regression model on the training data
climatelm<-lm(Temp ~ MEI+ CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data=train)
summary(climatelm)
Call:
lm(formula = Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 +
TSI + Aerosols, data = train)
Residuals:
Min 1Q Median 3Q Max
-0.25888 -0.05913 -0.00082 0.05649 0.32433
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.246e+02 1.989e+01 -6.265 1.43e-09 ***
MEI 6.421e-02 6.470e-03 9.923 < 2e-16 ***
CO2 6.457e-03 2.285e-03 2.826 0.00505 **
CH4 1.240e-04 5.158e-04 0.240 0.81015
N2O -1.653e-02 8.565e-03 -1.930 0.05467 .
CFC.11 -6.631e-03 1.626e-03 -4.078 5.96e-05 ***
CFC.12 3.808e-03 1.014e-03 3.757 0.00021 ***
TSI 9.314e-02 1.475e-02 6.313 1.10e-09 ***
Aerosols -1.538e+00 2.133e-01 -7.210 5.41e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09171 on 275 degrees of freedom
Multiple R-squared: 0.7509, Adjusted R-squared: 0.7436
F-statistic: 103.6 on 8 and 275 DF, p-value: < 2.2e-16
Report the model R2 value
summary(climatelm)$r.squared
The value is 0.7508933
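For reference, R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below recomputes it by hand and compares it with the value reported by summary(). It uses the built-in mtcars data as a stand-in so that it runs without the climate file; the model mpg ~ wt + hp is purely illustrative.

```r
# mtcars ships with R; it stands in for the climate data here.
fit <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit)^2)                     # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total sum of squares
r2_manual <- 1 - rss / tss

# The hand computation agrees with the value stored in the summary object.
all.equal(r2_manual, summary(fit)$r.squared)
```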
Which variables are significant in the model? We will consider a variable significant only if its p-value is below 0.05.
summary(climatelm)
Answer - MEI,CO2,CFC.11,CFC.12,TSI,Aerosols
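The same answer can be read off programmatically rather than by eye. The sketch below copies the p-values from the coefficient table above into a named vector (MEI is printed as "< 2e-16", so 2e-16 is used as an upper bound) and keeps the names below the 0.05 cutoff. The variable names p_values and significant are illustrative.

```r
# p-values copied from the printed coefficient table (intercept omitted).
# MEI is reported as "< 2e-16"; 2e-16 is used here as an upper bound.
p_values <- c(MEI = 2e-16, CO2 = 0.00505, CH4 = 0.81015, N2O = 0.05467,
              CFC.11 = 5.96e-05, CFC.12 = 0.00021, TSI = 1.10e-09,
              Aerosols = 5.41e-12)

# Keep only the variables whose p-value is below the 0.05 threshold.
significant <- names(p_values)[p_values < 0.05]
significant
# "MEI" "CO2" "CFC.11" "CFC.12" "TSI" "Aerosols"

# With the fitted model in memory, the same vector can be taken from
# coef(summary(climatelm))[-1, "Pr(>|t|)"] instead of being typed in.
```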
Current scientific opinion is that nitrous oxide and CFC-11 are greenhouse gases: gases that are able to trap heat from the sun and contribute to the heating of the Earth. However, the regression coefficients of both the N2O and CFC-11 variables are negative, indicating that increasing atmospheric concentrations of either of these two compounds is associated with lower global temperatures.
Which of the following is the simplest correct explanation for this contradiction?
Answer - All of the gas concentration variables reflect human development - N2O and CFC.11 are correlated with other variables in the data set.
The linear correlation of N2O and CFC.11 with other variables in the data set is quite large, which makes their individual coefficients hard to interpret. The alternative explanation, that the regression shows these gases are not greenhouse gases, does not seem correct: the warming effect of nitrous oxide and CFC-11 is well documented, and our regression analysis is not enough to disprove it. The explanation that there is too little data is also unlikely, as we have estimated eight coefficients and the intercept from 284 observations
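To see how correlated predictors can produce a coefficient with a surprising sign, here is a minimal sketch on deterministic toy data (an assumption for illustration, not climate data). The variable x2 moves almost in lockstep with x1, so on its own it has a positive slope, yet its coefficient turns negative once x1 is also in the model.

```r
# Deterministic toy data, invented for illustration only.
idx <- 1:20
x1 <- idx
x2 <- x1 + sin(idx)                        # strongly correlated with x1
y  <- 3 * x1 - 1 * x2 + 0.1 * cos(2 * idx) # x2 enters with a negative weight

cor(x1, x2)                                # close to 1

slope_alone    <- coef(lm(y ~ x2))["x2"]       # x2 as the only predictor
slope_adjusted <- coef(lm(y ~ x1 + x2))["x2"]  # x2 alongside x1
c(alone = unname(slope_alone), adjusted = unname(slope_adjusted))
# alone is positive, adjusted is negative
```

The sign of a single coefficient in a model with highly correlated predictors therefore says little about that variable's own relationship with the outcome.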
Compute the correlations between all the variables in the training set. Which independent variables is N2O highly correlated with (absolute correlation greater than 0.7)?
cor(train$N2O,train)
Year Month MEI CO2 CH4 N2O CFC.11 CFC.12
[1,] 0.9938452 0.01363153 -0.05081978 0.9767198 0.8998386 1 0.5224773 0.8679308
TSI Aerosols Temp
[1,] 0.1997567 -0.3370546 0.7786389
Answer - CO2,CH4,CFC.12
Which independent variables is CFC.11 highly correlated with (absolute correlation greater than 0.7)?
cor(train$CFC.11,train)
Year Month MEI CO2 CH4 N2O CFC.11 CFC.12
[1,] 0.5691064 -0.01311122 0.06900044 0.5140597 0.779904 0.5224773 1 0.8689852
TSI Aerosols Temp
[1,] 0.272046 -0.0439212 0.4077103
Answer - CH4,CFC.12
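The 0.7 cutoff can be applied in code instead of by inspection. The sketch below copies the correlations of N2O and CFC.11 with the other independent variables from the output above and filters them by absolute value. The helper highly_correlated is a hypothetical name introduced for this example.

```r
# Hypothetical helper: names of the entries whose absolute correlation
# exceeds the threshold.
highly_correlated <- function(cors, threshold = 0.7) {
  names(cors)[abs(cors) > threshold]
}

# Correlations copied from the printed output (independent variables only;
# Year, Month, Temp and the variable itself are left out).
n2o_cor <- c(MEI = -0.05081978, CO2 = 0.9767198, CH4 = 0.8998386,
             CFC.11 = 0.5224773, CFC.12 = 0.8679308, TSI = 0.1997567,
             Aerosols = -0.3370546)
cfc11_cor <- c(MEI = 0.06900044, CO2 = 0.5140597, CH4 = 0.779904,
               N2O = 0.5224773, CFC.12 = 0.8689852, TSI = 0.272046,
               Aerosols = -0.0439212)

highly_correlated(n2o_cor)    # "CO2" "CH4" "CFC.12"
highly_correlated(cfc11_cor)  # "CH4" "CFC.12"
```

With the training set in memory, cor(train) returns the full correlation matrix, and a row of it can be passed to the same helper.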
Given that the correlations are so high, let us focus on the N2O variable and build a model with only MEI, TSI, Aerosols and N2O as independent variables. Remember to use the training set to build the model.
```{}
climatelm2<-lm(Temp ~ MEI+ N2O + TSI + Aerosols, data=train)
summary(climatelm2)
Call:
lm(formula = Temp ~ MEI + N2O + TSI + Aerosols, data = train)
Residuals:
Min 1Q Median 3Q Max
-0.27916 -0.05975 -0.00595 0.05672 0.34195
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.162e+02 2.022e+01 -5.747 2.37e-08 ***
MEI 6.419e-02 6.652e-03 9.649 < 2e-16 ***
N2O 2.532e-02 1.311e-03 19.307 < 2e-16 ***
TSI 7.949e-02 1.487e-02 5.344 1.89e-07 ***
Aerosols -1.702e+00 2.180e-01 -7.806 1.19e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09547 on 279 degrees of freedom
Multiple R-squared: 0.7261, Adjusted R-squared: 0.7222
F-statistic: 184.9 on 4 and 279 DF, p-value: < 2.2e-16
```
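Because the two models use different numbers of predictors, adjusted R2 is the fairer comparison. As a worked check, the sketch below applies the usual formula, 1 - (1 - R2)(n - 1)/(n - p - 1), to the R2 values printed above, with n = 284 training rows and p predictors. The function name adjusted_r2 is introduced here for illustration.

```r
# Adjusted R-squared from plain R-squared, sample size n and predictor count p.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Full model: R2 = 0.7508933 with 8 predictors (values from the output above).
adj_full <- adjusted_r2(0.7508933, n = 284, p = 8)

# Reduced model: R2 = 0.7261 with 4 predictors.
adj_reduced <- adjusted_r2(0.7261, n = 284, p = 4)

c(full = adj_full, reduced = adj_reduced)
# about 0.7436 and 0.7222, in line with the printed summaries
```

The reduced model gives up a little explanatory power, but the N2O coefficient is now positive, in line with its role as a greenhouse gas.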
Stepwise selection: use the step function to simplify the full model by AIC
StepModel<-step(climatelm)
Start: AIC=-1348.16
Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols
Df Sum of Sq RSS AIC
- CH4 1 0.00049 2.3135 -1350.1
<none> 2.3130 -1348.2
- N2O 1 0.03132 2.3443 -1346.3
- CO2 1 0.06719 2.3802 -1342.0
- CFC.12 1 0.11874 2.4318 -1335.9
- CFC.11 1 0.13986 2.4529 -1333.5
- TSI 1 0.33516 2.6482 -1311.7
- Aerosols 1 0.43727 2.7503 -1301.0
- MEI 1 0.82823 3.1412 -1263.2
Step: AIC=-1350.1
Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + Aerosols
Df Sum of Sq RSS AIC
<none> 2.3135 -1350.1
- N2O 1 0.03133 2.3448 -1348.3
- CO2 1 0.06672 2.3802 -1344.0
- CFC.12 1 0.13023 2.4437 -1336.5
- CFC.11 1 0.13938 2.4529 -1335.5
- TSI 1 0.33500 2.6485 -1313.7
- Aerosols 1 0.43987 2.7534 -1302.7
- MEI 1 0.83118 3.1447 -1264.9
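The AIC figures in the trace can be reproduced from the RSS column. For linear models, step scores candidates with n*log(RSS/n) + 2*k, where n is the number of observations and k the number of estimated coefficients, including the intercept. The sketch below applies that formula to the printed values; the helper name aic_from_rss is hypothetical.

```r
# Hypothetical helper: the AIC criterion that step() reports for lm fits,
# written out from RSS, sample size n and coefficient count k.
aic_from_rss <- function(rss, n, k) {
  n * log(rss / n) + 2 * k
}

n <- 284  # training observations

# Full model: RSS 2.3130, 8 predictors + intercept = 9 coefficients.
aic_full <- aic_from_rss(2.3130, n, k = 9)

# After dropping CH4: RSS 2.3135, 7 predictors + intercept = 8 coefficients.
aic_no_ch4 <- aic_from_rss(2.3135, n, k = 8)

round(c(full = aic_full, no_CH4 = aic_no_ch4), 2)
# -1348.16 and -1350.10, matching the trace
```

Dropping CH4 barely changes the RSS but saves one parameter, so the AIC improves. Removing any further variable would raise it, which is why the procedure stops after one step.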
summary(StepModel)
Call:
lm(formula = Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI +
Aerosols, data = train)
Residuals:
Min 1Q Median 3Q Max
-0.25770 -0.05994 -0.00104 0.05588 0.32203
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.245e+02 1.985e+01 -6.273 1.37e-09 ***
MEI 6.407e-02 6.434e-03 9.958 < 2e-16 ***
CO2 6.402e-03 2.269e-03 2.821 0.005129 **
N2O -1.602e-02 8.287e-03 -1.933 0.054234 .
CFC.11 -6.609e-03 1.621e-03 -4.078 5.95e-05 ***
CFC.12 3.868e-03 9.812e-04 3.942 0.000103 ***
TSI 9.312e-02 1.473e-02 6.322 1.04e-09 ***
Aerosols -1.540e+00 2.126e-01 -7.244 4.36e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09155 on 276 degrees of freedom
Multiple R-squared: 0.7508, Adjusted R-squared: 0.7445
F-statistic: 118.8 on 7 and 276 DF, p-value: < 2.2e-16
coef(StepModel)
(Intercept) MEI CO2 N2O CFC.11 CFC.12
-1.245152e+02 6.406779e-02 6.401495e-03 -1.602113e-02 -6.609351e-03 3.867565e-03
TSI Aerosols
9.311551e-02 -1.540206e+00
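The test set created earlier is a natural way to check how well a selected model generalizes. The sketch below shows one way to score a fitted model on held-out rows, using the built-in mtcars data as a stand-in so that it runs without the climate file. The split and the model formula are assumptions for illustration, and the baseline for the total sum of squares is taken to be the training-set mean.

```r
# Stand-in data: first 24 rows for fitting, last 8 rows held out.
fit_rows <- mtcars[1:24, ]
holdout  <- mtcars[25:32, ]

fit  <- lm(mpg ~ wt + hp, data = fit_rows)
pred <- predict(fit, newdata = holdout)   # predictions for unseen rows

# Out-of-sample R-squared: compare the model's errors on the held-out rows
# with those of a baseline that always predicts the training mean.
sse <- sum((holdout$mpg - pred)^2)
sst <- sum((holdout$mpg - mean(fit_rows$mpg))^2)
test_r2 <- 1 - sse / sst
test_r2

# For the climate models the analogous call would be
# predict(StepModel, newdata = test), scored against test$Temp.
```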