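The first step is to set the working directory to the local folder where the data resides.

getwd()                  # check the current working directory
setwd("path/to/data")    # placeholder path: replace with the folder that contains climate_change.csv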

Read the Data into R

climate<-read.csv("climate_change.csv")
str(climate)

'data.frame':   308 obs. of  11 variables:
 $ Year    : int  1983 1983 1983 1983 1983 1983 1983 1983 1984 1984 ...
 $ Month   : int  5 6 7 8 9 10 11 12 1 2 ...
 $ MEI     : num  2.556 2.167 1.741 1.13 0.428 ...
 $ CO2     : num  346 346 344 342 340 ...
 $ CH4     : num  1639 1634 1633 1631 1648 ...
 $ N2O     : num  304 304 304 304 304 ...
 $ CFC.11  : num  191 192 193 194 194 ...
 $ CFC.12  : num  350 352 354 356 357 ...
 $ TSI     : num  1366 1366 1366 1366 1366 ...
 $ Aerosols: num  0.0863 0.0794 0.0731 0.0673 0.0619 0.0569 0.0524 0.0486 0.0451 0.0416 ...
 $ Temp    : num  0.109 0.118 0.137 0.176 0.149 0.093 0.232 0.078 0.089 0.013 ...

Split the data into training and test sets. The training set includes all observations up to and including 2006.

train<-subset(climate,Year<=2006)
str(train)
'data.frame':   284 obs. of  11 variables:
 $ Year    : int  1983 1983 1983 1983 1983 1983 1983 1983 1984 1984 ...
 $ Month   : int  5 6 7 8 9 10 11 12 1 2 ...
 $ MEI     : num  2.556 2.167 1.741 1.13 0.428 ...
 $ CO2     : num  346 346 344 342 340 ...
 $ CH4     : num  1639 1634 1633 1631 1648 ...
 $ N2O     : num  304 304 304 304 304 ...
 $ CFC.11  : num  191 192 193 194 194 ...
 $ CFC.12  : num  350 352 354 356 357 ...
 $ TSI     : num  1366 1366 1366 1366 1366 ...
 $ Aerosols: num  0.0863 0.0794 0.0731 0.0673 0.0619 0.0569 0.0524 0.0486 0.0451 0.0416 ...
 $ Temp    : num  0.109 0.118 0.137 0.176 0.149 0.093 0.232 0.078 0.089 0.013 ...

The test set contains the remaining observations, from 2007 onward (308 - 284 = 24 rows):

test<-subset(climate,Year>2006)
str(test)
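The split above is chronological: the model is fit on the years up to 2006 and checked on later years. A minimal sketch of the same idea on a small synthetic data frame (the `toy` data and the `cutoff` variable are illustrative assumptions, not the climate data), confirming that the two subsets partition the rows with nothing lost or duplicated:

```r
# Synthetic stand-in for the climate data: 4 years x 12 months (illustrative only)
toy <- data.frame(
  Year  = rep(2005:2008, each = 12),
  Month = rep(1:12, times = 4),
  Temp  = (1:48) / 100
)

cutoff    <- 2006
train_toy <- subset(toy, Year <= cutoff)   # training years: up to and including the cutoff
test_toy  <- subset(toy, Year >  cutoff)   # held-out years: strictly after the cutoff

# The two subsets should partition the data: no row lost, none duplicated
stopifnot(nrow(train_toy) + nrow(test_toy) == nrow(toy))
stopifnot(max(train_toy$Year) <= cutoff, min(test_toy$Year) > cutoff)

c(train = nrow(train_toy), test = nrow(test_toy))
```

Applied to `climate`, the same two checks should hold with 284 training rows and 24 test rows.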

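Build a linear regression model on the training data, with Temp as the dependent variable and every other variable except Year and Month as an independent variable.

climatelm<-lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data=train)
summary(climatelm)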


Call:
lm(formula = Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + 
    TSI + Aerosols, data = train)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.25888 -0.05913 -0.00082  0.05649  0.32433 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.246e+02  1.989e+01  -6.265 1.43e-09 ***
MEI          6.421e-02  6.470e-03   9.923  < 2e-16 ***
CO2          6.457e-03  2.285e-03   2.826  0.00505 ** 
CH4          1.240e-04  5.158e-04   0.240  0.81015    
N2O         -1.653e-02  8.565e-03  -1.930  0.05467 .  
CFC.11      -6.631e-03  1.626e-03  -4.078 5.96e-05 ***
CFC.12       3.808e-03  1.014e-03   3.757  0.00021 ***
TSI          9.314e-02  1.475e-02   6.313 1.10e-09 ***
Aerosols    -1.538e+00  2.133e-01  -7.210 5.41e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09171 on 275 degrees of freedom
Multiple R-squared:  0.7509,    Adjusted R-squared:  0.7436 
F-statistic: 103.6 on 8 and 275 DF,  p-value: < 2.2e-16
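The formula above lists each independent variable explicitly. R's formula syntax also accepts `.` (all other columns) with `-` to drop terms, so the same model could be written as `lm(Temp ~ . - Year - Month, data = train)`. A small check of that equivalence, using the built-in `mtcars` data purely as a stand-in because it ships with R:

```r
# Explicit formula vs. the "." shorthand with two columns excluded
fit_explicit <- lm(mpg ~ disp + hp + drat + wt + qsec + am + gear + carb, data = mtcars)
fit_dot      <- lm(mpg ~ . - cyl - vs, data = mtcars)

# Same terms in the same column order, so the coefficient vectors should match
all.equal(coef(fit_explicit), coef(fit_dot))
```

The shorthand is convenient here because Year and Month sit in the data frame but are not meant to be predictors.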


What is the model R-squared value?

summary(climatelm)$r.squared 

The value is 0.7508933, matching the Multiple R-squared of 0.7509 reported in the summary.
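For a model with an intercept, R-squared is one minus the ratio of the residual sum of squares (SSE) to the total sum of squares (SST) of the response. A minimal sketch recomputing it by hand and checking it against `summary()$r.squared`, again with `mtcars` standing in for `train`:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

sse <- sum(residuals(fit)^2)                    # variation left unexplained by the model
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variation around the mean
r2_manual <- 1 - sse / sst

c(manual = r2_manual, summary = summary(fit)$r.squared)
```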

Which variables are significant in the model? We will consider a variable significant only if its p-value is below 0.05.

summary(climatelm)

Answer - MEI, CO2, CFC.11, CFC.12, TSI, Aerosols (CH4 and N2O, with p-values of 0.81 and 0.055, are not significant)
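Rather than reading the stars by eye, the p-values can be pulled from the coefficient matrix that `summary()` returns. A sketch on `mtcars` as stand-in data, using the same 0.05 cutoff as above:

```r
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
pvals <- coefs[-1, "Pr(>|t|)"]       # drop the intercept row, keep the p-value column
significant <- names(pvals)[pvals < 0.05]
significant
```

Applied to `climatelm`, the p-values in the summary imply this should return the same six variables given in the answer.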

Current scientific opinion is that nitrous oxide and CFC-11 are greenhouse gases: gases that are able to trap heat from the sun and contribute to the heating of the Earth. However, the regression coefficients of both the N2O and CFC-11 variables are negative, indicating that increasing atmospheric concentrations of either of these two compounds is associated with lower global temperatures.

Which of the following is the simplest correct explanation for this contradiction?
Answer - All of the gas concentration variables reflect human development, so N2O and CFC.11 are correlated with other variables in the data set.

The linear correlation of N2O and CFC.11 with other variables in the data set is quite large, so their individual coefficients are not reliable guides to their individual effects. The alternative explanations are unlikely: the warming effects of nitrous oxide and CFC-11 are well documented, and a regression analysis like this one is not enough to disprove them; nor is a lack of data to blame, since only eight coefficients and the intercept are estimated from 284 observations.
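A toy illustration of how correlated predictors can flip a coefficient's sign. The data below are constructed for the example, not taken from the climate set: viewed alone, `x2` moves together with `y`, but because `x2` is nearly a copy of `x1`, the multiple regression assigns it a negative coefficient.

```r
x1 <- 1:20
x2 <- x1 + rep(c(-0.5, 0.5), 10)   # x2 is x1 plus a small alternating wiggle
y  <- 2 * x1 - x2                  # constructed so the true x2 coefficient is -1

cor(x1, x2)                        # close to 1: the predictors are almost collinear
cor(x2, y)                         # positive: on its own, x2 rises with y
coef(lm(y ~ x1 + x2))              # jointly, x2 gets a negative coefficient
```

The same mechanism can make the N2O and CFC.11 coefficients negative even though each gas, on its own, is positively correlated with Temp.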

Compute the correlations between all the variables in the training set. Which independent variables is N2O highly correlated with (absolute correlation greater than 0.7)?

cor(train$N2O,train)

          Year      Month         MEI       CO2       CH4 N2O    CFC.11    CFC.12
[1,] 0.9938452 0.01363153 -0.05081978 0.9767198 0.8998386   1 0.5224773 0.8679308
           TSI   Aerosols      Temp
[1,] 0.1997567 -0.3370546 0.7786389

Answer - CO2, CH4, CFC.12 (Year also exceeds 0.7 but is not an independent variable in the model)
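Picking the strong correlations out of the printed row can also be automated. A minimal sketch with a hypothetical helper `high_cor()` (not part of base R), assuming every column of the data frame is numeric, as in `train`:

```r
# Hypothetical helper: names of columns whose absolute correlation with `target`
# exceeds `threshold`, skipping `target` itself and any `exclude`d columns.
high_cor <- function(df, target, threshold = 0.7, exclude = character(0)) {
  r <- cor(df[[target]], df)[1, ]   # 1-row correlation matrix -> named vector
  keep <- abs(r) > threshold & !(names(r) %in% c(target, exclude))
  names(r)[keep]
}

# Small constructed example: b and d are perfectly (anti-)correlated with a, c is not
df <- data.frame(a = 1:10, b = 2 * (1:10), c = rep(c(1, -1), 5), d = -(1:10))
high_cor(df, "a")
```

Given the correlations printed above, `high_cor(train, "N2O", exclude = c("Year", "Month", "Temp"))` should return "CO2", "CH4" and "CFC.12".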


Which independent variables is CFC.11 highly correlated with (absolute correlation greater than 0.7)?

cor(train$CFC.11,train)
          Year       Month        MEI       CO2      CH4       N2O CFC.11    CFC.12
[1,] 0.5691064 -0.01311122 0.06900044 0.5140597 0.779904 0.5224773      1 0.8689852
          TSI   Aerosols      Temp
[1,] 0.272046 -0.0439212 0.4077103

Answer - CH4, CFC.12
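Instead of checking one variable at a time, every strongly correlated pair can be listed from the full correlation matrix. A sketch with a hypothetical helper `high_cor_pairs()`, assuming all columns are numeric; `upper.tri()` keeps each pair once and drops the diagonal:

```r
# Hypothetical helper: every pair of columns with |correlation| above `threshold`
high_cor_pairs <- function(df, threshold = 0.7) {
  cm  <- cor(df)                                                      # full correlation matrix
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)   # each pair once
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx]
  )
}

df <- data.frame(a = 1:10, b = 2 * (1:10), c = rep(c(1, -1), 5), d = -(1:10))
high_cor_pairs(df)
```

Calling it on `train[, c("MEI", "CO2", "CH4", "N2O", "CFC.11", "CFC.12", "TSI", "Aerosols")]` would list all strongly correlated pairs among the eight predictors at once.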

Given that the correlations are so high, let us focus on the N2O variable and build a model with only MEI, TSI, Aerosols and N2O as independent variables, again using the training set.

climatelm2<-lm(Temp ~ MEI + N2O + TSI + Aerosols, data=train)
summary(climatelm2)

Call:
lm(formula = Temp ~ MEI + N2O + TSI + Aerosols, data = train)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.27916 -0.05975 -0.00595  0.05672  0.34195 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.162e+02  2.022e+01  -5.747 2.37e-08 ***
MEI          6.419e-02  6.652e-03   9.649  < 2e-16 ***
N2O          2.532e-02  1.311e-03  19.307  < 2e-16 ***
TSI          7.949e-02  1.487e-02   5.344 1.89e-07 ***
Aerosols    -1.702e+00  2.180e-01  -7.806 1.19e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09547 on 279 degrees of freedom
Multiple R-squared:  0.7261,    Adjusted R-squared:  0.7222 
F-statistic: 184.9 on 4 and 279 DF,  p-value: < 2.2e-16

In this reduced model the N2O coefficient is positive (2.532e-02), as expected for a greenhouse gas, while R-squared falls only slightly, from 0.7509 to 0.7261.
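One common way to quantify how much correlated predictors inflate coefficient uncertainty is the variance inflation factor (VIF), a diagnostic not used in the original analysis. For predictor j, VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing predictor j on all the other predictors. A base-R sketch with a hypothetical helper `vif_manual()`; the `car` package's `vif()` provides a packaged version:

```r
# VIF for each column of a data frame of predictors (hypothetical helper)
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    f  <- reformulate(others, response = v)            # e.g. x1 ~ x2 + x3
    r2 <- summary(lm(f, data = predictors))$r.squared  # how well the others explain v
    1 / (1 - r2)
  })
}

# Orthogonal predictors have VIF 1; nearly collinear ones have large VIFs
ortho <- data.frame(x1 = c(1, -1, 1, -1), x2 = c(1, 1, -1, -1))
x1 <- 1:20
near <- data.frame(x1 = x1, x2 = x1 + rep(c(-0.5, 0.5), 10))
vif_manual(ortho)
vif_manual(near)
```

`vif_manual(train[, c("MEI", "CO2", "CH4", "N2O", "CFC.11", "CFC.12", "TSI", "Aerosols")])` would give the VIFs for the full model's predictors; values above roughly 5 to 10 are commonly read as a sign of problematic collinearity.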

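Stepwise Regression

Starting from the full model climatelm, step() removes one variable at a time for as long as a removal lowers the AIC.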

StepModel<-step(climatelm)
Start:  AIC=-1348.16
Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols

           Df Sum of Sq    RSS     AIC
- CH4       1   0.00049 2.3135 -1350.1
<none>                  2.3130 -1348.2
- N2O       1   0.03132 2.3443 -1346.3
- CO2       1   0.06719 2.3802 -1342.0
- CFC.12    1   0.11874 2.4318 -1335.9
- CFC.11    1   0.13986 2.4529 -1333.5
- TSI       1   0.33516 2.6482 -1311.7
- Aerosols  1   0.43727 2.7503 -1301.0
- MEI       1   0.82823 3.1412 -1263.2

Step:  AIC=-1350.1
Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + Aerosols

           Df Sum of Sq    RSS     AIC
<none>                  2.3135 -1350.1
- N2O       1   0.03133 2.3448 -1348.3
- CO2       1   0.06672 2.3802 -1344.0
- CFC.12    1   0.13023 2.4437 -1336.5
- CFC.11    1   0.13938 2.4529 -1335.5
- TSI       1   0.33500 2.6485 -1313.7
- Aerosols  1   0.43987 2.7534 -1302.7
- MEI       1   0.83118 3.1447 -1264.9


summary(StepModel)

Call:
lm(formula = Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + 
    Aerosols, data = train)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.25770 -0.05994 -0.00104  0.05588  0.32203 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.245e+02  1.985e+01  -6.273 1.37e-09 ***
MEI          6.407e-02  6.434e-03   9.958  < 2e-16 ***
CO2          6.402e-03  2.269e-03   2.821 0.005129 ** 
N2O         -1.602e-02  8.287e-03  -1.933 0.054234 .  
CFC.11      -6.609e-03  1.621e-03  -4.078 5.95e-05 ***
CFC.12       3.868e-03  9.812e-04   3.942 0.000103 ***
TSI          9.312e-02  1.473e-02   6.322 1.04e-09 ***
Aerosols    -1.540e+00  2.126e-01  -7.244 4.36e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09155 on 276 degrees of freedom
Multiple R-squared:  0.7508,    Adjusted R-squared:  0.7445 
F-statistic: 118.8 on 7 and 276 DF,  p-value: < 2.2e-16


coef(StepModel)
  (Intercept)           MEI           CO2           N2O        CFC.11        CFC.12 
-1.245152e+02  6.406779e-02  6.401495e-03 -1.602113e-02 -6.609351e-03  3.867565e-03 
          TSI      Aerosols 
 9.311551e-02 -1.540206e+00
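Comparing climatelm with StepModel shows that `step()` removed only CH4 and kept N2O even though its p-value (0.054) is just above 0.05: dropping N2O would have raised the AIC from -1350.1 to -1348.3. AIC trades fit against model size rather than applying a fixed significance cutoff, and adjusted R-squared rises slightly, from 0.7436 to 0.7445. A sketch for listing the dropped terms programmatically, on `mtcars` as stand-in data (`trace = 0` silences the step-by-step printout):

```r
full    <- lm(mpg ~ ., data = mtcars)
reduced <- step(full, trace = 0)   # backward elimination by AIC, without printing

dropped <- setdiff(names(coef(full)), names(coef(reduced)))
dropped                            # terms that step() removed
c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared)
```

For the climate models, `setdiff(names(coef(climatelm)), names(coef(StepModel)))` should return just "CH4".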