In this example, we look at the relationship between two variables obtained from the World Bank: the CO2 emissions for the Philippines from 1971 to 2014 (the column CO2.from.Buildings) and the annual GDP for the Philippines. There is a strong linear relationship between these two variables, with an R-squared value of 0.8057 and a statistically significant p-value (< 2.2e-16), so higher GDP is associated with higher CO2 emissions. However, note that the relationship could also be due to time, since both variables rise over the period.

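# knitr renders the table; lemon supplies a knitr print method for data frames
library(knitr)
library(lemon)
knit_print.data.frame <- lemon_print
# Read the World Bank extract (local path on the author's machine)
philippines = read.csv("/Users/Michele/Desktop/Philippines.csv")
knitr::kable(philippines)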
Year CO2.from.Buildings GDP....
1971 26.46675 201.0337
1972 25.83333 211.4010
1973 29.39450 258.3627
1974 30.38653 343.2417
1975 30.13131 360.6714
1976 30.07571 402.6633
1977 34.24447 450.1250
1978 31.83422 506.0852
1979 34.07679 596.3953
1980 33.15332 684.6544
1981 33.03825 731.7250
1982 33.39587 741.7871
1983 34.77366 645.4603
1984 33.42345 594.0256
1985 36.35088 565.7635
1986 30.18223 535.2358
1987 36.76155 579.2011
1988 34.89177 643.8153
1989 35.54645 704.9821
1990 27.99685 715.3106
1991 32.14573 715.1419
1992 31.56089 814.0753
1993 29.46124 815.7222
1994 32.02366 939.1559
1995 30.81761 1061.3479
1996 32.07274 1159.5893
1997 32.35294 1127.0037
1998 34.10171 966.7084
1999 31.64878 1087.2374
2000 37.45412 1038.9110
2001 37.22370 957.2807
2002 36.91456 1000.0681
2003 38.96537 1010.5532
2004 39.28521 1079.0372
2005 42.99105 1194.6972
2006 40.47001 1391.7723
2007 42.00000 1672.6854
2008 44.47723 1919.4662
2009 43.86284 1825.3415
2010 45.16924 2129.4992
2011 46.12812 2352.5182
2012 47.92885 2581.8186
2013 49.73781 2760.2891
2014 50.16195 2842.9384
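
The column names above are awkward to type because read.csv passes headers through make.names, which replaces each invalid character with a dot. A minimal sketch of renaming them follows, using the first three rows of the table. The short names year, co2 and gdp are assumptions introduced for illustration, and the header "GDP ($)" is purely hypothetical, since the original CSV header is not shown.

```r
# First three rows of the table above, with the names read.csv produced.
first_rows <- data.frame(
  Year = c(1971, 1972, 1973),
  CO2.from.Buildings = c(26.46675, 25.83333, 29.39450),
  GDP.... = c(201.0337, 211.4010, 258.3627)
)

# Replace the sanitized names with short ones (assumed names, not the author's).
names(first_rows) <- c("year", "co2", "gdp")
print(first_rows)

# make.names turns every invalid character into a dot; this hypothetical
# header is one example of an input that yields a name ending in four dots.
sanitized <- make.names("GDP ($)")
print(sanitized)
```

With short names, later calls can refer to first_rows$gdp instead of first_rows$GDP.....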

A summary of the data is shown below.

summary(philippines)
##       Year      CO2.from.Buildings    GDP....      
##  Min.   :1971   Min.   :25.83      Min.   : 201.0  
##  1st Qu.:1982   1st Qu.:31.63      1st Qu.: 590.3  
##  Median :1992   Median :34.09      Median : 814.9  
##  Mean   :1992   Mean   :35.70      Mean   :1020.8  
##  3rd Qu.:2003   3rd Qu.:39.05      3rd Qu.:1135.2  
##  Max.   :2014   Max.   :50.16      Max.   :2842.9

We will use the variables GDP and CO2 to see if there is a relationship between emissions and finances. Note that the model below is fit with GDP as the response and CO2 as the predictor.

co2_gdp = lm(philippines$GDP.... ~ philippines$CO2.from.Buildings)
summary(co2_gdp)
## 
## Call:
## lm(formula = philippines$GDP.... ~ philippines$CO2.from.Buildings)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -545.39 -205.68  -40.42  219.16  519.36 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -2478.606    269.054  -9.212 1.23e-11 ***
## philippines$CO2.from.Buildings    98.015      7.428  13.195  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 301.1 on 42 degrees of freedom
## Multiple R-squared:  0.8057, Adjusted R-squared:  0.801 
## F-statistic: 174.1 on 1 and 42 DF,  p-value: < 2.2e-16
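
The model above has GDP as the response, while the discussion talks about GDP driving CO2. For a regression with a single predictor, R-squared is the squared correlation between the two variables, so it is the same in either direction. The sketch below illustrates this with the values transcribed from the table above. The short column names and the CO2 value of 40 used for prediction are assumptions chosen for illustration.

```r
# Data transcribed from the table above (1971-2014); column names are assumed.
philippines <- data.frame(
  year = 1971:2014,
  co2 = c(26.46675, 25.83333, 29.39450, 30.38653, 30.13131, 30.07571,
          34.24447, 31.83422, 34.07679, 33.15332, 33.03825, 33.39587,
          34.77366, 33.42345, 36.35088, 30.18223, 36.76155, 34.89177,
          35.54645, 27.99685, 32.14573, 31.56089, 29.46124, 32.02366,
          30.81761, 32.07274, 32.35294, 34.10171, 31.64878, 37.45412,
          37.22370, 36.91456, 38.96537, 39.28521, 42.99105, 40.47001,
          42.00000, 44.47723, 43.86284, 45.16924, 46.12812, 47.92885,
          49.73781, 50.16195),
  gdp = c(201.0337, 211.4010, 258.3627, 343.2417, 360.6714, 402.6633,
          450.1250, 506.0852, 596.3953, 684.6544, 731.7250, 741.7871,
          645.4603, 594.0256, 565.7635, 535.2358, 579.2011, 643.8153,
          704.9821, 715.3106, 715.1419, 814.0753, 815.7222, 939.1559,
          1061.3479, 1159.5893, 1127.0037, 966.7084, 1087.2374, 1038.9110,
          957.2807, 1000.0681, 1010.5532, 1079.0372, 1194.6972, 1391.7723,
          1672.6854, 1919.4662, 1825.3415, 2129.4992, 2352.5182, 2581.8186,
          2760.2891, 2842.9384)
)

# Same direction as the model above, written with a data argument.
fit_gdp_on_co2 <- lm(gdp ~ co2, data = philippines)
# Reversed direction: CO2 as the response, GDP as the predictor.
fit_co2_on_gdp <- lm(co2 ~ gdp, data = philippines)

r2_forward <- summary(fit_gdp_on_co2)$r.squared
r2_reverse <- summary(fit_co2_on_gdp)$r.squared
r2_from_cor <- cor(philippines$co2, philippines$gdp)^2

# All three values agree; compare them with the Multiple R-squared above.
print(c(forward = r2_forward, reverse = r2_reverse, cor_squared = r2_from_cor))

# Predicted GDP at a hypothetical CO2 value of 40.
predicted_gdp <- predict(fit_gdp_on_co2, newdata = data.frame(co2 = 40))
print(predicted_gdp)
```

Passing data = philippines rather than writing philippines$gdp inside the formula is what lets predict accept new data. The slopes of the two fits differ even though R-squared does not, so the direction still matters when interpreting the coefficients.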

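Next, we check the validity of this linear model by looking at the four diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. Notice that the plots indicate that the residuals are approximately normally distributed (the normal Q-Q plot addresses normality directly; the others check for non-linearity, non-constant variance, and influential points), so this appears to be a reasonable linear model.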

plot(co2_gdp)
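
The diagnostic plots are read by eye, so a rough numeric complement can help. The sketch below runs a Shapiro-Wilk test on the residuals and computes the Durbin-Watson statistic by hand, since yearly observations may have correlated residuals, which the four plots do not test. It uses only the 2000 to 2014 rows of the table above to stay short, with assumed short column names, so its numbers will not match the full model.

```r
# Subset of the table above (2000-2014); column names are assumed.
recent <- data.frame(
  year = 2000:2014,
  co2 = c(37.45412, 37.22370, 36.91456, 38.96537, 39.28521, 42.99105,
          40.47001, 42.00000, 44.47723, 43.86284, 45.16924, 46.12812,
          47.92885, 49.73781, 50.16195),
  gdp = c(1038.9110, 957.2807, 1000.0681, 1010.5532, 1079.0372, 1194.6972,
          1391.7723, 1672.6854, 1919.4662, 1825.3415, 2129.4992, 2352.5182,
          2581.8186, 2760.2891, 2842.9384)
)

fit <- lm(gdp ~ co2, data = recent)
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal,
# so a small p-value is evidence against normality.
normality <- shapiro.test(res)
print(normality$p.value)

# Durbin-Watson statistic computed by hand; values near 2 suggest little
# lag-1 autocorrelation, values near 0 suggest positive autocorrelation.
durbin_watson <- sum(diff(res)^2) / sum(res^2)
print(durbin_watson)
```

With so few points, both checks have little power, so they are best treated as a supplement to the plots rather than a verdict.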

Finally, let’s look at the relationship between the two variables, along with how each one changes over time. Notice that the two variables have a strong positive linear relationship and that each follows a very similar upward pattern over time, which is why time may account for part of the relationship.

plot(philippines$CO2.from.Buildings, philippines$GDP....)

plot(philippines$Year, philippines$GDP....)

plot(philippines$Year, philippines$CO2.from.Buildings)
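
Since both series trend upward, one way to probe whether time explains the relationship is to add the year as a covariate, or to correlate the year-over-year changes instead of the levels. The sketch below does both on the 2000 to 2014 rows of the table above. The short column names are assumptions, and this is only one of several possible ways to adjust for a shared trend.

```r
# Subset of the table above (2000-2014); column names are assumed.
recent <- data.frame(
  year = 2000:2014,
  co2 = c(37.45412, 37.22370, 36.91456, 38.96537, 39.28521, 42.99105,
          40.47001, 42.00000, 44.47723, 43.86284, 45.16924, 46.12812,
          47.92885, 49.73781, 50.16195),
  gdp = c(1038.9110, 957.2807, 1000.0681, 1010.5532, 1079.0372, 1194.6972,
          1391.7723, 1672.6854, 1919.4662, 1825.3415, 2129.4992, 2352.5182,
          2581.8186, 2760.2891, 2842.9384)
)

# Option 1: include year in the model, so the co2 coefficient describes the
# association with GDP after accounting for a linear time trend.
fit_with_year <- lm(gdp ~ co2 + year, data = recent)
coef_table <- coef(summary(fit_with_year))
print(coef_table)

# Option 2: correlate year-over-year changes, which removes the shared trend.
change_co2 <- diff(recent$co2)
change_gdp <- diff(recent$gdp)
change_cor <- cor(change_co2, change_gdp)
print(change_cor)
```

If the co2 coefficient shrinks sharply once year is included, or the correlation of changes is weak, that would suggest much of the original relationship reflects the common trend over time.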