In this example, we look at the relationship between two variables obtained from the World Bank: CO2 emissions for the Philippines from 1971 to 2014 and annual GDP for the Philippines over the same period. There is a strong relationship between these two variables, with an R-squared value of 0.8057 and a statistically significant p-value, so higher GDP is associated with higher CO2 emissions. However, this association alone does not show that a rising GDP causes the increase in emissions: both variables rise over the period, so the relationship could also be due to time.
library(knitr)
library(lemon)
knit_print.data.frame <- lemon_print
philippines = read.csv("/Users/Michele/Desktop/Philippines.csv")
knitr::kable(philippines)
| Year | CO2.from.Buildings | GDP.... |
|---|---|---|
| 1971 | 26.46675 | 201.0337 |
| 1972 | 25.83333 | 211.4010 |
| 1973 | 29.39450 | 258.3627 |
| 1974 | 30.38653 | 343.2417 |
| 1975 | 30.13131 | 360.6714 |
| 1976 | 30.07571 | 402.6633 |
| 1977 | 34.24447 | 450.1250 |
| 1978 | 31.83422 | 506.0852 |
| 1979 | 34.07679 | 596.3953 |
| 1980 | 33.15332 | 684.6544 |
| 1981 | 33.03825 | 731.7250 |
| 1982 | 33.39587 | 741.7871 |
| 1983 | 34.77366 | 645.4603 |
| 1984 | 33.42345 | 594.0256 |
| 1985 | 36.35088 | 565.7635 |
| 1986 | 30.18223 | 535.2358 |
| 1987 | 36.76155 | 579.2011 |
| 1988 | 34.89177 | 643.8153 |
| 1989 | 35.54645 | 704.9821 |
| 1990 | 27.99685 | 715.3106 |
| 1991 | 32.14573 | 715.1419 |
| 1992 | 31.56089 | 814.0753 |
| 1993 | 29.46124 | 815.7222 |
| 1994 | 32.02366 | 939.1559 |
| 1995 | 30.81761 | 1061.3479 |
| 1996 | 32.07274 | 1159.5893 |
| 1997 | 32.35294 | 1127.0037 |
| 1998 | 34.10171 | 966.7084 |
| 1999 | 31.64878 | 1087.2374 |
| 2000 | 37.45412 | 1038.9110 |
| 2001 | 37.22370 | 957.2807 |
| 2002 | 36.91456 | 1000.0681 |
| 2003 | 38.96537 | 1010.5532 |
| 2004 | 39.28521 | 1079.0372 |
| 2005 | 42.99105 | 1194.6972 |
| 2006 | 40.47001 | 1391.7723 |
| 2007 | 42.00000 | 1672.6854 |
| 2008 | 44.47723 | 1919.4662 |
| 2009 | 43.86284 | 1825.3415 |
| 2010 | 45.16924 | 2129.4992 |
| 2011 | 46.12812 | 2352.5182 |
| 2012 | 47.92885 | 2581.8186 |
| 2013 | 49.73781 | 2760.2891 |
| 2014 | 50.16195 | 2842.9384 |
A summary of the data can be found below.
summary(philippines)
##       Year      CO2.from.Buildings    GDP....
##  Min.   :1971   Min.   :25.83      Min.   : 201.0
##  1st Qu.:1982   1st Qu.:31.63      1st Qu.: 590.3
##  Median :1992   Median :34.09      Median : 814.9
##  Mean   :1992   Mean   :35.70      Mean   :1020.8
##  3rd Qu.:2003   3rd Qu.:39.05      3rd Qu.:1135.2
##  Max.   :2014   Max.   :50.16      Max.   :2842.9
We will fit a linear model with GDP as the response and CO2 emissions as the predictor to see whether there is a relationship between emissions and economic output.
co2_gdp = lm(philippines$GDP.... ~ philippines$CO2.from.Buildings)
summary(co2_gdp)
##
## Call:
## lm(formula = philippines$GDP.... ~ philippines$CO2.from.Buildings)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -545.39 -205.68  -40.42  219.16  519.36
##
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    -2478.606    269.054  -9.212 1.23e-11 ***
## philippines$CO2.from.Buildings    98.015      7.428  13.195  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 301.1 on 42 degrees of freedom
## Multiple R-squared: 0.8057, Adjusted R-squared: 0.801
## F-statistic: 174.1 on 1 and 42 DF, p-value: < 2.2e-16
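The model above treats GDP as the response, while the introduction talks about GDP driving emissions. For a simple linear regression the direction does not change R-squared, which is just the squared correlation between the two variables. The sketch below illustrates this with the values transcribed from the table above; the names `ph`, `co2`, `gdp`, `fit_gdp`, and `fit_co2` are hypothetical and are not part of the original analysis.

```r
# Values transcribed from the table above (1971 to 2014).
year <- 1971:2014
co2 <- c(26.46675, 25.83333, 29.39450, 30.38653, 30.13131, 30.07571,
         34.24447, 31.83422, 34.07679, 33.15332, 33.03825, 33.39587,
         34.77366, 33.42345, 36.35088, 30.18223, 36.76155, 34.89177,
         35.54645, 27.99685, 32.14573, 31.56089, 29.46124, 32.02366,
         30.81761, 32.07274, 32.35294, 34.10171, 31.64878, 37.45412,
         37.22370, 36.91456, 38.96537, 39.28521, 42.99105, 40.47001,
         42.00000, 44.47723, 43.86284, 45.16924, 46.12812, 47.92885,
         49.73781, 50.16195)
gdp <- c(201.0337, 211.4010, 258.3627, 343.2417, 360.6714, 402.6633,
         450.1250, 506.0852, 596.3953, 684.6544, 731.7250, 741.7871,
         645.4603, 594.0256, 565.7635, 535.2358, 579.2011, 643.8153,
         704.9821, 715.3106, 715.1419, 814.0753, 815.7222, 939.1559,
         1061.3479, 1159.5893, 1127.0037, 966.7084, 1087.2374, 1038.9110,
         957.2807, 1000.0681, 1010.5532, 1079.0372, 1194.6972, 1391.7723,
         1672.6854, 1919.4662, 1825.3415, 2129.4992, 2352.5182, 2581.8186,
         2760.2891, 2842.9384)
ph <- data.frame(year, co2, gdp)  # hypothetical name for the rebuilt data

# Same direction as the model in the text: GDP as response.
fit_gdp <- lm(gdp ~ co2, data = ph)
# Reverse direction: CO2 as response, GDP as predictor.
fit_co2 <- lm(co2 ~ gdp, data = ph)

r <- cor(ph$co2, ph$gdp)
r2_gdp <- summary(fit_gdp)$r.squared
r2_co2 <- summary(fit_co2)$r.squared

# All three numbers should agree; compare them with the R-squared above.
print(c(r_squared = r^2, gdp_on_co2 = r2_gdp, co2_on_gdp = r2_co2))
```

Because R-squared is the same in both directions, the fit by itself says nothing about which variable drives the other.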
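Next, we will check the validity of this linear model by looking at the four diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. In these plots the residuals appear approximately normally distributed, which supports the use of a linear model. Note that normality is judged from the Q-Q plot; the other three plots check for non-linearity, non-constant variance, and influential observations.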
plot(co2_gdp)
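As a complement to reading the plots by eye, the sketch below arranges the four diagnostic plots in one grid and runs a Shapiro-Wilk test on the residuals. It uses `co2_gdp` if that object exists in the workspace; otherwise it falls back to a stand-in model on R's built-in `cars` data, which is an assumption made only so that the example runs by itself. The names `fit`, `old_par`, and `sw` are hypothetical.

```r
# Use the model from the text if it is available; otherwise fall back to a
# stand-in fit on the built-in `cars` data so the sketch is runnable alone.
fit <- if (exists("co2_gdp")) co2_gdp else lm(dist ~ speed, data = cars)

# Show the four diagnostic plots in a single 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout

# Shapiro-Wilk test on the residuals: a small p-value suggests a
# departure from normality, a large one gives no evidence against it.
sw <- shapiro.test(residuals(fit))
print(sw)
```

A formal test like this is only a rough check with 44 observations, so it is best read alongside the Q-Q plot rather than instead of it.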
Finally, let’s look at the relationship between the two variables, along with how each variable changes over time (the Year variable). Notice that CO2 emissions and GDP have a strong positive linear relationship, and that both follow a very similar upward pattern over time, which is why time may account for part of the relationship.
plot(philippines$CO2.from.Buildings, philippines$GDP....)
plot(philippines$Year, philippines$GDP....)
plot(philippines$Year, philippines$CO2.from.Buildings)
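To illustrate why a shared time trend matters, the sketch below simulates two series that each depend only on time and not on each other. The data are simulated, not the Philippines data, and the names `yr`, `x`, `y`, `naive`, and `adjusted` are hypothetical. The slopes, noise level, and seed are arbitrary choices made for illustration.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n  <- 44             # same number of years as the series above
yr <- seq_len(n)

# Two simulated series driven by time alone; neither depends on the other.
x <- 2 * yr + rnorm(n, sd = 5)
y <- 3 * yr + rnorm(n, sd = 5)

naive    <- lm(y ~ x)       # ignores time
adjusted <- lm(y ~ x + yr)  # includes time as a covariate

r2_naive       <- summary(naive)$r.squared
slope_naive    <- unname(coef(naive)["x"])
slope_adjusted <- unname(coef(adjusted)["x"])

# The naive fit shows a strong relationship; once time is in the model,
# the slope on x shrinks toward zero.
print(c(r2_naive = r2_naive,
        slope_naive = slope_naive,
        slope_adjusted = slope_adjusted))
```

For the data in this example, the analogous check would be to add the year to the model, for instance `lm(GDP.... ~ CO2.from.Buildings + Year, data = philippines)`, and see how much the CO2 coefficient changes.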