In this paper you will estimate the IPAT (Impact = Population x Affluence x Technolgoy) model (McNicoll, 2014) with a sampled dataset from the World Bank’s World Development Indicators (WDI) for 2015 (the latest year for which the renewable energy variable is available). You will decide whether or not to apply logistic transformation to the data and then you have to add urbanisation URBAN as an explanatory factor to the model. Based on your results and diagnostics you have to discuss whether urbanisation could be a relevant explanatory variable or a confounding variable (see for example Wu et al, 2016).
Finally you will use the model you have build to project the CO2 emissions towards 2030.
Equation 1: The IPAT model. \[ log\:EMISSIONS_{i} = a_0 + \beta_1*\; log\:POPULATION_{i} + \beta_2*\; log\:GDPCAP_{i} + \beta_3*\;log\:INDUSTRY_{i} + \beta_4*\;log\:RENEW_i + \epsilon_{i} \]
We downloaded the dataset in R using the WDI package and stored it in the datasets EMIT.Rda / EMIT.sav that are appended to the exam paper. You can also download the data yourself if you work in R and use the code pasted here. Note that you can look up the metadata for the variables in the WDI by copying the variable symbols or tickers (e.g. SP.POP.TOTL etc.) into a Google search window.
library(WDI)
EMIT <- data.frame(WDI(country="all",
indicator=c("EN.ATM.CO2E.KT",
"SP.POP.TOTL",
"NY.GDP.PCAP.PP.KD",
"NV.IND.TOTL.ZS",
"EG.FEC.RNEW.ZS",
"SP.URB.TOTL.IN.ZS"),
start=2015, end=2015, extra=TRUE))
EMIT <- subset(EMIT, region != "Aggregates")
colnames(EMIT)[4] <- "EMISSIONS"
colnames(EMIT)[5] <- "POPULATION"
colnames(EMIT)[6] <- "GDPCAP"
colnames(EMIT)[7] <- "INDUSTRY"
colnames(EMIT)[8] <- "RENEW"
colnames(EMIT)[9] <- "URBAN"
EMIT$logEMISSIONS <- log(EMIT$EMISSIONS)
EMIT$logPOPULATION <- log(EMIT$POPULATION)
EMIT$logGDPCAP <- log(EMIT$GDPCAP)
EMIT$logINDUSTRY <- log(EMIT$INDUSTRY)
EMIT$logRENEW <- log(EMIT$RENEW+1)
EMIT$logURBAN <- log(EMIT$URBAN)
save(EMIT, file = "EMIT.rda")
library(haven)
write_sav(EMIT, "EMIT.sav")
The dataset contains the following variables in normal and log transformed values (apart from basic country descriptors):
## First we get an overview of the dataset by generating the basic descriptive statistics..
summary(EMIT)
iso2c country year EMISSIONS
Length:217 Length:217 Min. :2015 Min. : 11
Class :character Class :character 1st Qu.:2015 1st Qu.: 1889
Mode :character Mode :character Median :2015 Median : 9149
Mean :2015 Mean : 162301
3rd Qu.:2015 3rd Qu.: 57317
Max. :2015 Max. :10145005
NA's :13
POPULATION GDPCAP INDUSTRY RENEW
Min. :1.110e+04 Min. : 825.2 Min. : 2.073 Min. : 0.000
1st Qu.:7.575e+05 1st Qu.: 4670.4 1st Qu.:17.119 1st Qu.: 4.759
Median :6.372e+06 Median : 12605.1 Median :23.660 Median :19.281
Mean :3.385e+07 Mean : 20830.8 Mean :24.386 Mean :29.069
3rd Qu.:2.324e+07 3rd Qu.: 28767.6 3rd Qu.:30.013 3rd Qu.:46.476
Max. :1.371e+09 Max. :120294.9 Max. :61.362 Max. :95.818
NA's :1 NA's :24 NA's :21 NA's :4
URBAN iso3c region capital
Min. : 12.08 Length:217 Length:217 Length:217
1st Qu.: 41.16 Class :character Class :character Class :character
Median : 60.78 Mode :character Mode :character Mode :character
Mean : 60.02
3rd Qu.: 79.52
Max. :100.00
NA's :3
longitude latitude income lending
Length:217 Length:217 Length:217 Length:217
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
logEMISSIONS logPOPULATION logGDPCAP logINDUSTRY
Min. : 2.398 Min. : 9.315 Min. : 6.716 Min. :0.7291
1st Qu.: 7.542 1st Qu.:13.538 1st Qu.: 8.449 1st Qu.:2.8402
Median : 9.121 Median :15.667 Median : 9.442 Median :3.1638
Mean : 9.185 Mean :15.172 Mean : 9.376 Mean :3.0786
3rd Qu.:10.956 3rd Qu.:16.962 3rd Qu.:10.267 3rd Qu.:3.4016
Max. :16.132 Max. :21.039 Max. :11.698 Max. :4.1168
NA's :13 NA's :1 NA's :24 NA's :21
logRENEW logURBAN
Min. :0.000 Min. :2.491
1st Qu.:1.751 1st Qu.:3.717
Median :3.010 Median :4.107
Mean :2.724 Mean :3.992
3rd Qu.:3.860 3rd Qu.:4.376
Max. :4.573 Max. :4.605
NA's :4 NA's :3
Estimate the IPAT model with the dataset EMIT - with and without logarithmic transformation. Also draw a scatterplot matrix for all the variables in the dataset. Discuss whether logarithmic transformation is appropriate for this particular model?
## Let us try to make the scatterplot matrices first:
pairs(~EMISSIONS+POPULATION+GDPCAP+INDUSTRY+RENEW+URBAN,data=EMIT,
main="Simple Scatterplot before Transf")
pairs(~logEMISSIONS+logPOPULATION+logGDPCAP+logINDUSTRY+logRENEW+logURBAN, data=EMIT, main="Simple Scatterplot after Tranformation")
## Only the log transformed variables meet the assumption of linearity. Therefore we need the log transformed model.
## Next we estimate the IPAT model with and without logarithmic transformation
Fit_normal <- lm(EMISSIONS ~ POPULATION + GDPCAP + INDUSTRY + RENEW, data=EMIT)
Fit_log <- lm(logEMISSIONS ~ logPOPULATION + logGDPCAP + logINDUSTRY + logRENEW, data=EMIT)
library(stargazer)
stargazer::stargazer(Fit_normal, Fit_log, title = "Table R1: IPAT model with and without log transformation ", dep.var.labels=c("Emissions in kiloton per year"), omit.stat=c("f", "ser"), align=TRUE, no.space=TRUE, type="text")
Table R1: IPAT model with and without log transformation
========================================================
Dependent variable:
------------------------------------------
Emissions in kiloton per year logEMISSIONS
(1) (2)
--------------------------------------------------------
POPULATION 0.005***
(0.0003)
GDPCAP 3.218*
(1.908)
INDUSTRY 1,103.429
(3,370.049)
RENEW -2,370.018
(1,452.651)
logPOPULATION 1.005***
(0.020)
logGDPCAP 0.918***
(0.038)
logINDUSTRY 0.343***
(0.081)
logRENEW -0.315***
(0.034)
Constant -28,077.580 -14.983***
(113,751.700) (0.461)
--------------------------------------------------------
Observations 184 184
R2 0.678 0.962
Adjusted R2 0.671 0.961
========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
## The multiple regression analysis also confirms that the model fit significantly improves after the log tranformation - for example R2 increases with nearly 30%. All the variables are significant in Column 2 in Table R1. We can interpret the coefficient estimates as follows: a one percentage increase in x leads to a B increase in y. For example, a 1% increase in population increases CO~2~ emissions with 1.005% etc.
Now include also the variable URBAN into your model. Discuss the results and investigate whether there could be a problem of multicollinearity or a source of heteroscedasticity by including this variable into the model? For example, conduct the diagnostics for the model with and without the URBAN variable included.
## Next we estimate the IPAT model with and without logarithmic transformation
Fit_log <- lm(logEMISSIONS ~ logPOPULATION + logGDPCAP + logINDUSTRY + logRENEW, data=EMIT)
Fit_log_urban <- lm(logEMISSIONS ~ logPOPULATION + logGDPCAP + logINDUSTRY + logRENEW + logURBAN, data=EMIT)
library(stargazer)
stargazer::stargazer(Fit_log, Fit_log_urban, title = "Table R2: IPAT model with and without URBAN", dep.var.labels=c("Emissions in kiloton per year"), omit.stat=c("f", "ser"), align=TRUE, no.space=TRUE, type="text")
Table R2: IPAT model with and without URBAN
============================================
Dependent variable:
------------------------------
Emissions in kiloton per year
(1) (2)
--------------------------------------------
logPOPULATION 1.005*** 1.004***
(0.020) (0.020)
logGDPCAP 0.918*** 0.871***
(0.038) (0.048)
logINDUSTRY 0.343*** 0.342***
(0.081) (0.080)
logRENEW -0.315*** -0.311***
(0.034) (0.034)
logURBAN 0.170
(0.106)
Constant -14.983*** -15.226***
(0.461) (0.476)
--------------------------------------------
Observations 184 183
R2 0.962 0.963
Adjusted R2 0.961 0.962
============================================
Note: *p<0.1; **p<0.05; ***p<0.01
## The URBAN variable is not significant and therefore appears not to be relevant and hence also not a confounder.
library(lmtest)
plot(Fit_normal, 1)
plot(Fit_normal, 2)
## The first plot is the equivalent to testing for heteroscedasticity - we can see that the assumptions are violated especially because of the outliers. The second plot (QQ plot) visualises the normality assumption. Also here we can see the main problem in the untranformed data are the large outliers. They are the same in the two plots: observations 46 (China), 97 (India) and 222 (United States). All very large countries.
library(lmtest)
plot(Fit_log, 1)
plot(Fit_log, 2)
## The model assumptions are better met in the log tranformed model. The outliers are now observations 39 (Congo), 51 (Curacao) and 139 (Niger). All small countries.
## Now let us check if there is any change to the residual plots when including the URBAN variable:
plot(Fit_log_urban, 1)
plot(Fit_log_urban, 2)
## There is no change at all.
Use the column ‘region’ in the dataset to create a categorical variable that you call ECA and that takes a value of 1 when the country is in the region ‘Europe & Central Asia’ and 0 otherwise. Now include the ECA dummy into the IPAT model (again with and without the URBAN variable). Interpret the result for the ECA dummy.
EMIT$ECA <- ifelse(EMIT$region=="Europe & Central Asia", 1, 0)
Fit_log_ECA <- lm(logEMISSIONS ~ logPOPULATION + logGDPCAP + logINDUSTRY + logRENEW + ECA, data=EMIT)
Fit_log_urban_ECA <- lm(logEMISSIONS ~ logPOPULATION + logGDPCAP + logINDUSTRY + logRENEW + logURBAN + ECA, data=EMIT)
library(stargazer)
stargazer::stargazer(Fit_log_ECA, Fit_log_urban_ECA, title = "Table R2: IPAT model with ECA dummy", dep.var.labels=c("Emissions in kiloton per year"), omit.stat=c("f", "ser"), align=TRUE, no.space=TRUE, type="text")
Table R2: IPAT model with ECA dummy
============================================
Dependent variable:
------------------------------
Emissions in kiloton per year
(1) (2)
--------------------------------------------
logPOPULATION 1.003*** 1.002***
(0.019) (0.019)
logGDPCAP 0.861*** 0.812***
(0.042) (0.052)
logINDUSTRY 0.343*** 0.344***
(0.080) (0.079)
logRENEW -0.341*** -0.335***
(0.034) (0.034)
logURBAN 0.187*
(0.104)
ECA 0.269*** 0.255***
(0.092) (0.092)
Constant -14.420*** -14.708***
(0.491) (0.503)
--------------------------------------------
Observations 184 183
R2 0.964 0.965
Adjusted R2 0.963 0.964
============================================
Note: *p<0.1; **p<0.05; ***p<0.01
## According to the results in Table R2 the ECA region has higher emissions levels that the rest of the world. Controlling for the ECA region makes the URBAN variable significant. The explanatory power of the affluence variable (e.g. GDPCAP) is somewhat reduced by including these two variables into the model. The other explanatory factors are constant. There could be a potentially confounding relationship between GDPCAP and URBAN since they are strongly collinear across time. But this result is not born out in a cross section analysis because the countries are at so different levels of development and hence at different stages of their urbanisation process.
## Again checking residual diagnostics for the last equation - because we will use it for prediction, so need to check that it does not violate BLUE assumptions..
## Out of curiousity let us also check if it changes the residual plots to include the URBAN variable -
plot(Fit_log_urban_ECA, 1)
plot(Fit_log_urban_ECA, 2)
## Again there is hardly any change to the residual plots.
## As a backdrop to the new result - e.g. URBAN is significant when we include ECA, let us check the correlation matrix
COR <- EMIT[c(4:9, 23)]
cor(COR, use='complete.obs')
EMISSIONS POPULATION GDPCAP INDUSTRY RENEW
EMISSIONS 1.00000000 0.811769761 0.06409496 0.12857604 -0.121391719
POPULATION 0.81176976 1.000000000 -0.06479765 0.12104810 -0.007958916
GDPCAP 0.06409496 -0.064797654 1.00000000 0.10818056 -0.448830703
INDUSTRY 0.12857604 0.121048099 0.10818056 1.00000000 -0.092151094
RENEW -0.12139172 -0.007958916 -0.44883070 -0.09215109 1.000000000
URBAN 0.06385599 -0.064099942 0.66789862 0.11741882 -0.534818392
ECA -0.03847362 -0.086567201 0.30535015 0.03076116 -0.177002516
URBAN ECA
EMISSIONS 0.06385599 -0.03847362
POPULATION -0.06409994 -0.08656720
GDPCAP 0.66789862 0.30535015
INDUSTRY 0.11741882 0.03076116
RENEW -0.53481839 -0.17700252
URBAN 1.00000000 0.19200709
ECA 0.19200709 1.00000000
## A confounder would also be revealed by a very high correlation coefficient. But multicollinearity does not appear to be a problem - for example the Pearson correlation coefficient between URBAN and GDPCAP is the highest among the independent variables, but not too high - e.g. it is 0.67 according to the correlation matrix below. Note also that there is a negative correlation on average between GDPCAP and RENEW!!
Which of the avenues for cutting CO2 emissions levels (e.g. percentage reduction in population, percentage reduction in affluence, percentage reduction of industry or percentage switch towards renewables) will be relatively more or less effective according to your results?
## We do not need standardised coefficient estimates in this case, because the variables are log transformed and therefore on comparable scales (e.g. percentages) that are also much easier to interpret than standardised coefficients. The largest contribution to reduction would come from 1% reduction in population, then GDPCAP, then INDUSTRY, then RENEW and finally URBAN - e.g. in the order that the independent variables are presented in the model.
Use the model to make predictions for CO2 emissions for European and non-European countries (e.g. by filling in Table 1) under the following scenarios (assuming the current population growth rate continues at 1.05% per year and that INDUSTRY and URBAN remains constant until 2030):
Scenario 1 (baseline): GDP per capita increases with 1.5% per year until 2030, the switch to renewables takes pace with around 2% per year until 2030.
Scenario 2 (low growth, green transition): GDP per capita increases with 0.5% per year until 2030, the switch to renewables goes up to 5% per year until 2030.
Scenario 3 (high growth, modest green transition): GDP per capita increases with 3% per year until 2030, the switch to renewables continues slowly at around 0.5% per year until 2030.
Hint - Use the formulae for compound annual growth rate (e.g. (1+g)^15) to project the averages (from your basic descriptive statistics tables). Insert the projections into your preferred prediction equation under Question 3.
Also, discuss what the implicit assumption of these predictions is?
## These calculations could be made manually and/or in a spreadsheet as well.
## First we can create a new dataset based on the EMIT dataset - but storing the mean values (I did it separately for ECA and non-ECA as it makes more sense for the projections) - and because we need to insert the mean values to make the predictions:
EMIT_ECA_1 <- EMIT[which(EMIT$ECA==1), ]
EMIT_num_ECA_1 <- EMIT_ECA_1[c(4:9)]
EMIT_ECA_0 <- EMIT[which(EMIT$ECA==0), ]
EMIT_num_ECA_0 <- EMIT_ECA_0[c(4:9)]
EMIT_p_ECA_1 <- data.frame(t(colMeans(EMIT_num_ECA_1, na.rm=TRUE)))
EMIT_p_ECA_0 <- data.frame(t(colMeans(EMIT_num_ECA_0, na.rm=TRUE)))
## Scenario 1 calculations for ECA:
EMIT_p_s1 <- EMIT_p_ECA_1
EMIT_p_s1$logPOPULATION <- log(EMIT_p_s1$POPULATION*(1.0105)^15)
EMIT_p_s1$logGDPCAP <- log(EMIT_p_s1$GDPCAP*(1.015)^15)
EMIT_p_s1$logINDUSTRY <- log(EMIT_p_s1$INDUSTRY)
EMIT_p_s1$logRENEW <- log(EMIT_p_s1$RENEW*(1.02)^15)
EMIT_p_s1$logURBAN <- log(EMIT_p_s1$URBAN)
EMIT_p_s1$ECA <- 1
S1_ECA_1 <- predict(Fit_log_urban_ECA, newdata=EMIT_p_s1)
## Scenario 1 calculations for non-ECA:
EMIT_p_s1 <- EMIT_p_ECA_0
EMIT_p_s1$logPOPULATION <- log(EMIT_p_s1$POPULATION*(1.0105)^15)
EMIT_p_s1$logGDPCAP <- log(EMIT_p_s1$GDPCAP*(1.015)^15)
EMIT_p_s1$logINDUSTRY <- log(EMIT_p_s1$INDUSTRY)
EMIT_p_s1$logRENEW <- log(EMIT_p_s1$RENEW*(1.02)^15)
EMIT_p_s1$logURBAN <- log(EMIT_p_s1$URBAN)
EMIT_p_s1$ECA <- 0
S1_ECA_0 <- predict(Fit_log_urban_ECA, newdata=EMIT_p_s1)
exp(S1_ECA_1)
1
116702.8
exp(S1_ECA_0)
1
118612.8
## Scenario 2 calculations for ECA:
EMIT_p_s2 <- EMIT_p_ECA_1
EMIT_p_s2$logPOPULATION <- log(EMIT_p_s2$POPULATION*(1.0105)^15)
EMIT_p_s2$logGDPCAP <- log(EMIT_p_s2$GDPCAP*(1.005)^15)
EMIT_p_s2$logINDUSTRY <- log(EMIT_p_s2$INDUSTRY)
EMIT_p_s2$logRENEW <- log(EMIT_p_s2$RENEW*(1.05)^15)
EMIT_p_s2$logURBAN <- log(EMIT_p_s2$URBAN)
EMIT_p_s2$ECA <- 1
S2_ECA_1 <- predict(Fit_log_urban_ECA, newdata=EMIT_p_s2)
## Scenario 2 calculations for non-ECA:
EMIT_p_s2 <- EMIT_p_ECA_0
EMIT_p_s2$logPOPULATION <- log(EMIT_p_s2$POPULATION*(1.0105)^15)
EMIT_p_s2$logGDPCAP <- log(EMIT_p_s2$GDPCAP*(1.005)^15)
EMIT_p_s2$logINDUSTRY <- log(EMIT_p_s2$INDUSTRY)
EMIT_p_s2$logRENEW <- log(EMIT_p_s2$RENEW*(1.05)^15)
EMIT_p_s2$logURBAN <- log(EMIT_p_s2$URBAN)
EMIT_p_s2$ECA <- 0
S2_ECA_0 <- predict(Fit_log_urban_ECA, newdata=EMIT_p_s2)
exp(S2_ECA_1)
1
89414.57
exp(S2_ECA_0)
1
90877.94
## Scenario 3 calculations for ECA:
EMIT_p_s3 <- EMIT_p_ECA_1
EMIT_p_s3$logPOPULATION <- log(EMIT_p_s3$POPULATION*(1.0105)^15)
EMIT_p_s3$logGDPCAP <- log(EMIT_p_s3$GDPCAP*(1.03)^15)
EMIT_p_s3$logINDUSTRY <- log(EMIT_p_s3$INDUSTRY)
EMIT_p_s3$logRENEW <- log(EMIT_p_s3$RENEW*(1.005)^15)
EMIT_p_s3$logURBAN <- log(EMIT_p_s3$URBAN)
EMIT_p_s3$ECA <- 1
S3_ECA_1 <- predict(Fit_log_urban_ECA, newdata=EMIT_p_s3)
## Scenario 3 calculations for non-ECA:
EMIT_p_s3 <- EMIT_p_ECA_0
EMIT_p_s3$logPOPULATION <- log(EMIT_p_s3$POPULATION*(1.0105)^15)
EMIT_p_s3$logGDPCAP <- log(EMIT_p_s3$GDPCAP*(1.03)^15)
EMIT_p_s3$logINDUSTRY <- log(EMIT_p_s3$INDUSTRY)
EMIT_p_s3$logRENEW <- log(EMIT_p_s3$RENEW*(1.005)^15)
EMIT_p_s3$logURBAN <- log(EMIT_p_s3$URBAN)
EMIT_p_s3$ECA <- 0
S3_ECA_0 <- predict(Fit_log_urban_ECA, newdata=EMIT_p_s3)
exp(S3_ECA_1)
1
150315.9
exp(S3_ECA_0)
1
152776
## Finally does it also make sense to calculate the average for the ECA and non-ECA countries in 2015 - to benchmark the projections and also to help check that the projections made are realistic:
library(dplyr)
Emissions <- EMIT %>%
group_by(ECA) %>%
summarise( n = n(), meanEmissions =mean(EMISSIONS, na.rm=TRUE))
Emissions
# A tibble: 2 x 3
ECA n meanEmissions
<dbl> <int> <dbl>
1 0 159 180981.
2 1 58 107696.
## These last numbers we can use to make relative the projections in the table in terms of seeing what each scenario implies for reaching a certain target - here relative to 2015 (even though often the targets are set relative to for example 1990)
Make a plot, graph or figure that you think best sums up or illustrates your findings in this paper. Write a one paragraph accompanying conclusion to your paper.
McNicoll, G. (2014). Population and sustainability. In Handbook of sustainable development. Edward Elgar Publishing.
Wu, Y., Shen, J., Zhang, X., Skitmore, M., & Lu, W. (2016). The impact of urbanization on carbon emissions in developing countries: a Chinese study based on the U-Kaya method. Journal of Cleaner Production, 135, 589-603.