Part 1: Introduction

Background information

The Earth naturally absorbs and releases CO2. There are four major carbon storage reservoirs: the atmosphere, the biosphere, the ocean and the subsoil. People across the planet are inextricably connected through the climate they depend on and the air they breathe. For this reason, the 21st century has seen growing public, political and media awareness of the consequences of global warming around the world. The highest-emitting sectors are industry, energy production and transport. Household consumption also contributes significantly to CO2 emissions. CO2 is also emitted by the natural activity of our planet, such as volcanic eruptions and the respiration of plants and animals.

In developing countries, most farmers spend more on agricultural energy consumption than on fertilizers, seeds or agrochemicals. On the other hand, in the industrialized countries, the number of machines is decreasing because of the concentration of farms, the increased technical performance of machines, their higher power, the development of the common use of tractors and also because of the increased specialization of farms.

CO2 emissions from human activities are increasingly changing the composition of the atmosphere. The growth in the transport of goods and people, in particular air traffic, as well as the heating of buildings, is leading to a high consumption of fossil fuels. CO2 emissions from the combustion of motor fuels such as gasoline, diesel or kerosene and of heating fuels such as fuel oil or natural gas enhance the natural greenhouse effect and lead to global warming. Transportation activities are an important source of emissions from the combustion of fossil fuels. Land-use change and agriculture also contribute to increasing concentrations of CO2 in the atmosphere. Energy consumption in agriculture varies greatly from one country to another, as it is related to the size of the country’s agricultural sector and the types of production. The countries with the largest agricultural areas are also those whose agriculture consumes the most energy. The increasing mechanization of agriculture means a sharp increase in energy consumption.

Industry and, to a lesser extent, the waste management sector are also emitters of greenhouse gases. The consumption of imported goods produces significant foreign emissions that also contribute to global warming. Financing and investment decisions in financial markets also have an impact on the environment and the climate. For example, the investments made today in energy supply determine future CO2 emissions.

It has been reported that the richest 10 percent of people produce half of the planet’s individual-consumption-based fossil fuel emissions, while the poorest 50 percent contribute only 10 percent. Due to unprecedented economic growth, much higher than that of developed countries, emerging economies such as China and India have experienced a significant increase in consumption over the past decade. China is a particular case: although it pollutes the most, it is also the largest investor in renewable energy. With their energy systems largely geared to coal, developing countries are seeing their emissions grow at a considerable rate. Emerging countries, which are mostly densely populated, are growing in population, and urbanization is accelerating as their energy needs increase.

Data Collection

The data was collected from the World Bank DataBank: https://databank.worldbank.org/home.aspx

Most of the data are compiled from different international organizations; most are observational and some are estimates. One unavoidable source of bias is that the figures are reported directly by each country, and governments differ in how they measure and calculate their national accounts and other variables. This produces some unavoidable measurement discrepancies between countries.

In our specific case, the variables Region and Income Group were recoded, but remain closely based on the World Bank classifications. Additionally, Ln transformations and lagged variables were created.
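As a rough illustration of the two derived variable types mentioned above, the sketch below adds a natural-log column and a one-year lag within each country. It is written in Python rather than in the software used for the report, and the record layout, the sample values, and the helper name `add_log_and_lag` are assumptions made for the example.

```python
import math

def add_log_and_lag(rows, value_key):
    """Add Ln<value_key> and <value_key>lag (t-1, same country) to each row.

    rows: list of dicts with 'Country', 'Year' and value_key (hypothetical layout).
    The lag is None when the previous year is not available for that country.
    """
    by_country_year = {(r["Country"], r["Year"]): r[value_key] for r in rows}
    out = []
    for r in sorted(rows, key=lambda r: (r["Country"], r["Year"])):
        new = dict(r)
        new["Ln" + value_key] = math.log(r[value_key])
        # Look up the same country one year earlier; never borrow across countries.
        new[value_key + "lag"] = by_country_year.get((r["Country"], r["Year"] - 1))
        out.append(new)
    return out

# Made-up values, for illustration only.
sample_rows = [
    {"Country": "A", "Year": 2000, "GDPperCapitaPPP": 1000.0},
    {"Country": "A", "Year": 2001, "GDPperCapitaPPP": 1100.0},
    {"Country": "B", "Year": 2001, "GDPperCapitaPPP": 5000.0},
]
result = add_log_and_lag(sample_rows, "GDPperCapitaPPP")
```

Looking the lag up by (country, year) rather than shifting the whole column by one row avoids carrying the last value of one country into the first year of the next.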

Statistical Details

o Statistical Methods:

• Summary statistics, boxplots, histograms, scatter plots, normal probability plots, correlations, multiple linear panel models, statistical tests, and model diagnostics.

The models are multiple linear dynamic panel regressions with the following different specifications:

Pooled OLS,
Between,
Fixed Effects (LSDV),
Fixed Effects (Within),
Fixed and Time Effects (Twoway),
Random Effects,
and their respective Dynamic specifications with lagged variables, t-1.
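The Between and Within specifications differ in which part of the variation they use: the Between model works with one average per country, while the Within model works with each observation's deviation from its own country average. The sketch below shows the two transformations on made-up numbers; it is a Python illustration with hypothetical helper names, not the code behind the report.

```python
from collections import defaultdict

def group_means(values, groups):
    """Between transformation: one mean per group (country)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def within_transform(values, groups):
    """Within transformation: subtract each group's own mean from its values."""
    means = group_means(values, groups)
    return [v - means[g] for v, g in zip(values, groups)]

# Made-up values, for illustration only.
countries = ["A", "A", "A", "B", "B"]
y = [1.0, 2.0, 3.0, 10.0, 14.0]
between = group_means(y, countries)      # {'A': 2.0, 'B': 12.0}
within = within_transform(y, countries)  # [-1.0, 0.0, 1.0, -2.0, 2.0]
```

Running OLS on the demeaned data gives the same slope estimates as including one dummy per country (LSDV), which is why the two Fixed Effects entries in the list describe the same estimator computed in two ways.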

o Dependent Variables:

CO2EmissionsKT: Total CO2 emissions in kilotons (kt), 1960-2014
CO2EmissionsPerCapita: CO2 emissions in metric tons per capita, 1960-2014

o Independent variables:

Region: World Geographical Region (categorical):

Africa
Asia
Europe
Latin America & Caribbean
Middle East
North America
Oceania

IncomeGroup: Income Level Groups (categorical) measured by GDP per capita at Purchasing Power Parity (PPP), constant 2011 international $, 1990-2018:

High Income: x > $28,000
Upper-Middle Income: $12,500 < x < $28,000
Lower-Middle Income: $2,300 < x < $12,500
Low Income: x < $2,300
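The recoding of Income Group from GDP per capita can be sketched as a simple threshold function. The cut-offs are the ones listed above; the function name is hypothetical, and because the list uses strict inequalities on both sides, assigning a value that falls exactly on a boundary to the higher group is an assumption.

```python
def income_group(gdp_per_capita_ppp):
    """Map GDP per capita (PPP, constant 2011 international $) to an income group.

    Boundary values (exactly 2,300, 12,500 or 28,000) go to the higher group;
    the source does not say how such values were treated.
    """
    if gdp_per_capita_ppp >= 28000:
        return "High Income"
    if gdp_per_capita_ppp >= 12500:
        return "Upper-Middle Income"
    if gdp_per_capita_ppp >= 2300:
        return "Lower-Middle Income"
    return "Low Income"

example = income_group(6644.63)  # 'Lower-Middle Income'
```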

Log GDP per capita, PPP: Natural log of GDP per capita at Purchasing Power Parity (PPP), constant 2011 international $ (continuous), 1990-2018
GDP per capita growth: Annual percentage growth rate of GDP per capita, 1961-2018
Total population: Total population measured in thousands, 1960-2018
Population growth: Annual percentage growth rate of total population, 1961-2018
Population density: Total number of people per sq. km of land area, 1960-2018
Labor Force: Total size of the labor force, 1990-2018
Employment in Services: Portion of labor force employed in the services sector, 1991-2018
Employment in Agriculture: Portion of labor force employed in the agricultural sector, 1991-2018
Employment in Industry: Portion of labor force employed in the industrial sector, 1991-2018
Total Natural Resources Rents: Portion of GDP from the total earnings from natural resources rents from the sum of oil rents, natural gas rents, coal rents (hard and soft), mineral rents, and forest rents, 1970-2017
Mineral Rents: Portion of GDP from the total earnings from minerals including tin, gold, lead, zinc, iron, copper, nickel, silver, bauxite, and phosphate, 1970-2017
Oil Rents: Portion of GDP from the total earnings from crude oil, 1970-2017
Natural Gas Rents: Portion of GDP from the total earnings from natural gas, 1970-2017
Coal Rents: Portion of GDP from the total earnings from hard and soft coal production, 1970-2017
Forests Rents: Portion of GDP from the total earnings from roundwood harvests, 1970-2017
Renewable Energy consumption: Share of renewable energy in total final energy consumption, 1990-2015

Library & Data

Reshaping Data into Panel Data

A panel has a cross-sectional dimension (N), the individual units observed, and a longitudinal dimension (T), the time periods. In our case, the individuals are countries and the time dimension is measured in years. First, we will transform the data structure to panel data, indexing the observations by country and year. Our observations are therefore country-year units.

Data Cleaning

The Data originally has a total of 9950 observations, but many observations have missing and NA values. We will proceed to drop all the observations with missing and NA values.

## Unbalanced Panel: n = 153, T = 8-23, N = 3137

We are left with an unbalanced panel of 3137 total observations from 153 countries, ranging from 8 to 23 years per country. The fact that some countries have more observations than others is what distinguishes an unbalanced panel from a balanced one. In a balanced panel, every country would have the same number of years.
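The cleaning step and the resulting panel dimensions can be sketched as follows. This is a Python illustration on made-up rows; the helper names are hypothetical, `None` stands in for NA, and each country-year pair is assumed to appear at most once.

```python
from collections import Counter

def drop_incomplete(rows):
    """Keep only rows in which no field is missing (None stands in for NA)."""
    return [r for r in rows if all(v is not None for v in r.values())]

def panel_dimensions(rows):
    """Return (n, T_min, T_max, N, balanced) for country-year rows."""
    years_per_country = Counter(r["Country"] for r in rows)
    t_values = years_per_country.values()
    n, total = len(years_per_country), len(rows)
    t_min, t_max = min(t_values), max(t_values)
    # A panel is balanced when every country has the same number of years.
    return n, t_min, t_max, total, t_min == t_max

# Made-up values, for illustration only.
rows = [
    {"Country": "A", "Year": 2000, "CO2": 1.2},
    {"Country": "A", "Year": 2001, "CO2": None},  # dropped
    {"Country": "A", "Year": 2002, "CO2": 1.4},
    {"Country": "B", "Year": 2000, "CO2": 3.0},
]
dims = panel_dimensions(drop_incomplete(rows))  # (2, 1, 2, 3, False)
```

Applied to the report's data, the same summary would read n = 153, T = 8-23, N = 3137, with the balanced flag false.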

One important consideration when working with unbalanced panels is that they are more prone to estimation errors than balanced panels, even though most statistical packages today minimize the inefficiency and inconsistency of the estimates. These estimation errors become more severe as the panel becomes more unbalanced.

In our case, the number of years varies widely between countries, which could produce estimation errors. This could become serious if we took a simple random sample without sampling by clusters of countries, as that could produce an even more unbalanced panel. For now, we will work with our present number of observations; in the future, it is recommended to try to draw a more balanced random sample.

Part 2: Data Analysis

Exploratory Data Analysis

In this section we will focus on the descriptive statistics of our variables, plot histograms, boxplots and scatterplots, and examine the correlations between our variables.

Descriptive Statistics

The main descriptive results from this summary are:

Our Year variable range is from 1992-2014.

Our Region variable composition is as follows:

Africa : 898 (0.286)
Europe : 636 (0.203)
Asia : 587 (0.187)
Latin America & Caribbean : 523 (0.167)
Middle East : 300 (0.096)
North America : 46 (0.015)
Oceania : 147 (0.047)

Our Income Group variable composition is as follows:

High income : 517 (0.165)
Upper-middle income : 540 (0.172)
Lower-middle income : 1350 (0.430)
Low income : 730 (0.233)

Summary statistics for the main continuous variables:

Variable                         Mean        S.D.      Median        S.E.

CO2EmissionsKT:             137905.65   513106.26    10392.28     9161.15
LnCO2EmissionsKT:                9.40        2.42        9.25        0.04
CO2EmissionsPerCapita:           4.46        6.25        1.94        0.11
LnCO2EmissionsPerCapita:         0.49        1.65        0.66        0.03
GDPperCapitaPPP:             13843.03    17584.20     6644.63      313.95
LnGDPperCapitaPPP:               8.82        1.25        8.80        0.02

LnCO2EmissionsPCNew: 0.5677

We can observe that our Log transformations effectively reduce the standard deviation and standard error of our main continuous variables of interest: CO2EmissionsKT, CO2EmissionsPerCapita, and GDPperCapitaPPP.
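The S.E. column appears to be the usual standard error of the mean, S.D. divided by the square root of the number of observations (N = 3137). This is an inference from the reported numbers rather than something the summary states, and the short Python check below is only meant to show that the arithmetic lines up.

```python
import math

N = 3137  # observations in the cleaned panel

def standard_error(sd, n=N):
    """Standard error of the mean: sample S.D. divided by sqrt(n)."""
    return sd / math.sqrt(n)

se_co2_kt = standard_error(513106.26)  # reported S.E.: 9161.15
se_co2_pc = standard_error(6.25)       # reported S.E.: 0.11
se_gdp_pc = standard_error(17584.20)   # reported S.E.: 313.95
```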

Data Visualizations

Now we will use histograms, boxplots, and scatterplots to analyze the distribution of our main variables and compare them to their respective Ln transformations and how they relate to one another.

Histograms and Boxplots for Main Variables and their Log transformations:

The histograms and boxplots show that the Ln transformations make the distributions of the variables more symmetric, reducing the impact of outliers and bringing the distributions closer to normal.

Evaluating The Normal Distribution of CO2 Emissions (KT) and its Log:

We will evaluate whether the Ln transformation of CO2 Emissions (KT) appears to be more nearly normally distributed than the original variable using a normal probability plot, also called a normal Q-Q ("quantile-quantile") plot.
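A normal Q-Q plot pairs each sorted observation with the quantile a normal distribution would put at the same rank; if the data are close to normal, the points fall near a straight line. The sketch below computes those pairs in Python. The plotting positions (i - 0.5) / n are one common convention and an assumption here, as are the sample values and the function name.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    # Reference normal with the sample's own mean and standard deviation.
    reference = NormalDist(mean(sample), stdev(sample))
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))

# Made-up values, for illustration only.
sample = [9.1, 7.4, 11.8, 9.6, 8.9, 10.2, 12.5, 6.3]
points = qq_points(sample)
```

Heavy tails show up as points that bend away from the line at both ends of the plot.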

Evaluating The Normal Distribution of CO2 Emissions per Capita and its Ln:

We will evaluate whether the Ln transformation of CO2 Emissions per Capita appears to be more nearly normally distributed than the original variable using a normal probability plot.

Evaluating The Normal Distribution of GDP per Capita PPP and its Ln:

We will evaluate whether the Ln transformation of GDP per Capita PPP appears to be more nearly normally distributed than the original variable using a normal probability plot.

Shapiro-Wilk Normality Tests for the Ln Transformed Variables:

## 
##  Shapiro-Wilk normality test
## 
## data:  pco2new2$LnCO2EmissionsKT
## W = 0.98987, p-value = 3.754e-14

## 
##  Shapiro-Wilk normality test
## 
## data:  pco2new2$LnCO2EmissionsPerCapita
## W = 0.97212, p-value < 2.2e-16

## 
##  Shapiro-Wilk normality test
## 
## data:  pco2new2$LnGDPperCapitaPPP
## W = 0.98256, p-value < 2.2e-16

The Shapiro-Wilk tests suggest that the variables are not normally distributed. The normal probability plots support this: they show that the distributions have very strong outliers in both tails even after the Ln transformations.

Boxplots comparing the categorical variables of Region and Income Group:

Scatterplots comparing (Ln)CO2 Emissions per Capita by (Ln)GDP per Capita PPP:

Variables Correlations:

From the following correlations and charts we can detect which independent variables are most strongly related to our main dependent variable, Ln CO2 Emissions per Capita. Additionally, very high correlations between independent variables can easily cause multicollinearity problems. We can therefore detect which independent variables are strongly correlated and avoid putting them together in the same model.

In advance, we can expect high correlations between the following:

Each CO2 and GDP variable with its own Ln transformation,
The three Employment variables with each other, and
The Total Natural Resources Rents variable with the rest of the Rents variables.

Correlation charts between independent continuous variables with significance levels:

The groups of variables that we expected to be highly correlated are indeed highly correlated. Additionally, we find that Total Population is highly correlated with Total Labor Force.

Now we can focus on finding which independent continuous variables are most correlated with Ln CO2 Emissions per Capita, so that we can build the most parsimonious model possible using forward selection. We will look for the smallest number of independent variables that explains the largest share of the variance of our Ln CO2 Emissions per Capita variable.

Correlation chart between Ln CO2 Emissions per Capita and other independent continuous variables with significance levels:

Correlations between Ln CO2 Emissions per Capita and other independent continuous variables:

##      GDPperCapitaPPP LnGDPperCapitaPPP GDPperCapitaGrowth PopulationTotal
## [1,]       0.6699336         0.9002975        -0.03300276      0.02031009
##      PopulationGrowth PopulationDensity LaborForce EmploymentAgriculture
## [1,]       -0.2334791        0.07789669 0.04537566            -0.8732624
##      EmploymentIndustry EmploymentServices TotalNaturalResourcesRents
## [1,]          0.7319804          0.8114768                -0.06119391
##       OilRents NaturalGasRents MineralRents ForestRents  CoalRents
## [1,] 0.1859002       0.1748236   -0.1087441  -0.6225594 0.07777814
##      RenewableEnergyConsumption
## [1,]                 -0.8488297

Correlations between Ln CO2 Emissions per Capita and other Independent continuous lagged variables:

##      GDPgrowthlag GDPperCapitaPPPlag LnGDPperCapitaPPPlag
## [1,]  -0.01457617          0.6687695            0.9001759
##      PopulationTotallag PopulationGrowthlag PopulationDensitylag
## [1,]         0.02192764          -0.2345378           0.07781012
##      LaborForcelag EmploymentAgriculturelag EmploymentIndustrylag
## [1,]    0.04714553               -0.8734301             0.7365937
##      EmploymentServiceslag TotalNaturalResourcesRentslag OilRentslag
## [1,]             0.8116599                   -0.05324654   0.1913722
##      NaturalGasRentslag MineralRentslag ForestRentslag CoalRentslag
## [1,]          0.1716228     -0.09529101     -0.6202426   0.08217412
##      RenewableEnergyConsumptionlag
## [1,]                    -0.8509698

We find the following order of the strongest correlations with our variable of interest:

LnGDPperCapitaPPP: 0.9002975
EmploymentAgriculture: -0.8732624
RenewableEnergyConsumption: -0.8488297
EmploymentServices: 0.8114768
EmploymentIndustry: 0.7319804
ForestRents: -0.6225594
PopulationGrowth: -0.2334791
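The ordering above can be reproduced by sorting the reported correlations by absolute value. The Python sketch below uses the figures from the output; leaving out the untransformed GDPperCapitaPPP (0.6699336), on the grounds that its Ln version already represents it, is an assumption about how the list was built.

```python
# Correlations with Ln CO2 emissions per capita, copied from the output above.
correlations = {
    "LnGDPperCapitaPPP": 0.9002975,
    "GDPperCapitaGrowth": -0.03300276,
    "PopulationTotal": 0.02031009,
    "PopulationGrowth": -0.2334791,
    "PopulationDensity": 0.07789669,
    "LaborForce": 0.04537566,
    "EmploymentAgriculture": -0.8732624,
    "EmploymentIndustry": 0.7319804,
    "EmploymentServices": 0.8114768,
    "TotalNaturalResourcesRents": -0.06119391,
    "OilRents": 0.1859002,
    "NaturalGasRents": 0.1748236,
    "MineralRents": -0.1087441,
    "ForestRents": -0.6225594,
    "CoalRents": 0.07777814,
    "RenewableEnergyConsumption": -0.8488297,
}

# Rank by strength of association, ignoring the sign.
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
top7 = [name for name, _ in ranked[:7]]
```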

From these correlations we can assume the following general tendency:

The larger the Ln GDP per Capita PPP, the larger the % of Employment in the Services Economic Sector, and the larger the % of Employment in the Industry Economic Sector, the larger the values for Ln CO2 Emissions per Capita will be.

On the other hand, the larger the % of Employment in the Agriculture Economic Sector, the larger the % of Consumption from Renewable Energy, the larger the % of GDP Rents from Forest Extraction, and the larger the Population Growth Rate, the smaller the values for Ln CO2 Emissions per Capita will be.

It is interesting to note that Employment in Agriculture has a stronger correlation than Employment in Services and Industry. Also, the direction of the correlation for Population Growth and Forest Rents was the opposite of what was originally expected.

Finally, we observe that the correlations for the lagged variables do not differ drastically from those of the original variables. This is most likely because the data are annual and we applied a lag of only 1 year, so the values are not expected to change drastically.

Heterogeneity across the Years:

We will now try to observe whether there is a specific time trend.

There appears to be a clear positive tendency starting from 2004, but we will need to run some tests later on to know whether the time trend actually has an impact on the Ln CO2 Emissions per Capita or whether it is just statistical noise.

Central Questions and Hypotheses to Test:

Based on past research and the exploratory analysis, we will explore which variables have the largest effects on carbon dioxide emissions at the country level.

Our main dependent variable will be the emission of carbon dioxide per capita, and our initial main independent variables will be Ln of GDP per capita in PPP, the income group, the geographical region, the percentage of renewable energy consumption, the percentages of employment in economic sectors, and rents from forest extraction.

We will use linear panel models with different specifications, and use different tests to observe how the estimates change across specifications.

We will start with a basic Pooled OLS Model, proceeding with forward selection to determine which variables have the largest individual effects. We will then compare models with contemporaneous variables against models with lagged variables to explore whether we can detect dynamic time effects. Next, we will proceed with Between and Within model specifications to determine whether there are heterogeneous effects from each individual country that affect carbon dioxide emissions. We will analyze whether there are proper time trends that affect the variation in our variable of interest. Finally, we will proceed with a random effects model and a twoway within model.

Our main interest is to explore different hypotheses and measure how different model specifications explain the variation in carbon dioxide emissions by country.

The main model of interest that we want to test initially is the following:

Ln CarbonEmissions perCapita = Intercept + B1(Ln GDP perCapita PPP) + B2(Income Group) + B3(Region) + B4(Renewable Energy) + B5(Employment Agriculture) + B6(Employment Services) + B7(Forest Rents) + Country Effects + Time Effects + (Error)
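Written with explicit indices, the same model could look like the expression below. This is only a notational sketch: the subscripts i (country) and t (year), the symbols for the country effect and the time effect, and the treatment of the two categorical variables as vectors of dummies with one reference category omitted are assumptions added for clarity.

```latex
\ln(\mathrm{CO2pc}_{it}) = \beta_0
  + \beta_1 \ln(\mathrm{GDPpc}_{it})
  + \boldsymbol{\beta}_2^{\top} \mathbf{IncomeGroup}_{it}
  + \boldsymbol{\beta}_3^{\top} \mathbf{Region}_{i}
  + \beta_4\, \mathrm{Renewable}_{it}
  + \beta_5\, \mathrm{EmpAgriculture}_{it}
  + \beta_6\, \mathrm{EmpServices}_{it}
  + \beta_7\, \mathrm{ForestRents}_{it}
  + \alpha_i + \lambda_t + \varepsilon_{it}
```

Here \alpha_i stands for the Country Effects term, \lambda_t for the Time Effects term, and \varepsilon_{it} for the Error term of the original expression.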

Models

We will initially explore single-regressor linear models with the Pooled OLS Model. Once we have determined which variables appear to have the largest impact, we will proceed to building multivariate models with more specifications.

1. Pooled OLS Model:

In this first model we will explore the impact of geographical region on the levels of carbon emissions per capita:

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ Region, data = pco2new2, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -3.885816 -0.663883 -0.038697  0.695229  2.930361 
## 
## Coefficients:
##                                  Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept)                     -1.085388   0.039746 -27.308 < 2.2e-16 ***
## RegionAsia                       1.658373   0.063218  26.233 < 2.2e-16 ***
## RegionEurope                     2.959081   0.061728  47.938 < 2.2e-16 ***
## RegionLatin America & Caribbean  1.741089   0.065515  26.575 < 2.2e-16 ***
## RegionMiddle East                2.706807   0.079426  34.080 < 2.2e-16 ***
## RegionNorth America              3.944940   0.180054  21.910 < 2.2e-16 ***
## RegionOceania                    1.255878   0.105973  11.851 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 4440.3
## R-Squared:      0.48142
## Adj. R-Squared: 0.48042
## F-statistic: 484.282 on 6 and 3130 DF, p-value: < 2.22e-16

From the results we observe that all region coefficients are highly significant, the adjusted R-Squared for the model is 0.48042, and the overall model is also significant. Africa is the reference category: its average level is captured by the intercept (-1.085388), and every other coefficient measures the difference from Africa. All of these differences are positive, so Africa has the lowest average Ln CO2 emissions per capita. At the regional level, the biggest per capita polluter is North America, followed by Europe and the Middle East.
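In a model whose only regressor is a categorical variable, the fitted value for each category is the intercept plus that category's coefficient, and the omitted category (Africa here) is the intercept itself. The Python sketch below applies this to the estimates reported above; the variable names are chosen for the example.

```python
# Estimates copied from the Region-only pooled model above.
intercept = -1.085388  # fitted mean of Ln CO2 per capita for the omitted region (Africa)
region_coef = {
    "Asia": 1.658373,
    "Europe": 2.959081,
    "Latin America & Caribbean": 1.741089,
    "Middle East": 2.706807,
    "North America": 3.944940,
    "Oceania": 1.255878,
}

# Fitted mean of the log outcome per region = intercept + that region's coefficient.
fitted_log_mean = {"Africa": intercept}
fitted_log_mean.update({r: intercept + b for r, b in region_coef.items()})

highest = max(fitted_log_mean, key=fitted_log_mean.get)  # 'North America'
lowest = min(fitted_log_mean, key=fitted_log_mean.get)   # 'Africa'
```

Read this way, the positive coefficients say that each region's average is above Africa's, not that a region has a positive effect in an absolute sense.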

We will now look at the effects of Income Group:

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ IncomeGroup, data = pco2new2, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2.648249 -0.558975 -0.020134  0.536949  3.341349 
## 
## Coefficients:
##                                 Estimate Std. Error  t-value  Pr(>|t|)    
## (Intercept)                     2.365160   0.037889  62.4239 < 2.2e-16 ***
## IncomeGroupLow income          -4.039315   0.049520 -81.5692 < 2.2e-16 ***
## IncomeGroupLower-middle income -1.962995   0.044557 -44.0559 < 2.2e-16 ***
## IncomeGroupUpper-middle income -0.521244   0.053009  -9.8331 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 2325.3
## R-Squared:      0.72844
## Adj. R-Squared: 0.72818
## F-statistic: 2801.28 on 3 and 3133 DF, p-value: < 2.22e-16

All income group coefficients are again significant, as is the overall model, and the Adjusted R-Squared is now 0.72818, substantially higher than for the Region model. High Income is the reference category, captured by the positive intercept (2.365160). The other three groups have negative coefficients relative to it, and these become more negative as income falls.

We will now analyze the impact of the Ln of the GDP per Capita measured in PPP:

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP, data = pco2new2, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -3.375039 -0.419654 -0.017391  0.372477  2.789897 
## 
## Coefficients:
##                     Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept)       -10.009143   0.091571 -109.31 < 2.2e-16 ***
## LnGDPperCapitaPPP   1.189929   0.010275  115.81 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 1622.3
## R-Squared:      0.81054
## Adj. R-Squared: 0.81048
## F-statistic: 13411.6 on 1 and 3135 DF, p-value: < 2.22e-16

The overall Adjusted R-Squared for the model is now 0.81048, the variable and the model are significant, and the GDP per capita variable has a positive effect on carbon emissions, as expected.

We will now proceed with the Ln GDP per Capita PPP and the Renewable Energy Consumption variable:
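Because both the dependent variable and the regressor are in logs, the coefficient can be read as an elasticity: a 1% difference in GDP per capita goes with roughly a 1.19% difference in CO2 emissions per capita. The Python sketch below works through that reading with the reported estimate; the function name is hypothetical, and the figures describe an association in the pooled data, not a causal effect.

```python
beta_gdp = 1.189929  # LnGDPperCapitaPPP estimate from the model above

def pct_change_in_co2(pct_change_in_gdp, beta=beta_gdp):
    """Percent change in CO2 per capita implied by a percent change in GDP per capita."""
    ratio = (1 + pct_change_in_gdp / 100) ** beta
    return (ratio - 1) * 100

effect_1pct = pct_change_in_co2(1)    # close to beta itself, about 1.19
effect_10pct = pct_change_in_co2(10)  # about 12.0
```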

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption, 
##     data = pco2new2, model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -2.6380493 -0.3310962 -0.0090764  0.3234117  1.6557494 
## 
## Coefficients:
##                               Estimate  Std. Error t-value  Pr(>|t|)    
## (Intercept)                -5.78023321  0.11811789 -48.936 < 2.2e-16 ***
## LnGDPperCapitaPPP           0.79500173  0.01189165  66.854 < 2.2e-16 ***
## RenewableEnergyConsumption -0.02123411  0.00047241 -44.948 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 986.39
## R-Squared:      0.8848
## Adj. R-Squared: 0.88473
## F-statistic: 12035.4 on 2 and 3134 DF, p-value: < 2.22e-16

The Adjusted R-Squared went up to 0.88473, the model and variables are significant, and the Renewable Energy coefficient is negative, which is expected.

Now we will add the Region variable to this model:
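The Renewable Energy variable enters in levels while the dependent variable is in logs, so its coefficient is a semi-elasticity. Assuming the share is measured in percentage points (0 to 100), the Python sketch below converts the reported estimate into an approximate percent change in CO2 emissions per capita, holding Ln GDP per capita fixed. The function name is hypothetical.

```python
import math

beta_renewable = -0.02123411  # RenewableEnergyConsumption estimate from the model above

def pct_change_per_points(points, beta=beta_renewable):
    """Percent change in CO2 per capita for a change of `points` percentage points."""
    return (math.exp(beta * points) - 1) * 100

effect_1pt = pct_change_per_points(1)    # about -2.1
effect_10pt = pct_change_per_points(10)  # about -19.1
```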

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption + 
##     Region, data = pco2new2, model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -2.43915400 -0.30725341 -0.00028667  0.30553897  1.73989505 
## 
## Coefficients:
##                                    Estimate  Std. Error  t-value  Pr(>|t|)
## (Intercept)                     -5.40567712  0.12049976 -44.8605 < 2.2e-16
## LnGDPperCapitaPPP                0.72834782  0.01255026  58.0345 < 2.2e-16
## RenewableEnergyConsumption      -0.02054152  0.00050133 -40.9743 < 2.2e-16
## RegionAsia                       0.20837764  0.03202288   6.5071 8.887e-11
## RegionEurope                     0.42843245  0.03647923  11.7446 < 2.2e-16
## RegionLatin America & Caribbean  0.15616696  0.03338782   4.6774 3.029e-06
## RegionMiddle East                0.16117434  0.04397463   3.6652 0.0002513
## RegionNorth America              0.92605449  0.08684054  10.6638 < 2.2e-16
## RegionOceania                    0.18011512  0.04967902   3.6256 0.0002929
##                                    
## (Intercept)                     ***
## LnGDPperCapitaPPP               ***
## RenewableEnergyConsumption      ***
## RegionAsia                      ***
## RegionEurope                    ***
## RegionLatin America & Caribbean ***
## RegionMiddle East               ***
## RegionNorth America             ***
## RegionOceania                   ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 923.52
## R-Squared:      0.89214
## Adj. R-Squared: 0.89187
## F-statistic: 3234.16 on 8 and 3128 DF, p-value: < 2.22e-16

The Adjusted R-Squared increased only very slightly, to 0.89187. In general terms, we could say that in the presence of GDP per capita and the Renewable Energy variable, the Region variable helps explain very little additional variation in carbon emissions at the country level. In the presence of more variables, the parameter estimates decreased in magnitude, as is generally expected, and they kept the same direction of their partial effects.

We will now run a quick test to check this model for linearly dependent columns (perfect collinearity):
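The fit statistics in the output can be recomputed from the reported sums of squares, which makes it easier to see why adding six Region dummies barely moves the adjusted figure. The Python sketch below uses the standard formulas for R-squared, adjusted R-squared and the overall F-statistic; small differences from the printed values come from the rounding of the sums of squares.

```python
tss, rss = 8562.4, 923.52  # total and residual sums of squares from the output above
n_obs, k = 3137, 8         # observations and slope coefficients (intercept excluded)

r2 = 1 - rss / tss
# The adjustment penalizes each extra coefficient through the degrees of freedom.
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)
f_stat = (r2 / k) / ((1 - r2) / (n_obs - k - 1))
```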

## [1] "No linear dependent column(s) detected."

In the next model, we include the Employment in Agriculture variable. Although its coefficient is significant, it adds very little explanatory power: the Adjusted R-Squared rises only from 0.89187 to 0.89463. We also tried including the Income Group variable, and it does not help explain any substantial additional variation either. We can suppose that, in the presence of the GDP per capita variable, the income group will explain very little. In addition, because each income group averages over a large number of countries, it has less capacity to explain variation than the continuous GDP per capita variable.

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption + 
##     Region + EmploymentAgriculture, data = pco2new2, model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2.359408 -0.305261 -0.012447  0.293017  1.782414 
## 
## Coefficients:
##                                    Estimate  Std. Error  t-value  Pr(>|t|)
## (Intercept)                     -4.26159677  0.17288160 -24.6504 < 2.2e-16
## LnGDPperCapitaPPP                0.62183736  0.01702623  36.5223 < 2.2e-16
## RenewableEnergyConsumption      -0.01815573  0.00055977 -32.4343 < 2.2e-16
## RegionAsia                       0.24039315  0.03180478   7.5584 5.328e-14
## RegionEurope                     0.39162891  0.03623483  10.8081 < 2.2e-16
## RegionLatin America & Caribbean  0.07814022  0.03405032   2.2948   0.02181
## RegionMiddle East                0.18211872  0.04346895   4.1896 2.871e-05
## RegionNorth America              0.88285134  0.08585285  10.2833 < 2.2e-16
## RegionOceania                    0.22453027  0.04928042   4.5562 5.409e-06
## EmploymentAgriculture           -0.00894692  0.00098110  -9.1193 < 2.2e-16
##                                    
## (Intercept)                     ***
## LnGDPperCapitaPPP               ***
## RenewableEnergyConsumption      ***
## RegionAsia                      ***
## RegionEurope                    ***
## RegionLatin America & Caribbean *  
## RegionMiddle East               ***
## RegionNorth America             ***
## RegionOceania                   ***
## EmploymentAgriculture           ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 899.6
## R-Squared:      0.89494
## Adj. R-Squared: 0.89463
## F-statistic: 2959.56 on 9 and 3127 DF, p-value: < 2.22e-16

Now that we have established that additional variables no longer help explain any substantial variation, we will keep the following model for now:

Ln CarbonEmissions perCapita = Intercept + B1(Ln GDP perCapita PPP) + B2(Renewable Energy) + B3(Region) + (Error)

2. Lagged Pooled OLS Model

We will now explore whether we can find any substantial differences with a dynamic version of the model with lagged variables.

## Pooling Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag + Region, data = pco2new2, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -2.4817901 -0.3047338 -0.0012595  0.3202074  1.6877058 
## 
## Coefficients:
##                                   Estimate Std. Error  t-value  Pr(>|t|)
## (Intercept)                     -5.3564160  0.1200654 -44.6125 < 2.2e-16
## LnGDPperCapitaPPPlag             0.7260120  0.0125351  57.9184 < 2.2e-16
## RenewableEnergyConsumptionlag   -0.0204678  0.0004996 -40.9688 < 2.2e-16
## RegionAsia                       0.2257303  0.0319362   7.0682 1.928e-12
## RegionEurope                     0.4249246  0.0364576  11.6553 < 2.2e-16
## RegionLatin America & Caribbean  0.1612189  0.0333321   4.8368 1.384e-06
## RegionMiddle East                0.1593380  0.0439444   3.6259 0.0002925
## RegionNorth America              0.9241806  0.0867471  10.6537 < 2.2e-16
## RegionOceania                    0.1737174  0.0496511   3.4988 0.0004739
##                                    
## (Intercept)                     ***
## LnGDPperCapitaPPPlag            ***
## RenewableEnergyConsumptionlag   ***
## RegionAsia                      ***
## RegionEurope                    ***
## RegionLatin America & Caribbean ***
## RegionMiddle East               ***
## RegionNorth America             ***
## RegionOceania                   ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8562.4
## Residual Sum of Squares: 921.65
## R-Squared:      0.89236
## Adj. R-Squared: 0.89209
## F-statistic: 3241.53 on 8 and 3128 DF, p-value: < 2.22e-16

The model does not change in any substantial way.

Cluster-Robust Standard Errors:

We will now perform a test of the estimated coefficients using robust covariance matrix estimators, clustering the observations by country.

## 
## t test of coefficients:
## 
##                                   Estimate Std. Error t value  Pr(>|t|)
## (Intercept)                     -5.3564160  0.6503777 -8.2359 2.589e-16
## LnGDPperCapitaPPPlag             0.7260120  0.0689114 10.5354 < 2.2e-16
## RenewableEnergyConsumptionlag   -0.0204678  0.0024224 -8.4496 < 2.2e-16
## RegionAsia                       0.2257303  0.1350392  1.6716   0.09471
## RegionEurope                     0.4249246  0.1399230  3.0368   0.00241
## RegionLatin America & Caribbean  0.1612189  0.1081574  1.4906   0.13617
## RegionMiddle East                0.1593379  0.1566088  1.0174   0.30903
## RegionNorth America              0.9241806  0.1875072  4.9288 8.705e-07
## RegionOceania                    0.1737174  0.1938300  0.8962   0.37020
##                                    
## (Intercept)                     ***
## LnGDPperCapitaPPPlag            ***
## RenewableEnergyConsumptionlag   ***
## RegionAsia                      .  
## RegionEurope                    ** 
## RegionLatin America & Caribbean    
## RegionMiddle East                  
## RegionNorth America             ***
## RegionOceania                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We now observe that most Region dummies lose their significance once we use robust standard errors clustered by country; only Europe and North America remain significant at the 5% level. We therefore move on to the between and within models.
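The code that produced the robust table is not shown. One way such a test could be run is to pass a cluster-robust covariance matrix from `vcovHC` to `lmtest::coeftest`. The sketch below uses the `Grunfeld` example data shipped with `plm` rather than the report's data, and the choice of `type = "HC0"` is an assumption:

```r
library(plm)
library(lmtest)

# Example panel shipped with plm (firms observed over years); it stands in
# for the country panel, which is not reproduced here.
data("Grunfeld", package = "plm")

pool <- plm(inv ~ value + capital, data = Grunfeld,
            index = c("firm", "year"), model = "pooling")

# Conventional standard errors.
conv <- coeftest(pool)

# Standard errors robust to heteroskedasticity and within-group correlation,
# clustered by the individual dimension (firm here, country in the report).
rob <- coeftest(pool, vcov. = vcovHC(pool, type = "HC0", cluster = "group"))

print(conv)
print(rob)
```

The point estimates are identical in both tables; only the standard errors, and therefore the t values and p-values, change.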

3. Between Model

## Oneway (individual) effect Between Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption + 
##     Region, data = pco2new2, model = "between")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## Observations used in estimation: 153
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2.047599 -0.275916  0.007135  0.276662  1.347396 
## 
## Coefficients:
##                                   Estimate Std. Error  t-value  Pr(>|t|)
## (Intercept)                     -6.4555964  0.5660989 -11.4037 < 2.2e-16
## LnGDPperCapitaPPP                0.8427768  0.0589487  14.2968 < 2.2e-16
## RenewableEnergyConsumption      -0.0182423  0.0023385  -7.8010  1.14e-12
## RegionAsia                       0.0838992  0.1416145   0.5924   0.55448
## RegionEurope                     0.3108636  0.1634860   1.9015   0.05924
## RegionLatin America & Caribbean  0.1386481  0.1519585   0.9124   0.36308
## RegionMiddle East                0.1025859  0.1966080   0.5218   0.60263
## RegionNorth America              0.7459526  0.4045680   1.8438   0.06726
## RegionOceania                    0.1916401  0.2225150   0.8612   0.39053
##                                    
## (Intercept)                     ***
## LnGDPperCapitaPPP               ***
## RenewableEnergyConsumption      ***
## RegionAsia                         
## RegionEurope                    .  
## RegionLatin America & Caribbean    
## RegionMiddle East                  
## RegionNorth America             .  
## RegionOceania                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    430.85
## Residual Sum of Squares: 40.365
## R-Squared:      0.90631
## Adj. R-Squared: 0.90111
## F-statistic: 174.132 on 8 and 144 DF, p-value: < 2.22e-16

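With the Between estimator, none of the Region dummies is significant at the 5% level (Europe and North America are significant only at the 10% level), and the Adjusted R-Squared increases to 0.90111. We therefore drop Region and re-estimate the model.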

## Oneway (individual) effect Between Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption, 
##     data = pco2new2, model = "between")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## Observations used in estimation: 153
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -2.1058444 -0.2724307 -0.0025018  0.3126906  1.2939211 
## 
## Coefficients:
##                              Estimate Std. Error  t-value  Pr(>|t|)    
## (Intercept)                -6.8014608  0.5389848 -12.6190 < 2.2e-16 ***
## LnGDPperCapitaPPP           0.8978934  0.0540171  16.6224 < 2.2e-16 ***
## RenewableEnergyConsumption -0.0185213  0.0021463  -8.6294 8.271e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    430.85
## Residual Sum of Squares: 42.156
## R-Squared:      0.90216
## Adj. R-Squared: 0.90085
## F-statistic: 691.536 on 2 and 150 DF, p-value: < 2.22e-16

We are now left with the following model:

Ln CO2 Emissions per Capita = -6.8014608 + 0.8978934(Ln GDP per Capita PPP) - 0.0185213(Renewable Energy Consumption)

4. Lagged Between Model

## Oneway (individual) effect Between Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag, data = pco2new2, model = "between")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## Observations used in estimation: 153
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -2.0549496 -0.2775993 -0.0069748  0.3094586  1.2496258 
## 
## Coefficients:
##                                 Estimate Std. Error  t-value  Pr(>|t|)    
## (Intercept)                   -6.6581912  0.5361531 -12.4185 < 2.2e-16 ***
## LnGDPperCapitaPPPlag           0.8875676  0.0539492  16.4519 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.0188259  0.0021292  -8.8419 2.381e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    430.85
## Residual Sum of Squares: 41.703
## R-Squared:      0.90321
## Adj. R-Squared: 0.90192
## F-statistic: 699.863 on 2 and 150 DF, p-value: < 2.22e-16

The lagged model again maintains the stability of the parameters and the significance levels.

We will proceed with the Fixed Effects Model via LSDV Estimation.

5. Fixed Effects Model (LSDV)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption + 
##     factor(Country) - 1, data = pco2new2)
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.2632499 -0.0835300  0.0021621  0.0808663  1.0139040 
## 
## Coefficients:
##                               Estimate  Std. Error t-value  Pr(>|t|)    
## LnGDPperCapitaPPP           0.24561144  0.00996608  24.645 < 2.2e-16 ***
## RenewableEnergyConsumption -0.02586316  0.00059612 -43.386 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    196.79
## Residual Sum of Squares: 85.452
## R-Squared:      0.56576
## Adj. R-Squared: 0.54334
## F-statistic: 1942.62 on 2 and 2982 DF, p-value: < 2.22e-16

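The Adjusted R-Squared dropped drastically to 0.54334. This R-Squared is computed on the within-country variation only: the total sum of squares falls from 8562.4 in the pooling model above to 196.79 here, so most of the variation in carbon emissions lies between countries and is absorbed by the country effects. This indicates heterogeneous country effects that have a strong impact on carbon emissions per country.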

Our parameters are now the following:

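Ln CO2 Emissions per Capita = 0.24561144(Ln GDP per Capita PPP) - 0.02586316(Renewable Energy Consumption) + Country Effects

Relative to the Between model, the GDP per capita coefficient decreased sharply (from 0.898 to 0.246), while the Renewable Energy coefficient became slightly more negative (from -0.0185 to -0.0259).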

6. Lagged Fixed Effects Model (LSDV)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag + factor(Country) - 1, data = pco2new2)
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3676392 -0.0885588  0.0065851  0.0896644  0.8935486 
## 
## Coefficients:
##                                  Estimate  Std. Error t-value  Pr(>|t|)
## LnGDPperCapitaPPPlag           0.25806852  0.01042921  24.745 < 2.2e-16
## RenewableEnergyConsumptionlag -0.02348301  0.00061315 -38.299 < 2.2e-16
##                                  
## LnGDPperCapitaPPPlag          ***
## RenewableEnergyConsumptionlag ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    196.79
## Residual Sum of Squares: 92.237
## R-Squared:      0.53128
## Adj. R-Squared: 0.50708
## F-statistic: 1690.03 on 2 and 2982 DF, p-value: < 2.22e-16

In our lagged model the Adjusted R-Squared decreased to 0.50708 from the 0.54334 of the previous model. The coefficients are still stable.

7. Fixed Effects Model (Within)

Given the structure of our data and the model specification, we do not expect to see any changes between the Fixed Effects Model estimated via LSDV and the Within specification.

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption, 
##     data = pco2new2, model = "within")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.2632499 -0.0835300  0.0021621  0.0808663  1.0139040 
## 
## Coefficients:
##                               Estimate  Std. Error t-value  Pr(>|t|)    
## LnGDPperCapitaPPP           0.24561144  0.00996608  24.645 < 2.2e-16 ***
## RenewableEnergyConsumption -0.02586316  0.00059612 -43.386 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    196.79
## Residual Sum of Squares: 85.452
## R-Squared:      0.56576
## Adj. R-Squared: 0.54334
## F-statistic: 1942.62 on 2 and 2982 DF, p-value: < 2.22e-16
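The Within output above matches the LSDV output because both estimators yield the same slope coefficients. A small base R sketch on simulated data illustrates this equivalence (all names and values here are hypothetical and not taken from the report):

```r
set.seed(1)
n_id <- 5   # number of units (countries in the report)
n_t  <- 6   # periods per unit

d <- data.frame(id = rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id)[d$id]            # unit-specific intercepts
d$x <- rnorm(nrow(d)) + alpha         # regressor correlated with the effects
d$y <- 0.5 * d$x + alpha + rnorm(nrow(d), sd = 0.1)

# LSDV: one dummy per unit, no common intercept.
b_lsdv <- coef(lm(y ~ x + factor(id) - 1, data = d))["x"]

# Within: subtract unit means from y and x, then regress without intercept.
demean <- function(v, g) v - ave(v, g)
b_within <- coef(lm(demean(y, id) ~ demean(x, id) - 1, data = d))[1]

# The two slopes agree up to floating-point error.
print(c(lsdv = unname(b_lsdv), within = unname(b_within)))
```

The slopes coincide, but the standard errors from the demeaned `lm` call would not be correct as printed, because `lm` does not account for the degrees of freedom used by the unit means. This is one reason to rely on a dedicated within estimator.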

8. Lagged Fixed Effects Model (Within) Oneway via Individual

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag, data = pco2new2, model = "within")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3676392 -0.0885588  0.0065851  0.0896644  0.8935486 
## 
## Coefficients:
##                                  Estimate  Std. Error t-value  Pr(>|t|)
## LnGDPperCapitaPPPlag           0.25806852  0.01042921  24.745 < 2.2e-16
## RenewableEnergyConsumptionlag -0.02348301  0.00061315 -38.299 < 2.2e-16
##                                  
## LnGDPperCapitaPPPlag          ***
## RenewableEnergyConsumptionlag ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    196.79
## Residual Sum of Squares: 92.237
## R-Squared:      0.53128
## Adj. R-Squared: 0.50708
## F-statistic: 1690.03 on 2 and 2982 DF, p-value: < 2.22e-16

Both Within models reproduce the results of the LSDV specification. We can now explore the estimated intercept (fixed effect) of each individual country.

##              Afghanistan                  Albania                  Algeria 
##              -3.05970809              -1.11720783              -1.20539383 
##                   Angola                Argentina                  Armenia 
##              -0.81802956              -0.80781798              -1.63602314 
##                Australia                  Austria               Azerbaijan 
##               0.36215181               0.02704896              -0.78145514 
##             Bahamas, The                  Bahrain               Bangladesh 
##              -0.89542384               0.47803905              -1.92130117 
##                 Barbados                  Belarus                  Belgium 
##              -0.67846654              -0.37678968              -0.33377912 
##                   Belize                    Benin                   Bhutan 
##              -0.96529952              -1.35733163              -0.36700435 
##                  Bolivia                 Botswana        Brunei Darussalam 
##              -1.19509018              -0.71696138              -0.04228724 
##             Burkina Faso                  Burundi               Cabo Verde 
##              -2.10635903              -2.87802412              -1.81968584 
##                 Cambodia                 Cameroon                   Canada 
##              -1.57347238              -1.37636081               0.63760840 
## Central African Republic                     Chad                    Chile 
##              -2.57444389              -3.16238002              -0.34062891 
##                 Colombia                  Comoros         Congo, Dem. Rep. 
##              -1.15152277              -2.46758539              -2.90574288 
##              Congo, Rep.               Costa Rica            Cote d'Ivoire 
##              -1.41021965              -1.04928918              -1.21652672 
##                  Croatia                   Cyprus           Czech Republic 
##              -0.27780774              -0.58026988               0.03794640 
##                  Denmark       Dominican Republic                  Ecuador 
##              -0.10213361              -1.07910719              -1.16079730 
##         Egypt, Arab Rep.              El Salvador        Equatorial Guinea 
##              -1.40096759              -1.12656792              -1.09624447 
##                  Eritrea                  Estonia                 Eswatini 
##              -1.49250057               0.53739528              -0.94207522 
##                 Ethiopia                  Finland                   France 
##              -2.13561164               0.45431529              -0.64315999 
##                    Gabon                  Germany                    Ghana 
##               0.61901547              -0.22916078              -1.44122336 
##                   Greece                Guatemala                   Guinea 
##              -0.29983853              -0.93766438              -1.45725552 
##            Guinea-Bissau                   Guyana                    Haiti 
##              -1.64577641              -0.51576114              -1.75799565 
##     Hong Kong SAR, China                  Hungary                  Iceland 
##              -0.88349973              -0.61324683               0.76164048 
##                    India                Indonesia       Iran, Islamic Rep. 
##              -0.78813742              -0.77557638              -0.57666945 
##                     Iraq                   Israel                    Italy 
##              -1.03594023              -0.25748915              -0.47312125 
##                  Jamaica                    Japan                   Jordan 
##              -0.78549136              -0.30469548              -1.08046853 
##               Kazakhstan                    Kenya              Korea, Rep. 
##               0.08989056              -1.38436340              -0.25971389 
##                   Kuwait          Kyrgyz Republic                  Lao PDR 
##               0.47297188              -1.10762188              -1.98450386 
##                   Latvia                  Lebanon                  Lesotho 
##              -0.37998191              -0.87873734              -0.63827694 
##                  Liberia               Luxembourg         Macao SAR, China 
##              -1.32072790               0.24861330              -1.45298599 
##               Madagascar                   Malawi                 Malaysia 
##              -2.22362213              -2.37017977              -0.49601016 
##                 Maldives                     Mali                    Malta 
##              -1.60254836              -2.77364541              -0.76954182 
##               Mauritania                Mauritius                   Mexico 
##              -1.72473111              -0.95293454              -0.74335949 
##                  Moldova                 Mongolia               Montenegro 
##              -1.67015760              -0.60321368              -0.03813661 
##                  Morocco                  Myanmar                  Namibia 
##              -1.44938702              -1.52739947              -1.35363519 
##                    Nepal              Netherlands              New Zealand 
##              -1.74867767              -0.23412234               0.07500689 
##                Nicaragua                    Niger                  Nigeria 
##              -1.02503899              -2.43546556              -0.56320763 
##          North Macedonia                     Oman                 Pakistan 
##              -0.27635043              -0.31274193              -1.13024930 
##                   Panama         Papua New Guinea                 Paraguay 
##              -0.98124723              -1.07820341              -0.97546023 
##                     Peru              Philippines                   Poland 
##              -1.25066971              -1.43662872              -0.14308108 
##                    Qatar                  Romania       Russian Federation 
##               0.89546926              -0.45642404               0.13999301 
##                   Rwanda                    Samoa    Sao Tome and Principe 
##              -2.37181572              -1.23894649              -1.53539487 
##             Saudi Arabia                  Senegal                   Serbia 
##               0.02401119              -1.63502883              -0.14023957 
##             Sierra Leone                Singapore          Slovak Republic 
##              -1.88400129              -0.76079590              -0.51341369 
##          Solomon Islands                    Spain                Sri Lanka 
##              -1.40593669              -0.48095373              -1.32120051 
##                St. Lucia                   Sweden              Switzerland 
##              -1.53919981              -0.02760581              -0.58699690 
##               Tajikistan                 Tanzania              Timor-Leste 
##              -1.46904120              -1.82759937              -2.89463634 
##                     Togo                    Tonga      Trinidad and Tobago 
##              -1.27329412              -2.05656805               0.85997305 
##                  Tunisia                   Turkey             Turkmenistan 
##              -1.18289221              -0.72886585               0.01752315 
##     United Arab Emirates           United Kingdom            United States 
##               0.29546531              -0.44501640               0.34627387 
##                  Uruguay               Uzbekistan                  Vanuatu 
##              -0.83795487              -0.48073732              -1.90807267 
##            Venezuela, RB                  Vietnam       West Bank and Gaza 
##              -0.27893980              -0.92373420              -2.40304998 
##              Yemen, Rep.                   Zambia                 Zimbabwe 
##              -2.21780486              -1.43687205              -0.27694761

F test for individual (country) effects:

We will run an F test for individual effects, comparing the Within model with the Pooled model. H0: no fixed effects.

## 
##  F test for individual effects
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## F = 183.66, df1 = 146, df2 = 2982, p-value < 2.2e-16
## alternative hypothesis: significant effects

The p-value of our test leads us to reject H0, so we keep the Within model instead of the Pooled OLS.
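The F statistic can be checked by hand from the residual sums of squares reported in the earlier outputs. This sketch assumes that the restricted model is the lagged pooling model with Region (which is consistent with df1 = 146) and that the unrestricted model is the lagged Within model:

```r
rss_pooled <- 921.65   # Residual Sum of Squares, lagged pooling model
rss_within <- 92.237   # Residual Sum of Squares, lagged within model
df1 <- 146             # number of restrictions reported by the test
df2 <- 2982            # residual degrees of freedom of the within model

# F = [(RSS_restricted - RSS_unrestricted) / df1] / [RSS_unrestricted / df2]
F_stat <- ((rss_pooled - rss_within) / df1) / (rss_within / df2)

# Upper-tail probability under the F(df1, df2) distribution.
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)

print(round(F_stat, 2))
print(p_value)
```

The result agrees with the reported F = 183.66 up to rounding of the inputs.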

Cluster-Robust Standard Errors:

## 
## t test of coefficients:
## 
##                                 Estimate Std. Error  t value  Pr(>|t|)    
## LnGDPperCapitaPPPlag           0.2580685  0.0312370   8.2616 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.0234830  0.0022802 -10.2987 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both independent variables remain significant at the 0.1% level under cluster-robust standard errors.

9. Lagged Random Effects Model

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag, data = pco2new2, model = "random")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Effects:
##                   var std.dev share
## idiosyncratic 0.03093 0.17587 0.108
## individual    0.25547 0.50544 0.892
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8779  0.9276  0.9276  0.9239  0.9276  0.9276 
## 
## Residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.36983 -0.09438  0.01320  0.00036  0.10898  0.89857 
## 
## Coefficients:
##                                  Estimate  Std. Error z-value  Pr(>|z|)
## (Intercept)                   -1.16335394  0.11267508 -10.325 < 2.2e-16
## LnGDPperCapitaPPPlag           0.28748575  0.01061445  27.084 < 2.2e-16
## RenewableEnergyConsumptionlag -0.02500396  0.00059679 -41.898 < 2.2e-16
##                                  
## (Intercept)                   ***
## LnGDPperCapitaPPPlag          ***
## RenewableEnergyConsumptionlag ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    248.59
## Residual Sum of Squares: 104.72
## R-Squared:      0.57875
## Adj. R-Squared: 0.57849
## Chisq: 4305.82 on 2 DF, p-value: < 2.22e-16

The Hausman test:

The Hausman test is calculated by estimating the Random Effects and Fixed Effects models and then comparing the estimates:

## 
##  Hausman Test
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## chisq = 88.283, df = 2, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent

The p-value of the Hausman test leads us to reject the null hypothesis that the Random Effects estimator is consistent, so we prefer the Fixed Effects Model over the Random Effects Model.
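For reference, the comparison behind the test can be written in its standard textbook form. The symbols are assumptions introduced here: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the slope estimates of the two models, and $k = 2$ is the number of slope coefficients, matching df = 2 in the output.

$$
H = (\hat\beta_{FE} - \hat\beta_{RE})^{\top}
\left[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1}
(\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_{k} \quad \text{under } H_0
$$

A large value of $H$, as with chisq = 88.283 here, means the two sets of estimates differ by more than sampling error would explain.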

10. Lagged Fixed Effects Model (Within) Oneway via Time

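We will now add time effects. Note that the call below keeps the individual (country) within transformation and adds year dummies through factor(Year), so it controls for country and year effects jointly rather than for time effects alone; its slope estimates coincide with those of the Twoway model in section 11.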

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag + factor(Year), data = pco2new2, 
##     model = "within")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3711920 -0.0841838  0.0053052  0.0891647  0.8717308 
## 
## Coefficients:
##                                  Estimate  Std. Error  t-value  Pr(>|t|)
## LnGDPperCapitaPPPlag           0.39291698  0.01974674  19.8978 < 2.2e-16
## RenewableEnergyConsumptionlag -0.02248569  0.00062251 -36.1211 < 2.2e-16
## factor(Year)1993              -0.00955720  0.02356030  -0.4056 0.6850300
## factor(Year)1994              -0.02097023  0.02337531  -0.8971 0.3697331
## factor(Year)1995               0.01105553  0.02339296   0.4726 0.6365330
## factor(Year)1996               0.00964081  0.02317007   0.4161 0.6773752
## factor(Year)1997               0.00486526  0.02296836   0.2118 0.8322586
## factor(Year)1998              -0.03834542  0.02304832  -1.6637 0.0962788
## factor(Year)1999              -0.04681161  0.02312269  -2.0245 0.0430096
## factor(Year)2000              -0.05312725  0.02322053  -2.2879 0.0222110
## factor(Year)2001              -0.05948520  0.02312269  -2.5726 0.0101424
## factor(Year)2002              -0.07709066  0.02320681  -3.3219 0.0009049
## factor(Year)2003              -0.04963509  0.02336611  -2.1242 0.0337332
## factor(Year)2004              -0.05179591  0.02346420  -2.2074 0.0273595
## factor(Year)2005              -0.06423246  0.02393234  -2.6839 0.0073170
## factor(Year)2006              -0.07422110  0.02418737  -3.0686 0.0021702
## factor(Year)2007              -0.09608363  0.02484313  -3.8676 0.0001123
## factor(Year)2008              -0.12874976  0.02540735  -5.0674 4.280e-07
## factor(Year)2009              -0.16224838  0.02590058  -6.2643 4.289e-10
## factor(Year)2010              -0.12471453  0.02589277  -4.8166 1.534e-06
## factor(Year)2011              -0.11831452  0.02632914  -4.4937 7.269e-06
## factor(Year)2012              -0.14002980  0.02698181  -5.1898 2.248e-07
## factor(Year)2013              -0.14674906  0.02749339  -5.3376 1.013e-07
## factor(Year)2014              -0.14408504  0.02797952  -5.1497 2.780e-07
##                                  
## LnGDPperCapitaPPPlag          ***
## RenewableEnergyConsumptionlag ***
## factor(Year)1993                 
## factor(Year)1994                 
## factor(Year)1995                 
## factor(Year)1996                 
## factor(Year)1997                 
## factor(Year)1998              .  
## factor(Year)1999              *  
## factor(Year)2000              *  
## factor(Year)2001              *  
## factor(Year)2002              ***
## factor(Year)2003              *  
## factor(Year)2004              *  
## factor(Year)2005              ** 
## factor(Year)2006              ** 
## factor(Year)2007              ***
## factor(Year)2008              ***
## factor(Year)2009              ***
## factor(Year)2010              ***
## factor(Year)2011              ***
## factor(Year)2012              ***
## factor(Year)2013              ***
## factor(Year)2014              ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    196.79
## Residual Sum of Squares: 89.475
## R-Squared:      0.54532
## Adj. R-Squared: 0.51828
## F-statistic: 147.919 on 24 and 2960 DF, p-value: < 2.22e-16

F Test for Time effects:

We will now run the F test to compare the Within model with time effects against the Pooled model. H0: no fixed time effects.

## 
##  F test for individual effects
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag +  ...
## F = 163.87, df1 = 168, df2 = 2960, p-value < 2.2e-16
## alternative hypothesis: significant effects

The p-value of our test leads us to reject H0, so we keep the Within model instead of the Pooled OLS.

Cluster-Robust Standard Errors:

## 
## t test of coefficients:
## 
##                                 Estimate Std. Error t value  Pr(>|t|)    
## LnGDPperCapitaPPPlag           0.3929170  0.0283751  13.847 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.0224857  0.0015038 -14.952 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Following the suggestion of the Global F Tests, we will now model a Twoway Fixed Effects Model.

11. Lagged Twoway Fixed Effects Model

## Twoways effects Within Model
## 
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + 
##     RenewableEnergyConsumptionlag, data = pco2new2, effect = "twoways", 
##     model = "within")
## 
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3711920 -0.0841838  0.0053052  0.0891647  0.8717308 
## 
## Coefficients:
##                                  Estimate  Std. Error t-value  Pr(>|t|)
## LnGDPperCapitaPPPlag           0.39291698  0.01974674  19.898 < 2.2e-16
## RenewableEnergyConsumptionlag -0.02248569  0.00062251 -36.121 < 2.2e-16
##                                  
## LnGDPperCapitaPPPlag          ***
## RenewableEnergyConsumptionlag ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    169.76
## Residual Sum of Squares: 89.475
## R-Squared:      0.47293
## Adj. R-Squared: 0.44159
## F-statistic: 1327.97 on 2 and 2960 DF, p-value: < 2.22e-16

Adjusted R-Squared is 0.44159 and the p-value of the model is < 2.22e-16.

Heteroskedasticity-consistent coefficients (Arellano):

## 
## t test of coefficients:
## 
##                                 Estimate Std. Error  t value  Pr(>|t|)    
## LnGDPperCapitaPPPlag           0.3929170  0.0592267   6.6341 3.867e-11 ***
## RenewableEnergyConsumptionlag -0.0224857  0.0020558 -10.9378 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Ln CO2 Emissions per Capita = 0.3929170(LnGDPperCapitaPPPlag) - 0.0224857(RenewableEnergyConsumptionlag) + Country Effects + Time Effects

F Test for Twoway Effects:

We will now confirm that the Twoway Model is preferred over the Pooled OLS Model. H0: no fixed effects.

## 
##  F test for twoways effects
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## F = 163.87, df1 = 168, df2 = 2960, p-value < 2.2e-16
## alternative hypothesis: significant effects

We can confirm that the Twoway model is preferred over the Pooled OLS.

Confidence Intervals for our independent variables:

##                                     2.5 %     97.5 %
## LnGDPperCapitaPPPlag           0.35421409  0.4316199
## RenewableEnergyConsumptionlag -0.02370579 -0.0212656

Confidence Intervals:

LnGDPperCapitaPPPlag: (0.35421409, 0.4316199)

RenewableEnergyConsumptionlag: (-0.02370579, -0.0212656)
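The reported intervals can be reproduced as estimate ± z × standard error, using the conventional standard errors from the Twoway model output. The sketch below also computes the intervals implied by the Arellano robust standard errors reported earlier; that comparison is an addition for illustration and not part of the original analysis:

```r
est <- c(LnGDPperCapitaPPPlag          = 0.39291698,
         RenewableEnergyConsumptionlag = -0.02248569)

se_conv <- c(0.01974674, 0.00062251)   # conventional SEs (Twoway output)
se_rob  <- c(0.0592267, 0.0020558)     # Arellano robust SEs

z <- qnorm(0.975)                      # two-sided 95% normal quantile

# Lower and upper bounds; row names are inherited from `est`.
ci_conv <- cbind(lower = est - z * se_conv, upper = est + z * se_conv)
ci_rob  <- cbind(lower = est - z * se_rob,  upper = est + z * se_rob)

print(round(ci_conv, 5))
print(round(ci_rob, 5))
```

The conventional intervals match the reported ones. With the robust standard errors the GDP interval widens to roughly 0.28 to 0.51, so the qualitative conclusion is unchanged but the precision is lower.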

Test cross-sectional dependence and contemporaneous correlation:

Breusch-Pagan LM Test of Independence:

## 
##  Breusch-Pagan LM test for cross-sectional dependence in panels
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## chisq = 43023, df = 11628, p-value < 2.2e-16
## alternative hypothesis: cross-sectional dependence

Pesaran CD test:

## 
##  Pesaran CD test for cross-sectional dependence in panels
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## z = 1.5486, p-value = 0.1215
## alternative hypothesis: cross-sectional dependence

The Breusch-Pagan LM test indicates cross-sectional dependence, but the Pesaran CD test does not reject the null hypothesis of no cross-sectional dependence (p-value = 0.1215).

Monte Carlo experiments show that the standard Breusch-Pagan LM test performs badly for N > T panels, whereas Pesaran's CD test performs well even for small T and large N. Since our panel has N > T (153 countries observed over 8 to 23 years), we follow the result of the Pesaran CD test and assume we do not have cross-sectional dependence.

Breusch-Godfrey/Wooldridge test for serial correlation:

## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel
##  models
## 
## data:  LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## chisq = 1136.1, df = 8, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors

The test suggests that we have serial correlation in idiosyncratic errors.

Breusch-Pagan test for homoskedasticity:

## 
##  studentized Breusch-Pagan test
## 
## data:  twofixedlag
## BP = 14.706, df = 2, p-value = 0.0006408

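The test suggests that we have heteroskedasticity. Together with the serial correlation found above, this supports the use of the Arellano robust standard errors reported for the Twoway model.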

Normal probability plot of the residuals to test normality of residuals:

We can observe that the residuals are approximately normal overall, but there are some strong outliers in both tails.

Plot of predicted values and their residuals:

This function plots observed and predicted values of the response of linear (mixed) models for each coefficient and highlights the observed values according to their distance (residuals) to the predicted values. This allows to investigate how well actual and predicted values of the outcome fit across the predictor variables.

The actual (observed) values have a coloured fill, while the predicted values have a solid outline without filling.

Linear Model Assumptions: Homoscedasticity

Here we can observe that homoscedasticity does not fully hold, because some outliers violate the constant variance assumption.

Checking for Influential Observations

Equatorial Guinea is one outlier.

Part 3 Conclusions:

General Conclusion:

In conclusion, we find that the best fit is a Panel, Linear, Lagged, Twoway Fixed Effects Model. We find that the lag of Ln GDP per Capita PPP and the lag of Renewable Energy Consumption best predict Ln CO2 Emissions Per Capita, alongside Country and Time Fixed Effects.

Final Statistical Findings:

Our final model had an Adjusted R-Squared of 0.44159 and the p-value of the model is < 2.22e-16.

Final Model:

Ln CO2 Emissions per Capita = 0.3929170(LnGDPperCapitaPPPlag) -0.0224857(RenewableEnergyConsumptionlag) + Country Effects + Time Effects + Error

Confidence Intervals for independent variables: LnGDPperCapitaPPPlag (0.35421409, 0.4316199); RenewableEnergyConsumptionlag (-0.02370579, -0.0212656)
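Because the dependent variable and GDP are in logs while Renewable Energy Consumption enters in levels, the two coefficients read differently. A short sketch of the arithmetic follows. It assumes that Renewable Energy Consumption is measured in percentage points of total final energy consumption (the unit is not stated in this section), and it treats the estimates as associations rather than causal effects:

```r
b_gdp <- 0.3929170    # coefficient on LnGDPperCapitaPPPlag
b_ren <- -0.0224857   # coefficient on RenewableEnergyConsumptionlag

# Log-log term: b_gdp is an elasticity. A 10% rise in lagged GDP per capita
# scales emissions per capita by 1.10^b_gdp.
pct_gdp_10 <- (1.10^b_gdp - 1) * 100

# Log-level term: a one-unit rise in the renewable share scales emissions
# per capita by exp(b_ren).
pct_ren_1 <- (exp(b_ren) - 1) * 100

print(round(c(gdp_plus_10pct = pct_gdp_10, renewables_plus_1pt = pct_ren_1), 2))
```

Under these assumptions, a 10% higher lagged GDP per capita is associated with roughly 3.8% higher emissions per capita, and one additional point of renewable energy consumption with roughly 2.2% lower emissions per capita, holding country and time effects fixed.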

Relevance of Findings:

We find that one single variable, Ln GDP per Capita PPP lag, has large predictive power for Ln CO2 Emissions per Capita. Additionally, we find that many other variables, such as Income Group, Region, Natural Resources Rents, Population variables, and Employment Economic Sector, do not appear to be statistically significant. On the other hand, Renewable Energy Consumption, Country effects, and Time effects appear to have more explanatory power than other variables commonly referred to in the literature.

Areas of Improvement for Future Research:

The descriptive analysis was done without much difficulty. The hardest parts were consolidating a database that would be useful for the models and the final model diagnostics, where we found strong outliers that affect the results but whose removal would be counterintuitive for the purpose of the research.

For future research it would be better to have a balanced panel with a longer time span, and possibly a wider and more diverse set of variables.

One substantial problem was the gap between the large number of country clusters and the very high variance in the time dimension, which prevented us from drawing a random sample.

Most of the variables were economic in nature; for future research it would be interesting to add institutional and other variables.

A longer time span would also help compare different time periods and different frequencies of lag effects.

The problem with our R-Squared dropping dramatically once the Country and Time effects are included is that we cannot clearly identify what is behind those effects, which prevents us from generalizing any other findings.

Sources:

• World Bank DataBank: https://databank.worldbank.org/home.aspx
• https://www.bgs.ac.uk/discoveringGeology/climateChange/CCS/man-madeEffect.html
• https://www.epa.gov/ghgemissions/sources-greenhouse-gas-emissions
• https://www.epa.gov/climate-indicators/climate-change-indicators-global-greenhouse-gas-emissions
• https://www.epa.gov/ghgemissions/overview-greenhouse-gases
• https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions
• https://www.britannica.com/science/greenhouse-gas
• https://www.livescience.com/37821-greenhouse-gases.html

Linear Panel Models to Predict CO2 Emissions by Country

Jaime F. Alliende Acuña & Amy Diakhoumpa

12/10/2019
