The Earth naturally absorbs and releases CO2. There are four major carbon storage reservoirs: the atmosphere, the biosphere, the ocean and the subsoil. CO2 is also emitted by the natural activity of our planet, such as volcanic eruptions and the respiration of plants and animals. It is impossible to ignore that people on our planet are inextricably connected through the climate they depend on and the air they breathe. For this reason, in the 21st century, public, political and media awareness of the consequences of global warming has grown around the world. The highest-emitting sectors are industry, energy production and transport. Household consumption also contributes significantly to CO2 emissions.
In developing countries, most farmers spend more on agricultural energy consumption than on fertilizers, seeds or agrochemicals. On the other hand, in the industrialized countries, the number of machines is decreasing because of the concentration of farms, the increased technical performance of machines, their higher power, the development of the common use of tractors and also because of the increased specialization of farms.
CO2 emissions from human activities are increasingly changing the composition of the atmosphere. The growth in the transport of goods and people, in particular air traffic, as well as the heating of buildings, leads to high consumption of fossil fuels. CO2 emissions from the combustion of motor fuels such as gasoline, diesel or kerosene and of heating fuels such as fuel oil or natural gas enhance the natural greenhouse effect and lead to global warming. Transportation is an important source of emissions from the combustion of fossil fuels. Land-use change and agriculture also contribute to increasing concentrations of CO2 in the atmosphere. Energy consumption in agriculture varies greatly from one country to another, as it depends on the size of the country’s agricultural sector and the types of production. The countries with the largest agricultural areas are also those that consume the most energy. The increasing mechanization of agriculture means a sharp increase in energy consumption.
Industry and, to a lesser extent, the waste management sector are also emitters of greenhouse gases. The consumption of imported goods produces significant foreign emissions that also contribute to global warming. Financing and investment decisions in financial markets also have an impact on the environment and the climate. For example, the investments made today in energy supply determine future CO2 emissions.
It has been reported that the richest 10 percent of people produce half of the planet’s individual-consumption-based fossil fuel emissions, while the poorest 50 percent contribute only 10 percent. Owing to unprecedented economic growth, much higher than that of developed countries, emerging economies such as China and India have experienced a significant increase in consumption over the past decade. China is a special case: although it is the largest emitter, it is also the largest investor in renewable energy. With energy systems largely geared to coal, developing countries are seeing their emissions grow at a considerable rate. Emerging countries, most of which are densely populated, have growing populations and accelerating urbanization while their energy needs increase.
The data was collected from the World Bank DataBank: https://databank.worldbank.org/home.aspx
Most of the data are compiled by international organizations; most are observational and some are estimates. One unavoidable source of bias is that these organizations receive most of the data directly from each country, and governments differ in how they measure and calculate their national accounts and other variables, which produces measurement discrepancies between countries.
In our specific case, the variables Region and Income Group were recoded, but they remain closely based on the World Bank classifications. Additionally, Ln transformations and lagged variables were created.
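As an illustration of the two kinds of derived variables mentioned above, the following minimal Python sketch builds a natural-log column and a one-year lag within each country. The record layout, the field names (`country`, `year`, `gdp_pc`) and the toy values are assumptions made for illustration only, not the structure of the actual data set. The lag is set to missing when the previous calendar year is absent, so values never carry over across countries or across gaps.

```python
import math

def add_log_and_lag(rows, var):
    """Return new rows with Ln and one-year-lag columns for `var`.

    `rows` is a list of dicts with hypothetical keys 'country', 'year', var.
    """
    out = []
    previous = {}  # country -> (year, value) of the last row seen
    for row in sorted(rows, key=lambda r: (r["country"], r["year"])):
        new = dict(row)
        value = row[var]
        # The natural log is only defined for strictly positive values.
        new["ln_" + var] = math.log(value) if value is not None and value > 0 else None
        last = previous.get(row["country"])
        # Use the lag only if the previous row is exactly one year earlier.
        if last is not None and last[0] == row["year"] - 1:
            new[var + "_lag"] = last[1]
        else:
            new[var + "_lag"] = None
        previous[row["country"]] = (row["year"], value)
        out.append(new)
    return out

# Toy data (illustrative values only).
toy = [
    {"country": "A", "year": 1992, "gdp_pc": 1000.0},
    {"country": "A", "year": 1993, "gdp_pc": 1100.0},
    {"country": "B", "year": 1992, "gdp_pc": 5000.0},
    {"country": "B", "year": 1994, "gdp_pc": 5400.0},  # 1993 is missing
]
result = add_log_and_lag(toy, "gdp_pc")
for r in result:
    print(r["country"], r["year"], r["ln_gdp_pc"], r["gdp_pc_lag"])
```

Because the first year of every country has no lag, creating lagged variables and then dropping incomplete rows shortens each country's series by at least one year.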
o Statistical Methods:
• Summary statistics, boxplots, histograms, scatter plots, normal probability plots, correlations, multiple linear panel models, statistical tests, and model diagnostics.
The models are multiple linear dynamic panel regressions with the following different specifications:
o Dependent Variables:
CO2EmissionsKT: CO2 emissions in total kilotons (kt), 1960-2014
CO2EmissionsPerCapita: CO2 emissions in metric tons per capita, 1960-2014
o Independent variables:
Log GDP per capita, PPP: Natural log of GDP per capita by Purchasing Power Parity (PPP), constant 2011 international $ (continuous), 1990-2018
GDP per capita growth: Annual percentage growth rate of GDP per capita, 1961-2018
Total population: Total population measured in thousands, 1960-2018
Population growth: Annual percentage growth rate of total population, 1961-2018
Population density: Total number of people per sq. km of land area, 1960-2018
Labor Force: Total size of the labor force, 1990-2018
Employment in Services: Portion of labor force employed in the services sector, 1991-2018
Employment in Agriculture: Portion of labor force employed in the agricultural sector, 1991-2018
Employment in Industry: Portion of labor force employed in the industrial sector, 1991-2018
Total Natural Resources Rents: Portion of GDP from the total earnings from natural resources rents from the sum of oil rents, natural gas rents, coal rents (hard and soft), mineral rents, and forest rents, 1970-2017
Mineral Rents: Portion of GDP from the total earnings from minerals including tin, gold, lead, zinc, iron, copper, nickel, silver, bauxite, and phosphate, 1970-2017
Oil Rents: Portion of GDP from the total earnings from crude oil, 1970-2017
Natural Gas Rents: Portion of GDP from the total earnings from natural gas, 1970-2017
Coal Rents: Portion of GDP from the total earnings from hard and soft coal production, 1970-2017
Forests Rents: Portion of GDP from the total earnings from roundwood harvests, 1970-2017
Renewable Energy consumption: Share of renewable energy in total final energy consumption, 1990-2015
A panel has a cross-sectional dimension (N), the individual units observed, and a longitudinal dimension (T), the time periods. In our case, the individuals are countries and the time dimension is measured in years. First, we transform the data to a panel structure, indexing the observations by Country and Year. Our observations for the panels are therefore country-year units.
The data originally have a total of 9950 observations, but many observations have missing (NA) values. We drop all the observations with missing values.
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
We are left with an unbalanced panel of 3137 total observations from 153 countries, ranging from 8 to 23 years per country. The fact that some countries have more observations than others because of their time range is what distinguishes an unbalanced panel from a balanced panel. In a balanced panel, every country would have the same number of years.
One important consideration when working with unbalanced panels is that they are more prone to estimation errors than balanced panels, even though most statistical packages today minimize the inefficiency and inconsistency of the estimates. These estimation errors become more severe as the panel becomes more unbalanced.
In our case, the time dimension varies widely between countries, which could produce estimation errors. This could become serious if we took a simple random sample, without sampling by clusters of countries, and thereby produced an even more unbalanced panel. For now, we will work with our present number of observations; in the future it is recommended to try to draw a more balanced random sample.
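The listwise deletion and the balance check described above can be sketched as follows. This is a minimal Python illustration on toy records; the field names (`country`, `year`, `co2`, `gdp`) are hypothetical and the numbers are not taken from the data set. "Balanced" is checked here in the sense used in the text, that is, every country contributes the same number of years.

```python
from collections import Counter

def complete_cases(rows, required):
    """Keep only rows in which every required field is present (not None)."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]

def panel_shape(rows):
    """Return (n countries, min T, max T, N rows, balanced?)."""
    years_per_country = Counter(r["country"] for r in rows)
    t_values = years_per_country.values()
    t_min, t_max = min(t_values), max(t_values)
    # Same number of years for every country is the balance criterion used here.
    return len(years_per_country), t_min, t_max, len(rows), t_min == t_max

# Toy data (illustrative values only).
rows = [
    {"country": "A", "year": 2000, "co2": 1.0, "gdp": 10.0},
    {"country": "A", "year": 2001, "co2": 1.1, "gdp": None},  # dropped
    {"country": "A", "year": 2002, "co2": 1.2, "gdp": 12.0},
    {"country": "B", "year": 2000, "co2": 5.0, "gdp": 40.0},
    {"country": "B", "year": 2001, "co2": 5.1, "gdp": 41.0},
    {"country": "B", "year": 2002, "co2": 5.2, "gdp": 42.0},
]
clean = complete_cases(rows, ["co2", "gdp"])
shape = panel_shape(clean)
print("n = %d, T = %d-%d, N = %d, balanced = %s" % shape)
```

A single missing value is enough to turn a balanced panel into an unbalanced one, which is why dropping all incomplete rows leaves countries with between 8 and 23 years each.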
In this section we will focus on the descriptive statistics of our variables, plot histograms, boxplots and scatterplots, and examine the correlations between our variables.
The main descriptive findings from this summary are:
Our Year variable ranges from 1992 to 2014.
Our Region variable composition is as follows:
Our Income Group variable composition is as follows:
The main descriptive findings from this summary are:
Mean S.D. Median S.E.
We can observe that our Ln transformations effectively reduce the standard deviation and standard error of our main continuous variables of interest: CO2EmissionsKT, CO2EmissionsPerCapita, and GDPperCapitaPPP.
Now we will use histograms, boxplots, and scatterplots to analyze the distribution of our main variables, compare them to their respective Ln transformations, and see how they relate to one another.
It can be observed from the histograms and boxplots that the Ln transformations make the distributions of the variables more symmetric, thus reducing the impact of outliers and bringing the distributions closer to normal.
We will evaluate whether the Ln transformation of CO2 Emissions (KT) appears to be nearly normally distributed in comparison to the original variable using a normal probability plot, also called a normal Q-Q plot for “quantile-quantile”.
We will evaluate whether the Ln transformation of CO2 Emissions per Capita appears to be nearly normally distributed in comparison to the original variable using a normal probability plot.
We will evaluate whether the Ln transformation of GDP per Capita PPP appears to be nearly normally distributed in comparison to the original variable using a normal probability plot.
##
## Shapiro-Wilk normality test
##
## data: pco2new2$LnCO2EmissionsKT
## W = 0.98987, p-value = 3.754e-14
##
## Shapiro-Wilk normality test
##
## data: pco2new2$LnCO2EmissionsPerCapita
## W = 0.97212, p-value < 2.2e-16
##
## Shapiro-Wilk normality test
##
## data: pco2new2$LnGDPperCapitaPPP
## W = 0.98256, p-value < 2.2e-16
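The Shapiro-Wilk tests reject normality for all three Ln-transformed variables (every p-value is far below 0.001). With 3137 observations the test detects even small departures from normality, and the W statistics remain close to 1 (between 0.972 and 0.990). The normal probability plots confirm that the departures come from very strong outliers in both tails, which remain even after the Ln transformations.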
From the following correlations and charts we can detect which independent variables are most strongly related to our main dependent variable, Ln CO2 Emissions per Capita. Additionally, very high correlations between independent variables can easily cause multicollinearity problems. We can therefore detect which independent variables are strongly correlated and avoid putting them together in the same model, thus avoiding possible multicollinearity problems beforehand.
In advance, we can expect high correlations between the following:
Each CO2 and GDP variable with its own Ln transformation; the three Employment variables with each other; and the Total Natural Resources Rents variable with the individual Rents variables.
The groups of variables that we expected to be highly correlated are indeed highly correlated. Additionally, we find that Total Population is highly correlated with Total Labor Force.
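The screening of predictors for high pairwise correlation can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis: the variable names and values are toy data, and the 0.8 cut-off is an assumed threshold chosen for the example.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(columns, threshold=0.8):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Toy predictors (illustrative values only).
toy = {
    "population": [10.0, 20.0, 30.0, 40.0, 50.0],
    "labor_force": [4.0, 9.0, 13.0, 18.0, 22.0],  # nearly proportional to population
    "gdp_growth": [2.0, -1.0, 3.0, 0.0, 1.0],
}
flagged = highly_correlated_pairs(toy)
print(flagged)
```

Pairs flagged in this way are candidates to keep out of the same model, as was done for Total Population and Total Labor Force.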
Now we can focus on finding which specific continuous independent variables are most strongly correlated with Ln CO2 Emissions per Capita, in order to build the most parsimonious model possible using forward selection. We will look for the smallest number of independent variables that explains the largest share of the variance of our Ln CO2 Emissions per Capita variable.
## GDPperCapitaPPP LnGDPperCapitaPPP GDPperCapitaGrowth PopulationTotal
## [1,] 0.6699336 0.9002975 -0.03300276 0.02031009
## PopulationGrowth PopulationDensity LaborForce EmploymentAgriculture
## [1,] -0.2334791 0.07789669 0.04537566 -0.8732624
## EmploymentIndustry EmploymentServices TotalNaturalResourcesRents
## [1,] 0.7319804 0.8114768 -0.06119391
## OilRents NaturalGasRents MineralRents ForestRents CoalRents
## [1,] 0.1859002 0.1748236 -0.1087441 -0.6225594 0.07777814
## RenewableEnergyConsumption
## [1,] -0.8488297
## GDPgrowthlag GDPperCapitaPPPlag LnGDPperCapitaPPPlag
## [1,] -0.01457617 0.6687695 0.9001759
## PopulationTotallag PopulationGrowthlag PopulationDensitylag
## [1,] 0.02192764 -0.2345378 0.07781012
## LaborForcelag EmploymentAgriculturelag EmploymentIndustrylag
## [1,] 0.04714553 -0.8734301 0.7365937
## EmploymentServiceslag TotalNaturalResourcesRentslag OilRentslag
## [1,] 0.8116599 -0.05324654 0.1913722
## NaturalGasRentslag MineralRentslag ForestRentslag CoalRentslag
## [1,] 0.1716228 -0.09529101 -0.6202426 0.08217412
## RenewableEnergyConsumptionlag
## [1,] -0.8509698
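We find the following order of the strongest correlations with our variable of interest (ranked by absolute value, unlagged variables):
Ln GDP per Capita PPP: 0.90
Employment in Agriculture: -0.87
Renewable Energy Consumption: -0.85
Employment in Services: 0.81
Employment in Industry: 0.73
GDP per Capita PPP: 0.67
Forest Rents: -0.62
Population Growth: -0.23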
From these correlations we can infer the following general tendencies:
The larger the Ln GDP per Capita PPP, the % of Employment in the Services sector and the % of Employment in the Industry sector, the larger the values of Ln CO2 Emissions per Capita tend to be.
On the other hand, the larger the % of Employment in the Agriculture sector, the % of Consumption from Renewable Energy, the % of GDP from Forest Rents and the Population Growth rate, the smaller the values of Ln CO2 Emissions per Capita tend to be.
It is interesting to note that Employment in Agriculture is more strongly correlated with emissions (-0.87) than Employment in Services (0.81) or Industry (0.73). Also, the direction of the correlation for Population Growth and Forest Rents was the opposite of what was originally expected.
Finally, we observe that the correlations for the lagged variables do not differ drastically from those of the original variables. This is most likely because the data are annual and we applied a lag of only one year, so the values are not expected to change drastically.
We will now check whether there is a specific time trend.
There appears to be a clear positive trend starting from 2004, but we will need to run some tests later on to know whether the time trend really has an impact on Ln CO2 Emissions per Capita or whether it is just statistical noise.
Based on past research and the exploratory analysis, we will explore which variables have the largest effects on carbon dioxide emissions at the country level.
Our main dependent variable will be carbon dioxide emissions per capita, and our initial main independent variables will be the Ln of GDP per capita in PPP, the income group, the geographical region, the percentage of renewable energy consumption, the percentages of employment in the economic sectors and the rents from forest extraction.
We will use linear panel models with different specifications, and use different tests to observe how the estimates change across specifications.
We will start with a basic Pooled OLS Model, proceeding with forward selection to determine which variables have the largest individual effects. We will then compare models using the original variables against models using lagged variables to explore whether we can detect dynamic time effects. Next, we will use Between and Within model specifications to determine whether there are heterogeneous effects from each individual country that affect carbon dioxide emissions. We will analyze whether there are proper time trends that affect the variation in our variable of interest. Finally, we will estimate a random effects model and a two-way within model.
Our main interest is to explore different hypotheses and to compare model specifications that could explain the variation in carbon dioxide emissions by country.
The main model of interest that we want to test initially is the following:
Ln CarbonEmissions perCapita = Intercept + B1(Ln GDP perCapita PPP) + B2(Income Group) + B3(Region) + B4(Renewable Energy) + B5(Employment Agriculture) + B6(Employment Services) + B7(Forest Rents) + Country Effects + Time Effects + (Error)
We will initially explore single-predictor linear models with the Pooled OLS Model and, once we have determined which variables appear to have the largest impact, we will proceed to build multivariate models with more specifications.
In this first model we will explore the impact of geographical region on the levels of carbon emissions per capita:
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ Region, data = pco2new2,
## model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3.885816 -0.663883 -0.038697 0.695229 2.930361
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -1.085388 0.039746 -27.308 < 2.2e-16 ***
## RegionAsia 1.658373 0.063218 26.233 < 2.2e-16 ***
## RegionEurope 2.959081 0.061728 47.938 < 2.2e-16 ***
## RegionLatin America & Caribbean 1.741089 0.065515 26.575 < 2.2e-16 ***
## RegionMiddle East 2.706807 0.079426 34.080 < 2.2e-16 ***
## RegionNorth America 3.944940 0.180054 21.910 < 2.2e-16 ***
## RegionOceania 1.255878 0.105973 11.851 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 4440.3
## R-Squared: 0.48142
## Adj. R-Squared: 0.48042
## F-statistic: 484.282 on 6 and 3130 DF, p-value: < 2.22e-16
From the results we observe that all region coefficients are highly significant, the adjusted R-Squared for the model is 0.48042, and the overall model is also significant. Africa is the reference category, so the negative intercept (-1.085) is its mean Ln CO2 emissions per capita, and each region coefficient is the difference from Africa; all of these differences are positive. At the regional level, the highest emissions per capita are found in North America, followed by Europe and the Middle East.
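In a pooled regression on a single categorical variable, the intercept is the fitted mean of the omitted category and each coefficient is a difference from that baseline. The short Python sketch below rebuilds the fitted regional means from the coefficients printed above, assuming (as the text indicates) that Africa is the omitted category. Exponentiating a mean of logs gives a geometric mean, so the last column is an approximate typical level in metric tons per capita rather than an arithmetic average.

```python
import math

# Coefficients copied from the pooled Region model output above.
intercept = -1.085388  # baseline region (assumed to be Africa, per the text)
region_coefficients = {
    "Asia": 1.658373,
    "Europe": 2.959081,
    "Latin America & Caribbean": 1.741089,
    "Middle East": 2.706807,
    "North America": 3.944940,
    "Oceania": 1.255878,
}

# Fitted mean of Ln CO2 emissions per capita for each region.
fitted_ln = {"Africa (baseline)": intercept}
for region, coef in region_coefficients.items():
    fitted_ln[region] = intercept + coef

# Print regions from highest to lowest fitted mean.
for region, value in sorted(fitted_ln.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:28s} mean Ln = {value:7.3f}   geometric mean = {math.exp(value):6.2f} t")
```

Read this way, a positive coefficient does not mean that a region "increases" emissions in absolute terms; it means that the region's average Ln emissions per capita is higher than the baseline's.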
We will now look at the effects of Income Group:
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ IncomeGroup, data = pco2new2,
## model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.648249 -0.558975 -0.020134 0.536949 3.341349
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 2.365160 0.037889 62.4239 < 2.2e-16 ***
## IncomeGroupLow income -4.039315 0.049520 -81.5692 < 2.2e-16 ***
## IncomeGroupLower-middle income -1.962995 0.044557 -44.0559 < 2.2e-16 ***
## IncomeGroupUpper-middle income -0.521244 0.053009 -9.8331 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 2325.3
## R-Squared: 0.72844
## Adj. R-Squared: 0.72818
## F-statistic: 2801.28 on 3 and 3133 DF, p-value: < 2.22e-16
All income group coefficients are again significant, as is the overall model, and the model now has an Adjusted R-Squared of 0.72818, substantially higher than that of the Region model. High income is the reference category, so the positive intercept (2.365) is its mean Ln CO2 emissions per capita. The other three groups have negative coefficients, meaning lower emissions than the High income group, and the gap widens as income falls.
We will now analyze the impact of the Ln of GDP per Capita measured in PPP:
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP, data = pco2new2,
## model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3.375039 -0.419654 -0.017391 0.372477 2.789897
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -10.009143 0.091571 -109.31 < 2.2e-16 ***
## LnGDPperCapitaPPP 1.189929 0.010275 115.81 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 1622.3
## R-Squared: 0.81054
## Adj. R-Squared: 0.81048
## F-statistic: 13411.6 on 1 and 3135 DF, p-value: < 2.22e-16
The overall Adjusted R-Squared for the model is now 0.81048, the variable and the model are significant, and the GDP per capita variable has a positive effect on carbon emissions, as expected.
We will now proceed with a model using Ln GDP per Capita PPP and the Renewable Energy Consumption variable:
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption,
## data = pco2new2, model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.6380493 -0.3310962 -0.0090764 0.3234117 1.6557494
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -5.78023321 0.11811789 -48.936 < 2.2e-16 ***
## LnGDPperCapitaPPP 0.79500173 0.01189165 66.854 < 2.2e-16 ***
## RenewableEnergyConsumption -0.02123411 0.00047241 -44.948 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 986.39
## R-Squared: 0.8848
## Adj. R-Squared: 0.88473
## F-statistic: 12035.4 on 2 and 3134 DF, p-value: < 2.22e-16
The Adjusted R-Squared went up to 0.88473, the model and the variables are significant, and the coefficient on the Renewable Energy variable is negative, as expected.
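Because the dependent variable is in logs, the two coefficients are read differently: the Ln GDP coefficient is an elasticity, while the Renewable Energy coefficient is a semi-elasticity. The Python sketch below converts the estimates printed above into approximate percentage changes. It assumes that Renewable Energy Consumption is measured in percentage points of total final energy consumption, and it describes associations in the pooled model, not causal effects.

```python
import math

# Estimates copied from the pooled model output above.
b_gdp = 0.79500173         # coefficient on LnGDPperCapitaPPP
b_renewable = -0.02123411  # coefficient on RenewableEnergyConsumption

# Log-log term: change associated with 1% more GDP per capita, renewables fixed.
pct_from_gdp = (1.01 ** b_gdp - 1) * 100

# Log-level term: change associated with one more percentage point of
# renewable share (assumed unit), GDP per capita fixed.
pct_from_renewable = (math.exp(b_renewable) - 1) * 100

print(f"+1% GDP per capita        -> {pct_from_gdp:+.2f}% CO2 per capita")
print(f"+1 point renewable share  -> {pct_from_renewable:+.2f}% CO2 per capita")
```

Under these assumptions, 1% more GDP per capita goes with roughly 0.8% more CO2 per capita, and one additional percentage point of renewable share goes with roughly 2% less.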
Now we will add the Region variable to this model:
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption +
## Region, data = pco2new2, model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.43915400 -0.30725341 -0.00028667 0.30553897 1.73989505
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -5.40567712 0.12049976 -44.8605 < 2.2e-16
## LnGDPperCapitaPPP 0.72834782 0.01255026 58.0345 < 2.2e-16
## RenewableEnergyConsumption -0.02054152 0.00050133 -40.9743 < 2.2e-16
## RegionAsia 0.20837764 0.03202288 6.5071 8.887e-11
## RegionEurope 0.42843245 0.03647923 11.7446 < 2.2e-16
## RegionLatin America & Caribbean 0.15616696 0.03338782 4.6774 3.029e-06
## RegionMiddle East 0.16117434 0.04397463 3.6652 0.0002513
## RegionNorth America 0.92605449 0.08684054 10.6638 < 2.2e-16
## RegionOceania 0.18011512 0.04967902 3.6256 0.0002929
##
## (Intercept) ***
## LnGDPperCapitaPPP ***
## RenewableEnergyConsumption ***
## RegionAsia ***
## RegionEurope ***
## RegionLatin America & Caribbean ***
## RegionMiddle East ***
## RegionNorth America ***
## RegionOceania ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 923.52
## R-Squared: 0.89214
## Adj. R-Squared: 0.89187
## F-statistic: 3234.16 on 8 and 3128 DF, p-value: < 2.22e-16
The Adjusted R-Squared shows only a very slight increase, from 0.88473 to 0.89187. In general terms, in the presence of GDP per capita and the Renewable Energy variable, the Region variable explains very little additional variation in carbon emissions at the country level. With more variables in the model, the parameter estimates decreased in magnitude, as is generally expected, and they kept the same direction of their partial effects.
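The R-Squared figures in the outputs above can be reproduced directly from the reported sums of squares, which makes the size of the improvement easy to check. The following Python sketch uses only numbers printed in the two model summaries; the function names are illustrative.

```python
def r_squared(tss, rss):
    """Share of the total sum of squares explained by the model."""
    return 1.0 - rss / tss

def adjusted_r_squared(tss, rss, n_obs, n_slopes):
    """R-Squared penalised by the residual degrees of freedom.

    The residual degrees of freedom are n_obs - n_slopes - 1 (intercept).
    """
    return 1.0 - (rss / tss) * (n_obs - 1) / (n_obs - n_slopes - 1)

TSS, N = 8562.4, 3137  # total sum of squares and observations, from the output
models = {
    "GDP + Renewable": {"rss": 986.39, "k": 2},
    "GDP + Renewable + Region": {"rss": 923.52, "k": 8},
}

fit = {}
for name, m in models.items():
    fit[name] = (
        r_squared(TSS, m["rss"]),
        adjusted_r_squared(TSS, m["rss"], N, m["k"]),
    )
    print(f"{name:26s} R2 = {fit[name][0]:.5f}  adj. R2 = {fit[name][1]:.5f}")
```

Adding the six region dummies lowers the residual sum of squares from 986.39 to 923.52, which moves the adjusted R-Squared by less than one percentage point.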
We will now run a quick test to check whether this model has linearly dependent (perfectly collinear) columns:
## [1] "No linear dependent column(s) detected."
In the next model, we include the Employment in Agriculture variable and see that it explains very little additional variation: the Adjusted R-Squared rises only from 0.89187 to 0.89463. We also tried including the Income Group variable, and it likewise does not help explain significant additional variation. We can suppose that, in the presence of the GDP per capita variable, the income group explains very little. Moreover, because the income groups average over large numbers of countries, they have less capacity to explain variation than the continuous GDP per capita variable.
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption +
## Region + EmploymentAgriculture, data = pco2new2, model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.359408 -0.305261 -0.012447 0.293017 1.782414
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -4.26159677 0.17288160 -24.6504 < 2.2e-16
## LnGDPperCapitaPPP 0.62183736 0.01702623 36.5223 < 2.2e-16
## RenewableEnergyConsumption -0.01815573 0.00055977 -32.4343 < 2.2e-16
## RegionAsia 0.24039315 0.03180478 7.5584 5.328e-14
## RegionEurope 0.39162891 0.03623483 10.8081 < 2.2e-16
## RegionLatin America & Caribbean 0.07814022 0.03405032 2.2948 0.02181
## RegionMiddle East 0.18211872 0.04346895 4.1896 2.871e-05
## RegionNorth America 0.88285134 0.08585285 10.2833 < 2.2e-16
## RegionOceania 0.22453027 0.04928042 4.5562 5.409e-06
## EmploymentAgriculture -0.00894692 0.00098110 -9.1193 < 2.2e-16
##
## (Intercept) ***
## LnGDPperCapitaPPP ***
## RenewableEnergyConsumption ***
## RegionAsia ***
## RegionEurope ***
## RegionLatin America & Caribbean *
## RegionMiddle East ***
## RegionNorth America ***
## RegionOceania ***
## EmploymentAgriculture ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 899.6
## R-Squared: 0.89494
## Adj. R-Squared: 0.89463
## F-statistic: 2959.56 on 9 and 3127 DF, p-value: < 2.22e-16
Now that we have established that additional variables no longer help explain any substantial variation, we will keep the following model for now:
Ln CO2 Emissions per Capita = Intercept + B1(Ln GDP per Capita PPP) + B2(Renewable Energy) + B3(Region)
We will now explore whether we can find any substantial differences with a dynamic version of the model that uses lagged variables.
## Pooling Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag + Region, data = pco2new2,
## model = "pooling")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.4817901 -0.3047338 -0.0012595 0.3202074 1.6877058
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -5.3564160 0.1200654 -44.6125 < 2.2e-16
## LnGDPperCapitaPPPlag 0.7260120 0.0125351 57.9184 < 2.2e-16
## RenewableEnergyConsumptionlag -0.0204678 0.0004996 -40.9688 < 2.2e-16
## RegionAsia 0.2257303 0.0319362 7.0682 1.928e-12
## RegionEurope 0.4249246 0.0364576 11.6553 < 2.2e-16
## RegionLatin America & Caribbean 0.1612189 0.0333321 4.8368 1.384e-06
## RegionMiddle East 0.1593380 0.0439444 3.6259 0.0002925
## RegionNorth America 0.9241806 0.0867471 10.6537 < 2.2e-16
## RegionOceania 0.1737174 0.0496511 3.4988 0.0004739
##
## (Intercept) ***
## LnGDPperCapitaPPPlag ***
## RenewableEnergyConsumptionlag ***
## RegionAsia ***
## RegionEurope ***
## RegionLatin America & Caribbean ***
## RegionMiddle East ***
## RegionNorth America ***
## RegionOceania ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8562.4
## Residual Sum of Squares: 921.65
## R-Squared: 0.89236
## Adj. R-Squared: 0.89209
## F-statistic: 3241.53 on 8 and 3128 DF, p-value: < 2.22e-16
The model does not change in any substantial way.
We will now perform a test of the estimated coefficients using robust covariance matrix estimators, clustering the observations by country.
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.3564160 0.6503777 -8.2359 2.589e-16
## LnGDPperCapitaPPPlag 0.7260120 0.0689114 10.5354 < 2.2e-16
## RenewableEnergyConsumptionlag -0.0204678 0.0024224 -8.4496 < 2.2e-16
## RegionAsia 0.2257303 0.1350392 1.6716 0.09471
## RegionEurope 0.4249246 0.1399230 3.0368 0.00241
## RegionLatin America & Caribbean 0.1612189 0.1081574 1.4906 0.13617
## RegionMiddle East 0.1593379 0.1566088 1.0174 0.30903
## RegionNorth America 0.9241806 0.1875072 4.9288 8.705e-07
## RegionOceania 0.1737174 0.1938300 0.8962 0.37020
##
## (Intercept) ***
## LnGDPperCapitaPPPlag ***
## RenewableEnergyConsumptionlag ***
## RegionAsia .
## RegionEurope **
## RegionLatin America & Caribbean
## RegionMiddle East
## RegionNorth America ***
## RegionOceania
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We now observe that the Region variable loses much of its significance once we use robust standard errors clustered by country: only Europe and North America remain significant at the 5% level. Therefore we should now move on to the between and within models.
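The reason clustering changes the standard errors so much can be seen in a small example. The Python sketch below fits an OLS line with one regressor and compares the classical standard error of the slope with a cluster-robust (sandwich) standard error. It is a simplified illustration with no small-sample correction, written from the textbook formula rather than taken from the analysis code, and the data are a toy construction in which the regressor is constant within each cluster (as Region is within each country) and the errors are shared within clusters.

```python
def ols_slope_with_se(x, y, groups):
    """OLS of y on [1, x]; return slope, classical SE and cluster-robust SE."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # Inverse of X'X for the two-column design matrix [1, x].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Classical variance: s^2 * (X'X)^-1, which assumes independent errors.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = (s2 * inv[1][1]) ** 0.5

    # Cluster-robust "meat": sum over clusters of (X_g' e_g)(X_g' e_g)'.
    scores = {}
    for xi, ei, g in zip(x, resid, groups):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += ei
        s[1] += xi * ei
    m00 = sum(s[0] * s[0] for s in scores.values())
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())
    # Slope variance is a' M a, where a is the slope row of (X'X)^-1.
    a0, a1 = inv[1]
    var_cluster = a0 * a0 * m00 + 2 * a0 * a1 * m01 + a1 * a1 * m11
    return b1, se_classical, var_cluster ** 0.5

# Toy data: 4 clusters of 3 observations, x constant within each cluster,
# and one shared shock per cluster (illustrative values only).
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
x = [1.0] * 3 + [2.0] * 3 + [3.0] * 3 + [4.0] * 3
shock = {"A": 1.0, "B": -1.0, "C": -1.0, "D": 1.0}
y = [2.0 * xi + shock[g] for xi, g in zip(x, groups)]

slope, se_classical, se_cluster = ols_slope_with_se(x, y, groups)
print(f"slope = {slope:.3f}, classical SE = {se_classical:.3f}, clustered SE = {se_cluster:.3f}")
```

In this construction the classical formula treats the 12 rows as independent, while the clustered formula recognises that there are effectively only 4 independent units, so the clustered standard error is larger. The same logic applies to time-invariant regressors such as Region in a panel of 153 countries.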
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption +
## Region, data = pco2new2, model = "between")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## Observations used in estimation: 153
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.047599 -0.275916 0.007135 0.276662 1.347396
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.4555964 0.5660989 -11.4037 < 2.2e-16
## LnGDPperCapitaPPP 0.8427768 0.0589487 14.2968 < 2.2e-16
## RenewableEnergyConsumption -0.0182423 0.0023385 -7.8010 1.14e-12
## RegionAsia 0.0838992 0.1416145 0.5924 0.55448
## RegionEurope 0.3108636 0.1634860 1.9015 0.05924
## RegionLatin America & Caribbean 0.1386481 0.1519585 0.9124 0.36308
## RegionMiddle East 0.1025859 0.1966080 0.5218 0.60263
## RegionNorth America 0.7459526 0.4045680 1.8438 0.06726
## RegionOceania 0.1916401 0.2225150 0.8612 0.39053
##
## (Intercept) ***
## LnGDPperCapitaPPP ***
## RenewableEnergyConsumption ***
## RegionAsia
## RegionEurope .
## RegionLatin America & Caribbean
## RegionMiddle East
## RegionNorth America .
## RegionOceania
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 430.85
## Residual Sum of Squares: 40.365
## R-Squared: 0.90631
## Adj. R-Squared: 0.90111
## F-statistic: 174.132 on 8 and 144 DF, p-value: < 2.22e-16
With the Between estimator, none of the Region variables remains significant at the 5% level, while the Adjusted R-Squared rises slightly to 0.90111. We therefore drop Region and re-estimate the model:
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption,
## data = pco2new2, model = "between")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## Observations used in estimation: 153
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.1058444 -0.2724307 -0.0025018 0.3126906 1.2939211
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.8014608 0.5389848 -12.6190 < 2.2e-16 ***
## LnGDPperCapitaPPP 0.8978934 0.0540171 16.6224 < 2.2e-16 ***
## RenewableEnergyConsumption -0.0185213 0.0021463 -8.6294 8.271e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 430.85
## Residual Sum of Squares: 42.156
## R-Squared: 0.90216
## Adj. R-Squared: 0.90085
## F-statistic: 691.536 on 2 and 150 DF, p-value: < 2.22e-16
We are now left with the following model:
Ln CO2 Emissions per Capita = -6.8014608 + 0.8978934(Ln GDP per Capita PPP) - 0.0185213(Renewable Energy)
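The Between estimator used above discards all variation over time: it averages each variable by country and runs one regression on those averages, which is why the output reports 153 observations used in estimation. A minimal Python sketch of the idea, with one regressor and toy data (the country labels and values are illustrative only), is:

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def between_slope(countries, x, y):
    """Regress country means of y on country means of x (one row per country)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for c, xi, yi in zip(countries, x, y):
        s = sums[c]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    mean_x = [s[0] / s[2] for s in sums.values()]
    mean_y = [s[1] / s[2] for s in sums.values()]
    return ols_slope(mean_x, mean_y)

# Toy panel: 3 countries observed for 3 years each.
countries = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 10.5, 11.0, 11.5]

b_between = between_slope(countries, x, y)
print(f"between slope = {b_between:.2f}")
```

The between slope therefore answers a cross-sectional question: how do countries with a higher average Ln GDP per capita differ, on average, from countries with a lower one.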
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag, data = pco2new2, model = "between")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
## Observations used in estimation: 153
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.0549496 -0.2775993 -0.0069748 0.3094586 1.2496258
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.6581912 0.5361531 -12.4185 < 2.2e-16 ***
## LnGDPperCapitaPPPlag 0.8875676 0.0539492 16.4519 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.0188259 0.0021292 -8.8419 2.381e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 430.85
## Residual Sum of Squares: 41.703
## R-Squared: 0.90321
## Adj. R-Squared: 0.90192
## F-statistic: 699.863 on 2 and 150 DF, p-value: < 2.22e-16
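The lagged model again maintains the stability of the parameters and the significance levels.
We will proceed with the Fixed Effects Model via LSDV Estimation. Although the formula includes the country dummies, the output below is reported as a Within Model, so the country effects are absorbed and only the two slope coefficients are shown.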
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption +
## factor(Country) - 1, data = pco2new2)
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.2632499 -0.0835300 0.0021621 0.0808663 1.0139040
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## LnGDPperCapitaPPP 0.24561144 0.00996608 24.645 < 2.2e-16 ***
## RenewableEnergyConsumption -0.02586316 0.00059612 -43.386 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 196.79
## Residual Sum of Squares: 85.452
## R-Squared: 0.56576
## Adj. R-Squared: 0.54334
## F-statistic: 1942.62 on 2 and 2982 DF, p-value: < 2.22e-16
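The Adjusted R-Squared dropped drastically to 0.54334, from 0.90085 in the Between Model. The two figures are not measured on the same variation: the Within Model absorbs each country's mean level into a country-specific intercept, so its R-Squared describes only how well the regressors explain the variation within countries over time. The drop therefore indicates that heterogeneous country effects account for a large portion of the carbon emissions per country.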
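The fit statistics in the output above can be recovered from the two sums of squares and the panel dimensions. The Python sketch below does that arithmetic as a sanity check. The degrees-of-freedom adjustment used for the Adjusted R-Squared is an assumption about how the value was computed; it agrees with the reported figure to about four decimals.

```python
# Values copied from the LSDV / Within output above.
tss, rss = 196.79, 85.452
n_obs, n_countries, k = 3137, 153, 2

df_resid = n_obs - n_countries - k           # matches the 2982 DF in the F-statistic line
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid
f_stat = ((tss - rss) / k) / (rss / df_resid)

print(df_resid, round(r2, 4), round(adj_r2, 4), round(f_stat, 1))
```

Small differences from the printed output are expected because the sums of squares are themselves rounded.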
Our parameters are now the following:
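Ln CO2 Emissions per Capita = 0.24561144(Ln GDP per Capita PPP) - 0.02586316(Renewable Energy Consumption) + Country Effects
Relative to the Between Model, our GDP per capita coefficient decreased (from 0.8978934 to 0.24561144) and our Renewable Energy coefficient increased slightly in magnitude (from -0.0185213 to -0.02586316).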
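Because the dependent variable is in logs, the two coefficients read differently: the GDP term is an elasticity, while the Renewable Energy term is a semi-elasticity. The short Python sketch below converts both into approximate percentage effects. It assumes Renewable Energy Consumption is measured in percentage points, which the output does not state.

```python
import math

b_ln_gdp = 0.24561144       # log-log coefficient: an elasticity
b_renewable = -0.02586316   # log-level coefficient: a semi-elasticity

# A 1% rise in GDP per capita goes with roughly b_ln_gdp percent more CO2 per capita.
pct_per_1pct_gdp = b_ln_gdp

# A one-unit rise in the renewable share goes with 100 * (exp(b) - 1) percent change.
pct_per_unit_renewable = 100 * (math.exp(b_renewable) - 1)

print(round(pct_per_1pct_gdp, 3), round(pct_per_unit_renewable, 3))
```

These are associations conditional on the country effects, not causal effects.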
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag + factor(Country) - 1, data = pco2new2)
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3676392 -0.0885588 0.0065851 0.0896644 0.8935486
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.25806852 0.01042921 24.745 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.02348301 0.00061315 -38.299 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 196.79
## Residual Sum of Squares: 92.237
## R-Squared: 0.53128
## Adj. R-Squared: 0.50708
## F-statistic: 1690.03 on 2 and 2982 DF, p-value: < 2.22e-16
In our lagged model the Adjusted R-Squared decreased to 0.50708 from the 0.54334 of the previous model. The coefficients are still stable.
Due to the structure of our data and the model specification, we do not expect to see any changes between the Fixed Effects Model via LSDV and the Within specification.
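The reason the two specifications agree is that regressing on one dummy per country is algebraically the same as subtracting each country's mean from every variable. The Python sketch below checks this on a made-up toy panel with three groups and one regressor. The data, the group labels, and the small `solve` helper are all illustrative assumptions, not part of the original analysis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Toy panel: three made-up "countries" with three observations each.
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 6.0, 7.0, 9.0]
y = [2.1, 2.9, 4.2, 7.0, 8.1, 9.3, 1.0, 1.4, 2.9]
labels = sorted(set(groups))

# LSDV: regress y on x plus one dummy per group, with no common intercept.
design = [[xi] + [1.0 if g == lab else 0.0 for lab in labels]
          for xi, g in zip(x, groups)]
p = len(design[0])
xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
lsdv_slope = solve(xtx, xty)[0]

# Within: subtract each group's mean from x and y, then fit without an intercept.
def group_mean(values, lab):
    members = [v for v, g in zip(values, groups) if g == lab]
    return sum(members) / len(members)

xd = [xi - group_mean(x, g) for xi, g in zip(x, groups)]
yd = [yi - group_mean(y, g) for yi, g in zip(y, groups)]
within_slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(lsdv_slope, within_slope)
```

The two slopes agree up to floating-point error, which is the same pattern seen when comparing the LSDV and Within outputs.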
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPP + RenewableEnergyConsumption,
## data = pco2new2, model = "within")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.2632499 -0.0835300 0.0021621 0.0808663 1.0139040
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## LnGDPperCapitaPPP 0.24561144 0.00996608 24.645 < 2.2e-16 ***
## RenewableEnergyConsumption -0.02586316 0.00059612 -43.386 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 196.79
## Residual Sum of Squares: 85.452
## R-Squared: 0.56576
## Adj. R-Squared: 0.54334
## F-statistic: 1942.62 on 2 and 2982 DF, p-value: < 2.22e-16
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag, data = pco2new2, model = "within")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3676392 -0.0885588 0.0065851 0.0896644 0.8935486
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.25806852 0.01042921 24.745 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.02348301 0.00061315 -38.299 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 196.79
## Residual Sum of Squares: 92.237
## R-Squared: 0.53128
## Adj. R-Squared: 0.50708
## F-statistic: 1690.03 on 2 and 2982 DF, p-value: < 2.22e-16
Both Within Models reproduce the LSDV estimates exactly. Now we can explore the intercept value of each individual country cluster.
## Afghanistan Albania Algeria
## -3.05970809 -1.11720783 -1.20539383
## Angola Argentina Armenia
## -0.81802956 -0.80781798 -1.63602314
## Australia Austria Azerbaijan
## 0.36215181 0.02704896 -0.78145514
## Bahamas, The Bahrain Bangladesh
## -0.89542384 0.47803905 -1.92130117
## Barbados Belarus Belgium
## -0.67846654 -0.37678968 -0.33377912
## Belize Benin Bhutan
## -0.96529952 -1.35733163 -0.36700435
## Bolivia Botswana Brunei Darussalam
## -1.19509018 -0.71696138 -0.04228724
## Burkina Faso Burundi Cabo Verde
## -2.10635903 -2.87802412 -1.81968584
## Cambodia Cameroon Canada
## -1.57347238 -1.37636081 0.63760840
## Central African Republic Chad Chile
## -2.57444389 -3.16238002 -0.34062891
## Colombia Comoros Congo, Dem. Rep.
## -1.15152277 -2.46758539 -2.90574288
## Congo, Rep. Costa Rica Cote d'Ivoire
## -1.41021965 -1.04928918 -1.21652672
## Croatia Cyprus Czech Republic
## -0.27780774 -0.58026988 0.03794640
## Denmark Dominican Republic Ecuador
## -0.10213361 -1.07910719 -1.16079730
## Egypt, Arab Rep. El Salvador Equatorial Guinea
## -1.40096759 -1.12656792 -1.09624447
## Eritrea Estonia Eswatini
## -1.49250057 0.53739528 -0.94207522
## Ethiopia Finland France
## -2.13561164 0.45431529 -0.64315999
## Gabon Germany Ghana
## 0.61901547 -0.22916078 -1.44122336
## Greece Guatemala Guinea
## -0.29983853 -0.93766438 -1.45725552
## Guinea-Bissau Guyana Haiti
## -1.64577641 -0.51576114 -1.75799565
## Hong Kong SAR, China Hungary Iceland
## -0.88349973 -0.61324683 0.76164048
## India Indonesia Iran, Islamic Rep.
## -0.78813742 -0.77557638 -0.57666945
## Iraq Israel Italy
## -1.03594023 -0.25748915 -0.47312125
## Jamaica Japan Jordan
## -0.78549136 -0.30469548 -1.08046853
## Kazakhstan Kenya Korea, Rep.
## 0.08989056 -1.38436340 -0.25971389
## Kuwait Kyrgyz Republic Lao PDR
## 0.47297188 -1.10762188 -1.98450386
## Latvia Lebanon Lesotho
## -0.37998191 -0.87873734 -0.63827694
## Liberia Luxembourg Macao SAR, China
## -1.32072790 0.24861330 -1.45298599
## Madagascar Malawi Malaysia
## -2.22362213 -2.37017977 -0.49601016
## Maldives Mali Malta
## -1.60254836 -2.77364541 -0.76954182
## Mauritania Mauritius Mexico
## -1.72473111 -0.95293454 -0.74335949
## Moldova Mongolia Montenegro
## -1.67015760 -0.60321368 -0.03813661
## Morocco Myanmar Namibia
## -1.44938702 -1.52739947 -1.35363519
## Nepal Netherlands New Zealand
## -1.74867767 -0.23412234 0.07500689
## Nicaragua Niger Nigeria
## -1.02503899 -2.43546556 -0.56320763
## North Macedonia Oman Pakistan
## -0.27635043 -0.31274193 -1.13024930
## Panama Papua New Guinea Paraguay
## -0.98124723 -1.07820341 -0.97546023
## Peru Philippines Poland
## -1.25066971 -1.43662872 -0.14308108
## Qatar Romania Russian Federation
## 0.89546926 -0.45642404 0.13999301
## Rwanda Samoa Sao Tome and Principe
## -2.37181572 -1.23894649 -1.53539487
## Saudi Arabia Senegal Serbia
## 0.02401119 -1.63502883 -0.14023957
## Sierra Leone Singapore Slovak Republic
## -1.88400129 -0.76079590 -0.51341369
## Solomon Islands Spain Sri Lanka
## -1.40593669 -0.48095373 -1.32120051
## St. Lucia Sweden Switzerland
## -1.53919981 -0.02760581 -0.58699690
## Tajikistan Tanzania Timor-Leste
## -1.46904120 -1.82759937 -2.89463634
## Togo Tonga Trinidad and Tobago
## -1.27329412 -2.05656805 0.85997305
## Tunisia Turkey Turkmenistan
## -1.18289221 -0.72886585 0.01752315
## United Arab Emirates United Kingdom United States
## 0.29546531 -0.44501640 0.34627387
## Uruguay Uzbekistan Vanuatu
## -0.83795487 -0.48073732 -1.90807267
## Venezuela, RB Vietnam West Bank and Gaza
## -0.27893980 -0.92373420 -2.40304998
## Yemen, Rep. Zambia Zimbabwe
## -2.21780486 -1.43687205 -0.27694761
We will run an F test for individual and/or time effects, comparing the Within Model with the Pooled Model. H0: No fixed effects.
##
## F test for individual effects
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## F = 183.66, df1 = 146, df2 = 2982, p-value < 2.2e-16
## alternative hypothesis: significant effects
The p-value of our test suggests that we should keep the Within Model instead of the Pooled OLS.
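For reference, a test of this kind is usually built from the residual sums of squares of the restricted (Pooled) and unrestricted (Within) models. The expression below is the standard textbook form, given here as a sketch; the Pooled residual sum of squares is not reported in this section, so the statistic cannot be recomputed from the output alone.

```latex
F = \frac{\left( RSS_{\text{pooled}} - RSS_{\text{within}} \right) / df_1}
         {RSS_{\text{within}} / df_2},
\qquad df_1 = 146, \quad df_2 = 2982, \quad RSS_{\text{within}} = 92.237
```

A large value of F means the country intercepts reduce the residual variation by much more than chance would explain.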
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.2580685 0.0312370 8.2616 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.0234830 0.0022802 -10.2987 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
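The table above reports Cluster-Robust Standard Errors for our two independent variables; both remain significant at the 0.001 level. Next, we estimate the Random Effects Model for comparison.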
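To see how much the robust correction changes inference, the Python sketch below recomputes the t-values from the estimates and robust standard errors, and compares the robust standard errors with the conventional ones from the lagged Within Model. All numbers are copied from the outputs above; only the dictionary layout is an illustrative choice.

```python
# (estimate, conventional SE, cluster-robust SE) copied from the outputs above.
coefs = {
    "LnGDPperCapitaPPPlag": (0.2580685, 0.01042921, 0.0312370),
    "RenewableEnergyConsumptionlag": (-0.0234830, 0.00061315, 0.0022802),
}

robust_t = {}
se_ratio = {}
for name, (est, se_conv, se_robust) in coefs.items():
    robust_t[name] = est / se_robust        # t-value under robust standard errors
    se_ratio[name] = se_robust / se_conv    # how much the standard error grew
    print(name, round(robust_t[name], 3), round(se_ratio[name], 2))
```

The standard errors roughly triple, so the t-values shrink, but both coefficients stay far from zero.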
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag, data = pco2new2, model = "random")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Effects:
## var std.dev share
## idiosyncratic 0.03093 0.17587 0.108
## individual 0.25547 0.50544 0.892
## theta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8779 0.9276 0.9276 0.9239 0.9276 0.9276
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.36983 -0.09438 0.01320 0.00036 0.10898 0.89857
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -1.16335394 0.11267508 -10.325 < 2.2e-16 ***
## LnGDPperCapitaPPPlag 0.28748575 0.01061445 27.084 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.02500396 0.00059679 -41.898 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 248.59
## Residual Sum of Squares: 104.72
## R-Squared: 0.57875
## Adj. R-Squared: 0.57849
## Chisq: 4305.82 on 2 DF, p-value: < 2.22e-16
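The theta values in the Random Effects output follow from the two variance components and the number of years each country is observed. The Python sketch below applies the usual quasi-demeaning formula to the shortest and longest panels (8 and 23 years) as a check on the reported range. The function name is an illustrative assumption.

```python
import math

sigma2_idiosyncratic = 0.03093  # "idiosyncratic" variance from the Effects table
sigma2_individual = 0.25547     # "individual" variance from the Effects table

def theta(t_i):
    """Quasi-demeaning weight for a country observed in t_i years."""
    return 1 - math.sqrt(
        sigma2_idiosyncratic / (t_i * sigma2_individual + sigma2_idiosyncratic)
    )

share_individual = sigma2_individual / (sigma2_individual + sigma2_idiosyncratic)
print(round(theta(8), 4), round(theta(23), 4), round(share_individual, 3))
```

Theta close to 1 means the Random Effects estimator removes almost the whole country mean, which is why its coefficients sit near the Within estimates.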
The Hausman test is calculated by estimating the Random Effects and Fixed Effects models and then comparing the estimates:
##
## Hausman Test
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## chisq = 88.283, df = 2, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
The p-value of the Hausman Test tells us that we should prefer the Fixed Effects Model over the Random Effects Model.
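As a reminder of what is being compared, the statistic has the standard quadratic form below, written here as a sketch with FE and RE denoting the Fixed and Random Effects estimates of the two slope coefficients. With two coefficients, it is compared against a chi-squared distribution with 2 degrees of freedom.

```latex
H = \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)^{\top}
    \left[ \widehat{\operatorname{Var}}(\hat{\beta}_{FE})
         - \widehat{\operatorname{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)
    \;\sim\; \chi^2_{2} \quad \text{under } H_0
```

Here the differences are small in absolute terms (0.25806852 against 0.28748575 for GDP, and -0.02348301 against -0.02500396 for Renewable Energy), but they are large relative to their precision.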
We will now account for time effects by adding year dummies to the Within Model, alongside the country effects.
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag + factor(Year), data = pco2new2,
## model = "within")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3711920 -0.0841838 0.0053052 0.0891647 0.8717308
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.39291698 0.01974674 19.8978 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.02248569 0.00062251 -36.1211 < 2.2e-16 ***
## factor(Year)1993 -0.00955720 0.02356030 -0.4056 0.6850300
## factor(Year)1994 -0.02097023 0.02337531 -0.8971 0.3697331
## factor(Year)1995 0.01105553 0.02339296 0.4726 0.6365330
## factor(Year)1996 0.00964081 0.02317007 0.4161 0.6773752
## factor(Year)1997 0.00486526 0.02296836 0.2118 0.8322586
## factor(Year)1998 -0.03834542 0.02304832 -1.6637 0.0962788 .
## factor(Year)1999 -0.04681161 0.02312269 -2.0245 0.0430096 *
## factor(Year)2000 -0.05312725 0.02322053 -2.2879 0.0222110 *
## factor(Year)2001 -0.05948520 0.02312269 -2.5726 0.0101424 *
## factor(Year)2002 -0.07709066 0.02320681 -3.3219 0.0009049 ***
## factor(Year)2003 -0.04963509 0.02336611 -2.1242 0.0337332 *
## factor(Year)2004 -0.05179591 0.02346420 -2.2074 0.0273595 *
## factor(Year)2005 -0.06423246 0.02393234 -2.6839 0.0073170 **
## factor(Year)2006 -0.07422110 0.02418737 -3.0686 0.0021702 **
## factor(Year)2007 -0.09608363 0.02484313 -3.8676 0.0001123 ***
## factor(Year)2008 -0.12874976 0.02540735 -5.0674 4.280e-07 ***
## factor(Year)2009 -0.16224838 0.02590058 -6.2643 4.289e-10 ***
## factor(Year)2010 -0.12471453 0.02589277 -4.8166 1.534e-06 ***
## factor(Year)2011 -0.11831452 0.02632914 -4.4937 7.269e-06 ***
## factor(Year)2012 -0.14002980 0.02698181 -5.1898 2.248e-07 ***
## factor(Year)2013 -0.14674906 0.02749339 -5.3376 1.013e-07 ***
## factor(Year)2014 -0.14408504 0.02797952 -5.1497 2.780e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 196.79
## Residual Sum of Squares: 89.475
## R-Squared: 0.54532
## Adj. R-Squared: 0.51828
## F-statistic: 147.919 on 24 and 2960 DF, p-value: < 2.22e-16
We will now run the F test to compare the Within Model with time effects against the Pooled Model. H0: No fixed time effects.
##
## F test for individual effects
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag + ...
## F = 163.87, df1 = 168, df2 = 2960, p-value < 2.2e-16
## alternative hypothesis: significant effects
The p-value of our test suggests that we should keep the Within Model instead of the Pooled OLS.
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.3929170 0.0283751 13.847 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.0224857 0.0015038 -14.952 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Following the suggestion of the Global F Tests, we will now model a Twoway Fixed Effects Model.
## Twoways effects Within Model
##
## Call:
## plm(formula = LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag +
## RenewableEnergyConsumptionlag, data = pco2new2, effect = "twoways",
## model = "within")
##
## Unbalanced Panel: n = 153, T = 8-23, N = 3137
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3711920 -0.0841838 0.0053052 0.0891647 0.8717308
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.39291698 0.01974674 19.898 < 2.2e-16 ***
## RenewableEnergyConsumptionlag -0.02248569 0.00062251 -36.121 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 169.76
## Residual Sum of Squares: 89.475
## R-Squared: 0.47293
## Adj. R-Squared: 0.44159
## F-statistic: 1327.97 on 2 and 2960 DF, p-value: < 2.22e-16
Adjusted R-Squared is 0.44159 and the p-value of the model is < 2.22e-16.
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## LnGDPperCapitaPPPlag 0.3929170 0.0592267 6.6341 3.867e-11 ***
## RenewableEnergyConsumptionlag -0.0224857 0.0020558 -10.9378 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Ln CO2 Emissions per Capita = 0.3929170(LnGDPperCapitaPPPlag) - 0.0224857(RenewableEnergyConsumptionlag) + Country Effects + Time Effects
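Written out with indices, the Twoway specification can be sketched as below, where i indexes countries and t indexes years. The symbols are notation introduced here for illustration: REC stands for Renewable Energy Consumption, alpha_i and lambda_t are the country and time effects, and epsilon_it is the error term. A one-period lag is assumed, which the variable names suggest but the output does not state.

```latex
\ln\left(\mathrm{CO2}_{it}\right)
  = 0.3929170 \, \ln\left(\mathrm{GDP}_{i,t-1}\right)
  - 0.0224857 \, \mathrm{REC}_{i,t-1}
  + \alpha_i + \lambda_t + \varepsilon_{it}
```

The country effects absorb anything constant within a country, and the time effects absorb anything common to all countries in a given year.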
We will now confirm that the Twoway Model is preferred over the Pooled OLS Model. H0: No fixed effects.
##
## F test for twoways effects
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## F = 163.87, df1 = 168, df2 = 2960, p-value < 2.2e-16
## alternative hypothesis: significant effects
We can confirm that the Twoway Model is preferred over the Pooled OLS.
## 2.5 % 97.5 %
## LnGDPperCapitaPPPlag 0.35421409 0.4316199
## RenewableEnergyConsumptionlag -0.02370579 -0.0212656
95% Confidence Intervals:
LnGDPperCapitaPPPlag (0.35421409, 0.4316199); RenewableEnergyConsumptionlag (-0.02370579, -0.0212656)
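These intervals are the estimate plus or minus a normal quantile times the conventional standard error from the Twoway output. The Python sketch below reproduces them, and also shows, purely as an illustration, what the GDP interval would look like with the robust standard error from the t test of coefficients. The helper name is an assumption.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96

def conf_int(estimate, std_error):
    """Two-sided 95% normal-approximation interval."""
    return estimate - z * std_error, estimate + z * std_error

# Conventional standard errors from the Twoway output reproduce the intervals above.
gdp_ci = conf_int(0.39291698, 0.01974674)
renewable_ci = conf_int(-0.02248569, 0.00062251)

# Same estimate with the robust standard error from the t test of coefficients.
gdp_ci_robust = conf_int(0.3929170, 0.0592267)

print(gdp_ci, renewable_ci, gdp_ci_robust)
```

The robust interval is about three times wider but still excludes zero.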
##
## Breusch-Pagan LM test for cross-sectional dependence in panels
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## chisq = 43023, df = 11628, p-value < 2.2e-16
## alternative hypothesis: cross-sectional dependence
##
## Pesaran CD test for cross-sectional dependence in panels
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## z = 1.5486, p-value = 0.1215
## alternative hypothesis: cross-sectional dependence
The Breusch-Pagan LM Test tells us there is cross-sectional dependence, but the Pesaran CD Test does not reject the null hypothesis of no cross-sectional dependence.
Monte Carlo experiments show that the standard Breusch-Pagan LM test performs badly for N > T panels, whereas Pesaran's CD test performs well even for small T and large N. Therefore, in our case of N > T, we will follow the results from the Pesaran CD test and assume we do not have cross-sectional dependence.
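Two numbers in the test outputs above can be checked by hand, which also shows why the LM test struggles here: it estimates one correlation for every pair of countries. The Python sketch below recomputes the LM degrees of freedom from the 153 countries and the two-sided p-value of the CD statistic, which is standard normal under the null hypothesis.

```python
from statistics import NormalDist

n_countries = 153

# The Breusch-Pagan LM test uses one correlation per pair of countries.
lm_df = n_countries * (n_countries - 1) // 2

# Two-sided p-value for the Pesaran CD statistic, standard normal under H0.
cd_z = 1.5486
cd_p = 2 * (1 - NormalDist().cdf(cd_z))

print(lm_df, round(cd_p, 4))
```

With at most 23 years per country, each of those pairwise correlations rests on very few observations.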
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel
## models
##
## data: LnCO2EmissionsPerCapita ~ LnGDPperCapitaPPPlag + RenewableEnergyConsumptionlag
## chisq = 1136.1, df = 8, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors
The test suggests that we have serial correlation in idiosyncratic errors.
##
## studentized Breusch-Pagan test
##
## data: twofixedlag
## BP = 14.706, df = 2, p-value = 0.0006408
The test suggests that we have heteroskedasticity.
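For a chi-squared statistic with 2 degrees of freedom the upper-tail probability has a simple closed form, exp(-x/2), so the reported p-value can be verified directly. The Python sketch below does this for the Breusch-Pagan statistic and, for comparison, for the Hausman statistic reported earlier. The function name is an illustrative assumption.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

bp_p = chi2_sf_df2(14.706)       # studentized Breusch-Pagan statistic
hausman_p = chi2_sf_df2(88.283)  # Hausman statistic reported earlier

print(bp_p, hausman_p)
```

The shortcut only holds for 2 degrees of freedom; other cases need a general chi-squared distribution function.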
We can observe that overall the residuals are approximately normal, but there are some strong outliers in both tails.
This function plots observed and predicted values of the response of linear (mixed) models for each coefficient and highlights the observed values according to their distance (residuals) to the predicted values. This makes it possible to investigate how well actual and predicted values of the outcome fit across the predictor variables.
The actual (observed) values have a coloured fill, while the predicted values have a solid outline without filling.
Here we can observe that homoscedasticity does not fully hold, because there are some outliers violating the constant variance assumption.
Equatorial Guinea is one outlier.
In conclusion, we find that the best fit is a Panel, Linear, Lagged, Twoway Fixed Effects Model. We find that the lag of Ln GDP per Capita PPP and the lag of Renewable Energy Consumption best predict Ln CO2 Emissions per Capita, alongside Country and Time Fixed Effects.
Our final model had an Adjusted R-Squared of 0.44159 and the p-value of the model is < 2.22e-16.
Final Model:
Ln CO2 Emissions per Capita = 0.3929170(LnGDPperCapitaPPPlag) -0.0224857(RenewableEnergyConsumptionlag) + Country Effects + Time Effects + Error
Confidence Intervals for independent variables: LnGDPperCapitaPPPlag (0.35421409, 0.4316199) RenewableEnergyConsumptionlag (-0.02370579, -0.0212656)
We find that one single variable, Ln GDP per Capita PPP lag, has large predictive power on Ln CO2 Emissions per Capita. Additionally, we find that many other variables, such as Income Group, Region, Natural Resources Rents, Population variables, and Employment Economic Sector, do not appear to be statistically significant. On the other hand, Renewable Energy Consumption, Country effects, and Time effects appear to have more explanatory power than other variables commonly referred to in the literature.
The descriptive analysis was done without much difficulty overall. The hardest parts were consolidating a database that would be useful for the models and the final model diagnostics, where we found strong outliers that affect our results but whose removal would be counterintuitive for the purpose of the research.
For future research it would be better to have a balanced panel with a longer time span, possibly with a wider and more diverse set of variables.
One substantial problem was the large number of country clusters combined with very high variance in the time dimension, which prevented us from drawing a random sample.
Most of the variables were economic in nature; for future research it would be interesting to add institutional and other variables.
A longer time span would also help compare different time periods and different frequencies of lag effects.
The dramatic drop in our R-Squared once Country and Time effects are included is a problem because we cannot clearly identify what is behind those effects, which prevents us from generalizing our findings.
• World Bank DataBank: https://databank.worldbank.org/home.aspx
• https://www.bgs.ac.uk/discoveringGeology/climateChange/CCS/man-madeEffect.html
• https://www.epa.gov/ghgemissions/sources-greenhouse-gas-emissions
• https://www.epa.gov/climate-indicators/climate-change-indicators-global-greenhouse-gas-emissions
• https://www.epa.gov/ghgemissions/overview-greenhouse-gases
• https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions
• https://www.britannica.com/science/greenhouse-gas
• https://www.livescience.com/37821-greenhouse-gases.html