Applied Econometrics 1

Amine Houjeiri

18/10/2024

1. Research Question and Hypothesis

#What is the impact of a change in GDP per capita on CO2 emissions per capita?

#I will be using the following scientific article: Onofrei, M., Vatamanu, A. F., & Cigu, E. (2022). The relationship between economic growth and CO2 emissions in EU countries: A cointegration analysis. Frontiers in Environmental Science, 10, Article 934885.

#This article reports a statistically significant long-term cointegration relationship between economic growth and CO2 emissions, indicating that, on average, a 1% increase in GDP leads to a 0.072% rise in CO2 emissions.

#We expect a positive link between GDP per capita and CO2 emissions per capita. We add a control variable, the urban population as a percentage of the total population, which we expect to be positively related to both GDP per capita and CO2 emissions per capita.

2. Data Description

#We consider three variables: an explained variable Y, a main explanatory variable X, and a control variable.

#carbe is our explained variable: CO2 emissions (in metric tons per capita) for the year 2019. Source: World Bank Group, World Development Indicators.

#gdpc is our main explanatory variable: GDP per capita (in PPP, current international dollars) for the year 2019. Source: World Bank Group, World Development Indicators.

#urban is our control variable: the urban population (as a percentage of the total population) for the year 2019. Source: World Bank Group, World Development Indicators.
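
#Using these three variables, the baseline relationship studied in this report can be written as a linear model. The notation below (coefficients \beta_0, \beta_1, \beta_2, error term u_i, country index i, sample size N) is introduced here for illustration and is not taken from the source. Under this notation, the expected positive link between GDP per capita and CO2 emissions per capita corresponds to \beta_1 > 0.

```latex
\text{carbe}_i = \beta_0 + \beta_1\,\text{gdpc}_i + \beta_2\,\text{urban}_i + u_i,
\qquad i = 1, \dots, N,
\qquad \text{expected sign: } \beta_1 > 0
```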

# Load the 2019 dataset from the Excel file (the file must be in the working directory)
library(readxl)
#> Warning: package 'readxl' was built under R version 4.3.3
Data_ass1 <- read_excel("dataset2019.xlsx")
Variable    N       Mean         SD        Min         Q1     Median         Q3        Max
carbe      30       5.67       2.55       1.75       4.20       5.15       7.26      15.32
gdpc       30   45108.69   21871.05   13413.22   32329.56   43243.79   57229.78  121403.82
urban      30      72.73      14.18      42.73      60.04      72.75      83.65      98.04

#Regarding carbon emissions per capita, for our 30 observations, the mean (5.67) is close to the median (5.15) but slightly above it, which suggests a mild right skew driven by a few high emitters (the maximum is 15.32). Half of our observations lie between Q1 = 4.20 and Q3 = 7.26. With a standard deviation (SD) of 2.55, one SD around the mean covers roughly 3.1 to 8.2 metric tons per capita for the year 2019; if the distribution is roughly bell-shaped, this range contains the majority of our observations.

#Regarding gdpc, for our 30 observations, the mean (45108.69) is close to the median (43243.79). Half of our observations lie between Q1 = 32329.56 and Q3 = 57229.78, while the maximum (121403.82) is far above the rest. With an SD of 21871.05, one SD around the mean covers roughly 23,200 to 67,000 PPP current international dollars for the year 2019; if the distribution is roughly bell-shaped, this range contains the majority of our observations.

#Regarding urban, for our 30 observations, the mean (72.73%) is almost identical to the median (72.75%), which suggests a roughly symmetric distribution. Half of our observations lie between Q1 = 60.04% and Q3 = 83.65%. With an SD of 14.18%, one SD around the mean covers roughly 58.6% to 86.9% of the total population for the year 2019; if the distribution is roughly bell-shaped, this range contains the majority of our observations.
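
#The ranges quoted in this section can be checked with a few lines of R. This is a minimal sketch that works only from the summary statistics reported in the table above (it does not reload the raw data); the data frame name stats and its column names are chosen here for illustration.

```r
# Summary statistics copied from the table in this section
# (not recomputed from the raw data)
stats <- data.frame(
  variable = c("carbe", "gdpc", "urban"),
  mean     = c(5.67, 45108.69, 72.73),
  sd       = c(2.55, 21871.05, 14.18),
  q1       = c(4.20, 32329.56, 60.04),
  median   = c(5.15, 43243.79, 72.75),
  q3       = c(7.26, 57229.78, 83.65)
)

# One-standard-deviation band around the mean
stats$lower_1sd <- stats$mean - stats$sd
stats$upper_1sd <- stats$mean + stats$sd

# Interquartile range: width of the middle half of the observations
stats$iqr <- stats$q3 - stats$q1

# A positive value hints at a right skew (mean pulled above the median)
stats$mean_minus_median <- stats$mean - stats$median

print(stats[, c("variable", "lower_1sd", "upper_1sd", "iqr", "mean_minus_median")])
```

#The one-SD band is only a rough guide: it is a good description of where most observations lie when the distribution is close to symmetric, and less so for a skewed variable such as gdpc.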

3. Preliminary Graphical Investigations

# Scatter plot of carbe against gdpc with a smoothed trend line
# (geom_smooth() defaults to a loess smoother for small samples)
library(ggplot2)
ggplot(Data_ass1, aes(x = gdpc, y = carbe)) + geom_point() + geom_smooth()

#This graph represents the relationship between GDP per capita (gdpc) and CO2 emissions per capita (carbe) for the year 2019. Each point represents a single observation (country), and the smoothed line shows the trend between the two variables.

#From the plot and the fitted line, we observe a positive correlation between GDP per capita and CO2 emissions per capita: countries with a higher GDP per capita tend to have higher CO2 emissions per capita. This supports our economic intuition that wealthier countries tend to consume more energy, leading to higher carbon emissions.

#Additionally, the smoothed line indicates that this relationship may not be strictly linear, as the line appears to rise more steeply at higher GDP levels. However, we also notice some spread in the observations, meaning the relationship is not perfect, and other factors (like urbanization) might be influencing emissions.

#Overall, the preliminary graphical analysis supports the hypothesis that GDP per capita is positively related to CO2 emissions per capita, but further investigation is needed to account for other variables like urbanization.
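
#One way to examine the possible non-linearity is to draw a straight-line fit next to the default smoother, and to redraw the same plot on log-log axes. The following is a minimal sketch, assuming ggplot2 is installed. The data frame toy and its values are made up for illustration only and are not the dataset used in this report; the object names p_levels and p_logs are also chosen here.

```r
library(ggplot2)

# Made-up illustrative values, NOT the dataset used in this report
toy <- data.frame(
  gdpc  = c(15000, 22000, 30000, 38000, 45000, 52000, 60000, 75000, 90000, 120000),
  carbe = c(2.1, 3.0, 4.2, 4.8, 5.3, 5.9, 6.8, 7.5, 9.0, 14.8)
)

# Straight-line fit (red) next to a loess smoother (default blue)
p_levels <- ggplot(toy, aes(x = gdpc, y = carbe)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Same layers on log-log axes, the scale used by a log-log regression
p_logs <- p_levels + scale_x_log10() + scale_y_log10()
```

#With the real data, toy would be replaced by Data_ass1 and the two plots displayed with print(p_levels) and print(p_logs). If the two fitted lines lie closer together on the log-log axes, that would be informal support for a log specification.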

4. Parameter Estimation

# Estimate the five specifications by OLS
library(huxtable)
regression1 <- lm(carbe ~ gdpc, data = Data_ass1)                         # levels, no control
regression2 <- lm(carbe ~ gdpc + urban, data = Data_ass1)                 # levels, with urban as control
regression3 <- lm(carbe ~ gdpc + urban, data = Data_ass1)                 # same specification as regression2
regression4 <- lm(log(carbe) ~ log(gdpc), data = Data_ass1)               # log-log, no control
regression5 <- lm(log(carbe) ~ log(gdpc) + log(urban), data = Data_ass1)  # log-log, with log(urban) as control


# Report the five models side by side, with standard errors and significance stars
huxreg(regression1 , regression2 , regression3 , regression4 , regression5 , 
       statistics   = c("N. obs."     = "nobs", 
                        "R2"          = "r.squared", 
                        "R2 adj"      = "adj.r.squared", 
                        "F statistic" = "statistic",
                        "p-value"     = "p.value"),
       error_format = "({std.error})",
       note         = "{stars}. Standard errors in parenthesis",
       stars        = c(`***` = 0.01, `**` = 0.05, `*` = 0.10))
              (1)          (2)          (3)          (4)          (5)
(Intercept)   1.910 **     3.473 *      3.473 *      -4.117 ***   -3.554 **
              (0.772)      (1.848)      (1.848)      (1.295)      (1.416)
gdpc          0.000 ***    0.000 ***    0.000 ***
              (0.000)      (0.000)      (0.000)
urban                      -0.029       -0.029
                           (0.031)      (0.031)
log(gdpc)                                            0.544 ***    0.660 ***
                                                     (0.122)      (0.169)
log(urban)                                                        -0.420
                                                                  (0.425)
N. obs.       30           30           30           30           30
R2            0.510        0.526        0.526        0.415        0.435
R2 adj        0.493        0.490        0.490        0.394        0.393
F statistic   29.185       14.957       14.957       19.848       10.404
p-value       0.000        0.000        0.000        0.000        0.000
*** p < 0.01; ** p < 0.05; * p < 0.1. Standard errors in parenthesis

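#Our first regression shows how much carbe changes, on average, with a one-unit (one-dollar) increase in gdpc. The coefficient of gdpc is displayed as 0.000, but it is not actually zero: it is statistically significant at the 1% level, and it only appears as zero because the table rounds to three decimals while gdpc is measured in dollars, so the effect of a single extra dollar is tiny. This is a scaling issue rather than a linearity issue; in the log specifications (regressions 4 and 5) the coefficients are elasticities, so the problem does not arise.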
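
#The size of the gdpc coefficient in regression 1 can be approximately recovered from figures already reported in this document, because an OLS line with an intercept passes through the sample means of the two variables. This is a rough sketch: the inputs are the rounded values from the tables, so the result is approximate, and the variable names are chosen here for illustration.

```r
# Values copied from the tables in this report (rounded as displayed)
mean_carbe <- 5.67      # mean of carbe (metric tons per capita)
mean_gdpc  <- 45108.69  # mean of gdpc (PPP dollars per capita)
intercept1 <- 1.910     # intercept of regression 1

# With an intercept, the OLS line passes through the sample means:
#   mean_carbe = intercept1 + slope * mean_gdpc
slope_per_dollar <- (mean_carbe - intercept1) / mean_gdpc

# Same effect expressed per 1,000 dollars of GDP per capita
slope_per_1000 <- slope_per_dollar * 1000

print(signif(slope_per_dollar, 3))
print(round(slope_per_1000, 3))
```

#On these figures the slope is about 0.00008 tons per dollar, or about 0.083 tons per additional 1,000 dollars of GDP per capita. To display this directly in the table, gdpc could be rescaled before estimation, for example with lm(carbe ~ I(gdpc / 1000), data = Data_ass1); rescaling a regressor changes its coefficient and standard error by the same factor and leaves the t statistic and R2 unchanged.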

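#Our second regression adds the control variable urban to the first one. The coefficient of urban (-0.029, standard error 0.031) is not statistically significant. The coefficient of gdpc remains significant at the 1% level, but because it is again displayed as 0.000, the table does not show whether it changed. R2 rises slightly (from 0.510 to 0.526) while the adjusted R2 falls slightly (from 0.493 to 0.490), so urban adds little explanatory power.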

#Regression three has exactly the same specification as regression two (carbe on gdpc and urban), so its results are identical.

#Regression four is more informative. Taking logs of both variables gives a log-log model, which may bring the relationship closer to linear and in which the coefficient is an elasticity that is no longer hidden by rounding. A 1% increase in gdpc is associated with an increase of about 0.544% in carbe. The adjusted R2 is 0.394, so log(gdpc) explains roughly 39% of the variation in log(carbe) and a large part remains unexplained. (This R2 is not directly comparable with those of regressions 1 to 3, because the dependent variable is different.) The coefficient of log(gdpc) is significant at the 1% level (three stars), with a standard error of 0.122.
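
#Because regression 4 is in logs, its slope is read in percentage terms rather than in units. The short sketch below converts the reported coefficient on log(gdpc) into the implied percentage change in carbe for a few hypothetical increases in gdpc. The function name pct_change_carbe is introduced here for illustration; the only input taken from the report is the coefficient 0.544.

```r
# Elasticity estimate copied from regression 4 in this report
beta_log_gdpc <- 0.544

# In a log-log model, multiplying gdpc by (1 + g) multiplies the
# predicted carbe by (1 + g)^beta, holding everything else fixed.
# The function returns the implied change in carbe, in percent.
pct_change_carbe <- function(g, beta) {
  100 * ((1 + g)^beta - 1)
}

small  <- pct_change_carbe(0.01, beta_log_gdpc)  # gdpc 1% higher
medium <- pct_change_carbe(0.10, beta_log_gdpc)  # gdpc 10% higher
large  <- pct_change_carbe(1.00, beta_log_gdpc)  # gdpc doubled

print(round(c(small, medium, large), 2))
```

#For small changes the result is close to the coefficient itself (about 0.54% for a 1% increase); for large changes, such as a doubling of gdpc, the exact formula matters and the implied increase in carbe is well below 54.4%.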

#Regression five adds our control variable, in logs, to regression four. The coefficient of log(gdpc) rises from 0.544 to 0.660: holding urbanization fixed, a 1% increase in gdpc is associated with an increase of about 0.660% in carbe. Controlling for urbanization may reduce omitted-variable bias, but the coefficient of log(urban) (-0.420, standard error 0.425) is not statistically significant and the adjusted R2 is essentially unchanged (0.394 versus 0.393), so the control adds little explanatory power; this pattern suggests that urban is more closely related to gdpc than to carbe. The coefficient of log(gdpc) remains significant at the 1% level (three stars), and the intercept is significant at the 5% level (two stars).
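
#The significance statements for regressions 4 and 5 can be checked from the reported coefficients and standard errors alone. This is a minimal sketch using base R; the data frame est and its column names are chosen here, and the degrees of freedom assume 30 observations minus the number of estimated coefficients (including the intercept).

```r
# Coefficients and standard errors copied from the regression table
est <- data.frame(
  model    = c("(4)", "(5)", "(5)"),
  term     = c("log(gdpc)", "log(gdpc)", "log(urban)"),
  estimate = c(0.544, 0.660, -0.420),
  se       = c(0.122, 0.169, 0.425),
  df       = c(30 - 2, 30 - 3, 30 - 3)  # observations minus estimated coefficients
)

# t statistics and two-sided p-values
est$t_stat  <- est$estimate / est$se
est$p_value <- 2 * pt(-abs(est$t_stat), df = est$df)

# 95% confidence intervals based on the t distribution
crit <- qt(0.975, df = est$df)
est$ci_low  <- est$estimate - crit * est$se
est$ci_high <- est$estimate + crit * est$se

print(est)
```

#On these figures the interval for log(urban) includes zero, and the interval for log(gdpc) in regression 5 contains the regression 4 estimate of 0.544, so the rise to 0.660 is modest relative to the estimation uncertainty. Since both estimates come from the same sample, this is only an informal comparison.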

5. Discussion

#Several limitations remain. CO2 emissions might affect GDP per capita, not just the other way around (reverse causality). Important factors such as energy policies are not included, so the estimates may suffer from omitted-variable bias. Urbanization and GDP per capita may influence each other, making it hard to separate their effects on emissions. Finally, this cross-sectional model shows correlation, not causality: it cannot prove that GDP growth causes higher emissions.
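
#The concern about omitted factors can be made concrete with the standard omitted-variable identity for OLS, applied to regressions 4 and 5. The symbols below are introduced for illustration and are not taken from the source: \tilde\beta_1 is the coefficient of log(gdpc) when log(urban) is left out (regression 4), \hat\beta_1 and \hat\beta_2 are the coefficients of log(gdpc) and log(urban) when it is included (regression 5), and \tilde\delta_1 is the slope of an auxiliary regression of log(urban) on log(gdpc). That auxiliary regression is not reported in this document, so its slope is only backed out here from the rounded table values.

```latex
\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\tilde\delta_1
\quad\Longrightarrow\quad
0.544 \approx 0.660 + (-0.420)\,\tilde\delta_1
\quad\Longrightarrow\quad
\tilde\delta_1 \approx \frac{0.544 - 0.660}{-0.420} \approx 0.28
```

#A positive implied slope is consistent with richer countries being more urbanized. The same logic applies to any factor left out of the model, such as energy policy: the size and sign of the bias depend on that factor's effect on emissions and on its correlation with GDP per capita.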