
After finishing my exams, I wanted to play more with the dataset from last time, since I have more free time now. The dataset is here

This time let’s look across countries. I wanted to see whether the change in carbon dioxide emissions per capita, measured in metric tons, is significant in the long run. We expect it to be positive and distinguishable from zero.

First, we calculate the average annual change rate of emissions per capita for each country over the period 1960-2016.
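Written out, the change rate used in the `mutate()` call in the code that follows is a compound annual growth rate. The symbols \(v_{1960}\) and \(v_{2016}\) are my own shorthand for a country's emissions per capita in those two years; they don't appear in the code:

\[
\text{change} = \left[\left(\frac{v_{2016}}{v_{1960}}\right)^{\frac{1}{2016-1960}} - 1\right] \times 100
\]

So a value of \(3\) means that emissions per capita grew by roughly \(3\%\) per year, compounded over the \(56\) years between the two observations.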

#load the libraries we need.
library(readxl)
library(dplyr)
library(data.table)
library(EnvStats)
library(DT)

#import the dataset. I renamed the file as you can see.
emission<-read_xls("carbondioxideemmision.xls")

#same cleaning as last time.
emission<-emission[-c(1:2),]
colnames(emission)<-emission[1,]
emission<-emission[-c(1),]
emission<-as.data.frame(t(emission))
colnames(emission)<-emission[1,]
emission<-emission[-c(1:4),]
emission[]<-lapply(emission,function(x) as.numeric(as.character(x)))
emission<-emission[, colSums(is.na(emission))<nrow(emission)]
setDT(emission,keep.rownames = "year")
emission<-melt(emission, id.vars="year",variable.name = "country")

#filter our data so that we keep only the observations for 1960 and 2016.
emission_6016<-dplyr::filter(emission,year %in% c(1960,2016))

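#let's calculate the compound annual change rate, in percent. within each country, lag(value) is the 1960 value, so the 1960 row itself gets NA.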
emission_6016<- emission_6016 %>%
    group_by(country) %>%
    mutate(change=(((value/lag(value))^(1/(2016-1960)))-1)*100)

#let's have a look.
head(emission_6016)

# A tibble: 6 x 4
# Groups:   country [3]
  year  country        value change
  <chr> <fct>          <dbl>  <dbl>
1 1960  Aruba       205.      NA   
2 2016  Aruba         8.43    -5.54
3 1960  Afghanistan   0.0461  NA   
4 2016  Afghanistan   0.245    3.03
5 1960  Angola        0.101   NA   
6 2016  Angola        1.20     4.53
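
As a quick sanity check, the formula can be applied by hand to one country. This is only a sketch: the two values are copied from the printed output above, where they are already rounded for display, and the variable names are mine.

```r
# Values copied from the printed output above (rounded for display).
v_1960 <- 0.0461   # Afghanistan, 1960
v_2016 <- 0.245    # Afghanistan, 2016
n_years <- 2016 - 1960

# Compound annual change rate, in percent (same formula as in mutate()).
change <- ((v_2016 / v_1960)^(1 / n_years) - 1) * 100
print(round(change, 2))
```

The result should land close to the 3.03 shown in the `change` column for Afghanistan; small differences come from the rounding of the inputs.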

#we only need the "country" and its change rate, so we can remove the other two columns.
emission_6016<-emission_6016[,-c(1,3)]

#remove the rows with missing values (the 1960 rows, and any country missing data for either year), which leaves one value per country.
emission_6016<-na.omit(emission_6016)

#let's have another look.
head(emission_6016)

# A tibble: 6 x 2
# Groups:   country [6]
  country              change
  <fct>                 <dbl>
1 Aruba                -5.54 
2 Afghanistan           3.03 
3 Angola                4.53 
4 Albania               0.404
5 Arab World            3.73 
6 United Arab Emirates  9.77

In addition, let’s consider the average emissions per capita for each country over the same period, and the coefficient of variation as well.

#going back to our original data, calculate the mean and the coefficient of variation by country, then merge the result with the change-rate subset we created before.
emission<-inner_join(emission_6016,emission %>%
             group_by(country) %>%
             summarise(mean=mean(value,na.rm = T), cv=cv(value,na.rm=T)))

Joining, by = "country"
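
For readers who haven't met the coefficient of variation: it is the standard deviation divided by the mean, so it measures variability relative to the level of the series. The sketch below computes it by hand in base R on made-up data. I'm assuming that `cv()` from EnvStats, with its default settings, gives the same sample-standard-deviation-over-mean ratio; `toy` and `cv_manual` are hypothetical names of mine.

```r
# Made-up data: two "countries" with four yearly values each (one missing).
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  value   = c(1, 2, 3, NA, 10, 10, 10, 10),
  stringsAsFactors = FALSE
)

# Coefficient of variation: sample standard deviation over the mean,
# ignoring missing values.
cv_manual <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Per-country mean and CV.
mean_by <- tapply(toy$value, toy$country, mean, na.rm = TRUE)
cv_by   <- tapply(toy$value, toy$country, cv_manual)

stats <- data.frame(
  country = names(mean_by),
  mean    = as.numeric(mean_by),
  cv      = as.numeric(cv_by)
)
print(stats)
```

Country "B" never changes, so its CV is zero, while country "A" varies by half of its mean level.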

#round our variables to the fourth digit
emission$change<-round(emission$change,4)
emission$mean<-round(emission$mean,4)
emission$cv<-round(emission$cv,4)

head(emission)

# A tibble: 6 x 4
# Groups:   country [6]
  country              change    mean    cv
  <fct>                 <dbl>   <dbl> <dbl>
1 Aruba                -5.54  105.    0.996
2 Afghanistan           3.03    0.148 0.603
3 Angola                4.53    0.652 0.558
4 Albania               0.404   1.65  0.393
5 Arab World            3.73    3.03  0.396
6 United Arab Emirates  9.77   31.0   0.681

#since the world bank includes regions among countries, I created a vector manually to contain all regions that are not individual countries, since including them might bias our calculations.
regions<-c("Arab World","Central Europe and the Baltics","Caribbean small states","East Asia & Pacific (excluding high income)","Early-demographic dividend","East Asia & Pacific","Europe & Central Asia (excluding high income)","Europe & Central Asia","European Union","Fragile and conflict affected situations","High income","Heavily indebted poor countries (HIPC)","IBRD only", "IDA & IBRD total","IDA total","IDA blend","IDA only","Latin America & Caribbean (excluding high income)","Latin America & Caribbean","Least developed countries: UN classification","Low income","Lower middle income","Low & middle income","Late-demographic dividend","Middle East & North Africa","Middle income","Middle East & North Africa (excluding high income)","North America","OECD members","Pre-demographic dividend","Pacific island small states","Other small states","Post-demographic dividend","South Asia","Sub-Saharan Africa (excluding high income)","Sub-Saharan Africa","Small states","East Asia & Pacific (IDA & IBRD countries)","Europe & Central Asia (IDA & IBRD countries)","Latin America & the Caribbean (IDA & IBRD countries)","Middle East & North Africa (IDA & IBRD countries)","South Asia (IDA & IBRD)","Sub-Saharan Africa (IDA & IBRD countries)","Upper middle income","World")

#quickly create a new subset to include these regions only, and then filter our data to include only the places that are not in the subset created. there are many ways to do that, I just chose what came to my mind. 
emission2<-filter(emission,country%in%regions)
emission<-emission[!(emission$country %in% emission2$country),]
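
As one of those other ways, the intermediate subset can be skipped by negating `%in%` directly against the vector of region names. A minimal sketch on made-up data (`toy` and `regions_toy` are hypothetical, and the numbers are invented):

```r
# Made-up table mixing countries and aggregates.
toy <- data.frame(
  country = c("Egypt", "World", "Chile", "High income"),
  change  = c(1.0, 2.0, 3.0, 4.0),
  stringsAsFactors = FALSE
)
regions_toy <- c("World", "High income")

# Keep only the rows whose name is NOT in the regions vector.
countries_only <- toy[!(toy$country %in% regions_toy), ]
print(countries_only)
```

The same negated condition should also work inside `dplyr::filter()` if you prefer to stay in a pipeline.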

#let's create a table to show our data.
datatable(emission, colnames = c("country","percentage change of emissions per capita \n (1960-2016)","Average of emissions per capita (in metric tons) \n (1960-2016)","coefficient of variation"))

We can see which countries have the lowest and the highest emission change in the long run.

#country with the minimum emission change
emission[which.min(emission$change),]

# A tibble: 1 x 4
# Groups:   country [1]
  country change  mean    cv
  <fct>    <dbl> <dbl> <dbl>
1 Aruba    -5.54  105. 0.996

#country with the maximum emission change
emission[which.max(emission$change),]

# A tibble: 1 x 4
# Groups:   country [1]
  country              change  mean    cv
  <fct>                 <dbl> <dbl> <dbl>
1 United Arab Emirates   9.77  31.0 0.681

Probability distribution of the change rate of CO₂ emissions per capita in the long run:

Let’s have a closer look at the distribution of our variable:

#plot the empirical probability density function.
epdfPlot(emission$change,xlab=expression('Change of CO'[2] ~ 'emissions per capita in the long run'),main=expression('Empirical PDF of the percentage change of CO'[2]~ 'emissions per capita in the long run'),cex.main=0.95)
abline(v=mean(emission$change),col="darkgray")

The dark gray vertical line represents the mean of our data, as you might expect from the code.

#plot of the empirical cumulative distribution function.
ecdfPlot(emission$change, xlab =expression('Order Statistics for the change rate of CO'[2]~'emissions per capita'),main=expression('Empirical CDF of the percentage change of CO'[2]~ 'emissions per capita in the long run'),cex.main=0.95)

We can see that the probability that the long-run emission change rate is less than or equal to zero is low, which supports our hypothesis that the change rate is positive. It is not exactly \(0\), though: Aruba, for one, has a negative rate.

Simply put, let \(X\) be the change rate of CO₂ emissions per capita; then \(P(X \le 0)\) is small. We haven’t yet checked whether the mean of \(X\) is significantly different from zero, which comes later.
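
The value of the empirical CDF at zero doesn't have to be read off the plot; it is just the share of observations that are zero or negative. A small sketch with invented numbers (the vector `x` is made up and is not the real data):

```r
# Invented change rates, NOT the real data.
x <- c(-5.5, -0.2, 0.4, 1.1, 2.0, 2.5, 3.0, 3.7, 4.5, 9.8)

# Share of observations at or below zero.
share_nonpositive <- mean(x <= 0)

# The same number from base R's empirical CDF, evaluated at zero.
F_hat <- ecdf(x)
print(c(share_nonpositive, F_hat(0)))
```

On the real data, the analogous call would be `mean(emission$change <= 0)`.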

Now, let’s “boxplot” it!

boxplot(emission$change,horizontal=T,xlab=expression('Change of CO'[2] ~ 'emissions per capita in the long run'),varwidth=T)
abline(v=0)

The distribution doesn’t seem to be centered near zero, but the difference might still not be significant, so let’s test whether the mean long-run change across countries is significantly different from zero.

#using t-test.
t.test(emission$change)


    One Sample t-test

data:  emission$change
t = 14.803, df = 152, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.149774 2.811979
sample estimates:
mean of x 
 2.480876

The p-value is extremely small, almost equal to \(0\), so the mean change in the long run is significantly different from zero at all conventional levels.

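In other words, carbon dioxide emissions per capita, in metric tons, increased significantly across countries over the period \(1960-2016\), by more than \(2\%\) per year on average. Note that this is an unweighted average over countries, so small and large countries count equally.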
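
To see where the numbers in the test output come from, the confidence interval can be rebuilt from the printed mean, t statistic, and degrees of freedom. This is a sketch that uses only the rounded values printed above, so the last digits may differ slightly from what `t.test()` reports; the variable names are mine.

```r
# Values copied from the printed t.test() output above.
m      <- 2.480876   # sample mean
t_stat <- 14.803     # t statistic (rounded in the printout)
df     <- 152        # degrees of freedom, i.e. n - 1

# With a null value of 0, t = mean / standard error, so:
se <- m / t_stat

# 95% confidence interval: mean +/- critical value * standard error.
ci <- m + c(-1, 1) * qt(0.975, df) * se

# Two-sided p-value.
p_value <- 2 * pt(-abs(t_stat), df)

print(ci)
print(p_value)
```

The interval should come out close to the 2.15 to 2.81 range printed by `t.test()`, and its lower end sitting above \(2\) is what backs the "more than \(2\%\)" reading.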

We can play more with our data, but I’ll leave that to you, which is why I inserted the interactive table.

Have fun!

Change of CO₂ emissions per capita across countries in the long run

Ahmed Elhefnawy