After I finished my exams, I wanted to play more with our dataset from last time, since I have more free time now. The dataset is here.
This time, let’s look across countries. I want to see whether the change in carbon dioxide emissions per capita, measured in metric tons, is significant in the long run. We expect it to be positive and distinguishable from zero.
First, let’s calculate the average annual rate of change (the compound annual growth rate, in percent) of emissions per capita for each country over the period 1960-2016.
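In symbols, write \(x_{c,t}\) for the emissions per capita of country \(c\) in year \(t\) (this notation is only for this sketch). The rate computed in the code below is then the compound annual growth rate over the 56 years from 1960 to 2016. Plugging in Afghanistan’s rounded values from the output further down reproduces its change of about 3.03%:

```latex
\begin{aligned}
g_c &= \left[\left(\frac{x_{c,2016}}{x_{c,1960}}\right)^{1/(2016-1960)} - 1\right] \times 100,\\
g_{\text{Afghanistan}} &\approx \left[\left(\frac{0.245}{0.0461}\right)^{1/56} - 1\right] \times 100 \approx (1.0303 - 1) \times 100 \approx 3.03.
\end{aligned}
```

A negative \(g_c\), like Aruba’s, means that per-capita emissions fell over the period.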
#load the libraries we need.
library(readxl)
library(dplyr)
library(data.table)
library(EnvStats)
library(DT)
#import the dataset. I renamed the file as you can see.
emission<-read_xls("carbondioxideemmision.xls")
#same cleaning as last time.
emission<-emission[-c(1:2),]
colnames(emission)<-emission[1,]
emission<-emission[-c(1),]
emission<-as.data.frame(t(emission))
colnames(emission)<-emission[1,]
emission<-emission[-c(1:4),]
emission[]<-lapply(emission,function(x) as.numeric(as.character(x)))
emission<-emission[, colSums(is.na(emission))<nrow(emission)]
setDT(emission,keep.rownames = "year")
emission<-melt(emission, id.vars="year",variable.name = "country")
#keep only the observations for 1960 and 2016 (year is stored as text).
emission_6016<-dplyr::filter(emission,year %in% c("1960","2016"))
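#let's calculate the average annual % change: ((value in 2016 / value in 1960)^(1/56) - 1) * 100.
#lag(value) picks up the 1960 row, which comes before the 2016 row within each country.
emission_6016<- emission_6016 %>%
  group_by(country) %>%
  mutate(change=(((value/lag(value))^(1/(2016-1960)))-1)*100)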
#let's have a look.
head(emission_6016)
# A tibble: 6 x 4
# Groups: country [3]
year country value change
<chr> <fct> <dbl> <dbl>
1 1960 Aruba 205. NA
2 2016 Aruba 8.43 -5.54
3 1960 Afghanistan 0.0461 NA
4 2016 Afghanistan 0.245 3.03
5 1960 Angola 0.101 NA
6 2016 Angola 1.20 4.53
#we only need "country" and its change rate, so drop the year and value columns.
emission_6016<-emission_6016[,c("country","change")]
#drop the NA rows: the 1960 row of each country has no lagged value, so its change is NA (countries missing either year drop out entirely).
emission_6016<-na.omit(emission_6016)
#let's have another look.
head(emission_6016)
# A tibble: 6 x 2
# Groups: country [6]
country change
<fct> <dbl>
1 Aruba -5.54
2 Afghanistan 3.03
3 Angola 4.53
4 Albania 0.404
5 Arab World 3.73
6 United Arab Emirates 9.77
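The lag() step above only works if, within each country, the 1960 row comes before the 2016 row. That holds here because melt() stacks the years in order, but it is easy to break by re-sorting the data. Here is a minimal sketch on hypothetical toy data (countries "A" and "B" with made-up values). It sorts explicitly with arrange(..., .by_group = TRUE) before computing the rate, and it uses filter(!is.na(change)) instead of na.omit() so that only rows with a missing change are dropped.

```r
library(dplyr)

# Hypothetical toy data: two made-up countries, rows deliberately out of order
toy <- data.frame(
  year    = c("2016", "1960", "1960", "2016"),
  country = c("A", "A", "B", "B"),
  value   = c(2, 1, 4, 2),
  stringsAsFactors = FALSE
)

toy_change <- toy %>%
  group_by(country) %>%
  # sort by year within each country so that lag(value) is the 1960 value
  arrange(year, .by_group = TRUE) %>%
  mutate(change = ((value / lag(value))^(1 / (2016 - 1960)) - 1) * 100) %>%
  # the 1960 rows have no lagged value, so their change is NA
  filter(!is.na(change)) %>%
  select(country, change) %>%
  ungroup()

print(toy_change)
```

Country A doubled its emissions, which works out to roughly +1.25% per year. Country B halved, roughly -1.23% per year.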
In addition, let’s consider the average emissions per capita for each country over the same period, and the coefficient of variation as well.
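The coefficient of variation is the standard deviation divided by the mean, so it measures how much a country’s emissions fluctuated relative to their typical level. As far as I know, EnvStats::cv() with its default settings (method = "moment", sd.method = "sample") returns sd(x)/mean(x). The base-R helper below (cv_base is a hypothetical name, not part of any package) is a minimal sketch of the same calculation, which is handy if you want to avoid the extra dependency.

```r
# Hypothetical helper: sample standard deviation divided by the mean,
# ignoring missing values (like cv(value, na.rm = TRUE) in the post)
cv_base <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cv_base(c(1, 2, 3))    # sd = 1, mean = 2, so 0.5
cv_base(c(1, NA, 3))   # NA dropped: sd(c(1, 3)) / 2, about 0.707
```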
#turning back to our original data, calculate the mean and the coefficient of variation by country, then join them with the change-rate subset we created before.
emission<-inner_join(emission_6016,
                     emission %>%
                       group_by(country) %>%
                       summarise(mean=mean(value,na.rm = TRUE), cv=cv(value,na.rm = TRUE)),
                     by = "country")
#round our variables to four decimal places
emission$change<-round(emission$change,4)
emission$mean<-round(emission$mean,4)
emission$cv<-round(emission$cv,4)
head(emission)
# A tibble: 6 x 4
# Groups: country [6]
country change mean cv
<fct> <dbl> <dbl> <dbl>
1 Aruba -5.54 105. 0.996
2 Afghanistan 3.03 0.148 0.603
3 Angola 4.53 0.652 0.558
4 Albania 0.404 1.65 0.393
5 Arab World 3.73 3.03 0.396
6 United Arab Emirates 9.77 31.0 0.681
#since the World Bank data include regional and income-group aggregates alongside countries, I manually created a vector containing all the entries that are not individual countries, because including them might bias our calculations.
regions<-c("Arab World","Central Europe and the Baltics","Caribbean small states","East Asia & Pacific (excluding high income)","Early-demographic dividend","East Asia & Pacific","Europe & Central Asia (excluding high income)","Europe & Central Asia","European Union","Fragile and conflict affected situations","High income","Heavily indebted poor countries (HIPC)","IBRD only", "IDA & IBRD total","IDA total","IDA blend","IDA only","Latin America & Caribbean (excluding high income)","Latin America & Caribbean","Least developed countries: UN classification","Low income","Lower middle income","Low & middle income","Late-demographic dividend","Middle East & North Africa","Middle income","Middle East & North Africa (excluding high income)","North America","OECD members","Pre-demographic dividend","Pacific island small states","Other small states","Post-demographic dividend","South Asia","Sub-Saharan Africa (excluding high income)","Sub-Saharan Africa","Small states","East Asia & Pacific (IDA & IBRD countries)","Europe & Central Asia (IDA & IBRD countries)","Latin America & the Caribbean (IDA & IBRD countries)","Middle East & North Africa (IDA & IBRD countries)","South Asia (IDA & IBRD)","Sub-Saharan Africa (IDA & IBRD countries)","Upper middle income","World")
#quickly create a subset containing only these aggregates, then keep only the rows whose country is not in that subset. There are many ways to do this; I just chose what came to my mind.
emission2<-filter(emission,country%in%regions)
emission<-emission[!(emission$country %in% emission2$country),]
#let's create a table to show our data.
datatable(emission, colnames = c("Country","Average annual % change of emissions per capita \n (1960-2016)","Average emissions per capita (in metric tons) \n (1960-2016)","Coefficient of variation"))
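A hand-typed list like regions carries a risk: a name that doesn’t match the data exactly (a typo, a different ampersand) silently stays in the sample. A quick check is setdiff(), which returns the entries of a vector that never appear in another. In the real session, setdiff(regions, as.character(emission2$country)) should return character(0) if every aggregate was found. Here is a minimal sketch on a few rows taken from the head() output above, with a deliberately short, hypothetical regions_demo vector. It also shows that the exclusion can be done in a single %in% step.

```r
# A few (country, change) pairs as printed by head(emission) above
places <- data.frame(
  country = c("Aruba", "Afghanistan", "Arab World", "Albania"),
  change  = c(-5.54, 3.03, 3.73, 0.404),
  stringsAsFactors = FALSE
)

# Hypothetical, shortened list of aggregates to exclude
regions_demo <- c("Arab World", "World")

# Keep only rows whose name is not an aggregate (one step, no helper subset)
countries_only <- places[!(places$country %in% regions_demo), ]

# Aggregates in the list that never matched a row in the data
unmatched <- setdiff(regions_demo, places$country)

print(countries_only)
print(unmatched)   # "World": not among these four rows
```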
Let’s see which countries had the lowest and the highest change in emissions in the long run.
#country with the minimum emission change
emission[which.min(emission$change),]
# A tibble: 1 x 4
# Groups: country [1]
country change mean cv
<fct> <dbl> <dbl> <dbl>
1 Aruba -5.54 105. 0.996
#country with the maximum emission change
emission[which.max(emission$change),]
# A tibble: 1 x 4
# Groups: country [1]
country change mean cv
<fct> <dbl> <dbl> <dbl>
1 United Arab Emirates 9.77 31.0 0.681
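which.min() and which.max() each return a single row (the first one, in case of ties). To see more than the two extremes, sort by change and take the top and bottom rows. Below is a minimal sketch on the five country rows printed by head(emission) above, with Arab World left out since it is an aggregate. On the real data, start from emission instead of sample_rows.

```r
library(dplyr)

# The five country rows printed by head(emission) above (rounded values)
sample_rows <- data.frame(
  country = c("Aruba", "Afghanistan", "Angola", "Albania",
              "United Arab Emirates"),
  change  = c(-5.54, 3.03, 4.53, 0.404, 9.77),
  stringsAsFactors = FALSE
)

# Sort from the highest to the lowest long-run change
ranked <- sample_rows %>% arrange(desc(change))

head(ranked, 2)   # largest increases: United Arab Emirates, Angola
tail(ranked, 2)   # smallest changes: Albania, Aruba
```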
#plot the empirical probability density function.
epdfPlot(emission$change,xlab=expression('Change of CO'[2] ~ 'emissions per capita in the long run'),main=expression('Empirical PDF of the percentage change of CO'[2]~ 'emissions per capita in the long run'),cex.main=0.95)
abline(v=mean(emission$change),col="darkgray")
The dark gray vertical line marks the mean of our data, as you might expect from the code.
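For continuous data, my understanding is that EnvStats’ epdfPlot() estimates the empirical PDF with a kernel density estimate from base R’s density(). Here is a minimal base-R sketch of the same kind of plot. It uses the five country values printed earlier as stand-in data; with the real data, pass emission$change instead. Because the curve is smoothed, it extends a little beyond the observed range.

```r
# Stand-in data: the five country change rates printed earlier (rounded)
x <- c(-5.54, 3.03, 4.53, 0.404, 9.77)

# Gaussian kernel density estimate with the default bandwidth (bw.nrd0)
d <- density(x)

plot(d, main = "Kernel density estimate of the change rate",
     xlab = "Average annual % change")
abline(v = mean(x), col = "darkgray")  # same mean line as in the plot above

# A density should integrate to (about) 1 over its grid
sum(d$y) * diff(d$x[1:2])
```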
#plot of the empirical cumulative distribution function.
ecdfPlot(emission$change, xlab =expression('Order Statistics for the change rate of CO'[2]~'emissions per capita'),main=expression('Empirical CDF of the percentage change of CO'[2]~ 'emissions per capita in the long run'),cex.main=0.95)
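Put simply, let \(X\) be the long-run annual change in CO2 emissions per capita. The ECDF shows that \(P(X \le 0) \approx 0\), i.e., very few countries saw their per-capita emissions fall. We haven’t yet checked whether the change is significantly different from zero, though; that comes later.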
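To put a number on \(P(X \le 0)\), take the share of countries with a non-positive change. Base R’s ecdf() evaluated at zero gives the same value. Here is a minimal sketch on the same five printed values as stand-in data (Aruba is the only negative one among them). With the real data, mean(emission$change <= 0) gives the actual share.

```r
# Stand-in data: the five country change rates printed earlier (rounded)
x <- c(-5.54, 3.03, 4.53, 0.404, 9.77)

# Empirical P(X <= 0): share of observations at or below zero
p_hat <- mean(x <= 0)

# The empirical CDF evaluated at zero gives the same number
F_hat <- ecdf(x)
F_hat(0)   # 0.2, i.e. 1 out of 5
```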
Now, let’s “boxplot” it!
boxplot(emission$change,horizontal=T,xlab=expression('Change of CO'[2] ~ 'emissions per capita in the long run'),varwidth=T)
abline(v=0)
It doesn’t look close to zero, but that alone doesn’t make it significant, so let’s test whether the long-run change across countries is significantly different from zero.
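Under the hood, the one-sample t statistic is the sample mean divided by its standard error, \(t = \bar{x} / (s/\sqrt{n})\). It is compared against a t distribution with \(n-1\) degrees of freedom. Here is a minimal sketch that computes the statistic, the two-sided p-value and the 95% interval by hand on the five stand-in values from earlier, and checks them against t.test().

```r
# Stand-in data: the five country change rates printed earlier (rounded)
x <- c(-5.54, 3.03, 4.53, 0.404, 9.77)
n <- length(x)

# Standard error of the mean and the t statistic for H0: mean = 0
se <- sd(x) / sqrt(n)
t_manual <- mean(x) / se

# Two-sided p-value and 95% confidence interval
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

res <- t.test(x)
all.equal(unname(res$statistic), t_manual)  # TRUE
```

The same arithmetic reproduces the output below. There, se = 2.480876 / 14.803 ≈ 0.1676, and 2.480876 ± 1.976 × 0.1676 gives roughly 2.15 to 2.81.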
#using a one-sample t-test (H0: the mean change is zero).
t.test(emission$change)
One Sample t-test
data: emission$change
t = 14.803, df = 152, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
2.149774 2.811979
sample estimates:
mean of x
2.480876
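The p-value is extremely small (below 2.2e-16), so we reject the null hypothesis of a zero mean change at any conventional significance level. Since the whole 95% confidence interval lies above zero, the change is also positive, as we expected.
In other words, across the 153 countries in our sample (df = 152), carbon dioxide emissions per capita grew by about 2.48% per year on average over the period 1960-2016, with a 95% confidence interval of roughly 2.15% to 2.81% per year.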
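Since we expected the change to be positive from the start, the matching test is one-sided, with \(H_1\): mean > 0. In t.test() that is alternative = "greater". When the sample mean is positive, the one-sided p-value is half of the two-sided one, so with our data it is also far below any conventional level. Here is a minimal sketch on the five stand-in values. With the real data, run t.test(emission$change, alternative = "greater").

```r
# Stand-in data: the five country change rates printed earlier (rounded)
x <- c(-5.54, 3.03, 4.53, 0.404, 9.77)

two_sided <- t.test(x)
one_sided <- t.test(x, mu = 0, alternative = "greater")

# With a positive sample mean, these two numbers agree
one_sided$p.value
two_sided$p.value / 2

# One-sided interval: only a lower bound, the upper bound is Inf
one_sided$conf.int
```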
There is much more to explore in this data, but I’ll leave that to you, which is why I included the interactive table.
Have fun!