For my third experiment with data analysis, I decided to test some assumptions relating economic development to greenhouse gas emissions (particularly CO2), with regard to climate change policies. It is generally accepted that the causes of ongoing man-made climate change are related to the industrial revolution and the rise of fossil fuel consumption, along with other causes such as deforestation. Economic and geopolitical trends have thrust nations into development races, meaning that in order to attain economic power and social development, nations must industrialize and modernize their economies. This means an increasing consumption of fossil fuels as a means to achieve higher human development and a better lifestyle.
What I intend to do is to state two assumptions and test them with data visualization models. Hence, my starting point will be these assumptions, which I will try to develop into working hypotheses; then I will look up data sources and finally build the data models to test and visualize them.
The models I will test will be based on comparing data from different countries by creating “emission profiles”, that is, classifying countries by their overall emission contribution.
This is the third project I have undertaken to practice and further my knowledge of R programming and data analysis, in the same way as my previous one on Argentine films. This time, the focus is not so much on data wrangling as on asking questions of the data sets and obtaining some answers and insights through analysis and visualization.
While climate change, development and environmental policy topics really interest me, I am aware that many of the most brilliant minds have been working on this. I do not claim or intend to make completely original discoveries, but only to get answers by myself. In addition, this is not a scientific paper or publication, as that would require an examination of the existing literature on the topic, among other things. This is an independent project I do during my free time to learn and challenge myself, so I can’t afford the time for all of that.
Finally, in this report I will not display all the code I use, as the focus is on the analysis. Should a reader be interested in taking a look at the code, feel free to contact me.
The two variables to be considered are a human development index (HDI) score and CO2 emissions per capita (CO2EPC). The “per capita” calculation should neutralize differences in population size, so that the model compares each country’s HDI with the emissions attributable to each person living there.
The proposed model is a direct relationship in which HDI scores correlate with, or are “pushed up” by, increasing CO2 emissions.
Additionally, this model might be improved by looking into economic variables, such as economic outputs (agriculture, industry or services), testing if capital and/or knowledge intensive economies achieve higher HDI and higher CO2 emissions, and so on.
This hypothesis follows on from the previous one, by proposing that high HDI countries can afford to focus on new priorities such as climate change or an environmental agenda. This also concurs with the idea that poorer countries are singled out as environmentally non-compliant because of their hunger for economic development.
The model would zoom into the late 20th and early 21st centuries, in which the per capita CO2 emissions of high HDI countries start to bend down, but are not followed by those of emerging or developing countries with lower HDI.
This report will focus on emission profiles, a classification I invented for the sole purpose of comparing countries’ CO2 emissions in tons per capita (CO2EPC) in a simple, aggregated way.
This project is based on Gapminder data. It will look into behaviours and changes from the CO2 emissions per capita by country, per year.
The base dataset contains the annual CO2EPC for 192 countries; the time series spans from 1800 to 2014.
The framework of this report is the human scale of development and GHG emissions; it is not a comprehensive analysis of the environmental performance of countries.
“Per capita” indexes are aggregated values for a whole population, and they work as an estimate of each individual’s contribution to the total output. For example, suppose two countries have the same yearly GHG emissions, one with a total population of two million and the other with twenty million. Both countries pollute the same, but their inhabitants contribute very differently.
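The two-country example above can be worked through in a few lines of R. This is only a sketch: the total emission figure (20 million tons per year) and the names `country_a` and `country_b` are made up for illustration; only the population sizes come from the example.

```r
# Hypothetical figures: both countries emit the same total (tons of CO2 per year).
total_emissions <- c(country_a = 20e6, country_b = 20e6)

# Populations from the example: two million versus twenty million inhabitants.
population <- c(country_a = 2e6, country_b = 20e6)

# Per capita emissions: total output divided by the number of inhabitants.
co2epc <- total_emissions / population
print(co2epc)
```

With these made-up totals, each inhabitant of the smaller country accounts for ten times the emissions of an inhabitant of the larger one, even though both countries emit the same overall.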
Similarly, the Human Development Index provides an estimate of the overall quality of life in each country, taking into account existing inequalities. This means that an individual may have an excellent standard of living in a low HDI country, but most of his or her fellow citizens will probably not enjoy the same standard.
Unfortunately, looking at the CO2EPC dataset, it turns out that there are many missing values. The following graph shows the number of countries with missing values per year.
Until 1950, there are too many missing values. The analysis will therefore focus on the years 1950 to 2014.
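Counting missing values per year could be done roughly as follows. This is a sketch on a toy table, assuming the data is in a wide layout with one row per country and one column per year; the data frame `co2` and its values are invented for illustration.

```r
# Toy wide table: one row per country, one column per year (made-up values).
co2 <- data.frame(
  country = c("A", "B", "C"),
  `1948` = c(NA, NA, 1.2),
  `1949` = c(NA, 0.5, 1.3),
  `1950` = c(0.9, 0.6, 1.4),
  check.names = FALSE
)

# Number of countries with a missing value in each year column
# (the first column holds the country name, so it is dropped).
missing_per_year <- colSums(is.na(co2[, -1]))
print(missing_per_year)
```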
In this report, I will define an emission profile of a country based on two dimensions:
By defining categories of countries according to these two dimensions, we should be able to tell (1) how their CO2EPC changed over time and (2) how big or small each country’s CO2EPC is in comparison with other countries.
I did a little research about a hypothetical CO2EPC baseline and was not able to tell whether there is a sustainable or acceptable per capita emission level for any country. Instead, there are calls to reduce emissions globally, but (as far as I could find) they do not say what level of per capita emissions would be acceptable or recommended. Therefore, in order to classify countries, I will compare their overall emissions.
Let’s first take a look at the net variation of CO2EPC. This can be calculated as the difference between the 2014 and 1950 records:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -7.7000 0.4462 1.6640 3.0596 3.9225 31.3000 9
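For reference, a net variation like the one summarized above can be computed in a couple of lines. This is a minimal sketch on invented values, assuming a wide table with `1950` and `2014` columns; it is not the code behind the summary shown.

```r
# Toy wide table with only the first and last year of the period (made-up values).
co2 <- data.frame(
  country = c("A", "B", "C", "D"),
  `1950` = c(1.0, 8.0, NA, 0.2),
  `2014` = c(4.5, 6.5, 3.0, 0.3),
  check.names = FALSE
)

# Net variation: 2014 record minus 1950 record.
# A country missing either value gets NA and drops out of the analysis.
co2$net_variation <- co2[["2014"]] - co2[["1950"]]

# Five-number summary plus mean and NA count, as in the output above.
print(summary(co2$net_variation))
```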
While half of the countries have a net CO2EPC increase under 1.66, the average difference is an increase of around 3.06, meaning that there are countries with big increases pushing the average up. Let’s look at a frequency histogram:
A majority of countries increased their CO2EPC by around 1 ton per capita. Countries with net increases show changes of greater magnitude than countries with net decreases.
For the purpose of grouping countries together, I will divide them into 3 bins:
There were 9 countries with some missing values, which could not be included in the period variation analysis.
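One way to bin countries by net variation is with `cut()`. The sketch below is an assumption on my part: it uses the first and third quartiles as cut points, which may differ from the limits actually used in this report, and the input values are invented. The labels follow the ones that appear later in the report.

```r
# Invented net variation values, including one country with missing data.
net_variation <- c(-7.7, 0.2, 1.0, 1.7, 2.5, 5.0, 31.3, NA)

# Assumed cut points: first and third quartiles of the observed variation.
cut_points <- quantile(net_variation, probs = c(0.25, 0.75), na.rm = TRUE)

# Three bins: below, within and above the central range.
var_profile <- cut(
  net_variation,
  breaks = c(-Inf, cut_points, Inf),
  labels = c("Below Var", "Normal Var", "Above Var")
)

# Countries with missing values stay unclassified (NA).
print(table(var_profile, useNA = "ifany"))
```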
This variable is a discrete, categorical scale that compares a country’s position in the CO2 per capita emission ranking during a given period of time. It is calculated this way:
Although this method reduces data complexity, the aim is to compare all countries’ position over periods of time. In addition, looking at the data series, many (but not all) year to year changes in CO2 per capita emissions are small; calculating an average helps to discern the ranking position of each country on a global perspective.
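The averaging and ranking idea can be sketched as follows. I am assuming here that the scale is built from quartiles of each country's average CO2EPC over a period; the toy values, the three-year period and the column names are invented for illustration.

```r
# Toy wide table covering a short period (made-up values).
co2 <- data.frame(
  country = paste0("C", 1:8),
  `1990` = c(0.1, 0.5, 1.0, 2.0, 4.0, 6.0, 9.0, 15.0),
  `1991` = c(0.1, 0.6, 1.1, 2.1, 4.2, 6.1, 9.2, 15.5),
  `1992` = c(NA, 0.7, 1.2, 2.2, 4.1, 6.2, 9.1, 16.0),
  check.names = FALSE
)

# Average CO2EPC per country over the period, using whatever values are available.
period_mean <- rowMeans(co2[, c("1990", "1991", "1992")], na.rm = TRUE)

# Assumed rule: split countries into four groups at the quartiles of the period average.
quartiles <- quantile(period_mean, probs = c(0.25, 0.5, 0.75))
co2$scale <- cut(
  period_mean,
  breaks = c(-Inf, quartiles, Inf),
  labels = c("Low", "Mid-low", "Mid-high", "High")
)
print(co2[, c("country", "scale")])
```

Averaging with `na.rm = TRUE` is what lets a country with a few missing years still receive a position on the scale.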
## country 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959
## 1: Argentina 1.750 2.000 2.020 1.930 1.98 2.09 2.30 2.43 2.22 2.42
## 2: Armenia 0.875 0.929 0.969 0.995 1.05 1.15 1.22 1.29 1.32 1.34
## 3: Australia 6.700 7.020 6.990 6.750 7.54 7.68 7.76 7.74 7.88 8.31
## 4: Austria 3.020 3.360 3.190 3.130 3.53 4.19 4.05 4.18 4.01 3.97
## country 1966 1967 1968 1969 1970 1971 1972 1973
## 1: Brunei 4.5800 3.9100 3.6400 3.9400 63.4000 55.4000 66.6000 67.2000
## 2: Bulgaria 5.8900 6.6100 7.0900 7.8500 7.2000 7.5100 7.6900 7.9500
## 3: Cambodia 0.0713 0.0625 0.0725 0.1960 0.1680 0.0349 0.0161 0.0172
## 4: Cameroon 0.0583 0.0757 0.0815 0.0899 0.0978 0.1210 0.1250 0.1270
## country 1986 1987 1988 1989 1990 1991 1992 1993
## 1: Indonesia 0.723 0.718 0.755 0.735 0.824 0.974 1.08 1.15
## 2: Iran 3.010 3.120 3.320 3.490 3.740 3.960 3.92 4.02
## 3: Iraq 3.000 3.250 4.080 4.300 2.730 2.540 3.20 3.33
## 4: Ireland 7.990 8.450 8.360 8.410 8.750 8.780 8.65 8.67
In order to compare behaviour trends, the time series will be divided in three:
This decision has two purposes: to include some countries with missing values, by averaging their available values within a period, and to allow a comparison against the Human Development Index data, which is not available earlier than 1990.
Visually, the results are countries grouped together in an index or colour by their position in the scale bins. However, it is interesting to observe how these profiles change over time:
There are two observations upon classifying countries this way:
On the first graphical analysis, countries with the “stable” label will use the following colours:
Variable countries, however, will be shown with different colours.
While the patterns of “stable” countries are easily noticeable, “variable” countries remain difficult to assess. We will try to group them by their variation behaviour through the 1950-2014 period.
The following plot shows the frequencies across both dimensions analyzing the so-called emission profiles. The bigger the circles, the more countries fitting into the tile.
These trends should not be surprising, because stable emission scales are classified by holding their positions in the ranking throughout the three periods; for instance, countries in the “low scale”, should have variations below global average to stay in that quartile.
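The “stable” versus “variable” labelling described above boils down to checking whether a country keeps the same scale in all three periods. A minimal sketch, with invented countries and hypothetical column names `period_1` to `period_3`:

```r
# Emission scale of each country in the three periods (made-up classification).
scales <- data.frame(
  country  = c("A", "B", "C"),
  period_1 = c("Low", "Mid-low", "High"),
  period_2 = c("Low", "Mid-high", "High"),
  period_3 = c("Low", "Mid-high", "High"),
  stringsAsFactors = FALSE
)

# A country is "stable" if it holds the same scale across all three periods.
same_scale <- scales$period_1 == scales$period_2 &
  scales$period_2 == scales$period_3
scales$profile <- ifelse(same_scale, "stable", "variable")
print(scales)
```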
However, it might be interesting to compare these variation behaviours with the “variable” profiles:
The CO2EPC scale (vertical axis) has been limited to 15 tons per capita in order to avoid squeezing the plots; there is an outlier country with values between 30 and 50 CO2EPC in the 1970s-early 1980s period that is cut out from the plot; plus a few points from other countries reaching close to 20 CO2EPC.
These are some insights we can learn from this plot:
A vast majority of countries with CO2EPC values under the global average have experienced an average variation. Countries in the “Above Variation” box started below 5 tons CO2EPC, and in many cases the big jump took place sometime between 1960 and 1970. Some countries in the middle box with above-average values behave similarly to the ones in the “Below average” variation box, increasing emissions until the 1980s with a steep decline close to the 1990s or later.
This index was developed by the United Nations Development Programme as a response to the generalized use of GDP per capita as a measure of a country’s development and affluence. GDP per capita focuses only on the overall wealth generated in a country and not so much on other aspects of human life.
The HDI takes overall wealth, as well as other indicators into account, such as education, life expectancy and more.
The following boxplot compresses the distribution of HDI performance scores, globally, over 5-year periods between 1990 and 2015.
The central line in each box represents the median (the central value that divides the distribution in two) and the continuous blue line is the global mean or average.
This boxplot is good news for humanity. Between 1990 and 2015, the minimum HDI score rose from 0.2 to 0.4 and the median and mean rose more than 0.1. The global trend is an approximate 0.007 increase per year.
How does each identified emission profile perform on HDI? The following animated scatter plot presents the year-to-year score for all “emissions-stable” countries:
## # A tibble: 4 x 6
## # Groups: behavior [4]
## behavior `1990` `2000` `2015` difference avg_standard_dev
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Low 0.377 0.425 0.520 0.144 0.0834
## 2 Mid-low 0.572 0.629 0.698 0.126 0.0772
## 3 Mid-high 0.674 0.727 0.792 0.118 0.0782
## 4 High 0.766 0.807 0.853 0.0871 0.0808
While there is ample variation within each “stable” group, the pattern is clear: the higher the emission scale, the better the HDI performance. Although low emission countries have had the greatest increase in their HDI, their 2015 average value (0.520) does not even match the 1990 baseline of the low-intermediate group (0.572).
## # A tibble: 3 x 6
## # Groups: varProfile [3]
## varProfile `1990` `2000` `2015` difference avg_standard_dev
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Below Var 0.583 0.581 0.656 0.0733 0.165
## 2 Normal Var 0.612 0.642 0.716 0.104 0.118
## 3 Above Var 0.708 0.752 0.820 0.112 0.0961
Just as expected, ‘Variable’ profiles still show great diversity, with only a noticeable concentration of ‘Below average variation’ countries among the poorer HDI scores and the other way round for ‘Above average variation’. Their standard deviation (averaged throughout the period) is noticeably bigger than that of the stable countries.
How do these two variables behave together? The following plots show Pearson’s correlation coefficient for every year; the thick red line is the global correlation average, which is compared against the identified sub-groups.
Mind that this is not an explanatory test; it only assesses how much these variables vary together, either in direct or in opposite proportion.
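A yearly Pearson correlation can be computed by splitting the observations by year and calling `cor()` on each slice. The sketch below assumes a long table with one row per country and year; the data frame `obs` and all its values are invented.

```r
# Toy long table: five countries observed in two years (made-up values).
obs <- data.frame(
  year   = rep(c(1990, 2014), each = 5),
  co2epc = c(0.2, 1.0, 3.0, 6.0, 10.0, 0.3, 1.5, 3.5, 6.0, 9.0),
  hdi    = c(0.35, 0.50, 0.65, 0.75, 0.80, 0.45, 0.60, 0.72, 0.82, 0.88)
)

# Pearson's correlation coefficient between CO2EPC and HDI, one value per year.
cor_by_year <- sapply(
  split(obs, obs$year),
  function(d) cor(d$co2epc, d$hdi, method = "pearson")
)
print(cor_by_year)
```

The same call applied to a subset of `obs` (for example, one emission profile) would give the sub-group lines that are compared against the global average.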
The global correlation average starts at 0.665 in 1990 and is 0.529 by 2014. This is an intermediate value. There is a very weak correlation for the high emissions countries, even with negative values by the end of the period. This means that however much they increase their emissions, their HDI score does not follow. Similarly, there is a sharp decline in the high-intermediate group around the year 2010, whereas the low-emission countries maintain a mildly strong correlation.
How should we understand these correlation values? Let’s look back at the HDI and stable profiles plot above: green dots start close to 0.4 HDI around 1990 and by 2014 most of them are closer to 0.5, whereas darkred dots are around the 0.8 HDI value and their overall increase does not go far above that threshold. In addition, we know that the majority of low-emissions countries fall into the “below average variation” category, while the opposite applies to the high-emissions countries. They increase their emissions more, but not so much their HDI score.
“Below average variation” countries show an even stronger correlation, meaning that their HDI variation (either a decrease or a small increase) is quite close to that of their CO2EPC. Countries within the central variation values experience a slightly weaker correlation, and countries with greater CO2EPC eventually fall under the global average in their correlation.
The label is not exact, but it brings up a general idea of what it means: greater HDI correlates with CO2EPC positively. What do we know so far:
The answer seems to be “yes, but…”. There is evidence that countries with a historic high CO2EPC have championed the HDI ranking. Notwithstanding, the low emissions and low emission-increase groups show a stronger improvement in HDI, but the yellow or intermediate groups are not so far away from that performance.
In order to answer this, we are first going to isolate the countries with the highest HDI during the 1990s and then test whether they have been able to reduce their CO2EPC during the 21st century, or before. The steps taken in this analysis are:
A first glimpse at this analysis actually shows a weak trend towards CO2EPC reduction, as well as some countries with huge year-to-year differences.
I will delve a little bit more into this, and try a different model and alternatives:
## [1] "Contingency table: CO2EPC reductions from 1990 levels to 2015 by HDI levels"
##
## 1_Highest HDI 2_High HDI 3_Low HDI 4_Lowest HDI
## No reduction 8 16 23 30
## Reduced CO2 24 9 4 2
##
## Pearson's Chi-squared test
##
## data: Qtest1990
## X-squared = 39.635, df = 3, p-value = 1.273e-08
## [1] "Contingency table: CO2EPC reductions from 1970 levels to 2015 by HDI levels"
##
## 1_Highest HDI 2_High HDI 3_Low HDI 4_Lowest HDI
## No reduction 16 15 24 23
## Reduced CO2 16 10 3 9
##
## Pearson's Chi-squared test
##
## data: Qtest1970
## X-squared = 10.97, df = 3, p-value = 0.01189
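The first test above can be reproduced directly from the printed contingency table with base R's `chisq.test()`. The counts below are copied from the 1990 table; the object names `counts_1990` and `test_1990` are mine.

```r
# Counts copied from the 1990 contingency table above.
counts_1990 <- matrix(
  c( 8, 16, 23, 30,
    24,  9,  4,  2),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("No reduction", "Reduced CO2"),
    c("1_Highest HDI", "2_High HDI", "3_Low HDI", "4_Lowest HDI")
  )
)

# Pearson's chi-squared test of independence between HDI level and CO2EPC reduction.
test_1990 <- chisq.test(counts_1990)
print(test_1990)
```

A small p-value here only says that reduction status and HDI level are not independent; it does not say which direction the dependence goes, which is why the counts themselves still need to be read.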
Applying the same analytical model to the 1970 and 1990 baselines against 2015 CO2EPC, we notice that more countries in the highest HDI level reduced their CO2EPC from their 1990 values (24 of 32) than from their 1970 values (16 of 32, still half of them). This means that the curve bend is recent. Intermediate HDI levels do not show significant differences between both tables. Lowest HDI countries show a puzzling behaviour: for some of them their 2015 CO2EPC is lower than in 1970, but not lower than in 1990. However, their values might be so small that comparing between these periods does not reflect really significant differences.
There is evidence to support the existence of a curve bend in the highest HDI countries group; it is also supported by the intermediate HDI countries. Taking into account the comparison with 1970 values, however, the premise might not fully apply to the lowest HDI countries group.