Introduction

Problem Statement

I am interested in the relationship between a country’s natural resource endowment and its GDP growth. This relationship should provide insight into the phenomenon known as the ‘resource curse’: the idea that countries well endowed with natural resources tend to have lower economic growth rates than countries with fewer resources. The ‘resource curse’ is debated, and some scholars dispute that it exists. I will focus specifically on oil, and on how the exploitation of this resource has affected countries over the past 20 years.

Project Importance to Me

I majored in economics and worked on a similar project last year involving the ‘resource curse’, using STATA. The major flaw in that analysis was that I used 2014 data only. I am still curious about this topic, and would now like to extend my analysis over a period of 20 years using R visualizations. Oil has shaped our modern world, and understanding the full impact of this natural resource is important to shaping future decisions.

Dataset Description

I will be using two datasets provided by the World Bank. The data vary in completeness. They appear to be available up to 2016, so this will be the most recent year analyzed. Over 200 countries are included in these datasets, but for some the majority of the data is blank, and these countries may have to be excluded from the analysis.

Oil rents (% of GDP)

Oil is the most valuable natural resource in the world, so it is a good variable to account for resource endowment. Another benefit of this variable is that it accounts for the cost of production as well as the price at which the oil is sold.

GDP per capita growth (annual %)

This statistic was chosen because it facilitates comparison of countries of different sizes and accounts for fluctuations in population size. It is intended to provide insight into citizens’ welfare.

World Data

This data was used to provide mapping visualizations. This data came along with ggplot2.

Geocode Data

This data adds Continent labelling to the finalized dataset.

Proposed Methodology

I have discovered an R package called WDI that allows me to access the aforementioned datasets provided by the World Bank. I would then like to run cross-country fixed-effect regressions for each individual year, recording summary statistics such as the coefficients, the R-Squared values, and the standard errors for each year. From there I will calculate the averages of these statistics and see whether the results support the ‘resource curse’.
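As a rough illustration of this plan, the sketch below fits one regression per year and then averages the yearly statistics. It uses simulated data rather than the World Bank series, the helper name fit_year is hypothetical, and each yearly fit is a plain OLS regression of growth on oil rents rather than a fixed-effects estimator.

```r
# Toy panel: 3 years x 30 countries (simulated values, not World Bank data)
set.seed(1)
toy <- data.frame(
  year = rep(2013:2015, each = 30),
  oilrents = runif(90, min = 0, max = 40)
)
toy$percapgdpgrowth <- 3 - 0.05 * toy$oilrents + rnorm(90)

# Fit one regression for a single year and keep the slope,
# its standard error, and the R-squared value
fit_year <- function(d) {
  s <- summary(lm(percapgdpgrowth ~ oilrents, data = d))
  data.frame(
    year = d$year[1],
    slope = s$coefficients["oilrents", "Estimate"],
    std_error = s$coefficients["oilrents", "Std. Error"],
    r_squared = s$r.squared
  )
}

# Apply the fit to each year and stack the results into one table
by_year <- do.call(rbind, lapply(split(toy, toy$year), fit_year))

# Average the yearly statistics
yearly_means <- colMeans(by_year[, c("slope", "std_error", "r_squared")])

print(by_year)
print(yearly_means)
```

Averaging yearly coefficients treats every year as equally informative, which is a simplification when the number of usable countries differs from year to year.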

Visualizations

As this dataset is massive, with many countries and years, it seems that a shiny application would help to visualize the data. I have no experience with shiny, but I would like to make this part of the final product. I am thinking that there will be a scatter plot of all data points, disregarding time, as well as histograms that represent the time element. The difficulty with many of these visualizations is the sheer number of countries that will be analyzed. Because of this, I think that the shiny application would help to provide greater insight. The summary statistics can also be visualized over time, perhaps through a bar chart or a similar type of graph.

Consumer Application

This project has real application. I think the main consumers of this project would be governments. Information on the ‘resource curse’ could assist these institutions in crafting policies. Should this data prove meaningful, it would have myriad implications for governments. This information could provide guidance on how to apply taxes, tariffs, and quotas, and could also assist governments in avoiding the pitfall known as the ‘resource curse’.

From a different perspective, this data could perhaps be used by a stock trader or a company. They could use it to predict future policy decisions of governments, which affect business decisions. These insights could give a strategic advantage, and could ultimately assist in the extraction of profit from the marketplace.

Packages Required

library("dataMeta")
library("DT")
library("WDI")
library("tidyverse")
library("rvest")
library("plotly")
library('knitr')
library('kableExtra')
library('broom')
library('sjPlot')
  • dataMeta was used to make the data dictionary

  • DT was used to make the datatable

  • WDI was a package containing datasets for GDP Growth as a percent, and Oil Rents as a percentage of GDP

  • tidyverse was used throughout the project, for ggplot2 and dplyr manipulation and visualization capabilities

  • rvest was used to import the html table

  • plotly was used to make the line on my facet wraps

  • knitr was used to make the rmarkdown document

  • kableExtra displays tables nicely

  • broom was used for its tidy functionality, to check the coefficients of a linear model prior to making a table from the information

  • sjPlot was used as a way to show statistics tables in an organized manner

Data Preparation

Geocode Data

I imported the Geocode Data using a read_html command. This dataset was chosen so that I could pair my countries with continents. There were two problems with this dataset. One was that a variable name was in a format that caused errors within RStudio, so I renamed this variable to ‘iso2c’ to match my other dataset. The other was that the continent code for North America, ‘NA’, was read in as a missing value, so I changed these entries to read ‘NAM’.

#this dataset contains continent, iso2c, and country name. 
html<- read_html("http://www.geonames.org/countries/")
geocodes1 <- html_nodes(html, "table")%>% 
  .[2]%>%
  html_table(fill = TRUE)
#remove scientific notation
#options(scipen=999)

#as_tibble replaces the deprecated as.tibble
geocodes_df <- as_tibble(geocodes1[[1]]) %>%
  select(Country, Continent, dplyr::contains('alpha2'))

#renaming the third column (the alpha2 code) to match the WDI data
names(geocodes_df)[3] <- "iso2c"

#unfortunately, the North America code 'NA' is read in as a missing value. we must fix this
#(this assumes that every missing continent is North America)
geocodes_df$Continent <- as.character(geocodes_df$Continent)
geocodes_df$Continent <- ifelse(is.na(geocodes_df$Continent), 
                                'NAM', geocodes_df$Continent)
datatable(geocodes_df)
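
The North America problem can be reproduced on a small scale. The sketch below uses base R's read.csv on a two-row toy table (not the geonames page) to show how the string "NA" is parsed as a missing value by default, how the recoding to 'NAM' repairs it, and one alternative that keeps the original code. The toy table and its object names are assumptions made for illustration.

```r
# Toy table: "NA" here is the continent code for North America,
# not a missing value
raw <- "Country,Continent\nCanada,NA\nFrance,EU\n"

# Default parsing turns the string "NA" into a missing value
parsed <- read.csv(text = raw, stringsAsFactors = FALSE)
print(is.na(parsed$Continent))

# Recode the missing continent, as the report does
parsed$Continent <- ifelse(is.na(parsed$Continent), "NAM", parsed$Continent)
print(parsed)

# Alternative: treat only empty strings as missing, so "NA" survives as text
kept <- read.csv(text = raw, stringsAsFactors = FALSE, na.strings = "")
print(kept)
```

Recoding after the fact is only safe when every missing continent really is North America; a genuinely blank entry would be relabelled as well.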
WDI Data

The WDI package gives direct access to World Bank data from within R. After discovering this, I realized that it was a much simpler way of importing the two datasets containing Per Capita GDP Growth (in %) and Oil Rents as a Percentage of GDP. The package also allowed me to choose the range of years, so I chose a set of 10 years, from 2006 through 2015. Originally, I had planned to cover the past 20 years, but many countries lacked data going back that far.

I loaded both datasets, labelling them ‘oil’ and ‘growth’ and then merged them by their corresponding variables. I labelled this new dataset ‘total’.

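#end=2016 is requested, but no 2016 rows remain once incomplete rows are removed below
oil<- WDI(indicator='NY.GDP.PETR.RT.ZS', country= "all",
         start=2006, end=2016, extra=FALSE)


growth<- WDI(indicator='NY.GDP.PCAP.KD.ZG', country = "all",
            start = 2006, end=2016, extra = FALSE)

#merging oil and growth datasets

total<- merge(growth, oil, by=c("year","country", 'iso2c'))

#renaming the indicator columns by name rather than by position
names(total)[names(total) == "NY.GDP.PCAP.KD.ZG"] <- "percapgdpgrowth"
names(total)[names(total) == "NY.GDP.PETR.RT.ZS"] <- "oilrents"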
World Data
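#map_data has no 'by' argument, so the world map is loaded without it
world<- map_data('world')
#the fifth column, region, holds the country names
colnames(world)[5]<-'Country'
datatable(world)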
Combining the Dataframes

The next step was to merge the ‘total’ and Geocodes datasets. There were many NAs throughout the ‘total’ dataset, so after merging the two datasets by iso2c I recoded zeros as NA and used the na.omit function to remove every row containing an NA, naming the new dataframe TOTNA. The mapping data from the ggplot2 package is joined to TOTNA later, in the Maps section, using a right join.

totalcon<- merge(total, geocodes_df, by="iso2c")

#recoding exact zeros as NA, so that they are removed along with the other missing values
totalcon[totalcon == 0]<- NA
#removing rows with NA in them

TOTNA<-na.omit(totalcon)

#removing redundant country column in TOTNA
TOTNA$country <- NULL

#Renaming United States to USA
TOTNA$Country[TOTNA$Country=='United States']<-'USA'
datatable(TOTNA)
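
To make the effect of these cleaning steps concrete, the sketch below runs the same three operations (merge, recode zeros as NA, na.omit) on a five-row toy dataset. The country codes and values are invented for illustration and do not come from the World Bank data.

```r
# Toy indicator data (invented values)
total_toy <- data.frame(
  iso2c = c("AA", "BB", "CC", "DD", "EE"),
  percapgdpgrowth = c(2.1, 1.4, NA, -1.3, 0.9),
  oilrents = c(12.5, 0, 8.0, 3.2, 20.1)
)

# Toy continent lookup; EE has no entry
geo_toy <- data.frame(
  iso2c = c("AA", "BB", "CC", "DD"),
  Continent = c("AS", "EU", "AF", "SA")
)

# merge() keeps only codes found in both tables, so EE is dropped
merged <- merge(total_toy, geo_toy, by = "iso2c")

# Exact zeros become NA (BB has zero oil rents)
merged[merged == 0] <- NA

# Rows with any NA are removed: BB (recoded zero) and CC (missing growth)
clean <- na.omit(merged)
print(clean)
```

One consequence worth keeping in mind is that countries reporting zero oil rents are removed, so the cleaned data describes only countries with some recorded oil income.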
High Oil Rents
aggTOTNA <-TOTNA %>% 
  group_by(iso2c,Country, Continent) %>%
  summarise_at(vars(oilrents, percapgdpgrowth), mean)
  
highoilrents<-subset(aggTOTNA, oilrents>5) %>% 
  arrange(-oilrents)
Data Dictionary
var_desc <- c("Country Code is denoted in ISO codes, in this case denoted by the variable iso2c", "Year", "Per Capita GDP Growth in %", "Oil Rents as a Percentage of GDP.
              Oil Rents is the difference between the value of oil production at world prices and the costs to attain
              and produce this resource",
              "Country", "Continent, in abbreviated form. AF corresponds to Africa, AS corresponds to Asia, EU corresponds to Europe, NAM corresponds to North America, OC corresponds to Oceania, and SA corresponds to South America")
var_type <- c(0,0,0,0,0,0)

linker <- build_linker(TOTNA, variable_description = var_desc, variable_type = 
                         var_type)
dict <- build_dict(my.data = TOTNA, linker = linker, option_description = NULL, 
                   prompt_varopts = FALSE)

kable(dict, format = "html", caption = "Data Dictionary")

Data Dictionary

variable_name | variable_description | variable_options
Continent | Continent, in abbreviated form. AF corresponds to Africa, AS corresponds to Asia, EU corresponds to Europe, NAM corresponds to North America, OC corresponds to Oceania, and SA corresponds to South America | AF to SA
Country | Country | Afghanistan to Yemen
iso2c | Country Code is denoted in ISO codes, in this case denoted by the variable iso2c | AE to ZA
oilrents | Oil Rents as a Percentage of GDP. Oil Rents is the difference between the value of oil production at world prices and the costs to attain and produce this resource | 0.000104741877702483 to 65.42039096255
percapgdpgrowth | Per Capita GDP Growth in % | -62.2250869981277 to 33.0304872845886
year | Year | 2006 to 2015

Exploratory Analysis and Visualizations

Scatter Plots and Linear Regressions
modelAF<- lm (oilrents~percapgdpgrowth, data=TOTNA, subset=Continent=='AF')


modelAS<- lm (oilrents~percapgdpgrowth, data=TOTNA, subset=Continent=='AS')


modelEU<- lm (oilrents~percapgdpgrowth, data=TOTNA, subset=Continent=='EU')

modelNAM<- lm(oilrents~percapgdpgrowth, data=TOTNA, subset=Continent=='NAM')

modelOC<- lm(oilrents~percapgdpgrowth, data=TOTNA, subset=Continent== 'OC')

modelSA<- lm(oilrents~percapgdpgrowth, data=TOTNA, subset=Continent== 'SA')

modelALL<- lm(oilrents~percapgdpgrowth, data=TOTNA)
#make smaller dataset for Africa
AF<- TOTNA[TOTNA$Continent == 'AF',]

#make smaller dataset for Asia
AS<- TOTNA[TOTNA$Continent == 'AS',]

#make smaller dataset for Europe
EU<- TOTNA[TOTNA$Continent == 'EU',]

#make smaller dataset for North America
NAM<- TOTNA[TOTNA$Continent == 'NAM',]

#make smaller dataset for Oceania
OC<- TOTNA[TOTNA$Continent == 'OC',]

#make smaller dataset for South America
SA<- TOTNA[TOTNA$Continent == 'SA',]
#ggplotting TOTNA for all datapoints, colors assigned via continent
ggplot(TOTNA)+
  geom_point(aes(oilrents, percapgdpgrowth, colour=Continent))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents, percapgdpgrowth))+
  ggtitle('Figure 1: Scatter World, Colored by Continent')

Figure 1 shows all available datapoints, colored by continent. The linear regression line slopes downward, but as Figure 11 shows, the line provides little to no real insight. There does, however, appear to be a concentration of African countries below the line.

#ggplotting TOTNA for all datapoints, colors assigned via year
ggplot(TOTNA)+
  geom_point(aes(oilrents, percapgdpgrowth, colour=year))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents, percapgdpgrowth))+
  ggtitle('Figure 2: Scatter World, Colored by Year')

Figure 2 shows all available datapoints, colored by year. The linear regression line slopes downward, but as Figure 11 shows, the line provides little to no real insight. It appears that the more recent years depict a steeper negative slope.

#Facet plotting by year
ggplot(TOTNA)+
  geom_jitter(aes(oilrents, percapgdpgrowth))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents,percapgdpgrowth))+
  facet_wrap(~year, scales='free')+
  ggtitle('Figure 3: Scatter by Year')

Figure 3 shows all available datapoints, faceted by year. There appears to be a stronger negative slope in the years 2010 through 2015, with the exception of 2012.

ggplot(AF)+
  geom_jitter(aes(oilrents, percapgdpgrowth, colour=Country))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents,percapgdpgrowth))+
  ggtitle('Figure 4: Africa')

Figure 4 displays all datapoints from Africa. There is a negative slope, and the coefficients for the linear model are displayed in Figure 12.

ggplot(AS)+
  geom_jitter(aes(oilrents, percapgdpgrowth, colour=Country))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents,percapgdpgrowth))+
  ggtitle('Figure 5: Asia')

Figure 5 displays all datapoints from Asia. There is a negative slope, and the coefficients for the linear model are displayed in Figure 12.

ggplot(EU)+
  geom_jitter(aes(oilrents, percapgdpgrowth, colour=Country))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents,percapgdpgrowth))+
  ggtitle('Figure 6: Europe')

Figure 6 displays all datapoints from Europe. There is a positive slope, and the coefficients for the linear model are displayed in Figure 12.

ggplot(NAM)+
  geom_jitter(aes(oilrents, percapgdpgrowth, colour=Country))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents,percapgdpgrowth))+
  ggtitle('Figure 7: North America')

Figure 7 displays all datapoints from North America. There is a positive slope, and the coefficients for the linear model are displayed in Figure 12.

ggplot(OC)+
  geom_jitter(aes(oilrents, percapgdpgrowth, colour=Country))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents,percapgdpgrowth))+
  ggtitle('Figure 8: Oceania')

Figure 8 displays all datapoints from Oceania. There is a positive slope, and the coefficients for the linear model are displayed in Figure 12.

Figure 9 displays all datapoints from South America. There is a positive slope, and the coefficients for the linear model are displayed in Figure 12.

ggplot(highoilrents)+
  geom_point(aes(oilrents, percapgdpgrowth, colour=Continent))+
  geom_smooth(method="lm", se=FALSE, aes(oilrents, percapgdpgrowth))+
  ggtitle('Figure 10: High Oil Rents')

Figure 10 displays all datapoints from the High Oil Rents dataset. There is a negative slope, and the coefficients for the linear model are displayed in Figure 13.

Tables for Linear Regressions
Figure 11: Linear Regression for World

                  WORLD
                  B       CI               p
(Intercept)       8.08    7.17 – 8.98      <.001
percapgdpgrowth   -0.16   -0.33 – -0.00    .047
Observations      972
R2 / adj. R2      .004 / .003

Figure 11 displays a linear regression for all datapoints from 2006 through 2015, around the globe. The slope is statistically significant at the 5% significance level, but the very low R-Squared value indicates that the linear model explains almost none of the variation. The dependent variable in this table is oilrents, whereas the best fit line in Figures 1 and 2 has percapgdpgrowth on the vertical axis. The table therefore shares the sign of the slope, its p-value, and the R-Squared value with that line, but not the coefficient values themselves.
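
Because the table regresses oilrents on percapgdpgrowth while the plotted lines in Figures 1 and 2 regress percapgdpgrowth on oilrents, it helps to check which statistics carry over between the two directions. The sketch below does this on simulated data, where x and y are stand-ins for the two variables. In a regression with a single predictor, the R-squared value, the p-value of the slope, and the sign of the slope are the same in both directions, and the product of the two slopes equals the R-squared value.

```r
# Simulated stand-ins: x for oilrents, y for percapgdpgrowth
set.seed(42)
x <- runif(200, min = 0, max = 40)
y <- 2 - 0.03 * x + rnorm(200)

# Regress in both directions
fit_yx <- summary(lm(y ~ x))
fit_xy <- summary(lm(x ~ y))

slope_yx <- fit_yx$coefficients["x", "Estimate"]
slope_xy <- fit_xy$coefficients["y", "Estimate"]

# R-squared and the slope p-value agree across directions
r2_same <- all.equal(fit_yx$r.squared, fit_xy$r.squared)
p_same <- all.equal(fit_yx$coefficients["x", "Pr(>|t|)"],
                    fit_xy$coefficients["y", "Pr(>|t|)"])

# The slopes share a sign, and their product equals R-squared
sign_same <- sign(slope_yx) == sign(slope_xy)
product_is_r2 <- all.equal(slope_yx * slope_xy, fit_yx$r.squared)

print(c(slope_yx = slope_yx, slope_xy = slope_xy))
print(c(r2_same = isTRUE(r2_same), p_same = isTRUE(p_same),
        sign_same = sign_same, product_is_r2 = isTRUE(product_is_r2)))
```

The slope values themselves differ, so a coefficient read from a table with oilrents as the dependent variable should not be interpreted as the change in growth per unit of oil rents.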

Figure 12: Linear Regression by Continent

sjt.lm(modelAF, modelAS, modelEU, modelNAM, modelOC, modelSA, depvar.labels= 
         c('AFRICA', 'ASIA','EUROPE', 'NORTH AMERICA', 'OCEANIA', 
           'SOUTH AMERICA'))
Continent       Term              B       CI               p       Observations   R2 / adj. R2
AFRICA          (Intercept)       15.70   13.22 – 18.17    <.001   195            .005 / -.001
                percapgdpgrowth   -0.16   -0.50 – 0.18     .345
ASIA            (Intercept)       12.25   10.34 – 14.15    <.001   331            .029 / .026
                percapgdpgrowth   -0.50   -0.81 – -0.18    .002
EUROPE          (Intercept)       0.91    0.60 – 1.22      <.001   249            .003 / -.001
                percapgdpgrowth   0.03    -0.04 – 0.10     .382
NORTH AMERICA   (Intercept)       2.10    1.49 – 2.71      <.001   80             .037 / .024
                percapgdpgrowth   0.17    -0.03 – 0.37     .088
OCEANIA         (Intercept)       1.02    -0.66 – 2.69     .223    29             .219 / .190
                percapgdpgrowth   0.73    0.18 – 1.27      .011
SOUTH AMERICA   (Intercept)       4.49    3.01 – 5.96      <.001   88             .015 / .004
                percapgdpgrowth   0.21    -0.15 – 0.57     .252

Figure 12 displays regressions for all datapoints from 2006 through 2015, separated by continent. The dependent variable is oilrents, and these statistics correspond with Figures 4 through 9. The slope coefficients are not statistically significant at the 5% significance level for Africa, Europe, North America, and South America. The R-Squared values are also very low, indicating that the linear model explains little of the variation.

Figure 13: Linear Regression where Avg. Oilrents>5

modelhighoilrents<-lm(oilrents~percapgdpgrowth, data=highoilrents)

sjt.lm(modelhighoilrents, depvar.labels='Oilrents>5')
                  Oilrents>5
                  B       CI               p
(Intercept)       23.87   18.91 – 28.82    <.001
percapgdpgrowth   -1.66   -2.98 – -0.34    .015
Observations      34
R2 / adj. R2      .170 / .144

Figure 13 is the most important table, as it has a relatively high R-Squared value and coefficients that are statistically significant at the 5% significance level. This table is based on the High Oil Rents dataset, and only regresses on countries that average over 5% oil rents as a percentage of GDP. This table supports the ‘resource curse’ and corresponds with Figure 10.

Maps
TOTNA%>% 
  right_join(world, by = 'Country') %>% 
  #seq(2006, 2016, by = 9) evaluates to 2006 and 2015
  filter(year %in% seq(2006, 2016, by = 9)) %>% 
  ggplot()+
  geom_polygon(aes(x=long,y=lat, group=group, fill=percapgdpgrowth))+
  facet_wrap(~year,nrow=6)+
  #the legend belongs to the fill aesthetic, so a fill scale is used
  scale_fill_continuous(name='percapgdpgrowth')+
  ggtitle('Per Capita GDP Growth (in %), Visualized for the years\n2006 and 2015')

The above maps are colored by Per Capita GDP Growth (in %) for the years 2006 and 2015. The maps are similar; however, in 2015 more countries in Africa and Asia appear to be experiencing severe declines in per capita GDP.
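
The year filter in the map code relies on seq(2006, 2016, by = 9), which steps from 2006 in increments of 9 and stops before exceeding 2016. The short sketch below, using an invented toy table, shows which years that keeps and that an explicit vector of years selects the same rows.

```r
# The sequence used by the map filter
years_kept <- seq(2006, 2016, by = 9)
print(years_kept)

# Toy yearly data (invented values) to show the effect of the filter
toy <- data.frame(year = 2006:2015, value = seq_len(10))
toy_kept <- toy[toy$year %in% years_kept, ]
print(toy_kept)

# An explicit vector selects the same rows and is easier to read
toy_explicit <- toy[toy$year %in% c(2006, 2015), ]
print(identical(toy_kept, toy_explicit))
```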

TOTNA%>% 
  right_join(world, by = 'Country') %>% 
  #seq(2006, 2016, by = 9) evaluates to 2006 and 2015
  filter(year %in% seq(2006, 2016, by = 9)) %>% 
  ggplot()+
  geom_polygon(aes(x=long,y=lat, group=group, fill=oilrents))+
  facet_wrap(~year,nrow=6)+
  #the legend belongs to the fill aesthetic, so a fill scale is used
  scale_fill_continuous(name='oilrents')+
  ggtitle('Country Oil Rents as a Percentage of GDP, Visualized for\nthe years 2006 and 2015')

The above maps are colored by Oil Rents as a Percentage of GDP for the years 2006 and 2015. The 2006 map clearly displays a concentration of African and some Asian countries deriving a large proportion of their GDP from oil. The 2015 map shows a similar but less pronounced pattern, likely due to depressed oil prices.

Summary

Project Inspiration

The goal of this project was to see if there is justification to believe in the ‘resource curse’ phenomenon, as it applies in the modern context. The ‘resource curse’ is essentially the idea that countries well endowed with natural resources often suffer from lower GDP growth.

Data

To address the validity of the ‘resource curse’ in the modern context, I chose oil as the natural resource with which to test this theory. Oil was chosen because it is used in every country. Even in 2017, oil is heavily relied upon to run the modern economy. Because of this, I used Oil Rents (as a percentage of GDP) and traced the relationship between this variable and GDP Growth Per Capita (in percent). Regressions were run, and the data was analyzed and visualized in a variety of ways.

Methodology

  1. First, a regression was run with all datapoints from 2006 through 2015. There was a slightly negative linear fit, as can be seen in Figures 1 and 2. Despite this negative relationship, the table in Figure 11 showed a very low R-Squared value, so although the slope was significant at the 5% significance level, no conclusion could be drawn.

  2. A facet_wrap was then run, to facet by year. This step is depicted in Figure 3. Generally, there was a negative linear relationship between the variables, with the exception of a few years, of which 2006 showed the most notable positive linear relationship.

  3. The next step was to break down the data by continent, in the hope of gaining greater insight. Unfortunately, separating the data by continent only further muddled the results. In fact, four of the six continents observed had positive slope coefficients, which runs counter to the ‘resource curse’. The coefficients by continent can be observed in Figure 12. The R-Squared values for these regressions were still quite low, though generally higher than that of the regression displayed in Figure 11.

  4. The previous visualizations and statistical methods yielded unclear results, so I decided to focus solely on countries that were heavily reliant on oil rents as a percentage of GDP. I selected these countries by averaging their oil rents and per capita GDP growth data over the ten-year period, and keeping the countries whose oil rents averaged over five percent of their GDP. This provided the strongest results. There was a definite negative slope for the line of best fit, as can be seen in Figure 10. Additionally, the slope was statistically significant at the 5% significance level, and the R-Squared was .17 (Figure 13), a vastly better value than any of the previous regressions.

Consumer Application

There are plenty of ways to dice this data to suit the consumer. These applications could be useful to:

  • Investors: This research could be evidence to investors not to invest in Africa or Asia. Many of the countries on these continents are heavily reliant on oil and seem to provide data that supports the ‘resource curse’. This is especially relevant given the downturn in oil prices over the past few years. Oil prices are now extremely low, and will likely continue to receive downward pressure as alternative energy sources improve and proliferate around the globe.

  • Governments: There are likely many factors at play in policy-making decisions, but especially for institutions in Africa and Asia, this information should provide insight that could shape decisions related to international relations, taxation policies, tariffs, etc.

Limitations/Future Research

Following this research, there are a number of ways that greater insight could be attained into this topic. For one, no control variables were used. Given that there are myriad factors influencing the GDP growth of a nation, a future study should attempt to control for them. Some possible control variables are Foreign Direct Investment or a country’s rank on the Global Peace Index. A study that incorporated these additional factors could give further insights.
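
As a sketch of what such a follow-up might look like, the example below adds one control variable to the regression of growth on oil rents. Everything here is simulated: the variable fdi is a hypothetical stand-in for Foreign Direct Investment, and the coefficients used to generate the data are arbitrary assumptions, not estimates from this project.

```r
# Simulated data: fdi is a hypothetical control that is related to
# both oil rents and growth
set.seed(7)
n <- 150
fdi <- rnorm(n, mean = 3, sd = 1)
oilrents <- pmax(0, 10 - 2 * fdi + rnorm(n, sd = 3))
percapgdpgrowth <- 1 + 0.8 * fdi - 0.05 * oilrents + rnorm(n)

# Regression without and with the control variable
simple <- lm(percapgdpgrowth ~ oilrents)
controlled <- lm(percapgdpgrowth ~ oilrents + fdi)

# Compare the estimated oil rents coefficient across the two models
print(coef(simple)["oilrents"])
print(coef(controlled)["oilrents"])
print(summary(controlled)$coefficients)
```

When an omitted variable is correlated with both oil rents and growth, the oil rents coefficient can change noticeably once the control is included, which is the main reason to add controls before drawing policy conclusions.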


References

“GDP per capita growth (annual %).” Data. World Bank, n.d. Web. 26 July 2017.

“Oil rents (% of GDP).” Data. World Bank, n.d. Web. 26 July 2017.

Sachs, Jeffrey D., and Andrew M. Warner. “Natural Resource Abundance and Economic Growth.” NBER Working Paper No. 5398 (1995).

Sowa, Michael. “A 2014 Analysis of Economic Growth and the Resource Curse.” Unpublished manuscript. The University of Connecticut (2016).

“Country Codes.” Data. Geonames.org, n.d. Web. 8 August 2017.