Task 1: Use World Bank data to analyze something

Part 1.1 Get the data

I download World Bank (WB) data on the food production index and the intensity of fertilizer use, as shown below:

library(wbstats)
str(wb_cachelist, max.level=1)
## List of 7
##  $ countries  :'data.frame': 304 obs. of  18 variables:
##  $ indicators :'data.frame': 16978 obs. of  7 variables:
##  $ sources    :'data.frame': 43 obs. of  8 variables:
##  $ datacatalog:'data.frame': 238 obs. of  29 variables:
##  $ topics     :'data.frame': 21 obs. of  3 variables:
##  $ income     :'data.frame': 7 obs. of  3 variables:
##  $ lending    :'data.frame': 4 obs. of  3 variables:
newWbCache <- wbcache()
wbsearch("fertilizer consumption .*kilograms per hectare", cache=newWbCache)
##        indicatorID
## 992 AG.CON.FERT.ZS
##                                                         indicator
## 992 Fertilizer consumption (kilograms per hectare of arable land)
wbsearch("food production index .*100)", cache=newWbCache)
##           indicatorID                                          indicator
## 1057   AG.PRD.FOOD.XD            Food production index (2004-2006 = 100)
## 1061  AG.PRD.GFOOD.XD     Food production index (gross, 1999-2001 = 100)
## 1063 AG.PRD.GNFOOD.XD Non-food production index (gross, 1999-2001 = 100)
## 1067  AG.PRD.NFOOD.XD  Gross non-food production index (1999-2001 = 100)
newWbData <- wb(indicator= c("AG.CON.FERT.ZS", "AG.PRD.FOOD.XD"))
names(newWbData)
## [1] "iso3c"       "date"        "value"       "indicatorID" "indicator"  
## [6] "iso2c"       "country"
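The search strings above are regular expressions, so `.*` stands for any run of characters and a literal parenthesis is safest when escaped. A minimal base-R sketch of that matching, assuming `wbsearch()` does a case-insensitive regex match on the indicator names (the vector `toyIndicators` is a hypothetical stand-in for the cached indicator list):

```r
# Hypothetical stand-in for the indicator names held in the cache
toyIndicators <- c(
  "Fertilizer consumption (kilograms per hectare of arable land)",
  "Fertilizer consumption (% of fertilizer production)",
  "Food production index (2004-2006 = 100)"
)

# ".*" matches any characters between the two fixed parts of the pattern
fertPattern <- "fertilizer consumption .*kilograms per hectare"
fertHits <- grepl(fertPattern, toyIndicators, ignore.case = TRUE)

# "\\)" matches a literal closing parenthesis
foodPattern <- "food production index .*100\\)"
foodHits <- grepl(foodPattern, toyIndicators, ignore.case = TRUE)

toyIndicators[fertHits]
toyIndicators[foodHits]
```

Only the per-hectare fertilizer indicator matches the first pattern, which is why the search is narrow enough to return a single indicator ID.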

Part 1.2 Clean up the data

I now merge in each country's region (so that regional aggregates can be identified), rename the indicators, and reshape the data to one row per country and year.

newWbDataCountries <- wbcountries()
names(newWbDataCountries)
##  [1] "iso3c"         "iso2c"         "country"       "capital"      
##  [5] "long"          "lat"           "regionID"      "region_iso2c" 
##  [9] "region"        "adminID"       "admin_iso2c"   "admin"        
## [13] "incomeID"      "income_iso2c"  "income"        "lendingID"    
## [17] "lending_iso2c" "lending"
newWbData <- merge(newWbData, y=newWbDataCountries[c("iso2c", "region")], by="iso2c", all.x=TRUE)
newWbData$indicatorID[newWbData$indicatorID=="AG.CON.FERT.ZS"] <- "Intensity_fertilizer_use"
newWbData$indicatorID[newWbData$indicatorID=="AG.PRD.FOOD.XD"] <- "Food_production_index"

library(reshape2)
newWbData <- dcast(newWbData, iso2c + country + date + region ~ indicatorID, value.var = 'value')
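Once the region column is attached, rows for regional aggregates can be dropped with a simple filter. A minimal sketch on a toy data frame with made-up values, assuming aggregates are labelled "Aggregates" in the `region` column (worth checking with `table(newWbData$region)` before relying on it); `toy` and `toyCountries` are hypothetical names:

```r
# Toy data (made-up values) mimicking the merged World Bank data
toy <- data.frame(
  iso2c   = c("SG", "FR", "1W", "Z4"),
  country = c("Singapore", "France", "World", "East Asia & Pacific"),
  region  = c("East Asia & Pacific", "Europe & Central Asia",
              "Aggregates", "Aggregates"),
  value   = c(3000, 150, 130, 300),
  stringsAsFactors = FALSE
)

# Assumption: aggregate rows carry the label "Aggregates" in the region column
toyCountries <- subset(toy, region != "Aggregates")

toyCountries$country
```

Note that `subset()` also drops rows whose region is `NA`, so any entity that failed to match in the merge would disappear as well.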

Part 1.3 Plot a graph

Here is a plot of the relationship between the food production index (unit-free, covering edible food crops that contain nutrients) and the intensity of fertilizer use (kilograms per hectare of arable land) in 2008. A couple of points are worth noting:

  • there is an outlier, Singapore, with an extreme value of fertilizer use;

  • excluding that country, however, there appears to be no association between the two variables in 2008, a year of global food crisis.

library(ggplot2)
ggplot(subset(newWbData, date=="2008"), aes(x=Intensity_fertilizer_use, y=Food_production_index, color=country=="Singapore")) + geom_point()
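The visual impression that one outlier can drive an apparent association can be checked numerically by computing a correlation with and without it. A minimal sketch on made-up values (the countries "A" to "E" and all numbers are hypothetical, not taken from the World Bank data):

```r
# Toy wide-format data (made-up values), one row per country
toy <- data.frame(
  country = c("A", "B", "C", "D", "E", "Singapore"),
  Intensity_fertilizer_use = c(20, 60, 110, 150, 210, 3000),
  Food_production_index    = c(104, 98, 111, 95, 107, 160),
  stringsAsFactors = FALSE
)

# Pearson correlation on all rows, then again without the outlier
corAll <- cor(toy$Intensity_fertilizer_use, toy$Food_production_index)
noOutlier <- subset(toy, country != "Singapore")
corNoOutlier <- cor(noOutlier$Intensity_fertilizer_use,
                    noOutlier$Food_production_index)

# Spearman rank correlation is less sensitive to a single extreme value
corRank <- cor(toy$Intensity_fertilizer_use, toy$Food_production_index,
               method = "spearman")
```

In this toy example the Pearson correlation is close to 1 with the outlier and close to 0 without it, while the rank correlation sits in between. The same three calls could be run on the 2008 subset of `newWbData`, adding `use = "complete.obs"` to handle missing values.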

To assess whether the potential association varies over time, I plot the same relationship for each year from 2008 to 2018 that is available in the dataset. As illustrated, there is no descriptive evidence that higher food production is associated with more intensive fertilizer use. Other factors might explain food production, such as agroecological conditions, agricultural mechanization, and the human capital of farmers. Also, per-hectare food productivity, rather than the unit-free food production index, might correlate better with per-hectare fertilizer use.

ggplot(subset(newWbData, date > "2007" & date <= "2018"), aes(x=Intensity_fertilizer_use, y=Food_production_index, color = country == "Singapore", tooltip=country)) + geom_point() + facet_wrap(~ date) + theme(legend.position = "none")
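The faceted plot could be complemented by one correlation coefficient per year. A minimal base-R sketch on made-up values (countries "A" to "D" and all numbers are hypothetical); it uses the Spearman rank correlation as one possible choice and drops pairs with a missing value:

```r
# Toy wide-format data (made-up values): four countries observed in two years
toy <- data.frame(
  country = rep(c("A", "B", "C", "D"), times = 2),
  date    = rep(c("2008", "2009"), each = 4),
  Intensity_fertilizer_use = c(20, 60, 110, 150, 25, 70, 100, 160),
  Food_production_index    = c(104, 98, 111, NA, 101, 103, 108, 112),
  stringsAsFactors = FALSE
)

# One Spearman correlation per year; pairs with a missing value are dropped
corByYear <- sapply(split(toy, toy$date), function(d) {
  cor(d$Intensity_fertilizer_use, d$Food_production_index,
      method = "spearman", use = "complete.obs")
})

corByYear
```

The result is a named vector with one entry per year, which makes it easy to see whether the strength of the association changes over time.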