AAEC 8610 Homework 6

Task 1: Use World Bank data to analyze something

Part 1.1 Get the data

I download World Bank (WB) data on the food production index and the intensity of fertilizer use, as shown below:

library(wbstats)
str(wb_cachelist, max.level=1)

## List of 7
##  $ countries  :'data.frame': 304 obs. of  18 variables:
##  $ indicators :'data.frame': 16978 obs. of  7 variables:
##  $ sources    :'data.frame': 43 obs. of  8 variables:
##  $ datacatalog:'data.frame': 238 obs. of  29 variables:
##  $ topics     :'data.frame': 21 obs. of  3 variables:
##  $ income     :'data.frame': 7 obs. of  3 variables:
##  $ lending    :'data.frame': 4 obs. of  3 variables:

newWbCache <- wbcache()
wbsearch("fertilizer consumption .*kilograms per hectare", cache=newWbCache)

##        indicatorID
## 992 AG.CON.FERT.ZS
##                                                         indicator
## 992 Fertilizer consumption (kilograms per hectare of arable land)

wbsearch("food production index .*100)", cache=newWbCache)

##           indicatorID                                          indicator
## 1057   AG.PRD.FOOD.XD            Food production index (2004-2006 = 100)
## 1061  AG.PRD.GFOOD.XD     Food production index (gross, 1999-2001 = 100)
## 1063 AG.PRD.GNFOOD.XD Non-food production index (gross, 1999-2001 = 100)
## 1067  AG.PRD.NFOOD.XD  Gross non-food production index (1999-2001 = 100)

newWbData <- wb(indicator= c("AG.CON.FERT.ZS", "AG.PRD.FOOD.XD"))
names(newWbData)

## [1] "iso3c"       "date"        "value"       "indicatorID" "indicator"  
## [6] "iso2c"       "country"

Part 1.2 Clean up the data

I now remove the aggregates at region level and rename the indicators.

newWbDataCountries <- wbcountries()
names(newWbDataCountries)

##  [1] "iso3c"         "iso2c"         "country"       "capital"      
##  [5] "long"          "lat"           "regionID"      "region_iso2c" 
##  [9] "region"        "adminID"       "admin_iso2c"   "admin"        
## [13] "incomeID"      "income_iso2c"  "income"        "lendingID"    
## [17] "lending_iso2c" "lending"

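newWbData <- merge(newWbData, y=newWbDataCountries[c("iso2c", "region")], by="iso2c", all.x=TRUE)
# Drop region-level aggregates (assumes wbstats labels them with region "Aggregates")
newWbData <- subset(newWbData, is.na(region) | region != "Aggregates")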
newWbData$indicatorID[newWbData$indicatorID=="AG.CON.FERT.ZS"] <- "Intensity_fertilizer_use"
newWbData$indicatorID[newWbData$indicatorID=="AG.PRD.FOOD.XD"] <- "Food_production_index"

library(reshape2)
newWbData <- dcast(newWbData, iso2c + country + date + region ~ indicatorID, value.var = 'value')
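The long-to-wide reshape above can be checked on a small example. The following is a minimal sketch on a hypothetical toy data frame (`toy_long` and its values are made up for illustration, not World Bank data); it shows how `dcast` turns one row per country-indicator pair into one row per country with a column per indicator.

```r
library(reshape2)

# Hypothetical long-format data: one row per country and indicator (made-up values)
toy_long <- data.frame(
  iso2c = c("AA", "AA", "BB", "BB"),
  date = "2008",
  indicatorID = c("Intensity_fertilizer_use", "Food_production_index",
                  "Intensity_fertilizer_use", "Food_production_index"),
  value = c(10, 105, 50, 98)
)

# Wide format: one row per country-year, one column per indicator
toy_wide <- dcast(toy_long, iso2c + date ~ indicatorID, value.var = "value")
print(toy_wide)
```

Having each indicator in its own column is what allows the two variables to be mapped to the x and y axes of a scatter plot.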

Part 1.3 Plot a graph

Here is a plot of the relationship between the food production index (unit-free, covering edible food crops that contain nutrients) and the intensity of fertilizer use (kilograms per hectare of arable land) in 2008. Two points are worth noting:

There is an outlier, Singapore, with an extreme value of fertilizer use.
Excluding that country, however, there appears to be no clear association between the two variables in 2008, a year of global food crisis.

library(ggplot2)
ggplot(subset(newWbData, date=="2008"), aes(x=Intensity_fertilizer_use, y=Food_production_index, color=country=="Singapore")) + geom_point()

To assess whether the potential association varies over time, I plot each year from 2008 through 2018 in the dataset. As illustrated, there is no descriptive evidence that a higher food production index is associated with higher fertilizer use. Other factors might explain food production, such as agroecological conditions, agricultural mechanization, and the human capital of farmers. Also, per-hectare food productivity, rather than the unit-free food production index, might correlate with per-hectare fertilizer use.

ggplot(subset(newWbData, date > "2007" & date <= "2018"), aes(x=Intensity_fertilizer_use, y=Food_production_index, color = country == "Singapore", tooltip=country)) + geom_point() + facet_wrap(~ date) + theme(legend.position = "none")
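The visual impression of no association could be complemented with a simple number per year. The following is a minimal sketch, not part of the original analysis, that computes a Spearman rank correlation for each year after dropping the outlier. The data frame `toy` and all its values are hypothetical stand-ins for `newWbData`; only the column names are taken from the code above.

```r
# Hypothetical stand-in for newWbData (all values are made up)
toy <- data.frame(
  country = rep(c("A", "B", "C", "D", "Singapore"), times = 2),
  date = rep(c("2008", "2009"), each = 5),
  Intensity_fertilizer_use = c(10, 50, 120, 200, 9000, 12, 55, 110, 210, 9500),
  Food_production_index = c(105, 98, 110, 101, 95, 107, 99, 108, 104, 97)
)

# Drop the outlier, then compute a rank correlation separately for each year
no_outlier <- subset(toy, country != "Singapore")
cor_by_year <- sapply(split(no_outlier, no_outlier$date), function(d) {
  cor(d$Intensity_fertilizer_use, d$Food_production_index,
      method = "spearman", use = "complete.obs")
})
print(cor_by_year)
```

A rank correlation is used here because it is less sensitive to extreme values of fertilizer use than a Pearson correlation would be.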

Task 2: Use Google trends data to analyze something

Part 2.1 Get the data

Using Google Trends, I obtain US web search interest for the keyword "food" over the period 2007-2020.

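library(gtrendsR)
# The first element of the gtrends() result is the interest-over-time data frame
fooddata <- gtrends(c("food"), gprop = "web", geo = c("US"), time = "2007-01-01 2020-01-01")[[1]]

fooddata <- dcast(fooddata, date ~ keyword + geo, value.var = "hits")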
tail(fooddata)

##           date food_US
## 152 2019-08-01      97
## 153 2019-09-01      89
## 154 2019-10-01      89
## 155 2019-11-01      92
## 156 2019-12-01      90
## 157 2020-01-01      90

Part 2.2 Plot a graph

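I now plot these data. Google Trends reports search interest as an index scaled relative to the peak in the selected period, not as raw search counts. The data suggest that US search interest in food rose in the year of the global food crisis (2008), grew more slowly until 2015, and has increased more strongly since 2015.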

ggplot(fooddata, aes(x=date, y=food_US)) + geom_line() + labs(x="Year", y="Google trends for food")
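Monthly search data tend to show seasonal swings that can obscure the longer-run trend described above. The following is a minimal sketch, assuming a monthly series like `fooddata$food_US`, that smooths it with a trailing 12-month moving average. The data frame `toy` and its values are hypothetical and generated only for illustration.

```r
# Hypothetical monthly series standing in for fooddata (made-up values)
months <- 1:36
toy <- data.frame(
  date = seq(as.Date("2007-01-01"), by = "month", length.out = 36),
  food_US = round(60 + 0.5 * months + 5 * sin(2 * pi * months / 12))
)

# Trailing 12-month moving average: mean of the current and previous 11 months.
# The first 11 values are NA because a full 12-month window is not yet available.
toy$trend <- as.numeric(stats::filter(toy$food_US, rep(1 / 12, 12), sides = 1))
head(toy[!is.na(toy$trend), ])
```

The smoothed column could then be drawn alongside the raw series, for example with a second `geom_line()` layer mapped to `trend`.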