In this assignment, I present some interesting cases using the World Bank’s World Development Indicators data and Google Trends data.
I want to examine the effect of chemical fertilizer use on national cereal production at the aggregate level. Note that this is a very simple example and should not be read as a rigorous analysis. The variables I need are:
Cereal production (metric tons): Production data on cereals relate to crops harvested for dry grain only.
Fertilizer consumption (% of fertilizer production)
GDP per capita would proxy the level of development, but because it has many missing values, I use total tax revenue as a share of GDP as a proxy instead.
Using the World Bank API, I can download the data and conduct the analysis as follows:
#Downloading data from World Bank:
library(wbstats)
str(wb_cachelist, max.level = 1)
## List of 8
## $ countries : tibble [304 × 18] (S3: tbl_df/tbl/data.frame)
## $ indicators : tibble [16,649 × 8] (S3: tbl_df/tbl/data.frame)
## $ sources : tibble [63 × 9] (S3: tbl_df/tbl/data.frame)
## $ topics : tibble [21 × 3] (S3: tbl_df/tbl/data.frame)
## $ regions : tibble [48 × 4] (S3: tbl_df/tbl/data.frame)
## $ income_levels: tibble [7 × 3] (S3: tbl_df/tbl/data.frame)
## $ lending_types: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
## $ languages : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)
# Let's search for the available variables using regular expressions:
income_vars <- wb_search(pattern = "GDP")
income_vars2 <- wb_search(pattern = "production")
# I am interested in GC.TAX.TOTL.GD.ZS, AG.PRD.CREL.MT and AG.CON.FERT.PT.ZS
GDPdata <- wb_data("GC.TAX.TOTL.GD.ZS",
start_date = 2000, end_date = 2020,
return_wide = FALSE)
fertilizerdata <- wb_data("AG.CON.FERT.PT.ZS",
start_date = 2000, end_date = 2020,
return_wide = FALSE)
productiondata <- wb_data("AG.PRD.CREL.MT",
start_date = 2000, end_date = 2020,
return_wide = FALSE)
#joining the dataset with multiple joins
library(dplyr)
temp <- full_join(productiondata, fertilizerdata, by = c('country' = 'country', 'date' = 'date'))
dataset2use <- full_join(temp, GDPdata, by = c('country' = 'country', 'date' = 'date'))
#value.x = data for production, value.y = data for fertilizer, value = economy
dataset2use <- subset(dataset2use, select= c(country, date, value.x, value.y, value))
dataset2use$lnproduction = log(dataset2use$value.x)
dataset2use$lnfertilizer = log(dataset2use$value.y)
dataset2use$economy = log(dataset2use$value)
# Let's first plot the relationship and fit a smooth curve
library(lattice)
xyplot(lnproduction ~ lnfertilizer, data = dataset2use, type=c("smooth", "p"),
main = "Figure 1: lnProduction vs. lnFertilizer",
xlab = "log of fertilizer consumption",
ylab = "log of cereal production")
# Panel regression
library(plm)
model <- plm(lnproduction ~ lnfertilizer + economy, data = dataset2use, model="within", effect = "twoways")
#tabulate the regression results:
library(stargazer)
stargazer(model,
type="text",
align=TRUE,
no.space=TRUE,
column.labels=c("log of cereal production"),
covariate.labels = c("lnfertilizer", "Economic Development"),
title="Table 1: Impact of Fertilizer Use in Cereal Production Across Nations (2000-2020)")
##
## Table 1: Impact of Fertilizer Use in Cereal Production Across Nations (2000-2020)
## ================================================
##                          Dependent variable:
##                      ---------------------------
##                             lnproduction
##                       log of cereal production
## ------------------------------------------------
## lnfertilizer                  0.075***
##                                (0.012)
## Economic Development          -0.093**
##                                (0.040)
## ------------------------------------------------
## Observations                    1,176
## R2                              0.039
## Adjusted R2                    -0.048
## F Statistic           21.903*** (df = 2; 1077)
## ================================================
## Note:                *p<0.1; **p<0.05; ***p<0.01
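To make the "twoways" within estimator less of a black box, here is a minimal sketch in base R on simulated data. It does not reproduce `plm` internals; it only illustrates the idea that, for a balanced panel, removing unit means and period means (and adding back the grand mean) gives the same slope as a regression with unit and period dummies. The panel, the variable names, and the true slope of 0.5 are all made up for illustration.

```r
# Simulated balanced panel: 30 hypothetical units observed over 10 periods.
set.seed(42)
n_units <- 30
n_periods <- 10
panel <- expand.grid(unit = factor(seq_len(n_units)),
                     period = factor(seq_len(n_periods)))
unit_effect <- rnorm(n_units)[as.integer(panel$unit)]
period_effect <- rnorm(n_periods)[as.integer(panel$period)]

# The regressor is correlated with the unit effect, so pooled OLS would be biased.
panel$x <- rnorm(nrow(panel)) + unit_effect
true_beta <- 0.5
panel$y <- true_beta * panel$x + unit_effect + period_effect +
  rnorm(nrow(panel), sd = 0.1)

# Two-way within transformation (this simple form assumes a balanced panel):
# subtract unit means and period means, then add back the grand mean.
within_twoway <- function(v, unit, period) {
  v - ave(v, unit) - ave(v, period) + mean(v)
}
panel$y_w <- within_twoway(panel$y, panel$unit, panel$period)
panel$x_w <- within_twoway(panel$x, panel$unit, panel$period)

# Slope from the demeaned regression (no intercept needed after demeaning).
beta_within <- unname(coef(lm(y_w ~ x_w - 1, data = panel))["x_w"])

# Same slope from a dummy-variable regression with unit and period factors.
beta_lsdv <- unname(coef(lm(y ~ x + unit + period, data = panel))["x"])

print(c(within = beta_within, lsdv = beta_lsdv))
```

The World Bank panel used here is unbalanced (many country-years are missing), so the simple double-demeaning formula would not apply directly to it; the dummy-variable view is the safer mental model in that case.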
Under the two-way fixed effects regression, a 1% increase in fertilizer consumption is associated with a 0.075% increase in cereal production, which is significant at the 1% level. The coefficient on the economic development proxy (the log of tax revenue as a share of GDP) is negative and significant at the 5% level, so cereal production falls as this proxy rises. The latter result suggests that, at higher levels of economic development, a country either produces a wider variety of agricultural goods or shifts toward non-agricultural production. The simple illustration in Table 1 is consistent with the positive relationship shown in Figure 1, although the fitted curve in the figure is non-linear.
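Because both cereal production and fertilizer consumption enter in logs, the coefficient reads as an elasticity. The short sketch below compares the usual "beta times percent change" approximation with the exact change implied by a log-log model, using the 0.075 estimate from Table 1. The helper `pct_change_y` is a hypothetical name introduced only for this illustration.

```r
beta <- 0.075  # coefficient on lnfertilizer reported in Table 1

# Exact percentage change in y implied by a percentage change in x
# under a log-log specification: y2 / y1 = (x2 / x1)^beta.
pct_change_y <- function(pct_change_x, beta) {
  ((1 + pct_change_x / 100)^beta - 1) * 100
}

changes <- c(1, 10, 50)
elasticity_table <- data.frame(
  pct_change_x = changes,
  approx = beta * changes,               # linear approximation
  exact = pct_change_y(changes, beta)    # exact log-log implication
)
print(elasticity_table)
```

For small changes the two columns nearly coincide, which is why the 1% reading is a fair summary; the gap widens as the change in fertilizer use gets larger.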
In this section, I explore what kinds of internet searches people made while following stay-at-home orders during the COVID pandemic. Many people had to stay at home with little planned to keep them occupied, and the internet and social media kept all of us company.
Let's look at the plot of global COVID cases and observe the first few peaks:
Figure 2: COVID Confirmed Cases (Source: OurWorldInData)
Using the Google Trends API, I can download the data and conduct the analysis as follows:
#Downloading data from Google Trends:
library(gtrendsR)
library(reshape2)
# Let's explore the trend over time first
grocery <- (gtrends(c("grocery"), time = "2019-01-01 2021-01-01", gprop = "web")$interest_over_time)
stocks <- (gtrends(c("stocks"), time = "2019-01-01 2021-01-01", gprop = "web")$interest_over_time)
indoor <- (gtrends(c("indoor games"), time = "2019-01-01 2021-01-01", gprop = "web")$interest_over_time)
netflix <- (gtrends(c("netflix"), time = "2019-01-01 2021-01-01", gprop = "web")$interest_over_time)
amazon <- (gtrends(c("amazon"), time = "2019-01-01 2021-01-01", gprop = "web")$interest_over_time)
library(ggplot2)
figure <- ggplot(grocery, aes(x=date)) + # ggplot() takes no height/width arguments, so they are dropped
theme_classic()+
geom_line(aes(x=date, y=hits, colour='Grocery'), size = 0.5)+
geom_line(aes(x=date, y=stocks$hits, colour='Stocks'), size = 0.5)+
geom_line(aes(x=date, y=indoor$hits, colour='IndoorGames'), size = 0.5)+
geom_line(aes(x=date, y=netflix$hits, colour='Netflix'), size = 0.5)+
geom_line(aes(x=date, y=amazon$hits, colour='Amazon'), size = 0.5)+
scale_color_manual(name = "",
values = c("Grocery" = "blue",
"Stocks" = "orange",
"IndoorGames"="darkred",
"Netflix"="black",
"Amazon"="red"))+ #colour manual
xlab("") + ylab("Total Hits")+
ggtitle("Figure 3: Hits in Google for Different Keywords (2019-2021)")
figure
We can see the pattern of Google searches for these keywords over time. Global COVID cases started increasing from mid-February to early March 2020, and that is when keywords like grocery, stocks, indoor games, netflix and amazon were increasingly googled. With stay-at-home orders in most nations, people’s use of the internet and reliance on it for daily activities increased. Over time, however, search interest in these keywords has declined across the globe.
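The plot above pulls each series from a separate data frame inside `aes()`, which only works while every data frame has the same dates in the same order. A more robust pattern is to stack the series into one long data frame and map colour to the keyword. The sketch below uses made-up toy values rather than real Google Trends output. It assumes the `interest_over_time` data frames carry `date`, `hits` and `keyword` columns, and that `hits` may arrive as text containing a "<1" marker, which is replaced here by 0.5 as an arbitrary choice. `clean_hits` is a hypothetical helper.

```r
# Toy stand-ins for two interest_over_time data frames (values are invented;
# real hits are a relative index, not raw search counts).
dates <- as.Date(c("2020-02-02", "2020-03-01", "2020-04-05"))
grocery_toy <- data.frame(date = dates, hits = c("20", "85", "60"),
                          keyword = "grocery")
netflix_toy <- data.frame(date = dates, hits = c("<1", "70", "90"),
                          keyword = "netflix")

# Stack the series: each row is one keyword on one date.
trends_long <- rbind(grocery_toy, netflix_toy)

# Convert hits to numeric, treating the "<1" marker as 0.5 (an assumption).
clean_hits <- function(h) {
  h <- as.character(h)
  h[h == "<1"] <- "0.5"
  as.numeric(h)
}
trends_long$hits <- clean_hits(trends_long$hits)

# Build the plot only if ggplot2 is installed; print(p) would display it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(trends_long,
                       ggplot2::aes(x = date, y = hits, colour = keyword)) +
    ggplot2::geom_line() +
    ggplot2::theme_classic()
}
```

With real data, the same `rbind()` step could be applied to the five data frames downloaded above, provided their columns match.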
Now, let's see which cities searched “grocery” (for example) the most during the COVID period. We might expect these to be cities where people were especially worried about managing groceries during the pandemic.
# Let's plot the five cities with the highest search interest in the keyword "grocery"
grocery2 <- gtrends(c("grocery"), gprop = "web", time = "2020-01-01 2021-01-01")$interest_by_city
library(dplyr)
grocery2 <- grocery2 %>% slice_max(hits, n = 5) #taking only 5 cities
barplot(grocery2$hits,names.arg=grocery2$location,xlab="Cities",ylab="Total Hits",col="blue",
main="Figure 4: Five Major Cities that googled 'Grocery' during 2020",border="red")
Among all cities worldwide, Columbus, followed by Topeka and Athens, showed the highest search interest in “grocery” on Google during the COVID pandemic.