HW6

Task 1: Use World Bank data to analyze something

I focus on the unemployment rate of the United States of America for this assignment.

library(wbstats)
library(ggplot2)

str(wb_cachelist, max.level = 1)

## List of 7
##  $ countries  :'data.frame': 304 obs. of  18 variables:
##  $ indicators :'data.frame': 16978 obs. of  7 variables:
##  $ sources    :'data.frame': 43 obs. of  8 variables:
##  $ datacatalog:'data.frame': 238 obs. of  29 variables:
##  $ topics     :'data.frame': 21 obs. of  3 variables:
##  $ income     :'data.frame': 7 obs. of  3 variables:
##  $ lending    :'data.frame': 4 obs. of  3 variables:

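# Refresh the cached World Bank metadata, then search it for unemployment indicators
new_cache <- wbcache()
unemploy_vars <- wbsearch(pattern = "unemployment", cache = new_cache)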
head(unemploy_vars)

##    indicatorID
## 35   WP15177.9
## 36   WP15177.8
## 37   WP15177.7
## 38   WP15177.6
## 39   WP15177.5
## 40   WP15177.4
##                                                                                        indicator
## 35         Received government transfers in the past year, income, richest 60% (% ages 15+) [w2]
## 36         Received government transfers in the past year, income, poorest 40% (% ages 15+) [w2]
## 37 Received government transfers in the past year, secondary education or more (% ages 15+) [w2]
## 38   Received government transfers in the past year, primary education or less (% ages 15+) [w2]
## 39                Received government transfers in the past year, older adults (% ages 25+) [w2]
## 40              Received government transfers in the past year, young adults (% ages 15-24) [w2]

# SL.UEM.TOTL.NE.ZS: Unemployment, total (% of total labor force) (national estimate)
unEmpRate <- wb(indicator = "SL.UEM.TOTL.NE.ZS")
head(unEmpRate)

##    iso3c date     value       indicatorID
## 4    ARB 2016  9.559937 SL.UEM.TOTL.NE.ZS
## 6    ARB 2014 10.526193 SL.UEM.TOTL.NE.ZS
## 8    ARB 2012 10.674805 SL.UEM.TOTL.NE.ZS
## 9    ARB 2011 11.146858 SL.UEM.TOTL.NE.ZS
## 10   ARB 2010  9.301470 SL.UEM.TOTL.NE.ZS
## 11   ARB 2009  9.501470 SL.UEM.TOTL.NE.ZS
##                                                           indicator iso2c
## 4  Unemployment, total (% of total labor force) (national estimate)    1A
## 6  Unemployment, total (% of total labor force) (national estimate)    1A
## 8  Unemployment, total (% of total labor force) (national estimate)    1A
## 9  Unemployment, total (% of total labor force) (national estimate)    1A
## 10 Unemployment, total (% of total labor force) (national estimate)    1A
## 11 Unemployment, total (% of total labor force) (national estimate)    1A
##       country
## 4  Arab World
## 6  Arab World
## 8  Arab World
## 9  Arab World
## 10 Arab World
## 11 Arab World

class(unEmpRate)

## [1] "data.frame"

str(unEmpRate)

## 'data.frame':    4311 obs. of  7 variables:
##  $ iso3c      : chr  "ARB" "ARB" "ARB" "ARB" ...
##  $ date       : chr  "2016" "2014" "2012" "2011" ...
##  $ value      : num  9.56 10.53 10.67 11.15 9.3 ...
##  $ indicatorID: chr  "SL.UEM.TOTL.NE.ZS" "SL.UEM.TOTL.NE.ZS" "SL.UEM.TOTL.NE.ZS" "SL.UEM.TOTL.NE.ZS" ...
##  $ indicator  : chr  "Unemployment, total (% of total labor force) (national estimate)" "Unemployment, total (% of total labor force) (national estimate)" "Unemployment, total (% of total labor force) (national estimate)" "Unemployment, total (% of total labor force) (national estimate)" ...
##  $ iso2c      : chr  "1A" "1A" "1A" "1A" ...
##  $ country    : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...

usaunEmpRate <- wb(country="USA", indicator = "SL.UEM.TOTL.NE.ZS")
head(usaunEmpRate[1:3])

##   iso3c date  value
## 2   USA 2018 3.8956
## 3   USA 2017 4.3552
## 4   USA 2016 4.8692
## 5   USA 2015 5.2800
## 6   USA 2014 6.1675
## 7   USA 2013 7.3749

# Full USA series; `date` is character, so it is converted to numeric for the x-axis
unEmpRateUSA <- wb(country = "USA", indicator = "SL.UEM.TOTL.NE.ZS")
g <- ggplot(unEmpRateUSA, aes(x=as.numeric(date), y=value, color=country)) +
  geom_line() +
  labs(title="Unemployment Rate in USA",
       x= "Date",
       y= "Percent")
g

The plot shows that unemployment peaked twice at around 10%: around 1982, because of the early 1980s recession, and around 2009, because of the Great Recession of 2008. Since both were severe global recessions, I want to compare the United States with countries in Europe and Asia, specifically the United Kingdom and Japan.
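The peaks can also be checked numerically instead of being read off the plot. The helper below is a minimal sketch, not part of wbstats: `find_peaks` and its `min_value` argument are hypothetical names, and it assumes the `date` (character year) and `value` columns that `wb()` returns. The example rows are the six USA values printed above.

```r
# Hypothetical helper (not part of wbstats): years where `value` is higher than
# in both the previous and the next year. Assumes a character `date` column
# holding the year and a numeric `value` column, as returned by wb().
find_peaks <- function(df, min_value = -Inf) {
  df <- df[!is.na(df$value), ]
  df <- df[order(as.numeric(df$date)), ]  # sort chronologically
  v <- df$value
  n <- length(v)
  if (n < 3) return(character(0))
  mid <- 2:(n - 1)                         # interior years only
  is_peak <- v[mid] > v[mid - 1] & v[mid] > v[mid + 1] & v[mid] >= min_value
  df$date[mid[is_peak]]
}

# The six USA rows printed above (2013-2018)
usa_recent <- data.frame(
  date  = c("2018", "2017", "2016", "2015", "2014", "2013"),
  value = c(3.8956, 4.3552, 4.8692, 5.2800, 6.1675, 7.3749),
  stringsAsFactors = FALSE
)

usa_recent$date[which.max(usa_recent$value)]  # "2013": highest year in this window
find_peaks(usa_recent)                         # character(0): a steady decline has no interior peak
```

On the full series, something like `find_peaks(unEmpRateUSA, min_value = 9)` is one way to keep only the high points; without a threshold, a plain local-maximum rule also flags small year-to-year bumps.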

# Same indicator for the USA, Japan and the United Kingdom
unEmpRateUSAJPNGBR <- wb(country = c("USA", "JPN", "GBR"), indicator = "SL.UEM.TOTL.NE.ZS")
h <- ggplot(unEmpRateUSAJPNGBR, aes(x = as.numeric(date), y = value, color = country)) +
  geom_line() +
  labs(title="Unemployment Rate Comparisons",
       x= "Date",
       y= "Percent")
h

The recession hit the United Kingdom hardest at the beginning of the 1980s. The UK unemployment rate decreased gradually after 1982, while the United States went through several cycles but also trended downward overall. The United States exited the recession relatively early. Japan followed a different pattern from both the United Kingdom and the United States: its unemployment rate peaked at around 5% in 2001.
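To put numbers on the cross-country comparison, the peak year of each country can be pulled out of the three-country data frame. This is a minimal base-R sketch; `peak_by_country` is a hypothetical helper that assumes the `country`, `date` and `value` columns returned by `wb()`. The example reuses the USA rows printed earlier, so it contains only one country.

```r
# Hypothetical helper: the year and value of each country's highest
# unemployment rate. Assumes country, date and value columns as in wb() output.
peak_by_country <- function(df) {
  df <- df[!is.na(df$value), ]
  rows <- lapply(split(df, df$country), function(d) {
    d[which.max(d$value), c("country", "date", "value")]
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out[order(-out$value), ]  # highest peak first
}

# Example with the USA rows printed above (2013-2018)
usa_recent <- data.frame(
  country = "United States",
  date    = c("2018", "2017", "2016", "2015", "2014", "2013"),
  value   = c(3.8956, 4.3552, 4.8692, 5.2800, 6.1675, 7.3749),
  stringsAsFactors = FALSE
)
peak_by_country(usa_recent)
```

Passed the three-country data frame from the `wb()` call above, it returns one row per country, sorted from the highest peak to the lowest.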

Task 2: Use Google Trends data to analyze something

I often play tennis, so my keyword for this assignment is “Tennis”. I would like to see whether its popularity is growing or decreasing and to look for periodic variations over the year.

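library(gtrendsR)
library(reshape2)
# Weekly worldwide interest in "Tennis" over the past 12 months; [[1]] is interest_over_time
googleData = gtrends(c("Tennis"), gprop = "web", time = "today 12-m")[[1]]
# Reshape to one row per date with one column per keyword/geo pair
googleData = dcast(googleData, date ~ keyword + geo, value.var = "hits")
plot(googleData, type="l")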

The graph peaked around September 1 to 7 because the US Open begins on the last Monday in August and ends on the second Sunday in September. The graph had a second peak around July, because the US Open Series, the North American hard-court tournaments leading up to the US Open, begins in July.
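The peak week can also be read directly from the data rather than from the plot. The sketch below assumes the un-reshaped `interest_over_time` data frame from gtrendsR (columns `date` and `hits`); `peak_week` is a hypothetical helper, and treating non-numeric `hits` such as "<1" as zero is an assumption.

```r
# Hypothetical helper: date of the week with the highest search interest in a
# gtrendsR-style interest_over_time data frame (columns date and hits).
peak_week <- function(iot) {
  # hits may be character (e.g. "<1"); non-numeric entries are treated as 0
  hits <- suppressWarnings(as.numeric(as.character(iot$hits)))
  hits[is.na(hits)] <- 0
  iot$date[which.max(hits)]
}
```

For example, `peak_week(gtrends("Tennis", gprop = "web", time = "today 12-m")$interest_over_time)` returns the date of the highest-interest week; the query needs a network connection, so it is not run here.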

Next, I wondered whether the other countries that host Grand Slam tournaments show different trends for the keyword “tennis”. There are four Grand Slam tournaments: the Australian Open, the French Open, Wimbledon (England), and the US Open. I therefore added the other three host countries: Australia, France, and the United Kingdom (Google Trends geo code GB, which includes England).

# Same keyword, split by the four Grand Slam host countries
google.trends = gtrends(c("Tennis"), geo = c("US", "AU", "GB", "FR"), gprop = "web", time = "today 12-m")
plot(google.trends)

Here are the dates of each Grand Slam:

Australian Open: mid/late January
French Open: late May/early June
Wimbledon: late June/early July
US Open: late August/early September

Australia has a peak in January, France has a peak in June, and the United Kingdom has a peak in July. People appear to be most interested in tennis during their own country's Grand Slam.
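The month-by-country pattern can also be summarised as a small table. This is a minimal sketch assuming the `interest_over_time` element of the four-country `gtrends()` result (columns `date`, `hits`, `geo`); `peak_month_by_geo` is a hypothetical helper, and averaging weekly hits within each calendar month is one of several reasonable choices.

```r
# Hypothetical helper: for each geo, the calendar month with the highest
# mean weekly search interest. Assumes gtrendsR-style columns date, hits, geo.
peak_month_by_geo <- function(iot) {
  # hits may be character ("<1"); non-numeric entries are treated as 0
  hits <- suppressWarnings(as.numeric(as.character(iot$hits)))
  hits[is.na(hits)] <- 0
  month <- as.integer(format(as.Date(iot$date), "%m"))
  # mean weekly hits per geo and calendar month
  avg <- aggregate(hits, by = list(geo = iot$geo, month = month), FUN = mean)
  names(avg)[names(avg) == "x"] <- "mean_hits"
  # keep the month(s) whose mean equals the maximum for that geo
  top <- avg[avg$mean_hits == ave(avg$mean_hits, avg$geo, FUN = max), ]
  top$month_name <- month.abb[top$month]
  top[order(top$geo), ]
}
```

Applied to `google.trends$interest_over_time`, it gives one row per country (more if two months tie), which can be compared with the Grand Slam dates listed above.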

library(dplyr)


library(ggplot2)
library(gtrendsR)
library(maps)

world <- map_data("world")

# Rename regions so they match the country names used by Google Trends
world %>%
  mutate(region = replace(region, region=="USA", "United States")) %>%
  mutate(region = replace(region, region=="UK", "United Kingdom")) -> world

# Worldwide query; interest_by_country holds one row per country
tennis_world <- gtrends("tennis", time = "today 12-m")

# create data frame for plotting
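tennis_world$interest_by_country %>%
  # convert hits before filtering; "<1" and "" become NA and are dropped
  mutate(region = location, hits = suppressWarnings(as.numeric(hits))) %>%
  filter(region %in% world$region, !is.na(hits), hits > 0) %>%
  select(region, hits) -> my_df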

ggplot() +
  geom_map(data = world,
           map = world,
           aes(x = long, y = lat, map_id = region),
           fill="#ffffff", color="#ffffff", size=0.15) +
  geom_map(data = my_df,
           map = world,
           aes(fill = hits, map_id = region),
           color="#ffffff", size=0.15) +
  scale_fill_continuous(low = 'grey', high = 'red') +
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank())

Yoo

2/17/2020
