library(tidyverse)
library(ggplot2)
library(readr)
library(here)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 6 Instructions
Challenge Overview
Today’s challenge is to:
- read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
- tidy data (as needed, including sanity checks)
- mutate variables as needed (including sanity checks)
- create at least one graph including time (evolution)
  - try to make them “publication” ready (optional)
  - Explain why you choose the specific graph type
- Create at least one graph depicting part-whole relationships
  - By part whole, we just mean how one variable’s constituent parts differ (or don’t). For example, does the gender of computer programmers affect salary?
  - try to make them “publication” ready (optional)
  - Explain why you choose the specific graph type
R Graph Gallery is a good starting point for thinking about what information is conveyed in standard graph types, and includes example R code.
(be sure to only include the category tags for the data you use!)
Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- debt ⭐
- fed_rate ⭐⭐
- abc_poll ⭐⭐⭐
- usa_hh ⭐⭐⭐
- hotel_bookings ⭐⭐⭐⭐
- AB_NYC ⭐⭐⭐⭐⭐
Briefly describe the data
Tidy Data (as needed)
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
Are there any variables that require mutation to be usable in your analysis stream? For example, do you need to calculate new values in order to graph them? Can string values be represented numerically? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
Time Dependent Visualization
Visualizing Part-Whole Relationships
Solutions
Reading the Data
The RStudio working directory has been set with setwd() so that “FedFundsRate.csv” sits at its root; the file is then read with read_csv().
ffr_orig <- read_csv(here("FedFundsRate.csv"))
ffr_orig
# A tibble: 904 × 10
Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA
2 1954 8 1 NA NA
3 1954 9 1 NA NA
4 1954 10 1 NA NA
5 1954 11 1 NA NA
6 1954 12 1 NA NA
7 1955 1 1 NA NA
8 1955 2 1 NA NA
9 1955 3 1 NA NA
10 1955 4 1 NA NA
# ℹ 894 more rows
# ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
# `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
# `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
Data Description
High Level Description
The data set comprises 904 rows and 10 columns.
ffr_orig
# A tibble: 904 × 10
Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA
2 1954 8 1 NA NA
3 1954 9 1 NA NA
4 1954 10 1 NA NA
5 1954 11 1 NA NA
6 1954 12 1 NA NA
7 1955 1 1 NA NA
8 1955 2 1 NA NA
9 1955 3 1 NA NA
10 1955 4 1 NA NA
# ℹ 894 more rows
# ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
# `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
# `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
All 10 columns are of type <dbl>. The Year, Month and Day variables give the date on which the observation was recorded. The Federal Funds Target Rate, Federal Funds Upper Target, Federal Funds Lower Target and Effective Federal Funds Rate variables contain the ideal, upper bound, lower bound and actual federal funds rates. The Real GDP (Percent Change), Unemployment Rate and Inflation Rate variables contain the percentage change in real GDP, the unemployment rate and the inflation rate recorded for that date.
Each case represents a monthly observation of macroeconomic conditions alongside the federal funds rate.
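The printed tibble shows many “NA” values in the first rows, so a per-column count of missing values is a useful companion to the description above. The following is a minimal sketch on a toy stand-in for ffr_orig; the toy tibble and its values are made up for illustration and are not taken from the CSV.

```r
library(dplyr)

# Toy stand-in for ffr_orig (hypothetical values, for illustration only)
toy <- tibble(
  Year = c(1954, 1954, 1954),
  Month = c(7, 8, 9),
  Day = c(1, 1, 1),
  `Effective Federal Funds Rate` = c(0.80, 1.22, NA),
  `Inflation Rate` = c(NA, NA, NA_real_)
)

# Count the NA values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Run against the real ffr_orig, the same summarise() call would show which columns need filling before plotting.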
How was the Data likely collected?
According to Wikipedia, the effective federal funds rate is published daily by the Federal Reserve Bank of New York. The federal funds target range is set at meetings of the Federal Open Market Committee (FOMC), which normally take place eight times a year, about seven weeks apart. The target range is chosen to influence macroeconomic conditions such as inflation, unemployment and GDP. The data contains monthly observations from 1954 until 2017.
Tidying the Data
The ffr_orig data in its current form is not tidy. In the anticipated end result, the Year, Month and Day variables are combined into a single date variable and all “NA” values in the other columns are filled.
The Year, Month and Day variables can be mutated into a single date variable with the “date” data type. Additionally, since the GDP, Effective Federal Funds Rate, Unemployment and Inflation rates are continuous variables that change over time, they have been “filled” downward. Since ffr_orig does not contain an “Inflation Rate” for observations before 1958, it has been assumed to be 0 for those rows. The Federal Funds Target Rate has been set equal to the Effective Federal Funds Rate wherever it is “NA”. The Federal Funds Upper Target is set to the maximum of the Federal Funds Target Rate and the Effective Federal Funds Rate. Similarly, the Federal Funds Lower Target is set to the minimum of the Federal Funds Target Rate and the Effective Federal Funds Rate.
# Build a single date column, fill gaps, then derive the target columns
ffr <- ffr_orig %>%
  mutate(date = str_c(Year, Month, Day, sep = "-"),
         date = ymd(date)) %>%
  # Carry the last observed value forward for the continuous series
  fill(`Real GDP (Percent Change)`, `Unemployment Rate`, `Inflation Rate`,
       `Effective Federal Funds Rate`, .direction = "down") %>%
  mutate(
    # No inflation figures before 1958: assume 0
    `Inflation Rate` = ifelse(year(date) < 1958, 0, `Inflation Rate`),
    # Fall back to the effective rate where no target was recorded
    `Federal Funds Target Rate` = ifelse(is.na(`Federal Funds Target Rate`), `Effective Federal Funds Rate`, `Federal Funds Target Rate`),
    `Federal Funds Upper Target` = pmax(`Federal Funds Target Rate`, `Effective Federal Funds Rate`),
    `Federal Funds Lower Target` = pmin(`Federal Funds Target Rate`, `Effective Federal Funds Rate`)
  ) %>%
  select(-Year, -Month, -Day)
ffr
# A tibble: 904 × 8
`Federal Funds Target Rate` Federal Funds Upper Targ…¹ Federal Funds Lower …²
<dbl> <dbl> <dbl>
1 0.8 0.8 0.8
2 1.22 1.22 1.22
3 1.06 1.06 1.06
4 0.85 0.85 0.85
5 0.83 0.83 0.83
6 1.28 1.28 1.28
7 1.39 1.39 1.39
8 1.29 1.29 1.29
9 1.35 1.35 1.35
10 1.43 1.43 1.43
# ℹ 894 more rows
# ℹ abbreviated names: ¹`Federal Funds Upper Target`,
# ²`Federal Funds Lower Target`
# ℹ 5 more variables: `Effective Federal Funds Rate` <dbl>,
# `Real GDP (Percent Change)` <dbl>, `Unemployment Rate` <dbl>,
# `Inflation Rate` <dbl>, date <date>
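To make the fill-and-fallback logic easier to follow, here is a minimal sketch of the same steps on a four-row toy tibble. The column names target, effective, upper and lower and all values are hypothetical, and make_date() from lubridate is used here as an alternative to the str_c() plus ymd() pair in the solution.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy data with gaps in both rate columns (hypothetical values)
toy <- tibble(
  Year = c(2000, 2000, 2000, 2000),
  Month = c(1, 2, 3, 4),
  Day = c(1, 1, 1, 1),
  target = c(NA, 5.5, NA, 6.0),
  effective = c(5.45, NA, 5.85, NA)
)

toy_tidy <- toy %>%
  # Combine the three date parts into one Date column
  mutate(date = make_date(Year, Month, Day)) %>%
  # Carry the last observed effective rate forward
  fill(effective, .direction = "down") %>%
  mutate(
    # Use the effective rate wherever the target is missing
    target = coalesce(target, effective),
    upper = pmax(target, effective),
    lower = pmin(target, effective)
  ) %>%
  select(-Year, -Month, -Day)

toy_tidy
```

Because mutate() evaluates its arguments in order, upper and lower are computed from the already filled target column, which mirrors the order of operations in the solution.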
As a sanity check on the mutated data, ffr contains the same number of observations as ffr_orig (904), starts on “1954-07-01” and ends on “2017-03-16”, which implies no loss of data. Additionally, none of the observations has “NA” for any of the variables.
ffr %>%
  filter(rowSums(is.na(.)) > 0)
# A tibble: 0 × 8
# ℹ 8 variables: Federal Funds Target Rate <dbl>,
# Federal Funds Upper Target <dbl>, Federal Funds Lower Target <dbl>,
# Effective Federal Funds Rate <dbl>, Real GDP (Percent Change) <dbl>,
# Unemployment Rate <dbl>, Inflation Rate <dbl>, date <date>
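The sanity checks described above could also be written as explicit assertions, so that the document fails to render if the tidying step ever drops rows or leaves gaps. This is a sketch on toy data; the before and after tibbles and the rate column are hypothetical stand-ins for ffr_orig and ffr.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-ins for ffr_orig and ffr
before <- tibble(
  Year = c(1954, 1954),
  Month = c(7, 8),
  Day = c(1, 1),
  rate = c(0.80, 1.22)
)

after <- before %>%
  mutate(date = make_date(Year, Month, Day)) %>%
  select(-Year, -Month, -Day)

# One named logical per check, so a failure is easy to identify
checks <- c(
  same_rows = nrow(after) == nrow(before),
  first_date = min(after$date) == as.Date("1954-07-01"),
  no_missing = !any(is.na(after))
)

# Stops with an error unless every check is TRUE
stopifnot(checks)
```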
Time Dependent Visualization
The following plots show how Real GDP (Percent Change), Unemployment Rate and Inflation Rate evolve over time.
ffr %>%
  ggplot(aes(date, `Real GDP (Percent Change)`)) +
  geom_path() +
  scale_x_date(limits = range(ffr$date), breaks = "5 years", date_labels = "%Y") +
  scale_y_continuous(limits = c(floor(min(ffr$`Real GDP (Percent Change)`)), ceiling(max(ffr$`Real GDP (Percent Change)`)))) +
  labs(x = "Year", y = "Percentage of Real GDP Change",
       title = "US Real GDP Change, 1954-2017")
The plot shows real GDP falling most sharply in 1958 and growing most strongly in 1978.
ffr %>%
  ggplot(aes(date, `Unemployment Rate`)) +
  geom_path() +
  scale_x_date(limits = range(ffr$date), breaks = "10 years", date_labels = "%Y") +
  scale_y_continuous(limits = c(0, ceiling(max(ffr$`Unemployment Rate`)))) +
  labs(x = "Year", y = "Unemployment Rate",
       title = "US Unemployment Rate, 1954-2017")
The plot shows the unemployment rate peaking in 1982 and reaching its lowest point in 1969.
ffr %>%
  ggplot(aes(date, `Inflation Rate`)) +
  geom_path() +
  scale_x_date(limits = range(ffr$date), breaks = "10 years", date_labels = "%Y") +
  scale_y_continuous(limits = c(0, ceiling(max(ffr$`Inflation Rate`)))) +
  labs(x = "Year", y = "Inflation Rate",
       title = "US Inflation Rate, 1954-2017")
The plot shows the inflation rate peaking in 1980 and reaching its lowest point in 2010 (excluding the pre-filled 0s).
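Peaks and troughs read off a plot can be confirmed numerically. The sketch below uses slice_max() and slice_min() from dplyr on a toy tibble; the three rows are illustrative values, not rows copied from ffr.

```r
library(dplyr)
library(lubridate)

# Illustrative values only, not taken from the real data
toy <- tibble(
  date = as.Date(c("1969-05-01", "1982-11-01", "2000-04-01")),
  `Unemployment Rate` = c(3.4, 10.8, 3.8)
)

# Row with the highest rate, plus its calendar year
peak <- toy %>%
  slice_max(`Unemployment Rate`, n = 1) %>%
  mutate(year = year(date))

# Row with the lowest rate, plus its calendar year
trough <- toy %>%
  slice_min(`Unemployment Rate`, n = 1) %>%
  mutate(year = year(date))

peak$year
trough$year
```

Applied to ffr, the same two calls would report the dates behind the extremes described for each plot.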
Why Choose a Path Plot?
geom_path() is well suited to visualizing how a continuous variable varies over time. It joins observations in the order they appear in the data, which here is date order, so trends as well as the highest and lowest values are easy to read. A path plot is also forgiving of missing values or discrepancies, since the general trend remains visible even if a few data points are inaccurate.
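One detail worth keeping in mind with this choice: geom_path() connects points in row order, while geom_line() connects them in order of the x variable. The two agree only when the data is sorted by date. A small sketch with deliberately unsorted toy data (hypothetical values) makes the difference visible through layer_data():

```r
library(ggplot2)

# Toy data deliberately out of date order (hypothetical values)
toy <- data.frame(
  date = as.Date(c("2000-03-01", "2000-01-01", "2000-02-01")),
  rate = c(3, 1, 2)
)

p_path <- ggplot(toy, aes(date, rate)) + geom_path()
p_line <- ggplot(toy, aes(date, rate)) + geom_line()

# x positions in the order each geom will draw them
path_x <- as.numeric(layer_data(p_path)$x)
line_x <- as.numeric(layer_data(p_line)$x)

is.unsorted(path_x)  # path keeps the row order
is.unsorted(line_x)  # line sorts by x before drawing
```

If ffr were ever reordered, for example after a join, arranging by date before plotting or switching to geom_line() would keep the time series intact.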
Visualizing Part-Whole Relationships
The following plot visualizes part-whole relationships for the Real GDP (Percent Change), Unemployment Rate and Inflation Rate variables at 20-year intervals (1970, 1990 and 2010).
ffr %>%
  mutate(year = year(date)) %>%
  filter(year %in% c(1970, 1990, 2010)) %>%
  group_by(year) %>%
  # A lambda replaces the deprecated across(..., mean, na.rm = T) form
  summarize(across(c(`Real GDP (Percent Change)`, `Unemployment Rate`, `Inflation Rate`),
                   ~ mean(.x, na.rm = TRUE))) %>%
  # pivot_longer() supersedes gather()
  pivot_longer(-year, names_to = "Variable", values_to = "Mean Value") %>%
  ggplot(aes(x = factor(year), y = `Mean Value`, fill = Variable)) +
  # geom_col() is equivalent to geom_bar(stat = "identity")
  geom_col(position = "dodge") +
  labs(title = "Grouped Bar Chart of Mean Values",
       x = "Year",
       y = "Mean Value") +
  theme_minimal()
Why Choose a Grouped Bar Plot?
A grouped bar plot is one of the plots the R Graph Gallery recommends for visualizing part-whole relationships. It compares mean values across multiple categories (the GDP change, unemployment rate and inflation rate in this case), side by side within each year. Here each variable is treated as a “part” and the year as the “whole”, so the plot shows how a year’s “parts” (GDP change, unemployment rate and inflation rate) differ from one another and from one year to the next.
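A grouped bar chart needs the data in long form, with one row per year and variable, so that the variable name can be mapped to the fill colour. The sketch below shows that reshaping step in isolation on a toy tibble; the column names gdp and unemployment and all values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Two made-up observations per year (hypothetical values)
toy <- tibble(
  year = c(1970, 1970, 1990, 1990),
  gdp = c(1, 3, 2, 4),
  unemployment = c(4, 6, 5, 7)
)

yearly_long <- toy %>%
  group_by(year) %>%
  # One mean per variable per year
  summarise(across(c(gdp, unemployment), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  # One row per (year, variable) pair, ready for aes(fill = Variable)
  pivot_longer(-year, names_to = "Variable", values_to = "Mean Value")

yearly_long
```

Each year contributes one row per variable, which is what lets position = "dodge" place the bars for one year next to each other.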