This challenge will focus on mutations and sanity checks. The dataset being used is the Federal Funds rate.
First the necessary libraries are loaded.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(here)
## here() starts at C:/Users/SHAURYA/Desktop/Studies/Winter 2024 601/Challenges/challenge 4
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
Loading the dataset,
fed_data <- read_csv("FedFundsRate.csv", show_col_types = FALSE)
Taking a look at the data,
fed_data
## # A tibble: 904 × 10
## Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1954 7 1 NA NA
## 2 1954 8 1 NA NA
## 3 1954 9 1 NA NA
## 4 1954 10 1 NA NA
## 5 1954 11 1 NA NA
## 6 1954 12 1 NA NA
## 7 1955 1 1 NA NA
## 8 1955 2 1 NA NA
## 9 1955 3 1 NA NA
## 10 1955 4 1 NA NA
## # ℹ 894 more rows
## # ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
## # `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## # `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
We can retrieve all column names to understand what they are.
colnames(fed_data)
## [1] "Year" "Month"
## [3] "Day" "Federal Funds Target Rate"
## [5] "Federal Funds Upper Target" "Federal Funds Lower Target"
## [7] "Effective Federal Funds Rate" "Real GDP (Percent Change)"
## [9] "Unemployment Rate" "Inflation Rate"
The following gives the description of each column.
Year - Denotes the year.
Month - Represents each month of the year.
Day - Set of days related to the month.
Federal Funds Target Rate - It is the targest interest range set by the Federal Reserve in which banks charge other institutions for lending excess cash to them from their reserves on overnight basis.
Federal Funds Upper Target - It is the highest point of the target rate range.
Federal Funds Lower Target - It is the lowest point of the target rate range.
Effective Federal Funds Rate - This is the actual rate at which the banks deal with the cash exchange.
Real GDP - It is the GDP adjusted for inflation. Measure of economic output of country after accounting for changes in overall price level.
Unemployment Rate - It is the quarterly rate of unemployment in the country. An update is given once every 3 months.
Inflation Rate - Represents the monthly inflation.
We can get the dimensions of the data.
dim(fed_data)
## [1] 904 10
There are about 904 different months with each month of the year as its own row.
We can have a look at the different years being covered in this dataset.
all_years <- select(fed_data, "Year")
all_years
## # A tibble: 904 × 1
## Year
## <dbl>
## 1 1954
## 2 1954
## 3 1954
## 4 1954
## 5 1954
## 6 1954
## 7 1955
## 8 1955
## 9 1955
## 10 1955
## # ℹ 894 more rows
We can see the distribution of the years.
table(all_years)
## Year
## 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
## 6 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
## 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985
## 12 12 12 12 12 12 12 12 12 12 12 12 16 20 25 21
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
## 19 21 24 23 17 21 15 12 18 14 13 13 15 15 15 23
## 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
## 13 13 17 19 16 15 19 12 12 12 12 12 12 13 13 4
While most years have only one day per month as the row there are certain years with way more such as 1984 and 1988 and some are well below the average like 1954 and 2017.
We can tidy the data by filling in the missing values. We can do this by using the mean and approximation operations.
First we check for missing data.
summary(is.na(fed_data[c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target", "Effective Federal Funds Rate", "Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate")]))
## Federal Funds Target Rate Federal Funds Upper Target
## Mode :logical Mode :logical
## FALSE:462 FALSE:103
## TRUE :442 TRUE :801
## Federal Funds Lower Target Effective Federal Funds Rate
## Mode :logical Mode :logical
## FALSE:103 FALSE:752
## TRUE :801 TRUE :152
## Real GDP (Percent Change) Unemployment Rate Inflation Rate
## Mode :logical Mode :logical Mode :logical
## FALSE:250 FALSE:752 FALSE:710
## TRUE :654 TRUE :152 TRUE :194
We see that there is plenty of missing data in each of these columns so it is important to fill them as it will completely tidy the data.
fed_data$`Federal Funds Target Rate`[is.na(fed_data$`Federal Funds Target Rate`)] <- mean(fed_data$`Federal Funds Target Rate`, na.rm = TRUE)
fed_data$`Federal Funds Upper Target`[is.na(fed_data$`Federal Funds Upper Target`)] <- mean(fed_data$`Federal Funds Upper Target`, na.rm = TRUE)
fed_data$`Federal Funds Lower Target`[is.na(fed_data$`Federal Funds Lower Target`)] <- mean(fed_data$`Federal Funds Lower Target`, na.rm = TRUE)
fed_data$`Effective Federal Funds Rate`[is.na(fed_data$`Effective Federal Funds Rate`)] <- mean(fed_data$`Effective Federal Funds Rate`, na.rm = TRUE)
fed_data$`Real GDP (Percent Change)` <- na.aggregate(fed_data$`Real GDP (Percent Change)`, FUN = mean, na.rm = TRUE)
fed_data$`Unemployment Rate`[is.na(fed_data$`Unemployment Rate`)] <- mean(fed_data$`Unemployment Rate`, na.rm = TRUE)
fed_data$`Inflation Rate`[is.na(fed_data$`Inflation Rate`)] <- mean(fed_data$`Inflation Rate`, na.rm = TRUE)
Now if we check for missing values again-
summary(is.na(fed_data[c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target", "Effective Federal Funds Rate", "Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate")]))
## Federal Funds Target Rate Federal Funds Upper Target
## Mode :logical Mode :logical
## FALSE:904 FALSE:904
## Federal Funds Lower Target Effective Federal Funds Rate
## Mode :logical Mode :logical
## FALSE:904 FALSE:904
## Real GDP (Percent Change) Unemployment Rate Inflation Rate
## Mode :logical Mode :logical Mode :logical
## FALSE:904 FALSE:904 FALSE:904
We see that all values have been filled.
We now perform sanity checks to see if we tidied the data in a reasonable manner.
We can do that by checking stats like mean, median, max and min for each of these columns.
We can get that by the summary function.
summary(fed_data[, c(
"Federal Funds Target Rate", "Federal Funds Upper Target",
"Federal Funds Lower Target", "Effective Federal Funds Rate",
"Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate"
)])
## Federal Funds Target Rate Federal Funds Upper Target
## Min. : 1.000 Min. :0.2500
## 1st Qu.: 5.500 1st Qu.:0.3083
## Median : 5.658 Median :0.3083
## Mean : 5.658 Mean :0.3083
## 3rd Qu.: 5.658 3rd Qu.:0.3083
## Max. :11.500 Max. :1.0000
## Federal Funds Lower Target Effective Federal Funds Rate
## Min. :0.00000 Min. : 0.070
## 1st Qu.:0.05825 1st Qu.: 2.940
## Median :0.05825 Median : 4.911
## Mean :0.05825 Mean : 4.911
## 3rd Qu.:0.05825 3rd Qu.: 6.013
## Max. :0.75000 Max. :19.100
## Real GDP (Percent Change) Unemployment Rate Inflation Rate
## Min. :-10.000 Min. : 3.400 Min. : 0.600
## 1st Qu.: 3.138 1st Qu.: 5.100 1st Qu.: 2.200
## Median : 3.138 Median : 5.979 Median : 3.733
## Mean : 3.138 Mean : 5.979 Mean : 3.733
## 3rd Qu.: 3.138 3rd Qu.: 6.700 3rd Qu.: 4.300
## Max. : 16.500 Max. :10.800 Max. :13.600
From the above result, we see that the sanity check does show that tidy operation was successful as the values in a reasonable range as expected.
We can use a single date column instead of separate year, month, and day columns. We can also calculate the spread of upper and lower targets and the difference between the federal rates and effective rates.
fed_data <- fed_data %>%
mutate(Date = as.Date(paste(Year, Month, Day, sep = "-")))
fed_data
## # A tibble: 904 × 11
## Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1954 7 1 5.66 0.308
## 2 1954 8 1 5.66 0.308
## 3 1954 9 1 5.66 0.308
## 4 1954 10 1 5.66 0.308
## 5 1954 11 1 5.66 0.308
## 6 1954 12 1 5.66 0.308
## 7 1955 1 1 5.66 0.308
## 8 1955 2 1 5.66 0.308
## 9 1955 3 1 5.66 0.308
## 10 1955 4 1 5.66 0.308
## # ℹ 894 more rows
## # ℹ 6 more variables: `Federal Funds Lower Target` <dbl>,
## # `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## # `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>, Date <date>
select(fed_data, Date)
## # A tibble: 904 × 1
## Date
## <date>
## 1 1954-07-01
## 2 1954-08-01
## 3 1954-09-01
## 4 1954-10-01
## 5 1954-11-01
## 6 1954-12-01
## 7 1955-01-01
## 8 1955-02-01
## 9 1955-03-01
## 10 1955-04-01
## # ℹ 894 more rows
We see the new Date column being successfully added. We can see that the information is redundant so we can use the Date column to replace the three columns and place it first ahead of other variables.
fed_data <- fed_data %>%
mutate(Date = as.Date(paste(Year, Month, Day, sep = "-"))) %>%
select(-Year, -Month, -Day) %>%
select(Date, everything())
fed_data
## # A tibble: 904 × 8
## Date `Federal Funds Target Rate` `Federal Funds Upper Target`
## <date> <dbl> <dbl>
## 1 1954-07-01 5.66 0.308
## 2 1954-08-01 5.66 0.308
## 3 1954-09-01 5.66 0.308
## 4 1954-10-01 5.66 0.308
## 5 1954-11-01 5.66 0.308
## 6 1954-12-01 5.66 0.308
## 7 1955-01-01 5.66 0.308
## 8 1955-02-01 5.66 0.308
## 9 1955-03-01 5.66 0.308
## 10 1955-04-01 5.66 0.308
## # ℹ 894 more rows
## # ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
## # `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## # `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
We have successfully replaced the 3 columns by a single Date column in the original column.
The other mutation operations are-
fed_data <- fed_data %>%
mutate(`Target Rates Spread` = `Federal Funds Upper Target` - `Federal Funds Lower Target`)
fed_data
## # A tibble: 904 × 9
## Date `Federal Funds Target Rate` `Federal Funds Upper Target`
## <date> <dbl> <dbl>
## 1 1954-07-01 5.66 0.308
## 2 1954-08-01 5.66 0.308
## 3 1954-09-01 5.66 0.308
## 4 1954-10-01 5.66 0.308
## 5 1954-11-01 5.66 0.308
## 6 1954-12-01 5.66 0.308
## 7 1955-01-01 5.66 0.308
## 8 1955-02-01 5.66 0.308
## 9 1955-03-01 5.66 0.308
## 10 1955-04-01 5.66 0.308
## # ℹ 894 more rows
## # ℹ 6 more variables: `Federal Funds Lower Target` <dbl>,
## # `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## # `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>,
## # `Target Rates Spread` <dbl>
It gives the spread of higher and lower targets of the federal rate.
fed_data <- fed_data %>%
mutate(`Difference of Rates` = `Federal Funds Target Rate` - `Effective Federal Funds Rate`)
fed_data
## # A tibble: 904 × 10
## Date `Federal Funds Target Rate` `Federal Funds Upper Target`
## <date> <dbl> <dbl>
## 1 1954-07-01 5.66 0.308
## 2 1954-08-01 5.66 0.308
## 3 1954-09-01 5.66 0.308
## 4 1954-10-01 5.66 0.308
## 5 1954-11-01 5.66 0.308
## 6 1954-12-01 5.66 0.308
## 7 1955-01-01 5.66 0.308
## 8 1955-02-01 5.66 0.308
## 9 1955-03-01 5.66 0.308
## 10 1955-04-01 5.66 0.308
## # ℹ 894 more rows
## # ℹ 7 more variables: `Federal Funds Lower Target` <dbl>,
## # `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## # `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>,
## # `Target Rates Spread` <dbl>, `Difference of Rates` <dbl>
This gives the required difference of the two rates.
We first performed the basic description operations followed by massive tidy up of the data by filling the missing values. This was followed by certain mutation operations that makes the data look more visually appealing and with more information than before.