Introduction

This challenge will focus on mutations and sanity checks. The dataset being used is the Federal Funds rate.

Reading the Dataset

First the necessary libraries are loaded.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(here)
## here() starts at C:/Users/SHAURYA/Desktop/Studies/Winter 2024 601/Challenges/challenge 4
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Loading the dataset,

fed_data <- read_csv("FedFundsRate.csv", show_col_types = FALSE)

Taking a look at the data,

fed_data
## # A tibble: 904 × 10
##     Year Month   Day `Federal Funds Target Rate` `Federal Funds Upper Target`
##    <dbl> <dbl> <dbl>                       <dbl>                        <dbl>
##  1  1954     7     1                          NA                           NA
##  2  1954     8     1                          NA                           NA
##  3  1954     9     1                          NA                           NA
##  4  1954    10     1                          NA                           NA
##  5  1954    11     1                          NA                           NA
##  6  1954    12     1                          NA                           NA
##  7  1955     1     1                          NA                           NA
##  8  1955     2     1                          NA                           NA
##  9  1955     3     1                          NA                           NA
## 10  1955     4     1                          NA                           NA
## # ℹ 894 more rows
## # ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>

We can retrieve all column names to understand what they are.

colnames(fed_data)
##  [1] "Year"                         "Month"                       
##  [3] "Day"                          "Federal Funds Target Rate"   
##  [5] "Federal Funds Upper Target"   "Federal Funds Lower Target"  
##  [7] "Effective Federal Funds Rate" "Real GDP (Percent Change)"   
##  [9] "Unemployment Rate"            "Inflation Rate"

The following gives the description of each column.

  1. Year - Denotes the year.

  2. Month - Represents each month of the year.

  3. Day - Set of days related to the month.

  4. Federal Funds Target Rate - It is the targest interest range set by the Federal Reserve in which banks charge other institutions for lending excess cash to them from their reserves on overnight basis.

  5. Federal Funds Upper Target - It is the highest point of the target rate range.

  6. Federal Funds Lower Target - It is the lowest point of the target rate range.

  7. Effective Federal Funds Rate - This is the actual rate at which the banks deal with the cash exchange.

  8. Real GDP - It is the GDP adjusted for inflation. Measure of economic output of country after accounting for changes in overall price level.

  9. Unemployment Rate - It is the quarterly rate of unemployment in the country. An update is given once every 3 months.

  10. Inflation Rate - Represents the monthly inflation.

We can get the dimensions of the data.

dim(fed_data)
## [1] 904  10

There are about 904 different months with each month of the year as its own row.

We can have a look at the different years being covered in this dataset.

all_years <- select(fed_data, "Year")
all_years
## # A tibble: 904 × 1
##     Year
##    <dbl>
##  1  1954
##  2  1954
##  3  1954
##  4  1954
##  5  1954
##  6  1954
##  7  1955
##  8  1955
##  9  1955
## 10  1955
## # ℹ 894 more rows

We can see the distribution of the years.

table(all_years)
## Year
## 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 
##    6   12   12   12   12   12   12   12   12   12   12   12   12   12   12   12 
## 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 
##   12   12   12   12   12   12   12   12   12   12   12   12   16   20   25   21 
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 
##   19   21   24   23   17   21   15   12   18   14   13   13   15   15   15   23 
## 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 
##   13   13   17   19   16   15   19   12   12   12   12   12   12   13   13    4

While most years have only one day per month as the row there are certain years with way more such as 1984 and 1988 and some are well below the average like 1954 and 2017.

Tidying the data

We can tidy the data by filling in the missing values. We can do this by using the mean and approximation operations.

First we check for missing data.

summary(is.na(fed_data[c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target", "Effective Federal Funds Rate", "Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate")]))
##  Federal Funds Target Rate Federal Funds Upper Target
##  Mode :logical             Mode :logical             
##  FALSE:462                 FALSE:103                 
##  TRUE :442                 TRUE :801                 
##  Federal Funds Lower Target Effective Federal Funds Rate
##  Mode :logical              Mode :logical               
##  FALSE:103                  FALSE:752                   
##  TRUE :801                  TRUE :152                   
##  Real GDP (Percent Change) Unemployment Rate Inflation Rate 
##  Mode :logical             Mode :logical     Mode :logical  
##  FALSE:250                 FALSE:752         FALSE:710      
##  TRUE :654                 TRUE :152         TRUE :194

We see that there is plenty of missing data in each of these columns so it is important to fill them as it will completely tidy the data.

fed_data$`Federal Funds Target Rate`[is.na(fed_data$`Federal Funds Target Rate`)] <- mean(fed_data$`Federal Funds Target Rate`, na.rm = TRUE)

fed_data$`Federal Funds Upper Target`[is.na(fed_data$`Federal Funds Upper Target`)] <- mean(fed_data$`Federal Funds Upper Target`, na.rm = TRUE)

fed_data$`Federal Funds Lower Target`[is.na(fed_data$`Federal Funds Lower Target`)] <- mean(fed_data$`Federal Funds Lower Target`, na.rm = TRUE)

fed_data$`Effective Federal Funds Rate`[is.na(fed_data$`Effective Federal Funds Rate`)] <- mean(fed_data$`Effective Federal Funds Rate`, na.rm = TRUE)

fed_data$`Real GDP (Percent Change)` <- na.aggregate(fed_data$`Real GDP (Percent Change)`, FUN = mean, na.rm = TRUE)

fed_data$`Unemployment Rate`[is.na(fed_data$`Unemployment Rate`)] <- mean(fed_data$`Unemployment Rate`, na.rm = TRUE)

fed_data$`Inflation Rate`[is.na(fed_data$`Inflation Rate`)] <- mean(fed_data$`Inflation Rate`, na.rm = TRUE)

Now if we check for missing values again-

summary(is.na(fed_data[c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target", "Effective Federal Funds Rate", "Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate")]))
##  Federal Funds Target Rate Federal Funds Upper Target
##  Mode :logical             Mode :logical             
##  FALSE:904                 FALSE:904                 
##  Federal Funds Lower Target Effective Federal Funds Rate
##  Mode :logical              Mode :logical               
##  FALSE:904                  FALSE:904                   
##  Real GDP (Percent Change) Unemployment Rate Inflation Rate 
##  Mode :logical             Mode :logical     Mode :logical  
##  FALSE:904                 FALSE:904         FALSE:904

We see that all values have been filled.

We now perform sanity checks to see if we tidied the data in a reasonable manner.

We can do that by checking stats like mean, median, max and min for each of these columns.

We can get that by the summary function.

summary(fed_data[, c(
  "Federal Funds Target Rate", "Federal Funds Upper Target", 
  "Federal Funds Lower Target", "Effective Federal Funds Rate",
  "Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate"
)])
##  Federal Funds Target Rate Federal Funds Upper Target
##  Min.   : 1.000            Min.   :0.2500            
##  1st Qu.: 5.500            1st Qu.:0.3083            
##  Median : 5.658            Median :0.3083            
##  Mean   : 5.658            Mean   :0.3083            
##  3rd Qu.: 5.658            3rd Qu.:0.3083            
##  Max.   :11.500            Max.   :1.0000            
##  Federal Funds Lower Target Effective Federal Funds Rate
##  Min.   :0.00000            Min.   : 0.070              
##  1st Qu.:0.05825            1st Qu.: 2.940              
##  Median :0.05825            Median : 4.911              
##  Mean   :0.05825            Mean   : 4.911              
##  3rd Qu.:0.05825            3rd Qu.: 6.013              
##  Max.   :0.75000            Max.   :19.100              
##  Real GDP (Percent Change) Unemployment Rate Inflation Rate  
##  Min.   :-10.000           Min.   : 3.400    Min.   : 0.600  
##  1st Qu.:  3.138           1st Qu.: 5.100    1st Qu.: 2.200  
##  Median :  3.138           Median : 5.979    Median : 3.733  
##  Mean   :  3.138           Mean   : 5.979    Mean   : 3.733  
##  3rd Qu.:  3.138           3rd Qu.: 6.700    3rd Qu.: 4.300  
##  Max.   : 16.500           Max.   :10.800    Max.   :13.600

From the above result, we see that the sanity check does show that tidy operation was successful as the values in a reasonable range as expected.

Variables to be mutated

We can use a single date column instead of separate year, month, and day columns. We can also calculate the spread of upper and lower targets and the difference between the federal rates and effective rates.

Mutating Variables

fed_data <- fed_data %>%
  mutate(Date = as.Date(paste(Year, Month, Day, sep = "-")))

fed_data
## # A tibble: 904 × 11
##     Year Month   Day `Federal Funds Target Rate` `Federal Funds Upper Target`
##    <dbl> <dbl> <dbl>                       <dbl>                        <dbl>
##  1  1954     7     1                        5.66                        0.308
##  2  1954     8     1                        5.66                        0.308
##  3  1954     9     1                        5.66                        0.308
##  4  1954    10     1                        5.66                        0.308
##  5  1954    11     1                        5.66                        0.308
##  6  1954    12     1                        5.66                        0.308
##  7  1955     1     1                        5.66                        0.308
##  8  1955     2     1                        5.66                        0.308
##  9  1955     3     1                        5.66                        0.308
## 10  1955     4     1                        5.66                        0.308
## # ℹ 894 more rows
## # ℹ 6 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>, Date <date>
select(fed_data, Date)
## # A tibble: 904 × 1
##    Date      
##    <date>    
##  1 1954-07-01
##  2 1954-08-01
##  3 1954-09-01
##  4 1954-10-01
##  5 1954-11-01
##  6 1954-12-01
##  7 1955-01-01
##  8 1955-02-01
##  9 1955-03-01
## 10 1955-04-01
## # ℹ 894 more rows

We see the new Date column being successfully added. We can see that the information is redundant so we can use the Date column to replace the three columns and place it first ahead of other variables.

fed_data <- fed_data %>%
  mutate(Date = as.Date(paste(Year, Month, Day, sep = "-"))) %>%
  select(-Year, -Month, -Day) %>%
  select(Date, everything())

fed_data
## # A tibble: 904 × 8
##    Date       `Federal Funds Target Rate` `Federal Funds Upper Target`
##    <date>                           <dbl>                        <dbl>
##  1 1954-07-01                        5.66                        0.308
##  2 1954-08-01                        5.66                        0.308
##  3 1954-09-01                        5.66                        0.308
##  4 1954-10-01                        5.66                        0.308
##  5 1954-11-01                        5.66                        0.308
##  6 1954-12-01                        5.66                        0.308
##  7 1955-01-01                        5.66                        0.308
##  8 1955-02-01                        5.66                        0.308
##  9 1955-03-01                        5.66                        0.308
## 10 1955-04-01                        5.66                        0.308
## # ℹ 894 more rows
## # ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>

We have successfully replaced the 3 columns by a single Date column in the original column.

The other mutation operations are-

fed_data <- fed_data %>%
  mutate(`Target Rates Spread` = `Federal Funds Upper Target` - `Federal Funds Lower Target`)

fed_data
## # A tibble: 904 × 9
##    Date       `Federal Funds Target Rate` `Federal Funds Upper Target`
##    <date>                           <dbl>                        <dbl>
##  1 1954-07-01                        5.66                        0.308
##  2 1954-08-01                        5.66                        0.308
##  3 1954-09-01                        5.66                        0.308
##  4 1954-10-01                        5.66                        0.308
##  5 1954-11-01                        5.66                        0.308
##  6 1954-12-01                        5.66                        0.308
##  7 1955-01-01                        5.66                        0.308
##  8 1955-02-01                        5.66                        0.308
##  9 1955-03-01                        5.66                        0.308
## 10 1955-04-01                        5.66                        0.308
## # ℹ 894 more rows
## # ℹ 6 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>,
## #   `Target Rates Spread` <dbl>

It gives the spread of higher and lower targets of the federal rate.

fed_data <- fed_data %>%
  mutate(`Difference of Rates` = `Federal Funds Target Rate` - `Effective Federal Funds Rate`)

fed_data
## # A tibble: 904 × 10
##    Date       `Federal Funds Target Rate` `Federal Funds Upper Target`
##    <date>                           <dbl>                        <dbl>
##  1 1954-07-01                        5.66                        0.308
##  2 1954-08-01                        5.66                        0.308
##  3 1954-09-01                        5.66                        0.308
##  4 1954-10-01                        5.66                        0.308
##  5 1954-11-01                        5.66                        0.308
##  6 1954-12-01                        5.66                        0.308
##  7 1955-01-01                        5.66                        0.308
##  8 1955-02-01                        5.66                        0.308
##  9 1955-03-01                        5.66                        0.308
## 10 1955-04-01                        5.66                        0.308
## # ℹ 894 more rows
## # ℹ 7 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>,
## #   `Target Rates Spread` <dbl>, `Difference of Rates` <dbl>

This gives the required difference of the two rates.

Conclusion

We first performed the basic description operations followed by massive tidy up of the data by filling the missing values. This was followed by certain mutation operations that makes the data look more visually appealing and with more information than before.