Challenge 4

Pull in, orient and clean up the data

fedFundsRate <- read_csv("challenge_datasets/FedFundsRate.csv")

## Rows: 904 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Year, Month, Day, Federal Funds Target Rate, Federal Funds Upper T...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(fedFundsRate)

## # A tibble: 6 × 10
##    Year Month   Day `Federal Funds Target Rate` `Federal Funds Upper Target`
##   <dbl> <dbl> <dbl>                       <dbl>                        <dbl>
## 1  1954     7     1                          NA                           NA
## 2  1954     8     1                          NA                           NA
## 3  1954     9     1                          NA                           NA
## 4  1954    10     1                          NA                           NA
## 5  1954    11     1                          NA                           NA
## 6  1954    12     1                          NA                           NA
## # ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>

My first goal is to get oriented to the data so I can understand if I need to tidy up (filter any rows or columns, add a column, etc. )

When data is reported

I did a manual scroll through the Federal Funds* columns. It there are not values until 1982. In 1982 there are values for ‘Federal Funds Target rate’ until December 2008. Then there are values in ‘Federal Funds Upper Target’ and ‘Federal Funds Lower Target’. It seems as the Fed changed the way they looked at the target as a single target number to between a range at the end of 2008.

What days are included in the data set and why?

Days always seem to be ‘1’ so I will check that with distinct. I was incorrect by a brief manual scan and it looks like the target changed on different days in the month.

fedFundsRate %>% 
  distinct(Day)

## # A tibble: 29 × 1
##      Day
##    <dbl>
##  1     1
##  2    27
##  3     7
##  4    19
##  5    14
##  6    31
##  7    25
##  8    24
##  9    20
## 10    11
## # ℹ 19 more rows

So why is it that other days of the month in the tibble?

When selecting the year, month, day and Federal Funds Target Rate and filtering when Federal Funds Target Rate started, I can see when there is a change in the ‘Federal Funds Target Rate’ during a month, it was logged.

fedFundsRate %>%
  select(Year, Month, Day, `Federal Funds Target Rate`) %>%
  filter(Year > 1985) %>% 
  head(n = 30)

## # A tibble: 30 × 4
##     Year Month   Day `Federal Funds Target Rate`
##    <dbl> <dbl> <dbl>                       <dbl>
##  1  1986     1     1                        7.75
##  2  1986     2     1                        7.75
##  3  1986     3     1                        7.75
##  4  1986     3     7                        7.25
##  5  1986     4     1                        7.25
##  6  1986     4     2                        7.31
##  7  1986     4    21                        6.75
##  8  1986     5     1                        6.75
##  9  1986     5    22                        6.81
## 10  1986     6     1                        6.81
## # ℹ 20 more rows

I want to check if the upper and lower target is reported on the 1st of every month unless it changes in the middle of a month.

I first check by filter after 2008 when the range versus one target was implemented in the data set and see the distinct days. I can see there are three distinct days. Next I want to see when these distinct days show up so I again filter past 2008, arrange by descending day and select the columns I am interested in. With this table I see that the days, other than the 1st of the month occur between 2015 - 2017. I see that the ‘Federal Funds Lower Target’ and ‘Federal Funds Upper Target’ are also only reported on the 1st of the month UNLESS there is a change in the range in the middle of the month.

fedFundsRate %>%
  filter(Year > 2008) %>% 
  distinct(Day)

## # A tibble: 3 × 1
##     Day
##   <dbl>
## 1     1
## 2    16
## 3    14

fedFundsRate %>%
  filter(Year > 2008) %>% 
  arrange(desc(Day)) %>% 
  select(Year, Month, Day, `Federal Funds Lower Target`, `Federal Funds Upper Target`)

## # A tibble: 102 × 5
##     Year Month   Day `Federal Funds Lower Target` `Federal Funds Upper Target`
##    <dbl> <dbl> <dbl>                        <dbl>                        <dbl>
##  1  2015    12    16                         0.25                         0.5 
##  2  2017     3    16                         0.75                         1   
##  3  2016    12    14                         0.5                          0.75
##  4  2009     1     1                         0                            0.25
##  5  2009     2     1                         0                            0.25
##  6  2009     3     1                         0                            0.25
##  7  2009     4     1                         0                            0.25
##  8  2009     5     1                         0                            0.25
##  9  2009     6     1                         0                            0.25
## 10  2009     7     1                         0                            0.25
## # ℹ 92 more rows

fedFundsRate %>%
  filter(Year %in% 2015:2017) %>% 
  select(Year, Month, Day, `Federal Funds Lower Target`, `Federal Funds Upper Target`) %>% 
  head(n = 30)

## # A tibble: 30 × 5
##     Year Month   Day `Federal Funds Lower Target` `Federal Funds Upper Target`
##    <dbl> <dbl> <dbl>                        <dbl>                        <dbl>
##  1  2015     1     1                            0                         0.25
##  2  2015     2     1                            0                         0.25
##  3  2015     3     1                            0                         0.25
##  4  2015     4     1                            0                         0.25
##  5  2015     5     1                            0                         0.25
##  6  2015     6     1                            0                         0.25
##  7  2015     7     1                            0                         0.25
##  8  2015     8     1                            0                         0.25
##  9  2015     9     1                            0                         0.25
## 10  2015    10     1                            0                         0.25
## # ℹ 20 more rows

When is GDP in the table

I can’t quite tell when ‘Real GDP (Percent Change)’ has a value versus doesn’t have a value. I selected just the year, month, day and Real GDP (Percent Change) and then filtered to only show when Real GDP (Percent Change) has a value. The had previewed the tibble with head to show me 30 rows to see if there is a pattern. It shows the Real GDP (Percent Change) is reported on the 1st, 4th, 7th and 10th months of the year.

fedFundsRate %>%
  select(Year, Month, Day, `Real GDP (Percent Change)`) %>%
  filter(complete.cases(`Real GDP (Percent Change)`)) %>%
  head(n = 30)

## # A tibble: 30 × 4
##     Year Month   Day `Real GDP (Percent Change)`
##    <dbl> <dbl> <dbl>                       <dbl>
##  1  1954     7     1                         4.6
##  2  1954    10     1                         8  
##  3  1955     1     1                        11.9
##  4  1955     4     1                         6.7
##  5  1955     7     1                         5.5
##  6  1955    10     1                         2.4
##  7  1956     1     1                        -1.5
##  8  1956     4     1                         3.4
##  9  1956     7     1                        -0.3
## 10  1956    10     1                         6.7
## # ℹ 20 more rows

When is unemployment and inflation rate in the table

Next I want to see if when unemployment rate and inflation rate is reported. I can look at this similarly of how I got when GDP was reported and can see these rates occur on the first day of every month.

fedFundsRate %>%
  select(Year, Month, Day, `Unemployment Rate`) %>%
  filter(complete.cases(`Unemployment Rate`)) %>%
  head(n = 30)

## # A tibble: 30 × 4
##     Year Month   Day `Unemployment Rate`
##    <dbl> <dbl> <dbl>               <dbl>
##  1  1954     7     1                 5.8
##  2  1954     8     1                 6  
##  3  1954     9     1                 6.1
##  4  1954    10     1                 5.7
##  5  1954    11     1                 5.3
##  6  1954    12     1                 5  
##  7  1955     1     1                 4.9
##  8  1955     2     1                 4.7
##  9  1955     3     1                 4.6
## 10  1955     4     1                 4.7
## # ℹ 20 more rows

fedFundsRate %>%
  select(Year, Month, Day, `Inflation Rate`) %>%
  filter(complete.cases(`Inflation Rate`)) %>%
  head(n = 30)

## # A tibble: 30 × 4
##     Year Month   Day `Inflation Rate`
##    <dbl> <dbl> <dbl>            <dbl>
##  1  1958     1     1              3.2
##  2  1958     2     1              3.2
##  3  1958     3     1              2.8
##  4  1958     4     1              2.4
##  5  1958     5     1              2.4
##  6  1958     6     1              2.1
##  7  1958     7     1              2.4
##  8  1958     8     1              2.1
##  9  1958     9     1              1.7
## 10  1958    10     1              1.7
## # ℹ 20 more rows

When is Effective federal funds rate in the table

I select the columns I want to see next to each other (year, month, day and Effective Federal Funds Rate), I then filter to see only when Effective Federal Funds Rate has a value. Visually looking at this table it looks like the Effective Federal Funds Rate is only reported on the first of the month. To confirm this, I will use distinct(day) to see what days appear. Only the 1st of the month is present which confirms that the Effective Federal Funds Rate is only reported on the first of the month.

fedFundsRate %>%
  select(Year, Month, Day, `Effective Federal Funds Rate`) %>%  
  filter(complete.cases(`Effective Federal Funds Rate`)) %>% 
  distinct(Day)

## # A tibble: 1 × 1
##     Day
##   <dbl>
## 1     1

At first glance when the columns having many N/A, it wasn’t clear what was occurring. I now have a better sense of when each column will have a value when further manipulating the data set.

Changing Months from numbers to words

Months are currently numeric when they are nominal data. So I will change the months from numbers to words using mutate to replace the Month column and case_when to replace the number

fedFundsRateClean <- fedFundsRate %>%
   mutate(
    Month=case_when(
      Month==1 ~ "January", 
      Month==2 ~ "February", 
      Month==3 ~ "March", 
      Month==4 ~ "April", 
      Month==5 ~ "May", 
      Month==6 ~ "June", 
      Month==7 ~ "July", 
      Month==8 ~ "August", 
      Month==9 ~ "September", 
      Month==10 ~ "October", 
      Month==11 ~ "November", 
      Month==12 ~ "December"))
head(fedFundsRateClean)

## # A tibble: 6 × 10
##    Year Month       Day `Federal Funds Target Rate` `Federal Funds Upper Target`
##   <dbl> <chr>     <dbl>                       <dbl>                        <dbl>
## 1  1954 July          1                          NA                           NA
## 2  1954 August        1                          NA                           NA
## 3  1954 September     1                          NA                           NA
## 4  1954 October       1                          NA                           NA
## 5  1954 November      1                          NA                           NA
## 6  1954 December      1                          NA                           NA
## # ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
## #   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
## #   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>

When has there been the most Inflation or unemployment?

First, I know from orienting to the data, the inflation and employment only is reported on the first of the month, so I am going to filter the data so I only have the 1st of the month and then select the columns I am most interested in (year, month, unemployment rate and inflation rate).

fedFundsRate1st <- fedFundsRateClean %>% 
  filter(Day == 1) %>% 
  select(Year, Month, `Unemployment Rate`, `Inflation Rate`)

First I want to make a new column showing change in inflation and a new column showing change in unemployment.

To do this, I mutate a new column for change in inflation which uses lead to get the next value In the column so I can see if and what the change is in inflation. I do the same for ’change in employment.

I also created two new columns to make categorical column to show increase, no change and decrease for both inflation and unemployment changes

fedFundsRate1st <- fedFundsRate1st %>%
  mutate('Change in Inflation' = lead(`Inflation Rate`) - `Inflation Rate`) %>% 
    mutate('Change in Inflation Cat' =case_when(
    `Change in Inflation` > 0 ~ 'Increase',
    `Change in Inflation` == 0 ~ 'No Change',
    `Change in Inflation` < 0 ~ 'Decrease')) %>% 
  mutate('Change in Employment' = lead(`Unemployment Rate`) - `Unemployment Rate`) %>% 
  mutate('Change in Employment Cat' =case_when(
    `Change in Employment` > 0 ~ 'Increase',
    `Change in Employment` == 0 ~ 'No Change',
    `Change in Employment` < 0 ~ 'Decrease'))

When inflation changes, does unemployment also change in the same direction?

To see if when there is a change in inflation (e.g. increase), does the change in unemployment mirror that change each time it is reported. To change this, I will mutate a column which uses ‘==’ to have the change in inflation compared to the change in employment. If they match, the new column, aligned, will report TRUE and if they don’t, the aligned column will report FALSE. I then want to summarize to see how many times TRUE versus FALSE occurs. I see that they mirror each other 67 times and don’t mirror each other 514 times.

From this I would state that the change in inflation and change in employment generally does not mirror each other.

unemployInflationChange <- fedFundsRate1st %>% 
  mutate(Aligned = `Change in Inflation` == `Change in Employment`) %>% 
  summarise(`Aligned True` = sum(Aligned == TRUE, na.rm = TRUE), 
            `Aligned False` = sum(Aligned == FALSE, na.rm = TRUE))

What years have had the most inflation on average

I will group by year and then summarize the change in inflation column, ungroup the year and arrange the yearly average inflation to see which year has had the most Inflation on average.

1974 had the most average inflation of 5.5%

fedFundsRate1st %>%
  group_by(Year) %>%
  summarize(yearAverageInflation = mean(`Change in Inflation`, na.rm = TRUE)) %>% 
  ungroup() %>% 
  arrange(desc(yearAverageInflation))

## # A tibble: 64 × 2
##     Year yearAverageInflation
##    <dbl>                <dbl>
##  1  1974               0.55  
##  2  1979               0.283 
##  3  1966               0.225 
##  4  1978               0.183 
##  5  1973               0.175 
##  6  2011               0.108 
##  7  2004               0.1   
##  8  1990               0.100 
##  9  1969               0.0917
## 10  1968               0.0833
## # ℹ 54 more rows

Challenge 4

Ciara Salmon

2024-01-02

Challenge 4

Pull in, orient and clean up the data

When data is reported

What days are included in the data set and why?

When is GDP in the table

When is unemployment and inflation rate in the table

When is Effective federal funds rate in the table

Changing Months from numbers to words

When has there been the most Inflation or unemployment?

When inflation changes, does unemployment also change in the same direction?

What years have had the most inflation on average