This project explores the calculation of a Leading Economic Index (LEI). The LEI is a composite index designed to forecast future economic activity. It is composed of several individual economic indicators that have historically shown consistent patterns of leading the overall economy. The LEI is used by economists, policymakers, and business leaders to anticipate economic trends and potential turning points in economic activity. We will loosely follow the Conference Board's methodology to calculate our index.
Sections 2.0 - 4.0 cover data collection and transformation. Sections 5.0 and 6.0 analyze the index, and section 7.0 collects the references used in this project.
Conference Board Leading Index Components (FRED ID)
BCI-01 Average weekly hours, manufacturing (AWHMAN)
BCI-05 Average weekly initial claims for unemployment insurance (ICSA)
BCI-08 Manufacturers’ new orders, consumer goods and materials (ACOGNO)
BCI-130 ISM new order index (NOCDFSA066MSFRBPHI)
BCI-33 Manufacturers’ new orders, non-defense capital goods excl. aircraft (NEWORDER)
BCI-29 Building permits, new private housing units (PERMIT)
BCI-19 Stock prices, 500 common stocks (M1346BUSM156NNBR)
BCI-107 Leading Credit Index (proxied here by real M2 money stock, M2REAL)
The Leading Credit Index itself is built from six underlying series: 1) the 2-year swap spread (real time); 2) the LIBOR 3-month less 3-month Treasury bill yield spread (real time); 3) debit balances in margin accounts at broker-dealers (monthly); 4) AAII Investor Sentiment, bullish (%) less bearish (%) (weekly); 5) the Senior Loan Officer C&I loan survey, banks tightening credit to large and medium firms (quarterly); and 6) security repurchases (quarterly) from the Total Finance-Liabilities section of the Federal Reserve's Flow of Funds report.
BCI-129 Interest rate spread, 10-year Treasury bonds less federal funds (T10YFF)
BCI-125 Avg. consumer expectations for business and economic conditions (UMCSENT)
Industrial Production: Total Index (INDPRO)
# clean workspace
rm(list = ls())
# Set the seed for reproducibility
set.seed(123)
# Prepare needed libraries
packages <- c("ggplot2",    # visualizations
              "dplyr",      # data manipulation
              "tidyr",      # data formatting
              "stargazer",  # LaTeX tables
              "knitr",      # markdown output formatting
              "readxl",     # read in Excel files
              "tsibble",    # tidy time series extension
              "zoo",        # time-indexing and analysis
              "hpfilter",   # Hodrick-Prescott filter
              "forecast",   # forecasting techniques
              "lubridate",  # date handling
              "tseries",    # time series analysis
              "fabletools", # tools for the fable framework
              "fable",      # tidy forecasting
              "fpp3",       # tsibble/fable meta-package
              "BCDating",   # business cycle dating tools
              "fredr",      # R bindings for the FRED API
              "purrr",      # functional programming tools
              "lintr"       # static code analysis
              )
# install and upload packages
for (i in 1:length(packages)) {
if (!packages[i] %in% rownames(installed.packages())) {
install.packages(packages[i]
, repos = "http://cran.rstudio.com/"
, dependencies = TRUE
)
}
library(packages[i], character.only = TRUE)
}
The data that we will use in this project can be found through the FRED API. We will use the fredr package to access the API quickly, which comes with the added benefit of being able to keep the index up to date. We will then use the fetch_data function (defined below) in a loop to grab all of the data we want and store the data frames in a list so they are easy to keep track of.
# function that collects our data
# coerce to data frame
fetch_data <- function(x, frequency = "m") {
as.data.frame(fredr(
series_id = x,
frequency = frequency
))
}
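# Note: fredr requires a (free) FRED API key before any request will succeed.
# A minimal setup sketch, assuming the key is stored in the FRED_API_KEY
# environment variable; run once per session before calling fetch_data().
fredr::fredr_set_key(Sys.getenv("FRED_API_KEY"))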
# list of series IDs
cmpts <- c(
"AWHMAN",
"ICSA",
"ACOGNO",
"NOCDFSA066MSFRBPHI",
"NEWORDER",
"PERMIT",
"M2REAL",
"T10YFF",
"UMCSENT",
"INDPRO"
)
# place to store data frames
data_list <- list()
# loop through list to collect data and store in list
for(i in 1:length(cmpts)){
data_list[[cmpts[i]]] <- fetch_data(cmpts[i], "m")
}
Now that we have our data we will need to do some processing so that we can create a data frame in the correct format. First we rename the value column from each API request to the name of its series. Then we un-nest our list of data frames and merge them so that each column represents a component and each row represents a month. Finally, we convert the merged data frame to a tibble with a year-month date column.
# change name of column `value` in each data frame in prep
for(i in 1:length(cmpts)) {
# rename individual df columns
names(data_list[[i]])[names(data_list[[i]]) == "value"] <- cmpts[i]
# use list slicing to drop un-needed columns
data_list[[i]] <- data_list[[i]][, c("date", cmpts[i])]
}
# Function to merge data frames by 'date'
merge_dfs_by_date <- function(dfs) {
reduce(dfs, full_join, by = "date")
}
# Merging all data frames by 'date'
merged_df <- merge_dfs_by_date(data_list)
# a tibble will be easiest to work with
df <- merged_df |>
  # format date as a year-month
  mutate(date = as.yearmon(date)) |>
  # coerce to tibble
  as_tibble()
The last step in preparing our data is to collect and then reformat the NBER business cycle dates so that we can plot them. This will allow us to visually inspect our leading index in the context of past business cycles.
We will start by using the fetch_data function that was defined earlier. The FRED API stores the recession dates in binary form (1 = recession, 0 = no recession). We must reshape this for easy plotting with ggplot; to accomplish this we create peak and trough columns that contain the corresponding dates.
# retrieve NBER Recession Dates
RECD <- fetch_data("USRECD")
# drop columns that are not needed
RECD <- RECD |>
select(-realtime_start,-realtime_end)
# show data frame structure
str(RECD)
## 'data.frame': 2037 obs. of 3 variables:
## $ date : Date, format: "1854-12-01" "1855-01-01" ...
## $ series_id: chr "USRECD" "USRECD" "USRECD" "USRECD" ...
## $ value : num 1 0 0 0 0 0 0 0 0 0 ...
Create leading and lagging values, then check those values for changes from 0 to 1 and from 1 to 0 to identify the peaks and troughs of the business cycle.
# find the peaks and trough in recession dates
statechanges <- RECD |>
mutate(prev_value = lag(value),
lead_value = lead(value)) |>
filter((value == 1 & prev_value == 0) | (value == 1 & lead_value == 0)) |>
mutate(indctr = ifelse(lead_value == 1, "peak", "trough") ) |>
select(date,indctr)
We should now have the peaks and troughs of the business cycle. This data needs to be processed slightly so that it is in the proper format for plotting.
# reformat data
turning_points <- statechanges[2:69,] |>
pivot_wider(names_from = indctr,
values_from = date)
# Unnest the list-cols from previous line
turning_points <- unnest(turning_points, cols = c(trough, peak))
# Ensure peak and trough columns in turning_points are in Date format
turning_points$peak <- as.yearmon(as.Date(turning_points$peak, format = "%Y-%m-%d"))
turning_points$trough <- as.yearmon(as.Date(turning_points$trough, format = "%Y-%m-%d"))
# get quarterly GDP data
rgdp_q <- fetch_data("GDPC1", frequency = "q")
# format the data
rgdp_q <- rgdp_q |>
rename(GDPC1 = value) |>
mutate(date = yearquarter(as.Date(date, format = "%Y-%m-%d"))) |>
select(date, GDPC1)
In this section we will create plots of all of our components. This is done so that we can verify that our data came in correctly and to get some insight into each series. We must first create some helper functions that use the ggplot2 library; the functions will then be applied to each component and the results stored in a list to keep them organized.
# ggplot main plotting function
cmpt_plot <- function(Data, Cmpt, color = "blue"){
# Filter the data to remove rows where the Cmpt column is NA
filtered_data <- Data[!is.na(Data[[Cmpt]]), ]
# build plot
ggplot(filtered_data, aes(x = date, y = .data[[Cmpt]])) +
geom_line(color = color) +
theme_linedraw()
}
# function to display recession bars
recession_bars <- function(start,stop){
geom_rect(data=turning_points[start:stop,],
aes(xmin=peak,
xmax=trough,
ymin=-Inf,
ymax=+Inf),
fill='grey1',
alpha=0.2,
inherit.aes = FALSE )
}
# apply the plotting function to every column of df
# (list element [[i]] corresponds to column i, so [[1]] is the date column)
plots_list <- lapply(colnames(df), cmpt_plot, Data = df)
plots_list[[2]] +
labs(title = "Average Weekly Hours of Production and Nonsupervisory Employees, Manufacturing")+
recession_bars(22,44)
plots_list[[3]] +
labs(title = "Average weekly initial claims for unemployment insurance") +
recession_bars(27,44)
plots_list[[4]] +
labs(title = "Manufacturers' new orders, consumer goods and materials") +
recession_bars(31,44)
plots_list[[5]] +
labs(title = "ISM new order index") +
recession_bars(27,44)
plots_list[[6]] +
labs(title = "Manufacturers' new orders, non-defense capital goods excl. aircraft") +
recession_bars(31,44)
plots_list[[7]] +
labs(title = "Housing Permits") + recession_bars(26,44)
plots_list[[8]] +
labs(title = "M2") + recession_bars(26,44)
plots_list[[9]] +
labs(title = "Interest rate spread, 10-year Treasury bonds less federal funds") + recession_bars(26,44)
plots_list[[10]] +
labs(title = "Avg. consumer expectations for business and economic conditions") + recession_bars(24,44)
plots_list[[11]] +
labs(title = "Industrial Production: Total Index ") +
recession_bars(17,44)
# build plot
gdp_plot <- cmpt_plot(rgdp_q,"GDPC1")
# add recession bars
gdp_plot +
labs(title = "Real GDP") +
recession_bars(22,44)
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_rect()`).
To calculate our leading index we will use the Conference Board's composite index calculation methodology. This method has six steps and distills the information in each series into a single index that can serve as an early warning signal for the business cycle.
Month-to-month changes are computed for each component using the symmetric percent change formula for all components that are not already in that form.
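For the components that do need transforming, the symmetric percent change of a series $X_t$ is

$$x_t = 200 \times \frac{X_t - X_{t-1}}{X_t + X_{t-1}},$$

which is what the symmetric_percent_change function below computes; it treats increases and decreases symmetrically and is bounded between -200 and 200.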
We will start by dropping rows with missing data. Each component has a different length, so this standardizes the sample and means we do not have to do any imputation.
# find complete cases
df$ind <-complete.cases(df)
# drop all but the complete cases
df <- df |>
filter(ind == TRUE) |>
select(-ind)
# we end up with data from 1992- 2024
# slightly verbose way of doing this but it works well
# Function to calculate symmetric percent change for vectors
symmetric_percent_change <- function(X_t) {
return(200 * ((X_t - lag(X_t)) / (X_t + lag(X_t))))
}
# Columns to use
columns_to_change <- c("AWHMAN", "ICSA", "ACOGNO","NEWORDER", "PERMIT","M2REAL", "UMCSENT", "INDPRO")
# Create a new tibble to store results
pchange_results <- df |>
select(date, all_of(columns_to_change))
# calculate symmetric percent change and add new columns.
pchange_results <- pchange_results |>
mutate(across(all_of(columns_to_change),
.fns = ~ symmetric_percent_change(.),
.names = "symmetric_{.col}"))
# Convert to data frame
pchange_results <- as.data.frame(pchange_results)
# Print the results
stargazer(pchange_results, type = "text")
##
## ==================================================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------------------------------------
## AWHMAN 389 41.202 0.608 38.500 42.300
## ICSA 389 370,741.400 289,092.500 197,750 4,663,250
## ACOGNO 389 162,949.800 45,642.080 86,445 264,445
## NEWORDER 389 59,147.920 9,272.555 33,857 73,992
## PERMIT 389 1,372.758 403.027 513 2,263
## M2REAL 389 4,128.279 1,579.775 2,305.100 7,656.700
## UMCSENT 389 85.845 13.409 50.000 112.000
## INDPRO 389 91.754 11.330 61.919 104.104
## symmetric_AWHMAN 388 0.001 0.630 -6.775 3.951
## symmetric_ICSA 388 -0.347 11.321 -72.838 167.161
## symmetric_ACOGNO 388 0.274 2.331 -19.073 17.352
## symmetric_NEWORDER 388 0.200 3.036 -11.390 9.090
## symmetric_PERMIT 388 0.062 5.014 -24.655 17.011
## symmetric_M2REAL 388 0.260 0.673 -1.429 6.991
## symmetric_UMCSENT 388 -0.002 5.186 -21.504 15.132
## symmetric_INDPRO 388 0.133 1.070 -14.181 6.375
## ------------------------------------------------------------------
The monthly contributions of the components are adjusted to equalize the volatility of each component.
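Concretely, if $v_x$ is the standard deviation of component $x$'s monthly contribution $c_{x,t}$, the weights applied below are

$$w_x = \frac{1}{v_x}, \qquad r_x = \frac{w_x}{\sum_j w_j}, \qquad \text{adjusted contribution}_{x,t} = r_x \, c_{x,t},$$

so the weights sum to one and more volatile components receive proportionally smaller weights (this is what the calculate_adjusted function implements).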
# add the components that are already in the right form (no symmetric % change needed)
# and keep INDPRO plus its growth rate for the trend adjustment later
pchange_results <- pchange_results |>
  mutate(T10YFF = df$T10YFF,
         NOCDF = df$NOCDFSA066MSFRBPHI,
         INDPRO = df$INDPRO,
         INDPRO_GR = symmetric_INDPRO)
calculate_adjusted <- function(data, columns) {
adjusted_columns <- c()
data <- stats::na.omit(data)
for (col in columns) {
# calculate standard deviation
v_x <- sd(data[[col]])
# Check if the standard deviation is zero or NA
if (is.na(v_x) || v_x == 0) {
stop(paste("Standard deviation for column", col, "is NA or zero, can't invert"))
}
# Invert Standard deviation
w_x <- 1/v_x
# Append to adjusted columns
adjusted_columns <- c(adjusted_columns, w_x)
}
# sum inverted standard deviations for all columns
k <- sum(adjusted_columns)
# normalize the weights so they sum to 1
adjusted_columns <- adjusted_columns / k
# Adjust contributions for all columns
for (i in 1:length(columns)) {
col <- columns[i]
rx <- adjusted_columns[i]
data[[paste0("adj_", col)]] <- rx * data[[col]]
}
return(data)
}
# Columns for which you want to calculate adjusted contributions
columns_to_adj <- c("symmetric_AWHMAN",
"symmetric_ICSA",
"symmetric_ACOGNO",
"symmetric_NEWORDER",
"symmetric_PERMIT",
"symmetric_M2REAL",
"symmetric_UMCSENT",
"T10YFF",
"NOCDF")
# Call the function to calculate adjusted contributions for all columns
pchange_results <- calculate_adjusted(pchange_results, columns_to_adj)
# Print the results
stargazer(pchange_results, type = "text", title = "Leading index")
##
## Leading index
## ======================================================================
## Statistic N Mean St. Dev. Min Max
## ----------------------------------------------------------------------
## AWHMAN 388 41.204 0.608 38.500 42.300
## ICSA 388 370,557.200 289,442.900 197,750 4,663,250
## ACOGNO 388 163,147.000 45,534.820 90,386 264,445
## NEWORDER 388 59,213.110 9,194.857 35,258 73,992
## PERMIT 388 1,373.343 403.383 513 2,263
## M2REAL 388 4,132.597 1,579.515 2,305.100 7,656.700
## UMCSENT 388 85.889 13.398 50.000 112.000
## INDPRO 388 91.831 11.242 62.440 104.104
## symmetric_AWHMAN 388 0.001 0.630 -6.775 3.951
## symmetric_ICSA 388 -0.347 11.321 -72.838 167.161
## symmetric_ACOGNO 388 0.274 2.331 -19.073 17.352
## symmetric_NEWORDER 388 0.200 3.036 -11.390 9.090
## symmetric_PERMIT 388 0.062 5.014 -24.655 17.011
## symmetric_M2REAL 388 0.260 0.673 -1.429 6.991
## symmetric_UMCSENT 388 -0.002 5.186 -21.504 15.132
## symmetric_INDPRO 388 0.133 1.070 -14.181 6.375
## T10YFF 388 1.433 1.311 -1.470 3.800
## NOCDF 388 8.684 14.920 -69.300 46.700
## INDPRO_GR 388 0.133 1.070 -14.181 6.375
## adj_symmetric_AWHMAN 388 0.0002 0.195 -2.091 1.219
## adj_symmetric_ICSA 388 -0.006 0.195 -1.252 2.872
## adj_symmetric_ACOGNO 388 0.023 0.195 -1.592 1.448
## adj_symmetric_NEWORDER 388 0.013 0.195 -0.730 0.582
## adj_symmetric_PERMIT 388 0.002 0.195 -0.957 0.660
## adj_symmetric_M2REAL 388 0.075 0.195 -0.413 2.020
## adj_symmetric_UMCSENT 388 -0.0001 0.195 -0.807 0.568
## adj_T10YFF 388 0.213 0.195 -0.218 0.564
## adj_NOCDF 388 0.113 0.195 -0.903 0.609
## ----------------------------------------------------------------------
Sum the adjusted monthly contributions across the components for each month to obtain the growth rate of the index.
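In symbols, the growth rate of the index in month $t$ is the sum of the adjusted contributions across the components:

$$i_t = \sum_x r_x \, c_{x,t}.$$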
# Get the columns with adjusted contributions
adj_columns <- grep("^adj_", names(pchange_results), value = TRUE)
# Calculate the monthly growth rate for each month
pchange_results$new_monthly_growth_rate <- rowSums(pchange_results[, adj_columns])
# Print the results
stargazer(pchange_results, type = "text")
##
## =======================================================================
## Statistic N Mean St. Dev. Min Max
## -----------------------------------------------------------------------
## AWHMAN 388 41.204 0.608 38.500 42.300
## ICSA 388 370,557.200 289,442.900 197,750 4,663,250
## ACOGNO 388 163,147.000 45,534.820 90,386 264,445
## NEWORDER 388 59,213.110 9,194.857 35,258 73,992
## PERMIT 388 1,373.343 403.383 513 2,263
## M2REAL 388 4,132.597 1,579.515 2,305.100 7,656.700
## UMCSENT 388 85.889 13.398 50.000 112.000
## INDPRO 388 91.831 11.242 62.440 104.104
## symmetric_AWHMAN 388 0.001 0.630 -6.775 3.951
## symmetric_ICSA 388 -0.347 11.321 -72.838 167.161
## symmetric_ACOGNO 388 0.274 2.331 -19.073 17.352
## symmetric_NEWORDER 388 0.200 3.036 -11.390 9.090
## symmetric_PERMIT 388 0.062 5.014 -24.655 17.011
## symmetric_M2REAL 388 0.260 0.673 -1.429 6.991
## symmetric_UMCSENT 388 -0.002 5.186 -21.504 15.132
## symmetric_INDPRO 388 0.133 1.070 -14.181 6.375
## T10YFF 388 1.433 1.311 -1.470 3.800
## NOCDF 388 8.684 14.920 -69.300 46.700
## INDPRO_GR 388 0.133 1.070 -14.181 6.375
## adj_symmetric_AWHMAN 388 0.0002 0.195 -2.091 1.219
## adj_symmetric_ICSA 388 -0.006 0.195 -1.252 2.872
## adj_symmetric_ACOGNO 388 0.023 0.195 -1.592 1.448
## adj_symmetric_NEWORDER 388 0.013 0.195 -0.730 0.582
## adj_symmetric_PERMIT 388 0.002 0.195 -0.957 0.660
## adj_symmetric_M2REAL 388 0.075 0.195 -0.413 2.020
## adj_symmetric_UMCSENT 388 -0.0001 0.195 -0.807 0.568
## adj_T10YFF 388 0.213 0.195 -0.218 0.564
## adj_NOCDF 388 0.113 0.195 -0.903 0.609
## new_monthly_growth_rate 388 0.433 0.605 -3.702 2.647
## -----------------------------------------------------------------------
The sum of the adjusted contributions, i.e. the growth rate of the composite index, is adjusted to equate its trend to that of a coincident index. This is accomplished by adding an adjustment factor, $a$, to the growth rate of the index each month ($i_t' = i_t + a$). For example, the trend adjustment factor for the leading index is computed by subtracting its average monthly growth rate ($\sum_t i_t / T$, where $T$ is the number of observations in the sample) from the average monthly growth rate of the coincident index.
# Step 4: Adjust growth rates to match INDPRO
# Using INDPRO to proxy for coincident index
adjust_trend <- function(growth_rate, coincident_index_growth_rate) {
a <- mean(coincident_index_growth_rate) - mean(growth_rate) # Compute adjustment factor
return(growth_rate + a) # Adjusted growth rates
}
pchange_results$New_adj_GROWTHRATE <- adjust_trend(pchange_results$new_monthly_growth_rate,
pchange_results$INDPRO_GR)
The level of the index is computed using the symmetric percent change formula, applied in reverse. The index is calculated recursively starting from an initial value of 100 for the first month of the sample period (January 1959 in the Conference Board's documentation; the first month of our sample here).
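Inverting the symmetric percent change formula gives the recursion used below: starting from $I_1 = 100$,

$$I_t = I_{t-1} \times \frac{200 + i_t}{200 - i_t}.$$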
# Initialize the leading index levels
leading_index <- numeric(nrow(pchange_results))
leading_index[1] <- 100
# Compute the leading index levels recursively
for (i in 2:nrow(pchange_results)) {
leading_index[i] <- leading_index[i - 1] * (200 + pchange_results$New_adj_GROWTHRATE[i]) / (200 - pchange_results$New_adj_GROWTHRATE[i])
}
# Add the leading index levels to the dataframe
pchange_results$leading_index <- leading_index
# Print the results
stargazer(pchange_results, type = "text")
##
## =======================================================================
## Statistic N Mean St. Dev. Min Max
## -----------------------------------------------------------------------
## AWHMAN 388 41.204 0.608 38.500 42.300
## ICSA 388 370,557.200 289,442.900 197,750 4,663,250
## ACOGNO 388 163,147.000 45,534.820 90,386 264,445
## NEWORDER 388 59,213.110 9,194.857 35,258 73,992
## PERMIT 388 1,373.343 403.383 513 2,263
## M2REAL 388 4,132.597 1,579.515 2,305.100 7,656.700
## UMCSENT 388 85.889 13.398 50.000 112.000
## INDPRO 388 91.831 11.242 62.440 104.104
## symmetric_AWHMAN 388 0.001 0.630 -6.775 3.951
## symmetric_ICSA 388 -0.347 11.321 -72.838 167.161
## symmetric_ACOGNO 388 0.274 2.331 -19.073 17.352
## symmetric_NEWORDER 388 0.200 3.036 -11.390 9.090
## symmetric_PERMIT 388 0.062 5.014 -24.655 17.011
## symmetric_M2REAL 388 0.260 0.673 -1.429 6.991
## symmetric_UMCSENT 388 -0.002 5.186 -21.504 15.132
## symmetric_INDPRO 388 0.133 1.070 -14.181 6.375
## T10YFF 388 1.433 1.311 -1.470 3.800
## NOCDF 388 8.684 14.920 -69.300 46.700
## INDPRO_GR 388 0.133 1.070 -14.181 6.375
## adj_symmetric_AWHMAN 388 0.0002 0.195 -2.091 1.219
## adj_symmetric_ICSA 388 -0.006 0.195 -1.252 2.872
## adj_symmetric_ACOGNO 388 0.023 0.195 -1.592 1.448
## adj_symmetric_NEWORDER 388 0.013 0.195 -0.730 0.582
## adj_symmetric_PERMIT 388 0.002 0.195 -0.957 0.660
## adj_symmetric_M2REAL 388 0.075 0.195 -0.413 2.020
## adj_symmetric_UMCSENT 388 -0.0001 0.195 -0.807 0.568
## adj_T10YFF 388 0.213 0.195 -0.218 0.564
## adj_NOCDF 388 0.113 0.195 -0.903 0.609
## new_monthly_growth_rate 388 0.433 0.605 -3.702 2.647
## New_adj_GROWTHRATE 388 0.133 0.605 -4.003 2.346
## leading_index 388 142.715 25.014 100.000 195.462
## -----------------------------------------------------------------------
The index is re-based to average 100 in the base year (2016 here).
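That is, if $\bar{I}_{2016}$ denotes the average of the raw index over the twelve months of 2016, the re-based index is

$$\text{LEI}_t = 100 \times \frac{I_t}{\bar{I}_{2016}}.$$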
raw_index <- pchange_results |>
select(date,leading_index)
# create a new column for the leading index
df <- df |>
left_join(raw_index, by = "date")
# Convert the date strings to yearmon objects
base_year <- as.yearmon(c("2016-01-01", "2016-02-01", "2016-03-01",
"2016-04-01", "2016-05-01", "2016-06-01",
"2016-07-01","2016-08-01", "2016-09-01",
"2016-10-01","2016-11-01", "2016-12-01"))
# Filter the data for the base year (2016)
base_year_data <- df |>
filter(date %in% base_year)
# Calculate the average over the base year (2016)
base_year_average <- mean(base_year_data$leading_index, na.rm = TRUE)
# Rebase the leading index
df <- df |>
mutate(rebased_leading_index = leading_index * 100 / base_year_average)
# rename for readability
df <- df |>
rename(LEI = rebased_leading_index)
head(df)
## # A tibble: 6 × 13
## date AWHMAN ICSA ACOGNO NOCDFSA066MSFRBPHI NEWORDER PERMIT M2REAL T10YFF
## <yearmo> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Feb 1992 40.7 442200 86445 13 33857 1146 2453. 3.28
## 2 Mar 1992 40.7 429500 90386 9.8 35258 1082 2447. 3.53
## 3 Apr 1992 40.8 418250 90535 20.7 35992 1054 2439. 3.64
## 4 May 1992 40.9 417400 92615 12 36811 1056 2433. 3.55
## 5 Jun 1992 40.8 418750 93632 15.2 36666 1057 2422. 3.46
## 6 Jul 1992 40.8 442750 93275 19.8 36629 1089 2416. 3.5
## # ℹ 4 more variables: UMCSENT <dbl>, INDPRO <dbl>, leading_index <dbl>,
## # LEI <dbl>
# drop the NA in the first row, which corresponds to Feb 1992
# the symmetric % change cannot be computed for the first observation
df <- stats::na.omit(as.data.frame(df))
stargazer(df, type = "text")
##
## ==================================================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------------------------------------
## AWHMAN 388 41.204 0.608 38.500 42.300
## ICSA 388 370,557.200 289,442.900 197,750 4,663,250
## ACOGNO 388 163,147.000 45,534.820 90,386 264,445
## NOCDFSA066MSFRBPHI 388 8.684 14.920 -69.300 46.700
## NEWORDER 388 59,213.110 9,194.857 35,258 73,992
## PERMIT 388 1,373.343 403.383 513 2,263
## M2REAL 388 4,132.597 1,579.515 2,305.100 7,656.700
## T10YFF 388 1.433 1.311 -1.470 3.800
## UMCSENT 388 85.889 13.398 50.000 112.000
## INDPRO 388 91.831 11.242 62.440 104.104
## leading_index 388 142.715 25.014 100.000 195.462
## LEI 388 87.253 15.293 61.138 119.501
## ------------------------------------------------------------------
cmpt_plot(df,"LEI", color = "blue") +
recession_bars(31,34) +
labs(title = "Leading Index")
Unfortunately we only have enough data to construct this index for three recessions, but we will have to make do. Upon visual inspection the leading index does best around the 2008 recession, showing a decline well before the recession, and its trough comes slightly before the NBER trough date. As expected, the leading index does not do a great job of forecasting the pandemic recession; that recession was brought on by an exogenous factor in the form of a virus. Its performance ahead of the 2001 recession is just okay: the index begins to dip in late 2000, but the lead time is much shorter than in 2008. This could be due to some of the conditions surrounding the dot-com bubble that burst in late 2000.
# use kable to display recent LEI values
kable(tail(df)) |>
kableExtra::kable_classic(html_font = "Cambria", full_width = TRUE)
|     | date     | AWHMAN | ICSA   | ACOGNO | NOCDFSA066MSFRBPHI | NEWORDER | PERMIT | M2REAL | T10YFF | UMCSENT | INDPRO   | leading_index | LEI      |
|-----|----------|--------|--------|--------|--------------------|----------|--------|--------|--------|---------|----------|---------------|----------|
| 384 | Jan 2024 | 40.2   | 209500 | 243853 | -17.9              | 73527    | 1508   | 6700.7 | -1.27  | 79.0    | 101.4830 | 168.8196      | 103.2126 |
| 385 | Feb 2024 | 40.5   | 209250 | 249125 | -5.2               | 73929    | 1563   | 6668.9 | -1.12  | 76.9    | 102.7267 | 168.4936      | 103.0132 |
| 386 | Mar 2024 | 40.7   | 213600 | 252395 | 5.4                | 73774    | 1485   | 6673.2 | -1.12  | 79.4    | 102.5001 | 168.2013      | 102.8346 |
| 387 | Apr 2024 | 40.6   | 210250 | 254629 | 12.2               | 73940    | 1440   | 6660.6 | -0.79  | 77.2    | 102.4955 | 167.2732      | 102.2672 |
| 388 | May 2024 | 40.8   | 223000 | 251570 | -7.9               | 73284    | 1399   | 6689.1 | -0.85  | 69.1    | 103.2734 | 165.8758      | 101.4128 |
| 389 | Jun 2024 | 40.8   | 236800 | 250118 | -2.2               | 73673    | 1454   | 6716.2 | -1.02  | 68.2    | 103.5494 | 165.5864      | 101.2359 |
Important note: this data tends to lag by about two months because some of the series take longer than others to be published by the Fed.
The next step is to use the one-sided Hodrick-Prescott (HP) filter to estimate the trend of our index. This isolates the information left over in the cycle component, i.e. the deviations from trend. The HP filter is a robust de-trending technique and is often regarded as the best option outside of the phase-average trend (PAT) method; we do not have many observations, so PAT is not an option in this scenario.
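As a reminder, the standard (two-sided) HP filter chooses the trend $\tau_t$ to minimize

$$\sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \big[(\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1})\big]^2,$$

where $\lambda$ is the smoothing parameter (144,000 in the code below); the one-sided version implemented by hp1 estimates the trend at each date using only data available up to that date.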
# create data frame for trend calculation
y <- as.data.frame(df$LEI)
# create trend
LEI_hptrend <- hp1(y, lambda = 144000)
# Calculate the cycle / deviation from trend
LEI_hpcycle <- y - LEI_hptrend
# Adding a new column using mutate
df <- df |>
mutate(LEI_hpcycle = LEI_hpcycle[,1],
LEI_hptrend = LEI_hptrend[,1])
# Define colors and labels for the legend
colors <- c( "deepskyblue")
labels <- c("hptrend")
# Build plot
LEI_hptrend_ggplot = cmpt_plot(df, "LEI" ) +
# Line for coincident_index with legend
geom_line(aes(y = LEI_hptrend , color = "hptrend"),
alpha = 0.7,
show.legend = TRUE) +
# Label graph
labs(x = "Date (monthly)",
y = "Index 2016 = 100",
title = " Leading index and HP trend") +
# Assign colors and labels
scale_color_manual(values = colors, labels = labels)
# recession bars
LEI_hptrend_ggplot = LEI_hptrend_ggplot + recession_bars(31,34)
print(LEI_hptrend_ggplot)
# Build plot
LEI_divhptrend_ggplot = cmpt_plot(df, "LEI_hpcycle", color = "darkturquoise" ) +
# Label graph
labs(x = "Date (monthly)",
y = "Index 2016 = 100",
title = " Leading index devations from HP trend")
# recession bars
LEI_divhptrend_ggplot = LEI_divhptrend_ggplot + recession_bars(31,34)
#display plot
print(LEI_divhptrend_ggplot)
# calculate year-over-year growth rate
df <- df|>
mutate(YoY = (LEI - lag(LEI,12)) / lag(LEI,12))
# plot yoy
cmpt_plot(df,"YoY", color = "darkslategray4") +
recession_bars(31,34)+
labs(title= "Leading Index Year over Year Growth Rate")
To get an alternative look at the cyclical portion of the leading index we can use the YoY growth rate. This effectively de-trends the data in a way that is easier to interpret. The resulting plot differs slightly from the one above because it spends more time in positive territory, but it tells pretty much the same story.
The reason we look at the de-trended values becomes clear from the plots. In the YoY plot, for example, the leading index growth rate goes negative a few months before each recession. For the more typical recessions the growth rate turns negative several months beforehand: roughly 19 months of negative readings around the 2008-09 downturn and about 7 months ahead of the 2001 recession. The leading index would have done a great job of forecasting the 2008 recession but could have missed the 2001 recession; 7 months of negative readings is not a strong signal, particularly because there are other periods with negative values that do not result in a recession. The recession triggered by the pandemic is an interesting case and somewhat of an outlier, because the event that caused it was exogenous to the economy and to our index, which includes no measure of global pandemic risk. Nonetheless it is an interesting event to study further, and perhaps some information can be extracted from it.
In this section we will evaluate the leading index. First we will see how well it aligns with the established recession dates, and then we will use the Granger causality test to see whether the leading index helps predict real GDP, our stand-in for the current state of the economy.
# aggregate the data to quarterly frequency
df_q <- df |>
select(date, LEI, INDPRO) |>
mutate(quarter = as.yearqtr(date)) |>
group_by(quarter) |>
summarise(LEI = mean(LEI),
INDPRO = mean(INDPRO))
# create tsibble and format data for eval
df_tsibble <- df_q |>
mutate(date = yearquarter(quarter)) |>
select(-quarter) |>
as_tsibble(index = date)
# join our LEI with real GDP at the quarterly level
ind_eva <- left_join(df_tsibble, rgdp_q, by = "date")
We use the BCDating package, and specifically the Harding-Pagan (BBQ) method, for detecting turning points. The first thing we will do here is take another look at the LEI alongside the official NBER recession dates; this lets us compare them with the turning points detected in our index.
# plot LEI again for comparison
cmpt_plot(df,"LEI", color = "blue") +
recession_bars(31,34) +
labs(title = "Leading Index")
# turning points
turning_points[31:34,]
## # A tibble: 4 × 2
## peak trough
## <yearmon> <yearmon>
## 1 Aug 1990 Mar 1991
## 2 Apr 2001 Nov 2001
## 3 Jan 2008 Jun 2009
## 4 Mar 2020 Apr 2020
# BCDating works with ts objects, so convert the LEI
newind <- ts(ind_eva$LEI,
start = 1993,
end = 2023,
frequency = 4)
# fit model
turns <- BBQ(newind, mincycle = 5, minphase = 2, name = "" )
# Plot turns
plot(turns, main = "Leading Index turning points")
# display turning point
print(turns)
## Peaks Troughs Duration
## 1 1996Q1 1997Q1 4
## 2 1999Q1 1999Q4 3
## 3 2001Q2 2002Q4 6
## 4 2007Q1 2010Q1 12
## 5 2019Q4 2021Q2 6
To perform the Granger causality test we need to make sure that our data is stationary. We will check this by examining the autocorrelation function and the results of the KPSS unit root test, and then difference the data as needed.
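For reference, the Granger test run at the end of this section compares two regressions on the stationary (differenced) series, here with three lags; a sketch of the restricted and unrestricted models:

$$\Delta GDP_t = \alpha + \sum_{j=1}^{3} \beta_j \, \Delta GDP_{t-j} + \varepsilon_t
\quad \text{vs.} \quad
\Delta GDP_t = \alpha + \sum_{j=1}^{3} \beta_j \, \Delta GDP_{t-j} + \sum_{j=1}^{3} \gamma_j \, \Delta LEI_{t-j} + \varepsilon_t,$$

and the F-test asks whether the $\gamma_j$ are jointly zero, i.e. whether lagged values of the LEI add predictive power for GDP.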
We will inspect the data further to gather the information needed for modeling. This is an exploratory step that is required before using the Granger test.
# use autocorrelation function & plot
ind_eva |>
ACF(LEI) |>
autoplot()+
labs(title = "ACF of LEI")
# use partial auto-correlation function & plot
ind_eva |>
PACF(LEI) |>
autoplot()+
labs(title = "PACF of LEI ")
# perform KPSS test
ind_eva |>
features(LEI, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 2.46 0.01
# use autocorrelation function on differenced data & plot
ind_eva |>
  ACF(difference(LEI)) |>
  autoplot() +
  labs(title = "ACF of differenced LEI")
# use partial autocorrelation function on differenced data & plot
ind_eva |>
  PACF(difference(LEI)) |>
  autoplot() +
  labs(title = "PACF of differenced LEI")
ind_eva |>
features(difference(LEI), unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.127 0.1
# store the first-differenced LEI
ind_eva <- ind_eva |>
  mutate(LEI_diff = difference(LEI))
# plot first-differenced LEI
cmpt_plot(ind_eva, "LEI_diff") +
  recession_bars(31,34) +
  labs(title = "First differenced LEI")
# use autocorrelation function and plot
ind_eva |>
ACF(GDPC1) |>
autoplot()+
labs(title = "ACF of Real GDP")
# use partial autocorrelation function and plot
ind_eva |>
PACF(GDPC1) |>
autoplot()+
labs(title = "PACF of Real GDP")
# perform KPSS test
ind_eva |>
features(GDPC1, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 2.64 0.01
# use autocorrelation function on differenced data & plot
ind_eva |>
  ACF(difference(GDPC1)) |>
  autoplot() +
  labs(title = "ACF of differenced Real GDP")
# use partial autocorrelation function on differenced data & plot
ind_eva |>
  PACF(difference(GDPC1)) |>
  autoplot() +
  labs(title = "PACF of differenced Real GDP")
# KPSS test on differenced data
ind_eva |>
features(difference(GDPC1), unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0784 0.1
ind_eva <- ind_eva |>
mutate(GDPC1_diff = difference(GDPC1))
# plot differenced GDP
cmpt_plot(ind_eva, "GDPC1_diff") +
  recession_bars(31,34) +
  labs(title = "First differenced Real GDP")
# grangertest() requires plain ts objects; the series are quarterly, so frequency = 4
LEI_ts <- ts(ind_eva$LEI_diff,
             start = 1993,
             frequency = 4)
GDPC1_ts <- ts(ind_eva$GDPC1_diff,
               start = 1993,
               frequency = 4)
# fit test
cause_test <- lmtest::grangertest(LEI_ts, GDPC1_ts, order=3, test="F")
print(cause_test)
## Granger causality test
##
## Model 1: GDPC1_ts ~ Lags(GDPC1_ts, 1:3) + Lags(LEI_ts, 1:3)
## Model 2: GDPC1_ts ~ Lags(GDPC1_ts, 1:3)
## Res.Df Df F Pr(>F)
## 1 342
## 2 345 -3 4.1956 0.006192 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Granger causality test indicates that the LEI Granger-causes GDP. This finding suggests that we can use the LEI to understand where the economy is headed in the business cycle. An important caveat is that real GDP (and the INDPRO series used in the trend adjustment) is only a proxy for the overall state of the economy, so this is an approximation. With that said, this Leading Index of Economic Indicators does a reasonably good job of forecasting movements in the economy.
1.) https://www.conference-board.org/topics/consumer-confidence
3.) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8814802/
4.) https://link.springer.com/referenceworkentry/10.1007/978-94-007-0753-5_542
5.) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7882057/
6.) https://www.econstor.eu/bitstream/10419/217935/1/sajbm-v17i3-1048.pdf
9.) https://www.jstor.org/stable/1209163
10.) https://acta.uni-obuda.hu/Bilan_Gavurova_Stanislaw_Tkacova_78.pdf
11.) https://www.joim.com/wp-content/uploads/emember/downloads/p0671.pdf
13.) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2730429
Data sources
14.) https://www.philadelphiafed.org/surveys-and-data/regional-economic-analysis/state-coincident - Not used: state coincident indexes
15.) https://www.conference-board.org/data/bci/index.cfm?id=2160 - Description of components
16.) https://www.federalreserve.gov/data.htm - Useful data hub
18.) https://fred.stlouisfed.org/categories - Also a useful data hub
19.) Using the same file from class for the historic calculation of coincident index with new variable added.
20.) https://www.frbsf.org/research-and-insights/data-and-indicators/daily-news-sentiment-index/ - This is daily news sentiment!