ECON-6635-01 - Business Forecasting Final Fall 2024

Pompea College of Business - University of New Haven

Aoibhín Laverty | Student ID: 00900025 | MS Business Analytics

Deliverable:

Prepare a comprehensive written analysis that includes insights, tables, graphs (using ggplot), and supporting code.

Diffusion Confusion:

  1. Select three economic variables for the United States.
  2. Create a Diffusion Index and plot it using ggplot with a built-in smoother or one that you create yourself.
  3. Compare your Diffusion Index with the Chicago Fed National Activity Index (CNAIDIFF). This comparison should include:
    • Creating a ggplot to display the two series side by side.
    • Calculating the correlation coefficient between the two indices.
  4. To what extent does your Diffusion Index align with the Chicago Fed National Activity Index? What general insights or trends can you identify?

Introduction

Understanding the underlying dynamics of economic variables is crucial in explaining economic growth, inflation, and other economic factors/trends. This report looks at three economic variables for the United States through the development and analysis of a diffusion index and compares and contrasts the US index against a diffusion index for Chicago.

Report Objective

The primary objectives of this study are to:

  • Construct a diffusion index using selected economic variables.
  • Construct a diffusion index using the Chicago data.
  • Analyze the economic trends and movements from the diffusion indices.
  • Compare US diffusion index with the Chicago Fed National Activity Index.

This report analyses economic activity trends using diffusion indices for both the United States and Chicago. The diffusion indices will measure the proportion of variables showing positive and negative changes over a period of time (2010-01 to 2024-10) which will provide us with insights into the economic trends. The data used in this analysis is sourced from FRED (Federal Reserve Economic Data):

  • USA:
    • Consumer Spending (PCE)
    • Employment (PAYEMS)
    • Industrial Production (INDPRO)

These variables were chosen because they are among the top 10 U.S. Economic Indicators [1], providing valuable insights into the USA’s overall economic health and trends.

  • Chicago
    • Chicago Fed National Activity Index (CFNAIDIFF)

Analysis Conducted

Data Retrieval:

  • The monthly data for the US and Chicago diffusion Indices were downloaded from FRED using the tickers for the variables mentioned above.

Data Preparation:

  • Selecting a specific window/time frame for the analysis
  • Converting the data to a time series
  • Combining the data into a dataframe
  • Calculating the first differences to capture changes
  • Checking for N/A values
  • Converting the values to 1 or -1 based on if they are positive (> 0) or negative (<= 0)

Construction of the Diffusion Indices:

  • Calculating the Diffusion Indices
  • Combining the calculated data
  • Creating a data frame of the indexed data

Visualization of the Diffusion Indices:

  • Generating static & interactive plots
  • Creating a moving average smoother
    • Generating a date sequence for time series analysis
    • Combined the two indices into one visual. _________________________________________________________________________________________________________________________

Part 1: Set Up

Clear environment and set options

set.seed(26)
options(digits = 3, scipen = 99999)  # Remove scientific numbering
remove(list = ls())  # Clear current environment
graphics.off()  # Close all open graphics windows

Load required libraries

suppressWarnings({
  suppressPackageStartupMessages({
    library(markovchain)  # For Markov chain analysis
    library(tidyverse)    # For data manipulation and visualization
    library(quantmod)     # For financial data retrieval
    library(tsbox)        # For time series manipulation
    library(vars)         # For Vector Autoregression analysis
    library(TSstudio)     # For time series tools
    library(ggthemes)     # For additional ggplot themes
    library(zoo)          # For handling irregular time series data and advanced time series operations
    library(dygraphs)     # For interactive time series plotting and visualization
  })
})

Part 2: USA Diffusion Index

Retrieve Monthly Time Series Data From FRED (Federal Reserve Economic Data)

getSymbols(c("PCE", "PAYEMS", "INDPRO"), 
           freq = "monthly", 
           src = "FRED", return.class = 'xts',
           index.class  = 'Date',
           from = "2010-01-01",
           to = Sys.Date(),
           periodicity = "monthly")
## [1] "PCE"    "PAYEMS" "INDPRO"
Con_Spend = PCE 
Employ = PAYEMS
Indus_Prod = INDPRO

Summarize The Time Series Data

ts_info(Con_Spend)
##  The Con_Spend series is a xts object with 1 variable and 178 observations
##  Frequency: monthly 
##  Start time: 2010-01-01 
##  End time: 2024-10-01
ts_info(Employ)
##  The Employ series is a xts object with 1 variable and 179 observations
##  Frequency: monthly 
##  Start time: 2010-01-01 
##  End time: 2024-11-01
ts_info(Indus_Prod)
##  The Indus_Prod series is a xts object with 1 variable and 178 observations
##  Frequency: monthly 
##  Start time: 2010-01-01 
##  End time: 2024-10-01
  • All variables are in monthly frequency.
  • PCE(Con_Spend) and INDPRO (Indus_Prod) start and end within the same time frame (2010-01-01 to 2024-10-01).
  • PAYEMS (Employ) has a different end time to the other variables (2024-11-01)

Set All Variables To Be Within The Same Timeframe & Convert To Timeseries

Con_Spend_sw = Con_Spend["2010-01-01/2024-10-01"] |> ts_ts() 
Employ_sw = Employ["2010-01-01/2024-10-01"] |> ts_ts() 
Indus_Prod_sw = Indus_Prod["2010-01-01/2024-10-01"] |> ts_ts() 

Summarize The Time Series Data

ts_info(Con_Spend_sw)
##  The Con_Spend_sw series is a ts object with 1 variable and 178 observations
##  Frequency: 12 
##  Start time: 2010 1 
##  End time: 2024 10
ts_info(Employ_sw)
##  The Employ_sw series is a ts object with 1 variable and 178 observations
##  Frequency: 12 
##  Start time: 2010 1 
##  End time: 2024 10
ts_info(Indus_Prod_sw)
##  The Indus_Prod_sw series is a ts object with 1 variable and 178 observations
##  Frequency: 12 
##  Start time: 2010 1 
##  End time: 2024 10
  • All variables are in monthly (12) frequency.
  • All variables now have the same start and end times (2010 01 to 2024 10).

Combine The Data Into One Dataframe

mydata = cbind.data.frame(Con_Spend_sw, Employ_sw, Indus_Prod_sw)
head(mydata,3)
##   Con_Spend_sw Employ_sw Indus_Prod_sw
## 1        10056    129795          89.2
## 2        10093    129702          89.5
## 3        10156    129865          90.1

Calculate The First Differences

mydf = mydata %>% 
  mutate(
    Con_Spend_d1 = tsibble::difference(Con_Spend_sw, differences = 1),
    Employ_d1 = tsibble::difference(Employ_sw, differences = 1),
    Indus_Prod_d1 = tsibble::difference(Indus_Prod_sw, differences = 1)
  ) %>% 
  dplyr::select(c(Con_Spend_d1, Employ_d1, Indus_Prod_d1)) |> na.omit()

Check For Any N/A Values

colSums(is.na(mydf)) # Check for any remaining N/As
##  Con_Spend_d1     Employ_d1 Indus_Prod_d1 
##             0             0             0
head(mydf,3)
##   Con_Spend_d1 Employ_d1 Indus_Prod_d1
## 2         37.3       -93         0.315
## 3         62.6       163         0.631
## 4         26.3       250         0.325

Convert The Values To 1 Or -1 Based On If They Are Positive (> 0) Or Negative (<= 0)

mydf_df = ifelse(mydf > 0, 1, -1 )
head(mydf_df,10)
##    Con_Spend_d1 Employ_d1 Indus_Prod_d1
## 2             1        -1             1
## 3             1         1             1
## 4             1         1             1
## 5             1         1             1
## 6             1        -1             1
## 7             1        -1             1
## 8             1         1             1
## 9             1        -1             1
## 10            1         1            -1
## 11            1         1             1
table(mydf_df)
## mydf_df
##  -1   1 
##  92 439

Calculate The Diffusion Index

pos = apply(mydf_df, 1,  function(row) sum(row>0) ) # counts the positive
neg = apply(mydf_df, 1,  function(row) sum(row<0) ) # counts the negatives
tot = pos + neg
index_USA = (pos/tot - neg/tot)*100
table(index_USA)
## index_USA
##              -100 -33.3333333333333  33.3333333333333               100 
##                 2                 7                72                96

Combining The Data

# Signs, Positives, Negatives, Total(pos + neg) and index (pos/tot - neg/tot)*100
Combined_data = cbind(mydf_df, pos, neg, tot, index_USA)
head(Combined_data,10)
##    Con_Spend_d1 Employ_d1 Indus_Prod_d1 pos neg tot index_USA
## 2             1        -1             1   2   1   3      33.3
## 3             1         1             1   3   0   3     100.0
## 4             1         1             1   3   0   3     100.0
## 5             1         1             1   3   0   3     100.0
## 6             1        -1             1   2   1   3      33.3
## 7             1        -1             1   2   1   3      33.3
## 8             1         1             1   3   0   3     100.0
## 9             1        -1             1   2   1   3      33.3
## 10            1         1            -1   2   1   3      33.3
## 11            1         1             1   3   0   3     100.0

Create A Dataframe Of The Indexed Data

diff_index_USA = data.frame(x = 1:length(index_USA), y = index_USA)
head(diff_index_USA,10)
##     x     y
## 2   1  33.3
## 3   2 100.0
## 4   3 100.0
## 5   4 100.0
## 6   5  33.3
## 7   6  33.3
## 8   7 100.0
## 9   8  33.3
## 10  9  33.3
## 11 10 100.0

Visualize The US Economic Diffusion Index

ggplot(diff_index_USA, aes(x = x, y = y)) +
  geom_line() +  
  geom_hline(yintercept = 0, color = "darkblue") +  
  labs(
    title = "US Economic Diffusion Index", 
    x = "Time",         
    y = "Diffusion Index"          
  )

Create A 7 Month Moving Average Smoother

ma_index_USA = zoo::rollmean(index_USA, 7, align = "right")

The US Economic Diffusion Index With Built In Moving Average

length(index_USA)
## [1] 177
plot(index_USA[8:177], type = "l")
abline(a = 0, b = 0, col = "darkblue")
lines(ma_index_USA, col = "darkred", lwd = 2.5)

Visualize The US Economic Diffusion Index With Adjustable Moving Average

dygraphs::dygraph(diff_index_USA, main = "US Economic Diffusion Index", 
                  ylab = "Diffusion Index") %>%
  dyOptions(
    fillGraph=FALSE, 
    drawGrid = TRUE, 
    colors=c("darkred", "blue")) %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyRoller(rollPeriod = 7, showRoller = TRUE) %>%
  dyHighlight(
    highlightCircleSize = 5, 
    highlightSeriesBackgroundAlpha = 0.5, 
    hideOnMouseOut = FALSE
    
  )

Create a Date Sequence for Time Series Analysis

Date = seq.Date(from = as.Date("2010-05-01"), length.out = 177, by = "month")

Combine Date and Index Data into a Data Frame

Diff_index_USA = cbind.data.frame(Date, index_USA)

Visualize The US Economic Time Series Diffusion Index With A Static Plot

ggplot(Diff_index_USA, aes(x = Date, y = index_USA)) +
  geom_line()+
  geom_smooth(colour = "darkblue") +
  labs( title = "USA Economy") +
  xlab("Years") +
  ylab("Change")+
  theme(axis.line.x = element_line(size= 0.75, colour = "black"), 
        axis.line.y = element_line(size= 0.75, colour = "black"), 
        legend.position = "bottom", 
        legend.direction = "horizontal", element_blank()) +
  theme_tufte()

Visualize The US Economic Time Series Diffusion Index With An Interactive Plot

dygraphs::dygraph(Diff_index_USA, main = "US Economic Diffusion Index", 
                  ylab = "Change") %>%
  dyOptions(
    fillGraph=FALSE, 
    drawGrid = TRUE, 
    colors=c("darkred", "darkblue")) %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyRoller(rollPeriod = 7, showRoller = TRUE) %>%
  dyHighlight(
    highlightCircleSize = 5, 
    highlightSeriesBackgroundAlpha = 0.5, 
    hideOnMouseOut = FALSE
    
  )

Analysis Of The US Diffusion Index

  • There are more periods of positive economic movement / economic growth (1) with 439 observations than negative economic movement / economic slowdowns (-1) with 92 observations (table(mydf_df))
  • The US economy reached highs of around 90 during periods of strong economic growth, and dropped below 10 during periods of economic downturns.
  • The trends in the diffusion index show regular fluctuations, with a severe economic slowdown visible around early 2020, which could be due to COVID-19’s impact on the economy.

Part 3: Chicago Diffusion Index

Retrieve Monthly Time Series Data From FRED (Federal Reserve Economic Data)

getSymbols(c("CFNAIDIFF" ), 
           freq = "monthly", 
           src = "FRED", return.class = 'xts',
           index.class  = 'Date',
           from = "2010-01-01",
           to = Sys.Date(),
           periodicity = "monthly")
## [1] "CFNAIDIFF"
CDI = CFNAIDIFF

Summarize The Time Series Data

ts_info(CDI)
##  The CDI series is a xts object with 1 variable and 178 observations
##  Frequency: monthly 
##  Start time: 2010-01-01 
##  End time: 2024-10-01
  • CDI is in monthly frequency.
  • Start and end time is 2010-01-01 to 2024-01-01 (same as the window set for the US Variables Above.)

Convert To Timeseries

CDI_sw = ts_ts(CDI)
ts_info(CDI_sw)
##  The CDI_sw series is a ts object with 1 variable and 178 observations
##  Frequency: 12 
##  Start time: 2010 1 
##  End time: 2024 10

Convert To A Dataframe

CDI_data = as.data.frame(CDI_sw) |> na.omit()

head(CDI_data,3)
##       x
## 1  0.05
## 2 -0.09
## 3  0.09

Convert The Values To 1 Or -1 Based On If They Are Positive (> 0) Or Negative (<= 0)

CDI_df = ifelse(CDI_data > 0, 1, -1 )
table(CDI_df)
## CDI_df
## -1  1 
## 92 86
head(CDI_df,10)
##     x
## 1   1
## 2  -1
## 3   1
## 4   1
## 5   1
## 6   1
## 7   1
## 8   1
## 9  -1
## 10 -1

Calculate The Diffusion Matrix

pos_CDI = apply(CDI_df, 1,  function(row) sum(row>0) ) # counts the positive
neg_CDI = apply(CDI_df, 1,  function(row) sum(row<=0) ) # counts the negatives
tot_CDI = pos_CDI + neg_CDI
index_CDI = (pos_CDI/tot_CDI - neg_CDI/tot_CDI)*100 
table(index_CDI)
## index_CDI
## -100  100 
##   92   86

Combine The Data

Combined_data_CDI = cbind(CDI_df, pos_CDI, neg_CDI, tot_CDI, index_CDI)
head(Combined_data_CDI,10)
##     x pos_CDI neg_CDI tot_CDI index_CDI
## 1   1       1       0       1       100
## 2  -1       0       1       1      -100
## 3   1       1       0       1       100
## 4   1       1       0       1       100
## 5   1       1       0       1       100
## 6   1       1       0       1       100
## 7   1       1       0       1       100
## 8   1       1       0       1       100
## 9  -1       0       1       1      -100
## 10 -1       0       1       1      -100

Create A Dataframe Of The Indexed Data

diff_index_CDI = data.frame(x = 1:length(index_CDI), y = index_CDI) 
head(diff_index_CDI,10) 
##     x    y
## 1   1  100
## 2   2 -100
## 3   3  100
## 4   4  100
## 5   5  100
## 6   6  100
## 7   7  100
## 8   8  100
## 9   9 -100
## 10 10 -100

Visualize The Chicago Diffusion Index

ggplot(diff_index_CDI, aes(x = x, y = y)) +
  geom_line() +  
  geom_hline(yintercept = 0, color = "darkblue") +  
  labs(
    title = "Chicago Diffusion Index", 
    x = "Time",         
    y = "Diffusion Index"          
  )

Create A 7 Month Moving Average Smoother

ma_index_CDI = zoo::rollmean(index_CDI, 7, align = "right")

The Chicago Diffusion Index With Built In Moving Average

length(index_CDI)
## [1] 178
plot(index_CDI[8:178], type = "l")
abline(a = 0, b = 0, col = "darkblue")
lines(ma_index_CDI, col = "darkred", lwd = 2.5)

The Chicago Diffusion Index With Adjustable Moving Average

dygraphs::dygraph(diff_index_CDI, main = "Chicago Diffusion Index", 
                  ylab = "Diffusion Index") %>%
  dyOptions(
    fillGraph=FALSE, 
    drawGrid = TRUE, 
    colors=c("darkred", "blue")) %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyRoller(rollPeriod = 7, showRoller = TRUE) %>%
  dyHighlight(
    highlightCircleSize = 5, 
    highlightSeriesBackgroundAlpha = 0.5, 
    hideOnMouseOut = FALSE
    
  )

Create a Date Sequence for Time Series Analysis

length(index_CDI)
## [1] 178
Date_CDI = seq.Date(from = as.Date("2010-05-1"), length.out = 178, by = "month")
length(Date_CDI)
## [1] 178

Combine Date and Index Data into a Data Frame

Diff_index_CDI = cbind.data.frame(Date_CDI, index_CDI)

Visualize The Chicago Time Series Diffusion Index With A Static Plot

ggplot(Diff_index_CDI,aes(x = Date_CDI, y = index_CDI)) +
  geom_line()+
  geom_smooth(colour = "darkblue") +
  labs( title = "CDI") +
  xlab("Years") +
  ylab("Change")+
  theme(axis.line.x = element_line(size= 0.75, colour = "black"), 
        axis.line.y = element_line(size= 0.75, colour = "black"), 
        legend.position = "bottom", 
        legend.direction = "horizontal", element_blank()) +
  theme_tufte()

Visualize The Chicago Time Series Diffusion Index With An Interactive Plot

dygraphs::dygraph(Diff_index_CDI, main = "Chicago Economic Diffusion Index", 
                  ylab = "Change") %>%
  dyOptions(
    fillGraph=FALSE, 
    drawGrid = TRUE, 
    colors=c("darkred", "darkblue")) %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyRoller(rollPeriod = 7, showRoller = TRUE) %>%
  dyHighlight(
    highlightCircleSize = 5, 
    highlightSeriesBackgroundAlpha = 0.5, 
    hideOnMouseOut = FALSE
    
  )

Analysis Of The Chicago Diffusion Index

  • There are more period of negative economic movement / economic slowdowns (-1) with 92 observations than positive economic movement / economic growth (1) with 86 observations (table(CDI_df))
  • There are several peaks in Chicago economy at around around 100 in 2012,2018 and 2022, indicating very strong economic conditions during these times. However, after each of these periods the index sharply declines, dropping to around -100 during periods of economic downturns in Chicago.
  • These downturns could be due to many factors such as the increase in taxes in the state, the impact Covid19 had on tourism, and safety concerns [3].

Part 4: Comparision of Difission Indices

Combine The Two Indices (USA & Chicago) Into A Dataframe

Combined_USA_CDI = data.frame(
  Date = Diff_index_USA$Date,
  US_Index = Diff_index_USA$index,
  CDI_Index = Diff_index_CDI$index_CDI[1:length(Diff_index_USA$Date)]
)

head(Combined_USA_CDI,10)
##          Date US_Index CDI_Index
## 1  2010-05-01     33.3       100
## 2  2010-06-01    100.0      -100
## 3  2010-07-01    100.0       100
## 4  2010-08-01    100.0       100
## 5  2010-09-01     33.3       100
## 6  2010-10-01     33.3       100
## 7  2010-11-01    100.0       100
## 8  2010-12-01     33.3       100
## 9  2011-01-01     33.3      -100
## 10 2011-02-01    100.0      -100

Correlation Between The Indices

cor_value <- cor(Combined_USA_CDI$US_Index, Combined_USA_CDI$CDI_Index)
cor_value
## [1] 0.102

The correlation between the US Economic Index and the Chicago Diffusion Index is 0.01, which indicates a very weak positive relationship. This suggests that changes in one index are not significantly associated with changes in the other, implying that the indices do not move together in a meaningful way.

Visualize The USA Economic & Chicago Time Series Diffusion Index With A Static Plot

ggplot(Combined_USA_CDI, aes(x = Date)) +
  geom_line(aes(y = US_Index, color = "US Index")) +
  geom_line(aes(y = CDI_Index, color = "Chicago Index")) +
  geom_smooth(aes(y = US_Index, linetype = "US Index"), color = "darkblue", linetype = "dashed") +
  geom_smooth(aes(y = CDI_Index, linetype = "Chicago Index"), color = "darkred", linetype = "dashed") +
  labs(title = "USA Economic & Chicago Time Series Diffusion Index") +
  xlab("Years") +
  ylab("Change") +
  scale_color_manual(values = c("US Index" = "blue", "Chicago Index" = "red")) +
  scale_linetype_manual(values = c("US Index" = "dashed", "Chicago Index" = "dashed")) +
  theme(axis.line.x = element_line(size = 0.75, colour = "black"), 
        axis.line.y = element_line(size = 0.75, colour = "black"), 
        legend.position = "bottom", 
        legend.direction = "horizontal") +
  theme_tufte()

Visualize The USA Economic & Chicago Time Series Diffusion Index With An Interactive Plot

dygraphs::dygraph(Combined_USA_CDI, main = "USA & Chicago Economic Diffusion Index", 
                  ylab = "Change") %>%
  dyOptions(
    fillGraph=FALSE, 
    drawGrid = TRUE, 
    colors=c("darkred", "darkdarkblue")) %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyRoller(rollPeriod = 7, showRoller = TRUE) %>%
  dyHighlight(
    highlightCircleSize = 5, 
    highlightSeriesBackgroundAlpha = 0.5, 
    hideOnMouseOut = FALSE
    
  )

Analysis Of The Combined US & Chicago Diffusion Indices

  • The correlation coefficient is positive but very weak at 0.102, suggesting almost no linear relationship between the two indices. This implies that the economic movements overall in the US and those in Chicago are largely independent.
  • Although US overall experience many of the same fluctuations as Chicago during the same time periods it did not experience it to the same level of severity that Chicago did. The Chicago index exhibits even more pronounced volatility compared to the US index.

Key Takeaways

  • The weak correlation highlights the importance of looking at both local and national economic indicators
  • The volatility and sensitivity of the Chicago index relative to the US index suggests the local economy may be more vulnerable to economic shifts and disruptions.
  • The severe drop in the combined index in 2020 depicts the dramatic impact Covid19 had on both national and regional economies.

Conclusion

This report constructed and compared the diffusion indices for 3 key economic variables in the United States and the Chicago Index. By examining these indices, we were able to observe periods of both positive and negative economic movements during the period of January 2010 to October 2024. Both diffusion indices exhibited significant fluctuations, including a marked downturn caused by the COVID-19 pandemic. However the Chicago Index exhibited more volatility than the U.S Index demonstrating that in order to fully understand the health of the economy you need to look at both regional and national trends as they can be impacted by other varying factors.


References

[1] Thangavelu, P. (2024, October). Top 10 U.S. economic indicators. Investopedia. https://www.investopedia.com/articles/personal-finance/020215/top-ten-us-economic-indicators.asp

[2] Federal Reserve Bank of St. Louis. (1991). Welcome to fred, Federal Reserve Economic Data. your trusted source for economic data since 1991. FRED. https://fred.stlouisfed.org/

[3] Fr, A. (2024, October 21). The economic impact: Chicago struggles with Tourism and Investment. www.championvet.ca. https://www.championvet.ca/aiseo/the-economic-impact-chicago-struggles-with-tourism-and