MEGHA PATIL
BUSINESS ANALYTICS
UNIVERSITY OF NEW HAVEN
POMPEA SCHOOL OF BUSINESS
For the Diffusion Index Analysis in this report I have picked up the
3 economic variables from FRED i.e Housing Starts , Industrial
Production & Unemployment Rate and then created a Diffusion Index
between these 3 with a built-in smoother.Later have compared the
Diffusion Index with the Chicago Fed National Activity Index.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.3
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.3.3
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(readr)
## Warning: package 'readr' was built under R version 4.3.3
Loading Data Sets
houst <- read.csv("C:/Users/Lenovo/Downloads/HOUST.csv")
indpro <- read.csv("C:/Users/Lenovo/Downloads/INDPRO.csv")
unrate <- read.csv("C:/Users/Lenovo/Downloads/UNRATE.csv")
cfnaidiff <- read.csv("C:/Users/Lenovo/Downloads/CFNAIDIFF.csv")
Converting Data Columns to Data Type
houst$DATE <- as.Date(houst$DATE)
indpro$DATE <- as.Date(indpro$DATE)
unrate$DATE <- as.Date(unrate$DATE)
cfnaidiff$DATE <- as.Date(cfnaidiff$DATE)
Filtering of data from 2017-2024
houst <- houst %>% filter(DATE >= as.Date("2017-01-01") & DATE <= as.Date("2024-12-31"))
indpro <- indpro %>% filter(DATE >= as.Date("2017-01-01") & DATE <= as.Date("2024-12-31"))
unrate <- unrate %>% filter(DATE >= as.Date("2017-01-01") & DATE <= as.Date("2024-12-31"))
cfnaidiff <- cfnaidiff %>% filter(DATE >= as.Date("2017-01-01") & DATE <= as.Date("2024-12-31"))
Calculating montly changes for Diffusion Index
houst_diff <- houst %>% mutate(Change = HOUST - lag(HOUST))
indpro_diff <- indpro %>% mutate(Change = INDPRO - lag(INDPRO))
unrate_diff <- unrate %>% mutate(Change = UNRATE - lag(UNRATE))
Calculating Diffusion Index
diffusion_index <- data.frame(
DATE = houst$DATE,
Diffusion = rowMeans(cbind(
houst_diff$Change > 0,
indpro_diff$Change > 0,
unrate_diff$Change < 0
), na.rm = TRUE) - 0.5
)
Plotting of Diffusion Index
ggplot(diffusion_index, aes(x = DATE, y = Diffusion)) +
geom_line(color = "blue") +
geom_smooth(method = "loess", se = FALSE) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Diffusion Index (2017-2024)",
y = "Diffusion Index", x = "Year") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Here in the above graph we have used the data from the year
2017-2024 . The Diffusion Index reflets significant variability and
cyclical behaviour over the period from 2017-2024. The blue trendline
indicates an overall downward path from 2017-2019 , followed by a peak
around 2020 ,and then a steady decline from 2022 onward. These trends
imply that the index is sensitive to larger economic cycles which are
probably influenced by macroeconomic variables like expansion,
contraction, or changes in policy.
diffusion_index$DATE <- as.Date(diffusion_index$DATE)
cfnaidiff$DATE <- as.Date(cfnaidiff$DATE)
start_date <- max(min(diffusion_index$DATE), min(cfnaidiff$DATE))
end_date <- min(max(diffusion_index$DATE), max(cfnaidiff$DATE))
diffusion_index_filtered <- diffusion_index %>%
filter(DATE >= start_date & DATE <= end_date)
cfnaidiff_filtered <- cfnaidiff %>%
filter(DATE >= start_date & DATE <= end_date)
Calculating the correlation coefficient
correlation <- cor(diffusion_index_filtered$Diffusion, cfnaidiff_filtered$CFNAIDIFF, use = "complete.obs")
print(paste("Correlation Coefficient:", round(correlation, 3)))
## [1] "Correlation Coefficient: 0.373"
A Correlation coefficent of 0.37 indicates a weak to moderate
positive linear relationship between the variables.
As the Diffusion Index increases , the CFNAIDIFF also tends to
increase , but not very perfectly.
ggplot() +
geom_line(data = diffusion_index_filtered, aes(x = DATE, y = Diffusion, color = "Diffusion Index")) +
geom_line(data = cfnaidiff_filtered, aes(x = DATE, y = CFNAIDIFF, color = "CFNAIDIFF")) +
labs(title = "Comparison of Diffusion Index and CFNAIDIFF",
x = "Year", y = "Index Value", color = "Series") +
geom_hline(yintercept = 0, linetype = "dashed") +
theme_classic()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

From the above graph it indicates that we have got 2 lines in which
the blue line indicates Diffusion Index and red line indicates CFNAIDIFF
Index .The blue line measure how many parts of the economy are
imporving. it moves up & down a lot , meaning that at times , many
areas of the economy are doing well.The red line measures the overall
strength of the economy.
The Diffusion Index and CFNAIDIFF show similar cyclical trends
around the zero line (dotted line) representing periods of economic
growth and decline. While the more volatile Diffusion Index responds to
short-term fluctuations , the smoother looking CFNAIDIFF seems to
capturee wider trends . Despite having different sensitives , their
moderate correlation (0.37) suggests that they share influences.
The graph shows that both the Diffusion Index and CFNAIDIFF are
close to zero for the current year. This means the economy is neither
growing strongly nor shrinking a lot it’s kind of stable or neutral
right now. Things don’t look too bad, but there’s no strong improvement
either.