We use 13 indices reflecting the performance of the US economy in this study. They are listed in the output below, which shows the name of each index and what it represents.
Note that the frequencies of the series differ from one another. To resolve this, all indices used in this study were converted to a quarterly frequency. In addition, because the series start at different dates, I used the subset running from 1960 Q1 to 2019 Q4, the longest window common to all of them.
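A minimal sketch of this alignment step, assuming each raw series is a zoo object; the names x.monthly, x.quarterly, and the merged df are illustrative, not the author's:
library(zoo)
# illustrative monthly series (random data) standing in for one raw index
x.monthly = zoo(rnorm(720), as.yearmon("Jan 1960") + seq(0, 719) / 12)
# collapse the monthly series to quarterly means
x.quarterly = aggregate(x.monthly, by = as.yearqtr, FUN = mean)
# after merging all 13 quarterly series into one zoo object df,
# keep only the common 1960 Q1 - 2019 Q4 window:
# df = window(df, start = as.yearqtr("1960 Q1"), end = as.yearqtr("2019 Q4"))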
All index series are standardized before any statistical analysis: each series is centered (so its mean equals 0) and multiplied by a constant so that its variance equals 1. For example, Private Housing Units Permits Total SAAR (thousands) takes values on a much larger scale than the other series. By standardizing the data, we can ignore the magnitude of each series and focus instead on the relationships between the series. Using these internal relationships, we can group economic time periods into different regimes and learn how to design strategies for each economic regime.
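The standardization itself can be done with base R's scale(); a minimal sketch, assuming df is the merged quarterly zoo object and using df.std as a hypothetical name for the result:
# hedged sketch: df.std is a hypothetical name, not one used elsewhere in this analysis
df.std = scale(coredata(df))   # center each column to mean 0 and rescale it to unit variance
round(colMeans(df.std), 10)    # numerically zero means
apply(df.std, 2, sd)           # unit standard deviations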
##       indices            des
##  [1,] "EHGDUS Index"     "US Real GDP (QoQ, %, SAAR)"
##  [2,] "CPI YOY Index"    "US CPI (inflation) Urban Consumer YoY NSA"
##  [3,] "CPI CHNG Index"   "US CPI (inflation) Urban Consumer MoM SA"
##  [4,] "EHUPUS Index"     "US Unemployment Rate (%)"
##  [5,] "IP CHNG Index"    "US Industrial Production MoM SA"
##  [6,] "NHSPATOT Index"   "Private Housing Units Permits Total SAAR (thousands)"
##  [7,] "NFP TCH Index"    "US Employment on Nonfarm Payrolls Total (SA, Net Monthly Change, thousands)"
##  [8,] "TMNOCHNG Index"   "US Manufacturing New Orders Total MoM SA"
##  [9,] "LEI TOTL Index"   "Conference Board US Leading Economic Indicator"
## [10,] "PITL YOY Index"   "US Personal Income YoY SA"
## [11,] "CICRTOT Index"    "Federal Reserve Consumer Credit Total Net Change SA"
## [12,] "USCABAL Index"    "US Nominal Account Balance (Billions USD)"
## [13,] "M2% YOY Index"    "Federal Reserve Money Supply M2 YoY % Change"
Looking at the correlation between each pair of standardized indices, we can see that some of the indices are quite closely related to each other. Since we already have 13 indices, we can apply a dimension reduction technique from statistics such as PCA.
## The correlation between EHGDUS Index and IP CHNG Index is 0.44 .
## The correlation between EHGDUS Index and NFP TCH Index is 0.516 .
## The correlation between CPI YOY Index and CPI CHNG Index is 0.719 .
## The correlation between CPI YOY Index and PITL YOY Index is 0.709 .
## The correlation between LEI TOTL Index and CICRTOT Index is 0.662 .
## The correlation between CPI CHNG Index and PITL YOY Index is 0.516 .
## The correlation between PITL YOY Index and USCABAL Index is 0.495 .
## The correlation between IP CHNG Index and NFP TCH Index is 0.681 .
## The correlation between IP CHNG Index and TMNOCHNG Index is 0.501 .
## The correlation between NFP TCH Index and TMNOCHNG Index is 0.428 .
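A minimal sketch of how these pairwise correlations could be produced, assuming the standardized series are in the hypothetical matrix df.std from the sketch above; the 0.4 cutoff is an assumption chosen to match the pairs printed here:
cors = cor(df.std)
# report every distinct pair whose absolute correlation exceeds 0.4
pairs.hi = which(abs(cors) > 0.4 & upper.tri(cors), arr.ind = TRUE)
for (i in seq_len(nrow(pairs.hi))) {
  cat("The correlation between", rownames(cors)[pairs.hi[i, 1]], "and",
      colnames(cors)[pairs.hi[i, 2]], "is",
      round(cors[pairs.hi[i, 1], pairs.hi[i, 2]], 3), ".\n")
}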
library(zoo)   # for as.yearqtr(), as.yearmon(), index(), window()

df.label = as.data.frame(df)
df.label$label = NA_character_   # regime label ("recession"/"other"), filled in below

# Helper: the quarterly index values of df between start and end, as character strings
recr = function(start, end){
  r = index(window(df, start = as.yearqtr(as.yearmon(start)), end = as.yearqtr(as.yearmon(end))))
  r = as.character(r)
  return(r)
}
#Recession of 1960–61: Another primarily monetary recession occurred after the Federal Reserve began raising interest rates in 1959. The government switched from deficit (of 2.6% in 1959) to surplus (of 0.1% in 1960). When the economy emerged from this short recession, it began the second-longest period of growth in NBER history. The Dow Jones Industrial Average (Dow) finally reached its lowest point on February 20, 1961, about 4 weeks after President Kennedy was inaugurated.
r = recr(start = "Apr 1960", end = "Feb 1961")
df.label[r, "label"] = "recession"
#The relatively mild 1969 recession followed a lengthy expansion. At the end of the expansion, inflation was rising, possibly a result of increased deficits. This relatively mild recession coincided with an attempt to start closing the budget deficits of the Vietnam War (fiscal tightening) and the Federal Reserve raising interest rates (monetary tightening).
r = recr(start = "Dec 1969", end = "Nov 1970")
df.label[r, "label"] = "recession"
#The 1973 oil crisis, a quadrupling of oil prices by OPEC, coupled with the 1973–1974 stock market crash led to a stagflation recession in the United States.
r = recr(start = "Nov 1973", end = "Mar 1975")
df.label[r, "label"] = "recession"
#The NBER considers a very short recession to have occurred in 1980, followed by a short period of growth and then a deep recession. Unemployment remained relatively elevated in between recessions. The recession began as the Federal Reserve, under Paul Volcker, raised interest rates dramatically to fight the inflation of the 1970s. The early 1980s are sometimes referred to as a "double-dip" or "W-shaped" recession.
r = recr(start = "Jan 1980", end = "Jul 1980")
df.label[r, "label"] = "recession"
#The Iranian Revolution sharply increased the price of oil around the world in 1979, causing the 1979 energy crisis. This was caused by the new regime in power in Iran, which exported oil at inconsistent intervals and at a lower volume, forcing prices up. Tight monetary policy in the United States to control inflation led to another recession. The changes were made largely because of inflation carried over from the previous decade because of the 1973 oil crisis and the 1979 energy crisis.
r = recr(start = "Jul 1981", end = "Nov 1982")
df.label[r, "label"] = "recession"
# After the lengthy peacetime expansion of the 1980s, inflation began to increase and the Federal Reserve responded by raising interest rates from 1986 to 1989. This weakened but did not stop growth, but some combination of the subsequent 1990 oil price shock, the debt accumulation of the 1980s, and growing consumer pessimism combined with the weakened economy to produce a brief recession.
r = recr(start = "Jul 1990", end = "Mar 1991")
df.label[r, "label"] = "recession"
#The 1990s once were the longest period of growth in American history. The collapse of the speculative dot-com bubble, a fall in business outlays and investments, and the September 11th attacks, brought the decade of growth to an end. Despite these major shocks, the recession was brief and shallow.
r = recr(start = "Mar 2001", end = "Nov 2001")
df.label[r, "label"] = "recession"
#The subprime mortgage crisis led to the collapse of the United States housing bubble. Falling housing-related assets contributed to a global financial crisis, even as oil and food prices soared. The crisis led to the failure or collapse of many of the United States' largest financial institutions: Bear Stearns, Fannie Mae, Freddie Mac, Lehman Brothers, and AIG, as well as a crisis in the automobile industry. The government responded with an unprecedented $700 billion bank bailout and $787 billion fiscal stimulus package. The National Bureau of Economic Research declared the end of this recession over a year after the end date. The Dow Jones Industrial Average (Dow) finally reached its lowest point on March 9, 2009.
r = recr(start = "Dec 2007", end = "Jun 2009")
df.label[r, "label"] = "recession"
#The first documented case of COVID-19 emerged in Wuhan, China in November 2019. The government in China first instituted travel restrictions, quarantines, and stay-at-home orders. When efforts to contain the virus in China were unsuccessful, other countries instituted similar measures in an attempt to contain and slow its spread, which prompted many cities to close. The initial outbreak expanded into a global pandemic. The economic effects of the pandemic were severe. More than 24 million people lost jobs in the United States in just three weeks. The official economic impact of the virus is still being determined, but the stock market responded negatively to the shock to supply chains, primarily in technology industries.
r = index(window(df, start = as.yearqtr(as.yearmon("Feb 2020"))))
r = as.character(r)
df.label[r, "label"] = "recession"
# every quarter not labeled above is tagged "other"
df.label$label[is.na(df.label$label)] <- "other"
df.label$label = as.factor(df.label$label)
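As a quick, purely illustrative sanity check, the label distribution can be inspected:
table(df.label$label)   # number of quarters in each regime label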
Monthly data is too repetitious for PCA, since we aim to look into different historical economic regimes rather than any specific month.
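The yearly objects used below (df.yearly2 and df.yearly2.2020) are assumed to be built by collapsing the data to calendar-year averages; a hedged sketch of that step, not necessarily the author's exact construction:
# average the four quarters of each calendar year (df.yearly2 / df.yearly2.2020
# below are assumed to be built roughly this way, with and without 2020)
df.yearly = aggregate(df, by = function(q) floor(as.numeric(q)), FUN = mean)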
The plot below shows how much each variable contributes to the variability captured by the top 2 principal components: the higher the contribution (%) of an economic index in this graph, the more important it is to include that index in our analysis. (A sketch of the underlying PCA call follows the observations below.)
Some highly correlated indices turn out to be trivial variables that contribute little additional variance to the first two principal components.
Individuals with a similar profile are grouped together.
Positively correlated variables point to the same side of the plot; negatively correlated variables point to opposite sides of the graph.
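A minimal sketch of the PCA behind these plots, assuming it is run on the yearly data (df.yearly2, also used in the clustering below); fviz_contrib(), fviz_pca_var(), and fviz_pca_ind() from factoextra are used here for illustration:
library(factoextra)
pca = prcomp(as.data.frame(df.yearly2), scale. = TRUE)
# contribution (%) of each index to the first two principal components
fviz_contrib(pca, choice = "var", axes = 1:2)
# variable map: positively correlated variables point the same way,
# negatively correlated variables point in opposite directions
fviz_pca_var(pca, repel = TRUE)
# individuals (years) with similar profiles plot close together
fviz_pca_ind(pca, repel = TRUE)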
How about removing those redundant variables?
2020 is extremely unusual, lying far away from any previous year's index performance. How about removing the data of 2020?
Now our data become more evenly dispersed.
K-means clustering (MacQueen 1967) is one of the most commonly used unsupervised machine learning algorithms for partitioning a given data set into a set of k groups (i.e. k clusters), where k is the number of groups pre-specified by the analyst. It classifies objects into multiple groups (i.e., clusters) such that objects within the same cluster are as similar as possible (high intra-class similarity), whereas objects from different clusters are as dissimilar as possible (low inter-class similarity).
### Computing k-means clustering
library(factoextra)   # fviz_cluster()
set.seed(3456)

# k-means on the yearly data including 2020, with 5 clusters
df2020 = as.data.frame(df.yearly2.2020)
k2 <- kmeans(df2020, centers = 5, nstart = 20, iter.max = 100)
fviz_cluster(k2, data = df2020, palette = "jco", ggtheme = theme_classic())

# k-means on the yearly data through 2019, with 4 clusters
df2019 = as.data.frame(df.yearly2)
k2 <- kmeans(df2019, centers = 4, nstart = 20, iter.max = 100)
fviz_cluster(k2, data = df2019, palette = "jco", ggtheme = theme_classic())
Now we repeat the analysis using quarterly data.
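A hedged sketch of that repeat, assuming the standardized quarterly series (with the label column dropped) are taken from df.label; the names df.qtr and k.qtr are illustrative, not the author's:
# quarterly k-means, mirroring the yearly code above (names are illustrative)
df.qtr = df.label[, setdiff(names(df.label), "label")]
k.qtr = kmeans(df.qtr, centers = 4, nstart = 20, iter.max = 100)
fviz_cluster(k.qtr, data = df.qtr, palette = "jco", ggtheme = theme_classic())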