It is common to speak about bull and bear markets or periods of economic growth and recession. A mixture model breaks a distribution into two or more component distributions to give a deeper understanding of the underlying regimes. If it is assumed that there are two distributions with particular parameters, it is possible to calculate the probability that each element (a daily return or a quarterly increase in GDP) belongs to either distribution. It is then possible to calculate how well the data fit the model, and to adjust the parameters of the model for the best fit.
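The calculation described above can be sketched in a few lines of base R. This is only an illustration: it uses the built-in Old Faithful waiting times (the data set used later in this section) in place of returns or GDP, and the values of `lambda`, `mu` and `sigma` are guesses chosen by eye, not estimates.

```r
x <- faithful$waiting

# Assumed (not estimated) parameters for two regimes
lambda <- c(0.5, 0.5)  # share of each regime
mu     <- c(55, 80)    # mean of each regime
sigma  <- c(5, 5)      # standard deviation of each regime

# Weighted density of every observation under each regime (n x 2 matrix)
dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
              lambda[2] * dnorm(x, mu[2], sigma[2]))

# Probability that each observation belongs to each regime
post <- dens / rowSums(dens)

# Log-likelihood: how well the data fit the model with these parameters
loglik <- sum(log(rowSums(dens)))

head(round(post, 3))
loglik
```

Changing the assumed parameters and re-running shows how the log-likelihood rises or falls, which is the quantity the fitting procedure tries to maximise.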
The key things to determine are the number of regimes and (possibly)
the starting values for the parameters. The normal distribution is the
most common choice. In that case the mean and standard deviation of each
component are the parameters to be adjusted. The mixtools package can be
used to estimate a mixture model.
This package uses maximum likelihood to estimate the parameters of the mixture model, that is, the parameter values under which the observed data are most likely. For some problems (such as ordinary least squares) there is an analytical solution that can be used to find the most likely coefficients; for many other problems the maximum likelihood can only be found by numerical methods. That means iterating through candidate parameter values until the likelihood stops improving.
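As a rough illustration of finding a maximum likelihood numerically, the sketch below hands the log-likelihood of a two-component normal mixture to base R's general-purpose optimiser `optim`. This is not the method mixtools uses (its `normalmixEM` function uses the EM algorithm); the function name `negloglik`, the parameter transforms and the starting values are all assumptions made for this example.

```r
x <- faithful$waiting

# Negative log-likelihood of a two-component normal mixture.
# par = (logit of share 1, mean 1, mean 2, log sd 1, log sd 2);
# the transforms keep the share inside (0, 1) and the sds positive.
negloglik <- function(par, x) {
  lambda <- plogis(par[1])
  dens <- lambda * dnorm(x, par[2], exp(par[4])) +
    (1 - lambda) * dnorm(x, par[3], exp(par[5]))
  -sum(log(dens))
}

# Starting values chosen by eye (an assumption, not part of the original)
start <- c(qlogis(0.5), 55, 80, log(5), log(5))

# optim minimises, so minimising the negative log-likelihood
# maximises the likelihood
fit <- optim(start, negloglik, x = x, method = "BFGS",
             control = list(maxit = 500))

# Transform back to the natural scale
est <- c(lambda = plogis(fit$par[1]), mu1 = fit$par[2], mu2 = fit$par[3],
         sigma1 = exp(fit$par[4]), sigma2 = exp(fit$par[5]))
est
```

The optimiser repeatedly proposes new parameter values and keeps those that improve the likelihood, which is the "iterating through possibilities" idea in practice.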
There is more mathematical detail about maximum likelihood here:
This example uses a set of data that measures eruptions of the Old Faithful geyser in Yellowstone National Park.
library(mixtools)
data(faithful)
hist(faithful$waiting, main = "Time between Old Faithful eruptions", xlab = "minutes", col = 'lightblue')
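Now use the normalmixEM function to estimate the
parameters of the mixture. The key arguments are lambda,
the starting value for the share or proportion of each component,
mu, the starting values for the means, and sigma, the
starting value for the standard deviation. Because sigma is supplied
as a single value here, the two components share one standard deviation,
which is why the two sigma estimates in the output are identical.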
wait <- normalmixEM(faithful$waiting, lambda = 0.5, mu = c(55, 80), sigma = 5)
## number of iterations= 9
wait[c('lambda', 'mu', 'sigma')]
## $lambda
## [1] 0.3608498 0.6391502
##
## $mu
## [1] 54.61364 80.09031
##
## $sigma
## [1] 5.869089 5.869089
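To make the iterations reported above less of a black box, here is a minimal hand-written EM loop in base R for the same kind of model: two normal components with one shared standard deviation. It is a sketch and not the mixtools implementation; the stopping rule, the iteration cap and the variable names are assumptions made for this example.

```r
x <- faithful$waiting
n <- length(x)

# Starting values, mirroring the normalmixEM call above
lambda <- c(0.5, 0.5)
mu     <- c(55, 80)
sigma  <- 5  # one standard deviation shared by both components

loglik_old <- -Inf
for (iter in 1:1000) {
  # E-step: weighted densities and membership probabilities
  dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma),
                lambda[2] * dnorm(x, mu[2], sigma))
  loglik <- sum(log(rowSums(dens)))
  post <- dens / rowSums(dens)

  # Stop once the log-likelihood no longer improves (assumed tolerance)
  if (loglik - loglik_old < 1e-8) break
  loglik_old <- loglik

  # M-step: update shares, means and the pooled standard deviation
  lambda <- colMeans(post)
  mu     <- colSums(post * x) / colSums(post)
  sigma  <- sqrt(sum(post * outer(x, mu, "-")^2) / n)
}

list(iterations = iter, lambda = lambda, mu = mu, sigma = sigma)
```

Each pass first asks how likely each observation is to belong to each component given the current parameters, then re-estimates the parameters using those probabilities as weights. The result can be compared with the normalmixEM output above.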
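We can look at the estimated parameters of lambda, mu and sigma and
plot the fitted mixture density against the actual data. To do this we
evaluate each fitted normal density with dnorm, weight it by its
estimated lambda and add the two together.
hist(faithful$waiting, probability = TRUE, main = "Time between Old Faithful eruptions", xlab = "Time")
x <- seq(min(faithful$waiting), max(faithful$waiting), length.out = 200)
# Fitted mixture density: lambda-weighted sum of the two normal components
fitted_density <- wait$lambda[1] * dnorm(x, wait$mu[1], wait$sigma[1]) +
wait$lambda[2] * dnorm(x, wait$mu[2], wait$sigma[2])
lines(x, fitted_density, col = 'blue', lwd = 2)
legend('topleft', inset = 0.008, legend = 'Fitted mixture density',
col = 'blue', lwd = 2, cex = 0.8)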
Try to repeat this exercise for the returns on the S&P 500 or for the annual or monthly increase in GDP. You can get the data from Yahoo Finance or from the St. Louis Fed database. You can use a normal distribution and try to extract the mean and standard deviation of each regime in a two-regime model.