We frequently look at assets or economic series that have different regimes or members. If we model the asset or series as a single entity, we are missing some of the important underlying information. For example, we might want to model GDP growth or the performance of the stock market as having two regimes: boom and recession. To understand this we could use a mixture model.
A mixture model combines one distribution for each regime. In our case, there is one distribution for the boom and one distribution for the recession. If we assume that each of these can be approximated by a normal distribution, we would describe the two regimes with two means and two standard deviations, plus a mixing proportion that gives the share of observations belonging to each regime.
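Written out, the two-regime density under the normal assumption can be sketched as follows. The notation is an assumption made here to match the mixtools argument names used later: λ is the share of observations in the first regime, and μ and σ are the mean and standard deviation of each regime.

```latex
f(x) = \lambda \, \phi(x; \mu_1, \sigma_1) + (1 - \lambda) \, \phi(x; \mu_2, \sigma_2),
\qquad
\phi(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)
```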
For the return on the S&P 500 and for the rate of GDP growth, decide whether the mean and the standard deviation are higher in a boom or in a recession. Why?
There is a package called mixtools that will fit mixture models in R. You can find details of the package here:
The package uses maximum likelihood: it chooses the parameter values under which the observed data are most probable. For some problems (such as ordinary least squares) there is an analytical solution that can be used to find the most likely coefficients; for many other problems the maximum likelihood can only be found by numerical methods. The normalmixEM function used below relies on one such method, the expectation-maximization (EM) algorithm.
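To make the difference between an analytical and a numerical solution concrete, here is a minimal sketch in base R. It is not how mixtools works internally: it fits a single normal distribution to simulated data (the data, the helper name neg_loglik and the starting values are all assumptions for illustration) and compares a numerical search with the known closed-form answer.

```r
# Sketch: maximum likelihood for one normal distribution, found numerically.
set.seed(1)                          # hypothetical simulated data, not from the text
x <- rnorm(500, mean = 2, sd = 3)

# Negative log-likelihood. The standard deviation is searched on the log scale
# so that it always stays positive.
neg_loglik <- function(par, data) {
  -sum(dnorm(data, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Numerical search from an arbitrary starting point (mean 0, sd 1).
fit <- optim(par = c(0, 0), fn = neg_loglik, data = x)

numeric_est  <- c(mean = fit$par[1], sd = exp(fit$par[2]))
# Closed-form maximum likelihood estimates for comparison.
analytic_est <- c(mean = mean(x), sd = sqrt(mean((x - mean(x))^2)))

print(rbind(numeric_est, analytic_est))
```

The two rows should agree closely. For a mixture of normals no such closed form is available, which is why a numerical routine is needed.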
There is more mathematical detail about maximum likelihood here:
library(mixtools)
data(faithful)
hist(faithful$waiting, main = "Time between Old Faithful eruptions", xlab = "minutes")
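Now use the normalmixEM function to estimate the
parameters of a two-component normal mixture. The key arguments are
starting values for the algorithm: lambda is the initial mixing
proportion (the share of observations in each component), mu holds the
initial means and sigma the initial standard deviation. Only one
starting value is supplied for sigma here, and the fit reported below
has the same standard deviation for both components.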
wait <- normalmixEM(faithful$waiting, lambda = 0.5, mu = c(55, 80), sigma = 5)
## number of iterations= 9
wait[c('lambda', 'mu', 'sigma')]
## $lambda
## [1] 0.3608498 0.6391502
##
## $mu
## [1] 54.61364 80.09031
##
## $sigma
## [1] 5.869089 5.869089
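The estimates can also be used to ask which component a given waiting time most likely belongs to. The sketch below applies Bayes' rule by hand in base R, using the values copied from the printed output above. The helper name membership and the example waiting times are assumptions for illustration. The fitted mixtools object should hold comparable probabilities for the observed data in wait$posterior.

```r
# Estimates copied from the printed normalmixEM output above.
lambda <- c(0.3608498, 0.6391502)
mu     <- c(54.61364, 80.09031)
sigma  <- c(5.869089, 5.869089)

# Hypothetical helper: probability that a waiting time x belongs to each
# component, i.e. each weighted density divided by the sum of both.
membership <- function(x) {
  weighted <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
                    lambda[2] * dnorm(x, mu[2], sigma[2]))
  weighted / rowSums(weighted)
}

# Example waiting times in minutes (chosen for illustration only).
probs <- membership(c(50, 65, 85))
colnames(probs) <- c("short", "long")
print(round(probs, 3))
```

Waiting times near one of the two estimated means are assigned almost entirely to that component, while values between the means are split between the two.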
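We can now look at the estimated values of lambda, mu and sigma and
plot the fitted mixture against the actual data. To do this we
evaluate the two fitted normal densities with dnorm, weight each by its
estimated share lambda, and draw their sum over the histogram.
hist(faithful$waiting, probability = TRUE, main = "Time between Old Faithful eruptions", xlab = "Time")
x <- seq(min(faithful$waiting), max(faithful$waiting), length.out = 200)
mix_density <- wait$lambda[1] * dnorm(x, wait$mu[1], wait$sigma[1]) + wait$lambda[2] * dnorm(x, wait$mu[2], wait$sigma[2])  # weighted sum of the two components
lines(x, mix_density, col = 'red')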