Mixture models

Introduction
Mixture model
- Mixtools package
Mixture model example
- Exercise

Introduction

It is common to speak of bull and bear markets, or of periods of economic growth and recession. A mixture model breaks a distribution into two or more component distributions to give a deeper understanding of the underlying regimes. If we assume that there are two distributions with particular parameters, we can calculate the probability that each observation (a daily return or a quarterly increase in GDP) belongs to each distribution. We can then calculate how well the data fit the model and adjust the parameters of the model to get the best fit.
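The probability calculation described above can be sketched in a few lines of base R. The parameter values and the three observations below are made-up assumptions chosen only for illustration; they are not estimates from any data set.

```r
# Hypothetical two-regime parameters (illustrative assumptions only)
lambda <- c(0.3, 0.7)    # share of observations in each regime
mu     <- c(-1.0, 0.5)   # regime means
sigma  <- c(2.0, 1.0)    # regime standard deviations

x <- c(-3, 0, 1)         # made-up observations

# Weighted density of each observation under each regime (rows = observations)
weighted <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
                  lambda[2] * dnorm(x, mu[2], sigma[2]))

# Probability that each observation belongs to each regime (Bayes' rule)
membership <- weighted / rowSums(weighted)

# Log-likelihood: how well the data fit the model under these parameters
loglik <- sum(log(rowSums(weighted)))
```

Each row of membership sums to one, and loglik is the quantity that the fitting step tries to make as large as possible by adjusting lambda, mu and sigma.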

Mixture model

The key things to determine are the number of regimes and, possibly, the starting values for the parameters. The normal distribution is the most common choice for the components. In that case the mean and standard deviation of each component, together with the share of observations in each component, are the parameters to be adjusted. The mixtools package can be used to estimate a mixture model.

Mixtools package

Mixtools

Mixtools on CRAN

This package uses maximum likelihood to estimate the parameters of the mixture model, that is, it looks for the parameter values under which the observed data are most likely. For some problems (such as ordinary least squares) there is an analytical solution that can be used to find the most likely coefficients; for many other problems the maximum likelihood can only be found by numerical methods. That means starting from an initial guess and adjusting the parameters step by step until the likelihood stops improving.
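To make the iterative idea concrete, here is a minimal sketch of the expectation-maximisation (EM) approach for a two-component normal mixture, written in base R. It is an illustration under assumptions (starting values picked by eye, a simple stopping rule) and is not the mixtools package's actual code. It uses the built-in faithful data set.

```r
# Minimal EM sketch for a two-component normal mixture (base R only)
x <- faithful$waiting    # built-in data set

# Starting values (assumptions chosen by eye)
lambda <- c(0.5, 0.5)
mu     <- c(55, 80)
sigma  <- c(5, 5)

loglik_old <- -Inf
for (iter in 1:1000) {
  # E-step: weighted densities and membership probabilities
  dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
                lambda[2] * dnorm(x, mu[2], sigma[2]))
  post <- dens / rowSums(dens)

  # Log-likelihood under the current parameters; stop when it no longer improves
  loglik <- sum(log(rowSums(dens)))
  if (loglik - loglik_old < 1e-8) break
  loglik_old <- loglik

  # M-step: update parameters using the membership probabilities as weights
  n_k    <- colSums(post)
  lambda <- n_k / length(x)
  mu     <- colSums(post * x) / n_k
  sigma  <- sqrt(colSums(post * outer(x, mu, "-")^2) / n_k)
}

list(lambda = lambda, mu = mu, sigma = sigma, iterations = iter)
```

Unlike a closed-form solution, the answer here depends on where the loop starts and when it stops, which is why starting values matter for mixture models.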

There is more mathematical detail about maximum likelihood here:

StatQuest Maximum Likelihood

Mixture model example

This example uses a set of data that measures eruptions of the Old Faithful geyser in Yellowstone National Park.

library(mixtools)
data(faithful)
hist(faithful$waiting, main = "Time between Old Faithful eruptions", xlab = "minutes", col = 'lightblue')

Old Faithful

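Now use the normalmixEM function to estimate the parameters of the mixture. The key arguments are starting values for the estimation: lambda is the initial share or proportion of each component, mu the initial means and sigma the initial standard deviation. Here a single value is supplied for sigma, and the fitted output reports the same standard deviation for both components.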

wait <- normalmixEM(faithful$waiting, lambda = 0.5, mu = c(55, 80), sigma = 5)

## number of iterations= 9

wait[c('lambda', 'mu', 'sigma')]

## $lambda
## [1] 0.3608498 0.6391502
## 
## $mu
## [1] 54.61364 80.09031
## 
## $sigma
## [1] 5.869089 5.869089
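Beyond lambda, mu and sigma, the fitted object also holds the membership probability of every observation and the log-likelihood of the fit. The sketch below assumes the mixtools package is installed and refits the model with the same starting values so that it runs on its own; the variable name component is introduced here for illustration.

```r
library(mixtools)

# Refit with the same starting values as above so the block runs on its own
wait <- normalmixEM(faithful$waiting, lambda = 0.5, mu = c(55, 80), sigma = 5)

# posterior: one row per observation, one column per component
head(round(wait$posterior, 3))

# Assign each waiting time to its most probable component
component <- apply(wait$posterior, 1, which.max)
table(component)

# Log-likelihood of the fitted model
wait$loglik
```

The same posterior matrix is what you would use in a regime setting to decide which observations most likely belong to which regime.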

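We can look at the estimated values of lambda, mu and sigma and plot the fitted mixture density against the actual data. To do this we use the dnorm function to evaluate each of the two estimated normal distributions, weight each one by its estimated share lambda, and add them together.

hist(faithful$waiting, probability = TRUE, main = "Time between Old Faithful eruptions", xlab = "minutes")
# Evaluate the fitted mixture density on a grid of waiting times
x <- seq(min(faithful$waiting), max(faithful$waiting), length.out = 200)
fitted_density <- wait$lambda[1] * dnorm(x, wait$mu[1], wait$sigma[1]) +
  wait$lambda[2] * dnorm(x, wait$mu[2], wait$sigma[2])
lines(x, fitted_density, col = 'blue', lwd = 2)
legend('topleft', inset = 0.008, legend = 'Fitted mixture density', 
       col = 'blue', lwd = 2, cex = 0.8)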

Exercise

Try to repeat this exercise for the returns on the S&P 500 or for the annual or monthly increase in GDP. You can get the data from Yahoo Finance or from the St. Louis Fed database (FRED). You can use a normal distribution and try to extract the means and standard deviations for a two-regime model.
- What are the parameters of the bull and bear markets or what are the parameters of boom and recession?
- What proportion of time does the market spend in bull and bear? What proportion in boom and recession?
- When do the two periods take place?