The Petal.Lengths (the third variable) in the iris dataset exhibit a bimodal pattern. We are going to fit them using a mixture of two normal distributions. For simplicity, we only study the first 100 rows in the dataset, which correspond to setosa and versicolor (50 for each). After loading iris in R, you can select the data by executing S = iris$Petal.Length[1:100] or S = iris[1:100, 3]. Now S becomes the dataset under investigation. Suppose that the entries of S are random samples \(\{X_i\}^n_{i=1}\) from the mixture distribution in Problem 4 above. The goal is to estimate the parameters \(μ_1\), \(μ_2\), \(σ_1\) and \(σ_2\) by the Method of Moments (MoM).
# Cleaning data
data(iris)
n = 100
S = iris$Petal.Length[1:100]
S
## [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3
## [18] 1.4 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4
## [35] 1.5 1.2 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7
## [52] 4.5 4.9 4.0 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1
## [69] 4.5 3.9 4.8 4.0 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5
## [86] 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1
M whose \(k\)-th entry is the \(k\)-th sample moment.find_k_moment <- function(k,x){
mean(x^k)
}
M = sapply(c(1,2,3,4), find_k_moment, S)
M
## [1] 2.86100 10.26550 41.63513 178.49022
momentsmixture that computes \(\mathbb{E} X^k (k = 1, 2, 3, 4)\) for a random variable \(X\) from the mixture distribution in Problem 4, with parameters mu1, mu2, sigma1 and sigma2. The output of momentsmixture(mu1, mu2, sigma1, sigma2) should be a vector of length 4. Then, define an R function differences that satisfies the followings:theta;momentsmixture(theta[1], theta[2], theta[3], theta[4]) - M,where M is defined in Part 1.momentmixture <- function(mu1, mu2, sigma1, sigma2){
c(1/2*(mu1+mu2),
1/2*(mu1^2 + mu2^2 + sigma1^2 + sigma2^2),
1/2*(3*sigma1^2*mu1+mu1^3+
3*sigma2^2*mu2+mu2^3),
1/2*(3*sigma1^4+6*sigma1^2*mu1^2+mu1^4+
3*sigma2^4+6*sigma2^2*mu2^2+mu2^4))
}
differences <- function(theta){
momentmixture(theta[1],theta[2],theta[3],theta[4]) - M
}
differences(theta) = 0. To that end, we will apply the function multiroot in the R package rootSolve to find an approximate solution. This function requires an initial guess for the solutions. Use the 1st and 3rd quartilesmu10 = quantile(S, 0.25, names = FALSE) mu20 = quantile(S, 0.75, names = FALSE)
for \(\mu_1\) and \(\mu_2\); use 1 as the initial guesses for \(\sigma_1\) and \(\sigma_2\). That is, the initial guess for theta is c(mu10, mu20, 1, 1). Report the estimated parameters.
mu10 = quantile(S, 0.25, names = FALSE)
mu20 = quantile(S, 0.75, names = FALSE)
init = c(mu10, mu20, 1, 1)
library(rootSolve)
res = multiroot(differences, init)
res
## $root
## [1] 1.4545495 4.2674505 0.1268971 0.4336460
##
## $f.root
## [1] 0.000000e+00 2.273737e-13 1.016076e-12 3.780087e-12
##
## $iter
## [1] 8
##
## $estim.precis
## [1] 1.255884e-12
mu1_h = res$root[1]
mu2_h = res$root[2]
sigma1_h = res$root[3]
sigma2_h = res$root[4]
mu1_h
## [1] 1.454549
mu2_h
## [1] 4.267451
sigma1_h
## [1] 0.1268971
sigma2_h
## [1] 0.433646
hist(S, freq = FALSE, breaks = 20) to get a histogram of S showing relative fre- quencies. On the same figure, plot the PDF of the mixture distribution with estimated parameters. How is the fit?hist(S, freq = FALSE, breaks = 20,
xlim = c(0, 6), ylim = c(0, 2))
x = seq(0, 6, length = 100)
y1 = sapply(x, dnorm, mu1_h, sigma1_h)
y2 = sapply(x, dnorm, mu2_h, sigma2_h)
lines(x,(y1+y2)/2,col = 'blue')
# The fit is fairly good.