Part 1: Different variance in additive error model

Original data: \(Y_{ij}\) is a random variable representing the PCB concentration of fish \(j\) at age \(i\), for \(i = 1,\ldots, 12, \quad j = 1,2,3\). Assume that \(Y_{ij}\) follows the true model \[\begin{equation} \label{eq:true_model} Y_{ij} =g(i,\beta) + \sigma_i \epsilon_{ij},\quad g(x,\beta) = \beta_0x^{\beta_1},\quad \sigma _i = 0.2g(i,\beta) ^{1.25} \end{equation}\]

1. Simulate one data set from the true model

# mean function g(i, beta) = beta0 * i^beta1
g <- function(i, beta){
  beta[1] * i^beta[2]
}

# standard deviation sigma_i = 0.2 * g(i, beta)^theta
sig <- function(i, beta, theta){
  .2 * g(i, beta)^theta
}

# simulate an ngroup x nobs matrix of Y_ij from the true model (ages in rows)
simdf <- function(beta, theta, ngroup, nobs){
  simdf <- rep(sig(1:ngroup, beta, theta), each = nobs) * rnorm(ngroup * nobs) +
    rep(g(1:ngroup, beta), each = nobs)
  simdf <- matrix(simdf, nrow = ngroup, ncol = nobs, byrow = TRUE)
  rownames(simdf) <- paste("Age", 1:ngroup)
  colnames(simdf) <- paste("Sim", 1:nobs)
  simdf
}

beta  <- c(.9, 1.3)
theta <- 1.25
ngroup <- 12
nobs <- 3
df <- simdf(beta, theta, ngroup, nobs)
df
#>             Sim 1      Sim 2      Sim 3
#> Age 1   0.9033467  0.8507356  0.9984951
#> Age 2   1.2923074  1.6851009  2.2794258
#> Age 3   2.7040723  1.4707111  5.3118565
#> Age 4   6.5374094  5.7034016  3.6468092
#> Age 5   9.3370048  6.5639641 10.0828591
#> Age 6  11.6529399 10.7226128 11.3606863
#> Age 7   3.0011910 12.3686324  9.4635935
#> Age 8   9.6740168 22.4326982 11.1432323
#> Age 9  19.4977778 25.4934411 11.9498960
#> Age 10 16.5652002 20.1134084  7.3869584
#> Age 11 30.1968810 20.2685602 22.2894937
#> Age 12 19.0482249  4.4001714 38.7836018
# Construct a scatter plot of concentration against age
matplot(df,type = 'p',pch = 16, col = 1, xlab = "Age i", ylab = "Simulated Yij")

2. We can express the bias of \(\hat\beta_U\) as an expected value: \(\mathrm{bias}(\hat \beta_U) = E(\hat \beta_U -\beta)\). Express the mean squared error of each component of \(\hat\beta_U\) as an expected value.

\[\begin{align} MSE (\hat\beta_{0,U} )& = E\left[(\hat\beta_{0, U} - \beta_0)^2\right] \\ MSE (\hat\beta_{1,U} )& = E\left[(\hat\beta_{1, U} - \beta_1)^2\right] \end{align}\]

3. A formula for the Monte Carlo approximation for the bias of \(\hat\beta_{0,U}\), the estimator of \(\beta_0\), is given by

\[ b_M(\beta_{0,U}) = M^{-1} \sum_{m =1}^M (\hat \beta_{0,U}^{(m)} - \beta_0)\] where \(m\) indexes the Monte Carlo samples and \(\hat \beta_{0,U}^{(m)}\) is the estimator of \(\beta_0\) based on Monte Carlo sample \(m\). Give a formula for the Monte Carlo approximation for the mean squared errors that you defined in 2 above.

Suppose there are in total \(M\) Monte Carlo samples, indexed by \(m = 1,\ldots, M\). From each sample we obtain estimates \(\hat\beta_{0,U}^{(m)}\) and \(\hat\beta_{1,U}^{(m)}\), and the MSEs can be approximated by \[\begin{align} \widehat {MSE} (\hat\beta_{0,U} )& = \frac{1}{M} \sum_{m=1}^M (\hat\beta_{0, U}^{(m)} - \beta_0)^2 \\ \widehat {MSE} (\hat\beta_{1,U} )& = \frac{1}{M} \sum_{m=1}^M (\hat\beta_{1, U}^{(m)} - \beta_1)^2 \\ \end{align}\] Since the true parameters are known in this simulation study, we plug in the true values \(\beta_0, \beta_1\) directly rather than the Monte Carlo means \(\bar \beta _ i = \frac{1}{M} \sum_{m=1}^M \hat\beta^{(m)}_{i, U}, i = 0,1\).
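
To make the approximation concrete, the bias, variance, and MSE of one component can be computed from the vector of Monte Carlo estimates. The helper below is only an illustrative sketch (the code in question 4 computes these quantities inline):

# illustrative helper (hypothetical; not part of the original code)
# est   = vector of M Monte Carlo estimates of one component
# truth = the known true parameter value
mc.summary <- function(est, truth){
  c(Bias = mean(est - truth),
    Variance = var(est),
    MSE = mean((est - truth)^2))
}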

4. For MC samples of size 50 and 500, report the Monte Carlo approximation for the bias, variance, and mean squared error for each component of \(\hat\beta_U , \hat\beta_G, \hat\beta_T\).

Model 1: Unstructured \(\hat\beta_U\). No structure is imposed on \(\sigma_i^2\); estimate it by the sample variance \[S_i^2 = \frac{1}{2}\sum_{j = 1}^3 (y_{ij} - \bar y_i)^2\] Estimate \(\beta\) using GLS with known weights \(\omega_i = S_i^{-2}, i = 1,\ldots, 12\), by minimizing \[Q(\beta)= \sum_{i = 1}^{12} \sum _{j = 1}^3 \omega_i (y_{ij} - g(i,\beta))^2, \quad \hat \beta_U = \arg\min_\beta Q(\beta)\]
# weighted squared error Q(beta); rows of df are age groups, columns are replicates
Qb <- function(df, beta, weight){
  if(length(weight) != nrow(df)){
    stop("length of weight does not match the number of groups")
  }
  # subtract g(i, beta) from every observation in group i, square, and weight by group
  sum(t(weight) %*% (df - rep(g(1:nrow(df), beta), ncol(df)))^2)
}
# In each Monte Carlo sample, estimate beta by minimizing Q(beta) and store the estimates
M <- 50
beta0.sim <- c(); beta1.sim <- c()
for (i in 1:M){
  df <- simdf(beta, theta, ngroup, nobs)
  s <- apply(df, 1, sd)            # per-age sample standard deviations S_i
  weight <- 1/s^2                  # unstructured weights omega_i = S_i^{-2}
  beta.m <- optim(c(0, 0), function(x) Qb(df, x, weight))
  beta0.sim <- c(beta0.sim, beta.m$par[1])
  beta1.sim <- c(beta1.sim, beta.m$par[2])
}

# calculate bias, variance, and MSE
bias <- c(mean(beta0.sim - beta[1]), mean(beta1.sim - beta[2]))
var  <- c(var(beta0.sim), var(beta1.sim))
mse  <- c(mean((beta0.sim - beta[1])^2), mean((beta1.sim - beta[2])^2))

summary.U.50 <- round(data.frame("Bias" = bias, "Variance" = var, "MSE" = mse),4)
rownames(summary.U.50) <- c("beta(0,U)","beta(1,U)")
summary.U.50
#>              Bias Variance    MSE
#> beta(0,U) -0.0284   0.0146 0.0151
#> beta(1,U)  0.0290   0.0088 0.1927

Running the same procedure with \(M = 500\) gives

#>              Bias Variance    MSE
#> beta(0,U)  0.0126   0.0185 0.0187
#> beta(1,U) -0.0053   0.0107 0.0107
Model 2: The true model \(\hat\beta_T\). Assume the structure of \(\sigma_i\) is consistent with that of the true model: \[\sigma_ i = cg (i,\beta) ^{1.25}\] The estimate of \(\beta\) from this model is denoted \(\hat\beta _T\).

Note that the true model setting is equivalent to \[Y_{ij} = g_1(x_{ij},\beta) +\sigma g_2(x_{ij}, \beta, \theta)\epsilon_{ij}\] where \(x_{ij} = i\) is the age, \[g_1(x_{ij}, \beta) = \beta_0 x_{ij}^{\beta_1}, \quad g_2(x_{ij}, \beta, \theta) = g_1(x_{ij}, \beta)^\theta\] \[\quad \theta = 1.25, \quad \sigma = 0.2 \] The objective is to estimate \(\beta_0, \beta_1\) in this model.

To obtain initial values for \(\beta\), note that taking the logarithm of both sides of \(g_1\) gives \(\log(g_1(x,\beta)) = \log(\beta_0) + \beta_1 \log(x)\). Also, since \(E(Y_{ij}) = g_1(i, \beta)\), we can perform an OLS regression of \(\log(Y_{ij})\) against \(\log(i)\).
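
A standalone illustration of this initialization (mirroring the code inside the Monte Carlo loop below, and reusing the simulated `df`, `ngroup`, and `nobs` from above; the object names here are only illustrative):

# log-log OLS to obtain starting values for beta
Y <- as.vector(df)                            # stack the simulated matrix column-wise
X <- rep(1:ngroup, nobs)                      # matching ages
init.ols <- lm(log(pmax(Y, .001)) ~ log(X))   # truncate at .001 to guard against negative Y
beta0.init <- exp(coef(init.ols)[1])          # beta0 = exp(intercept)
beta1.init <- coef(init.ols)[2]               # beta1 = slope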


# specify the mean function g1, the variance function g2, their derivatives, and v
g1fun <- function(par, X){
  par[1] * X^par[2]
}

g2fun <- function(par, X, theta){
  g1fun(par, X)^theta
}

# derivatives of g1 with respect to (beta0, beta1), one row per observation
derfun <- function(par, X){
  cbind(X^par[2], par[1] * X^par[2] * log(X))
}

# delta-method variance of the fitted mean given an estimated covariance matrix
vhatmufun <- function(par, derfun, X, vcov){
  diag(derfun(par, X) %*% vcov %*% t(derfun(par, X)))
}
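
The estimation below relies on a helper `glsiter`, which is assumed to come from the course materials and is not shown in this report. For reference, here is a minimal sketch of one possible implementation, assuming it performs a fixed number of weighted Gauss-Newton updates with weights \(1/g_2^2\) and returns the parameter estimates, \(\hat\sigma^2\), and an approximate covariance matrix; the actual implementation may differ.

# hypothetical sketch of glsiter -- the real implementation may differ
glsiter <- function(parinit, g1fun, g2fun, derfun, theta, Y, X, niter){
  par <- matrix(parinit, ncol = 1)
  n <- length(Y); p <- length(parinit)
  for (k in 1:niter){
    w <- 1 / g2fun(par, X, theta)^2            # GLS weights from the variance function
    V <- derfun(par, X)                        # n x p derivative matrix of g1
    r <- Y - g1fun(par, X)                     # current residuals
    par <- par + solve(t(V) %*% (w * V), t(V) %*% (w * r))  # weighted Gauss-Newton step
  }
  resid <- (Y - g1fun(par, X)) / g2fun(par, X, theta)
  sig2hat <- sum(resid^2) / (n - p)            # estimate of sigma^2
  w <- 1 / g2fun(par, X, theta)^2
  V <- derfun(par, X)
  Vhat <- sig2hat * solve(t(V) %*% (w * V))    # approximate covariance of the estimates
  list(par = par, sig2hat = sig2hat, Vhat = Vhat)
}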

# monte carlo and parameter estimation
beta  <- c(.9, 1.3)
theta <- 1.25
ngroup <- 12
nobs <- 3
M <- 50
beta0.sim <- c();beta1.sim <- c()
for (i in 1:M){
  df <- simdf(beta, theta, ngroup, nobs)
  # reshape the simulated matrix into two vectors (column-wise stacking)
  Y <- as.vector(df)
  X <- rep(1:ngroup, nobs)
  ## parameter initialization via a log-log OLS regression
  ## NOTE: the log can produce NaN when a simulated Y is negative, so Y is truncated at .001
  res.ols <- lm(log(pmax(Y, .001)) ~ log(X))
  alphainit <- exp(coef(res.ols)[1])
  betainit  <- coef(res.ols)[2]
  # parameter estimation
  glsoutput <- glsiter(c(alphainit, betainit), g1fun, g2fun, derfun, theta = 1.25, Y, X, 3)
  beta0.sim <- c(beta0.sim, glsoutput$par[1])
  beta1.sim <- c(beta1.sim, glsoutput$par[2])
}

bias <- c(mean (beta0.sim - beta[1]), mean (beta1.sim - beta[2]))
var <- c(var(beta0.sim),var(beta1.sim))
mse <- c(mean((beta0.sim - beta[1])^2) ,mean((beta1.sim - beta[2])^2) )

summary.T.50 <- round(data.frame("Bias" = bias, "Variance" = var, "MSE" = mse),4)
rownames(summary.T.50) <- c("beta(0,T)", "beta(1,T)")
summary.T.50
#>              Bias Variance    MSE
#> beta(0,T)  0.0109   0.0053 0.0053
#> beta(1,T) -0.0046   0.0032 0.0031
Model 3: Gamma model \(\hat\beta_G\). Assume \(\sigma_ i = cg(i,\beta)\), or equivalently \(\theta = 1\). The estimate of \(\beta\) from this model is denoted \(\hat\beta _G\).

# g1fun, g2fun, derfun, and vhatmufun are unchanged from Model 2 above

# monte carlo and parameter estimation
beta  <- c(.9, 1.3)
theta <- 1.25
ngroup <- 12
nobs <- 3
M <- 50
beta0.sim <- c();beta1.sim <- c()
for (i in 1:M){
  df <- simdf(beta, theta, ngroup, nobs)
  # reshape the simulated matrix into two vectors (column-wise stacking)
  Y <- as.vector(df)
  X <- rep(1:ngroup, nobs)
  # parameter initialization via log-log OLS (truncating Y at .001 to avoid NaN)
  res.ols <- lm(log(pmax(Y, .001)) ~ log(X))
  alphainit <- exp(coef(res.ols)[1])
  betainit  <- coef(res.ols)[2]
  # parameter estimation
  glsoutput <- glsiter(c(alphainit, betainit), g1fun, g2fun, derfun, theta = 1, Y, X, 3)
  beta0.sim <- c(beta0.sim, glsoutput$par[1])
  beta1.sim <- c(beta1.sim, glsoutput$par[2])
}

bias <- c(mean (beta0.sim - beta[1]), mean (beta1.sim - beta[2]))
var <- c(var(beta0.sim),var(beta1.sim))
mse <- c(mean((beta0.sim - beta[1])^2) ,mean((beta1.sim - beta[2])^2) )

summary.G.50 <- round(data.frame("Bias" = bias, "Variance" = var, "MSE" = mse),4)
rownames(summary.G.50) <- c("beta(0,G)","beta(1,G)")
summary.G.50
#>              Bias Variance    MSE
#> beta(0,G)  0.0065   0.0071 0.0070
#> beta(1,G) -0.0005   0.0038 0.0038

Running the same procedure with \(M = 500\) gives

#>              Bias Variance    MSE
#> beta(0,G)  0.0023   0.0080 0.0079
#> beta(1,G) -0.0002   0.0048 0.0048

In summary, when \(M = 50\),

#>              Bias Variance    MSE
#> beta(0,U) -0.0284   0.0146 0.0151
#> beta(1,U)  0.0290   0.0088 0.1927
#> beta(0,G)  0.0065   0.0071 0.0070
#> beta(1,G) -0.0005   0.0038 0.0038
#> beta(0,T)  0.0109   0.0053 0.0053
#> beta(1,T) -0.0046   0.0032 0.0031

Similarly, we can perform the Monte Carlo study with \(M = 500\).

#>              Bias Variance    MSE
#> beta(0,U)  0.0126   0.0185 0.0187
#> beta(1,U) -0.0053   0.0107 0.0107
#> beta(0,G)  0.0023   0.0080 0.0079
#> beta(1,G) -0.0002   0.0048 0.0048
#> beta(0,T)  0.0003   0.0074 0.0074
#> beta(1,T)  0.0027   0.0041 0.0041

5. Explain how your Monte Carlo output relates to the statement in italics at the beginning of this assignment.

For ease of computation, Monte Carlo studies of size 50 and 500 are used here. Note that the MSEs of \(\hat\beta_G\) and \(\hat\beta_T\) are consistently smaller than those of \(\hat\beta_U\). Even though \(\hat\beta_G\) does not perfectly reflect the true model, modelling \(g_2\) as a function of \(g_1\) substantially improves the efficiency of the estimation.

Part 2: Bootstrap

Let \(Y_i\) denote the \(i\)-th observation of OXY, and let \(X_i\) denote the \(i\)-th observation of body mass. Since our interest is to relate OXY concentration to body mass, we propose that the conditional distribution of \(Y_i\) given \(\theta_i\) follows an exponential dispersion family:

\[ f(y_i | \theta_i,\phi) = \exp\left(\phi(y_i \theta_i - b(\theta_i)) + c(y_i, \phi)\right), \quad y_i \in \Omega_y \]

Recall the moments of \(Y_i\): \[\begin{align} E(Y_i|\theta_i) & = b'(\theta_i) = \mu_i\\ V(Y_i|\theta_i) & =\frac{1}{\phi} b''(\theta_i) = \frac{1}{\phi}V(\mu_i) \end{align}\]

Since we apply a log link, we have \(\mu_i \equiv \exp(x_i \beta) = \exp(\beta_0+ \beta_1 x_i)\). We also assume a gamma random component with shape \(\alpha\) and rate \(\beta_i\) (not to be confused with the regression coefficients), so that \[ \begin{aligned} f\left(y_{i} | \theta_{i}, \phi\right) &=\exp \left[\phi\left\{y_i \theta_{i}-b\left(\theta_{i}\right)\right\}+c\left(y_{i}, \phi\right)\right] \\ \phi &=\alpha, \quad \theta_{i}=-\frac{\beta_{i}}{\alpha} \\ b\left(\theta_{i}\right) &=\log \left(-\frac{1}{\theta_{i}}\right) \\ c(y_i, \phi) &=-\log (\Gamma(\phi))+(\phi-1) \log \left(y_{i}\right)+\phi \log (\phi)\\ b^{\prime}\left(\theta_{i}\right) &=-\frac{1}{\theta_{i}} = \mu_i \\ \implies \theta_i& = -\frac{1}{\mu_i} = -\frac{1}{\exp(\beta_0+ \beta_1 x_i)} \end{aligned} \]

Note that the objective is to estimate \(P(Y> 6\mid X = 1700)\), which can be written as \[\begin{align} q := P(Y> 6\mid X = 1700) & = \int_6^\infty f(y\mid x = 1700, \theta, \phi)\, dy\\ & = 1 - \int_{-\infty}^6 f(y\mid x = 1700, \theta, \phi)\, dy \end{align} \]
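
To evaluate \(q\) numerically, note that a gamma distribution with shape \(\alpha\) and scale \(\mu/\alpha\) has mean \(\mu\), and the dispersion reported by `summary.glm` estimates \(1/\alpha\). The tail probability is therefore \[ \hat q = 1 - F_{\text{Gamma}}\left(6;\ \text{shape} = \hat\alpha,\ \text{scale} = \hat\mu(1700)/\hat\alpha\right), \] which is exactly what `pgamma` evaluates inside the bootstrap loop below.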

Algorithm:
1. Fit the GLM using the original data.
2. Simulate a bootstrap response vector from the fitted GLM.
3. Refit the GLM to the bootstrap sample, estimate the shape and scale parameters, and use them to compute the bootstrap estimate \(\hat q^{(b)}\) of the tail probability.
4. Repeat steps 2-3 \(B\) times (here \(B = 500\)).
# Fit the GLM; the fitted object is stored as `gammafit`
library(MASS) # in order to call boxcox
df <- read.csv("gullsdata2019.csv")
X <- df$bm
Y <- df$oxy
gammafit <- glm(Y~X, family = Gamma(link = "log"))
summary(gammafit)
#> 
#> Call:
#> glm(formula = Y ~ X, family = Gamma(link = "log"))
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.7413  -0.7697  -0.2940   0.2609   2.3873  
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.9483048  0.7724320  -1.228    0.222    
#> X            0.0024149  0.0004656   5.187 1.02e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Gamma family taken to be 0.8545051)
#> 
#>     Null deviance: 96.639  on 108  degrees of freedom
#> Residual deviance: 76.478  on 107  degrees of freedom
#> AIC: 873.71
#> 
#> Number of Fisher Scoring iterations: 6


B <- 500                    # number of bootstrap replicates
b <- 0
qhatstarBs <- c()


repeat{

  b <- b + 1

  # simulate a bootstrap response from the fitted gamma GLM (parametric bootstrap)
  Ystar <- simulate(gammafit)$sim_1
  gammafitstar <- glm(Ystar ~ X, family = Gamma(link = "log"))
  shapestar <- 1/summary(gammafitstar)$dispersion                    # shape = 1/dispersion
  mustar <- exp(gammafitstar$coef[1] + gammafitstar$coef[2]*1700)    # fitted mean at X = 1700
  scalestar <- mustar/shapestar                                      # scale = mean/shape
  qhatstarBs <- c(qhatstarBs, 1 - pgamma(6, shape = shapestar, scale = scalestar))

  if(b == B){break}

}

# Percentile Bootstrap confidence interval:
c(quantile(qhatstarBs, prob = 0.025), quantile(qhatstarBs, prob = 0.975))
#>      2.5%     97.5% 
#> 0.7956354 0.9229331

Part 3: Additive Error Model

1.

First, consider an additive error model of the form \[ Y_{i}=\mu\left(x_{i}, \boldsymbol{\beta}\right)+\sigma \mu\left(x_{i}, \boldsymbol{\beta}\right)^{\theta} \epsilon_{i} \] where \(\epsilon_{i} \stackrel{\text { iid }}{\sim}(0,1)\) for \(i=1, \ldots, n\), \(\boldsymbol{\beta}=\left(\beta_{0}, \beta_{1}\right)\), and \[\mu\left(x_{i}, \boldsymbol{\beta}\right)=\frac{1}{\beta_{0}+\beta_{1} x_{i}}\]

In this exercise, \(\theta =1\).

  (a) Report an initial estimate of \(\beta\) obtained from the OLS regression of \(1/Y_i\) on \(x_i\). You do not need to report a standard error.
# read data
df <- read.csv("salmon.csv")
s <- df$spawners; X <- 1/s     # covariate x_i = 1/spawners
R <- df$recruits; Y <- R       # response Y_i = recruits
res.ols <- lm((1/Y) ~ X); res.ols$coefficients
#>  (Intercept)            X 
#> 0.0005278397 0.2080778421
alphainit <- summary(res.ols)$coeff[1]
betainit  <- summary(res.ols)$coeff[2]

The estimate of the intercept is \(5 \times 10^{-4}\) and the estimate of the slope is \(0.2081\).

  (b) Report a GLS estimate of \(\beta\) along with corresponding standard errors. We can rewrite the additive error model in the form \[Y_i = g_1(x_i,\beta) +\sigma g_2(x_i, \beta, \theta)\epsilon_{i}\] where \[g_1(x_i,\beta) = \mu(x_i, \beta) = \frac{1}{\beta_0+\beta_1 x_i},\] \[g_2(x_i, \beta,\theta ) = \mu(x_i, \beta) ^\theta = \frac{1}{\beta_0+\beta_1 x_i}.\] To minimize the weighted squared error \(Q(\beta) = (Y - g(X, \beta))' A(Y- g(X, \beta))\), we need the first derivatives of \(g_1\) with respect to \(\beta\): \[v = [v_1, \ldots, v_n]' = [\partial g_1(x_1,\beta) /\partial \beta, \ldots, \partial g_1(x_n,\beta) /\partial \beta]' \] and we let \(v_i^{(j)}\) denote \(v_i\) evaluated at the \(j\)-th estimate of \(\beta\), i.e. \(\beta^{(j)}\). In our case, \[\frac{\partial g_1(x_i,\beta) }{\partial \beta_0} = -\frac{1}{(\beta_0+ \beta_1 x_i)^2}, \] \[\frac{\partial g_1(x_i,\beta) }{\partial \beta_1} = -\frac{x_i}{(\beta_0+ \beta_1 x_i)^2}. \]
g1fun <- function(par,  X){
    1/(par[1] + par[2]* X)
}

g2fun <- function(par,  X, theta){
    g1fun(par,  X )^theta
}

derfun <- function(par,   X){
    cbind( -1/((par[1] + par[2]* X)^2), -X/((par[1] + par[2]* X)^2) )
}

vhatmufun <- function(par, derfun, X, vcov){
    diag(derfun(par, X)%*%vcov%*%t(derfun(par, X)))
}

glsoutput <- glsiter(c(alphainit, betainit), g1fun, g2fun, derfun, theta = 1, Y, X, 3)
sehatmu <- sqrt(vhatmufun(glsoutput$par, derfun, X, glsoutput$Vhat))

plot(X, Y)
lines(X[order(X)], g1fun(glsoutput$par, X)[order(X)])
lines(X[order(X)], g1fun(glsoutput$par, X)[order(X)] + 1.96*sehatmu[order(X)], lty = 2)
lines(X[order(X)], g1fun(glsoutput$par, X)[order(X)] - 1.96*sehatmu[order(X)], lty = 2)


glsoutput
#> $par
#>              [,1]
#> [1,] 0.0003174447
#> [2,] 0.2422520821
#> 
#> $sig2hat
#> [1] 0.1548461
#> 
#> $Vhat
#>               [,1]          [,2]
#> [1,]  1.563763e-08 -7.428111e-06
#> [2,] -7.428111e-06  4.448430e-03
  (c) Construct a 95% confidence interval for the parameter representing the maximum carrying capacity of the fishery.

The parameter \(h(\beta_0) = 1/\beta _0\) can be interpreted as the maximum carrying capacity of the fishery. Applying the delta method, a 95% confidence interval for \(h(\beta_0)\) is \[\begin{align} &h(\hat\beta_0) \pm 1.96 \sqrt{Var(\hat\beta_0) \left(\left.\frac{\partial h(\beta_0)}{\partial \beta_0} \right|_{\beta_0 =\hat\beta_0}\right)^2}\\ & =h(\hat\beta_0) \pm 1.96 \sqrt{Var(\hat\beta_0) \left(\frac{-1}{\hat\beta_0^2}\right)^2}\\ & = \frac{1}{ 0.0003174447} \pm 1.96 \times \sqrt{1.563763\times 10^{-8} \times \frac{1}{0.0003174447^4}}\\ & =( 717.9209,\ 5582.389) \end{align}\]
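
The same interval can be reproduced directly from the GLS output (a short sketch using the `glsoutput` object fitted above):

# delta-method 95% CI for h(beta0) = 1/beta0, the maximum carrying capacity
b0hat <- glsoutput$par[1]
se.b0 <- sqrt(glsoutput$Vhat[1, 1])
h.hat <- 1/b0hat
se.h  <- se.b0/b0hat^2            # |dh/dbeta0| evaluated at the estimate
h.hat + c(-1, 1) * 1.96 * se.h    # should match (717.9209, 5582.389) up to rounding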

2.

Fit a generalized linear model to these data using a gamma random component and an inverse link function. Compare the estimate of β obtained from the gamma GLM to the GLS estimate of β obtained in (b) above.

# read data
df <- read.csv("salmon.csv")
s <- df$spawners; X<- 1/s
R <- df$recruits; Y<- R
gammafit <- glm(Y~X, family = Gamma(link = "inverse"))
summary(gammafit)$coef
#>                 Estimate   Std. Error  t value    Pr(>|t|)
#> (Intercept) 0.0003174622 0.0001250536 2.538610 0.017732154
#> X           0.2422474710 0.0666976881 3.632022 0.001266152

3.

State one reason that GLM might be preferred for this study. State one reason that an additive error model might be preferred for this study. Which of the two models would you recommend for this study? (This question has multiple correct answers.)

Compare the estimates and their standard errors from the two methods:

# Additive Error Model
sd.1 <- sqrt(diag(glsoutput$Vhat))
mu.1 <- as.vector(glsoutput$par)
#GLM
sd.2 <- summary(gammafit)$coeff[,2]
mu.2 <- as.vector(summary(gammafit)$coeff[,1])

rbind(mu.1, mu.2)
#>              [,1]      [,2]
#> mu.1 0.0003174447 0.2422521
#> mu.2 0.0003174622 0.2422475
rbind(sd.1, sd.2)
#>       (Intercept)          X
#> sd.1 0.0001250505 0.06669655
#> sd.2 0.0001250536 0.06669769

Note that the estimates and their standard errors from the two approaches are very close. Tracing back through the estimation process, the additive error model requires the derivatives of \(g_1\) with respect to \(\beta\), which can become cumbersome when \(g_1\) does not have a simple form. In contrast, the GLM restricts the distribution to the exponential family, which makes fitting much easier thanks to the nice properties of that family. However, when there is sound theoretical justification for the forms of \(g_1\) and \(g_2\), the additive error model may be preferred.