CDA Lab 2

Question 3.13 a: Using x=weight and Y=number of satellites, fit the Poisson loglinear model. Report the prediction equation.

crab_data <- readRDS("crab.rds")
poisson_model <- glm(satell ~ weight, 
                     data = crab_data, family = poisson(link = "log"))
summary(poisson_model)

## 
## Call:
## glm(formula = satell ~ weight, family = poisson(link = "log"), 
##     data = crab_data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.42841    0.17893  -2.394   0.0167 *  
## weight       0.58930    0.06502   9.064   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: 920.16
## 
## Number of Fisher Scoring iterations: 5

The prediction equation is log(mu)=-0.43+0.59x

Question 3.13 b: Estimate the mean of Y for female crabs of average weight 2.44 kg.

beta0=-0.43
beta1=0.59
x=2.44
mu=exp(beta0+beta1*x)
mu

## [1] 2.744503

The mean of Y for female crabs of average weight 2.44kg is 2.74 satellites.

Question 3.13 c: Use β hat to describe the weight effect. Construct a 95% confidence interval for β and for the multiplicative effect of 1 kg increase.

multiplicative_effect=exp(beta1)
multiplicative_effect

## [1] 1.803988

WaldCI=confint.default(poisson_model, level = .95)
WaldCI

##                  2.5 %    97.5 %
## (Intercept) -0.7791047 -0.077706
## weight       0.4618742  0.716734

LRCI=confint(poisson_model, level = .95)

## Waiting for profiling to be done...

LRCI

##                  2.5 %      97.5 %
## (Intercept) -0.7771762 -0.07591025
## weight       0.4597002  0.71449835

WaldCIMultEff=exp(WaldCI)
WaldCIMultEff

##                 2.5 %    97.5 %
## (Intercept) 0.4588166 0.9252364
## weight      1.5870457 2.0477345

LRCIMultEff=exp(LRCI)
LRCIMultEff

##                 2.5 %    97.5 %
## (Intercept) 0.4597023 0.9268994
## weight      1.5835992 2.0431615

The estimate for weight is on a log scale. For every 1kg increase in weight, Y increases by 1.80.The 95% CI for beta is [0.46, 0.72] using the Wald test and [0.46, 0.71] using the LR test. The 95% CI for the multiplicative effect of beta is[1.59, 2.05] for the Wald test and [1.58,2.04] for the LR test.

Question 3.13 d: Conduct a Wald test of the hypothesis that the mean of Y is independent of weight. Interpret.

coefs <- summary(poisson_model)$coefficients
beta_hat <- coefs["weight", "Estimate"]
beta_hat

## [1] 0.5893041

beta_se  <- coefs["weight", "Std. Error"]
beta_se

## [1] 0.06501645

z <- beta_hat / beta_se
z

## [1] 9.063923

p_val <- 2 * (1 - pnorm(abs(z)))
p_val

## [1] 0

The Wald test provides evidence to reject the null hypothesis that the mean of Y is independent of weight Z=9.06, p<.001. This indicates that the number of satellites increases significantly with crab weight.

Question 3.13 e: Conduct a likelihood-ratio test about the weight effect. Interpret.

model_M0 <- glm(satell ~ 1, 
                     data = crab_data, family = poisson(link = "log"))
summary(model_M0)

## 
## Call:
## glm(formula = satell ~ 1, family = poisson(link = "log"), data = crab_data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.0713     0.0445   24.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 632.79  on 172  degrees of freedom
## AIC: 990.09
## 
## Number of Fisher Scoring iterations: 5

# Deviance comparison
deviance_M1 <- deviance(poisson_model)
deviance_M1

## [1] 560.8664

deviance_M0 <- deviance(model_M0)
deviance_M0

## [1] 632.7917

lrt_stat <- deviance_M0 - deviance_M1
lrt_stat

## [1] 71.92524

df_diff <- df.residual(model_M0) - df.residual(poisson_model)
df_diff

## [1] 1

lrt_p_value <- 1 - pchisq(lrt_stat, df_diff)
lrt_p_value

## [1] 0

#ANOVA
car::Anova(poisson_model)

## Analysis of Deviance Table (Type II tests)
## 
## Response: satell
##        LR Chisq Df Pr(>Chisq)    
## weight   71.925  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is evidence that the more complex model including weight fits the data better than the intercept alone (ΔD=71.93, df=1, p<.001)

Question 3.14: Refer to the previous exercise. Allow overdispersion by fitting the negative binomial loglinear model. Question 3.14 a: Report the prediction equation and the estimate of the dispersion parameter and its SE. Is there evidence that this model gives a better fit than the Poisson model?

neg_binom_model <- MASS::glm.nb(satell ~ weight, data = crab_data)
summary(neg_binom_model)

## 
## Call:
## MASS::glm.nb(formula = satell ~ weight, data = crab_data, init.theta = 0.9310592338, 
##     link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8647     0.4048  -2.136   0.0327 *  
## weight        0.7603     0.1578   4.817 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
## 
##     Null deviance: 216.43  on 172  degrees of freedom
## Residual deviance: 196.16  on 171  degrees of freedom
## AIC: 754.64
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.931 
##           Std. Err.:  0.168 
## 
##  2 x log-likelihood:  -748.644

The prediction equation is log(mu)=-0.86+0.76x. The dispersion parameter is 0.931 with a standard error or 0.168. While the AIC has been reduced in the negative binomial model,there is not strong evidence that this model gives a better fit than the Poisson model since theta is close to 1.

Question 3.14 b: Construct a 95% confidence interval for β. Compare it with the on in (c) in the previous exercise. Interpret, and explain why the interval is wider with the negative binomial model.

nbWaldCI=confint.default(neg_binom_model, level = .95)
nbWaldCI

##                  2.5 %      97.5 %
## (Intercept) -1.6580014 -0.07131248
## weight       0.4509605  1.06959694

nbLRCI=confint(neg_binom_model, level = .95)

## Waiting for profiling to be done...

nbLRCI

##                  2.5 %      97.5 %
## (Intercept) -1.7542558 0.007184686
## weight       0.4207032 1.113666186

The 95% CI for weight in the negative binomial model is [.45, 1.07] for the Wald test and [.42, 1.11] for the LR test. The intervals are wider than those for the Poisson model because the negative binomial model factors in greater variability in data and the Poisson model does not.

CDA Lab 2

Yufan Ouyang, Mariana Sosa, Maria Velasco

2025-10-07