Week 8 Lab

1. Poisson Distribution

The probability mass function of the Poisson distribution follows

\[ p(y; \mu) = \frac{\mu^{y} \exp (-\mu)}{y !}, \quad \mu >0, \quad y = 0, 1, \ldots \]

with parameter \(\mu\).

Questions:

  • Show that this density can be written in the form of an exponential family model with parameter:
    • \(\theta = \log(\mu)\)
    • \(\phi = 1\)
    • \(b(\theta) = \exp(\theta)\)
  • Give canonical link function.
  • Compute mean and variance using \(b(\theta)\).

2. Binomial Distribution

The probability mass function of the Binomial distribution follows

\[ p(y; \mu, n) = \begin{pmatrix} n\\y \end{pmatrix} \left( \frac{\mu}{n} \right)^y \left(1- \frac{\mu}{n} \right)^{n-y}. \]

Questions:

  • Show that this density can be written in the form of an exponential family model and find \(\theta\), \(\phi\), \(b(\theta)\).
  • Compute the expectation of \(Y\) and briefly describe your interpretation of the parameters.
  • Give the canonical link function.

3. Show that the following distributions belong to the exponential family:

  • Exponential: \(f(y, \theta) = \theta \exp(-y\theta)\) for \(y > 0\)
  • Gamma: \(f(y, \zeta, \phi) = y^{\phi-1} \zeta^\phi e^{-y \zeta}/\Gamma(\phi)\)

4. Logistic Regression Analysis

To study the effect of volume (x1) and rate (x2) of air inspired by human subjects on the occurrence or not (Y = 1 or 0) of transient vasoconstriction response in the skin of the fingers, 39 observations on these variables were obtained in a laboratory. The data is given below.

x1 <- c(3.7,3.5,1.25,0.75,.8,.7,.6,1.1,.9,.9,.8,.55,.6,1.4,0.75,2.3, 3.2,0.85,1.7,1.8,0.4,0.95,1.35,1.5,1.6,.6,1.8,0.95,1.9,1.6, 2.7,2.35,1.1,1.1,1.2,.8,.95,.75,1.3)
x2 <- c(.83,1.09,2.5,1.5,3.2,3.5,.75,1.7,.75,.45,.57,2.75,3,2.33,3.75, 1.64,1.6,1.42,1.06,1.8,2,1.36,1.35,1.36,1.78,1.5,1.5, 1.9,.95,.4,.75,.03,1.83,2.2,2,3.33,1.9,1.9,1.63)
y <- c(1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,0,0,0,0,1,0,1,0,1,0,1, 0,0,1,1,1,0,0,1)
  • Fit a logistic regression model
  • Perform diagnostics
  • Interpret results
  • Fit the models with other possible link functions and compare the fitted models

Week 9 & 10 Lab

1. Leukaemia Data Analysis

The data below are the times to death, y, and the log of the white blood cell count, x, for 17 patients suffering from leukaemia.

y <- c(3.36,2.88,3.63,3.41,3.78,4.02,4.00,4.23,3.73,3.85,3.97,4.51,4.54, 5.00,5.00,4.72,5.00)
x <- c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,6)
  • Plot y against x
  • Find the link function
  • Fit a suitable GLM

2. Glass Dataset Analysis

  • Load the dataset, briefly explore it, and provide a description.

  • Preprocess the response variable Type and suggest a suitable generalized linear model to classify if a glass type is 1 or not.

# Glass$Type <- ifelse(Glass$Type == 1, 1, 0)
  • Give an expression of the linear predictor and discuss how it associates with the conditional mean \(E(Y \mid X)\).

  • Perform model diagnostics and identify problems with the above model.

  • Fit a quasi-likelihood model with the same mean structure and briefly discuss the difference.

  • Use the predict function for prediction and explain the output.

# index <- sample(dim(Glass)[1], 10, replace = TRUE)
# predict(fit, newdata = Glass[index, ], type = "response")  
## fit refers to your fitted model

Week 11 & 12 Lab

1. The data below are the times to death (\(y\)) and the log of the white blood cell count (\(x\)) for 17 patients suffering from leukaemia.

y <- c(3.36,2.88,3.63,3.41,3.78,4.02,4.00,4.23,3.73,3.85,3.97,4.51,4.54,
    5.00,5.00,4.72,5.00)
x <- c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,6)
  • Plot y against x. Is there a trend?
  • A possible model is \(E\left[Y_{j}\right]=1 /\left(\beta_{1}+\beta_{2} x_{j}\right)\). What is the link function in this case?
  • Assume an exponential error fit a model with your suggested link. (Hint: the exponential is a Gamma with scale parameter 1 so you will need to use a Gamma error, your chosen link and to set the dispersion to 1 . To do this compute the glm object, say G1, then use summary(G1,dispersion=1).)
  • Between exponential and Gamma, which distribution do you prefer? Why?

2. Analysis of the data on the number of plant species in Galapagos islands resulted in the following output.

library(GLMsData)
data("galapagos")
fit1 <- glm(Plants ~ Area + Elevation + Nearest + StCruz + Adjacent,family = poisson, data = galapagos)
summary(fit1)
## 
## Call:
## glm(formula = Plants ~ Area + Elevation + Nearest + StCruz + 
##     Adjacent, family = poisson, data = galapagos)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.328e+00  5.071e-02  65.623  < 2e-16 ***
## Area        -5.511e-04  2.579e-05 -21.369  < 2e-16 ***
## Elevation    3.357e-03  8.425e-05  39.846  < 2e-16 ***
## Nearest      8.743e-03  1.806e-03   4.841 1.29e-06 ***
## StCruz      -6.500e-03  6.359e-04 -10.221  < 2e-16 ***
## Adjacent    -6.304e-04  2.914e-05 -21.637  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3447.59  on 28  degrees of freedom
## Residual deviance:  700.48  on 23  degrees of freedom
## AIC: 868.29
## 
## Number of Fisher Scoring iterations: 5
anova(fit1,test="Chisq")
## Analysis of Deviance Table
## 
## Model: poisson, link: log
## 
## Response: Plants
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                         28     3447.6              
## Area       1   871.96        27     2575.6 < 2.2e-16 ***
## Elevation  1   789.47        26     1786.2 < 2.2e-16 ***
## Nearest    1    10.00        25     1776.2  0.001569 ** 
## StCruz     1   500.41        24     1275.8 < 2.2e-16 ***
## Adjacent   1   575.28        23      700.5 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Is there evidence of overdispersion? Explain your answer.
  • Suppose the variance is proportional to the mean rather than equal to the mean. Estimate the proportionality parameter using residual deviance.
  • Use this estimate to decide whether there is a significant effect of the distance from the nearest island.
  • What would be a better estimate of the proportionality parameter?