| Variable | Type | Description |
|---|---|---|
| \(y_i\) | Numeric. | number of cases (count data) |
| \(t_i\) | Numeric | time (years. ) |
y <- c(12,14,33,50,67,74,123,141,165,204,253,246,240)
t <- 1:13
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))
Since the response is count data, we use a Poisson model: \[ Y_i \sim \text{Poisson}(\mu_i), \quad \mu_i = \mathbb{E}[Y_i], \] where \(Y_i\) is the number of new cases in year \(i\).
We begin with the basic epidemic assumption that the rate of increase is proportional to the current level: \[ \frac{d\mu(t)}{dt} = \delta \mu(t). \]
Solving this differential equation gives \[ \mu(t) = \gamma e^{\delta t}. \]
At observation times \(t_i\), we have \[ \mu_i = \gamma e^{\delta t_i}. \]
Taking logarithms, \[ \log(\mu_i) = \log(\gamma) + \delta t_i. \]
Let \[ \beta_0 = \log(\gamma), \quad \beta_1 = \delta, \] then \[ \log(\mu_i) = \beta_0 + \beta_1 t_i. \]
Thus, the Poisson regression model is \[ Y_i \sim \text{Poisson}(\mu_i), \quad \log(\mu_i) = \beta_0 + \beta_1 t_i. \]
This model describes how the incidence evolves over time.
Now we fit this Poisson regression model using time as a predictor and get:
m0 <- glm(y ~ t, poisson)
m0
##
## Call: glm(formula = y ~ t, family = poisson)
##
## Coefficients:
## (Intercept) t
## 3.1406 0.2021
##
## Degrees of Freedom: 12 Total (i.e. Null); 11 Residual
## Null Deviance: 872.2
## Residual Deviance: 80.69 AIC: 166.4
\[ \quad \log(\mu_i) = 3.1406+ 0.2021 t_i. \]
\[ \quad \mu_i = exp(3.1406+ 0.2021 t_i) \]
coef(m0)
## (Intercept) t
## 3.1405895 0.2021212
This means the number of cases increases by about 22% per year (\(e^{0.2021}=1.224\)). The intercept correspond to the expected 23 cases at time zero (\(e^{3.1406}=23\)) This model assumes that the epidemic follows exponential growth, with a constant growth rate over time.
par(mfrow=c(2,2))
plot(m0)
Residuals vs Fitted: clear non-random pattern, indicating model
misspecification.
The curvature suggests that the relationship between time and the log
mean is not linear, and a quadratic term in time may be needed.
Residuals vs Leverage: presence of a potentially influential observation.
Overall, these diagnostics suggest that the current model is too simple and fails to capture the underlying trend in the data.
m1 <- glm(y ~ t + I(t^2), poisson)
plot(m1)
summary(m1)
##
## Call:
## glm(formula = y ~ t + I(t^2), family = poisson)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.901459 0.186877 10.175 < 2e-16 ***
## t 0.556003 0.045780 12.145 < 2e-16 ***
## I(t^2) -0.021346 0.002659 -8.029 9.82e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 872.2058 on 12 degrees of freedom
## Residual deviance: 9.2402 on 10 degrees of freedom
## AIC: 96.924
##
## Number of Fisher Scoring iterations: 4
anova(m0,m1,test="Chisq")
beta.1 <- summary(m1)$coefficients[2,]
ci <- c(beta.1[1]-1.96*beta.1[2],beta.1[1]+1.96*beta.1[2])
ci # print 95% CI for beta_1
## Estimate Estimate
## 0.4662750 0.6457316
new.t <- seq(1,13,length=100)
fv <- predict(m1,data.frame(t=new.t),se=TRUE)
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))
lines(new.t+1980,exp(fv$fit))
lines(new.t+1980,exp(fv$fit+2*fv$se.fit),lty=2)
lines(new.t+1980,exp(fv$fit-2*fv$se.fit),lty=2)