| Variable | Type | Description |
|---|---|---|
| \(y_i\) | Numeric. | number of cases (count data) |
| \(t_i\) | Numeric | time (years. ) |
y <- c(12,14,33,50,67,74,123,141,165,204,253,246,240)
t <- 1:13
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))
Since the response is count data, we use a Poisson model: \[ Y_i \sim \text{Poisson}(\mu_i), \quad \mu_i = \mathbb{E}[Y_i]. \] where \(Y_i\) is the number of new cases in year \(i\), and \(\mu_i\) is the expected (mean) number of new cases in that year.
To describe how the expected number of cases evolves over time, we introduce a continuous function \(\mu(t)\), which represents the expected number of new cases at time \(t\).
At the observed time points \(t_i\), we have \[ \mu_i = \mu(t_i) \] i.e., \(\mu_i\) is the value of \(\mu(t)\) at time \(t_i\).
We assume a simple epidemic mechanism: the rate of change of \(\mu(t)\) is proportional to the current level, leading to the differential equation: \[ \frac{d\mu(t)}{dt} = \delta \mu(t). \] where \(\delta\) represents the growth rate.
Solving this differential equation gives \[ \mu(t) = \gamma e^{\delta t}. \] where \(\gamma\) is a constant representing the initial level.
At observation times \(t_i\), we have \[ \mu_i = \gamma e^{\delta t_i}. \]
Taking logarithms, \[ \log(\mu_i) = \log(\gamma) + \delta t_i. \]
Let \[ \beta_0 = \log(\gamma), \quad \beta_1 = \delta, \] then we obtain the linear form \[ \log(\mu_i) = \beta_0 + \beta_1 t_i. \]
Thus, the Poisson regression model is \[ Y_i \sim \text{Poisson}(\mu_i), \quad \log(\mu_i) = \beta_0 + \beta_1 t_i. \]
This model describes how the incidence evolves over time.
Now we fit this Poisson regression model using time as a predictor and get:
m0 <- glm(y ~ t, poisson)
m0
##
## Call: glm(formula = y ~ t, family = poisson)
##
## Coefficients:
## (Intercept) t
## 3.1406 0.2021
##
## Degrees of Freedom: 12 Total (i.e. Null); 11 Residual
## Null Deviance: 872.2
## Residual Deviance: 80.69 AIC: 166.4
\[ \quad \log(\mu_i) = 3.1406+ 0.2021 t_i. \]
\[ \quad \mu_i = exp(3.1406+ 0.2021 t_i) \]
par(mfrow=c(2,2))
plot(m0)
Residuals vs Fitted: clear non-random pattern, indicating model
misspecification.
The curvature suggests that the relationship between time and the log
mean is not linear, and a quadratic term in time may be needed.
Residuals vs Leverage: presence of a potentially influential observation.
Overall, these diagnostics suggest that the current model is too simple and fails to capture the underlying trend in the data. In fact, the residual deviance is much larger than the degrees of freedom (80.686 vs 11), yielding a ratio of about 7, which suggests substantial overdispersion.
The previous model implies a constant proportional growth rate:
\[ \frac{d}{dt}\log \mu(t)=\frac{\mu'(t)}{\mu(t)}=\delta \]
For an epidemic, the growth rate can change over time. To allow this, we add a quadratic term \(t^2\), to make the growth rate time-dependent. \[ \log \mu(t)=\beta_0+\beta_1 t+\beta_2 t^2. \] Under this model, \[ \mu(t)=\exp(\beta_0+\beta_1 t+\beta_2 t^2), \]
the derivative becomes \[ \frac{d}{dt}\log \mu(t)=\beta_1+2\beta_2 t. \] The growth rate in \(m1\) is no longer constant; instead, it changes with time.
m1 <- glm(y ~ t + I(t^2), poisson)
summary(m1)
##
## Call:
## glm(formula = y ~ t + I(t^2), family = poisson)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.901459 0.186877 10.175 < 2e-16 ***
## t 0.556003 0.045780 12.145 < 2e-16 ***
## I(t^2) -0.021346 0.002659 -8.029 9.82e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 872.2058 on 12 degrees of freedom
## Residual deviance: 9.2402 on 10 degrees of freedom
## AIC: 96.924
##
## Number of Fisher Scoring iterations: 4
In m1, the expected number of new cases is modeled as
\[log(\mu_i) = 1.9015 + 0.5560 t_i - 0.02135
t_i^2\]
The intercept (1.9015) corresponds to an expected ~6–7 (\(exp(1.9015)=6.6959\)) cases at time zero.
the derivative \[ \beta_1+2\beta_2 t = 0.556 + 2(-0.0213)t\] decreases as \(t\) increases.
The coefficient of \(t\)
(0.5560) indicates a strong positive initial growth
effect.
At the start of the epidemic, the increase rate for the new cases is
approximately 74% (\(exp(0.556)\) =
1.74)
The coefficient of \(t^2\)
(-0.02135) shows that this growth rate decreases over time.
Specifically, the growth rate declines by about 0.0427 each year. So the
epidemic is still growing but at a slowing rate.
par(mfrow=c(2,2))
plot(m1)
Overall, these results suggest that m1 provides a much better fit than m0 and gives a more reasonable description of the data. Now, when checking for overdispersion, the residual deviance of 9.240 with 10 degrees of freedom gives a ratio of less than 1, indicating no evidence of overdispersion.
We want to test whether the quadratic term is needed. Because in a Poisson GLM the scale parameter is known and fixed at 1, the deviance difference between two models is approximately chi-square distributed under the null hypothesis.
anova(m0,m1,test="Chisq")
Conclusions: Therefore, we reject m0 in favor of m1. This shows that adding the quadratic term significantly improves the fit and helps capture the slowing growth pattern.
beta.1 <- summary(m1)$coefficients[2,]
ci <- c(beta.1[1]-1.96*beta.1[2],beta.1[1]+1.96*beta.1[2])
ci # print 95% CI for beta_1
## Estimate Estimate
## 0.4662750 0.6457316
To help us understand the trend and quantify the uncertainty in the estimates, we use the fitted model m1 to estimate the expected number of new cases over time (solid line), along with confidence intervals (dashed lines).
Since the model uses a log link, so we apply the inverse link function (\(exp()\)) to obtain values on the original scale.
new.t <- seq(1,13,length=100)
fv <- predict(m1,data.frame(t=new.t),se=TRUE)
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))
lines(new.t+1980,exp(fv$fit))
lines(new.t+1980,exp(fv$fit+2*fv$se.fit),lty=2)
lines(new.t+1980,exp(fv$fit-2*fv$se.fit),lty=2)