Introduction

We are interested in modeling how the number of AIDS cases evolves over time。

The key question is: Is the growth slowing down?

Data Description

Variable Type Description
\(y_i\) Numeric. number of cases (count data)
\(t_i\) Numeric time (years. )
y <- c(12,14,33,50,67,74,123,141,165,204,253,246,240)
t <- 1:13
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))

Since the response is count data, we use a Poisson model: \[ Y_i \sim \text{Poisson}(\mu_i), \quad \mu_i = \mathbb{E}[Y_i], \] where \(Y_i\) is the number of new cases in year \(i\).

We begin with the basic epidemic assumption that the rate of increase is proportional to the current level: \[ \frac{d\mu(t)}{dt} = \delta \mu(t). \]

Solving this differential equation gives \[ \mu(t) = \gamma e^{\delta t}. \]

At observation times \(t_i\), we have \[ \mu_i = \gamma e^{\delta t_i}. \]

Taking logarithms, \[ \log(\mu_i) = \log(\gamma) + \delta t_i. \]

Let \[ \beta_0 = \log(\gamma), \quad \beta_1 = \delta, \] then \[ \log(\mu_i) = \beta_0 + \beta_1 t_i. \]

Thus, the Poisson regression model is \[ Y_i \sim \text{Poisson}(\mu_i), \quad \log(\mu_i) = \beta_0 + \beta_1 t_i. \]

This model describes how the incidence evolves over time.

Now we fit this Poisson regression model using time as a predictor and get:

m0 <- glm(y ~ t, poisson)
m0
## 
## Call:  glm(formula = y ~ t, family = poisson)
## 
## Coefficients:
## (Intercept)            t  
##      3.1406       0.2021  
## 
## Degrees of Freedom: 12 Total (i.e. Null);  11 Residual
## Null Deviance:       872.2 
## Residual Deviance: 80.69     AIC: 166.4
So the model can be written as

\[ \quad \log(\mu_i) = 3.1406+ 0.2021 t_i. \]

And it becomes

\[ \quad \mu_i = exp(3.1406+ 0.2021 t_i) \]

Summary: This implies that the mean number of cases grows exponentially over time.
coef(m0)
## (Intercept)           t 
##   3.1405895   0.2021212

This means the number of cases increases by about 22% per year (\(e^{0.2021}=1.224\)). The intercept correspond to the expected 23 cases at time zero (\(e^{3.1406}=23\)) This model assumes that the epidemic follows exponential growth, with a constant growth rate over time.

par(mfrow=c(2,2))
plot(m0)

Model diagnostics
  • Residuals vs Fitted: clear non-random pattern, indicating model misspecification.
    The curvature suggests that the relationship between time and the log mean is not linear, and a quadratic term in time may be needed.

  • Residuals vs Leverage: presence of a potentially influential observation.

Overall, these diagnostics suggest that the current model is too simple and fails to capture the underlying trend in the data.

Therefore, let’s consider the following model:
m1 <- glm(y ~ t + I(t^2), poisson)
plot(m1)

summary(m1)
## 
## Call:
## glm(formula = y ~ t + I(t^2), family = poisson)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.901459   0.186877  10.175  < 2e-16 ***
## t            0.556003   0.045780  12.145  < 2e-16 ***
## I(t^2)      -0.021346   0.002659  -8.029 9.82e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 872.2058  on 12  degrees of freedom
## Residual deviance:   9.2402  on 10  degrees of freedom
## AIC: 96.924
## 
## Number of Fisher Scoring iterations: 4
anova(m0,m1,test="Chisq")
beta.1 <- summary(m1)$coefficients[2,]
ci <- c(beta.1[1]-1.96*beta.1[2],beta.1[1]+1.96*beta.1[2])
ci # print 95% CI for beta_1
##  Estimate  Estimate 
## 0.4662750 0.6457316
new.t <- seq(1,13,length=100)
fv <- predict(m1,data.frame(t=new.t),se=TRUE)
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))
lines(new.t+1980,exp(fv$fit))
lines(new.t+1980,exp(fv$fit+2*fv$se.fit),lty=2)
lines(new.t+1980,exp(fv$fit-2*fv$se.fit),lty=2)