Introduction

Background: we are interested in modeling how the number of AIDS cases, in Belgium, from 1981 onwards, evolves over time.

Data Description

Variable Type Description
\(y_i\) Numeric. number of cases (count data)
\(t_i\) Numeric time (years. )
y <- c(12,14,33,50,67,74,123,141,165,204,253,246,240)
t <- 1:13
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))

Summary: From the resulting plot, the key scientific question is whether the data provide evidence that the rate of increase in new cases is slowing down.

Model Specification

Since the response is count data, we use a Poisson model: \[ Y_i \sim \text{Poisson}(\mu_i), \quad \mu_i = \mathbb{E}[Y_i]. \] where \(Y_i\) is the number of new cases in year \(i\), and \(\mu_i\) is the expected (mean) number of new cases in that year.

To describe how the expected number of cases evolves over time, we introduce a continuous function \(\mu(t)\), which represents the expected number of new cases at time \(t\).

At the observed time points \(t_i\), we have \[ \mu_i = \mu(t_i) \] i.e., \(\mu_i\) is the value of \(\mu(t)\) at time \(t_i\).

We assume a simple epidemic mechanism: the rate of change of \(\mu(t)\) is proportional to the current level, leading to the differential equation: \[ \frac{d\mu(t)}{dt} = \delta \mu(t). \] where \(\delta\) represents the growth rate.

Solving this differential equation gives \[ \mu(t) = \gamma e^{\delta t}. \] where \(\gamma\) is a constant representing the initial level.

At observation times \(t_i\), we have \[ \mu_i = \gamma e^{\delta t_i}. \]

Taking logarithms, \[ \log(\mu_i) = \log(\gamma) + \delta t_i. \]

Let \[ \beta_0 = \log(\gamma), \quad \beta_1 = \delta, \] then we obtain the linear form \[ \log(\mu_i) = \beta_0 + \beta_1 t_i. \]

Thus, the Poisson regression model is \[ Y_i \sim \text{Poisson}(\mu_i), \quad \log(\mu_i) = \beta_0 + \beta_1 t_i. \]

This model describes how the incidence evolves over time.

Fit the Model (m0)

Now we fit this Poisson regression model using time as a predictor and get:

m0 <- glm(y ~ t, poisson)
m0
## 
## Call:  glm(formula = y ~ t, family = poisson)
## 
## Coefficients:
## (Intercept)            t  
##      3.1406       0.2021  
## 
## Degrees of Freedom: 12 Total (i.e. Null);  11 Residual
## Null Deviance:       872.2 
## Residual Deviance: 80.69     AIC: 166.4
So the model can be written as

\[ \quad \log(\mu_i) = 3.1406+ 0.2021 t_i. \]

Which is equivalent to

\[ \quad \mu_i = exp(3.1406+ 0.2021 t_i) \]

Summary: This implies that the mean number of cases grows exponentially over time.
Interpretation:
  • This means the number of cases increases by about 22% per year (\(e^{0.2021}=1.224\)).
  • The intercept correspond to the expected 23 cases at time zero (\(e^{3.1406}=23\)). This model assumes that the epidemic follows exponential growth, with a constant growth rate over time.

Model diagnostics (m0)

par(mfrow=c(2,2))
plot(m0)

  • Residuals vs Fitted: clear non-random pattern, indicating model misspecification.
    The curvature suggests that the relationship between time and the log mean is not linear, and a quadratic term in time may be needed.

  • Residuals vs Leverage: presence of a potentially influential observation.

Overall, these diagnostics suggest that the current model is too simple and fails to capture the underlying trend in the data.

Why adding a quadratic term?

The previous model implies a constant proportional growth rate:

\[ \frac{d}{dt}\log \mu(t)=\frac{\mu'(t)}{\mu(t)}=\delta \]

For an epidemic, the growth rate can change over time. To allow this, we add a quadratic term \(t^2\), to make the growth rate time-dependent. \[ \log \mu(t)=\beta_0+\beta_1 t+\beta_2 t^2. \] Under this model, \[ \mu(t)=\exp(\beta_0+\beta_1 t+\beta_2 t^2), \]

the derivative becomes \[ \frac{d}{dt}\log \mu(t)=\beta_1+2\beta_2 t. \] The growth rate in \(m1\) is no longer constant; instead, it changes with time.

m1 <- glm(y ~ t + I(t^2), poisson)
summary(m1)
## 
## Call:
## glm(formula = y ~ t + I(t^2), family = poisson)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.901459   0.186877  10.175  < 2e-16 ***
## t            0.556003   0.045780  12.145  < 2e-16 ***
## I(t^2)      -0.021346   0.002659  -8.029 9.82e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 872.2058  on 12  degrees of freedom
## Residual deviance:   9.2402  on 10  degrees of freedom
## AIC: 96.924
## 
## Number of Fisher Scoring iterations: 4
Interpretation:
  • In m1, the expected number of new cases is modeled as
    \[log(\mu_i) = 1.9015 + 0.5560 t_i - 0.02135 t_i^2\]

  • The intercept (1.9015) corresponds to an expected ~6–7 (\(exp(1.9015)=6.6959\)) cases at time zero.

  • the derivative \[ \beta_1+2\beta_2 t = 0.556 + 2(-0.0213)t\] decreases as \(t\) increases.

  • The coefficient of \(t\) (0.5560) indicates a strong positive initial growth effect.
    At the start of the epidemic, the increase rate for the new cases is approximately 74% (\(exp(0.556)\) = 1.74)

  • The coefficient of \(t^2\) (-0.02135) shows that this growth rate decreases over time.
    Specifically, the growth rate declines by about 0.0427 each year. So the epidemic is still growing but at a slowing rate.

Residual plots for m1 fitted to the AIDS data

par(mfrow=c(2,2))
plot(m1)

Overall, these results suggest that m1 provides a much better fit than m0 and gives a more reasonable description of the data.

Model Comparison

We want to test whether the quadratic term is needed. Because in a Poisson GLM the scale parameter is known and fixed at 1, the deviance difference between two models is approximately chi-square distributed under the null hypothesis.

anova(m0,m1,test="Chisq")

Conclusions: Therefore, we reject m0 in favor of m1. This shows that adding the quadratic term significantly improves the fit and helps capture the slowing growth pattern.

Interpretation of \(\beta_1\)

beta.1 <- summary(m1)$coefficients[2,]
ci <- c(beta.1[1]-1.96*beta.1[2],beta.1[1]+1.96*beta.1[2])
ci # print 95% CI for beta_1
##  Estimate  Estimate 
## 0.4662750 0.6457316

After model comparison

To help us understand the trend and quantify the uncertainty in the estimates, we use the fitted model m1 to estimate the expected number of new cases over time (solid line), along with confidence intervals (dashed lines).

Since the model uses a log link, so we apply the inverse link function (\(exp()\)) to obtain values on the original scale.

new.t <- seq(1,13,length=100)
fv <- predict(m1,data.frame(t=new.t),se=TRUE)
plot(t+1980,y,xlab="Year",ylab="New AIDS cases",ylim=c(0,280))
lines(new.t+1980,exp(fv$fit))
lines(new.t+1980,exp(fv$fit+2*fv$se.fit),lty=2)
lines(new.t+1980,exp(fv$fit-2*fv$se.fit),lty=2)

Conclusion:

  • The fitted curve follows the data reasonably well.
  • The rate of increase appears to be slowing down over time.
  • Good fit, but not a causal model.
  • Future direction: SIR/SEIR model to explain epidemic mechanism.