Summary

As we accumulate more data, the model has been moving steadily toward an earlier peak, at or near the present date. The model has always drifted in that direction, and in the past we have been skeptical, so we added priors to favor models with a later peak. However, as data has accumulated it has begun to overwhelm those priors, suggesting that it is time we begin to take these early peak scenarios more seriously.

Model description

This version of the model divides the counties into four categories, each with its own growth rate. Counties are assigned to categories based on their growth rates in the month of April. Although the categories are named for their average growth rates over the month (“ultra-high”, “high”, “low”, and “ultra-low”), the actual growth rates are fit to the data, and as we shall see, the best-fit rate for the “ultra-high” category is lower than the best-fit rate for the “high” category.

This version of the model also drops the starting date as a fitted parameter. Instead, the start of the infection in Fairfax County is fixed at \(t_0=30\) (i.e., 31 January 2020). The reason is that, because growth is exponential in the early stage of the model, changing the starting time has the same effect as changing \(I_0\), the number of starting cases: lowering \(I_0\) is equivalent to shifting \(t_0\) later, while raising \(I_0\) is equivalent to shifting \(t_0\) earlier. Eliminating the redundant parameter improves the performance of the optimization and MCMC calculations.
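The redundancy between \(t_0\) and \(I_0\) can be checked numerically. The sketch below assumes pure exponential growth with a fixed doubling time, which only approximates the early stage of the full model. It uses I0 = 17.5 and T0_hi = 6.7 from the parameter table; the function names and the 5-day shift are hypothetical, chosen only for illustration.

```python
def infections(t, i0, t0, doubling_time):
    """Early-stage infections: i0 cases at time t0, doubling every doubling_time."""
    return i0 * 2.0 ** ((t - t0) / doubling_time)


def rescaled_i0(i0, shift, doubling_time):
    """I0 at the original t0 that reproduces a start delayed by `shift` with i0."""
    return i0 * 2.0 ** (-shift / doubling_time)


# From the parameter table: I0 = 17.5, T0_hi = 6.7; t0 = 30 is the fixed start.
I0, T0_HI, T_START = 17.5, 6.7, 30.0
SHIFT = 5.0  # hypothetical delay, for illustration only

day = 60.0
# Option A: keep I0, start 5 days later.
shifted_start = infections(day, I0, T_START + SHIFT, T0_HI)
# Option B: keep t0, lower I0 by the matching factor.
lowered_i0 = infections(day, rescaled_i0(I0, SHIFT, T0_HI), T_START, T0_HI)

print(shifted_start, lowered_i0)  # the two trajectories coincide
```

Because the two options give the same curve at every time, the data cannot distinguish them while growth is exponential, which is why fitting both parameters adds nothing.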

The model parameters and the model-observation comparison plots are:

##   T0_uhi T0_hi T0_lo T0_ulo  D0  A0   I0  Ts    b
## 1    8.9   6.7    14   20.8 1.2 0.9 17.5 1.8 33.9

The T0 parameters are the initial doubling times for the four county categories (thus, they are inversely proportional to the growth rates). The D0 and A0 parameters are the average lengths of the infectious and incubation periods; note, however, that these parameters are very weakly identified. I0 is the initial number of infections, Ts is the average time to symptom onset, and b is the enrichment factor.
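To make the inverse relationship concrete, a doubling time \(T\) corresponds to an exponential growth rate \(r = \ln 2 / T\). The short sketch below applies that conversion to the four fitted doubling times from the table; the dictionary and function names are illustrative, and the time unit is assumed to be days.

```python
import math

# Doubling times copied from the parameter table (assumed to be in days).
doubling_times = {"ultra-high": 8.9, "high": 6.7, "low": 14.0, "ultra-low": 20.8}


def growth_rate(doubling_time):
    """Exponential growth rate r such that exp(r * doubling_time) == 2."""
    return math.log(2.0) / doubling_time


growth_rates = {name: growth_rate(t) for name, t in doubling_times.items()}
for name, rate in growth_rates.items():
    print(f"{name:>10}: r = {rate:.4f} per day")
```

The shorter doubling time of the “high” category translates into a larger growth rate than that of the “ultra-high” category, matching the ordering noted in the model description.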

Of the counties shown, Fairfax and Henrico are both in the “high” growth category. The rest are counties where the UVAHS has substantial market share. The model appears to do a good job of fitting the baseline growth, but it isn’t able to capture the 1- to 2-day spikes in reported cases in Buckingham and Fluvanna counties. (Note the different scales in each panel: the apparent spike in Madison County is only 2-3 extra cases.)

Model projections

This projection puts the maximum infection prevalence at 0.0072681 (about 0.73%) on 2020-03-20. That is substantially earlier than we have been assuming, but it’s something we have seen before. Early on, the data seemed to favor these kinds of scenarios, and because we were skeptical we introduced some additional priors to push the peak out to where we thought it should be. As new data has accumulated, it has started to overwhelm those priors, pushing the peak earlier. Because the incoming data continue to favor the earlier peak, it’s worth considering whether we are prepared to believe what the model is telling us.

Analysis of the modeling

Positive test fraction

After correcting for enrichment, the positive test fraction should track the infection prevalence. We can compare these two ratios in the statewide data.
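The exact form of the enrichment correction is not spelled out here. As a rough sketch, suppose enrichment is a simple multiplicative factor, so that the expected positive test fraction is \(b\) times the prevalence. That form is an assumption for illustration and may differ from what the model actually uses. The function names are hypothetical; b = 33.9 comes from the parameter table, and the peak prevalence from the projections is used only to give a sense of scale.

```python
def expected_positive_fraction(prevalence, enrichment):
    """Positive test fraction under an ASSUMED multiplicative enrichment, capped at 1."""
    return min(1.0, enrichment * prevalence)


def implied_prevalence(positive_fraction, enrichment):
    """Invert the assumed relation: prevalence implied by a positive test fraction."""
    return positive_fraction / enrichment


B = 33.9                     # enrichment factor b from the parameter table
PEAK_PREVALENCE = 0.0072681  # peak prevalence quoted in the projections

peak_positive_fraction = expected_positive_fraction(PEAK_PREVALENCE, B)
print(f"positive fraction at peak: {peak_positive_fraction:.3f}")
print(f"round trip prevalence:     {implied_prevalence(peak_positive_fraction, B):.7f}")
```

Under this assumed form, a constant \(b\) means the two curves differ only by a scale factor, so their shapes and turning points can be compared directly.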

The first thing to notice is that the observations are significantly overdispersed. The error bars represent 95% CI ranges based on a beta distribution appropriate for the nominal positive and negative test counts. The day-to-day variation in the measured fraction is far outside the expected statistical variability. The explanation for this is likely that many of the tests are not performed independently, but rather in batches. Thus, the effective number of tests for statistical counting purposes is much smaller than the nominal count.
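The effect of batching on the error bars can be illustrated with a small simulation. The sketch below estimates a central 95% interval by sampling from a Beta(positives + 1, negatives + 1) distribution; the "+ 1" (a uniform prior) is an assumption, since the exact parameterization is not stated above. The counts and the batch size of 10 are made-up values, not taken from the data.

```python
import random


def beta_interval(positives, negatives, level=0.95, draws=20000, seed=0):
    """Monte Carlo central interval for Beta(positives + 1, negatives + 1)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    samples = sorted(
        rng.betavariate(positives + 1, negatives + 1) for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical day: 200 positive and 800 negative tests (illustrative only).
nominal = beta_interval(200, 800)
# Same positive fraction, but tests arrive in batches of 10 (hypothetical),
# so the effective counts are ten times smaller.
effective = beta_interval(20, 80)

print("nominal counts:  ", nominal)
print("effective counts:", effective)
```

With the smaller effective counts the interval is roughly three times wider, which is the direction needed to bring the error bars in line with the observed day-to-day scatter.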

There is a clear increasing trend through March and the first week of April, and the model predictions seem to track this well, vindicating the assumption of a constant enrichment factor, at least through those early stages. The data seem to get noisier starting in the second week of April, making it harder to evaluate the model’s claim that the positive test fraction has started to decline. The claim seems plausible though. Certainly, it’s hard to see support for the prevalence continuing to increase.

Hospitalization fraction

How does the implied hospitalization fraction compare to our expectations? To evaluate this properly we really need to use the census model, so as to account for the time delay between developing symptoms and being hospitalized. However, we can make a crude approximation by assuming that the delay is zero.

So, the hospitalization fraction implied by this model seems to be closer to about half a percent than to the 3% we had previously assumed.
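The zero-delay shortcut has a predictable bias while cases are still growing, which is one reason the census model is needed for a proper comparison. As a sketch, assume new cases grow exponentially with a fixed doubling time and that admissions are a fixed fraction of the cases from some number of days earlier. Under those assumptions, dividing same-day admissions by same-day cases understates the true fraction by a factor of \(2^{-d/T}\). The 7-day delay and all names below are hypothetical; 6.7 is T0_hi from the table and 3% is the previously assumed fraction.

```python
def new_cases(day, doubling_time):
    """Synthetic new-case curve under pure exponential growth (arbitrary scale)."""
    return 2.0 ** (day / doubling_time)


def zero_delay_ratio(true_fraction, doubling_time, delay_days):
    """Same-day admissions/cases ratio when admissions lag cases by delay_days."""
    return true_fraction * 2.0 ** (-delay_days / doubling_time)


TRUE_FRACTION = 0.03   # the 3% previously assumed, per the text
DOUBLING_TIME = 6.7    # T0_hi from the parameter table
DELAY_DAYS = 7.0       # hypothetical symptom-to-admission delay

day = 40.0
# Admissions today come from cases DELAY_DAYS ago.
admissions_today = TRUE_FRACTION * new_cases(day - DELAY_DAYS, DOUBLING_TIME)
crude_fraction = admissions_today / new_cases(day, DOUBLING_TIME)

print(f"crude zero-delay estimate: {crude_fraction:.4f}")
print(f"closed form:               {zero_delay_ratio(TRUE_FRACTION, DOUBLING_TIME, DELAY_DAYS):.4f}")
```

This applies only during exponential growth. Near the peak the bias shrinks, and once cases are declining it reverses, so the sketch does not by itself say how far the half-percent estimate is from the true value.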

Infectious duration

Coming soon…