Updated on: 2020-08-10
Good background on Gompertz curves can be found at https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0178691&type=printable. A quick Youtube introduction is at https://youtu.be/0ifT-7K68sk. R code used for an example application of tumor growth modelling is at https://rpubs.com/mathetal/gompertz.
A “Gompertz” equation can be written in two forms that are equivalent from a data fitting perspective. I look at two forms. Form 1: \[f(t) = a * exp(-b * exp(-c*t))\], the three parameters to be estimated are a, b and c. Term t represents time. Parameter a is the upper asymptote. In the context of Covid cumulative death data, one way parameter a can be conceptualized is as an estimate of the population with inherent characteristics that will result in their death from Covid.
Form 2: \[f(t) = a*exp(-exp(-b*(d-Ti)))\] In this form parameter a is the same (as we will see below), , d is the current day number, and parameter b is…
“…a growth-rate coefficient (which affects the slope), and Ti represents time at inflection. The Ti parameter shifts the growth curve horizontally without changing its shape and is therefore what is often termed a location parameter (whereas A and [b] are shape parameters)….” - Tjørve KMC, Tjørve E (2017) The use of Gompertz models in growth analyses, and new Gompertz-model approach: An addition to the Unified-Richards family. PLoS ONE 12(6): e0178691.
Form 2 is easier to interpret, because Ti is interpreted in a straightforward way as the date at which the growth rate begins to slow.
My R code (included some additional analyses not shown here) is freely avaialble at Github
Note: Total deaths from covidtracking.com tends to lag those reported by J. Hopkins website.
u1<-stLevel[,lapply(.SD, sum,na.rm=TRUE),by=Date,.SDcols=c("death","positive","total","positiveIncrease","totalTestResultsIncrease","deathIncrease","hospitalized","hospitalizedCurrently")]
u1<-u1[order(Date),]
# Omit fragmentary early data when cum deaths < 50
u11<-u1[death>50,]
data1<- data.frame(seq(1:nrow(u11)),u11$death)
colnames(data1)<-c("d","cd")
library(easynls)
#fit data to the Gompertz model form 1
mod10<-nlsfit(data1, model=10, start = c(a=150000,b=11,c=0.1))
mod10
## $Model
## [1] "y~a*exp(-b*exp(-c*x)"
##
## $Parameters
## cd
## coefficient a 1.456911e+05
## coefficient b 4.810877e+00
## coefficient c 3.184994e-02
## p-value t.test for a 0.000000e+00
## p-value t.test for b 0.000000e+00
## p-value t.test for c 0.000000e+00
## r-squared 9.893000e-01
## adjusted r-squared 9.891000e-01
## AIC 3.036147e+03
## BIC 3.048242e+03
#fit data to Gompertz model form 2
library(minpack.lm)
modA<-nlsLM(cd~a*exp(-exp(-b*(d-Ti))),data=data1,start=list(a=150000,b=4,Ti=20))
summary(modA)
##
## Formula: cd ~ a * exp(-exp(-b * (d - Ti)))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.457e+05 1.497e+03 97.29 <2e-16 ***
## b 3.185e-02 9.294e-04 34.27 <2e-16 ***
## Ti 4.932e+01 5.815e-01 84.82 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5177 on 149 degrees of freedom
##
## Number of iterations to convergence: 19
## Achieved convergence tolerance: 1.49e-08
data2<- data.frame(cdc[,wkNum],cdc[,totalCovid])
colnames(data2)<-c("wkNum","totalCovid")
#fit CDC date of death weekly data
#omit last 2 weeks of data because of significant lags in reporting to the CDC
modCDC<-nlsLM(totalCovid~a*exp(-exp(-b*(wkNum-Ti))),data=data2[1:(nrow(data2)-2),],start=list(a=150000,b=0.1,Ti=8))
## Lowdon section
# # modCDC<-nlsLM(totalCovid~a*exp(-exp(-b*(wkNum-Ti))),data=data2[1:(nrow(data2)),],start=list(a=100000,b=0.1,Ti=8))
# summary(modCDC)
cdcMod<-unlist(summary(modCDC))
gomp<-nlsplot(data=data1, model = 10, start = c(a = 150000, b = 11, c = .1),
xlab = "Days since Deaths>50, Gompertz Eq. " , ylab = "Cum Covid Deaths", position = 1)
CDC date of death data Gompertz parameters indicates a similar peak between 4/11/2020 and 4/18/2020, and an estimated total of 138,007 Covid deaths. The issue with CDC date of death data is the significant lag of reporting which gets revised upward over the next 5 - 7 weeks.
Another parameterized equation form that some have proposed to model infections is a “Graph Equation” or “Network Equation” (see https://twitter.com/JasonC88766481/status/1269991092507652098). These equations are used to model something propagated through a network of cluster nodes. One proposed Graph Equation fit to daily deaths is \[K*(t)^q * exp(-(t)/z)\] where t is time, and K, q and z are estimated parameters. This equation is fit to daily deaths.
This can be compared to the first derivative of a Gompertz curve fit to cumulative deaths vs. time. The first derivative of Form 2 of the Gompertz equation is \[a*b*exp(-exp(-b*(t-To))-b*(t-To))\], where again t is time and parameters a, b, and To are estimated.
While these equations are different, they are both 3 parameter equations with declining growth over time. As shown on the graph above, they produce very similar fits to daily deaths. The total forecasted cumulative number of Covid deaths from the Gompertz equation is 145,691, while the forecasted cumulative from the Graph equation is 159,975. Since I am using covidtracking.com for fitting, these can be compared to their currently reported total deaths of 154,947 (this is lower than numbers reported by J. Hopkins).
#Look at estimates of parameter a over the last 30 days at 5 day intervals
priorDates<-data.frame(seq((nrow(data1)-40),nrow(data1),5))
priorDates$CumEstimate<-0
priorDates$AdjRSquared<-0
priorDates$ParamB<-0
priorDates$ParamC<-0
for (i in 1:nrow(priorDates)) {
mod10<-nlsfit(data1[1:priorDates[i,1],], model=10, start = c(a=90000,b=8,c=0.05))
priorDates$CumEstimate[i]<-unlist(mod10)[2]
priorDates$AdjRSquared[i]<-unlist(mod10)[8]
priorDates$ParamB[i]<-unlist(mod10)[3]
priorDates$ParamC[i]<-unlist(mod10)[4]
}
priorDates[,1]<-priorDates[,1]-max(priorDates[,1])
priorDates[,2]<-comma(round(as.numeric(priorDates[,2]),0),digits = 0)
priorDates[,3]<-percent(priorDates[,3])
priorDates[,4]<-comma(priorDates[,4],2)
priorDates[,5]<-comma(priorDates[,5],4)
colnames(priorDates) <- c("Days Ago","Cum Covid Death Est.","Adj. R-Squared","Parameter b", "Parameter c")
priorDates$pctChangeCumCovid<-percent(max(priorDates$`Cum Covid Death Est.`)/priorDates$`Cum Covid Death Est.`-1,digits = 1)
priorDates
## Days Ago Cum Covid Death Est. Adj. R-Squared Parameter b Parameter c
## 1 -40 123,519 99.88% 6.90 0.0444
## 2 -35 124,965 99.86% 6.72 0.0434
## 3 -30 126,652 99.83% 6.52 0.0423
## 4 -25 128,580 99.78% 6.29 0.0411
## 5 -20 130,880 99.71% 6.03 0.0397
## 6 -15 133,693 99.58% 5.74 0.0380
## 7 -10 137,136 99.40% 5.43 0.0360
## 8 -5 141,125 99.18% 5.12 0.0340
## 9 0 145,691 98.93% 4.81 0.0319
## pctChangeCumCovid
## 1 18.0%
## 2 16.6%
## 3 15.0%
## 4 13.3%
## 5 11.3%
## 6 9.0%
## 7 6.2%
## 8 3.2%
## 9 0.0%
We can see that cumulative Covid death estimate using this technique has changed by 15.0% over the last 30 days (since 2020-07-11) despite all fits having high adjusted R-squared values. The estimate does appear to be converging however with smaller changes as time passes. This seems consistent with the virus spreading more widely throughout the US, ultimately providing a more stable national estimate.
For comparison the widely cited IHME Univ. of Washington model (https://ihmecovid19storage.blob.core.windows.net/archive/2020-04-29/ihme-covid19.zip) forecasted 72,000 Covid deaths on April 29, and then revised their forecast to 134,000 on May 4th.
mda<-data.frame(seq((nrow(data1)-40),nrow(data1),5))
mda$CumEstimate<-0
mda$ParamB<-0
mda$ParamTo<-0
for (i in 1:nrow(mda)) {
moda<-summary(nlsLM(cd~a*exp(-exp(-b*(d-To))),data=data1[1:mda[i,1],],start=list(a=90000,b=0.05,To=30)))
mda$CumEstimate[i]<-moda$parameters[1]
mda$ParamB[i]<-moda$parameters[2]
mda$ParamTo[i]<-moda$parameters[3]
}
mda[,1]<-mda[,1]-max(mda[,1])
mda[,2]<-comma(round(as.numeric(mda[,2]),0),digits = 0)
mda[,3]<-percent(mda[,3])
mda[,4]<-comma(mda[,4],1)
colnames(mda) <- c("Days Ago","Cum Covid Death Est.","Parameter b", "Parameter Ti")
mda$pctChangeCumCovid<-percent(max(mda$`Cum Covid Death Est.`)/mda$`Cum Covid Death Est.`-1,digits = 1)
mda
## Days Ago Cum Covid Death Est. Parameter b Parameter Ti pctChangeCumCovid
## 1 -40 123,519 4.44% 43.5 18.0%
## 2 -35 124,965 4.34% 43.9 16.6%
## 3 -30 126,652 4.23% 44.3 15.0%
## 4 -25 128,580 4.11% 44.7 13.3%
## 5 -20 130,880 3.97% 45.3 11.3%
## 6 -15 133,693 3.80% 46.0 9.0%
## 7 -10 137,136 3.60% 46.9 6.2%
## 8 -5 141,125 3.40% 48.0 3.2%
## 9 0 145,691 3.18% 49.3 0.0%
plot(mda$`Days Ago`,mda$`Cum Covid Death Est.`,type='b',ylab="Final Total Covid Death Est.",xlab="Est. Made X Days Ago",col='red',ylim = c(50000,150000))
a1<-nlsLM(cd~a*exp(-exp(-b*(d-To))),data=data1[1:(nrow(data1)-40),],start=list(a=90000,b=0.05,To=30))
a2<-nlsLM(cd~a*exp(-exp(-b*(d-To))),data=data1,start=list(a=90000,b=0.05,To=30))
plot(predict(a2,newdata = data1),type='l',col='blue',xlab = "Days Since Deaths>50",ylab="Cum. Covid Deaths", main="Blue line = Curr. Gompertz Est., Red Line = Est. Made 40 days ago")
lines(predict(a1,newdata=data1),col='red')
Form 2 model parameters have changed in the following ways:
It is difficult to relate these changes to increasing (and now decreasing) degrees of lockdown/social distancing. While the slowing growth rate estimate and the extension of the inflection date could be seen as compatible with increasing lockdown implementations, the growing estimate of cumulative deaths doesn’t fit a lockdown hypothesis.
I believe a better hypothesis is that the initial death estimates were dominated by the large number of deaths coming from dense, highly populated cities (e.g. in NY and NJ). The initial Gompertz estimates would have reflected the growth and “carrying capacity” in that sub-system. Since then Covid has spread to interior states/more rural & suburban areas. For example, on April 8th 49% of the US population aged 55 and older lived in counties with 5 or fewer reported Covid deaths. On June 2nd, that percentage had dropped to approximately 15%.
Thus from Covid’s perspective, it found a “larger container” as it expanded beyond larger cities. That larger container was inherently harder to spread in. Finding the larger container meant cumulative death estimates would rise, growth rates would decline, and the date of growth rate inflection was extended.
More generally, each state’s experience (as shown below) has been unique in terms of:
## Warning: Removed 607 row(s) containing missing values (geom_path).
We can fit Gompertz curves to each state individually. The chart below shows the resulting dispersion of coefficients for Form 2 of the Gompertz equation. Term b, the growth coefficient estimates and Ti (days to growth inflection point) are the x-y coordinates. Term a, the asymptote, is used to calculate the “projDeathRate” which is term a divided by that state’s population aged 55 and older.
## `geom_smooth()` using formula 'y ~ x'
As suspected given what we know about the timing and distribution of Covid deaths in the US, curves for New York and New Jersey have a higher growth parameter than one estimated on all other states combined. Similarly the growth inflection date for New York and New Jersey is sooner than ones estimated for states with lower population densities
Finally, the total Covid death estimate is higher by combining all individual state Gompertz curves vs. the estimate from the single All-US model estimate (104,002,029 vs. 145,691).
I suspect the total forecast using 50 individual state models is less stable, but more accurate (as it captures slower growth curves in some states) than a single US model. Ten days ago the total Covid death estimate using individual state models was 17,213,572 (-83% different).
aa<-head(sumTab[Date>=Sys.Date(),.(Date,round(dailyDeath,0))],30)
colnames(aa)<-c("Date","Daily Covid Deaths")
kable(aa, table.attr = "style='width:50%;'")
Date | Daily Covid Deaths |
---|---|
2020-08-10 | 686 |
2020-08-11 | 691 |
2020-08-12 | 697 |
2020-08-13 | 702 |
2020-08-14 | 708 |
2020-08-15 | 715 |
2020-08-16 | 721 |
2020-08-17 | 727 |
2020-08-18 | 734 |
2020-08-19 | 741 |
2020-08-20 | 748 |
2020-08-21 | 755 |
2020-08-22 | 763 |
2020-08-23 | 771 |
2020-08-24 | 778 |
2020-08-25 | 786 |
2020-08-26 | 795 |
2020-08-27 | 803 |
2020-08-28 | 812 |
2020-08-29 | 820 |
2020-08-30 | 829 |
2020-08-31 | 838 |
2020-09-01 | 848 |
2020-09-02 | 857 |
2020-09-03 | 867 |
2020-09-04 | 877 |
2020-09-05 | 887 |
2020-09-06 | 897 |
2020-09-07 | 908 |
2020-09-08 | 918 |