Delta method for predictive models

The Delta method is very useful in predictive modeling, as it enables the calculation of 95% confidence intervals for predicted values in regression models. Interestingly, it can also be applied to determine a 95% confidence interval for probabilities. For instance, in a logistic regression model, the probability is a function of coefficients, allowing us to calculate the variance of this function and subsequently derive a 95% confidence interval for the probability (as discussed in reference [1,2]).

In this note, I will demonstrate how to employ the Delta method to compute a 95% confidence interval for predicted values in a simple linear regression.

To begin, let’s explore the univariate Delta method, assuming there is a sequence of random variables, \(X_n\), satisfying:

\[\sqrt{n}(X_n-\theta) \overset{D}{\rightarrow}\mathcal{N}(0,\sigma^2) \tag{1}\] Then it can be shown that:

\[\sqrt{n}(g(X_n)-g(\theta)) \overset{D}{\rightarrow}\mathcal{N}(0,[g'(\theta)]^2.\sigma^2)\tag{2}\] The proof is to use the first order of Taylor series, and we expand \(g(X_n)\) about the \(\theta\):

\[g(X_n)=g(\theta)+g'(\theta)(X_n-\theta) \tag{3}\] Rearrange \((3)\) we have

\[g(X_n)-g(\theta)=g'(\theta)(X_n-\theta)\tag{4}\] Times \(\sqrt{n}\) from both sides of \((4)\) we get:

\[\sqrt{n}(g(X_n)-g(\theta))=g'(\theta)\sqrt{n}(X_n-\theta)\tag{5}\] From \((1)\) we see that \(\sqrt{n}(X_n-\theta)\) has a \(\mathcal{N}(0,\sigma^2)\) distribution, and remember \(Var(aX)=a^2Var(X)\), where \(a\) is an constant. We treat \(g'(\theta)\) as a constant,therefore,

\[\sqrt{n}(g(X_n)-g(\theta)) \overset{D}{\rightarrow}\mathcal{N}(0,[g'(\theta)]^2.\sigma^2)\] which is equation \((2)\)

We can do some further rearrangements of equation \((2)\), and we get:

\[g(X_n)\overset{D}{\rightarrow}\mathcal{N}(g(\theta),\frac{[g'(\theta)]^2.\sigma^2}{n})\tag{6}\] For multivariate Delta method we have:

\[G(X_n) \approx G(\theta) + \nabla G(\theta)^T (X-\theta)\] This is similar as \((3)\)

Then the variance of \(G(X_n)\) is:

\[Var(G(X_n))\approx \nabla G(\theta)^T Cov(X_n) \nabla G(\theta) \tag{7}\]

This corresponds to the univariate variance as shown in Equation \((6)\).

To calculate the 95% confidence intervals for predicted values from a regression model, we will apply Equation \((7)\).

In the following example, I will use a straightforward linear regression model to illustrate how the Delta method can be used to compute 95% confidence intervals for predicted values from the regression model.

For our demonstration, we will utilize the “elemapi2v2.csv” dataset. This dataset was created by randomly sampling 400 elementary schools from the California Department of Education’s API (Academic Performance Index) 2000 dataset. It includes a measure of school academic performance, along with various other attributes of the elementary schools, such as class size, enrollment, socioeconomic status, and more.

d <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2019/02/elemapi2v2.csv")# read in data file
m1 <- lm(api00 ~ enroll, data = d) # run regression api00(academic performance) on number of students (enroll)
summary(m1)

## 
## Call:
## lm(formula = api00 ~ enroll, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -285.50 -112.55   -6.70   95.06  389.15 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 744.25141   15.93308  46.711  < 2e-16 ***
## enroll       -0.19987    0.02985  -6.695 7.34e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 135 on 398 degrees of freedom
## Multiple R-squared:  0.1012, Adjusted R-squared:  0.09898 
## F-statistic: 44.83 on 1 and 398 DF,  p-value: 7.339e-11

We can write regression model as:

\[y=b_0+b_1*enroll\] where \(b_0=744.254\) and \(b_1=-0.19987\), “enroll” represents the number of students enrolled in a school.

Suppose we want to predict API of a school with an enrollment size of 500, we can use the following model to predict:

\[\hat{y}=\hat{b}_0+\hat{b}_1*500=744.254-0.19987*500=644.32\] Next we need to calculate the 95% CI of \(\hat{y}\) using Delta method. Here, \(\hat{y}\) is a function of both \(\hat{b}_0\) and \(\hat{b}_1\)

Therefore, \[\Delta=(\frac{\partial \hat{y}}{\partial \hat{b}_0},\frac{\partial \hat{y}}{\partial \hat{b}_1})=(1,500)\]

The variance-covariance matrix of \(\hat{b}_0\) and \(\hat{b}_1\) can be directly obtained from the regression model

vb<-vcov(m1)
vb

##             (Intercept)       enroll
## (Intercept) 253.8629636 -0.430812784
## enroll       -0.4308128  0.000891094

Finally, according to equation \((7)\), the variance of the predicted value \(\hat{y}\) is calculated as:

var_y_hat<-t(c(1,500))%*%vb%*%c(1,500) #variance of y hat
var_y_hat

##          [,1]
## [1,] 45.82369

sqrt(var_y_hat) # standard error

##         [,1]
## [1,] 6.76932

Next, we use the built-in function \(prdict\) to check if our manual calculations are correct:

predict(m1, newdata=data.frame(enroll=500), se.fit=T)

## $fit
##        1 
## 644.3177 
## 
## $se.fit
## [1] 6.76932
## 
## $df
## [1] 398
## 
## $residual.scale
## [1] 135.026

We observed that our manual calculation of the standard error for the predicted value using the Delta method yielded the exact same result as the built-in function.

References

1.https://bookdown.org/ts_robinson1994/10EconometricTheorems/dm.html

2.https://stats.oarc.ucla.edu/r/faq/how-can-i-estimate-the-standard-error-of-transformed-regression-parameters-in-r-using-the-delta-method/

Delta method for predictive models

Enwu Liu

2023-10-23

References