Introduction

In a previous article, we discussed how to construct a logistic regression model in R. However, we did not discuss how to assess whether the logistic regression fits our data well. In this article, we’ll go over some goodness of fit tests that help determine whether a logistic regression model does a good job of fitting the data.

Motivating example

We will use the diabetes.csv data from the previous article.

You can download the data here.

Descriptives

There are 768 observations in the diabetes.data data frame.

library("psych")
library("gmodels")

## Import file
diabetes.data <- read.csv("diabetes.csv", header = TRUE)
head(diabetes.data)
##   Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI
## 1           6     148            72            35       0 33.6
## 2           1      85            66            29       0 26.6
## 3           8     183            64             0       0 23.3
## 4           1      89            66            23      94 28.1
## 5           0     137            40            35     168 43.1
## 6           5     116            74             0       0 25.6
##   DiabetesPedigreeFunction Age Outcome
## 1                    0.627  50       1
## 2                    0.351  31       0
## 3                    0.672  32       1
## 4                    0.167  21       0
## 5                    2.288  33       1
## 6                    0.201  30       0
### Descriptive statistics for diabetes.data
describeBy(diabetes.data)
##                          vars   n   mean     sd median trimmed   mad   min
## Pregnancies                 1 768   3.85   3.37   3.00    3.46  2.97  0.00
## Glucose                     2 768 120.89  31.97 117.00  119.38 29.65  0.00
## BloodPressure               3 768  69.11  19.36  72.00   71.36 11.86  0.00
## SkinThickness               4 768  20.54  15.95  23.00   19.94 17.79  0.00
## Insulin                     5 768  79.80 115.24  30.50   56.75 45.22  0.00
## BMI                         6 768  31.99   7.88  32.00   31.96  6.82  0.00
## DiabetesPedigreeFunction    7 768   0.47   0.33   0.37    0.42  0.25  0.08
## Age                         8 768  33.24  11.76  29.00   31.54 10.38 21.00
## Outcome                     9 768   0.35   0.48   0.00    0.31  0.00  0.00
##                             max  range  skew kurtosis   se
## Pregnancies               17.00  17.00  0.90     0.14 0.12
## Glucose                  199.00 199.00  0.17     0.62 1.15
## BloodPressure            122.00 122.00 -1.84     5.12 0.70
## SkinThickness             99.00  99.00  0.11    -0.53 0.58
## Insulin                  846.00 846.00  2.26     7.13 4.16
## BMI                       67.10  67.10 -0.43     3.24 0.28
## DiabetesPedigreeFunction   2.42   2.34  1.91     5.53 0.01
## Age                       81.00  60.00  1.13     0.62 0.42
## Outcome                    1.00   1.00  0.63    -1.60 0.02

Create categorical variables

Currently, the data frame has a variable called Pregnancies that contains the number of pregnancies each subject has had. We are only interested in whether or not a subject has ever been pregnant, so we’ll create a binary variable called group. We then add labels to the 0s and 1s and call this new data frame data1. In total, there were 657 (85.5%) subjects with a history of pregnancy and 111 (14.5%) subjects with no history of pregnancy.

#### Generate a binary group variable (0 = no pregnancies, 1 = at least one pregnancy)
diabetes.data$group[diabetes.data$Pregnancies == 0] = 0 
diabetes.data$group[diabetes.data$Pregnancies > 0] = 1 
CrossTable(diabetes.data$group)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  768 
## 
##  
##           |         0 |         1 | 
##           |-----------|-----------|
##           |       111 |       657 | 
##           |     0.145 |     0.855 | 
##           |-----------|-----------|
## 
## 
## 
## 
### Create factors
data1 <- within(diabetes.data, {
           group <- factor(group, labels = c("No history of pregnancies", "History of pregnancies"))
})

CrossTable(data1$group)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  768 
## 
##  
##                           | No history of pregnancies |    History of pregnancies | 
##                           |---------------------------|---------------------------|
##                           |                       111 |                       657 | 
##                           |                     0.145 |                     0.855 | 
##                           |---------------------------|---------------------------|
## 
## 
## 
## 
### Compare datasets
head(diabetes.data)
##   Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI
## 1           6     148            72            35       0 33.6
## 2           1      85            66            29       0 26.6
## 3           8     183            64             0       0 23.3
## 4           1      89            66            23      94 28.1
## 5           0     137            40            35     168 43.1
## 6           5     116            74             0       0 25.6
##   DiabetesPedigreeFunction Age Outcome group
## 1                    0.627  50       1     1
## 2                    0.351  31       0     1
## 3                    0.672  32       1     1
## 4                    0.167  21       0     1
## 5                    2.288  33       1     0
## 6                    0.201  30       0     1
head(data1)
##   Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI
## 1           6     148            72            35       0 33.6
## 2           1      85            66            29       0 26.6
## 3           8     183            64             0       0 23.3
## 4           1      89            66            23      94 28.1
## 5           0     137            40            35     168 43.1
## 6           5     116            74             0       0 25.6
##   DiabetesPedigreeFunction Age Outcome                     group
## 1                    0.627  50       1    History of pregnancies
## 2                    0.351  31       0    History of pregnancies
## 3                    0.672  32       1    History of pregnancies
## 4                    0.167  21       0    History of pregnancies
## 5                    2.288  33       1 No history of pregnancies
## 6                    0.201  30       0    History of pregnancies

Crude Logistic Regression Model

We create a crude logistic regression model to evaluate the association between the subject’s age (Age) and history of pregnancy (group).

Crude model:
\(logit( E[PregnancyHistory_i | X_i]) = logit(p_i) = ln(\frac{p_i}{1 - p_i}) = \beta_0 + \beta_1 (Age)_{i}\)


Based on the crude logistic regression model, a 1-year increase in age was associated with a 7% increase in the odds of having a history of pregnancy (OR = 1.07; 95% CI: 1.05, 1.10), which is statistically significant.

logit1 <- glm(group ~ Age, data = data1, family = binomial(link = "logit"))
summary(logit1)
## 
## Call:
## glm(formula = group ~ Age, family = binomial(link = "logit"), 
##     data = data1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9478   0.2902   0.4966   0.6630   0.7500  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.33970    0.39173  -0.867    0.386    
## Age          0.06972    0.01343   5.193 2.06e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 634.53  on 767  degrees of freedom
## Residual deviance: 597.14  on 766  degrees of freedom
## AIC: 601.14
## 
## Number of Fisher Scoring iterations: 5
confint(logit1)
##                   2.5 %     97.5 %
## (Intercept) -1.13328100 0.40600173
## Age          0.04484305 0.09761406
### Exponentiate the coefficients
exp(coef(logit1))
## (Intercept)         Age 
##    0.711984    1.072211
exp(confint(logit1))
##                 2.5 %   97.5 %
## (Intercept) 0.3219751 1.500805
## Age         1.0458637 1.102537
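
As a convenience, the odds ratios and their 95% confidence limits can be pulled into a single table; this small sketch is equivalent to the two calls above:

### Odds ratios and 95% CIs in one table
exp(cbind(OR = coef(logit1), confint(logit1)))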

Multivariable logistic regression model

We can add potential confounders such as BMI, glucose, and skin thickness to the logistic regression model. This generates estimates for the association of interest that are adjusted for these other predictors.

Adjusted model:
\(logit( E[PregnancyHistory_i | X_i]) = logit(p_i) = ln(\frac{p_i}{1 - p_i}) = \beta_0 + \beta_1 (Age)_{i} + \beta_2 (BMI)_{i} + \beta_3 (Glucose)_{i} + \beta_4 (SkinThickness)_{i}\)


logit2 <- glm(group ~ Age + BMI + Glucose + SkinThickness, data = data1, family = binomial(link = "logit"))
summary(logit2)
## 
## Call:
## glm(formula = group ~ Age + BMI + Glucose + SkinThickness, family = binomial(link = "logit"), 
##     data = data1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.3549   0.2632   0.4737   0.6239   1.2282  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.516255   0.626257   2.421 0.015472 *  
## Age            0.083301   0.014808   5.625 1.85e-08 ***
## BMI           -0.053471   0.015487  -3.453 0.000555 ***
## Glucose       -0.005492   0.003615  -1.519 0.128700    
## SkinThickness  0.007768   0.007413   1.048 0.294694    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 634.53  on 767  degrees of freedom
## Residual deviance: 578.30  on 763  degrees of freedom
## AIC: 588.3
## 
## Number of Fisher Scoring iterations: 6
confint(logit2)
##                      2.5 %       97.5 %
## (Intercept)    0.296789712  2.755963522
## Age            0.055821246  0.114006339
## BMI           -0.084232809 -0.023425961
## Glucose       -0.012607513  0.001577382
## SkinThickness -0.006861665  0.022262463
### Exponentiate the coefficients
exp(coef(logit2))
##   (Intercept)           Age           BMI       Glucose SkinThickness 
##     4.5551360     1.0868687     0.9479330     0.9945226     1.0077981
exp(confint(logit2))
##                   2.5 %     97.5 %
## (Intercept)   1.3455323 15.7361958
## Age           1.0574087  1.1207592
## BMI           0.9192172  0.9768463
## Glucose       0.9874716  1.0015786
## SkinThickness 0.9931618  1.0225121

Assessing model fit for a logistic regression

In our motivating example, we have two models: (1) the crude model and (2) the adjusted model. The crude model only has the outcome (Pregnancy = Yes/No) and the predictor of interest (Age). The adjusted model has the outcome and predictor of interest along with other covariates (BMI, Glucose, and SkinThickness). We can assess which of these two models fits the data better using different types of tests. One of these tests is the Hosmer-Lemeshow (HL) goodness of fit (GOF) test.

There are several R packages that you can use to perform the HL GOF test:
* hoslem.test function from the ResourceSelection package
* performance_hosmer function from the performance package
* hltest function from the largesamplehl package

Note: I noticed that when I used data1 in the hoslem.test function, I got p-values that were very small (e.g., \(10^{-14}\)). This does not occur when I use diabetes.data in the function. I believe this happens because the outcome variable pregnancy is a factor in data1. Therefore, when using the hoslem.test function, make sure to use the unfactored outcome variable.
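
If your outcome has already been factored, a minimal conversion sketch (the helper name group_numeric is my own):

### Convert the factored outcome back to 0/1 before calling hoslem.test()
group_numeric <- as.numeric(data1$group) - 1   # factor levels 1/2 become 0/1
### hoslem.test(group_numeric, fitted(logit1), g = 10) should then match the
### results obtained with diabetes.data$group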

Comparing both models using the Likelihood ratio test

You can compare models if one is nested within the other. For our example, the crude model is nested within the adjusted model.

Crude model:
\(logit( E[PregnancyHistory_i | X_i]) = logit(p_i) = ln(\frac{p_i}{1 - p_i}) = \beta_0 + \beta_1 (Age)_{i}\)


Adjusted model:
\(logit( E[PregnancyHistory_i | X_i]) = logit(p_i) = ln(\frac{p_i}{1 - p_i}) = \beta_0 + \beta_1 (Age)_{i} + \beta_2 (BMI)_{i} + \beta_3 (Glucose)_{i} + \beta_4 (SkinThickness)_{i}\)


where \(p_{i}\) is the probability of having a history of pregnancy.

We can compare both models using the likelihood ratio test to see which one fits the data better. We will use the lrtest function from the lmtest package. The null hypothesis is that the smaller (crude) model fits the data as well as the larger model; the alternative hypothesis is that the larger (adjusted) model fits the data better.

library(lmtest)

### Comparing the crude model (logit1) to the adjusted model (logit2)
lrtest(logit1, logit2)
## Likelihood ratio test
## 
## Model 1: group ~ Age
## Model 2: group ~ Age + BMI + Glucose + SkinThickness
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1   2 -298.57                         
## 2   5 -289.15  3 18.846  0.0002942 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The small p-value (p = 0.0003) suggests that the adjusted model with four covariates (Age, BMI, Glucose, and SkinThickness) fits the data significantly better than the crude model.
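
The same statistic can be checked by hand from the residual deviances reported in the two model summaries:

### LR statistic from the residual deviances
lr_stat <- deviance(logit1) - deviance(logit2)        # 597.14 - 578.30 = 18.846
df_diff <- df.residual(logit1) - df.residual(logit2)  # 766 - 763 = 3
pchisq(lr_stat, df = df_diff, lower.tail = FALSE)     # p = 0.0003
### Equivalently: anova(logit1, logit2, test = "LRT")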

HL GOF Test – Crude model

The Hosmer-Lemeshow GOF test evaluates whether the logistic regression model is a good fit for the data. The null hypothesis is that the model is a good fit for the data; the alternative hypothesis is that the model is not a good fit. Conceptually, the test splits subjects into groups (typically deciles) of predicted probability and compares the observed and expected number of events in each group, as in the sketch below.
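
A minimal by-hand version of the HL statistic (the variable names are my own; the packaged tests below handle ties in the predicted probabilities slightly differently, so the statistic may differ a little):

### By-hand sketch of the HL statistic using deciles of predicted risk
p_hat  <- fitted(logit1)                                 # predicted probabilities
breaks <- unique(quantile(p_hat, probs = seq(0, 1, 0.1)))
bins   <- cut(p_hat, breaks = breaks, include.lowest = TRUE)
obs  <- tapply(diabetes.data$group, bins, sum)           # observed events per bin
expd <- tapply(p_hat, bins, sum)                         # expected events per bin
n_g  <- tapply(p_hat, bins, length)                      # bin sizes
hl   <- sum((obs - expd)^2 / (expd * (1 - expd / n_g)))
pchisq(hl, df = length(obs) - 2, lower.tail = FALSE)     # compare to chi-square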

library("ResourceSelection")
library("largesamplehl")
library("PredictABEL")
library("performance")

### Method 1:
hl_gof <- hoslem.test(diabetes.data$group, fitted(logit1), g = 10) ### Use the unfactored data for the outcomes
hl_gof
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  diabetes.data$group, fitted(logit1)
## X-squared = 22.446, df = 8, p-value = 0.004154
### Method 2:
performance_hosmer(logit1, n_bins = 10)
## # Hosmer-Lemeshow Goodness-of-Fit Test
## 
##   Chi-squared: 22.446
##            df:  8    
##       p-value:  0.004
### Method 3: 
hltest(logit1, G = 10)
## 
##  Modified Hosmer-Lemeshow test for large samples
##  H0: epsilon<=epsilon0 vs. Ha: epsilon>epsilon0
##  In-sample goodness of fit.
## 
## data:  Model evaluated on 768 observations
## HL statistic = 22.446, dof = 8.0000000, lambda = 0.0057656, p-value =
## 0.004179
## alternative hypothesis: true epsilon is greater than 0.002739948
## 95 percent confidence interval:
##  0.07177937        Inf
## sample estimates:
## epsilonHat 
##  0.1371472
### Plot the predicted against the observed.
### Note: cOutcome must index the pregnancy outcome. In these data, group is
### column 10 (column 9 is Outcome, the diabetes indicator), and plotCalibration
### expects a numeric 0/1 outcome, so we use the unfactored diabetes.data:
plotCalibration(data = diabetes.data, cOutcome = 10, predRisk = fitted(logit1), groups = 10)

## $Table_HLtest
##               total meanpred meanobs predicted observed
## [0.755,0.780)   135    0.762   0.119    102.81       16
## 0.780            38    0.780   0.184     29.63        7
## [0.791,0.814)    94    0.797   0.234     74.94       22
## [0.814,0.834)    65    0.819   0.246     53.21       16
## [0.834,0.852)    64    0.838   0.359     53.63       23
## [0.852,0.884)    78    0.864   0.487     67.36       38
## [0.884,0.915)    75    0.898   0.467     67.37       35
## [0.915,0.939)    78    0.926   0.513     72.20       40
## [0.939,0.964)    68    0.949   0.559     64.55       38
## [0.964,0.995]    73    0.977   0.452     71.29       33
## 
## $Chi_square
## [1] 2370.819
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0

HL GOF Test – Adjusted model

As with the crude model, the null hypothesis of the HL GOF test is that the model is a good fit for the data; here we apply the test to the adjusted model.

# Method 1:
hl_gof <- hoslem.test(diabetes.data$group, fitted(logit2), g = 10) ### Use the unfactored data for the outcomes
hl_gof
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  diabetes.data$group, fitted(logit2)
## X-squared = 20.848, df = 8, p-value = 0.007561
# Method 2:
performance_hosmer(logit2, n_bins = 10)
## # Hosmer-Lemeshow Goodness-of-Fit Test
## 
##   Chi-squared: 20.848
##            df:  8    
##       p-value:  0.008
### Method 3: 
hltest(logit2, G = 10)
## 
##  Modified Hosmer-Lemeshow test for large samples
##  H0: epsilon<=epsilon0 vs. Ha: epsilon>epsilon0
##  In-sample goodness of fit.
## 
## data:  Model evaluated on 768 observations
## HL statistic = 20.848, dof = 8.0000000, lambda = 0.0057656, p-value =
## 0.007603
## alternative hypothesis: true epsilon is greater than 0.002739948
## 95 percent confidence interval:
##  0.06237691        Inf
## sample estimates:
## epsilonHat 
##  0.1293432
### Plot the predicted against the observed.
### As above, use the unfactored numeric group variable (column 10 of diabetes.data):
plotCalibration(data = diabetes.data, cOutcome = 10, predRisk = fitted(logit2), groups = 10)

## $Table_HLtest
##               total meanpred meanobs predicted observed
## [0.436,0.736)    77    0.668   0.545     51.42       42
## [0.736,0.787)    77    0.766   0.247     58.95       19
## [0.787,0.813)    77    0.800   0.195     61.58       15
## [0.813,0.836)    77    0.824   0.182     63.44       14
## [0.836,0.860)    76    0.848   0.276     64.44       21
## [0.860,0.889)    77    0.876   0.403     67.48       31
## [0.889,0.917)    77    0.903   0.390     69.55       30
## [0.917,0.945)    77    0.932   0.377     71.74       29
## [0.945,0.968)    77    0.956   0.455     73.65       35
## [0.968,0.999]    76    0.984   0.421     74.77       32
## 
## $Chi_square
## [1] 3463.833
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0

Modified HL GOF test for large samples

For large samples, there is a modified version of the Hosmer-Lemeshow test. To perform this test, we need to install the largesamplehl package, which can be found here. (Note: I had trouble installing it with the install.packages() command. Instead, I used the devtools function install_github("gnattino/largesamplehl") to install the largesamplehl package.)
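
A minimal installation sketch, following the note above:

### Install largesamplehl from GitHub (requires devtools)
# install.packages("devtools")     # if devtools is not yet installed
devtools::install_github("gnattino/largesamplehl")
library("largesamplehl")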

### Crude model
hltest(logit1)
## 
##  Modified Hosmer-Lemeshow test for large samples
##  H0: epsilon<=epsilon0 vs. Ha: epsilon>epsilon0
##  In-sample goodness of fit.
## 
## data:  Model evaluated on 768 observations
## HL statistic = 22.446, dof = 8.0000000, lambda = 0.0057656, p-value =
## 0.004179
## alternative hypothesis: true epsilon is greater than 0.002739948
## 95 percent confidence interval:
##  0.07177937        Inf
## sample estimates:
## epsilonHat 
##  0.1371472
### Adjusted model
hltest(logit2)
## 
##  Modified Hosmer-Lemeshow test for large samples
##  H0: epsilon<=epsilon0 vs. Ha: epsilon>epsilon0
##  In-sample goodness of fit.
## 
## data:  Model evaluated on 768 observations
## HL statistic = 20.848, dof = 8.0000000, lambda = 0.0057656, p-value =
## 0.007603
## alternative hypothesis: true epsilon is greater than 0.002739948
## 95 percent confidence interval:
##  0.06237691        Inf
## sample estimates:
## epsilonHat 
##  0.1293432

Omnibus Goodness of Fit test

Currently, many users consider the Hosmer-Lemeshow GOF test to be obsolete. One replacement is an omnibus goodness of fit test based on the unweighted sum of squared errors (the le Cessie-van Houwelingen test). To perform this test, we fit the model with the lrm function from the rms package and then call residuals() with type = "gof".

The results from the crude model:

library("rms")

logit1.res <- lrm(group ~ Age, data = data1, y = TRUE, x = TRUE)
residuals(logit1.res, type = "gof")
## Sum of squared errors     Expected value|H0                    SD 
##          8.965968e+01          9.089012e+01          2.857065e-01 
##                     Z                     P 
##         -4.306652e+00          1.657438e-05

Based on the omnibus GOF test for the crude model (p < 0.0001), the model as specified does not fit the data well.

The results from the adjusted model:

logit2.res <- lrm(group ~ Age + BMI + Glucose + SkinThickness, data = data1, y = TRUE, x = TRUE)
residuals(logit2.res, type = "gof")
## Sum of squared errors     Expected value|H0                    SD 
##          8.537840e+01          8.821907e+01          5.065439e-01 
##                     Z                     P 
##         -5.607947e+00          2.047404e-08

Based on the omnibus GOF test for the adjusted model (p < 0.00001), the model as specified does not fit the data well either.

Motivating example #2

Let’s look at a logistic regression model where the model specification fits the data well. I’m using the evans.csv data from Kleinbaum and Klein’s Logistic Regression text. This is a study of male patients in Evans County who were followed for 7 years to see whether they developed coronary heart disease. You can download the data here. The variables are shown below:

data2 <- read.csv("evans.csv", header = TRUE)
head(data2)
##   X patientid chd cat age chl smk ecg dbp sbp hpt CATxHPT CATxCHL
## 1 1        31   0   0  43 159   1   0  74 128   0       0       0
## 2 2        51   1   1  56 201   1   1 112 164   1       1     201
## 3 3        71   0   1  64 179   1   0 100 200   1       1     179
## 4 4        74   0   0  49 243   1   0  82 145   0       0       0
## 5 5        91   0   0  46 252   1   0  88 142   0       0       0
## 6 6       111   1   0  52 179   1   1  80 128   0       0       0

The outcome is coronary heart disease (chd), which is a dichotomous variable. We can construct a logistic regression model to see which factors (age, cholesterol, smoking status, catecholamine level, and ECG abnormality) were significantly associated with coronary heart disease:

\(logit( E[CHD_i | X_i]) = logit(p_i) = ln(\frac{p_i}{1 - p_i}) = \beta_0 + \beta_1 (age)_{i} + \beta_2 (chl)_{i} + \beta_3 (smk)_{i} + \beta_4 (cat)_{i} + \beta_5 (ecg)_{i}\)


logit3 <- glm(chd ~ age + chl + smk + cat + ecg, data = data2, family = binomial(link = "logit"))
summary(logit3)
## 
## Call:
## glm(formula = chd ~ age + chl + smk + cat + ecg, family = binomial(link = logit), 
##     data = data2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1270  -0.5275  -0.4028  -0.2995   2.5166  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.765311   1.131393  -5.980 2.24e-09 ***
## age          0.032534   0.015143   2.148  0.03168 *  
## chl          0.009401   0.003233   2.908  0.00364 ** 
## smk          0.822366   0.304303   2.702  0.00688 ** 
## cat          0.775393   0.332986   2.329  0.01988 *  
## ecg          0.414322   0.292467   1.417  0.15659    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 438.31  on 607  degrees of freedom
## Residual deviance: 402.50  on 602  degrees of freedom
## AIC: 414.5
## 
## Number of Fisher Scoring iterations: 5
confint(logit3)
##                    2.5 %      97.5 %
## (Intercept) -9.030820529 -4.58458025
## age          0.002628888  0.06215569
## chl          0.003042452  0.01575518
## smk          0.246243987  1.44532854
## cat          0.118407908  1.42810972
## ecg         -0.168689141  0.98130992
### Exponentiate the coefficients
exp(coef(logit3))
## (Intercept)         age         chl         smk         cat         ecg 
## 0.001153089 1.033068529 1.009445464 2.275878222 2.171444262 1.513344220
exp(confint(logit3))
##                    2.5 %     97.5 %
## (Intercept) 0.0001196643 0.01020803
## age         1.0026323461 1.06412801
## chl         1.0030470852 1.01587995
## smk         1.2792116470 4.24324599
## cat         1.1257032016 4.17080775
## ecg         0.8447714673 2.66794875

Let’s check the goodness of fit of this model.

HL GOF tests

The results of the HL GOF test indicate that the logistic regression model is a good fit to the observed data (p = 0.062). Additionally, the plot of the predicted against the observed probabilities shows no systematic over- or underestimation.

### Method 1: 
hl_gof <- hoslem.test(data2$chd, fitted(logit3), g = 10)
hl_gof
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  data2$chd, fitted(logit3)
## X-squared = 14.874, df = 8, p-value = 0.06165
### Method 2:
performance_hosmer(logit3, n_bins = 10)
## # Hosmer-Lemeshow Goodness-of-Fit Test
## 
##   Chi-squared: 14.874
##            df:  8    
##       p-value:  0.062
## Summary: model seems to fit well.
### Method 3:
hltest(logit3, G = 10)
## 
##  Modified Hosmer-Lemeshow test for large samples
##  H0: epsilon<=epsilon0 vs. Ha: epsilon>epsilon0
##  In-sample goodness of fit.
## 
## data:  Model evaluated on 608 observations
## HL statistic = 14.874, dof = 8.0000000, lambda = 0.0045644, p-value =
## 0.06182
## alternative hypothesis: true epsilon is greater than 0.002739948
## 95 percent confidence interval:
##    0 Inf
## sample estimates:
## epsilonHat 
##  0.1063262
### Plot the predicted against the observed:
plotCalibration(data = data2, cOutcome = 3, predRisk = fitted(logit3), groups= 10)

## $Table_HLtest
##                 total meanpred meanobs predicted observed
## [0.0153,0.0384)    61    0.030   0.000      1.84        0
## [0.0384,0.0551)    61    0.047   0.033      2.85        2
## [0.0551,0.0672)    61    0.062   0.082      3.77        5
## [0.0672,0.0764)    61    0.072   0.148      4.38        9
## [0.0764,0.0972)    60    0.086   0.117      5.15        7
## [0.0972,0.1105)    61    0.103   0.115      6.28        7
## [0.1105,0.1308)    61    0.120   0.131      7.33        8
## [0.1308,0.1608)    61    0.145   0.066      8.86        4
## [0.1608,0.2275)    61    0.190   0.115     11.60        7
## [0.2275,0.4820]    60    0.316   0.367     18.95       22
## 
## $Chi_square
## [1] 14.817
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.0628

Omnibus GOF test

The results of the Omnibus GOF test also indicate that the logistic regression model is a good fit to the observed data (p=0.369).

logit3.res <- lrm(chd ~ age + chl + smk + cat + ecg, data = data2, y = TRUE, x = TRUE)
residuals(logit3.res, type = "gof")
## Sum of squared errors     Expected value|H0                    SD 
##            58.0457420            58.4394289             0.4378980 
##                     Z                     P 
##            -0.8990381             0.3686324

Conclusions

Logistic regression models are useful when you have an outcome variable that is dichotomous. However, the model may not fit the data well. Goodness of fit tests help determine whether the logistic regression model is the correct specification for your data. We demonstrated a logistic regression model that was poorly specified for the diabetes.csv data, and one that was correctly specified for the evans.csv data. You should not assume that a given logistic regression model is the best specification for your data. When the GOF results indicate a poor fit, you may want to reconsider the covariates in the model or try a different specification (e.g., a probit model).
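
For instance, a probit specification requires only a change of link function; a minimal sketch using the crude model from the first example (the object name probit1 is my own):

### Probit specification of the crude model
probit1 <- glm(group ~ Age, data = data1, family = binomial(link = "probit"))
summary(probit1)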

Acknowledgements

The text that I used to understand the different types of tests used to assess the quality of the logistic regression model is Kleinbaum DG and Klein M. Logistic Regression: A Self-Learning Text; the book’s website contains useful information that complements the text.

Equally helpful is a website by Jonathan Bartlett called The Stats Geek, where he succinctly describes and provides code for performing the Hosmer-Lemeshow GOF test in R.

I learned about the modified Hosmer-Lemeshow GOF test when I reviewed the largesamplehl package, which can be found here.

Several StackExchange threads were helpful in writing this tutorial: link1, link2, and link3.

Contact

These tutorials are a work in progress, so they may contain errors. I highly recommend that you seek out a statistical consultation if you plan on running any analysis meant for publication. Any comments and suggestions can be sent to: