Using Multiple Linear Regression on US Judges’ Ratings

Multiple linear regression is used to show the relationship between a dependent variable and multiple (two or more) independent variables. We will be using this method of analysis on a dataset from R called USJudgeRatings, which contains lawyers’ ratings of state judges in the US Superior Court in 1977.

Formula

A multiple linear regression model follows the formula:

y = b0 + b1x1 + b2x2… + b11*x11

where b0 is the intercept, and b1 to b11 are the coefficients of the variables x to x11.

Exploring the dataset

Firstly, load the dplyr and moderndive packages

library(dplyr)
library(moderndive)

Let’s use glimpse() to look inside the dataset:

glimpse(USJudgeRatings)

## Observations: 43
## Variables: 12
## $ CONT <dbl> 5.7, 6.8, 7.2, 6.8, 7.3, 6.2, 10.6, 7.0, 7.3, 8.2, 7.0, 6...
## $ INTG <dbl> 7.9, 8.9, 8.1, 8.8, 6.4, 8.8, 9.0, 5.9, 8.9, 7.9, 8.0, 8....
## $ DMNR <dbl> 7.7, 8.8, 7.8, 8.5, 4.3, 8.7, 8.9, 4.9, 8.9, 6.7, 7.6, 7....
## $ DILG <dbl> 7.3, 8.5, 7.8, 8.8, 6.5, 8.5, 8.7, 5.1, 8.7, 8.1, 7.4, 7....
## $ CFMG <dbl> 7.1, 7.8, 7.5, 8.3, 6.0, 7.9, 8.5, 5.4, 8.6, 7.9, 7.3, 7....
## $ DECI <dbl> 7.4, 8.1, 7.6, 8.5, 6.2, 8.0, 8.5, 5.9, 8.5, 8.0, 7.5, 7....
## $ PREP <dbl> 7.1, 8.0, 7.5, 8.7, 5.7, 8.1, 8.5, 4.8, 8.4, 7.9, 7.1, 6....
## $ FAMI <dbl> 7.1, 8.0, 7.5, 8.7, 5.7, 8.0, 8.5, 5.1, 8.4, 8.1, 7.2, 7....
## $ ORAL <dbl> 7.1, 7.8, 7.3, 8.4, 5.1, 8.0, 8.6, 4.7, 8.4, 7.7, 7.1, 7....
## $ WRIT <dbl> 7.0, 7.9, 7.4, 8.5, 5.3, 8.0, 8.4, 4.9, 8.5, 7.8, 7.2, 7....
## $ PHYS <dbl> 8.3, 8.5, 7.9, 8.8, 5.5, 8.6, 9.1, 6.8, 8.8, 8.5, 8.4, 6....
## $ RTEN <dbl> 7.8, 8.7, 7.8, 8.7, 4.8, 8.6, 9.0, 5.0, 8.8, 7.9, 7.7, 7....

We can see that the dataset contains 43 observations on the 12 numeric variables, which are:

CONT Number of contacts of lawyer with judge.
INTG Judicial integrity.
DMNR Demeanor.
DILG Diligence.
CFMG Case flow managing.
DECI Prompt decisions.
PREP Preparation for trial.
FAMI Familiarity with law.
ORAL Sound oral rulings.
WRIT Sound written rulings.
PHYS Physical ability.
RTEN Worthy of retention.

Analysis

We want to see if there is any correlation between any of the first 11 variables and how much lawyers believe a particular judge is worthy of staying on in the US Superior Court. To do this, we will need to fit a multiple regression model with the lm() function, using RTEN as the dependent variable.

characteristics <- lm(RTEN ~ CONT + INTG + DMNR + DILG + CFMG + DECI + PREP + FAMI + ORAL + WRIT + PHYS, data = USJudgeRatings)

Next, we can get a regression table using the get_regression_table() function.

get_regression_table(characteristics)

## # A tibble: 12 x 7
##    term      estimate std_error statistic p_value lower_ci upper_ci
##    <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
##  1 intercept   -2.12      0.519    -4.08    0       -3.18    -1.06 
##  2 CONT         0.013     0.026     0.495   0.624   -0.04     0.066
##  3 INTG         0.365     0.129     2.82    0.008    0.101    0.629
##  4 DMNR         0.125     0.09      1.40    0.172   -0.058    0.308
##  5 DILG         0.067     0.143     0.466   0.644   -0.225    0.358
##  6 CFMG        -0.195     0.148    -1.32    0.198   -0.496    0.107
##  7 DECI         0.278     0.138     2.01    0.053   -0.004    0.56 
##  8 PREP        -0.002     0.24     -0.008   0.994   -0.491    0.488
##  9 FAMI        -0.136     0.267    -0.508   0.615   -0.681    0.409
## 10 ORAL         0.548     0.277     1.98    0.057   -0.018    1.11 
## 11 WRIT        -0.068     0.315    -0.216   0.83    -0.71     0.574
## 12 PHYS         0.269     0.062     4.33    0        0.142    0.396

The table tells us the intercept estimate which has no practical meaning as a negative number. However, it tells us where the RTEN score crosses the x axis - that is, when lawyers’ ratings for a judge’s retention is 0.

We can also see the coefficients for each of the variables, and their p-values. We will be using the conventional p < 0.05 as the threshold test for the statistical significance of variables. The smaller the p-value, the more statistically significant the variable will be.

Refining the regression model

To refine our model, let’s remove the biggest p-value which is PREP (preparation for trial) at 0.994, and run our model again to see how the remaining variables relate to RTEN:

characteristics_1 <- lm(RTEN ~ CONT + INTG + DMNR + DILG + CFMG + DECI + FAMI + ORAL + WRIT + PHYS, data = USJudgeRatings) 

get_regression_table(characteristics_1)

## # A tibble: 11 x 7
##    term      estimate std_error statistic p_value lower_ci upper_ci
##    <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
##  1 intercept   -2.12      0.51     -4.15    0       -3.16    -1.08 
##  2 CONT         0.013     0.025     0.509   0.615   -0.038    0.064
##  3 INTG         0.365     0.124     2.95    0.006    0.113    0.617
##  4 DMNR         0.125     0.085     1.47    0.152   -0.049    0.299
##  5 DILG         0.066     0.114     0.581   0.566   -0.166    0.298
##  6 CFMG        -0.195     0.144    -1.36    0.185   -0.487    0.098
##  7 DECI         0.278     0.135     2.07    0.047    0.004    0.553
##  8 FAMI        -0.137     0.213    -0.643   0.525   -0.571    0.297
##  9 ORAL         0.547     0.268     2.04    0.049    0.002    1.09 
## 10 WRIT        -0.068     0.306    -0.221   0.827   -0.691    0.556
## 11 PHYS         0.269     0.06      4.45    0        0.146    0.392

```

Note the p-values change when we remove a variable due to changes in relationship between the remaining variables. Unfortunately, WRIT at 0.827 is the next largest p-value. You would think sound written rulings would be critical for a judge’s retention rating! Let’s remove it from our model, and run it again.

characteristics_2 <- lm(RTEN ~ CONT + INTG + DMNR + DILG + CFMG + DECI + FAMI + ORAL + PHYS, data = USJudgeRatings) 

get_regression_table(characteristics_2)

## # A tibble: 10 x 7
##    term      estimate std_error statistic p_value lower_ci upper_ci
##    <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
##  1 intercept   -2.13      0.502    -4.24    0       -3.15    -1.10 
##  2 CONT         0.014     0.024     0.594   0.556   -0.034    0.063
##  3 INTG         0.361     0.121     3.00    0.005    0.116    0.606
##  4 DMNR         0.123     0.083     1.47    0.15    -0.047    0.292
##  5 DILG         0.071     0.11      0.641   0.526   -0.153    0.295
##  6 CFMG        -0.198     0.141    -1.40    0.17    -0.485    0.089
##  7 DECI         0.272     0.13      2.10    0.044    0.008    0.536
##  8 FAMI        -0.168     0.158    -1.06    0.295   -0.489    0.153
##  9 ORAL         0.516     0.225     2.30    0.028    0.059    0.974
## 10 PHYS         0.274     0.055     5.01    0        0.163    0.386

It looks like CONT or the number of contacts a lawyer has with the judge seems the next largest p-value at 0.556. Let’s see what happens when it’s gone.

characteristics_3 <- lm(RTEN ~ INTG + DMNR + DILG + CFMG + DECI + FAMI + ORAL + PHYS, data = USJudgeRatings) 

get_regression_table(characteristics_3)

## # A tibble: 9 x 7
##   term      estimate std_error statistic p_value lower_ci upper_ci
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept   -2.04      0.478    -4.28    0       -3.02    -1.07 
## 2 INTG         0.372     0.118     3.16    0.003    0.132    0.612
## 3 DMNR         0.106     0.078     1.36    0.182   -0.052    0.263
## 4 DILG         0.056     0.106     0.524   0.604   -0.16     0.271
## 5 CFMG        -0.165     0.129    -1.28    0.208   -0.426    0.096
## 6 DECI         0.267     0.128     2.08    0.045    0.006    0.527
## 7 FAMI        -0.191     0.152    -1.25    0.218   -0.499    0.118
## 8 ORAL         0.541     0.219     2.47    0.019    0.096    0.986
## 9 PHYS         0.27      0.054     5.02    0        0.161    0.379

Then let’s sadly remove diligence DILG which has a p-value of 0.604.

characteristics_4 <- lm(RTEN ~ INTG + DMNR + CFMG + DECI + FAMI + ORAL + PHYS, data = USJudgeRatings) 

get_regression_table(characteristics_4)

## # A tibble: 8 x 7
##   term      estimate std_error statistic p_value lower_ci upper_ci
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept   -2.12      0.451     -4.71   0       -3.04    -1.21 
## 2 INTG         0.398     0.106      3.74   0.001    0.182    0.613
## 3 DMNR         0.099     0.076      1.31   0.199   -0.055    0.253
## 4 CFMG        -0.137     0.116     -1.18   0.245   -0.372    0.098
## 5 DECI         0.283     0.123      2.30   0.028    0.033    0.532
## 6 FAMI        -0.168     0.144     -1.17   0.252   -0.461    0.125
## 7 ORAL         0.526     0.215      2.45   0.02     0.09     0.963
## 8 PHYS         0.264     0.052      5.08   0        0.158    0.369

And then remove familiarity with the law FAMI at 0.252.

characteristics_5 <- lm(RTEN ~ INTG + DMNR + CFMG + DECI + ORAL + PHYS, data = USJudgeRatings) 

get_regression_table(characteristics_5)

## # A tibble: 7 x 7
##   term      estimate std_error statistic p_value lower_ci upper_ci
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept   -2.25      0.44     -5.11    0       -3.14    -1.35 
## 2 INTG         0.384     0.106     3.62    0.001    0.168    0.599
## 3 DMNR         0.147     0.064     2.3     0.027    0.017    0.277
## 4 CFMG        -0.1       0.112    -0.893   0.378   -0.326    0.127
## 5 DECI         0.249     0.12      2.07    0.045    0.005    0.492
## 6 ORAL         0.306     0.103     2.96    0.005    0.096    0.516
## 7 PHYS         0.289     0.047     6.10    0        0.193    0.385

Next to go is case file management CFMG at 0.378.

characteristics_6 <- lm(RTEN ~ INTG + DMNR + DECI + ORAL + PHYS, data = USJudgeRatings) 

get_regression_table(characteristics_6)

## # A tibble: 6 x 7
##   term      estimate std_error statistic p_value lower_ci upper_ci
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept   -2.20      0.436     -5.06   0       -3.09    -1.32 
## 2 INTG         0.378     0.106      3.58   0.001    0.164    0.592
## 3 DMNR         0.152     0.064      2.39   0.022    0.023    0.281
## 4 DECI         0.167     0.077      2.16   0.037    0.011    0.323
## 5 ORAL         0.292     0.102      2.86   0.007    0.085    0.498
## 6 PHYS         0.283     0.047      6.05   0        0.188    0.378

Finally, we are left with variables where p < 0.05. And the order of the variables seems pretty odd: the strongest positive correlation between lawyers’ scores on whether judges should stay is related to 1) a judge’s physical strength (which seems perfectly correlated), then 2) their integrity, 3) sound oral rulings, 4) demeanour, and finally 5) prompt decisions. But how do we know this model is actually any good? How well does it predict lawyers’ opinions on whether US Judges should stay?

Ensuring our model is robust

The sum of square residuals measures the lack of fit of a model to a set of points. However, the most appropriate test for a regression with multiple variables is the adjusted R-squared test, using summary.lm().

summary.lm(characteristics_6)

## 
## Call:
## lm(formula = RTEN ~ INTG + DMNR + DECI + ORAL + PHYS, data = USJudgeRatings)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.240656 -0.069026 -0.009474  0.068961  0.246402 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.20433    0.43611  -5.055 1.19e-05 ***
## INTG         0.37785    0.10559   3.579 0.000986 ***
## DMNR         0.15199    0.06354   2.392 0.021957 *  
## DECI         0.16672    0.07702   2.165 0.036928 *  
## ORAL         0.29169    0.10191   2.862 0.006887 ** 
## PHYS         0.28292    0.04678   6.048 5.40e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1119 on 37 degrees of freedom
## Multiple R-squared:  0.9909, Adjusted R-squared:  0.9897 
## F-statistic: 806.1 on 5 and 37 DF,  p-value: < 2.2e-16

As you can see, the adjusted R-squared of 0.9897 is very close to one which is the ideal fit. Let’s compare this to if we hadn’t removed the last variable.

summary.lm(characteristics_5)

## 
## Call:
## lm(formula = RTEN ~ INTG + DMNR + CFMG + DECI + ORAL + PHYS, 
##     data = USJudgeRatings)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24640 -0.06378 -0.00994  0.06123  0.25233 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.24575    0.43976  -5.107 1.08e-05 ***
## INTG         0.38363    0.10607   3.617 0.000907 ***
## DMNR         0.14710    0.06395   2.300 0.027345 *  
## CFMG        -0.09977    0.11173  -0.893 0.377826    
## DECI         0.24884    0.12009   2.072 0.045475 *  
## ORAL         0.30632    0.10350   2.960 0.005419 ** 
## PHYS         0.28924    0.04744   6.097 5.15e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1122 on 36 degrees of freedom
## Multiple R-squared:  0.9911, Adjusted R-squared:  0.9896 
## F-statistic: 668.2 on 6 and 36 DF,  p-value: < 2.2e-16

The adjusted R-squared is slightly smaller at 0.9896, meaning we were on the right track… Let’s check what would have happened to the adjusted R-squared if we continued to remove the next largest p-value variable DECI or prompt decisions (despite being under the p< 0.05 threshold).

characteristics_7 <- lm(RTEN ~ INTG + DMNR + ORAL + PHYS, data = USJudgeRatings) 

summary.lm(characteristics_7)

## 
## Call:
## lm(formula = RTEN ~ INTG + DMNR + ORAL + PHYS, data = USJudgeRatings)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24455 -0.06517 -0.01191  0.05901  0.24095 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.73086    0.39518  -4.380 9.01e-05 ***
## INTG         0.33169    0.10831   3.062  0.00402 ** 
## DMNR         0.13983    0.06629   2.109  0.04157 *  
## ORAL         0.45847    0.06987   6.562 9.70e-08 ***
## PHYS         0.28710    0.04895   5.865 8.73e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1172 on 38 degrees of freedom
## Multiple R-squared:  0.9898, Adjusted R-squared:  0.9887 
## F-statistic: 917.4 on 4 and 38 DF,  p-value: < 2.2e-16

Conclusion

We can see the p-value drops back down to 0.9887, suggesting our 5 variable model is the best predictor of a retention score for US judges.

Using Multiple Linear Regression on US Judges’ Ratings

Maria Chan

17/08/2018