Problem Set 2 Answers

Exploring the behavior of immigration judges

Seth J. Chandler

August 28, 2014

Introduction

This problem introduces students to some basic R functionality and, as a bonus, explores the interesting differences in behavior among various immigration judges in the United States. As preparation for the assignment, students were asked to pull in a web-based CSV file on the behavior of immigration judges in the United States, make some minor modifications in Google Spreadsheets, and save the document as a CSV file in a place they could access. Those who want to explore the materials discussed in this report without going through the Google Spreadsheets process can get the material here.1

Importing the Data

Our first task is to load the data you previously created in Google Spreadsheets and downloaded to your hard drive as a CSV file. To bring this data into R, the appropriate command is read.csv. I print out the “head” of the resulting data.frame.2

csvfile<-"~/Dropbox/Courses/Analytic Methods/Problem Sets/Immigration Judge Data Augmented R.csv"
Immigration.Judge.Data<-read.csv(csvfile)
head(Immigration.Judge.Data)
##   X     Court              Judge Decisions Percent.Grants Percent.Denials
## 1 1  Adelanto        Lee, Amy T.       120            7.5            92.5
## 2 2  Adelanto    Burke, David H.       197           24.4            75.6
## 3 3  Adelanto  Laurent, Scott D.       147           27.2            72.8
## 4 4 Arlington  Harris, Rodger C.       131           14.5            85.5
## 5 5 Arlington Crosland, David W.       212           73.1            26.9
## 6 6 Arlington    Iskra, Wayne R.       729           73.5            26.5
##   Grants Denials
## 1   9.00  111.00
## 2  48.07  148.93
## 3  39.98  107.02
## 4  19.00  112.00
## 5 154.97   57.03
## 6 535.82  193.19

The Houston Judges

Creating a Houston data.frame

Using as much elegance as you can, have R compute the aggregate grant and denial rates for Immigration Judges in Houston. (Assume Houston-detained is a different location from Houston.)

I am asking you here to use the filtering capabilities of R. The first thing I want to do is create a mini-spreadsheet that contains only the Houston rows and only the Decisions, Grants and Denials columns.

print(houston<-
        Immigration.Judge.Data[Immigration.Judge.Data$Court=="Houston",
                               c("Decisions","Grants","Denials")]
      )
##    Decisions Grants Denials
## 82       111  13.99   97.01
## 83       123  29.03   93.97
## 84       204  57.94  146.06
## 85       188  53.96  134.04
## 86       275  86.08  188.93
## 87       217  92.01  124.99

Using lapply to round columns

Notice that because I was (deliberately) sloppy when I created my Google Spreadsheet, the grants and denials are not integers. Obviously, however, there are no fractional judicial grants or denials of asylum. The fractions shown here are an artifact of multiplying previously rounded numbers together in Google Spreadsheets.3 So, I think it would be a good idea to use R to round the “Grants” and “Denials.” The command that is perhaps most helpful here is lapply. The idea of lapply is to apply a function to each element of a list, and remember that a data.frame in R is basically a list of columns. To apply the rounding just to the columns that need it, I use the filtering construct houston[,c("Grants","Denials")] on the left-hand side of the assignment and the same construct on the right-hand side, applying the round function to just those two columns.

houston[,c("Grants","Denials")]<-lapply(houston[,c("Grants","Denials")],round)
houston
##    Decisions Grants Denials
## 82       111     14      97
## 83       123     29      94
## 84       204     58     146
## 85       188     54     134
## 86       275     86     189
## 87       217     92     125

Summing the columns

If this were a spreadsheet, you would probably take the sum of each column and then do some division. We can do the same thing in R, again using lapply, this time applying sum to each column of the data.frame.

print(houston.rates<-lapply(houston,sum))
## $Decisions
## [1] 1118
## 
## $Grants
## [1] 333
## 
## $Denials
## [1] 785

To get the aggregate grant and denial rates, we have a simple division problem. I use c to combine the numerators into a little vector, c(houston.rates$Grants, houston.rates$Denials), and then divide that vector by houston.rates$Decisions. I round the resulting quotient to three decimal places.

round(c(houston.rates$Grants,houston.rates$Denials)/houston.rates$Decisions,3)
## [1] 0.298 0.702
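
As an aside, the same aggregate rates can be computed in a single line with colSums, which sums each column of a data.frame directly. This is just an alternative sketch, not something the problem requires:

# alternative: colSums adds up each selected column at once
round(colSums(houston[c("Grants","Denials")])/sum(houston$Decisions),3)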

The Judges with highest grant and denial rates

Using as much elegance as you can, have R find the Immigration Judge with at least 200 decisions who had the highest grant rate. (Notice anything about these judges?) Have R find the Immigration Judge with the highest denial rate. (Notice anything about the top three judges?)

The 200 club

The first thing I want to do is find the judges with at least 200 decisions. There are a lot of them, so I will just print out the first few using head.

head(judges.200plus<-Immigration.Judge.Data[Immigration.Judge.Data$Decisions>=200,])
##     X     Court               Judge Decisions Percent.Grants
## 5   5 Arlington  Crosland, David W.       212           73.1
## 6   6 Arlington     Iskra, Wayne R.       729           73.5
## 7   7 Arlington Burman, Lawrence O.       654           74.0
## 8   8 Arlington     Snow, Thomas G.       825           75.6
## 10 10 Arlington   Bryant, John Milo       538           81.4
## 11 11 Arlington    Schmidt, Paul W.       446           83.2
##    Percent.Denials Grants Denials
## 5             26.9  155.0   57.03
## 6             26.5  535.8  193.19
## 7             26.0  484.0  170.04
## 8             24.4  623.7  201.30
## 10            18.6  437.9  100.07
## 11            16.8  371.1   74.93

Sorting

I now want to sort the judges high to low by grant rates. To do this, I use the R order command. The idea of order is that it takes some data, here the grant rates, and returns a vector of integers. The integers represent positions in the original data. If one picked out those positions in order, one would obtain a sorted version of the data.
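
To see concretely how order behaves, here is a toy example on a made-up vector (nothing to do with the judge data):

x<-c(30,10,20)
order(x)     # returns 2 3 1: the positions of the smallest, middle, and largest values
x[order(x)]  # 10 20 30, i.e., x sorted from lowest to highest
x[order(-x)] # 30 20 10; negating the data reverses the sort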

  1. Note that I stick a minus sign in front of the data. This tells order that I want the list sorted from highest to lowest rather than lowest to highest.

  2. Note also that I use a second argument to head so that I can see the positions of the top 20 judges. There seems to be something “funny” going on: there is a cluster of judges numbered from 140 down to about 123 who have very high grant rates. This is an artifact of the immigration judge data having been previously sorted by geographic region, and geography apparently makes a big difference in grant rates.

head(judges.200plus.orderingByGrants<-order(-judges.200plus$Percent.Grants),n=20)
##  [1] 140 139 138 137 136 135 133 134 132 131 129 130 128   6 127   5 126
## [18] 124 125 123

Naming names

Who are these judges? Let’s look at the top 20.

judges.200plus[judges.200plus.orderingByGrants[1:20],
               c("Court","Judge","Decisions","Percent.Grants")]
##         Court                       Judge Decisions Percent.Grants
## 196  New York          Lamb, Elizabeth A.      1346           96.1
## 195  New York              Bain, Terry A.      1775           93.9
## 194  New York            Brennan, Noel A.      2029           91.7
## 193  New York         Bukszpan, Joanna M.      1039           91.0
## 192  New York           McManus, Margaret      1831           89.1
## 191  New York      Loprest, F. James, Jr.       807           88.4
## 189  New York          Laforest, Brigitte      1807           86.5
## 190  New York         Mulligan, Thomas J.      1426           86.5
## 188  New York Gordon-Uruakpa, Vivienne E.      1546           86.2
## 187  New York           Morace, Philip L.      2322           85.9
## 185  New York             Chew, George T.      1860           84.9
## 186  New York            Sichel, Helen J.      1633           84.9
## 184  New York       Schoppert, Douglas B.      1490           84.6
## 11  Arlington            Schmidt, Paul W.       446           83.2
## 183  New York       Van Wyke, William Van      1309           82.9
## 10  Arlington           Bryant, John Milo       538           81.4
## 182  New York                Segal, Alice       886           81.0
## 180  New York           Weisel, Robert D.       989           80.8
## 181  New York              Zagzoug, Randa       750           80.8
## 179  New York          Rohan, Patricia A.      1558           79.2

Hmm. It appears that the top granters are almost all in New York, with a few in Arlington, Virginia. This is an interesting finding in and of itself. Now, it could be that the facts of the cases in New York are materially different from those elsewhere, but the data is certainly worthy of further investigation.

And what about the judges with the highest denial rates? Is there any pattern? Let’s whip up some R code to find out.

judges.200plus[order(-judges.200plus$Percent.Denials)[1:20],
               c("Court","Judge","Decisions","Percent.Grants")]
##             Court                      Judge Decisions Percent.Grants
## 160 Miami - Krome             Opaciuch, John       200            0.5
## 161 Miami - Krome             Opaciuch, Adam       329            1.5
## 162 Miami - Krome       Hurewitz, Kenneth S.       305            2.3
## 241 San Francisco          Murry, Anthony S.       410            2.7
## 98    Los Angeles         Munoz, Lorraine J.       643            6.2
## 12        Atlanta           Wilson, Earle B.       335            6.3
## 242 San Francisco      Yamaguchi, Michael J.       244            6.6
## 73       Florence           Taylor, Bruce A.       251            7.6
## 66        El Paso         Abbott, William L.       272            7.7
## 163 Miami - Krome          Slavin, Denise N.       219            7.8
## 164 Miami - Krome               Ford, Rex J.       291            7.9
## 226   San Antonio        Burkholder, Gary D.       287            8.4
## 23    Bloomington Nickerson, William J., Jr.       279           10.0
## 200        Newark   Reichenberg, Margaret R.       359           10.6
## 207         Omaha          Morris, Daniel A.       442           10.6
## 48      Cleveland     Evans, D. William, Jr.       594           12.6
## 99    Los Angeles            Riley, Kevin W.       444           12.6
## 37      Charlotte        Pettinato, Barry J.       315           12.7
## 91       Imperial            Staton, Jack W.       305           14.1
## 94      Las Vegas          Romig, Jeffrey L.       308           15.6

Clearly, if you’re seeking asylum in the United States, Miami should not be your first destination. Again, conceivably this is because the facts of cases in Miami differ systematically from those brought elsewhere in the United States, but the high rate of denials seems rather curious.

Testing a theory

I have a theory that the rate of denials is related to the number of cases the judge has decided. The more cases, the higher the rate of denials. See if you can use R to figure out if the data supports this theory.

There are many ways we can explore this hypothesis. And, indeed, a good chunk of the course in Analytic Methods is devoted to this process. Let’s take a look here, however, at some simple methods.

Correlation

One simple approach would be to see if there is any correlation between the decisions and denials. The key function to accomplish this in R is cor, which computes something known as the Pearson correlation coefficient. It’s 1 if the data is perfectly correlated, zero if the data is completely uncorrelated, and -1 if the data is perfectly inversely correlated. You can explore Pearson (and other) measures of correlation by going to this website.
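
For intuition, here is a minimal sketch of what cor computes under the hood: the covariance of the two variables divided by the product of their standard deviations. It should agree with the cor output below.

x<-Immigration.Judge.Data$Decisions
y<-Immigration.Judge.Data$Percent.Denials
cov(x,y)/(sd(x)*sd(y)) # the Pearson correlation coefficient, computed by hand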

cor(Immigration.Judge.Data[c("Decisions","Percent.Denials")])
##                 Decisions Percent.Denials
## Decisions          1.0000         -0.5296
## Percent.Denials   -0.5296          1.0000

What we see appears to be roughly the opposite of what I predicted. There is a negative correlation between the number of decisions and denial rates. Maybe judges, instead of getting harsher as they gain more experience, get more soft-hearted. Or maybe there are other factors at work.

Visualization using pairs

We can visualize the correlation using the pairs function in R. Hopefully, the output is clear enough to give you a feel for what is going on.

pairs(Immigration.Judge.Data[c("Decisions","Percent.Denials")])

[pairs scatterplot matrix of Decisions against Percent.Denials]

Linear regression

We can do a linear regression of the rate of denials on the number of decisions.

summary(lm(data=Immigration.Judge.Data,Percent.Denials~Decisions))
## 
## Call:
## lm(formula = Percent.Denials ~ Decisions, data = Immigration.Judge.Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -51.36 -14.20  -0.52  17.08  43.12 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 71.93358    1.81362    39.7   <2e-16 ***
## Decisions   -0.03305    0.00323   -10.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.9 on 268 degrees of freedom
## Multiple R-squared:  0.28,   Adjusted R-squared:  0.278 
## F-statistic:  104 on 1 and 268 DF,  p-value: <2e-16

I know we haven’t studied linear regression yet, but if you look at the coefficients section of the output, under the “Estimate” for “Decisions,” you will see that it is negative. This means that Decisions appears to have a negative effect on the rate of denials, the opposite of what I thought would be the case. We can also see that R has placed three asterisks in the rightmost column of the coefficients output. Without getting into details here, this means that the result is statistically significant with a great deal of confidence. We can also look, by the way, at the “Adjusted R-squared” of 0.278. This means, very roughly, that one can account for 27.8% of the variation in the denial rate just by looking at the number of decisions.

Logit and Probit regressions

But a linear regression isn’t really proper where the value we are trying to predict is a percentage. So we can try logit and probit forms of regression, which are more appropriate. To do this, however, I first have to make sure that the percent denials are decimal fractions rather than percentage points. I can then use R’s glm function to run these more advanced forms of regression.

Immigration.Judge.Data$Percent.Denials<-Immigration.Judge.Data$Percent.Denials/100.
print(summary(glm(data=Immigration.Judge.Data,Percent.Denials~Decisions,
                  family=binomial(link="logit"))))
## Warning: non-integer #successes in a binomial glm!
## 
## Call:
## glm(formula = Percent.Denials ~ Decisions, family = binomial(link = "logit"), 
##     data = Immigration.Judge.Data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0713  -0.2855  -0.0262   0.3479   0.9681  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.952721   0.203947    4.67    3e-06 ***
## Decisions   -0.001510   0.000403   -3.75  0.00018 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 67.790  on 269  degrees of freedom
## Residual deviance: 50.206  on 268  degrees of freedom
## AIC: 324.5
## 
## Number of Fisher Scoring iterations: 4
print(summary(glm(data=Immigration.Judge.Data,Percent.Denials~Decisions,
                  family=binomial(link="probit"))))
## Warning: non-integer #successes in a binomial glm!
## 
## Call:
## glm(formula = Percent.Denials ~ Decisions, family = binomial(link = "probit"), 
##     data = Immigration.Judge.Data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0693  -0.2847  -0.0209   0.3463   0.9633  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.586240   0.122410    4.79  1.7e-06 ***
## Decisions   -0.000919   0.000234   -3.92  8.7e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 67.79  on 269  degrees of freedom
## Residual deviance: 50.24  on 268  degrees of freedom
## AIC: 324.6
## 
## Number of Fisher Scoring iterations: 4

If you look at the output from these regressions, you will see roughly the same story as we saw with linear regression: the number of decisions is in fact negatively correlated with the denial rate.
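
One way to get a feel for the size of this effect is to convert the fitted logit model back into predicted denial rates at a few caseload levels. This is just an illustrative sketch; the caseloads of 200, 1000 and 2000 decisions are made up for illustration.

# refit the logit model and predict denial rates at hypothetical caseloads
logit.fit<-glm(data=Immigration.Judge.Data,Percent.Denials~Decisions,
               family=binomial(link="logit"))
# type="response" converts predictions from the log-odds scale back to probabilities
predict(logit.fit,newdata=data.frame(Decisions=c(200,1000,2000)),type="response")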

Conclusion

It certainly appears that geography plays a significant role in the probability that one will receive political asylum from an immigration judge. But in order to assert this proposition with greater confidence, we’d need to look at many more variables, such as the distribution of the nations from which the asylum seekers come.


  1. The underlying data comes from Syracuse University and their TRAC project. You can find the material from which I originally extracted the data here.

  2. In Mathematica, if you just want to see a short form of the expression, you use Short.

  3. Obviously, the fix should have been done in Google Spreadsheets itself using its ROUND function. My failure to do so gives me the opportunity to show how it can be done using lapply in R.