Processing math: 100%

We all contributed equally to produce this assignment.

In this assignment we have investigated the data from New York City’s Stop, Question and Frisk program from 2003 until 2013. Firstly, we have made an exploratory analysis of the data set by finding correlations and distribution of the variables, whereafter we have considered missing data. Our aim has been to determine how the appearance and setting influence the risk of force used against young civilians under the age of 21. In order to do so, we compare different models by various methods including residual plots, training errors and cross-validations. Lastly we interpret our model’s predictions on biased behaviour from the police, but ultimately conclude in a general discussion of the model diagnostics and data collection that the model does not necessarily reflect the reality.

Load and mutate data

To begin our exploratory analysis, we start by loading our data set and mutating it to make sure we can investigate all variables collectively and compare them. The following shows a summary of the data.

factor_columns <- c("force", "race2","gender", "daytime", "inout2", "ac_incid", "ac_time", "offunif2", "typeofid2", "othpers2", "cs_objcs", "cs_descr", "cs_casng", "cs_lkout", "cs_cloth", "cs_drgtr", "cs_furtv", "cs_vcrim", "cs_bulge", "cs_other")

yes_no_columns <- c("ac_incid", "ac_time", "cs_objcs", "cs_descr", "cs_casng", "cs_lkout", "cs_cloth", "cs_drgtr", "cs_furtv", "cs_vcrim", "cs_bulge", "cs_other")


data <- d2 %>%
  select(-c("force2", "pct")) %>%
  mutate(across(all_of(yes_no_columns), ~ifelse(. == "N", 0, 1)))%>%
  mutate(race2 = case_when( 
    race2 == "white"    ~ 0,
    race2 == "black"    ~ 1,
    race2 == "hisp"     ~ 2,
    race2 == "asian"    ~ 3,
    race2 == "other"    ~ 4
  )) %>%
  mutate(typeofid2  = case_when(
    typeofid2 == "O"    ~ 0,
    typeofid2 == "P"    ~ 1,
    typeofid2 == "R"    ~ 2,
    typeofid2 == "V"    ~ 3,
  )) %>%
  mutate(across(all_of(factor_columns), as.factor)) %>%
  mutate(year = as.numeric(year))

summary(data)
##      force          race2          gender             age2       daytime       
##  0      :3910239   0   : 492688   0   :4495436   Min.   :10      0   :2472074  
##  1      : 736051   1   :2886843   1   : 348159   1st Qu.:19      1   :1871287  
##  3      : 156732   2   :1215401   NA's: 140796   Median :24      NA's: 641030  
##  2      : 122898   3   : 152053                  Mean   :28                    
##  5      :  31590   4   : 235105                  3rd Qu.:34                    
##  6      :  17769   NA's:   2301                  Max.   :90                    
##  (Other):   9112                                 NA's   :41534                 
##   inout2        ac_incid    ac_time     offunif2       typeofid2     
##  0   :3809789   0:2204888   0:3158281   0   :1396968   0   :  84217  
##  1   :1147150   1:2779503   1:1826110   1   :3586909   1   :2640092  
##  NA's:  27452                           NA's:    514   2   : 107497  
##                                                        3   :2133368  
##                                                        NA's:  19217  
##                                                                      
##                                                                      
##  othpers2       cs_objcs    cs_descr    cs_casng    cs_lkout    cs_cloth   
##  0   :3806808   0:4850690   0:4116587   0:3562617   0:4151151   0:4773826  
##  1   :1163298   1: 133701   1: 867804   1:1421774   1: 833240   1: 210565  
##  NA's:  14285                                                              
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##  cs_drgtr    cs_furtv    cs_vcrim    cs_bulge    cs_other         year     
##  0:4522192   0:2801105   0:4588325   0:4543022   0:3944522   Min.   :2003  
##  1: 462199   1:2183286   1: 396066   1: 441369   1:1039869   1st Qu.:2006  
##                                                              Median :2009  
##                                                              Mean   :2008  
##                                                              3rd Qu.:2011  
##                                                              Max.   :2013  
## 

Full data set exploratory analysis

In order to meet the purpose of developing a model that describes the probability of police force being used on civilians we choose the response variable to be force and the exploratory analysis will thus focus on how the predictors influence and interact with each other to impact the likelihood of a police force interaction. To continue the analysis we consider the marginal distribution by generating bar plots for the variables characterized as factors and histograms for the continuous predictors to check for the number of missing values, skewness and extreme values.

First of all, we note that force is exceptionally skewed, which we will elaborate further on in the following. Secondly we note, that cs_objcs and cs_cloth are very asymmetrical. This may not directly cause any issues, nonetheless it is worth noticing. Lastly, the number of missing values varies from none to a significant amount as seen in the variable daytime. We’ll account for these later on.

As age2 is a continuous variable, it is plotted as a histogram. There is nothing eye catching about the distribution relating to skewness or extreme values, though we do note that the police tend to have stopped more younger than older people. This becomes very clear when looking at the violin plot in which the subjects age is stratified on the categorical types of force. There is nothing concerning about this relation either.

The last variable year is for now also a continuous variable and thus it is plotted as a histogram. Again, there is nothing eye catching about the distribution relating to skewness or extreme values, only an indication of the half year period in year 2003 and 2013 due to change in government. However, when looking at the corresponding violin plot, we observe an extreme amount of type 5 force being used in year 2006. This could be a mistake but we cannot make any conclusions, only note that it could be possible.

Prior to initiating our data analysis, it is convenient to search for variables that would be strong predictors. By that we would be looking for a numerically large correlation coefficient with the response variable, force. Furthermore, for the predictor variables, we would like to avoid using predictors, which have a high correlation to other predictors, as these most likely will have a similar effect on the response variable. To get an idea of how the variables correlate we make a Spearman-plot.

cpmix <- cor(data.matrix(na.omit(data)), method = "spearman")
colPal1 <- colorRampPalette(c("white", "#4f844f"), space = "rgb")(100)
pmix <- corrplot.mixed(cpmix,lower="color", upper="number",order="hclust", 
               tl.col="black", tl.pos="lt", tl.srt=38,tl.cex=0.6, number.cex=0.4,
               lower.col=colPal1, upper.col=colPal1)

Unfortunately, the Spearman-plot at first glance does not show any strong correlations at all. However, knowing that the Spearman-plot by default cluster the variables with numerically higher correlations. Therefore, despite the small correlation, we anticipate that cs_bulge and othpers2 must be the most correlated predictors to force. It also seems noteworthy that the correlation between cs_lkout and cs_casng and between ac_time and ac_incid seems quite strong, compared to the correlation between the other variables.

##   race2          gender             age2       daytime         inout2       
##  0   : 492688   0   :4495436   Min.   :10      0   :2472074   0   :3809789  
##  1   :2886843   1   : 348159   1st Qu.:19      1   :1871287   1   :1147150  
##  2   :1215401   NA's: 140796   Median :24      NA's: 641030   NA's:  27452  
##  3   : 152053                  Mean   :28                                   
##  4   : 235105                  3rd Qu.:34                                   
##  NA's:   2301                  Max.   :90                                   
##                                NA's   :41534                                
##  offunif2       typeofid2      othpers2      
##  0   :1396968   0   :  84217   0   :3806808  
##  1   :3586909   1   :2640092   1   :1163298  
##  NA's:    514   2   : 107497   NA's:  14285  
##                 3   :2133368                 
##                 NA's:  19217                 
##                                              
## 

To further research our chosen response variable, force, we make a crosstabulation between the 7 types of force and year.

CT <- table(data$force, data$year)

kable(CT, 
      col.names = colnames(CT), 
      caption="Crosstabulation of variables force and year") %>% 
  kable_styling()
Crosstabulation of variables force and year
2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
0 119564 246398 312773 406541 365744 414700 439271 463337 540799 443857 157255
1 21225 41392 56400 69177 75406 91045 103625 97624 102471 57803 19883
2 8134 10412 12542 653 13097 14096 16467 16444 16978 10099 3976
3 7938 10637 11812 11366 14079 16265 17690 19265 21076 17697 8907
4 912 1010 937 631 612 496 481 498 641 712 435
5 1166 1579 1684 16069 1369 1792 1831 2128 1845 1418 709
6 1787 1954 1808 1868 1592 1747 1609 1807 1730 1215 652
7 125 141 235 184 197 161 194 182 184 110 34

Common intuition would suggest that as the level of force used upon civilians increases, such incidents would become less frequent. However we note from the cross tabulation that our initial presumption is not the case. Specifically, it becomes evident that there are relatively few instances where force level 4 is reported. This pattern indicates that the order of force types may not necessarily correspond to an escalation in brutality, or it could suggest that it is hard for the police to distinct between “at least drawing a weapon on a civilian” and “at least pushing a civilian to the ground”. This difficulty in distinction could also be some of the explanation to why, there is a extreme number of observations of force 5 in the year 2006.

The observed order of force levels and the potential overlap between specific force types motivates for a particular data modification decision. To address this, a new dataset is defined, wherein a force is assigned the value of 0 if no force is reported and 1 if any form of force is indicated. This grouping simplifies the less distinguishable order of force types into a more concise binary representation, making it suitable for modeling outcomes that involve the presence or absence of force. Moreover this transformation is chosen with the intention of subsequently fitting a binomial model.

Our scientific question

We now move on to formulating our scientific question:

How does the combination between the setting and the subjects appearance affect the risk of the police using force when stopping the subject for young civilians under the age of 21?

We choose to define the setting as the surroundings and situation of the stop, and appearance as the police officer’s view of the subject and its supposed actions during the stop.

Selection of predictors

As accounted for earlier, we want force to be our response variable, where we combine all use of force and model the risk of the police using any kind of force on the subject. In order to model this we fit a binomial generalized linear model. Moreover, to answer our scientific question we construct the following subset:

data_children <- data %>%
  filter(age2 <= 20) %>%
  mutate(force = as.numeric(as.character(force)), 
         bin_force = ifelse(force > 0, 1, 0)) %>%
        select(-force) %>%
  mutate(bin_force = as.factor(bin_force))

Another exploratory analysis of this subset was carried out. Here we saw no drastic changes in the marginal distributions or correlations though the strongest correlations are even stronger in the subset than in the full dataset.

As the exploratory analysis doesn’t disclose much about the variables our choice of predictors for the model will need to be somewhat based on the specific aspects we intend to bring forward.

To get alternative perspective on discrimination, we intentionally exclude both race2 and gender.

Immediately we choose to include daytime, inout2 as predictors as these tells us a lot the about setting of the stop. Furthermore we choose to discard typeofid2 and othpers2 as these neither tells us much about the officer’s view of the civilian nor the setting. However we are interested in how wearing an uniform could impact the officer’s use of force.

From the Spearman plot we noted the most correlated pairs of variables to be cs_lkout, cs_casng and ac_time, ac_incid, why we choose to only look at one variable of each pair. From the description of the variables our belief is that cs_lkout tells us more about the police officer’s interpretation than cs_casng. When we already have included daytime we choose ac_incid over ac_time as another predictor.

As we want to include predictors that tell us something about the officer’s own perception of the civilian rather than an evaluation of their actual behavior or circumstances we choose to exclude cs_descr, cs_drgtr, cs_objcs and cs_vcrimas predictors.

When we made the exploratory analysis on the subset with children only, we saw that children are slightly more likely to do furtive movements, why we choose to include cs_furtvin the model too.

With the asymmetric bar plots in mind, we also discard cs_cloth as a predictor.

Finally we choose to keep out cs_bulgeand cs_other from the model as these variables are quite imprecise.

In conclusion we choose the following predictors:

The setting predictors: daytime, inout2, ac_incid and offunif2

The appearance predictors: age2, cs_furtv. and cs_lkout

To quickly assess the correlation among predictors, as well as between predictors and the response variable, we generate a Spearman-plot.

Again, the Spearman-plot doesn’t show any signs of strong correlation between the predictors which is a good sign. Other than that, the Spearman-plot is as bland as the one including all variables.

Note that the Spearman plot omits all rows with missing values. Thus, before we begin the model fitting with our chosen predictors, we’ll need to investigate and evaluate if and how to impute the missing values in our data set.

Analysis of missing values

pMiss <- function(x){sum(is.na(x))/length(x)*100}
apply(NAdata,2,pMiss)
##       race2      gender        age2     daytime      inout2    offunif2 
##  0.04616412  2.82473827  0.83328134 12.86074869  0.55075936  0.01031219 
##   typeofid2    othpers2 
##  0.38554359  0.28659469
#apply(modeldata,1,pMiss)

aggr_plot <- aggr(NAdata, col=c('cadetblue','darkseagreen2'), 
                  numbers=TRUE, 
                  sortVars=TRUE, 
                  labels=names(NAdata), 
                  cex.axis=.7, gap=3, 
                  ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##   Variable        Count
##    daytime 0.1286074869
##     gender 0.0282473827
##       age2 0.0083328134
##     inout2 0.0055075936
##  typeofid2 0.0038554359
##   othpers2 0.0028659469
##      race2 0.0004616412
##   offunif2 0.0001031219

From the histogram of the missing values we notice that one predictor is especially missing a lot of values: daytime is missing 12.86%. The large amount of missing values imply that we cannot directly assume they are missing completely at random. The other predictors are missing up to 3% of their data values, which means that we in general can assume them to be at least missing at random. Thus we focus on analyzing how the values of daytime are missing, as this big amount could be problematic, if there is a correlation to the response variable. From the Spearman plot above, we noted no correlation between force and daytime and we investigate further. Given the nature of the dataset we suspect that daytime should be correlated to ac_time when stratifying on the binary force:

correlations <- numeric()

non_na_data <- binmodel_data[complete.cases(binmodel_data$daytime, binmodel_data$ac_time), ]
force_levels <- unique(non_na_data$bin_force)


non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
non_na_data$ac_time <- as.numeric(as.character(non_na_data$ac_time))

for (level in force_levels) {
   subset_data <- non_na_data[non_na_data$bin_force == level, ]
   correlation <- cor(subset_data$ac_time, subset_data$daytime, method = "pearson")
   correlations <- c(correlations, correlation)
 }

correlations
## [1] -0.06942767 -0.06494352

Firstly we note a weak correlation between ac_time and daytime. Secondly we don’t observe a systematic correlation to bin_force. This observation together with the previous adds to our initial suspicion that the values are not missing completely at random.

Lastly we consider the cross tabulation between year and daytime:

cross_tab <- table(data$year, data$daytime, useNA = "ifany")
row_sums <- rowSums(cross_tab, na.rm = TRUE)
cross_tab_with_sums <- cbind(cross_tab, row_sums)
 
kable(cross_tab_with_sums, 
      col.names = colnames(cross_tab_with_sums), 
      caption="Crosstabulation daytime and year") %>% 
  kable_styling()
Crosstabulation daytime and year
0 1 NA row_sums
2003 100148 60686 17 160851
2004 201078 112376 69 313523
2005 250292 147348 551 398191
2006 243565 185416 77508 506489
2007 232487 162968 76641 472096
2008 261735 191513 87054 540302
2009 266595 228727 85846 581168
2010 278109 225644 97532 601285
2011 307707 275828 102189 685724
2012 242830 208138 81943 532911
2013 87528 72643 31680 191851

From this we note that in 2003, 2004 and 2005 not many daytime-values were missing. Thus we look at the ratio of the predictor’s values without considering the missing values:

non_na_data <- data[complete.cases(data$daytime, data$year), ]
non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
non_na_data$year <- as.numeric(as.character(non_na_data$year))
cross_tab_year <- table(non_na_data$year, non_na_data$daytime)
 
normalized_cross_tab_year <- cbind(cross_tab_year[,1]/rowSums(cross_tab_year), cross_tab_year[,2]/rowSums(cross_tab_year))
 
kable(normalized_cross_tab_year, 
      col.names = colnames(normalized_cross_tab_year), 
      caption="Daytime-ratios depending on year") %>% 
  kable_styling()
Daytime-ratios depending on year
2003 0.6226793 0.3773207
2004 0.6414913 0.3585087
2005 0.6294437 0.3705563
2006 0.5677757 0.4322243
2007 0.5878975 0.4121025
2008 0.5774653 0.4225347
2009 0.5382256 0.4617744
2010 0.5520741 0.4479259
2011 0.5273154 0.4726846
2012 0.5384639 0.4615361
2013 0.5464660 0.4535340

We note that the ratios for the years with few missing value are quite different from the years with many missing values. In general we note that from the year 2006 and forward we have fewer observations of daytime being 0. This could indicate that the values are not missing completely at random. Considering the data collection the missing values could be due to:

  1. The police could be understaffed at night and thus might not prioritize certain variables when patrolling as much as officers during the day. This could lead to the data not being consistently collected regarding the time of the day for each stop. Furthermore when the police need to use more force to deescalate a stop of civilians, they may be more focused on immediate safety and security concerns rather than detailed data recording such as daytime. On the other hand certain stops might happen quickly or in situations where the time of day is less relevant, making officers less likely to record this information.

  2. The police could have a hard time evaluating if the stop happened at night or day. Also there could be inconsistency in how different officers or departments report the time of day, leading to mismatch data and missing values.

  3. In some cases, the police might avoid recording the exact time of day to make it easier to defend their own actions or even in some cases the civilian’s privacy.

  4. The types and rates of criminal activity can vary by time of day. Some crimes and thus forces may be more common at certain times. As a result, officers may conduct more stop and frisk procedures during these times. This is investigated in the following.

cross_tab_daytime <- table(data$force, data$daytime, useNA = "ifany")

normalized_cross_tab_daytime <- cbind(cross_tab_daytime[,1]/colSums(cross_tab_daytime)[1], cross_tab_daytime[,2]/colSums(cross_tab_daytime)[2], cross_tab_daytime[,3]/colSums(cross_tab_daytime)[3])

colnames(normalized_cross_tab_daytime) <- c("Night", "Day", "NA")

kable(normalized_cross_tab_daytime, 
      col.names = colnames(normalized_cross_tab_daytime), 
      caption="Daytime-ratios depending on force") %>% 
  kable_styling()
Daytime-ratios depending on force
Night Day NA
0 0.7726990 0.8020277 0.7788185
1 0.1602480 0.1316410 0.1459651
2 0.0268940 0.0215199 0.0251845
3 0.0280906 0.0348423 0.0344602
4 0.0015109 0.0013579 0.0016988
5 0.0064763 0.0053450 0.0087016
6 0.0037268 0.0029429 0.0047564
7 0.0003544 0.0003233 0.0004150

From the cross tabulation we see, that the force being used is slightly stronger during the night than at day. Even though from the considerations mentioned before we suspect police might be more likely to record “daytime” values accurately while underreporting or omitting “nighttime”, we need a more thoroughly insight into the data collection and why certain values are missing. Thus we are unable to conclude if the values are missing at random or not at random. Moveover because of the big amount of missing values, an imputation of these missing values might lead to biased results, why we choose discard daytime as a predictor.

For the remainder of the variables, we know, that when variables are missing less than 5% of data values we can assume them to be missing at random. Thus we are safe to impute the missing values by imputing the most probable value for discrete variables and the conditional mean E(Xzc|Xz) for continuous variables (p. 99).

Imputing data and analysis on imputed datasets

We impute our missing data using the Mice package, by bootstrapping our model data 50 times. We note that all the predictors have less than 5% of missing values, whence as a rule of thumb we generate five new data sets using the PMM method. This method uses predictive mean matching - hence PMM - to impute the missing values in the dataset as we concluded above that the data is missing at random.

modeldata_imp <- mice(modeldata,m=5,maxit=50,meth='pmm',seed=500)
summary(modeldata_imp)

#watch imputed data
modeldata_imp$imp$daytime

#puts imputed data from 1st bootstrap back into dataset
modeldata_mice1 <- complete(modeldata_imp, 1)
modeldata_mice2 <- complete(modeldata_imp, 2)
modeldata_mice3 <- complete(modeldata_imp, 3)
modeldata_mice4 <- complete(modeldata_imp, 4)
modeldata_mice5 <- complete(modeldata_imp, 5)

After reading the new data sets, given the names mice1, mice2, mice3, mice4 and mice5, we need to check if the imputed data sets have similar imputed values. To do this, we’ll start by looking at the imputed data for age2, as this is the variable with the highest number of missing values we’ll use as a predictor.

The following density plot depict the distribution of age for the original data set without the missing values and all of the imputed data sets.

combined_data <- bind_rows(
  mice1 %>% mutate(dataset = "mice1"),
  mice2 %>% mutate(dataset = "mice2"),
  mice3 %>% mutate(dataset = "mice3"),
  mice4 %>% mutate(dataset = "mice4"),
  mice5 %>% mutate(dataset = "mice5")
)


ggplot(combined_data, aes(x = age2, color = dataset)) +
  geom_density(linewidth = 0.5) +
  geom_density(data = data, aes(x = age2), color = "darkseagreen3", linewidth = 0.5) +
  labs(x = "Age", y = "Density") +
  scale_color_manual(values = c("mice1" = "darkseagreen1", "mice2" = "darkseagreen2", "mice3" = "darkseagreen4", "mice4" = "aquamarine3", "mice5" = "darkcyan"))

The imputed data sets are not only seemingly identical to the original, they are also seemingly identical to each other. This could indicate that the data is imputed similarly in the five different data sets and because all of the other predictors have a smaller number of missing values, we assume that these are imputed similarly for the five data sets as well.

Now we are ready to begin the model selection and model fitting.
We again add a filter on age2 to only include children and young people under the age of 21 in order to investigate our group of interest.

To further research the differences of the imputed data sets, we use lapply to fit five additive GLM’s, one for each imputed data set, and use qqplot to plot the estimates for all five models.

mice1$Data <- "Mice 1"
mice2$Data <- "Mice 2"
mice3$Data <- "Mice 3"
mice4$Data <- "Mice 4"
mice5$Data <- "Mice 5"

combined_data <- rbind(mice1, mice2, mice3, mice4, mice5)

models <- split(combined_data, combined_data$Data) %>% 
  map(~ glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, data = ., family = binomial(link=logit)))

plot_data <- map_dfr(models, broom::tidy, .id = "Data")

ggplot(plot_data, aes(x = term, y = estimate, color = Data)) +
  geom_point() +
  geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
  theme_minimal() + theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1)) +
  labs(x = "", y = "Estimates")

Overall the estimates generated from each imputed data set look quite similar implying that the data sets are similar.

Choosing imputed dataset

The pool function averages the estimates of the complete data model, computes the total variance over the repeated analyses by Rubin’s rules, thus generating a collective model based on the five imputed data sets.

fits <- lapply(list(mice1, mice2, mice3, mice4, mice5), function(data) {
  glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, 
      family = binomial(link = logit), data = data) })

pooled_results <- pool(fits)

est1 <- summary(fits[[1]])$coefficients[,1]
est2 <- summary(fits[[2]])$coefficients[,1]
est3 <- summary(fits[[3]])$coefficients[,1]
est4 <- summary(fits[[4]])$coefficients[,1]
est5 <- summary(fits[[5]])$coefficients[,1]
pool <- summary(pooled_results)$estimate

pooledsum <- cbind(est1,est2,est3,est4,est5,pool)
kable(pooledsum, 
      col.names = colnames(pooledsum), 
      caption="Daytime-ratios depending on year") %>% 
  kable_styling()
Daytime-ratios depending on year
est1 est2 est3 est4 est5 pool
(Intercept) -0.6201873 -0.9449116 -0.9341736 -0.9289438 -0.9360392 -0.8728511
age2 -0.0111155 0.0014609 0.0008699 0.0007042 0.0010346 -0.0014092
inout21 -0.3387667 -0.3016910 -0.3015755 -0.3042366 -0.3022765 -0.3097092
ac_incid1 -0.2144852 -0.1835546 -0.1838671 -0.1837429 -0.1839454 -0.1899190
offunif21 -0.3772505 -0.3365833 -0.3357833 -0.3371664 -0.3368837 -0.3447334
cs_furtv1 0.4544399 0.3925586 0.3919383 0.3906458 0.3914825 0.4042130
cs_lkout1 -0.2905987 -0.2580985 -0.2579648 -0.2584940 -0.2586147 -0.2647541

When looking at the estimates for each imputed data sets next to the pooled estimates, we again see that they are quite similar. Because the imputed data sets are quite similar, we could in theory just choose one. Consequently, we compute the imputed data set with the median of the estimates.

coefficients_matrix <- cbind(est1, est2, est3, est4, est5)

medians <- function(matrix) {
  for (i in 1:7) {
    med_value <- median(matrix[i, ])
    median_column_index <- which(matrix[i, ] == med_value)[1]
      print(colnames(matrix)[median_column_index])}}

medians(coefficients_matrix)
## [1] "est3"
## [1] "est3"
## [1] "est5"
## [1] "est3"
## [1] "est5"
## [1] "est3"
## [1] "est4"

The output gives which of imputed data sets that contains the median etimate for each of the etimates. It turns out mice3 contain the median estimate for 4 estimates, therefore we move forward with this data set.

Model selection

With our chosen imputed dataset, we start the actual model selection. As we aim to predict the risk of use of force, we utilize the binomial logistic regression model. The response variable will be bin_force as explain earlier. We begin the model selection by constructing a baseline model, which we will use in the following for comparison. We choose the most simple model as our baseline: the intercept only model.

baseline_model <- glm(bin_force ~ 1, 
      family = binomial(link = logit), data = mice3)

We fit an additive generalized linear model and measures the significance of each predictor by comparing the full model with a nested model in which one predictor is dropped at a time. This is done with the drop1 function measuring the significance using ANOVA with a χ2-test.

add_model <- glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, 
      family = binomial(link = logit), data = mice3)

drop1(add_model, test = "Chisq")
## Single term deletions
## 
## Model:
## bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + 
##     cs_lkout
##          Df Deviance     AIC     LRT Pr(>Chi)    
## <none>       1764068 1764082                     
## age2      1  1764069 1764081     0.7   0.3874    
## inout2    1  1768172 1768184  4103.5   <2e-16 ***
## ac_incid  1  1766450 1766462  2381.3   <2e-16 ***
## offunif2  1  1771039 1771051  6970.8   <2e-16 ***
## cs_furtv  1  1775061 1775073 10992.6   <2e-16 ***
## cs_lkout  1  1766679 1766691  2610.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Due to age’s insignificance we note that it is a weak predictor but in order to avoid data driven variable selection we keep this predictor in our model. This choice is also based on the main purpose of the model is to investigate how the setting and appearance predictors interact with each other across the subjects age, increasing or decreasing the risk of the police using force.

We then use anova to test the model against the baseline model as these models indeed are nested.

anova(baseline_model, add_model, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: bin_force ~ 1
## Model 2: bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + 
##     cs_lkout
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1   1635954    1793741                          
## 2   1635948    1764068  6    29672 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We note that the additive model against the intercept-only model is significant and thus continue our model fitting by calculating the training error based on the squared deviance residuals, to obtain comparable values, that express the models goodness-of-fit.

trainErr <- function(mod) {
  mean(residuals(mod, type = "deviance") ^ 2)
}

form0 <- bin_force ~ 1

form1 <- bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv


sapply(list(form0, form1),
       function(form){trainErr(glm(form, data = mice3,
                                   family = binomial(link = "logit")))})
## [1] 1.096449 1.078311

Moreover we perform cross-validation on the models and evaluate their expected generalization error. We calculate the expected generalization error based on the squared deviance residuals in the model which is given by:

d(Y,ˆμ)=2(Ylog(ˆμ)+(1Y)log(1ˆμ)) which is derived from example 5.18.

set.seed(200601)
cv_data <- mice3 %>%
  mutate(bin_force = as.numeric(as.character(bin_force)))

testCV <- function(form, data, B = 1, k = 5) {
  n <- nrow(data)
  y <- data$bin_force
  PEcv <- vector("list", B)
  tmp <- numeric(n)
  for(b in 1:B) {
    
    group <- sample(rep(1:k, length.out = n))
    for(i in 1:k) {
      modelcv <- glm(form, family=binomial("logit"), data = data[group != i, ])
      X_cv <- model.matrix(modelcv, data[group == i, ])
      muhat <- predict(modelcv, newdata = data[group == i, ], type = "response") 
       y_cv <- y[group == i]
       
       tmp[group == i] <- -(2*(y_cv*log(muhat)+(1-y_cv)*log(1-muhat)))
  
    }
    PEcv[[b]] <- tmp
  }
  mean(unlist(PEcv))
}

sapply(list(form0,form1), function(form){testCV(form, cv_data)})
## [1] 1.096452 1.078322

Both the training error and generalization error are smaller for the additive model, and thus we move forward with this model.

We suspect there to be interactions between our “appearance” and “setting” predictors - which also are our stronger predictors - and thus continue our model selection with the following interaction plot. We include age in the interactions to view if age could become a stronger predictor when having interactions:

fit <- glm(bin_force ~ age2*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = mice3, family = binomial(link = "logit"))

ip12 <- plot_model(fit, type = "int", terms = c("age2", "cs_furtv + cs_lkout", "inout2 + ac_incid + offunif2"), title = "cs_furtv and inout2", colors = c("darkseagreen3", "cadetblue"))[[12]]
ip13 <- plot_model(fit, type = "int", terms = c("age2", "cs_furtv + cs_lkout", "inout2 + ac_incid + offunif2"), title = "cs_furtv and ac_incid", colors = c("darkseagreen3", "cadetblue"))[[13]]
ip14 <- plot_model(fit, type = "int", terms = c("age2", "cs_furtv + cs_lkout", "inout2 + ac_incid + offunif2"), title = "cs_furtv and offunif2", colors = c("darkseagreen3", "cadetblue"))[[14]]
ip15 <- plot_model(fit, type = "int", terms = c("age2", "cs_furtv + cs_lkout", "inout2 + ac_incid + offunif2"), title = "cs_lkout and inout2", colors = c("darkseagreen3", "cadetblue"))[[15]]
ip16 <- plot_model(fit, type = "int", terms = c("age2", "cs_furtv + cs_lkout", "inout2 + ac_incid + offunif2"), title = "cs_lkout and ac_incid", colors = c("darkseagreen3", "cadetblue"))[[16]]
ip17 <- plot_model(fit, type = "int", terms = c("age", "cs_furtv + cs_lkout", "inout2 + ac_incid + offunif2"), title = "cs_lkout and offunif2", colors = c("darkseagreen3", "cadetblue"))[[17]]

ip12 <- ip12 + scale_x_continuous(breaks = c(10, 15, 20))
ip13 <- ip13 + scale_x_continuous(breaks = c(10, 15, 20))
ip14 <- ip14 + scale_x_continuous(breaks = c(10, 15, 20))
ip15 <- ip15 + scale_x_continuous(breaks = c(10, 15, 20))
ip16 <- ip16 + scale_x_continuous(breaks = c(10, 15, 20))
ip17 <- ip17 + scale_x_continuous(breaks = c(10, 15, 20))

grid.arrange(ip12, ip15, ip13, ip16, ip14, ip17, nrow = 3)

From this plot we don’t see any reason for using interactions in our model as we don’t see any significant variations between the regression lines across the different plots. We thus continue our model selection with the additive model. We plot the residuals.

add_modeldiag <- fortify(add_model)

p1 <- ggplot(add_modeldiag, aes(.fitted, .stdresid)) +
  geom_point(alpha = 0.1) +
  geom_smooth(size = 1, color = "cadetblue") +
  xlab("Fitted values") + ylab("Standardized residuals")

p2 <- ggplot(add_modeldiag, aes(age2, .stdresid)) +
  geom_smooth(size = 1, color = "cadetblue") +
  xlab("Age") + ylab("Standardized residuals")

grid.arrange(p1, p2, nrow = 1)

These residual plots might imply that the relationship between the force and age is not appropriately captured by a linear model. This suggests the requirement for considering a non-linear effect concerning age. We test this in the following:

form2 <- bin_force ~ ns(age2, df = 2) + inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv

form3 <- bin_force ~ ns(age2, df = 4) + inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv

form4 <- bin_force ~ ns(age2, df = 8) + inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv

anova(glm(form2, data = mice3, family = binomial(link = "logit")), glm(form3, data = mice3, family = binomial(link = "logit")), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: bin_force ~ ns(age2, df = 2) + inout2 + ac_incid + offunif2 + 
##     cs_lkout + cs_furtv
## Model 2: bin_force ~ ns(age2, df = 4) + inout2 + ac_incid + offunif2 + 
##     cs_lkout + cs_furtv
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1   1635947    1764033                          
## 2   1635945    1764003  2   30.093 2.919e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(glm(form3, data = mice3, family = binomial(link = "logit")), glm(form4, data = mice3, family = binomial(link = "logit")), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: bin_force ~ ns(age2, df = 4) + inout2 + ac_incid + offunif2 + 
##     cs_lkout + cs_furtv
## Model 2: bin_force ~ ns(age2, df = 8) + inout2 + ac_incid + offunif2 + 
##     cs_lkout + cs_furtv
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1   1635945    1764003                     
## 2   1635942    1763998  3   4.8119   0.1861

From the tests above we note that the form3-model is significant against the form2-model however the form4-model against the form3-model with is not. we then compute the training errors and to capture the possible risk of overfitting, we compute the expected generalization errors.

sapply(list(form1, form2, form3, form4),
       function(form){trainErr(glm(form, data = mice3,
                                   family = binomial(link = "logit")))})
## [1] 1.078311 1.078290 1.078271 1.078268
sapply(list(form1, form2, form3, form4), function(form){testCV(form, cv_data)})
## [1] 1.078320 1.078303 1.078290 1.078284

The training error and expected generalization error are only marginally smaller for the form4-model with differences less than 0.0001. It’s important to note that such small differences may be due to variations introduced by computing the generalization error just once, as we will discuss in more detail later. Consequently we do not find reason to choose form4 over form3 and proceed with the form3-model.

We again look at the residuals.

nonlin_model <- glm(form3, data = mice3, family = binomial(link = "logit"))


nonlin_modeldiag <- fortify(nonlin_model)


ggplot(nonlin_modeldiag, aes(mice3$age2, .stdresid)) +
  geom_smooth(size = 1, color = "cadetblue") +
  xlab("Age") + ylab("Standardized residuals")

We note that the non-linear effects are better captured by the model using splines.

The final model

Given the model selection above, we find the additive model with the natural cubic splines of age, the appearance predictors and setting predictors to be the best fit. We thus choose the model below to predict the risk of force being used against adolescent under the age of 21. The following is a summary of the model.

final_model <- nonlin_model

sum <- (summary(final_model))$coefficients
kable(sum, 
      col.names = colnames(sum), 
      caption="Summary of the final model") %>% 
  kable_styling()
Summary of the final model
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.9863821 0.0332482 -29.6672037 0.0000000
ns(age2, df = 4)1 0.0724693 0.0307836 2.3541547 0.0185649
ns(age2, df = 4)2 0.0794074 0.0228100 3.4812609 0.0004991
ns(age2, df = 4)3 0.1186796 0.0706913 1.6788433 0.0931826
ns(age2, df = 4)4 0.0078149 0.0106562 0.7333675 0.4633343
inout21 -0.3013936 0.0047902 -62.9187397 0.0000000
ac_incid1 -0.1841073 0.0037655 -48.8930435 0.0000000
offunif21 -0.3355266 0.0039883 -84.1278049 0.0000000
cs_lkout1 -0.2582312 0.0051305 -50.3326554 0.0000000
cs_furtv1 0.3916577 0.0037454 104.5715528 0.0000000

Residualplot for the final model

To investigate our final model, we plot the residuals:

resid_final_model <- glm(bin_force ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, data = cv_data, 
                         family = binomial(link = "logit"))

simDiag <- fortify(resid_final_model)

ggplot(simDiag, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2") +
  xlab("Fitted values") + ylab("Standardized residuals")

We can’t tell if the residual plot seem reasonable thus we plot the simulated residuals under the null hypothesis that our model is correct and compare with the residual plot above.

set.seed(2023)

yNew1 <- simulate(resid_final_model)[,1]
yNew2 <- simulate(resid_final_model)[,1]
yNew3 <- simulate(resid_final_model)[,1]
yNew4 <- simulate(resid_final_model)[,1]


simGlmNew1 <- glm(yNew1 ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, data = cv_data, family = binomial(link = "logit"), control = glm.control(maxit = 1000))
simGlmNew2 <- glm(yNew2 ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, data = cv_data, family = binomial(link = "logit"), control = glm.control(maxit = 1000))
simGlmNew3 <- glm(yNew3 ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, data = cv_data, family = binomial(link = "logit"), control = glm.control(maxit = 1000))
simGlmNew4 <- glm(yNew4 ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, data = cv_data, family = binomial(link = "logit"), control = glm.control(maxit = 1000))

simDiagNew1 <- fortify(simGlmNew1)
simDiagNew2 <- fortify(simGlmNew2)
simDiagNew3 <- fortify(simGlmNew3)
simDiagNew4 <- fortify(simGlmNew4)


p1 <- qplot(.fitted, .resid, data = simDiagNew1) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2") +
  xlab("Fitted values") + ylab("Standardized residuals")
p2 <- qplot(.fitted, .resid, data = simDiagNew2) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2") +
  xlab("Fitted values") + ylab("Standardized residuals")
p3 <- qplot(.fitted, .resid, data = simDiagNew3) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2") +
  xlab("Fitted values") + ylab("Standardized residuals")
p4 <- qplot(.fitted, .resid, data = simDiagNew4) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2") +
  xlab("Fitted values") + ylab("Standardized residuals")


grid.arrange(p1,p2,p3,p4, ncol=2)

Since the simulated residual plots are similar to the one of our model, we don’t see any major reason for concern: Our model’s fit seems consistent across different resampled data sets, which indicates that it’s less likely to be a subject to overfitting.

Confidence intervals for odds-ratio

Before looking at the overall prediction of the model, we’ll illustrate particular predictors effect on the risk of the police using force by examining specific examples. We’ll do so by calculating the odds-ratio of this predictor, meaning the odds-ratio of this specific setting or appearance situation.

First, we’ll look at how the setting variable inout2 affects the risk of the police using force, as one could imagine that a police officer is less likely to use force when the subject is stopped inside e.g due to lack of space and a generally calmer environment.

B = 99
n = nrow(mice3)
ORsampset <- c()
for(b in 1:B) {
  i <- sample(n, n, replace = TRUE)
  bootGlm <- glm(bin_force ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, 
                 family = binomial(link = "logit"), data = mice3[i, ])
  ORsampset[b] <- exp(coef(bootGlm)["inout21"])
}
qset <- quantile(ORsampset, c(0.025,0.975), type = 1)
2.5% 97.5%
0.732931 0.7470805
##      [,1]    [,2]       
## [1,] "2.5%"  "0.732931" 
## [2,] "97.5%" "0.7470805"

The 95% confidence interval does not include 1, indicating that if the subject is stopped inside, the risk of the police using force is lower compared to the subject being stopped outside. The confidence interval is quite narrow, implying a high level of precision in the estimate which could be due to the large sample size.

Next, we’ll investigate the odds-ratio for the appearance predictor cs_furtv, as one could imagine that a police officer is more likely is more to use force on the subject if he or she is moving suspiciously.

B = 99
n = nrow(mice3)
ORsampapp <- c()
for(b in 1:B) {
  i <- sample(n, n, replace = TRUE)
  bootGlm <- glm(bin_force ~ ns(age2, df = 4) +cs_furtv + cs_lkout + inout2 + ac_incid + offunif2, 
                 family = binomial(link = "logit"), data = mice3[i, ])
  ORsampapp[b] <- exp(coef(bootGlm)["cs_furtv1"])
}
qapp <- quantile(ORsampapp, c(0.025,0.975), type = 1)
2.5% 97.5%
1.468486 1.493202

This 95% confidence interval does not include 1 either and the confidence interval is much higher than 1, indicating an increasing risk of the police using force if the subject is moving suspiciously during the stop.

Prediction and interpretation

For our model, it is now possible to plot the predicted possibility of force used against adolescents.

predFrame0 <- expand.grid(cs_furtv = factor(c(0)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(1)),
                         age2 = seq(10, 20, by = 0.1))
predFrame0$predicted_prob <- predict(final_model, type = "response",
                                     newdata = predFrame0, interval = "confidence")
p_pred0 <- ggplot(predFrame0, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  labs(x = "Age", y = "Risk of force")

predFramecom <- expand.grid(cs_furtv = factor(c(1)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(0)),
                         age2 = seq(10, 20, by = 0.1))

predFramecom$predicted_prob <- predict(final_model, type = "response",
                                     newdata = predFramecom, interval = "confidence")

p_predcom <- ggplot(predFramecom, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid+ inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  labs(x = "Age", y = "Risk of force")

grid.arrange(p_pred0, p_predcom, ncol = 2)

The plot above presents a portion of the predictions from our model. The graph on the left, describes the risk of the police using force on the subject when stopped, with a red line marking 25%. It represents the following situation: it is nighttime, the subject is outside in a high crime area meeting an officer not in uniform. The subject appears as being on a lookout without exhibiting any suspicious movements. The prediction plot suggests that there’s approximately 20% risk of force being used against the subject, and this risk remains consistent for individuals aged between 10 and 21.

The right prediction plot indicates that, under the same setting, when the subject doesn’t appear to be on the lookout but exhibits suspicious movements, the risk of force being used against them surges significantly, exceeding 30%. Another interesting change that our model imply based on the change of appearance is, that the risk of force used against the civilian depends on the age. The risk increases slightly with the age until around the age of 18, where it decreases a bit.

For better comparison we construct a confidence interval.

conf_plots <- function(data, model) {
  link_preds <- predict(model, newdata = data, type = "link", se.fit = TRUE)
  
  fit_link <- link_preds$fit
  se_link <- link_preds$se.fit
  
  z <- qnorm(0.975) 
  lwr_link <- fit_link - z * se_link
  upr_link <- fit_link + z * se_link
  
  data$predicted_prob <- plogis(fit_link)
  data$lwr <- plogis(lwr_link)
  data$upr <- plogis(upr_link)
  
  return(data)
}

First, note that the function plogis takes the inverse of the logit function to transform values from the log-odds scale to the probability scale. Also, when making the confidence intervals, we’ll use the standard normal distribution to generate our z value. This is due to asymptotic theory which states that the maximum likelihood estimator is asymptotically normally distributed when dealing with a large set of observations. As the data set we use to generate the estimates used in our prediction plots contains over 1.5 million observations, this seems rather safe to assume. However, for added assurance, we examine the QQ-plot comparing quantiles from a standard normal distribution with those from the estimator.

estimate <- predict(final_model, newdata = mice3, type = "link", se.fit = TRUE)$fit


qqnorm(estimate)
qqline(estimate)

From this QQ-plot our construction of the confidence intervals seems reasonable, and we show the same two prediction plots with confidence intervals.

predFrame <- expand.grid(cs_furtv = factor(c(0)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(1)),
                         age2 = seq(10, 20, by = 0.1))

p_pred <- ggplot(conf_plots(predFrame, final_model), aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid  + inout2 + offunif2, label = label_both) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = gray(0.85)) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  labs(x = "Age", y = "Risk of force")

predFrame1 <- expand.grid(cs_furtv = factor(c(1)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(0)),
                         age2 = seq(10, 20, by = 0.1))

p_pred1 <- ggplot(conf_plots(predFrame1, final_model), aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid  + inout2 + offunif2, label = label_both) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = gray(0.85)) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  labs(x = "Age", y = "Risk of force")

grid.arrange(p_pred, p_pred1, ncol = 2)

We note that these confidence intervals are quite narrow in general and wider towards the age of 10, which might be due to fewer observations around this age. As a consequence of the narrow confidence intervals our interpretations of the plot are the same as above. We now plot the prediction plots for all combinations of the predictors.

predFrame1full <- expand.grid(cs_furtv = factor(c(0,1)),
                         ac_incid = factor(0),
                         inout2 = factor(c(0,1)),
                         offunif2 = factor(c(0,1)),
                         cs_lkout = factor(c(0,1)),
                         age2 = seq(10, 20, by = 0.1))

p_pred1full <- ggplot(conf_plots(predFrame1full, final_model), aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = gray(0.85)) +
  theme(axis.text.y = element_text(size = 5)) + 
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  theme( strip.text.y = element_text(angle = 0)) +
  labs(x = "Age", y = "Risk of force")
  


predFrame2full <- expand.grid(cs_furtv = factor(c(0,1)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0,1)),
                         offunif2 = factor(c(0,1)),
                         cs_lkout = factor(c(0,1)),
                         age2 = seq(10, 20, by = 0.1))

p_pred2full <- ggplot(conf_plots(predFrame2full, final_model), aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = gray(0.85)) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  theme(axis.text.y = element_text(size = 5)) +
  theme( strip.text.y = element_text(angle = 0)) +
  labs(x = "Age", y = "Risk of force")

p_pred1full

p_pred2full

Above is shown our full model prediction plot, still with a dotted line at 25% as a comparable benchmark. By looking at the confidence intervals for each combination of setting and appearance predictors, one can tell that the real risk of the police using force is less certain in the lower tail of the distribution. This is probably because we have fewer observations for children under the age of 15 in our data set. When generating 100 confidence intervals 95 of them would contain the true risk of the police using force, and as these confidence intervals are quite narrow, especially for civilians over the age of 15, it might indicate that the predictions are creditable.

We notice a few general tendencies from the overview: Firstly, the appearance seems to have an effect on the risk of the police using force. The strongest difference of risk is seen between the scenario where the subject is appearing to be on a lookout and not moving suspiciously, and the scenario where the subject is definitely not on a lookout but appear to be moving suspiciously (the two middle-rows). The model implies that in the latter scenario, the risk of police force is over 25% in many of the combinations of setting variables, while the former is under 25% in all of the cases indicating that the police is less likely to use force when stopping the subject in this specific situation.

Another tendency in our prediction plot is a higher risk of use of force when the police isn’t wearing a uniform. Even though this effect does not seem statistically strong, it is noteworthy, especially in the context of social theories. One theory suggests that police officers take on a more authoritative role when wearing a uniform, as it symbolizes the expectations they have to meet to fulfill their duties. This does not only affect the police officer’s view upon him or herself, but also the subject’s view of the police officer, their authorities and rights during the stop. Especially regarding children and adolescents as they might have a stronger belief in authority.

From the prediction plots, we also observe a rather counter intuitive tendency showing that the risk of use of force is not higher if the stop occurs in a high crime area. The risk in the prediction plots actually appears to be higher when the stop occurs in a low crime area. This might suggest that police officers don’t hold biases towards the suggested criminal behavior of children and young adults in high crime areas, indicating no discriminatory behavior of the police regarding the area.

However, some of the data might be biased, as already mentioned in the analysis of the variable daytime. As we don’t know who exactly provides the information of the stop and frisk, we must assume that it is the same police officers that conduct the stop and frisk and provide the information of the incident. If the stop and frisk portrays the police officer negatively, he or she might provide false information to make his or her actions appear more appropriate, such as lowering the use of force, changing the subject’s appearance or location of the stop. Which in turn tells us something about the discriminatory mindset of the police. So, the non-existing bias, the prediction plots suggests, might just be due to fabricated, left out or biased data.

From above discussion, we have concluded the tendencies our model imply. However, we have encountered diagnostics in our analysis that contradicts the reliability of the model. Mentionable is our Spearman plot, showing little to no correlations within our datasets, which as just disclosed might even be biased. Furthermore in our model selection, the drop1-analysis suggested that all of the predictors but one are significant and strong in explaining the variation in the response variable. We have in our analysis interpreted it as all the variables are relevant to include in the model, however this could just as well be due to the data in generally correlating badly, and therefore the level of significance is due to the predictors affecting the response differently and not their goodness-of-fit.

We could also recall in the model selection, we chose our final model based on the training errors and expected generalization errors. Both of these were determined based on a comparison of values, that differed less than 0.0001. To investigate this difference a little further we plot 100 simulations of the expected generalization errors for our chosen form3 and not chosen form4, to gain insight into how often, the predictive strength of our chosen model is better than the alternatives.

cv_sample3 <- sapply(numeric(100), function(x) testCV(form3, cv_data, k = 10))

cv_sample4 <- sapply(numeric(100), function(x) testCV(form4, cv_data, k = 10))

The plot above is illustrating the distribution of EGE’s for the two forms. It turns out that form3, has the highest counts for the smaller EGE-values, while form4 has the highest count for the larger EGE-values. However, it is still clearly visible, that form4 with some separation of the dataset, has the same EGE-value, as form3 and therefore in another test could turn out to have better predictive strength. This observation becomes particularly visible when calculating the mean and median for the EGE samples, which are extremely close.

mean form3 mean form4 median form3 median form4
1.07828381772766 1.07828536875339 1.07828361604188 1.0782856143929

In the end, the 8 degrees of freedom of our splines in form4 are an concerningly high amount given our age-variable only span 10 years. Ultimately it is therefore unlikely that the other model would have been chosen. Given this discussion, we do have to consider that our chosen model, might not be much better than our next best alternative, and could relatively easily have been chosen differently. This speaks against the general predictive strength and reliability of our chosen model.

Nevertheless, our model diagnostics show that the residuals are in fact distributed as could be expected from our chosen distribution. Even though this implies that we have chosen the correct underlying distribution, it is not strong enough to outweigh the arguments of why the model has poor reliability.

To answer our scientific question, we can say that the model implies that the appearance and setting have an interactive effect on the risk of use of force against civilians under the age of 21. This effect is visible but not necessarily significant. In conclusion, it is very important to take into consideration, that the model is diagnostically not performing very well, and should therefore be interpreted and utilized with careful considerations.