We all contributed equally to produce this assignment.

In this assignment we have investigated the data from New York City’s Stop, Question and Frisk program from 2003 until 2013. Firstly, we have made an exploratory analysis of the data set by finding correlations and distribution of the variables, whereafter we have considered missing data. Our aim has been to determine how the appearance and setting influence the risk of force used against young civilians under the age of 21. In order to do so, we compare different models by various methods including residual plots, training errors and cross-validations. Lastly we interpret our model’s predictions on biased behaviour from the police, but ultimately conclude in a general discussion of the model diagnostics and data collection that the model does not necessarily reflect the reality.

Load and mutate data

To begin our exploratory analysis, we start by loading our our data set and mutating it to make sure we can investigate all variables collectively and compare them.

data <- d2 %>%
  select(-c("force2", "pct")) %>%
  mutate_at(vars(all_of(yes_no_columns)), ~ifelse(. == "N", 0, 1))%>%
  mutate(race2 = case_when(
    race2 == "white"    ~ 0,
    race2 == "black"    ~ 1,
    race2 == "hisp"     ~ 2,
    race2 == "asian"    ~ 3,
    race2 == "other"    ~ 4
  )) %>%
  mutate(typeofid2  = case_when(
    typeofid2 == "O"    ~ 0,
    typeofid2 == "P"    ~ 1,
    typeofid2 == "R"    ~ 2,
    typeofid2 == "V"    ~ 3,
  )) %>%
  mutate(across(all_of(factor_columns), as.factor)) %>%
  mutate(year = as.numeric(year))

summary(data)
##      force          race2          gender             age2       daytime       
##  0      :3910239   0   : 492688   0   :4495436   Min.   :10      0   :2472074  
##  1      : 736051   1   :2886843   1   : 348159   1st Qu.:19      1   :1871287  
##  3      : 156732   2   :1215401   NA's: 140796   Median :24      NA's: 641030  
##  2      : 122898   3   : 152053                  Mean   :28                    
##  5      :  31590   4   : 235105                  3rd Qu.:34                    
##  6      :  17769   NA's:   2301                  Max.   :90                    
##  (Other):   9112                                 NA's   :41534                 
##   inout2        ac_incid    ac_time     offunif2       typeofid2     
##  0   :3809789   0:2204888   0:3158281   0   :1396968   0   :  84217  
##  1   :1147150   1:2779503   1:1826110   1   :3586909   1   :2640092  
##  NA's:  27452                           NA's:    514   2   : 107497  
##                                                        3   :2133368  
##                                                        NA's:  19217  
##                                                                      
##                                                                      
##  othpers2       cs_objcs    cs_descr    cs_casng    cs_lkout    cs_cloth   
##  0   :3806808   0:4850690   0:4116587   0:3562617   0:4151151   0:4773826  
##  1   :1163298   1: 133701   1: 867804   1:1421774   1: 833240   1: 210565  
##  NA's:  14285                                                              
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##  cs_drgtr    cs_furtv    cs_vcrim    cs_bulge    cs_other         year     
##  0:4522192   0:2801105   0:4588325   0:4543022   0:3944522   Min.   :2003  
##  1: 462199   1:2183286   1: 396066   1: 441369   1:1039869   1st Qu.:2006  
##                                                              Median :2009  
##                                                              Mean   :2008  
##                                                              3rd Qu.:2011  
##                                                              Max.   :2013  
## 

Full data set exploratory analysis

In order to meet the purpose of developing a model that describes the probability of police force being used on civilians we choose the response variable to be force and the exploratory analysis will thus focus on how the predictors influence and interact with each other to impact the likelihood of a police force interaction. To continue the analysis, we generate bar plots for the variables characterized as factors and histograms for the continuous predictors to check for the number of missing values, skewness and extreme values.

First of all, we note that force is exceptionally skewed, which we will elaborate further on in the following.

Most of the other variables are binary, due to the mutation of our data set. For these we cannot discuss any skewness, however, some of the variables have significantly more values of 0 than 1, like cs_objcs and cs_cloth. This may not directly cause any issues, nonetheless it is worth noticing.

The number of missing values varies from none to a significant amount as seen in the variable daytime. We’ll account for these later on.

As age2 is a continuous variable, it is plotted as a histogram. There is nothing eye catching about the distribution relating to skewness or extreme values, though we do note that the police tend to have stopped more younger than older people. This becomes very clear when looking at the violin plot in which the subjects age is stratified on the categorical types of force. There is nothing concerning about this relation either.

The last variable year is for now also a continuous variable and thus it is plotted as a histogram. Again, there is nothing eye catching about the distribution relating to skewness or extreme values, only an indication of the half year period in year 2003 and 2013 due to change in government. However, when looking at the corresponding violin plot, we observe an extreme amount of type 5 force being used in year 2006. This could be a mistake but we cannot make any conclusions, only note that it could be possible.

Prior to initiating our data analysis, it is convenient to search for variables that would be strong predictors. By that we would be looking for a numerically large correlation coefficient with the response variable, force. Furthermore, for the predictor variables, we would like to avoid using predictors, which have a high correlation to other predictors, as these most likely will have a similar effect on the response variable. To get an idea of how the variables correlate we make a Spearman-plot.

cp <- cor(data.matrix(na.omit(data)), method = "spearman")

ord <- rev(hclust(as.dist(1-abs(cp)))$order)

colPal <- colorRampPalette(c("white", "darkseagreen4"), space = "rgb")(100)

spearman <- levelplot(cp[ord, ord], xlab = "", ylab = "",
col.regions = colPal, at = seq(-1, 1, length.out = 100),
colorkey = list(space = "top", labels = list(cex = 0.5)),
scales = list(x = list(rot = 45),
y = list(rot = 45),
cex = 0.5))

spearman

Unfortunately, the Spearman-plot at first glance does not show any strong correlations at all. However, knowing that the Spearman-plot by default cluster the variables with numerically higher correlations. Therefore, despite the small correlation, we anticipate that cs_bulge and othpers2 must be the most correlated predictors to force. It also seems noteworthy that the correlation between cs_lkout and cs_casng and between ac_time and ac_incid seems quite strong, compared to the correlation between the other variables.

##   race2          gender             age2       daytime         inout2       
##  0   : 492688   0   :4495436   Min.   :10      0   :2472074   0   :3809789  
##  1   :2886843   1   : 348159   1st Qu.:19      1   :1871287   1   :1147150  
##  2   :1215401   NA's: 140796   Median :24      NA's: 641030   NA's:  27452  
##  3   : 152053                  Mean   :28                                   
##  4   : 235105                  3rd Qu.:34                                   
##  NA's:   2301                  Max.   :90                                   
##                                NA's   :41534                                
##  offunif2       typeofid2      othpers2      
##  0   :1396968   0   :  84217   0   :3806808  
##  1   :3586909   1   :2640092   1   :1163298  
##  NA's:    514   2   : 107497   NA's:  14285  
##                 3   :2133368                 
##                 NA's:  19217                 
##                                              
## 

To further research our chosen response variable, force, we make a crosstabulation between the 7 types of force and year.

CT <- table(data$force, data$year)

kable(CT, 
      col.names = colnames(CT), 
      caption="Crosstabulation of variables force and year") %>% 
  kable_styling()
Crosstabulation of variables force and year
2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
0 119564 246398 312773 406541 365744 414700 439271 463337 540799 443857 157255
1 21225 41392 56400 69177 75406 91045 103625 97624 102471 57803 19883
2 8134 10412 12542 653 13097 14096 16467 16444 16978 10099 3976
3 7938 10637 11812 11366 14079 16265 17690 19265 21076 17697 8907
4 912 1010 937 631 612 496 481 498 641 712 435
5 1166 1579 1684 16069 1369 1792 1831 2128 1845 1418 709
6 1787 1954 1808 1868 1592 1747 1609 1807 1730 1215 652
7 125 141 235 184 197 161 194 182 184 110 34

Common intuition would suggest that as the level of force used upon civilians increases, such incidents would become less frequent. However we note from the cross tabulation that our initial presumption is not the case. Specifically, it becomes evident that there are relatively few instances where force level 4 is reported. This pattern indicates that the order of force types may not necessarily correspond to an escalation in brutality, or it could suggest that it is hard for the police to distinct between “at least drawing a weapon on a civilian” and “at least pushing a civilian to the ground”. This difficulty in distinction could also explain the extreme number of observatios of force 5 in the year 2006.

The observed order of force levels and the potential overlap between specific force types motivates for a particular data modification decision. To address this, a new dataset is defined, wherein a force is assigned the value of 0 if no force is reported and 1 if any form of force is indicated. This grouping simplifies the less distinguishable order of force types into a more concise binary representation, making it suitable for modeling outcomes that involve the presence or absence of force. Moreover this transformation is chosen with the intention of subsequently fitting a binomial model.

Our scientific question

How does the interaction between the setting and the subjects appearance affect the risk of the police using force when stopping the subject for young civilians under the age of 21?

We choose to define the setting as the surroundings and situation of the stop, and appearance as the police officer’s view of the subject and its supposed actions during the stop.

Model data

As accounted for earlier, we want force to be our response variable, where we combine all use of force and model the risk of the police using any kind of force on the subject. In order to model this we fit a binomial generalized linear model.

In order to answer our scientific question, we’ll need to choose which variables we want to include as predictors in our model:

Immediately we choose to include daytime, inout2 as a predictors as these tells us a lot the about setting of the stop. Furthermore we choose to discard typeofid2 and othpers2 as these neither tells us much about the officer’s view of the civilian nor the setting. However we are interested in how wearing an uniform could impact the officer’s use of force. To get alternative perspective on discrimination, we intentionally exclude both race2 and gender, however we choose to include the age of the civilian.

In order to project the force being used and thus predict future usage of force on civilians under the age of 21 we choose to include year as a predictor.

As we want to include predictors that tell us something about the officer’s own perception of the civilian rather than an evaluation of their actual behavior or circumstances we choose to exclude cs_descr, cs_drgtr, cs_objcs and cs_vcrimas predictors.

We believe that children are more likely to do furtive movements, why we choose to include cs_furtvin the model too.

From the Spearman plot we noted the most correlated pairs of variables to be cs_lkout, cs_casng and ac_time, ac_incid, why we choose to only look at one of them. From the description of the variables our belief is that cs_lkout tells us more about the police officer’s interpretation than cs_casng. When we already have included daytimewe choose ac_incid over ac_time as another predictor.

With the bar plots from the exploratory analysis in mind, we also discard cs_cloth as a predictor as it doesn’t provide significant insights into the officer’s personal potential discriminatory perception of civilians.

Finally we choose to keep out cs_bulgeand cs_other from the model as these variables are quite imprecise.

In conclusion we choose the following predictors:

The setting predictors: daytime, inout2, ac_incid and offunif2

The appearance predictors: age2, cs_furtv. and cs_lkout

To quickly assess the correlation among predictors, as well as between predictors and the response variable, we generate a Spearman-plot.

Again, the Spearman-plot doesn’t show any signs of strong correlation between the predictors which is a good sign. Other than that, the Spearman-plot is as bland as the one including all variables.

Note that the Spearman plot omits all rows with missing values. Thus, before we begin the model fitting with our chosen predictors, we’ll need to investigate and evaluate if and how to impute the missing values in our data set.

Analysis of missing values

pMiss <- function(x){sum(is.na(x))/length(x)*100}
apply(NAdata,2,pMiss)
##       race2      gender        age2     daytime      inout2    offunif2 
##  0.04616412  2.82473827  0.83328134 12.86074869  0.55075936  0.01031219 
##   typeofid2    othpers2 
##  0.38554359  0.28659469
#apply(modeldata,1,pMiss)

aggr_plot <- aggr(NAdata, col=c('cadetblue','darkseagreen2'), 
                  numbers=TRUE, 
                  sortVars=TRUE, 
                  labels=names(NAdata), 
                  cex.axis=.7, gap=3, 
                  ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##   Variable        Count
##    daytime 0.1286074869
##     gender 0.0282473827
##       age2 0.0083328134
##     inout2 0.0055075936
##  typeofid2 0.0038554359
##   othpers2 0.0028659469
##      race2 0.0004616412
##   offunif2 0.0001031219

From the histogram of the missing values we notice that one predictor is especially missing a lot of values: daytime is missing \(12.86\%\). The large amount of missing values imply that we cannot directly assume they are missing completely at random. The other predictors are missing up to \(3\%\) of their data values, which means that we in general can assume them to be at least missing at random. Thus we focus on analyzing how the values of daytime are missing, as this big amount could be problematic, if there is a correlation to the response variable. By not taking the missing values into account we calculate the correlation between daytime and force to be:

non_na_data <- data[complete.cases(data$daytime), ]
non_na_data$force <- as.numeric(as.character(non_na_data$force))
non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
cor(non_na_data$force, non_na_data$daytime)
## [1] -0.01773452

We note no correlation between force and daytime and we investigate further: Given the nature of the dataset we suspect that daytime should be correlated to ac_time when stratifying on the binary force:

bin_data<- data %>%
  mutate(force = as.numeric(as.character(force)), 
         bin_force = ifelse(force > 0, 1, 0)) %>%
        select(-force) %>%
  mutate(bin_force = as.factor(bin_force))
     
        
correlations <- numeric()

non_na_data <- bin_data[complete.cases(bin_data$daytime, bin_data$ac_time), ]
force_levels <- unique(non_na_data$bin_force)


non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
non_na_data$ac_time <- as.numeric(as.character(non_na_data$ac_time))

for (level in force_levels) {
   subset_data <- non_na_data[non_na_data$bin_force == level, ]
   correlation <- cor(subset_data$ac_time, subset_data$daytime, method = "pearson")
   correlations <- c(correlations, correlation)
 }

correlations
## [1] -0.08107143 -0.08215116

Firstly we note a weak correlation between ac_time and daytime. Secondly we don’t observe a systematic correlation to bin_force. This observation together with the previous calculations adds to our initial suspicion that the values are not missing completely at random.

Lastly we consider the cross tabulation between year and daytime:

cross_tab <- table(data$year, data$daytime, useNA = "ifany")
row_sums <- rowSums(cross_tab, na.rm = TRUE)
cross_tab_with_sums <- cbind(cross_tab, row_sums)
 
kable(cross_tab_with_sums, 
      col.names = colnames(cross_tab_with_sums), 
      caption="Crosstabulation daytime and year") %>% 
  kable_styling()
Crosstabulation daytime and year
0 1 NA row_sums
2003 100148 60686 17 160851
2004 201078 112376 69 313523
2005 250292 147348 551 398191
2006 243565 185416 77508 506489
2007 232487 162968 76641 472096
2008 261735 191513 87054 540302
2009 266595 228727 85846 581168
2010 278109 225644 97532 601285
2011 307707 275828 102189 685724
2012 242830 208138 81943 532911
2013 87528 72643 31680 191851

From this we note that in 2003, 2004 and 2005 not many daytime-values were missing. Thus we look at the ratio of the predictor’s values without considering the missing values:

non_na_data <- data[complete.cases(data$daytime, data$year), ]
non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
non_na_data$year <- as.numeric(as.character(non_na_data$year))
cross_tab_year <- table(non_na_data$year, non_na_data$daytime)
 
normalized_cross_tab_year <- cbind(cross_tab_year[,1]/rowSums(cross_tab_year), cross_tab_year[,2]/rowSums(cross_tab_year))
 
kable(normalized_cross_tab_year, 
      col.names = colnames(normalized_cross_tab_year), 
      caption="Daytime-ratios depending on year") %>% 
  kable_styling()
Daytime-ratios depending on year
2003 0.6226793 0.3773207
2004 0.6414913 0.3585087
2005 0.6294437 0.3705563
2006 0.5677757 0.4322243
2007 0.5878975 0.4121025
2008 0.5774653 0.4225347
2009 0.5382256 0.4617744
2010 0.5520741 0.4479259
2011 0.5273154 0.4726846
2012 0.5384639 0.4615361
2013 0.5464660 0.4535340

We note that the ratios for the years with few missing value are quite different from the years with many missing values. In general we note that from the year 2006 and forward we have fewer observations of daytime being 0. This could indicate that the values are missing not at random. Considering the data collection the missing values could be due to:

  1. The police could be understaffed at night and thus might not prioritize certain variables when patrolling as much as officers during the day. This could lead to the data not being consistently collected regarding the time of the day for each stop. Furthermore when the police need to use more force to deescalate a stop of civilians, they may be more focused on immediate safety and security concerns rather than detailed data recording such as daytime. On the other hand certain stops might happen quickly or in situations where the time of day is less relevant, making officers less likely to record this information.

  2. The police could have a hard time evaluating if the stop happened at night or day. Also there could be inconsistency in how different officers or departments report the time of day, leading to mismatch data and missing values.

  3. In some cases, the police might avoid recording the exact time of day to make it easier to defend their own actions or even in some cases the civilian’s privacy.

  4. The types and rates of criminal activity can vary by time of day. Some crimes and thus forces may be more common at certain times. As a result, officers may conduct more stop and frisk procedures during these times. From the cross tabulation we see, that the force being used is stronger at night than at day.

table(data$force, data$daytime, useNA = "ifany")
##    
##           0       1    <NA>
##   0 1910169 1500824  499246
##   1  396145  246338   93568
##   2   66484   40270   16144
##   3   69442   65200   22090
##   4    3735    2541    1089
##   5   16010   10002    5578
##   6    9213    5507    3049
##   7     876     605     266

Because of the big amount and pattern of missing values of daytime and the considerations above, we conclude that the values must be missing not at random. Even though we suspect police might be more likely to record “daytime” values accurately while underreporting or omitting “nighttime”, we need a more thoroughly insight into the data collection and why certain values are missing. An imputation of these missing values might lead to bias values, why we choose discard daytime as a predictor.

For the remainder of the variables, we know, that when variables are missing less than \(5\%\) of data values we can assume them to be missing at random. Thus we are safe to impute the missing values by imputing the most probable value for discrete variables and the conditional mean \(\mathbb{E}(X_{z^c}|X_z)\) for continuous variables (p. 99). For this reason we impute the missing values by utilizing MICE-imputation and the pmm method.

Imputed data model fitting

We impute our missing data using the Mice package, by bootstrapping our model data 50 times. We note that all the predictors have less than 5% of missing values, whence as a rule of thumb we generate five new data sets using the PMM method. This method uses predictive mean matching - hence PMM - to impute the missing values in the data set as the data is missing at random.

modeldata_imp <- mice(modeldata,m=5,maxit=50,meth='pmm',seed=500)
summary(modeldata_imp)

#watch imputed data
modeldata_imp$imp$daytime

#puts imputed data from 1st bootstrap back into dataset
modeldata_mice1 <- complete(modeldata_imp, 1)
modeldata_mice2 <- complete(modeldata_imp, 2)
modeldata_mice3 <- complete(modeldata_imp, 3)
modeldata_mice4 <- complete(modeldata_imp, 4)
modeldata_mice5 <- complete(modeldata_imp, 5)

After reading the new data sets, given the names mice1, mice2, mice3, mice4 and mice5, we need to check if the imputed data sets have similar imputed values. To do this, we’ll start by looking at the imputed data for age2, as this is the variable with the highest number of missing values we’ll use as a predictor.

The following histograms depict the distribution of age for the original data set without the missing values and all of the imputed data sets.

tmp <- lapply(names(data), function(x) {
  ggplot(data = data, aes(x = data[[x]]), ) +
    geom_bar(fill = 'darkseagreen') +
    labs(x = x, y = "") 
})
grid.arrange(
  tmp[[4]] + labs(title = "Original w/o NA's") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
  tmp_mice1[[3]] + labs(title = "Mice 1") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
  tmp_mice2[[3]] + labs(title = "Mice 2") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
  tmp_mice3[[3]] + labs(title = "Mice 3") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
  tmp_mice4[[3]]+ labs(title = "Mice 4") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
  tmp_mice5[[3]] + labs(title = "Mice 5") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
  ncol = 3
)

The imputed data sets are not only seemingly identical to the original, they are also seemingly identical to each other. This could indicate that the data is imputed similarly in the five different data sets and because all of the other predictors have a smaller number of missing values, we assume that these are imputed similarly for the five data sets as well.

Now we are ready to begin the model selection and model fitting.
We add a filter on age2 to only include children and young people under the age of 21 in order to investigate our group of interest.

To further research the differences of the imputed data sets, we use lapply to fit five additive GLM’s, one for each imputed data set, and use qqplot to plot the estimates for all five models.

mice1$Data <- "Mice 1"
mice2$Data <- "Mice 2"
mice3$Data <- "Mice 3"
mice4$Data <- "Mice 4"
mice5$Data <- "Mice 5"

combined_data <- rbind(mice1, mice2, mice3, mice4, mice5)

models <- split(combined_data, combined_data$Data) %>% 
  map(~ glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, data = ., family = binomial(link=logit)))

plot_data <- map_dfr(models, broom::tidy, .id = "Data")

ggplot(plot_data, aes(x = term, y = estimate, color = Data)) +
  geom_point() +
  geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
  theme_minimal() + theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1)) +
  labs(x = "", y = "Estimates")

Overall the estimates generated from each imputed data set look quite similar implying that the data sets are similar.

Analysis on imputed datasets

The pool function averages the estimates of the complete data model, computes the total variance over the repeated analyses by Rubin’s rules, thus generating a collective model based on the five imputed data sets.

fits <- lapply(list(mice1, mice2, mice3, mice4, mice5), function(data) {
  glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, 
      family = binomial(link = logit), data = data) })

pooled_results <- pool(fits)

est1 <- summary(fits[[1]])$coefficients[,1]
est2 <- summary(fits[[2]])$coefficients[,1]
est3 <- summary(fits[[3]])$coefficients[,1]
est4 <- summary(fits[[4]])$coefficients[,1]
est5 <- summary(fits[[5]])$coefficients[,1]
pool <- summary(pooled_results)$estimate

pooledsum <- cbind(est1,est2,est3,est4,est5,pool)
kable(pooledsum, 
      col.names = colnames(pooledsum), 
      caption="Daytime-ratios depending on year") %>% 
  kable_styling()
Daytime-ratios depending on year
est1 est2 est3 est4 est5 pool
(Intercept) -0.9299106 -0.9449116 -0.9341736 -0.9289438 -0.9360392 -0.9347957
age2 0.0006069 0.0014609 0.0008699 0.0007042 0.0010346 0.0009353
inout21 -0.3021537 -0.3016910 -0.3015755 -0.3042366 -0.3022765 -0.3023866
ac_incid1 -0.1835088 -0.1835546 -0.1838671 -0.1837429 -0.1839454 -0.1837238
offunif21 -0.3373804 -0.3365833 -0.3357833 -0.3371664 -0.3368837 -0.3367594
cs_furtv1 0.3928818 0.3925586 0.3919383 0.3906458 0.3914825 0.3919014
cs_lkout1 -0.2582018 -0.2580985 -0.2579648 -0.2584940 -0.2586147 -0.2582748

When looking at the estimates for each imputed data sets next to the pooled estimates, we again see that they are quite similar. Because the imputed data sets are quite similar, we could in theory just choose one. Consequently, we compute the imputed data set with the median of the estimates.

coefficients_matrix <- matrix(c(est1, est2, est3, est4, est5), 8, 5)

median_coefficients <- apply(coefficients_matrix, 1, median)

median_index <- which.min(median_coefficients)

median_data_set <- paste0("est", median_index)

cat("The data set with median coefficients is:", median_data_set, "\n")
## The data set with median coefficients is: est8

The imputed data set which contain the median of the estimates is mice1, therefor we move forward with this data set.

Drop1 for additive model

To investigate if some of our chosen predictors are redundant we peform a drop1 on the additive model we fitted in an earlier section. The drop1 function measures the significance of each predictor by comparing the full model with a nested model in which one predictor is dropped at a time, measuring the significance using ANOVA with a \(\chi^2\)-test.

add_model <- glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, 
      family = binomial(link = logit), data = mice1)

drop1(add_model, test = "Chisq")
## Single term deletions
## 
## Model:
## bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + 
##     cs_lkout
##          Df Deviance     AIC     LRT Pr(>Chi)    
## <none>       1765323 1765337                     
## age2      1  1765324 1765336     0.4   0.5463    
## inout2    1  1769444 1769456  4120.7   <2e-16 ***
## ac_incid  1  1767696 1767708  2373.1   <2e-16 ***
## offunif2  1  1772366 1772378  7042.8   <2e-16 ***
## cs_furtv  1  1776374 1776386 11050.4   <2e-16 ***
## cs_lkout  1  1767939 1767951  2615.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can conclude that all the predictors are significant though with age2 being slightly less significant than the others. This is not alerting as the main purpose of the model is to investigate how the setting and appearance predictors interact with each other across the subjects age, increasing or decreasing the risk of the police using force.

Training error and cross validation on different forms

With our chosen imputed dataset, we start the actual modelfitting. As we aim to predict the risk of use of force, we utilize the logistic regression model. The response variable will be the possibility of any use of force against the civilian, only based on our parameters of interest. We initially want to fit the logistic regression as an additive model, mentioned above. In order to answer our question however, we find the interaction of combinations of a setting and appearance predictor relevant for interpretation. In addition we test the nonlinear behaviour of age as a predictor.

To evaluate both these interaction models and the non-linear modelling, an ANOVA would be useful, as performed in the drop1 for the additive models. However, as the interaction models with different combination are not nested, the assumptions for using ANOVA are not fullfilled. In extension, we also test almost every combination, which would yield a new prediction for every specific combination and saturates the model, which is at risk of overfitting. To prevent that, we calculate the training error and perform cross-validation and evaluate the models on a combination of those.

We calculate the training error based on the squared deviance residuals, to obtain comparable values, that express the models goodness-of-fit.

trainErr <- function(mod) {
  mean(residuals(mod, type = "deviance") ^ 2)
}

form0 <- bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv

form1 <- bin_force ~ age2*inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv

form2 <- bin_force ~ age2 + inout2*cs_lkout + ac_incid + offunif2 + cs_furtv

form3 <- bin_force ~ age2 + inout2*cs_furtv + ac_incid + offunif2 + cs_lkout


form4 <- bin_force ~ age2*ac_incid + inout2 + offunif2 + cs_lkout + cs_furtv

form5 <- bin_force ~ age2 + inout2 + offunif2 + cs_lkout*ac_incid + cs_furtv

form6 <- bin_force ~ age2 + inout2 + offunif2 + cs_lkout + cs_furtv*ac_incid


form7 <- bin_force ~ age2*offunif2 + inout2 + ac_incid + cs_lkout + cs_furtv

form8 <- bin_force ~ age2  + inout2 + ac_incid + cs_lkout*offunif2 + cs_furtv

form9 <- bin_force ~ age2 + inout2 + ac_incid + cs_lkout + cs_furtv*offunif2 


form10 <- bin_force ~ age2 + inout2 + ac_incid + cs_lkout*cs_furtv*offunif2 

form11 <- bin_force ~ age2  + inout2 + cs_furtv*offunif2*ac_incid + cs_lkout

form12 <- bin_force ~ age2 + cs_furtv*offunif2*inout2 + ac_incid + cs_lkout

form12_1 <- bin_force ~ ns(age2, df=2) + cs_furtv*offunif2*inout2 + ac_incid + cs_lkout 

form13 <- bin_force ~ cs_furtv*offunif2*age2  + inout2 + ac_incid + cs_lkout 

form13_1 <- bin_force ~ cs_furtv*offunif2*ns(age2, df=2)  + inout2 + ac_incid + cs_lkout 

form14 <- bin_force ~ age2*(cs_furtv + cs_lkout)*( inout2 + ac_incid + offunif2)

form15 <- bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2)

form16 <- bin_force ~ ns(age2, df = 4)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2)

form17 <- bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(ac_incid + offunif2) + inout2

form18 <- bin_force ~ ns(age2, df = 2)*cs_furtv*(ac_incid + offunif2 + inout2) + cs_lkout

form19 <- bin_force ~ ns(age2, df = 2)*cs_furtv*(ac_incid + inout2) + offunif2*cs_lkout

form20 <- bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(ac_incid + offunif2) + ns(age2, df = 2)*inout2

results1 <- sapply(list(form0, form1, form2, form3, form4, form5, form6, form7, form8, form9, form17, form18),
       function(form){trainErr(glm(form, data = mice1,
                                   family = binomial(link = "logit")))})
result1_df <- as.data.frame(results1)
result1_df$form_name <- c("form0", "form1", "form2", "form3", "form4", "form5", "form6", "form7", "form8", "form9", "form17", "form18")
colnames(result1_df) <- c("Train_Error","Model")

results2 <- sapply(list(form10, form11, form12, form12_1, form13, form13_1, form14, form15, form16, form19, form20),
       function(form){trainErr(glm(form, data = mice1,
                                   family = binomial(link = "logit")))})
result2_df <- as.data.frame(results2)
result2_df$form_name <- c("form10", "form11", "form12", "form12_1", "form13", "form13_1", "form14", "form15", "form16", "form19", "form20")
colnames(result2_df) <- c("Train_Error","Model")


kable(t(result1_df), 
      col.names = colnames(t(result1_df)),
      caption = "Training error for models")%>% 
  kable_styling()
Training error for models
Train_Error 1.077868 1.077856 1.077668 1.077763 1.077857 1.077542 1.076967 1.077868 1.077795 1.077832 1.076503 1.076728
Model form0 form1 form2 form3 form4 form5 form6 form7 form8 form9 form17 form18
kable(t(result2_df), 
      col.names = colnames(t(result2_df)) )%>% 
  kable_styling()
Train_Error 1.076860 1.076753 1.076927 1.076906 1.077793 1.077768 1.076188 1.076142 1.076106 1.076692 1.076487
Model form10 form11 form12 form12_1 form13 form13_1 form14 form15 form16 form19 form20

Based on the output of training errors, we notice the additive model and interaction models with 2nd order interactions have higher training errors than the higher order interactions. Also the interaction models including the splines on age yields lower training error. However, we notice a pattern that the lower training error, the more saturated our modelforms are in general, therefore we perform 5-fold cross-validation, to test the predictive strength. Specifically how \(80\%\) of the set to predict the remaining responses. This step is especially important for our interaction models, in order to prevent overfitting.

We perform the cross-validation, where we assume that the squared deviance residuals in the model is given by

\[ d(Y, \hat{\mu}) = -2 (Y \cdot \log(\hat{\mu}) + (1+Y)\cdot \log{(1-\hat{\mu})})\]

set.seed(200601)
cv_data <- mice1 %>%
  mutate(bin_force = as.numeric(as.character(bin_force)))

testCV <- function(form, data, B = 1, k = 5) {
  n <- nrow(data)
  y <- data$bin_force
  PEcv <- vector("list", B)
  tmp <- numeric(n)
  for(b in 1:B) {
    
    group <- sample(rep(1:k, length.out = n))
    for(i in 1:k) {
      modelcv <- glm(form, family=binomial("logit"), data = data[group != i, ])
      X_cv <- model.matrix(modelcv, data[group == i, ])
      muhat <- predict(modelcv, newdata = data[group == i, ], type = "response") 
       y_cv <- y[group == i]
       
       tmp[group == i] <- -(2*(y_cv*log(muhat)+(1-y_cv)*log(1-muhat)))
  
    }
    PEcv[[b]] <- tmp
  }
  mean(unlist(PEcv))
}

CV1s <- sapply(list(form0,form1, form2, form3, form4, form5, form6, form7, form8, form9, form17, form18), function(form){testCV(form, cv_data)})
CV1s_df <- as.data.frame(CV1s)
CV1s_df$form_name <- c("form0", "form1", "form2", "form3", "form4", "form5", "form6", "form7", "form8", "form9", "form17", "form18")
colnames(CV1s_df) <- c("EGE","Model")

CV2s <- sapply(list(form10, form11, form12, form12_1, form13, form13_1, form14, form15, form16, form19, form20), function(form){testCV(form, cv_data)})
CV2s_df <- as.data.frame(CV2s)
CV2s_df$form_name <- c("form10", "form11", "form12", "form12_1", "form13", "form13_1", "form14", "form15", "form16", "form19", "form20")
colnames(CV2s_df) <- c("EGE","Model")

kable(t(CV1s_df), 
      col.names = colnames(t(CV1s_df)), 
      caption="Expected Generalization Error for models") %>% 
  kable_styling()
Expected Generalization Error for models
EGE 1.077876 1.077865 1.077674 1.077774 1.077864 1.077551 1.076982 1.077875 1.077805 1.077840 1.076547 1.076764
Model form0 form1 form2 form3 form4 form5 form6 form7 form8 form9 form17 form18
kable(t(CV2s_df), 
      col.names = colnames(t(CV2s_df)) ) %>% 
  kable_styling()
EGE 1.076878 1.076772 1.076941 1.076916 1.077807 1.077790 1.076215 1.076199 1.076205 1.076712 1.076527
Model form10 form11 form12 form12_1 form13 form13_1 form14 form15 form16 form19 form20

The expected generalization errors does not systematically get higher or lower values, when the order of interaction changes, and the EGE’s are generally close to each other. However, the model where each setting predictor interact with each appearance predictor, which are form14, form15 and form16, does have smaller EGE. As these also had the lowest training erros, it is sensible to choose this interaction as our best model to fit and predict the data.

Furthermore the models with splines are also at risk of overfitting. These are form12_1, form13_1 and form15 with 2 degrees of freedom and form16 with 4 degrees of freedom. In general the splines with 2 degress of freedom also have lower EGE’s, therefore the most sensible selection of a model seems to be form15.

Secondly, to test the use of splines on age, we perform the ANOVA-testing on the chosen interaction model without splines, with 2 degrees of freedom and with 4 degrees of freedom.

int_model <- glm(form15, 
      family = binomial(link = logit), data = mice1)

int_model_no_spl <- glm(form14, 
      family = binomial(link = logit), data = mice1)

anovatable_df2 <- anova(int_model_no_spl, int_model, test = "Chisq")

kable(anovatable_df2, 
      col.names = colnames(anovatable_df2), 
      caption="Expected Generalization Error for models") %>% 
  kable_styling()
Expected Generalization Error for models
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1637767 1762570 NA NA NA
1637755 1762495 12 74.74708 0

Residualplot of the final model

To investigate our final model, we plot the residuals:

cv_data <- mice1 %>%
  mutate(bin_force = as.numeric(as.character(bin_force)))

final_model <- glm(bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial)

simDiag <- fortify(final_model)

ggplot(simDiag, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")

We can’t tell if the residual plot seem reasonable thus we plot the simulated residuals under the null hypothesis that our model is correct and compare with the residual plot above.

set.seed(2023)

yNew1 <- simulate(final_model)[,1]
yNew2 <- simulate(final_model)[,1]
yNew3 <- simulate(final_model)[,1]
yNew4 <- simulate(final_model)[,1]


simGlmNew1 <- glm(yNew1 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simGlmNew2 <- glm(yNew2 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simGlmNew3 <- glm(yNew3 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simGlmNew4 <- glm(yNew4 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))

simDiagNew1 <- fortify(simGlmNew1)
simDiagNew2 <- fortify(simGlmNew2)
simDiagNew3 <- fortify(simGlmNew3)
simDiagNew4 <- fortify(simGlmNew4)


p1 <- qplot(.fitted, .resid, data = simDiagNew1) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
p2 <- qplot(.fitted, .resid, data = simDiagNew2) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
p3 <- qplot(.fitted, .resid, data = simDiagNew3) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
p4 <- qplot(.fitted, .resid, data = simDiagNew4) +
  geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")


grid.arrange(p1,p2,p3,p4, ncol=2)

Since the simulated residual plots are similar to the one of our model, we don’t see any major reason for concern: Our model’s fit seems consistent across different resampled data sets, which indicates that it’s less likely to be a subject to overfitting. Also the consistency in the residual plots suggest that the model is capturing underlying patterns in the data without being affected by random outliers which in turn imply that the model’s predictive power is reliable. This again means that the model is likely to generalize well to unseen data.

Interaction model

Given the above model selection, we find the interaction model with the natural cubic splines of age, the appearance predictors and setting predictors, we choose below model to predict the risk of force being used against adolescent under the age of 21.

int_model <- glm(bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), 
      family = binomial(link = logit), data = mice1)

sum <- (summary(int_model))$coefficients
kable(sum, 
      col.names = colnames(sum), 
      caption="Summary of the interaction model") %>% 
  kable_styling()
Summary of the interaction model
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.8693727 0.0611126 -14.2257470 0.0000000
ns(age2, df = 2)1 0.0452462 0.1197751 0.3777599 0.7056090
ns(age2, df = 2)2 -0.0225723 0.0199947 -1.1289156 0.2589334
cs_furtv1 0.1709293 0.0871817 1.9606092 0.0499246
cs_lkout1 -0.2940791 0.1310688 -2.2437005 0.0248517
inout21 -0.3649725 0.0661200 -5.5198497 0.0000000
ac_incid1 -0.5269654 0.0571303 -9.2239195 0.0000000
offunif21 -0.2881609 0.0634883 -4.5388055 0.0000057
ns(age2, df = 2)1:cs_furtv1 0.1582603 0.1701951 0.9298754 0.3524356
ns(age2, df = 2)2:cs_furtv1 0.0525042 0.0276052 1.9019661 0.0571756
ns(age2, df = 2)1:cs_lkout1 -0.2079858 0.2556665 -0.8135042 0.4159291
ns(age2, df = 2)2:cs_lkout1 0.0065120 0.0408639 0.1593592 0.8733859
ns(age2, df = 2)1:inout21 -0.0395810 0.1301147 -0.3042012 0.7609746
ns(age2, df = 2)2:inout21 -0.0761867 0.0221588 -3.4382159 0.0005856
ns(age2, df = 2)1:ac_incid1 0.3259691 0.1120433 2.9093135 0.0036222
ns(age2, df = 2)2:ac_incid1 0.0448788 0.0187604 2.3922146 0.0167470
ns(age2, df = 2)1:offunif21 0.0054804 0.1244037 0.0440537 0.9648616
ns(age2, df = 2)2:offunif21 -0.0248245 0.0206661 -1.2012167 0.2296671
cs_furtv1:inout21 -0.0092462 0.0993968 -0.0930228 0.9258854
cs_furtv1:ac_incid1 0.2666623 0.0800691 3.3304019 0.0008672
cs_furtv1:offunif21 0.0148749 0.0879367 0.1691546 0.8656751
cs_lkout1:inout21 0.1856105 0.1403612 1.3223773 0.1860425
cs_lkout1:ac_incid1 0.1984317 0.1213646 1.6350044 0.1020481
cs_lkout1:offunif21 -0.2179842 0.1196576 -1.8217332 0.0684955
ns(age2, df = 2)1:cs_furtv1:inout21 0.2637838 0.1952581 1.3509494 0.1767116
ns(age2, df = 2)2:cs_furtv1:inout21 0.0346328 0.0324584 1.0669878 0.2859773
ns(age2, df = 2)1:cs_furtv1:ac_incid1 0.0333905 0.1567124 0.2130687 0.8312734
ns(age2, df = 2)2:cs_furtv1:ac_incid1 -0.0165104 0.0256670 -0.6432524 0.5200604
ns(age2, df = 2)1:cs_furtv1:offunif21 -0.1490782 0.1717349 -0.8680714 0.3853553
ns(age2, df = 2)2:cs_furtv1:offunif21 0.0495722 0.0277823 1.7843088 0.0743735
ns(age2, df = 2)1:cs_lkout1:inout21 0.1481126 0.2756907 0.5372420 0.5911005
ns(age2, df = 2)2:cs_lkout1:inout21 -0.0119466 0.0454174 -0.2630413 0.7925187
ns(age2, df = 2)1:cs_lkout1:ac_incid1 0.0996025 0.2372625 0.4197989 0.6746324
ns(age2, df = 2)2:cs_lkout1:ac_incid1 0.0207898 0.0382253 0.5438749 0.5865275
ns(age2, df = 2)1:cs_lkout1:offunif21 0.2043009 0.2334488 0.8751421 0.3814966
ns(age2, df = 2)2:cs_lkout1:offunif21 -0.0003374 0.0371558 -0.0090813 0.9927542

From our summary, we notice our p-values describing each of the interactions differ quite significantly. Therefore we plot them, to get an overview of the interaction terms.

p_values <- summary(int_model)$coefficients[,4]


p_values <- data.frame(p_values)


ggplot(p_values, aes(x = reorder(rownames(summary(int_model)$coefficients), p_values), y = p_values)) +
  geom_bar(stat = "identity", fill = "darkseagreen3") +
  labs(title = "P-Values of Coefficients", x = "", y = "P-Value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_hline(yintercept = 0.05, color = "red", linetype = "dashed")

On our plot we have added a dotted line at 5% only as a benchmark. It is clear to the eye, that a lot of the interaction terms have a high p-value, which indicates that the terms are more likely to have no effect on the response variable. Especially we notice the p-values of the interaction terms containing inout2and cs_lkout are large, which suggests that these interactions have a less significant effect on the response. Even though these terms have less effect, an analysis of that effect is already included in the calculation of the expected generalization error. Therefore we are not concerned with the big amount of interaction terms with higher p-values. Also we note that the interaction ac_incid:cs_furtv has a relatively small p-value, hence this estimate is different from 0 with higher probability. The take away from this is that even though the probability of most terms’ estimates being equal to 0 is large, they still contribute to the model’s predictive strengths.

For our model, it is now possible to plot the predicted possibility of force used against adolescents.

predFrame0 <- expand.grid(cs_furtv = factor(c(0)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(1)),
                         age2 = 10:21)
predFrame0$predicted_prob <- predict(int_model, type = "response",
                                     newdata = predFrame0, interval = "confidence")
p_pred0 <- ggplot(predFrame0, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  labs(x = "Age", y = "Risk of force")

predFramecom <- expand.grid(cs_furtv = factor(c(1)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(0)),
                         age2 = 10:21)

predFramecom$predicted_prob <- predict(int_model, type = "response",
                                     newdata = predFramecom, interval = "confidence")

p_predcom <- ggplot(predFramecom, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid+ inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
  geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
  scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
  labs(x = "Age", y = "Risk of force")

grid.arrange(p_pred0, p_predcom, ncol = 2)

The plot above presents a portion of the predictions from our model. The graph on the left, describes the risk of the police using force on the subject when stopped, with a red line marking 25%. It represents the following situation: it is nighttime, the subject is outside in a high crime area meeting an officer not in uniform. The subject appears as being on a lookout without exhibiting any suspicious movements. The prediction plot suggests that there’s approximately 21% risk of force being used against the subject, and this risk remains consistent for individuals aged between 10 and 21.

The right prediction plot indicates that, under the same setting, when the subject doesn’t appear to be on the lookout but exhibits suspicious movements, the risk of force being used against them surges significantly, exceeding 30%. Another interesting change that our model imply based on the change of appearance is, that the risk of force used against the civilian depends on the age. The risk increases with the age until around the age of 18, where it decreases a bit. This second order polynomial behaviour is due to our utilization of splines, which exactly is justified by the predictive strength.

predFrame <- expand.grid(cs_furtv = factor(c(0)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(1)),
                         age2 = 10:21)

predFrame$predicted_prob <- predict(int_model, type = "response",
                                     newdata = predFrame, interval = "confidence")

p_pred <- ggplot(predFrame0, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid  + inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.9)) +
  geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
  scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
  labs(x = "Age", y = "Risk of force")

predFrame1 <- expand.grid(cs_furtv = factor(c(1)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0)),
                         offunif2 = factor(c(0)),
                         cs_lkout = factor(c(0)),
                         age2 = 10:21)

predFrame1$predicted_prob <- predict(int_model, type = "response",
                                     newdata = predFrame1, interval = "confidence")

p_pred1 <- ggplot(predFrame1, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid  + inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0,0.9)) +
  geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
  scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
  labs(x = "Age", y = "Risk of force")

grid.arrange(p_pred, p_pred1, ncol = 2)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Even though we just presented the prediction plots which describes the risk of the police using force as if they were a reality, we believe that this isn’t the full truth due to the models reliability.

predFrame1full <- expand.grid(cs_furtv = factor(c(0,1)),
                         ac_incid = factor(0),
                         inout2 = factor(c(0,1)),
                         offunif2 = factor(c(0,1)),
                         cs_lkout = factor(c(0,1)),
                         age2 = 10:21)

predFrame1full$predicted_prob <- predict(int_model, type = "response",
                                     newdata = predFrame1full, interval = "confidence")

p_pred1full <- ggplot(predFrame1full, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0,0.9)) +
  geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
  theme(axis.text.y = element_text(size = 5)) + 
  scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
  theme( strip.text.y = element_text(angle = 0)) +
  labs(x = "Age", y = "Risk of force")
  


predFrame2full <- expand.grid(cs_furtv = factor(c(0,1)),
                         ac_incid = factor(1),
                         inout2 = factor(c(0,1)),
                         offunif2 = factor(c(0,1)),
                         cs_lkout = factor(c(0,1)),
                         age2 = 10:21)

predFrame2full$predicted_prob <- predict(int_model, type = "response",
                                     newdata = predFrame2full, interval = "confidence")

p_pred2full <- ggplot(predFrame2full, aes(x = age2, y = predicted_prob)) +
  facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
  geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0,0.9)) +
  geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
  theme(axis.text.y = element_text(size = 5)) +
  scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
  theme( strip.text.y = element_text(angle = 0)) +
  labs(x = "Age", y = "Risk of force")

p_pred1full

p_pred2full

Above is shown our full model prediction plot, with a dotted line at 60% as a comparable benchmark. By looking at the confidence intervals for each combination of setting and appearance predictors, one can tell that the real risk of the police using force when stopping a subject might as well be a great deal larger, or even smaller. When generating 100 confidence intervals 95 of them would contain the true risk of the police using force, and due to the width of the confidence intervals these will vary quite a lot.

We notice a few general tendencies from the overview: Firstly, the appearance seems to have an effect on the risk of the police using force. The strongest difference of risk is seen between the scenario where the subject is appearing to be on a lookout and not moving suspiciously, and the scenario where the subject is definitely not on a lookout but appear to be moving suspiciously. The model implies that the latter scenario has a 95% confidence interval, where the upper band mostly is above our benchmark. Opposedly, the 95%-confidence interval of the risk force usage is consistently lower in the first scenario, indicating that the police is less likely to use force when stopping the subject.

Another tendency in our prediction plot is a higher risk of use of force when the police isn’t wearing a uniform, which we observe on the confidence interval-bands. Even though this effect does not seem statistically strong, it is noteworthy, especially in the context of social theories. One theory suggests that police officers take on a more authoritative role when wearing a uniform, as it symbolizes the expectations they have to meet to fulfill their duties. This does not only affect the police officer’s view upon him or herself, but also the subject’s view of the police officer, their authorities and rights during the stop. Especially regarding children and adolescents as they might have a stronger belief in authority.

From the prediction plots, we also observe a rather counter intuitive tendency showing that the risk of use of force is not higher if the stop occurs in a high crime area. The risk in the prediction plots appears to be higher when the stop occured in a low crime area. This might suggest that police officers don’t hold biases towards the suggested criminal behavior of children and young adults in high crime areas, indicating no discriminatory behavior of the police regarding the area.

However, some of the data might be biased, as already mentioned for instance in the analysis of the variable daytime. As we don’t know who exactly provides the information of the stop and frisk, we must assume that it is the same police officers that conduct the stop and frisk and provide the information of the incident. If the stop and frisk portrays the police officer negatively, he or she might provide false information to make his or her actions appear more appropriate, such as lowering the use of force, changing the subject’s appearance or location of the stop. So, the non-existing bias, the prediction plots suggests, might just be due to fabricated, left out or biased data.

From above discussion, we have concluded the tendencies our model imply. However, we have encountered diagnostics in our analysis that contradicts the reliability of the model. Mentionable is our Spearman plot, showing little to no correlations within our datasets, which as just disclosed might even be biased. Furthermore in our model selection, the drop1-analysis suggested that all of the predictors are significant and strong in explaining the variation in the response variable. We have in our analysis interpreted it as all the variables are relevant to include in the model, however this could just as well be due to the data in generally correlating badly, and therefore the level of significance is due to the predictors affecting the response differently and not their goodness-of-fit.

We could also recall in the model selection, we chose our final model based on the training errors and expected generalization errors. Both of these were determined based on a comparison of values, that differed less than 0.001. To investigate this difference a little further we plot 100 simulations of the expected generalization errors, to compare how often, the predictive strength of our chosen model is better than the alternatives.

cv_sample <- sapply(numeric(100), function(x) testCV(form15, cv_data, k = 10))

Based on the plot, the EGE is turning out to be less than the lowest value from other models, whoch indicates, that this model would be chosen most of the times. However, above plot does not take into consideration, that the EGE of the other models also could vary.

Finally, it is also clear in our interpretation of the model that our 95%-confidence intervals are concerningly wide, and therefore our predictions from the model are quite vague.

Nevertheless, our model diagnostics show that the residuals are in fact distributed as could be expected from our chosen distribution. Even though this implies that we have chosen the correct underlying distribution, it is not strong enough to outweigh the arguments of why the model has bad reliability.

To answer our scientific question, we can say that the model implies that the appearance has an interactive effect on the risk of use of force against civilians under the age of 21, however, the setting does not necessarily. In conclusion, it is very important to take into consideration, that the model is diagnostically not performing very well, and should therefore be interpreted and utilized with careful considerations.