We all contributed equally to produce this assignment.
In this assignment we investigate data from New York City’s Stop, Question and Frisk program from 2003 to 2013. First, we perform an exploratory analysis of the data set by examining correlations and the distributions of the variables, after which we consider missing data. Our aim is to determine how appearance and setting influence the risk of force being used against young civilians under the age of 21. To do so, we compare different models using various methods, including residual plots, training errors and cross-validation. Lastly, we interpret our model’s predictions regarding biased behaviour by the police, but ultimately conclude, in a general discussion of the model diagnostics and the data collection, that the model does not necessarily reflect reality.
To begin our exploratory analysis, we load our data set and mutate it so that all variables can be investigated collectively and compared.
# d2, yes_no_columns and factor_columns are assumed to be defined in earlier chunks
data <- d2 %>%
select(-c("force2", "pct")) %>%
mutate_at(vars(all_of(yes_no_columns)), ~ifelse(. == "N", 0, 1)) %>%
mutate(race2 = case_when(
race2 == "white" ~ 0,
race2 == "black" ~ 1,
race2 == "hisp" ~ 2,
race2 == "asian" ~ 3,
race2 == "other" ~ 4
)) %>%
mutate(typeofid2 = case_when(
typeofid2 == "O" ~ 0,
typeofid2 == "P" ~ 1,
typeofid2 == "R" ~ 2,
typeofid2 == "V" ~ 3,
)) %>%
mutate(across(all_of(factor_columns), as.factor)) %>%
mutate(year = as.numeric(year))
summary(data)
## force race2 gender age2 daytime
## 0 :3910239 0 : 492688 0 :4495436 Min. :10 0 :2472074
## 1 : 736051 1 :2886843 1 : 348159 1st Qu.:19 1 :1871287
## 3 : 156732 2 :1215401 NA's: 140796 Median :24 NA's: 641030
## 2 : 122898 3 : 152053 Mean :28
## 5 : 31590 4 : 235105 3rd Qu.:34
## 6 : 17769 NA's: 2301 Max. :90
## (Other): 9112 NA's :41534
## inout2 ac_incid ac_time offunif2 typeofid2
## 0 :3809789 0:2204888 0:3158281 0 :1396968 0 : 84217
## 1 :1147150 1:2779503 1:1826110 1 :3586909 1 :2640092
## NA's: 27452 NA's: 514 2 : 107497
## 3 :2133368
## NA's: 19217
##
##
## othpers2 cs_objcs cs_descr cs_casng cs_lkout cs_cloth
## 0 :3806808 0:4850690 0:4116587 0:3562617 0:4151151 0:4773826
## 1 :1163298 1: 133701 1: 867804 1:1421774 1: 833240 1: 210565
## NA's: 14285
##
##
##
##
## cs_drgtr cs_furtv cs_vcrim cs_bulge cs_other year
## 0:4522192 0:2801105 0:4588325 0:4543022 0:3944522 Min. :2003
## 1: 462199 1:2183286 1: 396066 1: 441369 1:1039869 1st Qu.:2006
## Median :2009
## Mean :2008
## 3rd Qu.:2011
## Max. :2013
##
In order to meet the purpose of developing a model that describes the probability of police force being used on civilians, we choose force as the response variable, and the exploratory analysis will thus focus on how the predictors influence and interact with each other to affect the likelihood of force being used. To continue the analysis, we generate bar plots for the variables characterized as factors and histograms for the continuous predictors, to check the number of missing values, skewness and extreme values; a sketch of such plots follows below.
First of all, we note that force is exceptionally skewed, which we elaborate on in the following. Most of the other variables are binary due to the mutation of our data set. For these we cannot discuss skewness; however, some of the variables have considerably more values of 0 than 1, such as cs_objcs and cs_cloth. This may not directly cause any issues, but it is worth noticing. The number of missing values varies from none to a significant amount, as seen in the variable daytime. We account for these later on.
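The plotting code for these overview plots is not shown in the original; a minimal sketch under that assumption, using the mutated data set from above:
library(ggplot2)
library(gridExtra)
# bar plot for a factor variable, histogram for the continuous age2
p_force <- ggplot(data, aes(x = force)) + geom_bar(fill = "darkseagreen")
p_age <- ggplot(data, aes(x = age2)) + geom_histogram(binwidth = 2, fill = "darkseagreen")
grid.arrange(p_force, p_age, ncol = 2)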
As age2 is a continuous variable, it is plotted as a histogram. There is nothing eye-catching about the distribution in terms of skewness or extreme values, though we do note that the police tend to have stopped more younger than older people. This becomes very clear in the violin plot, in which the subjects' ages are stratified on the categorical types of force. There is nothing concerning about this relation either.
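The violin plot code is likewise not shown; a minimal sketch of how it could look (the corresponding plot for year below is produced the same way):
ggplot(na.omit(data[, c("force", "age2")]), aes(x = force, y = age2)) +
geom_violin(fill = "darkseagreen3") +
labs(x = "Force type", y = "Age")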
The last variable, year, is for now also treated as continuous and is thus plotted as a histogram. Again, there is nothing eye-catching about the distribution in terms of skewness or extreme values, only an indication of the half-year periods in 2003 and 2013 due to the change in government. However, when looking at the corresponding violin plot, we observe an extreme amount of type 5 force being used in 2006. This could be a mistake, but we cannot draw any conclusions, only note the possibility.
Prior to initiating our data analysis, it is convenient to search for variables that would be strong predictors, that is, variables with a numerically large correlation coefficient with the response variable, force. Furthermore, we would like to avoid predictors that are highly correlated with other predictors, as these will most likely have a similar effect on the response variable. To get an idea of how the variables correlate, we make a Spearman plot.
# Spearman correlations; variables are clustered so strongly correlated ones sit together (requires lattice)
cp <- cor(data.matrix(na.omit(data)), method = "spearman")
ord <- rev(hclust(as.dist(1-abs(cp)))$order)
colPal <- colorRampPalette(c("white", "darkseagreen4"), space = "rgb")(100)
spearman <- levelplot(cp[ord, ord], xlab = "", ylab = "",
col.regions = colPal, at = seq(-1, 1, length.out = 100),
colorkey = list(space = "top", labels = list(cex = 0.5)),
scales = list(x = list(rot = 45),
y = list(rot = 45),
cex = 0.5))
spearman
Unfortunately, at first glance the Spearman plot does not show any strong correlations at all. However, we know that the plot by default clusters the variables with numerically higher correlations together. Therefore, despite the small correlations, we anticipate that cs_bulge and othpers2 are the predictors most correlated with force. It also seems noteworthy that the correlation between cs_lkout and cs_casng, and between ac_time and ac_incid, is quite strong compared to the correlations between the other variables.
To further examine our chosen response variable, force, we make a cross tabulation of the force levels (with 0 meaning no force) against year.
CT <- table(data$force, data$year)
kable(CT,
col.names = colnames(CT),
caption="Crosstabulation of variables force and year") %>%
kable_styling()
| | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 119564 | 246398 | 312773 | 406541 | 365744 | 414700 | 439271 | 463337 | 540799 | 443857 | 157255 |
| 1 | 21225 | 41392 | 56400 | 69177 | 75406 | 91045 | 103625 | 97624 | 102471 | 57803 | 19883 |
| 2 | 8134 | 10412 | 12542 | 653 | 13097 | 14096 | 16467 | 16444 | 16978 | 10099 | 3976 |
| 3 | 7938 | 10637 | 11812 | 11366 | 14079 | 16265 | 17690 | 19265 | 21076 | 17697 | 8907 |
| 4 | 912 | 1010 | 937 | 631 | 612 | 496 | 481 | 498 | 641 | 712 | 435 |
| 5 | 1166 | 1579 | 1684 | 16069 | 1369 | 1792 | 1831 | 2128 | 1845 | 1418 | 709 |
| 6 | 1787 | 1954 | 1808 | 1868 | 1592 | 1747 | 1609 | 1807 | 1730 | 1215 | 652 |
| 7 | 125 | 141 | 235 | 184 | 197 | 161 | 194 | 182 | 184 | 110 | 34 |
Common intuition would suggest that as the level of force used on civilians increases, such incidents become less frequent. However, we note from the cross tabulation that this initial presumption does not hold. Specifically, it becomes evident that there are relatively few instances where force level 4 is reported. This pattern indicates that the order of force types may not necessarily correspond to an escalation in brutality, or it could suggest that it is hard for the police to distinguish between “at least drawing a weapon on a civilian” and “at least pushing a civilian to the ground”. This difficulty in distinction could also explain the extreme number of observations of force 5 in 2006, which coincides with a suspiciously low count for force 2 that same year.
The observed ordering of force levels and the potential overlap between specific force types motivate a particular data modification, as sketched below. We define a new data set in which force is assigned the value 0 if no force is reported and 1 if any form of force is indicated. This grouping simplifies the poorly distinguishable ordering of force types into a concise binary representation, making it suitable for modeling outcomes that involve the presence or absence of force. Moreover, this transformation is chosen with the intention of subsequently fitting a binomial model.
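A minimal sketch of this binarization; the identical construction reappears in the missing-data analysis below:
bin_data <- data %>%
mutate(force = as.numeric(as.character(force)),
bin_force = ifelse(force > 0, 1, 0)) %>%
select(-force) %>%
mutate(bin_force = as.factor(bin_force))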
How does the interaction between the setting and the subject's appearance affect the risk of the police using force when stopping civilians under the age of 21?
We define the setting as the surroundings and situation of the stop, and the appearance as the police officer's view of the subject and their supposed actions during the stop.
As accounted for earlier, we want force to be our
response variable, where we combine all use of force and model the risk
of the police using any kind of force on the subject. In order to model
this we fit a binomial generalized linear model.
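Concretely, with \(p\) denoting the probability of any force being used given the predictors \(x_1, \dots, x_k\), the binomial GLM with logit link takes the form
\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \]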
In order to answer our scientific question, we’ll need to choose which variables we want to include as predictors in our model:
Immediately we choose to include daytime and inout2 as predictors, as these tell us a lot about the setting of the stop. Furthermore, we choose to discard typeofid2 and othpers2, as these tell us little about either the officer's view of the civilian or the setting. However, we are interested in how wearing a uniform could impact the officer's use of force. To get an alternative perspective on discrimination, we intentionally exclude both race2 and gender; however, we choose to include the age of the civilian.
In order to project the force being used, and thus predict future use of force on civilians under the age of 21, we choose to include year as a predictor.
As we want to include predictors that tell us something about the officer's own perception of the civilian, rather than an evaluation of their actual behavior or circumstances, we choose to exclude cs_descr, cs_drgtr, cs_objcs and cs_vcrim as predictors.
We believe that children are more likely to make furtive movements, which is why we also include cs_furtv in the model.
From the Spearman plot we noted the most correlated pairs of variables to be cs_lkout and cs_casng, and ac_time and ac_incid, which is why we choose to use only one variable from each pair. From the description of the variables, our belief is that cs_lkout tells us more about the police officer's interpretation than cs_casng does. As we have already included daytime, we choose ac_incid over ac_time as the other predictor.
With the bar plots from the exploratory analysis in mind, we also discard cs_cloth as a predictor, as it does not provide significant insight into the officer's potentially discriminatory perception of civilians.
Finally, we choose to leave cs_bulge and cs_other out of the model, as these variables are quite imprecise.
In conclusion we choose the following predictors (assembled into a model data set as sketched below):
The setting predictors: daytime, inout2, ac_incid and offunif2.
The appearance predictors: age2, cs_furtv and cs_lkout.
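The construction of the model data set is not shown in the original chunks; a hedged sketch, assuming bin_data is the binarized set defined earlier:
modeldata <- bin_data %>%
select(bin_force, age2, daytime, inout2, ac_incid, offunif2, cs_furtv, cs_lkout)
# daytime is kept for now; the missing-data analysis below determines whether it stays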
To quickly assess the correlation among predictors, as well as between predictors and the response variable, we generate a Spearman-plot.
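The code for this second plot is not shown; presumably it reuses the earlier construction on the model data (a sketch under that assumption, with colPal as defined above):
cp2 <- cor(data.matrix(na.omit(modeldata)), method = "spearman")
ord2 <- rev(hclust(as.dist(1 - abs(cp2)))$order)
levelplot(cp2[ord2, ord2], xlab = "", ylab = "",
col.regions = colPal, at = seq(-1, 1, length.out = 100),
colorkey = list(space = "top", labels = list(cex = 0.5)),
scales = list(x = list(rot = 45), y = list(rot = 45), cex = 0.5))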
Again, the Spearman plot does not show any signs of strong correlation between the predictors, which is a good sign. Other than that, it is as bland as the one including all variables.
Note that the Spearman plot omits all rows with missing values. Thus, before we begin the model fitting with our chosen predictors, we’ll need to investigate and evaluate if and how to impute the missing values in our data set.
# percentage of missing values per variable
pMiss <- function(x){sum(is.na(x))/length(x)*100}
apply(NAdata, 2, pMiss)
## race2 gender age2 daytime inout2 offunif2
## 0.04616412 2.82473827 0.83328134 12.86074869 0.55075936 0.01031219
## typeofid2 othpers2
## 0.38554359 0.28659469
#apply(modeldata,1,pMiss)
# visualize the missing-data pattern (aggr from the VIM package)
aggr_plot <- aggr(NAdata, col=c('cadetblue','darkseagreen2'),
numbers=TRUE,
sortVars=TRUE,
labels=names(NAdata),
cex.axis=.7, gap=3,
ylab=c("Histogram of missing data","Pattern"))
##
## Variables sorted by number of missings:
## Variable Count
## daytime 0.1286074869
## gender 0.0282473827
## age2 0.0083328134
## inout2 0.0055075936
## typeofid2 0.0038554359
## othpers2 0.0028659469
## race2 0.0004616412
## offunif2 0.0001031219
From the histogram of the missing values we notice that one predictor is missing especially many values: daytime is missing \(12.86\%\). This large proportion of missing values implies that we cannot directly assume they are missing completely at random. The other predictors are missing up to \(3\%\) of their values, which means that we can in general assume them to be at least missing at random. Thus we focus on analyzing how the values of daytime are missing, as this large proportion could be problematic if there is a correlation with the response variable. Ignoring the missing values, we calculate the correlation between daytime and force to be:
non_na_data <- data[complete.cases(data$daytime), ]
non_na_data$force <- as.numeric(as.character(non_na_data$force))
non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
cor(non_na_data$force, non_na_data$daytime)
## [1] -0.01773452
We note essentially no correlation between force and daytime, and we investigate further: given the nature of the data set, we suspect that daytime should be correlated with ac_time when stratifying on the binary force:
bin_data<- data %>%
mutate(force = as.numeric(as.character(force)),
bin_force = ifelse(force > 0, 1, 0)) %>%
select(-force) %>%
mutate(bin_force = as.factor(bin_force))
correlations <- numeric()
non_na_data <- bin_data[complete.cases(bin_data$daytime, bin_data$ac_time), ]
force_levels <- unique(non_na_data$bin_force)
non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
non_na_data$ac_time <- as.numeric(as.character(non_na_data$ac_time))
# Pearson correlation between ac_time and daytime within each level of bin_force
for (level in force_levels) {
subset_data <- non_na_data[non_na_data$bin_force == level, ]
correlation <- cor(subset_data$ac_time, subset_data$daytime, method = "pearson")
correlations <- c(correlations, correlation)
}
correlations
## [1] -0.08107143 -0.08215116
Firstly, we note a weak correlation between ac_time and daytime. Secondly, we do not observe a systematic dependence of this correlation on bin_force. Together with the previous calculation, this adds to our initial suspicion that the values are not missing completely at random.
Lastly we consider the cross tabulation between year and
daytime:
cross_tab <- table(data$year, data$daytime, useNA = "ifany")
row_sums <- rowSums(cross_tab, na.rm = TRUE)
cross_tab_with_sums <- cbind(cross_tab, row_sums)
kable(cross_tab_with_sums,
col.names = colnames(cross_tab_with_sums),
caption="Crosstabulation daytime and year") %>%
kable_styling()
| | 0 | 1 | NA | row_sums |
|---|---|---|---|---|
| 2003 | 100148 | 60686 | 17 | 160851 |
| 2004 | 201078 | 112376 | 69 | 313523 |
| 2005 | 250292 | 147348 | 551 | 398191 |
| 2006 | 243565 | 185416 | 77508 | 506489 |
| 2007 | 232487 | 162968 | 76641 | 472096 |
| 2008 | 261735 | 191513 | 87054 | 540302 |
| 2009 | 266595 | 228727 | 85846 | 581168 |
| 2010 | 278109 | 225644 | 97532 | 601285 |
| 2011 | 307707 | 275828 | 102189 | 685724 |
| 2012 | 242830 | 208138 | 81943 | 532911 |
| 2013 | 87528 | 72643 | 31680 | 191851 |
From this we note that in 2003, 2004 and 2005 not many daytime values were missing. Thus we look at the ratio of the predictor's values, leaving the missing values out of consideration:
non_na_data <- data[complete.cases(data$daytime, data$year), ]
non_na_data$daytime <- as.numeric(as.character(non_na_data$daytime))
non_na_data$year <- as.numeric(as.character(non_na_data$year))
cross_tab_year <- table(non_na_data$year, non_na_data$daytime)
normalized_cross_tab_year <- cbind(cross_tab_year[,1]/rowSums(cross_tab_year), cross_tab_year[,2]/rowSums(cross_tab_year))
kable(normalized_cross_tab_year,
col.names = colnames(normalized_cross_tab_year),
caption="Daytime-ratios depending on year") %>%
kable_styling()
| | daytime = 0 | daytime = 1 |
|---|---|---|
| 2003 | 0.6226793 | 0.3773207 |
| 2004 | 0.6414913 | 0.3585087 |
| 2005 | 0.6294437 | 0.3705563 |
| 2006 | 0.5677757 | 0.4322243 |
| 2007 | 0.5878975 | 0.4121025 |
| 2008 | 0.5774653 | 0.4225347 |
| 2009 | 0.5382256 | 0.4617744 |
| 2010 | 0.5520741 | 0.4479259 |
| 2011 | 0.5273154 | 0.4726846 |
| 2012 | 0.5384639 | 0.4615361 |
| 2013 | 0.5464660 | 0.4535340 |
We note that the ratios for the years with few missing values are quite different from those for the years with many missing values. In general, from 2006 onwards we have relatively fewer observations of daytime being 0. This could indicate that the values are missing not at random. Considering the data collection, the missing values could be due to the following:
The police could be understaffed at night and thus might not prioritize certain variables when patrolling as much as officers during the day. This could lead to the data not being consistently collected regarding the time of the day for each stop. Furthermore when the police need to use more force to deescalate a stop of civilians, they may be more focused on immediate safety and security concerns rather than detailed data recording such as daytime. On the other hand certain stops might happen quickly or in situations where the time of day is less relevant, making officers less likely to record this information.
The police could have a hard time evaluating whether the stop happened at night or during the day. There could also be inconsistency in how different officers or departments report the time of day, leading to mismatched data and missing values.
In some cases, the police might avoid recording the exact time of day to make it easier to defend their own actions, or to protect the civilian's privacy.
The types and rates of criminal activity can vary by time of day, so some crimes, and thus uses of force, may be more common at certain times. As a result, officers may conduct more stop-and-frisk procedures during those times. From the cross tabulation below we see that the force used is stronger at night than during the day.
table(data$force, data$daytime, useNA = "ifany")
##
## 0 1 <NA>
## 0 1910169 1500824 499246
## 1 396145 246338 93568
## 2 66484 40270 16144
## 3 69442 65200 22090
## 4 3735 2541 1089
## 5 16010 10002 5578
## 6 9213 5507 3049
## 7 876 605 266
Because of the large amount and the pattern of missing values in daytime, together with the considerations above, we conclude that the values must be missing not at random. Even though we suspect the police might be more likely to record “daytime” values accurately while underreporting or omitting “nighttime”, we would need more thorough insight into the data collection and into why certain values are missing. Imputing these missing values might lead to biased values, which is why we choose to discard daytime as a predictor.
For the remainder of the variables, we know that when a variable is missing less than \(5\%\) of its values we can assume it to be missing at random. Thus we are safe to impute the missing values, imputing the most probable value for discrete variables and the conditional mean \(\mathbb{E}(X_{z^c} \mid X_z)\) for continuous variables (p. 99). For this reason we impute the missing values using MICE imputation with the pmm method.
We impute our missing data using the mice package, running the algorithm for 50 iterations. As all the predictors have less than 5% missing values, we follow the rule of thumb and generate five imputed data sets using the PMM method. This method uses predictive mean matching, hence PMM, to impute the missing values in the data set, given that the data are missing at random.
modeldata_imp <- mice(modeldata,m=5,maxit=50,meth='pmm',seed=500)
summary(modeldata_imp)
# inspect the imputed values for daytime
modeldata_imp$imp$daytime
# extract the completed data set from each of the five imputations
modeldata_mice1 <- complete(modeldata_imp, 1)
modeldata_mice2 <- complete(modeldata_imp, 2)
modeldata_mice3 <- complete(modeldata_imp, 3)
modeldata_mice4 <- complete(modeldata_imp, 4)
modeldata_mice5 <- complete(modeldata_imp, 5)
After reading in the new data sets, named mice1 through mice5, we need to check whether the imputed data sets have similar imputed values. To do this, we start by looking at the imputed data for age2, as this is the variable with the highest number of missing values among the ones we use as predictors.
The following histograms depict the distribution of age for the original data set without the missing values and all of the imputed data sets.
tmp <- lapply(names(data), function(x) {
ggplot(data = data, aes(x = .data[[x]])) +
geom_bar(fill = 'darkseagreen') +
labs(x = x, y = "")
})
# tmp_mice1, ..., tmp_mice5 are assumed to be built analogously from the imputed data sets
grid.arrange(
tmp[[4]] + labs(title = "Original w/o NA's") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
tmp_mice1[[3]] + labs(title = "Mice 1") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
tmp_mice2[[3]] + labs(title = "Mice 2") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
tmp_mice3[[3]] + labs(title = "Mice 3") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
tmp_mice4[[3]]+ labs(title = "Mice 4") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
tmp_mice5[[3]] + labs(title = "Mice 5") + coord_cartesian(xlim = c(10,75)) + xlab("Age"),
ncol = 3
)
The imputed data sets are not only seemingly identical to the original; they are also seemingly identical to each other. This indicates that the data are imputed similarly in the five different data sets, and because all of the other predictors have fewer missing values, we assume that these are imputed similarly across the five data sets as well.
Now we are ready to begin the model selection and model fitting. We add a filter on age2 to include only children and young people under the age of 21, in order to investigate our group of interest; a sketch of this step is shown below.
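A minimal sketch of this filtering step, assuming each completed data set carries the binary response from modeldata:
mice1 <- modeldata_mice1 %>% filter(age2 < 21)
mice2 <- modeldata_mice2 %>% filter(age2 < 21)
mice3 <- modeldata_mice3 %>% filter(age2 < 21)
mice4 <- modeldata_mice4 %>% filter(age2 < 21)
mice5 <- modeldata_mice5 %>% filter(age2 < 21)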
To further examine the differences between the imputed data sets, we use lapply to fit five additive GLMs, one for each imputed data set, and use ggplot to plot the estimates with standard error bars for all five models.
mice1$Data <- "Mice 1"
mice2$Data <- "Mice 2"
mice3$Data <- "Mice 3"
mice4$Data <- "Mice 4"
mice5$Data <- "Mice 5"
combined_data <- rbind(mice1, mice2, mice3, mice4, mice5)
models <- split(combined_data, combined_data$Data) %>%
map(~ glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout, data = ., family = binomial(link=logit)))
plot_data <- map_dfr(models, broom::tidy, .id = "Data")
ggplot(plot_data, aes(x = term, y = estimate, color = Data)) +
geom_point() +
geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
theme_minimal() + theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1)) +
labs(x = "", y = "Estimates")
Overall the estimates generated from each imputed data set look quite similar implying that the data sets are similar.
The pool function averages the estimates from the complete-data models and computes the total variance over the repeated analyses by Rubin's rules, thus generating a collective model based on the five imputed data sets.
fits <- lapply(list(mice1, mice2, mice3, mice4, mice5), function(data) {
glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout,
family = binomial(link = logit), data = data) })
pooled_results <- pool(fits)
est1 <- summary(fits[[1]])$coefficients[,1]
est2 <- summary(fits[[2]])$coefficients[,1]
est3 <- summary(fits[[3]])$coefficients[,1]
est4 <- summary(fits[[4]])$coefficients[,1]
est5 <- summary(fits[[5]])$coefficients[,1]
pool <- summary(pooled_results)$estimate
pooledsum <- cbind(est1,est2,est3,est4,est5,pool)
kable(pooledsum,
col.names = colnames(pooledsum),
caption="Estimates per imputed data set and pooled estimates") %>%
kable_styling()
| | est1 | est2 | est3 | est4 | est5 | pool |
|---|---|---|---|---|---|---|
| (Intercept) | -0.9299106 | -0.9449116 | -0.9341736 | -0.9289438 | -0.9360392 | -0.9347957 |
| age2 | 0.0006069 | 0.0014609 | 0.0008699 | 0.0007042 | 0.0010346 | 0.0009353 |
| inout21 | -0.3021537 | -0.3016910 | -0.3015755 | -0.3042366 | -0.3022765 | -0.3023866 |
| ac_incid1 | -0.1835088 | -0.1835546 | -0.1838671 | -0.1837429 | -0.1839454 | -0.1837238 |
| offunif21 | -0.3373804 | -0.3365833 | -0.3357833 | -0.3371664 | -0.3368837 | -0.3367594 |
| cs_furtv1 | 0.3928818 | 0.3925586 | 0.3919383 | 0.3906458 | 0.3914825 | 0.3919014 |
| cs_lkout1 | -0.2582018 | -0.2580985 | -0.2579648 | -0.2584940 | -0.2586147 | -0.2582748 |
When looking at the estimates for each imputed data set next to the pooled estimates, we again see that they are quite similar. Because the imputed data sets are so alike, we could in principle just choose one of them. Consequently, we identify the imputed data set whose estimates lie closest to the coefficient-wise medians.
coefficients_matrix <- cbind(est1, est2, est3, est4, est5)
median_coefficients <- apply(coefficients_matrix, 1, median)
# pick the data set whose estimates deviate least, in total absolute terms,
# from the coefficient-wise medians
median_index <- which.min(colSums(abs(coefficients_matrix - median_coefficients)))
median_data_set <- paste0("est", median_index)
cat("The data set closest to the median coefficients is:", median_data_set, "\n")
## The data set closest to the median coefficients is: est1
The imputed data set closest to the median of the estimates is mice1; therefore we move forward with this data set.
To investigate whether some of our chosen predictors are redundant, we perform a drop1 on the additive model we fitted in an earlier section. The drop1 function measures the significance of each predictor by comparing the full model with a nested model in which one predictor is dropped at a time, measuring the significance using ANOVA with a \(\chi^2\)-test.
add_model <- glm(bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv + cs_lkout,
family = binomial(link = logit), data = mice1)
drop1(add_model, test = "Chisq")
## Single term deletions
##
## Model:
## bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_furtv +
## cs_lkout
## Df Deviance AIC LRT Pr(>Chi)
## <none> 1765323 1765337
## age2 1 1765324 1765336 0.4 0.5463
## inout2 1 1769444 1769456 4120.7 <2e-16 ***
## ac_incid 1 1767696 1767708 2373.1 <2e-16 ***
## offunif2 1 1772366 1772378 7042.8 <2e-16 ***
## cs_furtv 1 1776374 1776386 11050.4 <2e-16 ***
## cs_lkout 1 1767939 1767951 2615.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can conclude that all the predictors except age2 are highly significant, while age2 on its own is not (p ≈ 0.55). This is not alarming, as the main purpose of the model is to investigate how the setting and appearance predictors interact with each other across the subjects' ages, increasing or decreasing the risk of the police using force.
With our chosen imputed data set, we start the actual model fitting. As we aim to predict the risk of use of force, we use the logistic regression model. The response variable is the probability of any use of force against the civilian, based only on our parameters of interest. We initially fit the logistic regression as the additive model mentioned above. In order to answer our question, however, we find the interactions between combinations of a setting and an appearance predictor relevant for interpretation. In addition, we test for nonlinear behaviour of age as a predictor.
To evaluate both these interaction models and the nonlinear modelling, an ANOVA would be useful, as performed in the drop1 for the additive model. However, as the interaction models with different combinations are not nested, the assumptions for using ANOVA are not fulfilled. Additionally, we test almost every combination, and a model that yields a new prediction for every specific combination approaches saturation and risks overfitting. To prevent that, we calculate the training error, perform cross-validation and evaluate the models on a combination of the two.
We calculate the training error as the mean of the squared deviance residuals, to obtain comparable values that express each model's goodness of fit.
trainErr <- function(mod) {
mean(residuals(mod, type = "deviance") ^ 2)
}
form0 <- bin_force ~ age2 + inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv
form1 <- bin_force ~ age2*inout2 + ac_incid + offunif2 + cs_lkout + cs_furtv
form2 <- bin_force ~ age2 + inout2*cs_lkout + ac_incid + offunif2 + cs_furtv
form3 <- bin_force ~ age2 + inout2*cs_furtv + ac_incid + offunif2 + cs_lkout
form4 <- bin_force ~ age2*ac_incid + inout2 + offunif2 + cs_lkout + cs_furtv
form5 <- bin_force ~ age2 + inout2 + offunif2 + cs_lkout*ac_incid + cs_furtv
form6 <- bin_force ~ age2 + inout2 + offunif2 + cs_lkout + cs_furtv*ac_incid
form7 <- bin_force ~ age2*offunif2 + inout2 + ac_incid + cs_lkout + cs_furtv
form8 <- bin_force ~ age2 + inout2 + ac_incid + cs_lkout*offunif2 + cs_furtv
form9 <- bin_force ~ age2 + inout2 + ac_incid + cs_lkout + cs_furtv*offunif2
form10 <- bin_force ~ age2 + inout2 + ac_incid + cs_lkout*cs_furtv*offunif2
form11 <- bin_force ~ age2 + inout2 + cs_furtv*offunif2*ac_incid + cs_lkout
form12 <- bin_force ~ age2 + cs_furtv*offunif2*inout2 + ac_incid + cs_lkout
form12_1 <- bin_force ~ ns(age2, df=2) + cs_furtv*offunif2*inout2 + ac_incid + cs_lkout
form13 <- bin_force ~ cs_furtv*offunif2*age2 + inout2 + ac_incid + cs_lkout
form13_1 <- bin_force ~ cs_furtv*offunif2*ns(age2, df=2) + inout2 + ac_incid + cs_lkout
form14 <- bin_force ~ age2*(cs_furtv + cs_lkout)*( inout2 + ac_incid + offunif2)
form15 <- bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2)
form16 <- bin_force ~ ns(age2, df = 4)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2)
form17 <- bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(ac_incid + offunif2) + inout2
form18 <- bin_force ~ ns(age2, df = 2)*cs_furtv*(ac_incid + offunif2 + inout2) + cs_lkout
form19 <- bin_force ~ ns(age2, df = 2)*cs_furtv*(ac_incid + inout2) + offunif2*cs_lkout
form20 <- bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(ac_incid + offunif2) + ns(age2, df = 2)*inout2
results1 <- sapply(list(form0, form1, form2, form3, form4, form5, form6, form7, form8, form9, form17, form18),
function(form){trainErr(glm(form, data = mice1,
family = binomial(link = "logit")))})
result1_df <- as.data.frame(results1)
result1_df$form_name <- c("form0", "form1", "form2", "form3", "form4", "form5", "form6", "form7", "form8", "form9", "form17", "form18")
colnames(result1_df) <- c("Train_Error","Model")
results2 <- sapply(list(form10, form11, form12, form12_1, form13, form13_1, form14, form15, form16, form19, form20),
function(form){trainErr(glm(form, data = mice1,
family = binomial(link = "logit")))})
result2_df <- as.data.frame(results2)
result2_df$form_name <- c("form10", "form11", "form12", "form12_1", "form13", "form13_1", "form14", "form15", "form16", "form19", "form20")
colnames(result2_df) <- c("Train_Error","Model")
kable(t(result1_df),
col.names = colnames(t(result1_df)),
caption = "Training error for models")%>%
kable_styling()
| Model | form0 | form1 | form2 | form3 | form4 | form5 | form6 | form7 | form8 | form9 | form17 | form18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train_Error | 1.077868 | 1.077856 | 1.077668 | 1.077763 | 1.077857 | 1.077542 | 1.076967 | 1.077868 | 1.077795 | 1.077832 | 1.076503 | 1.076728 |
kable(t(result2_df),
col.names = colnames(t(result2_df)) )%>%
kable_styling()
| Model | form10 | form11 | form12 | form12_1 | form13 | form13_1 | form14 | form15 | form16 | form19 | form20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Train_Error | 1.076860 | 1.076753 | 1.076927 | 1.076906 | 1.077793 | 1.077768 | 1.076188 | 1.076142 | 1.076106 | 1.076692 | 1.076487 |
Based on the training errors, we notice that the additive model and the interaction models with second-order interactions have higher training errors than those with higher-order interactions. The interaction models including splines on age also yield lower training errors. However, we notice a pattern: the lower the training error, the more saturated the model form tends to be. We therefore perform 5-fold cross-validation to test predictive strength, specifically how well a model fitted on \(80\%\) of the data predicts the remaining responses. This step is especially important for our interaction models, in order to prevent overfitting.
We perform the cross-validation, where the squared deviance residual for a binary response in our model is given by
\[ d(Y, \hat{\mu}) = -2 \left( Y \log(\hat{\mu}) + (1-Y) \log(1-\hat{\mu}) \right) \]
set.seed(200601)
cv_data <- mice1 %>%
mutate(bin_force = as.numeric(as.character(bin_force)))
testCV <- function(form, data, B = 1, k = 5) {
n <- nrow(data)
y <- data$bin_force
PEcv <- vector("list", B)
tmp <- numeric(n)
for(b in 1:B) {
group <- sample(rep(1:k, length.out = n))
for(i in 1:k) {
modelcv <- glm(form, family = binomial("logit"), data = data[group != i, ])
muhat <- predict(modelcv, newdata = data[group == i, ], type = "response")
y_cv <- y[group == i]
# squared deviance residuals on the held-out fold
tmp[group == i] <- -2 * (y_cv * log(muhat) + (1 - y_cv) * log(1 - muhat))
}
PEcv[[b]] <- tmp
}
mean(unlist(PEcv))
}
CV1s <- sapply(list(form0,form1, form2, form3, form4, form5, form6, form7, form8, form9, form17, form18), function(form){testCV(form, cv_data)})
CV1s_df <- as.data.frame(CV1s)
CV1s_df$form_name <- c("form0", "form1", "form2", "form3", "form4", "form5", "form6", "form7", "form8", "form9", "form17", "form18")
colnames(CV1s_df) <- c("EGE","Model")
CV2s <- sapply(list(form10, form11, form12, form12_1, form13, form13_1, form14, form15, form16, form19, form20), function(form){testCV(form, cv_data)})
CV2s_df <- as.data.frame(CV2s)
CV2s_df$form_name <- c("form10", "form11", "form12", "form12_1", "form13", "form13_1", "form14", "form15", "form16", "form19", "form20")
colnames(CV2s_df) <- c("EGE","Model")
kable(t(CV1s_df),
col.names = colnames(t(CV1s_df)),
caption="Expected Generalization Error for models") %>%
kable_styling()
| Model | form0 | form1 | form2 | form3 | form4 | form5 | form6 | form7 | form8 | form9 | form17 | form18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EGE | 1.077876 | 1.077865 | 1.077674 | 1.077774 | 1.077864 | 1.077551 | 1.076982 | 1.077875 | 1.077805 | 1.077840 | 1.076547 | 1.076764 |
kable(t(CV2s_df),
col.names = colnames(t(CV2s_df)) ) %>%
kable_styling()
| Model | form10 | form11 | form12 | form12_1 | form13 | form13_1 | form14 | form15 | form16 | form19 | form20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EGE | 1.076878 | 1.076772 | 1.076941 | 1.076916 | 1.077807 | 1.077790 | 1.076215 | 1.076199 | 1.076205 | 1.076712 | 1.076527 |
The expected generalization errors do not systematically increase or decrease as the order of interaction changes, and the EGEs are generally close to one another. However, the models where each setting predictor interacts with each appearance predictor, namely form14, form15 and form16, do have smaller EGEs. As these also had the lowest training errors, it is sensible to choose this interaction structure as our best model to fit and predict the data.
Furthermore, the models with splines are also at risk of overfitting. These are form12_1, form13_1 and form15 with 2 degrees of freedom, and form16 with 4 degrees of freedom. In general, the splines with 2 degrees of freedom also have lower EGEs, so the most sensible model selection appears to be form15.
Secondly, to test the use of splines on age, we perform an ANOVA comparing the chosen interaction model without splines to the version with splines on age with 2 degrees of freedom.
int_model <- glm(form15,
family = binomial(link = logit), data = mice1)
int_model_no_spl <- glm(form14,
family = binomial(link = logit), data = mice1)
anovatable_df2 <- anova(int_model_no_spl, int_model, test = "Chisq")
kable(anovatable_df2,
col.names = colnames(anovatable_df2),
caption="ANOVA comparison of the interaction model with and without splines") %>%
kable_styling()
| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 1637767 | 1762570 | NA | NA | NA |
| 1637755 | 1762495 | 12 | 74.74708 | 0 |
The ANOVA yields a deviance drop of about 75 on 12 degrees of freedom, with a p-value of essentially 0, so the spline terms significantly improve the fit. To investigate our final model, we plot the residuals:
cv_data <- mice1 %>%
mutate(bin_force = as.numeric(as.character(bin_force)))
final_model <- glm(bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial)
simDiag <- fortify(final_model)
ggplot(simDiag, aes(x = .fitted, y = .resid)) +
geom_point() +
geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
It is hard to judge whether this residual plot is reasonable on its own, so we plot simulated residuals under the null hypothesis that our model is correct and compare them with the residual plot above.
set.seed(2023)
yNew1 <- simulate(final_model)[,1]
yNew2 <- simulate(final_model)[,1]
yNew3 <- simulate(final_model)[,1]
yNew4 <- simulate(final_model)[,1]
simGlmNew1 <- glm(yNew1 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simGlmNew2 <- glm(yNew2 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simGlmNew3 <- glm(yNew3 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simGlmNew4 <- glm(yNew4 ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2), data = cv_data, family = binomial(link = logit), control = glm.control(maxit = 1000))
simDiagNew1 <- fortify(simGlmNew1)
simDiagNew2 <- fortify(simGlmNew2)
simDiagNew3 <- fortify(simGlmNew3)
simDiagNew4 <- fortify(simGlmNew4)
p1 <- qplot(.fitted, .resid, data = simDiagNew1) +
geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
p2 <- qplot(.fitted, .resid, data = simDiagNew2) +
geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
p3 <- qplot(.fitted, .resid, data = simDiagNew3) +
geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
p4 <- qplot(.fitted, .resid, data = simDiagNew4) +
geom_smooth(method = "glm", color = "darkseagreen", fill = "darkseagreen2")
grid.arrange(p1,p2,p3,p4, ncol=2)
Since the simulated residual plots are similar to that of our model, we see no major reason for concern: our model's fit is consistent across the simulated data sets, which indicates that it is less likely to be overfitting. The consistency of the residual plots also suggests that the model captures underlying patterns in the data without being driven by random outliers, which in turn implies that its predictive power is reliable. This again means that the model is likely to generalize well to unseen data.
Given the above model selection, we settle on the interaction model with the natural cubic splines of age, the appearance predictors and the setting predictors, and choose the model below to predict the risk of force being used against adolescents under the age of 21.
int_model <- glm(bin_force ~ ns(age2, df = 2)*(cs_furtv + cs_lkout)*(inout2 + ac_incid + offunif2),
family = binomial(link = logit), data = mice1)
sum <- (summary(int_model))$coefficients
kable(sum,
col.names = colnames(sum),
caption="Summary of the interaction model") %>%
kable_styling()
| | Estimate | Std. Error | z value | Pr(>|z|) |
|---|---|---|---|---|
| (Intercept) | -0.8693727 | 0.0611126 | -14.2257470 | 0.0000000 |
| ns(age2, df = 2)1 | 0.0452462 | 0.1197751 | 0.3777599 | 0.7056090 |
| ns(age2, df = 2)2 | -0.0225723 | 0.0199947 | -1.1289156 | 0.2589334 |
| cs_furtv1 | 0.1709293 | 0.0871817 | 1.9606092 | 0.0499246 |
| cs_lkout1 | -0.2940791 | 0.1310688 | -2.2437005 | 0.0248517 |
| inout21 | -0.3649725 | 0.0661200 | -5.5198497 | 0.0000000 |
| ac_incid1 | -0.5269654 | 0.0571303 | -9.2239195 | 0.0000000 |
| offunif21 | -0.2881609 | 0.0634883 | -4.5388055 | 0.0000057 |
| ns(age2, df = 2)1:cs_furtv1 | 0.1582603 | 0.1701951 | 0.9298754 | 0.3524356 |
| ns(age2, df = 2)2:cs_furtv1 | 0.0525042 | 0.0276052 | 1.9019661 | 0.0571756 |
| ns(age2, df = 2)1:cs_lkout1 | -0.2079858 | 0.2556665 | -0.8135042 | 0.4159291 |
| ns(age2, df = 2)2:cs_lkout1 | 0.0065120 | 0.0408639 | 0.1593592 | 0.8733859 |
| ns(age2, df = 2)1:inout21 | -0.0395810 | 0.1301147 | -0.3042012 | 0.7609746 |
| ns(age2, df = 2)2:inout21 | -0.0761867 | 0.0221588 | -3.4382159 | 0.0005856 |
| ns(age2, df = 2)1:ac_incid1 | 0.3259691 | 0.1120433 | 2.9093135 | 0.0036222 |
| ns(age2, df = 2)2:ac_incid1 | 0.0448788 | 0.0187604 | 2.3922146 | 0.0167470 |
| ns(age2, df = 2)1:offunif21 | 0.0054804 | 0.1244037 | 0.0440537 | 0.9648616 |
| ns(age2, df = 2)2:offunif21 | -0.0248245 | 0.0206661 | -1.2012167 | 0.2296671 |
| cs_furtv1:inout21 | -0.0092462 | 0.0993968 | -0.0930228 | 0.9258854 |
| cs_furtv1:ac_incid1 | 0.2666623 | 0.0800691 | 3.3304019 | 0.0008672 |
| cs_furtv1:offunif21 | 0.0148749 | 0.0879367 | 0.1691546 | 0.8656751 |
| cs_lkout1:inout21 | 0.1856105 | 0.1403612 | 1.3223773 | 0.1860425 |
| cs_lkout1:ac_incid1 | 0.1984317 | 0.1213646 | 1.6350044 | 0.1020481 |
| cs_lkout1:offunif21 | -0.2179842 | 0.1196576 | -1.8217332 | 0.0684955 |
| ns(age2, df = 2)1:cs_furtv1:inout21 | 0.2637838 | 0.1952581 | 1.3509494 | 0.1767116 |
| ns(age2, df = 2)2:cs_furtv1:inout21 | 0.0346328 | 0.0324584 | 1.0669878 | 0.2859773 |
| ns(age2, df = 2)1:cs_furtv1:ac_incid1 | 0.0333905 | 0.1567124 | 0.2130687 | 0.8312734 |
| ns(age2, df = 2)2:cs_furtv1:ac_incid1 | -0.0165104 | 0.0256670 | -0.6432524 | 0.5200604 |
| ns(age2, df = 2)1:cs_furtv1:offunif21 | -0.1490782 | 0.1717349 | -0.8680714 | 0.3853553 |
| ns(age2, df = 2)2:cs_furtv1:offunif21 | 0.0495722 | 0.0277823 | 1.7843088 | 0.0743735 |
| ns(age2, df = 2)1:cs_lkout1:inout21 | 0.1481126 | 0.2756907 | 0.5372420 | 0.5911005 |
| ns(age2, df = 2)2:cs_lkout1:inout21 | -0.0119466 | 0.0454174 | -0.2630413 | 0.7925187 |
| ns(age2, df = 2)1:cs_lkout1:ac_incid1 | 0.0996025 | 0.2372625 | 0.4197989 | 0.6746324 |
| ns(age2, df = 2)2:cs_lkout1:ac_incid1 | 0.0207898 | 0.0382253 | 0.5438749 | 0.5865275 |
| ns(age2, df = 2)1:cs_lkout1:offunif21 | 0.2043009 | 0.2334488 | 0.8751421 | 0.3814966 |
| ns(age2, df = 2)2:cs_lkout1:offunif21 | -0.0003374 | 0.0371558 | -0.0090813 | 0.9927542 |
From the summary, we notice that the p-values of the individual interaction terms differ considerably. Therefore we plot them to get an overview of the interaction terms.
p_values <- summary(int_model)$coefficients[,4]
p_values <- data.frame(p_values)
ggplot(p_values, aes(x = reorder(rownames(summary(int_model)$coefficients), p_values), y = p_values)) +
geom_bar(stat = "identity", fill = "darkseagreen3") +
labs(title = "P-Values of Coefficients", x = "", y = "P-Value") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(yintercept = 0.05, color = "red", linetype = "dashed")
On the plot we have added a dashed line at 5% purely as a benchmark. It is clear to the eye that many of the interaction terms have high p-values, indicating that there is little evidence of those terms affecting the response variable. In particular, the p-values of the interaction terms containing inout2 and cs_lkout are large, which suggests that these interactions have a less significant effect on the response. Even though these terms have less effect, their contribution is already accounted for in the calculation of the expected generalization error, so we are not concerned about the large number of interaction terms with high p-values. We also note that the interaction cs_furtv:ac_incid has a relatively small p-value, so there is stronger evidence that this estimate differs from 0. The takeaway is that even though most terms are not individually significant, they still contribute to the model's predictive strength.
For our model, it is now possible to plot the predicted probability of force being used against adolescents.
predFrame0 <- expand.grid(cs_furtv = factor(c(0)),
ac_incid = factor(1),
inout2 = factor(c(0)),
offunif2 = factor(c(0)),
cs_lkout = factor(c(1)),
age2 = 10:21)
# predict.glm has no 'interval' argument, so only point predictions are produced
predFrame0$predicted_prob <- predict(int_model, type = "response",
newdata = predFrame0)
p_pred0 <- ggplot(predFrame0, aes(x = age2, y = predicted_prob)) +
facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
labs(x = "Age", y = "Risk of force")
predFramecom <- expand.grid(cs_furtv = factor(c(1)),
ac_incid = factor(1),
inout2 = factor(c(0)),
offunif2 = factor(c(0)),
cs_lkout = factor(c(0)),
age2 = 10:21)
predFramecom$predicted_prob <- predict(int_model, type = "response",
newdata = predFramecom)
p_predcom <- ggplot(predFramecom, aes(x = age2, y = predicted_prob)) +
facet_grid(cs_lkout + cs_furtv ~ ac_incid+ inout2 + offunif2, label = label_both) +
geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.4)) +
geom_hline(yintercept = 0.25, linetype = "dashed", color = "firebrick3") +
scale_y_continuous(breaks = c(0.15, 0.25, 0.35)) +
labs(x = "Age", y = "Risk of force")
grid.arrange(p_pred0, p_predcom, ncol = 2)
The plot above presents a portion of the predictions from our model. The graph on the left describes the risk of the police using force on the subject when stopped, with a red line marking 25%. It represents the following situation: it is nighttime, the subject is outside in a high-crime area, meeting an officer not in uniform, and the subject appears to be on a lookout without exhibiting any suspicious movements. The prediction plot suggests an approximately 21% risk of force being used against the subject, and this risk remains nearly constant for individuals aged between 10 and 21.
The right prediction plot indicates that, under the same setting, when the subject does not appear to be on the lookout but does exhibit suspicious movements, the risk of force being used against them rises markedly, exceeding 30%. Another interesting change implied by the model under this change of appearance is that the risk of force now depends on age: it increases with age until around 18, after which it decreases slightly. This curved behaviour is due to our use of splines, which is justified precisely by their predictive strength.
predFrame <- expand.grid(cs_furtv = factor(c(0)),
ac_incid = factor(1),
inout2 = factor(c(0)),
offunif2 = factor(c(0)),
cs_lkout = factor(c(1)),
age2 = 10:21)
predFrame$predicted_prob <- predict(int_model, type = "response",
newdata = predFrame)
p_pred <- ggplot(predFrame, aes(x = age2, y = predicted_prob)) +
facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0.1,0.9)) +
geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
labs(x = "Age", y = "Risk of force")
predFrame1 <- expand.grid(cs_furtv = factor(c(1)),
ac_incid = factor(1),
inout2 = factor(c(0)),
offunif2 = factor(c(0)),
cs_lkout = factor(c(0)),
age2 = 10:21)
predFrame1$predicted_prob <- predict(int_model, type = "response",
newdata = predFrame1)
p_pred1 <- ggplot(predFrame1, aes(x = age2, y = predicted_prob)) +
facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0,0.9)) +
geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
labs(x = "Age", y = "Risk of force")
grid.arrange(p_pred, p_pred1, ncol = 2)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Even though we have just presented the prediction plots describing the risk of the police using force as if they were reality, we believe this is not the full truth, owing to the model's limited reliability.
predFrame1full <- expand.grid(cs_furtv = factor(c(0,1)),
ac_incid = factor(0),
inout2 = factor(c(0,1)),
offunif2 = factor(c(0,1)),
cs_lkout = factor(c(0,1)),
age2 = 10:21)
predFrame1full$predicted_prob <- predict(int_model, type = "response",
newdata = predFrame1full)
p_pred1full <- ggplot(predFrame1full, aes(x = age2, y = predicted_prob)) +
facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0,0.9)) +
geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
theme(axis.text.y = element_text(size = 5)) +
scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
theme( strip.text.y = element_text(angle = 0)) +
labs(x = "Age", y = "Risk of force")
predFrame2full <- expand.grid(cs_furtv = factor(c(0,1)),
ac_incid = factor(1),
inout2 = factor(c(0,1)),
offunif2 = factor(c(0,1)),
cs_lkout = factor(c(0,1)),
age2 = 10:21)
predFrame2full$predicted_prob <- predict(int_model, type = "response",
newdata = predFrame2full)
p_pred2full <- ggplot(predFrame2full, aes(x = age2, y = predicted_prob)) +
facet_grid(cs_lkout + cs_furtv ~ ac_incid + inout2 + offunif2, label = label_both) +
geom_line(color = "cadetblue") + coord_cartesian(ylim = c(0,0.9)) +
geom_hline(yintercept = 0.6, linetype = "dashed", color = "firebrick3") +
geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")), se = TRUE, linetype = "blank") +
theme(axis.text.y = element_text(size = 5)) +
scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8)) +
theme( strip.text.y = element_text(angle = 0)) +
labs(x = "Age", y = "Risk of force")
p_pred1full
p_pred2full
Above is our full model prediction plot, with a dashed line at 60% as a benchmark. Looking at the confidence intervals for each combination of setting and appearance predictors, one can tell that the real risk of the police using force when stopping a subject might well be a great deal larger, or even smaller. If we generated 100 such confidence intervals, on average 95 of them would contain the true risk of the police using force, and given the width of the intervals these will vary quite a lot.
We notice a few general tendencies from the overview. Firstly, appearance seems to have an effect on the risk of the police using force. The strongest difference in risk is seen between the scenario where the subject appears to be on a lookout and is not moving suspiciously, and the scenario where the subject is clearly not on a lookout but appears to be moving suspiciously. For the latter scenario, the model implies a 95% confidence interval whose upper band mostly lies above our benchmark. By contrast, the 95% confidence interval for the risk of force usage is consistently lower in the first scenario, indicating that the police are less likely to use force when stopping the subject.
Another tendency in our prediction plot is a higher risk of use of force when the police officer is not wearing a uniform, which we observe in the confidence bands. Even though this effect does not seem statistically strong, it is noteworthy, especially in the context of social theories. One theory suggests that police officers take on a more authoritative role when wearing a uniform, as it symbolizes the expectations they have to meet to fulfil their duties. This affects not only the police officer's view of him- or herself, but also the subject's view of the police officer and of their authority and rights during the stop, especially for children and adolescents, who might have a stronger belief in authority.
From the prediction plots, we also observe a rather counterintuitive tendency: the risk of use of force is not higher when the stop occurs in a high-crime area. The risk in the prediction plots actually appears higher when the stop occurred in a low-crime area. This might suggest that police officers do not hold biases about the supposed criminal behavior of children and young adults in high-crime areas, indicating no discriminatory behavior of the police regarding the area.
However, some of the data might be biased, as already mentioned in the analysis of the variable daytime. As we do not know exactly who provides the information about each stop and frisk, we must assume that the same police officers who conduct the stop and frisk also record the incident. If the stop and frisk portrays the police officer negatively, he or she might provide false information to make his or her actions appear more appropriate, such as understating the use of force or misreporting the subject's appearance or the location of the stop. So the absence of bias suggested by the prediction plots might simply be due to fabricated, omitted or biased data.
From the discussion above, we have concluded which tendencies our model implies. However, we have encountered diagnostics in our analysis that undermine the reliability of the model. Notable is our Spearman plot, showing little to no correlation within our data set, which, as just disclosed, might even be biased. Furthermore, in our model selection the drop1 analysis suggested that the predictors are significant and strong in explaining the variation in the response variable. We have interpreted this as all the variables being relevant to include in the model; however, it could just as well be due to the data generally correlating weakly, in which case the level of significance reflects the predictors affecting the response differently rather than the model's goodness of fit.
Recall also that in the model selection we chose our final model based on the training errors and expected generalization errors, both of which were determined by comparing values that differed by less than 0.001. To investigate this difference a little further, we plot 100 simulations of the expected generalization error, to compare how often the predictive strength of our chosen model beats the alternatives.
cv_sample <- sapply(numeric(100), function(x) testCV(form15, cv_data, k = 10))
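The plotting code is omitted in the original; a minimal sketch of one reasonable version, using the lowest competing EGE from the cross-validation tables (form16, roughly 1.076205) as a dashed reference line:
ggplot(data.frame(EGE = cv_sample), aes(x = EGE)) +
geom_histogram(bins = 20, fill = "darkseagreen3", color = "white") +
geom_vline(xintercept = 1.076205, linetype = "dashed", color = "firebrick3") +
labs(x = "Simulated EGE for form15", y = "Count")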
Based on the plot, the simulated EGE mostly turns out to be less than the lowest value from the other models, which indicates that this model would be chosen most of the time. However, the plot does not take into consideration that the EGEs of the other models could also vary.
Finally, it is also clear from our interpretation of the model that our 95% confidence intervals are concerningly wide, and the predictions from the model are therefore quite vague.
Nevertheless, our model diagnostics show that the residuals are in fact distributed as could be expected under our chosen distribution. Even though this implies that we have chosen a suitable underlying distribution, it is not strong enough to outweigh the arguments for the model's poor reliability.
To answer our scientific question, the model implies that appearance has an interactive effect on the risk of use of force against civilians under the age of 21, whereas the setting does not necessarily. In conclusion, it is very important to take into consideration that the model is diagnostically not performing very well, and it should therefore be interpreted and utilized with careful consideration.