library(carData)
library(tidyverse)
library(dplyr)
head(Arrests)
In this assingment we will be studying the interaction effect. The same dataset will be used from the last assingment which is the ‘Arrests’ dataset. This dataset details arrests made on marijuana possesion in Toronto from 1997-2002 based on whether or not the arrest led to a court summons. Other variables such as citizenship, employment, age, and colour are included as well. Here we load the dataset and take a look at the first few rows of data.
checks_race <- Arrests %>%
group_by(colour) %>%
summarize(average_checks = mean(checks)) %>%
kable()
checks_race
| colour | average_checks |
|---|---|
| Black | 2.099379 |
| White | 1.485018 |
We will be looking at the interaction effect of race and gender on the number of times an individual has shown up in other police data bases in Toronto. In this table we are looking at the average number of times an individual has shown up in other police databases by race. Black has an average of 2.1 database appearances and White has 1.5.
checks_gender_age <- Arrests %>%
group_by(colour, age) %>%
summarize(average_checks = mean(checks)) %>%
spread(age, average_checks) %>%
kable()
checks_gender_age
| colour | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 64 | 66 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Black | NA | 0.800000 | 1.6428571 | 2.090909 | 1.568628 | 1.736842 | 2.073684 | 2.298246 | 1.846939 | 2.228261 | 2.125000 | 2.226190 | 2.253731 | 1.704546 | 2.400000 | 2.108108 | 2.333333 | 2.384615 | 2.241379 | 2.210526 | 1.888889 | 1.956522 | 2.277778 | 2.538461 | 2.380952 | 1.842105 | 2.000000 | 3.076923 | 2.526316 | 2.250000 | 2.888889 | 1.750000 | 0.75 | 1.750000 | 1.000000 | 1.166667 | 2.000000 | 3.666667 | 1.333333 | 3.4 | 1.50 | 2 | 0.000000 | 2.5 | NA | 3 | 3 | 2 | 1 | 2 | NA | 1 | 2 |
| White | 1.25 | 1.153846 | 0.6760563 | 1.094675 | 1.246094 | 1.329700 | 1.257218 | 1.398329 | 1.450000 | 1.575862 | 1.557789 | 1.442308 | 1.342105 | 1.211009 | 1.350515 | 1.804878 | 1.756410 | 1.750000 | 1.558823 | 1.549020 | 2.433962 | 2.015151 | 1.878788 | 2.120000 | 1.862069 | 2.166667 | 2.483871 | 1.829787 | 2.088235 | 1.969697 | 1.961539 | 2.269231 | 1.12 | 1.766667 | 1.857143 | 0.800000 | 2.214286 | 1.111111 | 2.333333 | 1.4 | 3.25 | 2 | 1.571429 | 2.0 | 0 | NA | NA | 3 | 0 | NA | 2 | 3 | NA |
In this table I wanted to look average number of database appearances by race and age. Black individuals have higher numbers of appearances in police databases across the board but especially at younger ages. White individuals appear less when arrested for marijuana possesion especially at the younger ages. However, older adult whites appear much more than older blacks.
library(texreg)
checks_model_1 <- lm(checks ~ colour, data = Arrests)
checks_model_2 <- lm(checks ~ colour + age, data = Arrests)
checks_model_3 <- lm(checks ~ colour*age, data = Arrests)
screenreg(list(checks_model_1, checks_model_2, checks_model_3))
======================================================
Model 1 Model 2 Model 3
------------------------------------------------------
(Intercept) 2.10 *** 1.53 *** 1.88 ***
(0.04) (0.08) (0.13)
colourWhite -0.61 *** -0.58 *** -1.05 ***
(0.05) (0.05) (0.15)
age 0.02 *** 0.01
(0.00) (0.00)
colourWhite:age 0.02 ***
(0.01)
------------------------------------------------------
R^2 0.03 0.04 0.05
Adj. R^2 0.03 0.04 0.05
Num. obs. 5226 5226 5226
RMSE 1.52 1.50 1.50
======================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Our next step is to create the regression models. Our first model looks at the relationship between race and the number of appearances in police databases for a marijuana possesion arrest. With a slope of -.61, whites will appear .61 less times than blacks.
Looking at our second model, our first slope lowered to -.58 but still very close to our first model. Whites are still appearing ~.6 times less. Our second slope tells us with every one increase in age, the number of times you appear in other police databases increases by .02.
Now moving on to the third model, our first slope jumps to -1.05 indicating whites appear in one whole less database than blacks. Our second slope drops by .01. Every one increase in age increases the number of database checks by .01. We now have a third slope of .02. This tells us that every one year increase in age for a white individual increases the number of checks by .02.
checks_black <- Arrests %>%
filter(colour == 'Black')
checks_white <- Arrests %>%
filter(colour == 'White')
white_reg_model <- lm(checks ~ age, data = checks_white)
black_reg_model <- lm(checks ~ age, data = checks_black)
screenreg(list(white_reg_model, black_reg_model, checks_model_3), custom.model.names = c('White','Black','Combined'))
======================================================
White Black Combined
------------------------------------------------------
(Intercept) 0.83 *** 1.88 *** 1.88 ***
(0.07) (0.13) (0.13)
age 0.03 *** 0.01 0.01
(0.00) (0.00) (0.00)
colourWhite -1.05 ***
(0.15)
colourWhite:age 0.02 ***
(0.01)
------------------------------------------------------
R^2 0.02 0.00 0.05
Adj. R^2 0.02 0.00 0.05
Num. obs. 3938 1288 5226
RMSE 1.50 1.52 1.50
======================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Here we are performing a subgroup analysis by race. To do this we filter our dataset by colour and create two dataframes for black and white and create two regression models. This helps us understand the interaction effect and the .02 slope of our ‘Combined’ model. When you take the slope of the ‘White’ model and subtract it from the slope of the ‘Black’ model, the result is .02, which is the slope of the interaction of race and age on checks (checks_model_3).
We have our first intercept for Whites, which shows that when all independent variables are 0, whites appear in less than 1 database. For blacks however, when independent variables are set to 0, they appear in almost 2 police databases.
We can see that for every one year increase in age for Whites, the number of database checks they show up in increases by .03, whereas for blacks, they appear in .01 more. This is in line with the trend of the data when looking at checks by age and race. Older whites had higher number of checks compared to blacks. This is interesting as blacks overall had a higher average number of database appearances.
library(visreg)
visreg2d(checks_model_3,"colour", "age", scale='response', plot.type='image', color=c('blue','white','red'), main='Police Database Checks by Race & Age')
visreg(checks_model_3, 'age', by ='colour', scale='response', main='Police Database Checks by Race & Age')
After visualizing the data, we can see that blacks appear in more police databases upon arrest for marijuana possesion across all age groups except the oldest one, which whites appear more. These plots support the results from our regression models. At around age 20, white individuals do not even appear in one police database which means that arrest for marijuana posession may have been there first, whereas black individuals already appear in multiple databases.