Now that our file is loaded, I want to first create a subset that includes just the columns that I will be working with for this project.
stopandfrisk <- as.data.frame(stopandfrisk[c(2, 4, 10, 15, 23, 81, 82, 84)]) #this keeps the columns for precinct, date, suspected crime, arrest, frisk, sex, race and age
is.data.frame(stopandfrisk) #I did this just to verify that I have a data frame
## [1] TRUE
head(stopandfrisk)
## pct datestop crimsusp arstmade frisked sex race age
## 1 61 1012015 FELONY N Y M W 33
## 2 22 1152015 FELONY N Y M B 14
## 3 20 1292015 MISD N N M B 14
## 4 20 1292015 MIDS N N M B 14
## 5 20 1292015 MISD N N M B 13
## 6 20 1292015 MISD N N M W 13
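Selecting columns by position works, but the numbers will silently point at the wrong columns if the file layout ever changes. A minimal sketch of selecting by name instead, using a small made-up data frame (`toy`, `keep`, `toy.subset` and all of the values are hypothetical; only the column names come from the output above):

```r
# Hypothetical stand-in for the loaded file; only the column names match the real data
toy <- data.frame(
  year     = c(2015, 2015),
  pct      = c(61, 22),
  datestop = c(1012015, 1152015),
  arstmade = c("N", "N"),
  race     = c("W", "B"),
  age      = c(33, 14)
)

# Pick columns by name instead of by position
keep <- c("pct", "datestop", "arstmade", "race", "age")
toy.subset <- toy[, keep]

names(toy.subset)
```

With names, a misspelled or missing column raises an error right away instead of returning the wrong data.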
Now that we have the data we want to work with, I’m going to start manipulating it. The first thing we are going to do is rename the columns of our data frame so that the names are more meaningful. Once they have been renamed, I want to see how many stops ended in an arrest. I did so by creating logical variables and counting how many times yes and no occur.
colnames(stopandfrisk) <- c("Precinct", "Date", "Suspected", "Arrested", "Frisked", "Sex", "Race", "Age") #new names are assigned by position, in the same order as the old columns
head(stopandfrisk)
## Precinct Date Suspected Arrested Frisked Sex Race Age
## 1 61 1012015 FELONY N Y M W 33
## 2 22 1152015 FELONY N Y M B 14
## 3 20 1292015 MISD N N M B 14
## 4 20 1292015 MIDS N N M B 14
## 5 20 1292015 MISD N N M B 13
## 6 20 1292015 MISD N N M W 13
arrests <- stopandfrisk[, "Arrested"]
arrest.yes <- arrests == 'Y'
arrest.no <- arrests == 'N'
sum(arrest.yes) #this gives us the number of incidents where Y was in the Arrested column, giving us the number of those arrested
## [1] 3968
sum(arrest.no) #this gives us the number of incidents where N is in the Arrested column, giving us the number of those not arrested
## [1] 18595
rate.success <- sum(arrest.yes)/length(arrests) #arrests divided by all stops, not by non-arrests
rate.success
## [1] 0.1758631
#of the 22563 incidents, we had 18595 non-arrests and 3968 arrests, a 'success' rate of about 18%. This suggests the stops may not be very effective
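The yes and no counts and their shares of the total can also be read off in one step with `table()` and `prop.table()`. A small sketch on made-up values (the `arrested` vector below is hypothetical, not the real column):

```r
# Hypothetical sample of the Arrested column
arrested <- c("N", "N", "Y", "N", "Y", "N", "N", "N")

arrest.counts <- table(arrested)            # count of each distinct value
arrest.share  <- prop.table(arrest.counts)  # each count divided by the total

arrest.counts
arrest.share
```

Because `prop.table()` always divides by the total, it avoids accidentally dividing one count by the other.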
I want to do the same as above for the number of incidents where a frisk occurred and compare it to the number of arrests.
frisks <- stopandfrisk[, "Frisked"]
frisk.yes <- frisks == 'Y'
frisk.no <- frisks == 'N'
sum(frisk.yes) #this gives us the number of incidents where Y was in the Frisked column, meaning that the individual stopped was frisked
## [1] 15257
sum(frisk.no) #this gives us the number of incidents where N is in the Frisked column, meaning that the individual stopped was not frisked
## [1] 7306
#we see that 15257 people were frisked, while 7306 were not.
I already have a feeling that the majority of those stopped will be male, but let us check and see. We will also check and see how race plays into these numbers.
sex <- stopandfrisk[, "Sex"]
sex.male <- sex == 'M'
sex.female <- sex == 'F'
sum(sex.male) #this gives us the number of incidents where M was in the Sex column, indicating the number of males that were stopped
## [1] 20853
sum(sex.female) #this gives us the number of incidents where F is in the Sex column, indicating the number of females that were stopped
## [1] 1515
#only 1515 of those stopped were female, while 20853 were male, meaning that a staggering 92 percent of those stopped were male.
20853/22563
## [1] 0.9242122
black.male <- subset(stopandfrisk, Race == 'B' & Sex == 'M')
nrow(black.male)
## [1] 11131
#We see that the number of Black males stopped is 11,131, about 49% of the total number of individuals stopped.
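The male and female counts above add up to 22,368, which is 195 fewer than the 22,563 rows in the data, so some rows must hold a value other than M or F. A hedged sketch of how such values could be surfaced, on made-up data (the `"Z"` code and the `NA` below are hypothetical examples, not values confirmed from the file):

```r
# Hypothetical sample of the Sex column, including values other than M or F
sex <- c("M", "M", "F", "Z", "M", NA, "M")

# useNA = "ifany" adds a count for missing values when there are any
sex.counts <- table(sex, useNA = "ifany")
sex.counts

# Rows that are neither M nor F (missing values are counted here too)
sum(!(sex %in% c("M", "F")))
```

Running the same `table()` call on the real column would show which codes account for the remaining rows.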
I want to see how many of the people stopped were both frisked and arrested. I’ll create a subset of this group and then see how many of them are Black.
all.three <- subset(stopandfrisk, Frisked == 'Y' & Arrested == 'Y')
nrow(all.three) #This shows us that there were 3067 individuals who were stopped, frisked, and arrested
## [1] 3067
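A two-way `table()` gives all four frisked and arrested combinations at once, rather than one subset per combination. A minimal sketch on made-up rows (`toy` and `cross` are hypothetical names; the column names follow the renamed data frame):

```r
# Hypothetical sample with the renamed Frisked and Arrested columns
toy <- data.frame(
  Frisked  = c("Y", "Y", "N", "Y", "N", "Y"),
  Arrested = c("Y", "N", "N", "N", "Y", "Y")
)

# Rows are Frisked values, columns are Arrested values
cross <- table(Frisked = toy$Frisked, Arrested = toy$Arrested)
cross

# Frisked and arrested: the same count that nrow(subset(...)) gives
cross["Y", "Y"]
```

The other cells answer related questions, such as how many people were arrested without being frisked.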
I would like to see how many of those who were stopped, frisked and arrested were Black, and compare that to the share of all stops that involved Black individuals.
all.three.race <- subset(stopandfrisk, Frisked == 'Y' & Arrested == 'Y' & Race == 'B')
nrow(all.three.race)
## [1] 1535
ratio <- nrow(all.three.race)/nrow(all.three)
ratio #about 50% of those who were stopped, frisked and arrested were Black
## [1] 0.5004891
black.stopped <- subset(stopandfrisk, Race == 'B')
nrow(black.stopped) #11950 black individuals were stopped and 10613 other races
## [1] 11950
other.races <- nrow(stopandfrisk)-nrow(black.stopped)
other.races
## [1] 10613
nrow(black.stopped)/nrow(stopandfrisk) #about 53% of those stopped were black
## [1] 0.5296282
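2088510/8550405 #for comparison, presumably the Black population divided by the total population of the city: about 24%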
## [1] 0.2442586
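To put the two shares side by side, one can divide the share of stops by the comparison share. The sketch below assumes that the 0.244 figure above is the Black share of the city's population, which is not stated explicitly here; `stop.share` and `pop.share` are names introduced only for illustration:

```r
# Figures copied from the output above
stop.share <- 11950 / 22563      # share of stops involving Black individuals
pop.share  <- 2088510 / 8550405  # assumed to be the Black share of the population

# How many times larger the stop share is than the population share
stop.share / pop.share
```

Under that assumption, Black individuals appear among those stopped a little over twice as often as their population share alone would suggest.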
I would now like to get a better understanding of my data visually. I would like to show the race of the individuals who were stopped.
plot.race <- plyr::count(stopandfrisk, vars = "Race") #count() from the plyr package returns one row per race with a freq column
colnames(plot.race) <- c("Race", "Total")
barplot(plot.race$Total, main = "Stop and Frisk: RACE", xlab = "RACE", ylab = "AMOUNT", names.arg = plot.race$Race, col = "blue", border = "dark blue", density = c(10, 90, 10, 10, 10, 10, 10, 10))

#This gives us a visual representation of how many people of each race were stopped. Black individuals were stopped far more often than any other group.
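The same kind of chart can be drawn with base R alone by passing a `table()` of the column straight to `barplot()`. A minimal sketch on made-up codes (the `race` vector is hypothetical, and sorting the bars is a choice made here, not something the original plot does):

```r
# Hypothetical sample of the Race column
race <- c("B", "W", "B", "Q", "B", "W", "P", "B")

# table() counts each code; sort() orders the bars from most to least frequent
race.counts <- sort(table(race), decreasing = TRUE)

# barplot() uses the names of the table as bar labels
barplot(race.counts, main = "Stop and Frisk: RACE",
        xlab = "RACE", ylab = "NUMBER OF STOPS", col = "blue")
```

Sorting puts the largest group first, which makes the gap between groups easier to read.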
The last thing I would like to explore is the age of those individuals that are stopped. Since our age column is numeric already, we can easily do this.
age.plot <- subset(stopandfrisk, Age > 13 & Age < 100) #keep only ages above 13 and below 100
head(age.plot)
## Precinct Date Suspected Arrested Frisked Sex Race Age
## 1 61 1012015 FELONY N Y M W 33
## 2 22 1152015 FELONY N Y M B 14
## 3 20 1292015 MISD N N M B 14
## 4 20 1292015 MIDS N N M B 14
## 7 67 2062015 FEL N Y M B 25
## 8 7 2072015 FEL N Y M B 15
ages <- age.plot$Age
summary(ages)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.00 20.00 24.00 27.82 33.00 99.00
mode <- function(ages) {
  unique.age <- unique(ages)
  unique.age[which.max(tabulate(match(ages, unique.age)))]
} #from: https://www.tutorialspoint.com/r/r_mean_median_mode.htm
mode(ages)
## [1] 20
#the mean age of those stopped is approximately 28, the median age is 24 and the mode is 20
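One caveat about the helper above: defining a function called `mode` masks base R's own `mode()`, which reports how an object is stored rather than its most common value. A hedged alternative sketch under a different name (`stat.mode` is a hypothetical name, and the ages below are made up):

```r
# Named stat.mode so that it does not mask base R's mode()
stat.mode <- function(x) {
  counts <- table(x)  # frequency of each distinct value, sorted by value
  # value with the highest count; ties go to the smallest value
  as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical ages
stat.mode(c(20, 24, 20, 33, 24, 20))
```

When two values are tied, this version returns the smaller one, whereas the helper above returns whichever appears first in the data.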
To show this visually:
hist(age.plot$Age, main = "Stop and Frisk: AGE", xlab = "AGE", col = "blue", xlim = c(10,75), breaks = 20)
