Project #2.1

Hypothesis

I believe the data will show that black males are stopped and frisked significantly more often than any other group. I will also determine the success rate of the stops (success = arrest made, failure = no arrest made).

Loading Data and required libraries.

Our first step is to load the .CSV file that we created.

library(plyr)
library(stringr)
library(ggplot2)
stopandfrisk <- read.csv("https://raw.githubusercontent.com/komotunde/DATA607/master/Project%202/stopandfrisk2015.csv")
View(stopandfrisk)

Now that our file is loaded, I want to first create a subset that includes just the columns that I will be working with for this project.

stopandfrisk <- as.data.frame(stopandfrisk[c(2, 4, 10, 15, 23, 81, 82, 84)]) #this keeps the precinct, date, suspected crime, arrest, frisk, sex, race, and age columns
is.data.frame(stopandfrisk)  #i did this just to verify that I have a data frame

## [1] TRUE

head(stopandfrisk)

##   pct datestop crimsusp arstmade frisked sex race age
## 1  61  1012015   FELONY        N       Y   M    W  33
## 2  22  1152015   FELONY        N       Y   M    B  14
## 3  20  1292015     MISD        N       N   M    B  14
## 4  20  1292015     MIDS        N       N   M    B  14
## 5  20  1292015     MISD        N       N   M    B  13
## 6  20  1292015     MISD        N       N   M    W  13
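Selecting columns by position works, but it breaks silently if the column order in the file ever changes. A minimal sketch of selecting by name instead, on a toy data frame; the `extra1` and `extra2` columns and all values are hypothetical stand-ins, not columns from the real file:

```r
# Toy data frame standing in for the raw file (values and extra columns are illustrative)
raw <- data.frame(extra1 = 1, pct = 61, extra2 = 2, race = "W", age = 33)

# Columns to keep, referenced by name instead of by position
keep <- c("pct", "race", "age")

# Fail early if an expected column is missing from the data
stopifnot(all(keep %in% names(raw)))

subset.df <- raw[, keep]
names(subset.df)
```

With names, the selection documents itself and does not depend on where each column sits in the file.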

Now that we have the data we want to work with, I’m going to start manipulating it. The first step is to rename the columns of the data frame so that the names are more meaningful. Once they are renamed, I want to see how many stops ended in an arrest. I do so by creating logical vectors and counting how many times yes and no occur.

colnames(stopandfrisk) <- c("pct" = "Precinct", "datestop" = "Date","crimsusp" = "Suspected", "arstmade" = "Arrested","frisked" = "Frisked", "sex" = "Sex", "race" = "Race", "age" = "Age")
head(stopandfrisk)

##   Precinct    Date Suspected Arrested Frisked Sex Race Age
## 1       61 1012015    FELONY        N       Y   M    W  33
## 2       22 1152015    FELONY        N       Y   M    B  14
## 3       20 1292015      MISD        N       N   M    B  14
## 4       20 1292015      MIDS        N       N   M    B  14
## 5       20 1292015      MISD        N       N   M    B  13
## 6       20 1292015      MISD        N       N   M    W  13
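Note that `colnames<-` assigns new names by position, so the old names written on the left of each pair above are not actually checked. A sketch of a rename that matches on the old name, using a toy data frame with illustrative values:

```r
# Toy data frame standing in for the real one (values are illustrative only)
df <- data.frame(pct = c(61, 22), arstmade = c("N", "Y"), race = c("W", "B"))

# Lookup table: old name -> new name
new.names <- c(pct = "Precinct", arstmade = "Arrested", race = "Race")

# Rename only the columns found in the lookup, matching by name rather than position
hits <- names(df) %in% names(new.names)
names(df)[hits] <- unname(new.names[names(df)[hits]])
names(df)
```

Columns that are not in the lookup keep their original names, so the rename is safe even if the subset changes.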

arrests <- stopandfrisk[, "Arrested"]
arrest.yes <- arrests == 'Y'
arrest.no <- arrests == 'N'
sum(arrest.yes)  #this gives us the number of incidents where Y was in the Arrested column, giving us the number of those arrested

## [1] 3968

sum(arrest.no)  #this gives us the number of incidents where N is in the Arrested column, giving us the number of those not arrested

## [1] 18595

rate.success <- sum(arrest.yes)/nrow(stopandfrisk)  #arrests divided by all stops, not by non-arrests
rate.success

## [1] 0.1758631

#of the 22563 incidents, we had 18595 non-arrests and 3968 arrests, a 'success' rate of about 18%. This suggests the stops may not be very effective
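The share of stops that end in an arrest is the number of arrests divided by the total number of stops. One way to get counts and shares in a single step is `table()` with `prop.table()`; the sketch below uses a made-up vector rather than the real Arrested column:

```r
# Toy vector standing in for the Arrested column (illustrative values)
arrested <- c("Y", "N", "N", "N", "Y", "N", "N", "N", "N", "N")

# Counts per category
arrest.counts <- table(arrested)

# Share of each category out of all stops
arrest.share <- prop.table(arrest.counts)

# Equivalent one-liner for the arrest rate: the mean of a logical vector
arrest.rate <- mean(arrested == "Y")
arrest.rate
```

Taking the mean of a logical vector avoids typing the counts by hand, so the rate stays correct if the data is reloaded.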

I want to do the same as above for the number of incidents in which a frisk occurred and compare it to the number of arrests.

frisks <- stopandfrisk[, "Frisked"]
frisk.yes <- frisks == 'Y'
frisk.no <- frisks == 'N'
sum(frisk.yes)  #this gives us the number of incidents where Y was in the Frisked column, meaning that the individual stopped was frisked

## [1] 15257

sum(frisk.no)  #this gives us the number of incidents where N is in the Frisked column, meaning that the individual stopped was not frisked

## [1] 7306

#we see that 15257 people were frisked, while 7306 were not.
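To compare frisks with arrests directly, a two-way table may be easier to read than separate counts. This is a sketch on toy data with illustrative values; the column names match the renamed data frame:

```r
# Toy data standing in for the Frisked and Arrested columns (illustrative values)
toy <- data.frame(
  Frisked  = c("Y", "Y", "N", "Y", "N", "Y"),
  Arrested = c("Y", "N", "N", "N", "Y", "N")
)

# Cross-tabulate: rows are Frisked, columns are Arrested
cross <- table(Frisked = toy$Frisked, Arrested = toy$Arrested)
cross

# Row proportions: within each Frisked group, the share that was arrested
row.share <- prop.table(cross, margin = 1)
row.share
```

Reading across a row of `row.share` shows how often a frisk (or no frisk) went together with an arrest.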

I already have a feeling that the majority of those stopped will be male, but let us check and see. We will also check and see how race plays into these numbers.

sex <- stopandfrisk[, "Sex"]
sex.male <- sex == 'M'
sex.female <- sex == 'F'
sum(sex.male)  #this gives us the number of incidents where M was in the Sex column, indicating the number of males that were stopped

## [1] 20853

sum(sex.female)  #this gives us the number of incidents where F is in the Sex column, indicating the number of females that were stopped

## [1] 1515

#only 1515 of those stopped were female, while 20853 were male, meaning that a staggering 92 percent of those stopped were male.

20853/22563

## [1] 0.9242122

black.male <- subset(stopandfrisk, Race == 'B' & Sex == 'M')
nrow(black.male)

## [1] 11131

#We see that the number of black males stopped is 11,131, about 49% of the total number of individuals stopped.
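Since the hypothesis is about black males compared with every other group, it can help to see the share of each race and sex combination at once. A sketch on toy data (illustrative values only):

```r
# Toy data standing in for the Race and Sex columns (illustrative values)
toy <- data.frame(
  Race = c("B", "B", "W", "B", "W", "W", "B", "W"),
  Sex  = c("M", "M", "M", "F", "M", "F", "M", "M")
)

# Share of all stops falling in each Race x Sex cell
group.share <- prop.table(table(Race = toy$Race, Sex = toy$Sex))
group.share

# Share for one specific group, here black males
group.share["B", "M"]
```

Because every cell is divided by the same total, the cells can be compared with one another directly.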

I want to see how many of the people were stopped, frisked, and arrested. I’ll create a subset of this group and see how many are male and how many are black.

all.three <- subset(stopandfrisk, Frisked == 'Y' & Arrested == 'Y')
nrow(all.three) #This shows us that there were 3067 individuals who were stopped, frisked, and arrested

## [1] 3067

I would like to see how many of those that were stopped, frisked and arrested were black and compare to the ratio of the total number of blacks who were stopped.

all.three.race <- subset(stopandfrisk, Frisked == 'Y'& Arrested == 'Y' & Race == 'B')
nrow(all.three.race)

## [1] 1535

ratio <- nrow(all.three.race)/nrow(all.three)
ratio #about 50% of those who were stopped, frisked and arrested were Black

## [1] 0.5004891
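The same kind of comparison can be made for every race at once by computing the arrest rate within each group. A minimal sketch with `tapply()` on toy data (illustrative values, not the real counts):

```r
# Toy data standing in for the Race and Arrested columns (illustrative values)
toy <- data.frame(
  Race     = c("B", "B", "B", "B", "W", "W", "W", "W"),
  Arrested = c("Y", "N", "N", "N", "Y", "Y", "N", "N")
)

# Arrest rate within each race: mean of a logical vector, split by Race
arrest.rate.by.race <- tapply(toy$Arrested == "Y", toy$Race, mean)
arrest.rate.by.race
```

A group's share of arrests and a group's own arrest rate answer different questions, so it is worth looking at both.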

black.stopped <- subset(stopandfrisk, Race == 'B')
nrow(black.stopped) #11950 black individuals were stopped and 10613 other races

## [1] 11950

other.races <- nrow(stopandfrisk)-nrow(black.stopped)
other.races

## [1] 10613

nrow(black.stopped)/nrow(stopandfrisk) #about 53% of those stopped were black

## [1] 0.5296282

2088510/8550405  #black share of the NYC population in July 2015 (figures cited in the Conclusion)

## [1] 0.2442586
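One way to put the two shares side by side is a simple representation ratio: the share of stops divided by the share of the population. The sketch below reuses the counts from this report and the 2015 population figures cited in the Conclusion; the name `representation.ratio` is only an illustrative choice:

```r
# Black share of stops, from the counts computed in this report
stop.share <- 11950 / 22563

# Black share of the NYC population, from the July 2015 figures in the Conclusion
pop.share <- 2088510 / 8550405

# A ratio above 1 means the group is stopped more often than its population share suggests
representation.ratio <- stop.share / pop.share
round(representation.ratio, 2)
```

The ratio comes out a little above 2, meaning black individuals appear among those stopped at more than twice their share of the population.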

I would now like to get a better understanding of my data visually. I will start by plotting the race of the individuals stopped.

plot.race <- count(stopandfrisk, vars = "Race")
colnames(plot.race) <- c("Race" = "Race", "freq" = "Total")
barplot(plot.race$Total, main = "Stop and Frisk: RACE", xlab = "RACE", ylab = "AMOUNT", names.arg = plot.race$Race, col = "blue", border = "dark blue", density = c(10, 90, 10,10,10,10,10,10))

#This gives us a visual representation of how many people of each race were stopped. Black individuals are stopped far more often than any other group.

The last thing I would like to explore is the age of those individuals that are stopped. Since our age column is numeric already, we can easily do this.

age.plot <- subset(stopandfrisk, Age > 13 & Age < 100)
head(age.plot)

##   Precinct    Date Suspected Arrested Frisked Sex Race Age
## 1       61 1012015    FELONY        N       Y   M    W  33
## 2       22 1152015    FELONY        N       Y   M    B  14
## 3       20 1292015      MISD        N       N   M    B  14
## 4       20 1292015      MIDS        N       N   M    B  14
## 7       67 2062015       FEL        N       Y   M    B  25
## 8        7 2072015       FEL        N       Y   M    B  15

ages <- age.plot$Age
summary(ages)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.00   20.00   24.00   27.82   33.00   99.00

get.mode <- function(x) {  #named get.mode so that it does not mask base R's mode()
  unique.x <- unique(x)
  unique.x[which.max(tabulate(match(x, unique.x)))]
}  #from: https://www.tutorialspoint.com/r/r_mean_median_mode.htm
get.mode(ages)

## [1] 20

#the average age of those stopped is approximately 28, the median age is 24, and the mode is 20
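An alternative way to find the mode is to tabulate the values and pick the most frequent one. A short sketch on a toy vector (illustrative values, not the real ages); `modal.age` is a hypothetical name:

```r
# Toy vector standing in for the ages (illustrative values)
toy.ages <- c(20, 24, 20, 33, 19, 20, 24)

# Frequency of each distinct age
age.counts <- table(toy.ages)

# The name of the most frequent entry is the mode; convert it back to a number
modal.age <- as.numeric(names(age.counts)[which.max(age.counts)])
modal.age
```

If two ages tie for the highest count, `which.max()` returns only the first, so the table itself is worth a glance.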

To show this visually:

hist(age.plot$Age, main = "Stop and Frisk: AGE", xlab = "AGE", col = "blue", xlim = c(10,75), breaks = 20)

Conclusion

The black population of NYC was 2,088,510 as of July 2015, and the total population of NYC at the same time was 8,550,405. This means that black residents make up approximately 24 percent of the population but about 53 percent of the individuals stopped, and about 50 percent of those who were stopped, frisked, and arrested. Black males alone account for about 49 percent of all stops. The majority of individuals stopped had ages ranging from 15 to 25. The age distribution is not normal, and I was not expecting it to be, since the chances of being stopped appear to decrease significantly with age.

Oluwakemi Omotunde

October 9, 2016

Task

We are to choose any three of the “wide” datasets identified in the Week 5 Discussion items and for each:

Introduction
