Introduction

In this project I study the relationship between the crimes committed and the age group of the suspect. The analysis should show which age groups account for the most crimes. This matters to other people as well: knowing which age groups are most often suspected of crimes can help in planning precautionary and preventive measures to reduce crime.

Data

Data Collection Method

I collected the data from the New York City public data set of NYPD complaints filed year to date, and I analyze it for the relationship between crime category and suspect age.

I got the data from the New York City public safety website at the link below.
 
 https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i/data
 

The data I collected is the year-to-date data on crimes committed in New York City in 2019. My research question is whether there is a relationship between the category of crime committed and the age group of the suspect.

Cases

Each case is a crime committed, categorized as one of three categories: felony, misdemeanor, or violation. The data set I will be studying contains a total of 216182 cases.

Dependent Variable

What is the response variable? Is it quantitative or qualitative?

The response variable is LAW_CAT_CD. It is a qualitative variable, as it describes the category of crime committed: felony, misdemeanor, or violation.

Independent Variable

You should have two independent variables, one quantitative and one qualitative.

The two independent variables are the gender of the suspect, which is qualitative, and the age of the suspect, which is quantitative.

Scope of Inference

I want to see the proportion of people who have committed a felony and its relationship with age group. The subset of the population I will focus on is the 25-44 age group, so I will use a confidence interval and a hypothesis test for inference. I think the data can be used to describe how the age of the suspect may be related to the crime.

Exploratory Data Analysis

I will begin my exploratory data analysis by looking at some of the data and drawing some charts to get an idea about the frequencies of the data.

Suspects Gender Analysis

summary(research_data$SUSP_SEX)
##             F      M      U 
##      0  49725 162495   3962
Sex<-table(research_data$SUSP_SEX)

barplot(Sex)

Analysis of the Crime Category

Below we can see the number of cases in each crime category in this data set, followed by the proportion of felonies.

summary(research_data$LAW_CAT_CD)
##      FELONY MISDEMEANOR   VIOLATION 
##       59848      111733       44601
CrimeCategory<-table(research_data$LAW_CAT_CD)
barplot (CrimeCategory)

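# Keep only the felony complaints
felonies <- subset(research_data, LAW_CAT_CD == "FELONY")

totalFelonies <- nrow(felonies)
# Total number of cases across all crime categories
totalCases <- nrow(research_data)

# Felonies as a percentage of all cases
proportionOfFelonies <- (totalFelonies / totalCases) * 100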

We can see that felonies make up about 27.68% of the year-to-date crimes in the data set.
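The share of every category, not only felonies, can be computed in one step. The following is a minimal sketch that uses only the counts printed above rather than research_data itself; the names category_counts and category_percent are introduced here for illustration.

```r
# Counts copied from the summary(research_data$LAW_CAT_CD) output above.
category_counts <- c(FELONY = 59848, MISDEMEANOR = 111733, VIOLATION = 44601)

# prop.table() divides each count by the total; multiply by 100 for percentages.
category_percent <- round(prop.table(category_counts) * 100, 2)
category_percent
```

With the full data loaded, the same percentages should come from applying prop.table() to table(research_data$LAW_CAT_CD).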

Analysis of the Age Group with Respect to the Felonies Data

summary(felonies$lower_age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   18.00   25.00   25.97   25.00   65.00
barplot(table(felonies$lower_age))

The felonies data shows that the most common age group is 25 to 44. These are adults who have reached maturity.
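The text does not show how lower_age was built. One plausible construction, sketched below as an assumption and not taken from the original analysis, maps each age-group label to the lower bound of its range, with "<18" mapped to 0 to match the minimum of 0 in the summary above. The column name SUSP_AGE_GROUP, the label spellings, and the treatment of unknown values are all assumed.

```r
# Hypothetical lookup table: age-group label -> lower bound of the group.
age_lower_bound <- c("<18" = 0, "18-24" = 18, "25-44" = 25, "45-64" = 45, "65+" = 65)

# Toy labels standing in for the suspect age-group column
# (assumed to be named SUSP_AGE_GROUP in the raw data).
age_group <- c("25-44", "<18", "18-24", "UNKNOWN", "65+")

# Look up each label by name; labels missing from the table become NA.
lower_age <- unname(age_lower_bound[age_group])
lower_age
```

In this sketch unrecognized labels become NA. The summary above reports no missing values, so the original analysis may have handled unknown ages differently.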

Analysis of the Gender of the Suspects that Committed a Felony

summary(felonies$SUSP_SEX)
##           F     M     U 
##     0  9591 49199  1058
barplot(table(felonies$SUSP_SEX))

Some Statistical Analysis using the Mosaic Plot

mosaicplot(table(research_data$LAW_CAT_CD,research_data$SUSP_SEX))

We can see that most felony suspects are male.
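The mosaic plot can be backed up with numbers. The sketch below uses only the counts printed in the two summary() outputs earlier in this section; the variable names are introduced here for illustration.

```r
# Counts copied from the summary() outputs above (F, M, U).
all_cases <- c(F = 49725, M = 162495, U = 3962)
felony_cases <- c(F = 9591, M = 49199, U = 1058)

# Share of felony suspects that fall in each sex category.
felony_share_by_sex <- felony_cases / sum(felony_cases)

# Share of each sex category's cases that are felonies.
felony_rate_within_sex <- felony_cases / all_cases

round(felony_share_by_sex, 3)
round(felony_rate_within_sex, 3)
```

Roughly 82% of felony suspects are recorded as male, and cases with a male suspect are felonies more often (about 30%) than cases with a female suspect (about 19%).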

Inference

Below we prepare the data for inference. Since our population of interest is the 25-44 age group, we add a column to the data set: Age_25_44 will be Yes or No depending on whether the age falls in that range.

# lower_age is numeric; a value of 25 identifies the 25-44 age group
felonies$Age_25_44 <- ifelse(felonies$lower_age == 25, "Yes", "No")

head(felonies$Age_25_44)
## [1] "Yes" "No"  "No"  "No"  "Yes" "No"

Above are the first few values of the new column.

Applying the Conditions for Inference

The conditions for inference are the following:

1. The sample observations are independent.

2. The sample must be large enough that n*p>=10 and n*(1-p)>=10.

We assume that the observations are independent. Now let us check below whether the second condition is satisfied.

n<-totalFelonies

age25<-subset(felonies,felonies$Age_25_44=="Yes")
p<-nrow(age25)/totalFelonies

n_p<-n*p

n_p
## [1] 32768
N_OneMinus_P<- (1-p)*n

N_OneMinus_P
## [1] 27080
p
## [1] 0.5475204

The conditions for inference appear to be satisfied, since both counts are far greater than 10. We can now go ahead and calculate the standard error and the 95% confidence interval.

Theoretical Inference

If the conditions for inference are reasonable, we can either calculate the standard error and construct the interval by hand, or allow the inference function to do it for us.

inference(y = felonies$Age_25_44, data = felonies, statistic = "proportion",
          type = "ci",
          method = "theoretical", 
          success = "Yes")
## Single categorical variable, success: Yes
## n = 59848, p-hat = 0.5475
## 95% CI: (0.5435 , 0.5515)
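The same interval can be reproduced by hand from the usual formula for a proportion, p-hat plus or minus z* times sqrt(p-hat * (1 - p-hat) / n). The following is a minimal sketch using the counts reported earlier; the variable names are introduced here for illustration, and the exact critical value used inside the inference function is not shown in the text.

```r
# Values taken from the output above.
n_felonies <- 59848  # number of felony cases
n_success <- 32768   # felony cases in the 25-44 group (n * p above)

p_hat <- n_success / n_felonies

# Standard error of a sample proportion.
se <- sqrt(p_hat * (1 - p_hat) / n_felonies)

# Critical value for a 95% interval (about 1.96).
z_star <- qnorm(0.975)

ci <- c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
round(ci, 4)
```

Rounded to four decimals, the bounds agree with the interval reported by the inference function.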

Simulated Inference

inference(y = felonies$Age_25_44, data = felonies, statistic = "proportion",
          type = "ci",
          boot_method = "perc",
          method = "simulation", 
          success = "Yes")
## Single categorical variable, success: Yes
## n = 59848, p-hat = 0.5475
## 95% CI: (0.5435 , 0.5515)
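How the inference function runs its simulation internally is not shown here. As a rough illustration of the idea, the sketch below builds a generic percentile bootstrap interval in base R from the values reported earlier. The seed, the number of resamples, and the variable names are assumptions made for this example.

```r
set.seed(2019)  # arbitrary seed so the sketch is reproducible

n_felonies <- 59848   # from the output above
p_hat <- 0.5475204    # observed proportion printed earlier as p
n_boot <- 10000       # number of bootstrap resamples (an assumed choice)

# Resampling n binary outcomes with replacement from a sample whose
# proportion of "Yes" is p_hat is equivalent to one Binomial(n, p_hat) draw.
boot_props <- rbinom(n_boot, size = n_felonies, prob = p_hat) / n_felonies

# Percentile interval: the middle 95% of the bootstrap proportions.
boot_ci <- quantile(boot_props, probs = c(0.025, 0.975))
boot_ci
```

The bounds vary slightly from run to run with the seed, but with a sample this large they should stay close to the theoretical interval.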

Analysis

Based on the above methods, the confidence intervals produced by the theoretical and simulated methods are similar. We are 95% confident that the proportion of felony suspects who are in the 25-44 age group lies between 0.5435 and 0.5515.

Conclusion

Based on our analysis and inference on the population of interest, suspects in the 25-44 age group account for more than half of the felony suspects in the data, and we are 95% confident that they make up between roughly 54 and 55 percent of the felonies committed.

Future Research

I think the crimes data gives some very interesting insight into the types of crimes committed and their relationship with gender and age group. This data could be used for further research on crime and its relationship with other factors such as poverty level and education. Such research could prove to be a very valuable tool to help prevent crime through workshops and awareness programs for society and young adults, so that we could become a society that is free of crime and focuses on advancement and serving humanity.