Project Background and Description:
This project uses data on police treatment of individuals arrested in Toronto
for simple possession of small quantities of marijuana.
The data are part of a larger data set featured in a series of articles in the Toronto Star newspaper.
Full information at:
Data at:
Data Selection
To better understand the background of adults who repeatedly use marijuana, I selected all arrested individuals aged 18 or older with at least one previous police record related to marijuana. The year 2002 is excluded because it contains comparatively few records.
Questions
Although the data are part of a larger data set, they can still provide some insight into the following:
Further analysis can then be performed in specific areas based on the results of this project.
Data wrangling
The code for data wrangling is listed below:
urlfile <-'https://raw.githubusercontent.com/jayleecunysps/AssignmentforSPS/main/Arrests.csv'
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Arrests<-read.csv(url(urlfile))
Arrests <- data.frame(Arrests)
Adultarrests <- subset(Arrests,age>17&checks>0&year<2002)
colnames(Adultarrests) <- c("released","race","arrest_year","arrest_age","sex","employed","citizen","policedatabase_in_records")
Adultarrests$employed <- as.character(Adultarrests$employed)
Adultarrests$released <- as.character(Adultarrests$released)
Adultarrests$employed [Adultarrests$employed == "Yes"] <- "Employed"
Adultarrests$employed [Adultarrests$employed == "No"] <- "Unemployed"
Adultarrests$released [Adultarrests$released == "Yes"] <- "Released"
Adultarrests$released [Adultarrests$released == "No"] <- "Unreleased"
Adultarrests$employed <- as.factor(Adultarrests$employed)
Adultarrests$released <- as.factor(Adultarrests$released)
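The recoding above goes through three steps per column: convert to character, overwrite the values, then convert back to a factor. As a minimal sketch, the same relabelling can be done in one call with factor(), assuming the columns only ever contain "Yes" and "No". The `toy` data frame below is a made-up stand-in for illustration, not the arrests data.

```r
# Toy stand-in for the Yes/No columns; these values are made up for illustration.
toy <- data.frame(
  employed = c("Yes", "No", "Yes", "Yes"),
  released = c("Yes", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# factor() maps each listed level to the label in the same position.
toy$employed <- factor(toy$employed, levels = c("Yes", "No"),
                       labels = c("Employed", "Unemployed"))
toy$released <- factor(toy$released, levels = c("Yes", "No"),
                       labels = c("Released", "Unreleased"))

# Counts per label, to confirm the recoding worked.
table(toy$employed)
table(toy$released)
```

Any value other than "Yes" or "No" would become NA under this approach, which makes unexpected codes easy to spot.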
The following paged table contains the full data set of the target population.
library(rmarkdown)
paged_table(Adultarrests)
The following summary helps us understand the sample better.
From the summary and the yearly aggregates, we can tell that the mean and median of arrest age and of the number of records in police databases are roughly consistent from 1997 to 2001.
summary(Adultarrests)
## released race arrest_year arrest_age
## Released :2082 Length:2644 Min. :1997 Min. :18.00
## Unreleased: 562 Class :character 1st Qu.:1998 1st Qu.:20.00
## Mode :character Median :1999 Median :23.00
## Mean :1999 Mean :26.46
## 3rd Qu.:2001 3rd Qu.:32.00
## Max. :2001 Max. :66.00
## sex employed citizen
## Length:2644 Employed :1872 Length:2644
## Class :character Unemployed: 772 Class :character
## Mode :character Mode :character
##
##
##
## policedatabase_in_records
## Min. :1.00
## 1st Qu.:2.00
## Median :3.00
## Mean :2.57
## 3rd Qu.:3.00
## Max. :6.00
aggregate(cbind(arrest_age,policedatabase_in_records) ~ arrest_year,Adultarrests,mean)
## arrest_year arrest_age policedatabase_in_records
## 1 1997 26.50923 2.767528
## 2 1998 26.10129 2.601293
## 3 1999 26.55424 2.620339
## 4 2000 26.31659 2.540335
## 5 2001 26.75529 2.450151
aggregate(cbind(arrest_age,policedatabase_in_records) ~ arrest_year,Adultarrests,median)
## arrest_year arrest_age policedatabase_in_records
## 1 1997 23 3.0
## 2 1998 23 3.0
## 3 1999 24 3.0
## 4 2000 23 3.0
## 5 2001 23 2.5
From the summary, we can see that the average arrest age does not change noticeably from year to year.
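Comparing yearly means and medians by eye is informal. One way to check the claim more formally is a Kruskal-Wallis rank-sum test of age across years, which does not assume normally distributed ages. This test is a suggested option, not part of the original analysis, and the `toy` data below are simulated purely so the sketch runs on its own. With the real data, `Adultarrests` would take the place of `toy`.

```r
# Simulated stand-in data; these ages are NOT the arrests data.
set.seed(1)
toy <- data.frame(
  arrest_year = rep(1997:2001, each = 50),
  arrest_age  = 18 + round(rexp(250, rate = 1 / 8))
)

# Kruskal-Wallis test: do the age distributions differ between years?
res <- kruskal.test(arrest_age ~ factor(arrest_year), data = toy)

# A large p-value gives no evidence that age differs by year.
res$p.value
```

A one-way ANOVA via aov() would be an alternative if the ages were closer to normal, but the gap between the mean and median in the summary suggests a right-skewed distribution.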
Let’s look at more trends on possession of small quantities of marijuana to see what kinds of other insights we can find!
library(ggplot2)
ggplot (Adultarrests, aes(x=released, y=policedatabase_in_records, fill=released)) + geom_col() + xlab("Current Situation") + ylab("Total Police Records")
employedcount <- table(Adultarrests$employed)
lbls <- c("Employed","Unemployed")
pct <- round(employedcount/sum(employedcount)*100)
lbls <-paste(lbls,pct)
lbls <-paste(lbls,"%")
pie(employedcount, labels=lbls, main ="Pie chart of employment of arrested people")
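The pie chart labels above are typed by hand, so they depend on the table's level order matching the hard-coded vector. A minimal sketch of building the labels from the table itself follows, using a made-up `employed` vector for illustration.

```r
# Made-up employment statuses for illustration only.
employed <- factor(c("Employed", "Employed", "Unemployed", "Employed"))

counts <- table(employed)

# prop.table() converts counts to proportions; scale to whole percentages.
pct <- round(100 * prop.table(counts))

# Take the label text from the table names so labels and slices always match.
lbls <- paste0(names(counts), " ", pct, "%")
lbls

pie(counts, labels = lbls, main = "Pie chart of employment of arrested people")
```

Using paste0() also avoids the extra space before the percent sign that paste() inserts by default.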
yearcount <- table(Adultarrests$employed,Adultarrests$arrest_year)
barplot(yearcount, main="Year Distribution", xlab = "Year", ylab = "Number of Arrests",
col=c("darkblue","red"),legend = rownames(yearcount))
agecounts <- table(Adultarrests$employed,Adultarrests$arrest_age)
barplot(agecounts, main="Age Distribution", xlab = "Age", ylab = "Number of Arrests", col=c("darkblue","red"),legend = rownames(agecounts))
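With one bar per year of age from 18 to 66, the age plot can be hard to read. As a sketch, ages could be grouped into bands with cut() before tabulating. The break points and the `ages` vector below are arbitrary choices for illustration, not values taken from the analysis.

```r
# Made-up ages for illustration only.
ages <- c(18, 19, 24, 25, 31, 44, 66)

# cut() assigns each age to a band; intervals are closed on the right by default,
# so 24 falls in "18-24" and 44 falls in "35-44".
age_group <- cut(ages,
                 breaks = c(17, 24, 34, 44, Inf),
                 labels = c("18-24", "25-34", "35-44", "45+"))

group_counts <- table(age_group)
group_counts

barplot(group_counts, main = "Age Distribution (grouped)",
        xlab = "Age group", ylab = "Number of Arrests")
```

The same idea would apply to the real data by passing `Adultarrests$arrest_age` to cut() and cross-tabulating the result with `Adultarrests$employed`.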
Suggestion for further analysis