getwd()
## [1] "/Users/ingridellis/Desktop/CJS 310"
setwd("/Users/ingridellis/Desktop/CJS 310")
library(readxl)
Juvenile_Arrests <- read_excel("/Users/ingridellis/Desktop/CJS 310/Juvenile Arrests.xlsx")
head(Juvenile_Arrests)
## # A tibble: 6 × 9
## ARREST_DATE TOP_CHARGE_DESC HOME_PSA CRIME_PSA GIS_ID GLOBALID CREATED EDITED
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2011/03/01 … Robbery -- For… 304 305 Juven… {119216… 2021/0… 2024/…
## 2 2011/03/01 … Juvenile Custo… 304 504 Juven… {F11EC1… 2021/0… 2024/…
## 3 2011/03/01 … Felony Escapee… 501 501 Juven… {4AC0AE… 2021/0… 2024/…
## 4 2011/03/01 … UCSA Possessio… 605 605 Juven… {F62548… 2021/0… 2024/…
## 5 2011/03/01 … Theft 2nd Degr… 404 302 Juven… {64AC66… 2021/0… 2024/…
## 6 2011/03/01 … Simple Asssault 604 604 Juven… {E4AA41… 2021/0… 2024/…
## # ℹ 1 more variable: OBJECTID <dbl>
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
df <- Juvenile_Arrests
Converting ARREST_DATE from text into a date-time so I can extract the year, month, and day and work with them properly.
df$ARREST_DATE <- as.POSIXct(df$ARREST_DATE,
format = "%Y/%m/%d %H:%M:%S",
tz = "UTC")
df$YEAR <- as.numeric(format(df$ARREST_DATE, "%Y"))
df$MONTH <- as.numeric(format(df$ARREST_DATE, "%m"))
# "%d" is the numeric day of the month; "%A" returns the weekday name,
# which as.numeric() cannot convert and turns into NA
df$DAY <- as.numeric(format(df$ARREST_DATE, "%d"))
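When pulling pieces out of a date-time, it helps to remember that `%d` gives the numeric day of the month while `%A` gives the weekday name as text, so only the first one survives `as.numeric()`. Below is a minimal sketch on made-up timestamps. The data frame `toy` and its values are hypothetical, and the timestamp layout is an assumption, since the printed ARREST_DATE values are truncated.

```r
# Toy timestamps (made up for illustration, not taken from the dataset)
toy <- data.frame(
  ARREST_DATE = c("2011/03/01 14:30:00", "2023/07/15 09:05:00"),
  stringsAsFactors = FALSE
)

toy$ARREST_DATE <- as.POSIXct(toy$ARREST_DATE,
                              format = "%Y/%m/%d %H:%M:%S",
                              tz = "UTC")

# "%d" is the day of the month as digits, so as.numeric() works on it.
toy$DAY <- as.numeric(format(toy$ARREST_DATE, "%d"))

# "%A" is the weekday name (locale-dependent text), so keep it as character.
toy$WEEKDAY <- format(toy$ARREST_DATE, "%A")

# Strings that do not match the format come back as NA; count them before plotting.
n_unparsed <- sum(is.na(toy$ARREST_DATE))
n_unparsed
```

Running the same `sum(is.na(...))` check on `df$ARREST_DATE` would show whether any real rows failed to parse.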
PLOT 1
ggplot(df, aes(x = factor(YEAR))) +
geom_bar() +
labs(
title = "Juvenile Arrest Counts by Year",
x = "Year",
y = "Count"
) +
theme_minimal()
This plot gives me a quick overview of arrest counts per year. The data come from the District of Columbia and span 2011 through 2024. From my background knowledge, juvenile arrests have been decreasing nationally, so the trend here lines up with the rest of the country. It looks like 2011 had the highest count and 2021 had the lowest. I’d be interested to look into why arrests spiked in 2023 after being so low.
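To back up a visual read of the highest and lowest years, the counts can be tabulated directly. This is a minimal sketch on a made-up vector of years; `toy_years` and its values are illustrative only and are not the real counts. On the real data the same call would take `df$YEAR`.

```r
# Made-up years for illustration; replace with df$YEAR on the real data.
toy_years <- c(2011, 2011, 2011, 2021, 2023, 2023)

# Count rows per year, then sort from most to fewest arrests.
year_counts <- sort(table(toy_years), decreasing = TRUE)
year_counts

# Year with the most rows, and year with the fewest.
busiest  <- names(year_counts)[1]
quietest <- names(year_counts)[length(year_counts)]
```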
PLOT 2
ggplot(df, aes(x = factor(MONTH))) +
geom_bar() +
scale_x_discrete(labels = c("January", "February", "March", "April",
"May", "June", "July", "August",
"September", "October", "November", "December")) +
labs(
title = "Juvenile Arrest Counts by Month",
x = "Month",
y = "Count"
) +
theme_minimal()
After exploring years, I went a little deeper and extracted the months.
Because this is juvenile data, I was expecting somewhat higher counts
in the summer, when juveniles are out of school. From my time working
at the police department, though, I have noticed that there is a bit
more crime when students come back from break than while they are on
break.
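The month plot attaches its labels by position, which relies on all twelve months being present in the data. One way to avoid that dependency, sketched here on made-up month numbers (`toy_months` is hypothetical), is to turn the month number into a factor whose levels are R's built-in `month.name`:

```r
# Made-up month numbers for illustration; replace with df$MONTH on the real data.
toy_months <- c(1, 1, 6, 9, 9, 9, 12)

# month.name is built into R: "January", ..., "December".
# Fixing the levels keeps calendar order and keeps empty months in the table.
month_f <- factor(month.name[toy_months], levels = month.name)

month_counts <- table(month_f)
month_counts
```

Mapping a factor like this to `x` in ggplot2, with `scale_x_discrete(drop = FALSE)` to keep empty months on the axis, would make the manual `labels` vector unnecessary.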
df$DATE_ONLY <- as.Date(df$ARREST_DATE)
df$DOW <- factor(
weekdays(df$DATE_ONLY),
levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday")
)
PLOT 3
ggplot(df, aes(x = DOW)) +
geom_bar() +
theme_minimal() +
labs(
title = "Juvenile Arrests by Day of Week",
x = "Day of Week",
y = "Count"
)
Interestingly, the day-of-week counts form a roughly bell-shaped pattern, similar to a normal distribution. The shape is very regular, with the highest count in the middle of the week, on Wednesday. Because these are juvenile arrests, it makes me think there may be more supervision on weekends, when parents are home from work.
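One way to put numbers on the midweek peak is to compare shares by day and test whether the counts differ from an even spread across the week. The sketch below uses made-up counts (`toy_counts` is hypothetical and does not reflect the real data); on the real data, `table(df$DOW)` would supply the counts. The chi-squared goodness-of-fit test is one possible check, not the only one.

```r
# Made-up counts for illustration only; on the real data use table(df$DOW).
toy_counts <- c(Sunday = 80, Monday = 120, Tuesday = 135, Wednesday = 150,
                Thursday = 138, Friday = 125, Saturday = 85)

# Share of all arrests that falls on each day.
round(prop.table(toy_counts), 3)

# Average arrests per weekday vs. per weekend day.
weekend <- names(toy_counts) %in% c("Saturday", "Sunday")
weekday_mean <- mean(toy_counts[!weekend])
weekend_mean <- mean(toy_counts[weekend])

# Goodness-of-fit test against "every day is equally likely".
# A small p-value suggests the day-to-day differences are not just noise.
fit <- chisq.test(toy_counts)
fit$p.value
```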