Data Exploration

getwd()

## [1] "/Users/ingridellis/Desktop/CJS 310"

setwd("/Users/ingridellis/Desktop/CJS 310")

library(readxl)
Juvenile_Arrests <- read_excel("/Users/ingridellis/Desktop/CJS 310/Juvenile Arrests.xlsx")
head(Juvenile_Arrests)

## # A tibble: 6 × 9
##   ARREST_DATE  TOP_CHARGE_DESC HOME_PSA CRIME_PSA GIS_ID GLOBALID CREATED EDITED
##   <chr>        <chr>           <chr>    <chr>     <chr>  <chr>    <chr>   <chr> 
## 1 2011/03/01 … Robbery -- For… 304      305       Juven… {119216… 2021/0… 2024/…
## 2 2011/03/01 … Juvenile Custo… 304      504       Juven… {F11EC1… 2021/0… 2024/…
## 3 2011/03/01 … Felony Escapee… 501      501       Juven… {4AC0AE… 2021/0… 2024/…
## 4 2011/03/01 … UCSA Possessio… 605      605       Juven… {F62548… 2021/0… 2024/…
## 5 2011/03/01 … Theft 2nd Degr… 404      302       Juven… {64AC66… 2021/0… 2024/…
## 6 2011/03/01 … Simple Asssault 604      604       Juven… {E4AA41… 2021/0… 2024/…
## # ℹ 1 more variable: OBJECTID <dbl>

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)

df <- Juvenile_Arrests

Converting ARREST_DATE from text to a date-time so I can pull out the year, month, and day of the week.

df$ARREST_DATE <- as.POSIXct(df$ARREST_DATE,
                      format = "%Y/%m/%d %H:%M:%S",
                      tz = "UTC")

df$YEAR <- as.numeric(format(df$ARREST_DATE, "%Y"))
df$MONTH <- as.numeric(format(df$ARREST_DATE, "%m"))
# "%A" returns the weekday name (e.g. "Tuesday"), so keep it as text;
# wrapping it in as.numeric() would turn every value into NA.
df$DAY <- format(df$ARREST_DATE, "%A")
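A quick way to check the date conversion is to count how many values failed to parse, and to look at what each format code returns. The sketch below uses made-up timestamps (not the real records) and assumes the same "YYYY/MM/DD HH:MM:SS" layout; the names raw, parsed, n_failed, day_name and day_num are only for illustration.

```r
# Made-up timestamps in the same layout as ARREST_DATE (not real records)
raw <- c("2011/03/01 05:00:00", "2011/03/02 14:30:00", "not a date")

parsed <- as.POSIXct(raw, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Values that fail to parse become NA, so count them before plotting
n_failed <- sum(is.na(parsed))

# "%A" gives the weekday name as text; "%u" gives the weekday number (1 = Monday)
day_name <- format(parsed, "%A")
day_num  <- as.numeric(format(parsed, "%u"))
```

Because "%A" produces text, converting it with as.numeric() yields NA for every row, while "%u" converts cleanly to a number.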

PLOT 1

ggplot(df, aes(x = factor(YEAR))) +
  geom_bar() +
  labs(
    title = "Juvenile Arrest Counts by Year",
    x = "Year",
    y = "Count"
  ) +
  theme_minimal()

This plot gives me a quick overview of arrest counts per year. The data comes from the District of Columbia and spans 2011 through 2024. From my background knowledge, I know that juvenile arrests have been decreasing nationally, so this downward trend is consistent with the rest of the country. It looks like 2011 had the highest count and 2021 had the lowest. I’d be interested to look into why arrests spiked in 2023 after being so low.
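Reading the highest and lowest years off a bar chart can be confirmed with a simple tabulation. This is a minimal sketch on a hypothetical vector of years standing in for df$YEAR; the values and the names year, counts, busiest, quietest and change are assumptions for illustration, not results from the arrest data.

```r
# Hypothetical years standing in for df$YEAR (not the real data)
year <- c(2011, 2011, 2011, 2012, 2012, 2021, 2023, 2023)

# Number of rows per year
counts <- table(year)

# Years with the largest and smallest counts
busiest  <- names(counts)[which.max(counts)]
quietest <- names(counts)[which.min(counts)]

# Change between consecutive years that appear in the data,
# which helps show how large a spike is
change <- diff(as.vector(counts))
```

Run against the real column, the same steps would give exact numbers to back up the visual impression of the peak, the low point, and the later spike.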

PLOT 2

ggplot(df, aes(x = factor(MONTH))) +
  geom_bar() +
  scale_x_discrete(labels = c("January", "February", "March", "April",
                              "May", "June", "July", "August",
                              "September", "October", "November", "December")) +
  labs(
    title = "Juvenile Arrest Counts by Month",
    x = "Month",
    y = "Count"
  ) +
  theme_minimal()

After exploring years, I went a little deeper and extracted the months. Because this is juvenile data, I was honestly expecting somewhat higher counts in the summer, when juveniles are out of school. During my time working at the police department, however, I noticed a little more crime when students come back from break than when they are on break.
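One way to put a number on the summer expectation is to compute the share of arrests that fall in June through August. The sketch below uses a hypothetical vector of month numbers standing in for df$MONTH; the values and the names month, by_month and summer_share are assumptions for illustration only.

```r
# Hypothetical month numbers standing in for df$MONTH (not the real data)
month <- c(1, 2, 3, 3, 6, 7, 9, 9, 9, 10, 10, 12)

# Tabulate against all 12 months so months with no arrests show up as 0
by_month <- table(factor(month, levels = 1:12, labels = month.name))

# Share of arrests in June through August; roughly 0.25 would be
# expected if arrests were spread evenly over the year
summer_share <- sum(by_month[c("June", "July", "August")]) / sum(by_month)
```

Fixing the factor levels to 1:12 also keeps the month labels lined up with the right bars even if a month is missing from the data.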

df$DATE_ONLY <- as.Date(df$ARREST_DATE)

df$DOW <- factor(
  weekdays(df$DATE_ONLY),
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")
)

PLOT 3

ggplot(df, aes(x = DOW)) +
  geom_bar() +
  theme_minimal() +
  labs(
    title = "Juvenile Arrests by Day of Week",
    x = "Day of Week",
    y = "Count"
  )

Interestingly, the day-of-week counts form a bell-shaped pattern that resembles a normal distribution. The pattern is very regular, with the highest count in the middle of the week, on Wednesday. Because these are juvenile arrests, it makes me think there may be more supervision on the weekends, when parents are home from work.
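Since weekdays are categories rather than a continuous measurement, one way to check whether the midweek peak is more than chance is a chi-squared goodness-of-fit test against an even spread across the seven days. This is a minimal sketch on hypothetical counts, not the real totals; dow_counts and test are illustrative names.

```r
# Hypothetical weekday counts, Sunday through Saturday (not the real data)
dow_counts <- c(Sunday = 80, Monday = 100, Tuesday = 115, Wednesday = 125,
                Thursday = 115, Friday = 100, Saturday = 80)

# Null hypothesis: arrests are spread evenly across the seven days
test <- chisq.test(dow_counts)

test$statistic  # chi-squared statistic
test$p.value    # a small value suggests the days differ by more than chance
```

With the real data, table(df$DOW) could be passed to chisq.test() in place of the hypothetical counts.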

Ingrid Ellis

2026-02-22