Code

After importing the dataset, I cleaned it by removing columns that provided little relevant information and by dropping rows with missing or empty values. I also renamed some columns to improve readability and consistency. The final dataset contains essential variables such as arrest date, location, demographics, and charge descriptions.

library(dplyr)

# Load the raw arrest records
arrests_df <- read.csv("Arrests.csv")

# Drop columns with little relevant information, then keep only the rows
# in which no value is missing (NA) or an empty string
arrests_df <- arrests_df %>%
  select(-c("Report.ID", "Report.Type", "Reporting.District", "Charge.Group.Code",
            "Charge.Description", "Address", "LAT", "LON",
            "Location", "Booking.Date", "Booking.Location", "Booking.Location.Code",
            "Area.ID", "Arrest.Type.Code", "Booking.Time", "Time",
            "Disposition.Description", "Charge")) %>%
  filter(if_all(everything(), ~ !is.na(.x) & .x != ""))

# Rename columns for readability and consistency
arrests_df <- arrests_df %>%
  rename(
    Arrest_Date = Arrest.Date,
    Area = Area.Name,
    Cross_Street = Cross.Street,
    Gender = Sex.Code,
    Race = Descent.Code,
    Charge_Description = Charge.Group.Description
  )

# Strip the constant midnight timestamp so only the date remains
arrests_df$Arrest_Date <- gsub(" 12:00:00 AM", "", arrests_df$Arrest_Date, fixed = TRUE)

Los Angeles County Arrests (2020 - Present)

This dataset covers arrests in Los Angeles County from 2020 to March 2025 and was sourced from data.gov. Its columns record the date of arrest, the location, demographics (age, gender, race), and the type of charge (e.g., Prostitution, Robbery), which makes it possible to look for trends and patterns in arrests.

| Row    | Arrest_Date | Area        | Age | Gender | Race | Charge_Description        | Cross_Street   |
|--------|-------------|-------------|-----|--------|------|---------------------------|----------------|
| 40168  | 02/28/2022  | 77th Street | 17  | M      | B    | Vehicle Theft             | VERNON AV      |
| 30731  | 05/30/2023  | Van Nuys    | 48  | M      | B    | Other Assaults            | VOSE           |
| 111816 | 06/17/2023  | Central     | 24  | M      | H    | Weapon (carry/poss)       | BROADWAY       |
| 140702 | 10/22/2024  | Rampart     | 37  | M      | W    | Narcotic Drug Laws        | ALVARADO ST    |
| 84105  | 01/20/2023  | Foothill    | 24  | M      | H    | Driving Under Influence   | TRUESDALE ST   |
| 28795  | 04/27/2020  | Pacific     | 16  | F      | B    | Robbery                   | SANTA CLARA AV |
| 57428  | 03/01/2020  | Hollenbeck  | 18  | M      | B    | Aggravated Assault        | OLYMPIC        |
| 22229  | 01/24/2021  | N Hollywood | 29  | M      | H    | Burglary                  | HAMLIN         |
| 81972  | 09/13/2022  | Van Nuys    | 26  | M      | H    | Moving Traffic Violations | GAULT          |
| 3943   | 11/23/2023  | Van Nuys    | 28  | M      | H    | Larceny                   | HARTLAND ST    |

Arrests per Month

This line chart plots the number of arrests per month, with the data grouped by month and year. It shows seasonal trends, such as which months see spikes in arrests. These variations may coincide with particular events or with shifts in law enforcement activity.
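The plotting code for this slide is not shown, so the following is only a minimal sketch of how such a chart could be built, assuming dplyr and ggplot2. The `toy_arrests` data frame and its values are hypothetical stand-ins for `arrests_df`; only the `Arrest_Date` column name and its MM/DD/YYYY format come from the cleaning step above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for arrests_df: dates are MM/DD/YYYY strings
toy_arrests <- data.frame(
  Arrest_Date = c("01/20/2022", "02/28/2022", "02/03/2022",
                  "01/20/2023", "05/30/2023", "06/17/2023", "06/02/2023")
)

# Parse the date, derive Year and Month, and count arrests per Year/Month pair
monthly_counts <- toy_arrests %>%
  mutate(
    Arrest_Date = as.Date(Arrest_Date, format = "%m/%d/%Y"),
    Year  = format(Arrest_Date, "%Y"),
    Month = as.integer(format(Arrest_Date, "%m"))
  ) %>%
  count(Year, Month, name = "Arrests")

# One line per year, months on the x-axis
p <- ggplot(monthly_counts, aes(x = Month, y = Arrests, color = Year, group = Year)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(title = "Arrests per Month", x = "Month", y = "Number of arrests")

print(p)
```

Drawing one line per year keeps the same calendar months aligned on the x-axis, which makes seasonal spikes easier to compare across years than a single continuous time series.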

Gender vs Charge

This bar plot shows the gender distribution across charge types. Side-by-side bars make it easy to compare genders within each category and to see whether certain charges are more often associated with one gender than the other.
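A side-by-side bar plot of this kind could be sketched as follows, assuming ggplot2. The `toy_arrests` values are hypothetical; only the column names `Gender` and `Charge_Description` come from the cleaned dataset, and the original plotting code may differ.

```r
library(ggplot2)

# Hypothetical stand-in for arrests_df with the two columns the plot needs
toy_arrests <- data.frame(
  Gender = c("M", "M", "F", "M", "F", "M"),
  Charge_Description = c("Robbery", "Robbery", "Robbery",
                         "Larceny", "Larceny", "Burglary")
)

# position = "dodge" places the bars for each gender next to each other
# instead of stacking them; coord_flip() keeps long charge names readable
p <- ggplot(toy_arrests, aes(x = Charge_Description, fill = Gender)) +
  geom_bar(position = "dodge") +
  coord_flip() +
  labs(title = "Gender vs Charge", x = "Charge", y = "Number of arrests")

print(p)
```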

Arrests by Area

This pie chart breaks down arrests by area in Los Angeles. It shows which areas account for the largest share of arrests, information that could help guide decisions about where to allocate resources.
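ggplot2 has no dedicated pie geometry, so one common approach is a single stacked bar drawn in polar coordinates. The sketch below assumes dplyr and ggplot2 and uses a hypothetical `toy_arrests` data frame; only the `Area` column name comes from the cleaned dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for arrests_df: one row per arrest
toy_arrests <- data.frame(
  Area = c("Van Nuys", "Van Nuys", "Van Nuys", "Central", "Central", "Rampart")
)

# Count arrests per area and compute each area's share of the total
area_counts <- toy_arrests %>%
  count(Area, name = "Arrests") %>%
  mutate(Share = Arrests / sum(Arrests))

# A single stacked bar wrapped around the y-axis becomes a pie chart
p <- ggplot(area_counts, aes(x = "", y = Arrests, fill = Area)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(title = "Arrests by Area")

print(p)
```

With many areas, the slices become hard to tell apart; a sorted bar chart of `area_counts` shows the same shares and is often easier to read.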

Age, Location, and Charge Description

This 3D surface plot shows how age, location, and charge type relate to one another. It helps reveal patterns, such as whether certain age groups are more often arrested for specific charges or whether some areas see more arrests for certain offenses.
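The text does not say how the three variables map onto the plot's axes. As one possible reading, the sketch below puts area and charge on the horizontal axes and mean age on the vertical axis, using base R's `persp()`. Both the axis mapping and the `toy_arrests` values are assumptions made for illustration, not the method used for the slide.

```r
# Hypothetical stand-in for arrests_df
toy_arrests <- data.frame(
  Age  = c(17, 48, 24, 37, 24, 16, 18, 29),
  Area = c("Central", "Central", "Van Nuys", "Van Nuys",
           "Central", "Central", "Van Nuys", "Van Nuys"),
  Charge_Description = c("Robbery", "Larceny", "Robbery", "Larceny",
                         "Robbery", "Larceny", "Robbery", "Larceny")
)

# Mean age for every Area x Charge combination (rows = areas, columns = charges)
mean_age <- with(toy_arrests,
                 tapply(Age, list(Area, Charge_Description), mean))

# persp() needs increasing numeric x and y, so the categories are plotted
# by their index; rownames/colnames of mean_age give the matching labels
persp(
  x = seq_len(nrow(mean_age)),
  y = seq_len(ncol(mean_age)),
  z = mean_age,
  theta = 30, phi = 25, ticktype = "detailed",
  xlab = "Area (index)", ylab = "Charge (index)", zlab = "Mean age"
)
```

Because area and charge are categorical, the ordering along each axis is arbitrary, so the shape of the surface should be read cell by cell rather than as a continuous trend.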

Statistics

I conducted a Welch two-sample t-test to compare the mean ages of females and males in the dataset.

Since the p-value (< 2.2e-16) is well below the 0.05 significance level, we reject the null hypothesis that the two means are equal. This indicates a statistically significant difference in mean age between females and males: males have the higher mean age, ~35.28 years, compared with ~32.01 years for females.

## 
##  Welch Two Sample t-test
## 
## data:  Age by Gender
## t = -42.428, df = 39497, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  -3.426072 -3.123506
## sample estimates:
## mean in group F mean in group M 
##        32.00987        35.28466
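The "Age by Gender" line in the output above corresponds to a formula call of the form `t.test(Age ~ Gender, ...)`, and R's `t.test()` uses the Welch (unequal variance) version by default. The sketch below reproduces that kind of call on simulated data; the `toy_arrests` values are hypothetical, so its numbers will not match the results reported above.

```r
set.seed(1)

# Hypothetical stand-in for arrests_df: simulated ages for two groups
toy_arrests <- data.frame(
  Gender = rep(c("F", "M"), each = 50),
  Age    = c(rnorm(50, mean = 32, sd = 10), rnorm(50, mean = 35, sd = 10))
)

# The formula interface needs exactly two groups, so keep only F and M
two_groups <- subset(toy_arrests, Gender %in% c("F", "M"))

# var.equal defaults to FALSE, which gives the Welch two-sample t-test
result <- t.test(Age ~ Gender, data = two_groups)
print(result)

# Individual pieces of the result can be pulled out for reporting
result$estimate   # group means
result$conf.int   # 95% confidence interval for the difference (F minus M)
result$p.value    # two-sided p-value
```

The confidence interval is for the mean of the first factor level minus the second, which is why the interval in the output above is negative when males are older on average.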