DAT301 Midterm

Code

After importing the dataset, I cleaned it by removing columns that provided little relevant information and dropping rows with missing or empty values. I also renamed some columns to improve readability and consistency. The final dataset contains the essential variables: arrest date, area, cross street, demographics (age, gender, race), and charge description.

library(dplyr)  # provides %>%, select(), filter(), rename()

arrests_df <- read.csv("Arrests.csv")

# Drop columns not needed for the analysis, then remove every row that
# has a missing (NA) or empty-string value in any remaining column
arrests_df <- arrests_df %>%
  select(-c("Report.ID", "Report.Type", "Reporting.District", "Charge.Group.Code",
            "Charge.Description", "Address", "LAT", "LON",
            "Location", "Booking.Date", "Booking.Location", "Booking.Location.Code",
            "Area.ID", "Arrest.Type.Code", "Booking.Time", "Time",
            "Disposition.Description", "Charge")) %>%
  filter(if_all(everything(), ~ !is.na(.x) & .x != ""))

# Rename columns for readability and consistency
arrests_df <- arrests_df %>%
  rename(
    Arrest_Date = Arrest.Date,
    Area = Area.Name,
    Cross_Street = Cross.Street,
    Gender = Sex.Code,
    Race = Descent.Code,
    Charge_Description = Charge.Group.Description
  )

# Strip the constant midnight timestamp so only the date remains
arrests_df$Arrest_Date <- gsub(" 12:00:00 AM", "", arrests_df$Arrest_Date, fixed = TRUE)

Los Angeles County Arrests (2020 - Present)

This dataset covers arrests in Los Angeles County from 2020 to March 2025 and was sourced from data.gov. Its columns record the date of arrest, the location (area and cross street), demographics (age, gender, race), and the type of charge (e.g., Prostitution, Robbery). The data provides insight into trends and patterns in arrests.

	Arrest_Date	Area	Age	Gender	Race	Charge_Description	Cross_Street
40168	02/28/2022	77th Street	17	M	B	Vehicle Theft	VERNON AV
30731	05/30/2023	Van Nuys	48	M	B	Other Assaults	VOSE
111816	06/17/2023	Central	24	M	H	Weapon (carry/poss)	BROADWAY
140702	10/22/2024	Rampart	37	M	W	Narcotic Drug Laws	ALVARADO ST
84105	01/20/2023	Foothill	24	M	H	Driving Under Influence	TRUESDALE ST
28795	04/27/2020	Pacific	16	F	B	Robbery	SANTA CLARA AV
57428	03/01/2020	Hollenbeck	18	M	B	Aggravated Assault	OLYMPIC
22229	01/24/2021	N Hollywood	29	M	H	Burglary	HAMLIN
81972	09/13/2022	Van Nuys	26	M	H	Moving Traffic Violations	GAULT
3943	11/23/2023	Van Nuys	28	M	H	Larceny	HARTLAND ST

Arrests per Month

This line chart shows the number of arrests per month over several years, with arrests grouped by month and year. It highlights seasonal trends, such as which months see spikes in arrests. These variations might align with particular events or shifts in law enforcement activity.
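A minimal sketch of how monthly counts like these could be computed and plotted. It assumes ggplot2 for the chart, and `toy_df` is a hypothetical stand-in for `arrests_df` built from a few dates in the sample rows shown earlier; the original plotting code is not part of this document.

```r
library(dplyr)
library(ggplot2)  # assumption: any plotting package would do

# Hypothetical stand-in for arrests_df (dates in MM/DD/YYYY, as in the sample rows)
toy_df <- data.frame(
  Arrest_Date = c("02/28/2022", "05/30/2023", "06/17/2023",
                  "01/20/2023", "09/13/2022", "11/23/2023")
)

# Parse the date strings, then count arrests per year and month
monthly_counts <- toy_df %>%
  mutate(
    Arrest_Date = as.Date(Arrest_Date, format = "%m/%d/%Y"),
    Year  = format(Arrest_Date, "%Y"),
    Month = as.integer(format(Arrest_Date, "%m"))
  ) %>%
  count(Year, Month, name = "Arrests")

# One line per year, with months along the x-axis
monthly_plot <- ggplot(monthly_counts,
                       aes(x = Month, y = Arrests, color = Year, group = Year)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(title = "Arrests per Month", x = "Month", y = "Number of arrests")
monthly_plot
```

Drawing one line per year makes it easier to see whether the same months spike every year, which is what a seasonal pattern would look like.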

Gender vs Charge

This bar plot shows how arrests for each charge type are split by gender. Side-by-side bars make it simple to compare genders within each category, revealing whether certain charges are more frequently associated with one gender than the other.
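A side-by-side bar plot of this kind could be sketched as follows. ggplot2 is an assumption, and the rows in `toy_df` are made up purely for illustration; they are not taken from the dataset.

```r
library(dplyr)
library(ggplot2)  # assumption: the original plotting package is not shown

# Hypothetical rows standing in for arrests_df
toy_df <- data.frame(
  Gender = c("M", "F", "M", "M", "F", "M"),
  Charge_Description = c("Robbery", "Robbery", "Burglary",
                         "Robbery", "Larceny", "Larceny")
)

# Count arrests for every charge/gender combination
charge_by_gender <- toy_df %>%
  count(Charge_Description, Gender, name = "Arrests")

# position = "dodge" places the gender bars side by side within each charge
gender_plot <- ggplot(charge_by_gender,
                      aes(x = Charge_Description, y = Arrests, fill = Gender)) +
  geom_col(position = "dodge") +
  coord_flip() +  # horizontal bars keep long charge names readable
  labs(title = "Gender vs Charge", x = "Charge", y = "Number of arrests")
gender_plot
```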

Arrests by Area

This pie chart breaks down arrests across the different areas of Los Angeles. It highlights which areas see the most arrests, which could help guide decisions about where to allocate resources.
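One way such a pie chart could be produced is sketched below. It assumes ggplot2, which draws a pie as a stacked bar in polar coordinates, and `toy_df` reuses the ten area names from the sample rows shown earlier as a hypothetical stand-in for `arrests_df`.

```r
library(dplyr)
library(ggplot2)  # assumption: the original plotting package is not shown

# Area values from the ten sample rows, as a stand-in for arrests_df
toy_df <- data.frame(
  Area = c("77th Street", "Van Nuys", "Central", "Rampart", "Foothill",
           "Pacific", "Hollenbeck", "N Hollywood", "Van Nuys", "Van Nuys")
)

# Arrests per area and each area's share of the total
area_share <- toy_df %>%
  count(Area, name = "Arrests") %>%
  mutate(Share = Arrests / sum(Arrests))

# A single stacked bar wrapped into a circle gives the pie
area_pie <- ggplot(area_share, aes(x = "", y = Arrests, fill = Area)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(title = "Arrests by Area")
area_pie
```

With many areas, the slices become hard to compare by eye, so sorting `area_share` by `Share` and reading the numbers directly can be a useful check on the chart.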

Age, Location, and Charge Description

This 3D surface plot shows how age, location, and charge type are connected. It helps spot patterns, such as whether certain age groups are more often arrested for specific charges or whether some areas see more of certain offenses.
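The document does not say how the three variables map onto the axes, so the following is only one possible reading: area and charge on the two horizontal axes and mean age as the surface height. It uses base R's `persp()`, and the ages in `toy_df` are hypothetical values chosen for illustration.

```r
# Hypothetical data: every combination of three areas and three charges
toy_df <- expand.grid(
  Area = c("Central", "Pacific", "Van Nuys"),
  Charge_Description = c("Burglary", "Larceny", "Robbery"),
  stringsAsFactors = FALSE
)
toy_df$Age <- c(24, 31, 28, 35, 29, 40, 22, 16, 26)  # made-up ages

# Mean age for each area (rows) and charge (columns)
mean_age <- with(toy_df, tapply(Age, list(Area, Charge_Description), mean))

# persp() needs numeric axes, so areas and charges are plotted by index
persp(
  x = seq_len(nrow(mean_age)),
  y = seq_len(ncol(mean_age)),
  z = mean_age,
  theta = 35, phi = 25, ticktype = "detailed",
  xlab = "Area (index)", ylab = "Charge (index)", zlab = "Mean age"
)
```

Because area and charge are categories with no natural order, the shape of the surface depends on how they are sorted; a heatmap of `mean_age` shows the same numbers without implying an order.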

Statistics

I conducted a Welch two-sample t-test to compare the mean ages of females and males in the dataset.

Since the p-value (< 2.2e-16) is well below the 0.05 significance level, we reject the null hypothesis. This indicates a significant difference in mean age between females and males: males have a higher mean age of ~35.28 years, compared with ~32.01 years for females, and the 95 percent confidence interval puts the difference at roughly 3.12 to 3.43 years.

## 
##  Welch Two Sample t-test
## 
## data:  Age by Gender
## t = -42.428, df = 39497, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  -3.426072 -3.123506
## sample estimates:
## mean in group F mean in group M 
##        32.00987        35.28466
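The "Age by Gender" line in this output is consistent with the formula interface of `t.test()`. A minimal sketch of such a call, using made-up rows in a hypothetical `toy_df` rather than the real data, so its numbers will not match the output above:

```r
# Hypothetical rows standing in for arrests_df
toy_df <- data.frame(
  Age    = c(17, 48, 24, 37, 24, 16, 18, 29, 26, 28, 33, 41),
  Gender = c("M", "M", "M", "M", "M", "F", "M", "M", "M", "M", "F", "F")
)

# t.test() defaults to var.equal = FALSE, which gives the Welch test
age_test <- t.test(Age ~ Gender, data = toy_df)
age_test

# Group means and p-value can also be pulled out of the result directly
age_test$estimate
age_test$p.value
```

The formula interface requires the grouping column to have exactly two levels, so any rows with a gender code other than F or M would need to be filtered out before the test.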