Crime in Los Angeles

Author

D Devkota

Crime in Los Angeles from 2020 - 2024

This project analyzes crime in Los Angeles from 2020 to 2024. The data was collected by the Los Angeles Police Department (LAPD) and includes crime type, victim age, victim sex, area name, date and time of occurrence, and latitude and longitude. The main questions I want to explore are: Which areas have the highest number of crimes? How does victim age differ by crime type? Are crimes clustered in certain locations?

Loading Libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(gganimate)
library(maps)

Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map
library(leaflet)

Loading Dataset

crime <- read_csv("Crime_Data_from_2020_to_2024.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 1004894 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): Date Rptd, DATE OCC, TIME OCC, AREA, AREA NAME, Rpt Dist No, Crm C...
dbl (11): DR_NO, Part 1-2, Crm Cd, Vict Age, Premis Cd, Weapon Used Cd, Crm ...
lgl  (1): Crm Cd 4

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(crime)
Rows: 1,004,894
Columns: 28
$ DR_NO            <dbl> 211507896, 201516622, 240913563, 210704711, 201418201…
$ `Date Rptd`      <chr> "04/11/2021 12:00:00 AM", "10/21/2020 12:00:00 AM", "…
$ `DATE OCC`       <chr> "11/07/2020 12:00:00 AM", "10/18/2020 12:00:00 AM", "…
$ `TIME OCC`       <chr> "0845", "1845", "1240", "1310", "1830", "1210", "1350…
$ AREA             <chr> "15", "15", "09", "07", "14", "04", "03", "11", "17",…
$ `AREA NAME`      <chr> "N Hollywood", "N Hollywood", "Van Nuys", "Wilshire",…
$ `Rpt Dist No`    <chr> "1502", "1521", "0933", "0782", "1454", "0429", "0396…
$ `Part 1-2`       <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2,…
$ `Crm Cd`         <dbl> 354, 230, 354, 331, 420, 354, 354, 812, 354, 354, 812…
$ `Crm Cd Desc`    <chr> "THEFT OF IDENTITY", "ASSAULT WITH DEADLY WEAPON, AGG…
$ Mocodes          <chr> "0377", "0416 0334 2004 1822 1414 0305 0319 0400", "0…
$ `Vict Age`       <dbl> 31, 32, 30, 47, 63, 35, 21, 14, 43, 57, 13, 34, 0, 0,…
$ `Vict Sex`       <chr> "M", "M", "M", "F", "M", "M", "F", "F", "M", "M", "M"…
$ `Vict Descent`   <chr> "H", "H", "W", "A", "H", "B", "B", "H", "W", "W", "H"…
$ `Premis Cd`      <dbl> 501, 102, 501, 101, 103, 502, 501, 121, 501, 501, 501…
$ `Premis Desc`    <chr> "SINGLE FAMILY DWELLING", "SIDEWALK", "SINGLE FAMILY …
$ `Weapon Used Cd` <dbl> NA, 200, NA, NA, NA, NA, NA, 500, NA, NA, 400, NA, NA…
$ `Weapon Desc`    <chr> NA, "KNIFE WITH BLADE 6INCHES OR LESS", NA, NA, NA, N…
$ Status           <chr> "IC", "IC", "IC", "IC", "IC", "IC", "IC", "AO", "IC",…
$ `Status Desc`    <chr> "Invest Cont", "Invest Cont", "Invest Cont", "Invest …
$ `Crm Cd 1`       <dbl> 354, 230, 354, 331, 420, 354, 354, 812, 354, 354, 812…
$ `Crm Cd 2`       <dbl> NA, NA, NA, NA, NA, NA, NA, 860, NA, NA, 860, NA, NA,…
$ `Crm Cd 3`       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `Crm Cd 4`       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ LOCATION         <chr> "7800    BEEMAN                       AV", "ATOLL    …
$ `Cross Street`   <chr> NA, "N  GAULT", NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ LAT              <dbl> 34.2124, 34.1993, 34.1847, 34.0339, 33.9813, 34.0830,…
$ LON              <dbl> -118.4092, -118.4203, -118.4509, -118.3747, -118.4350…

Cleaning and Filtering

crime <- crime %>%
  clean_names()
crime_clean <- crime %>%
  filter(!is.na(vict_age)) %>%
  filter(vict_age > 0, vict_age < 100) %>%
  filter(!is.na(lat), !is.na(lon)) %>%
  filter(lat != 0, lon != 0) %>%
  filter(!is.na(area_name)) %>%
  filter(!is.na(crm_cd_desc)) %>%
  filter(!is.na(date_occ))

Grouping by Area

crime_by_area <- crime_clean %>%
  group_by(area_name) %>%
  summarize(total_crimes = n())
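To answer the question about which areas have the most crimes, the area counts can be sorted so the largest come first. The following is a minimal sketch on made-up rows, not the project's results; `toy_crime`, `toy_by_area`, and `top_areas` are hypothetical names used only for illustration.

```r
library(dplyr)

# Toy stand-in for crime_clean: the rows are invented for this example.
toy_crime <- tibble(
  area_name = c("Central", "Central", "Central", "Pacific", "Pacific", "Van Nuys")
)

# count() is shorthand for group_by() + summarize(n());
# sort = TRUE puts the area with the most reports first.
toy_by_area <- toy_crime %>%
  count(area_name, name = "total_crimes", sort = TRUE)

# Keep only the two areas with the highest counts.
top_areas <- slice_head(toy_by_area, n = 2)
print(top_areas)
```

The same pattern applied to `crime_clean` would likely give a ranked table that is easier to read than the bar colors alone.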

Creating Time Variables

crime_clean$time_occ <- as.numeric(crime_clean$time_occ)
crime_clean$hour <- crime_clean$time_occ / 100
crime_sample <- crime_clean %>%
  filter(vict_age > 18)
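One thing to keep in mind about the `hour` variable: dividing an HHMM value by 100 turns 0845 into 8.45, so the part after the decimal point is minutes on a 0 to 59 scale rather than a fraction of an hour. If true decimal hours are wanted, the hours and minutes can be separated first. This is a sketch of one possible approach, not the calculation used for the results in this report; `hhmm_to_decimal_hour` is a hypothetical helper.

```r
# Hypothetical helper (not part of the original analysis): convert HHMM
# military time, stored as a number, into decimal hours.
hhmm_to_decimal_hour <- function(time_occ) {
  hours   <- time_occ %/% 100  # integer division keeps the HH part
  minutes <- time_occ %% 100   # remainder keeps the MM part
  hours + minutes / 60
}

# Example values in the same text format as the time_occ column.
times <- as.numeric(c("0845", "1830", "0005"))

divided_by_100 <- times / 100                 # the approach used in this report
decimal_hours  <- hhmm_to_decimal_hour(times) # minutes expressed as a fraction of an hour

print(data.frame(times, divided_by_100, decimal_hours))
```

For 0845 the first approach gives 8.45 and the second gives 8.75, so the two versions only agree exactly on the hour.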

Creating Multiple Linear Regression

model <- lm(vict_age ~ hour + area + crm_cd, data = crime_clean)
summary(model)

Call:
lm(formula = vict_age ~ hour + area + crm_cd, data = crime_clean)

Residuals:
    Min      1Q  Median      3Q     Max 
-41.481 -11.684  -2.863  10.195  63.839 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.884e+01  8.599e-02 451.628  < 2e-16 ***
hour        -4.369e-02  2.770e-03 -15.774  < 2e-16 ***
area02      -4.101e-01  1.088e-01  -3.769 0.000164 ***
area03      -2.296e+00  9.801e-02 -23.429  < 2e-16 ***
area04       1.379e+00  1.206e-01  11.435  < 2e-16 ***
area05       2.612e+00  1.151e-01  22.701  < 2e-16 ***
area06      -1.455e-01  1.035e-01  -1.406 0.159864    
area07       1.813e+00  1.063e-01  17.058  < 2e-16 ***
area08       5.031e+00  1.086e-01  46.310  < 2e-16 ***
area09       2.941e+00  1.084e-01  27.116  < 2e-16 ***
area10       4.535e+00  1.110e-01  40.842  < 2e-16 ***
area11       3.048e+00  1.128e-01  27.012  < 2e-16 ***
area12       5.366e-01  9.887e-02   5.427 5.72e-08 ***
area13      -7.224e-01  1.085e-01  -6.661 2.73e-11 ***
area14       3.025e+00  1.013e-01  29.872  < 2e-16 ***
area15       2.313e+00  1.059e-01  21.846  < 2e-16 ***
area16       2.375e+00  1.198e-01  19.821  < 2e-16 ***
area17       4.520e+00  1.131e-01  39.957  < 2e-16 ***
area18      -8.689e-02  1.056e-01  -0.823 0.410651    
area19       9.937e-01  1.121e-01   8.867  < 2e-16 ***
area20       6.379e-01  1.059e-01   6.022 1.73e-09 ***
area21       3.381e+00  1.075e-01  31.464  < 2e-16 ***
crm_cd      -5.156e-04  8.199e-05  -6.288 3.21e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.45 on 733863 degrees of freedom
Multiple R-squared:  0.01588,   Adjusted R-squared:  0.01585 
F-statistic: 538.3 on 22 and 733863 DF,  p-value: < 2.2e-16
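Two points may help when reading this output. The Multiple R-squared of 0.01588 means the model explains about 1.6% of the variation in victim age. Also, `crm_cd` is a numeric crime code, so `lm()` treats it as a continuous quantity with a single slope, whereas `area` is text and gets one coefficient per area. The sketch below uses randomly generated toy data (an assumption, not the LAPD data) to show how wrapping a code column in `factor()` changes the model; `toy`, `fit_numeric`, and `fit_factor` are hypothetical names.

```r
set.seed(110)  # arbitrary seed so the toy data is reproducible

# Toy data: three crime codes that appear in the glimpse() output,
# with invented ages and hours.
toy <- data.frame(
  vict_age = round(runif(200, 19, 80)),
  hour     = runif(200, 0, 24),
  crm_cd   = sample(c(230, 354, 812), 200, replace = TRUE)
)

# Code treated as a number: one slope for crm_cd.
fit_numeric <- lm(vict_age ~ hour + crm_cd, data = toy)

# Code treated as a category: one coefficient per code except the baseline.
fit_factor <- lm(vict_age ~ hour + factor(crm_cd), data = toy)

length(coef(fit_numeric))  # intercept + hour + crm_cd
length(coef(fit_factor))   # intercept + hour + two code indicators

# R-squared can be pulled directly from the model summary.
summary(fit_numeric)$r.squared
summary(fit_factor)$r.squared
```

With the full dataset, a factor version would add one coefficient for nearly every crime type, so it is a trade-off between interpretability and model size.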

Creating Bar Graph showing Crimes by Area

ggplot(crime_by_area,
       aes(x = area_name,
           y = total_crimes,
           fill = area_name)) +
  geom_col() +
  scale_fill_manual(values = c(
    "red",
    "blue",
    "green",
    "orange",
    "purple",
    "black",
    "lightgrey",
    "brown",
    "cyan",
    "darkgrey",
    "gold",
    "lightgreen",
    "salmon",
    "darkred",
    "white",
    "pink",
    "grey",
    "darkgreen",
    "yellow",
    "navy",
    "skyblue"
  )) +
  theme_dark() +
  labs(
    title = "Crime Count by Area",
    x = "Area",
    y = "Total Crimes"
  )

This bar graph shows the total number of crimes in each Los Angeles area. The graph helps compare which areas reported more crimes between 2020 and 2024.

Creating Scatterplot Showing Crime Hour and Victim Age

ggplot(crime_sample,
       aes(x = hour,
           y = vict_age)) +
  geom_point(color = "cyan") +
  theme_classic()

This scatterplot plots victim age against the hour of the crime for victims older than 18. Each point represents one crime report. The graph helps show whether victim age changes with the time of day.

Filtering for map

crime_map <- crime_clean %>%
  filter(vict_age > 18) %>%
  filter(vict_sex != "X") %>%
  filter(row_number() <= 5000)

Creating Map showing Crime Location in LA

leaflet() |>
  setView(lng = -118.2437,
          lat = 34.0522,
          zoom = 9) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    lat = ~lat,
    lng = ~lon,
    data = crime_map,
    color = "red",
    fillColor = "yellow",
    radius = 50,
    fillOpacity = 0.75
  )

This map shows crime locations in Los Angeles using latitude and longitude. The points show where crimes were reported and help identify location patterns. Since the dataset has more than one million records, the map became too crowded, so I kept only the first 5,000 records for victims older than 18 whose sex is not recorded as "X" to make the map clearer.
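Keeping rows with `row_number() <= 5000` takes the first 5,000 rows in file order, which may not represent the whole period or every area. A random sample is one alternative. The sketch below compares the two on invented coordinates; `toy_points`, `first_rows`, and `random_sample` are hypothetical names, and the seed value is arbitrary.

```r
library(dplyr)

set.seed(110)  # arbitrary seed so the random sample is reproducible

# Toy stand-in for crime_clean: 1,000 invented points roughly around Los Angeles.
toy_points <- tibble(
  id  = 1:1000,
  lat = runif(1000, 33.7, 34.3),
  lon = runif(1000, -118.7, -118.2)
)

# Approach used for the map: the first 50 rows in their stored order.
first_rows <- toy_points %>%
  filter(row_number() <= 50)

# Alternative: 50 rows drawn at random from the whole table.
random_sample <- toy_points %>%
  slice_sample(n = 50)

range(first_rows$id)     # always 1 to 50
range(random_sample$id)  # spread across the table
```

Either result can be passed to `addCircles()` through its `data` argument in the same way as `crime_map`.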

crime_clean$year <- format(as.Date(crime_clean$date_occ,
                                   format = "%m/%d/%Y %I:%M:%S %p"),
                            "%Y")

animated_crime <- crime_clean %>%
  group_by(year, area_name) %>%
  summarize(total_crimes = n())
`summarise()` has regrouped the output.
ℹ Summaries were computed grouped by year and area_name.
ℹ Output is grouped by year.
ℹ Use `summarise(.groups = "drop_last")` to silence this message.
ℹ Use `summarise(.by = c(year, area_name))` for per-operation grouping
  (`?dplyr::dplyr_by`) instead.
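The message above appears because `summarize()` drops only the last grouping variable, so the result is still grouped by `year`. A minimal sketch of one way to return an ungrouped table is shown below, using three invented rows in the same date format as `date_occ`. It parses only the date part with `"%m/%d/%Y"`, relying on `as.Date()` ignoring the trailing time text; `toy` and `toy_yearly` are hypothetical names.

```r
library(dplyr)

# Toy rows in the same text format as the date_occ column.
toy <- tibble(
  date_occ  = c("11/07/2020 12:00:00 AM",
                "10/18/2020 12:00:00 AM",
                "03/01/2021 12:00:00 AM"),
  area_name = c("N Hollywood", "N Hollywood", "Van Nuys")
)

toy_yearly <- toy %>%
  # Parse the month/day/year part; characters after the date are ignored.
  mutate(year = format(as.Date(date_occ, format = "%m/%d/%Y"), "%Y")) %>%
  group_by(year, area_name) %>%
  # .groups = "drop" returns an ungrouped table and silences the message.
  summarize(total_crimes = n(), .groups = "drop")

print(toy_yearly)
```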

Creating Crime Animation by Year

animated_plot <- ggplot(animated_crime,
                        aes(x = reorder(area_name, total_crimes),
                            y = total_crimes,
                            fill = area_name)) +
  geom_col() +
  coord_flip() +
  theme_classic() +
  labs(
    title = "Crime Count by Area: {closest_state}",
    x = "Area",
    y = "Number of Crimes"
  ) +
  transition_states(year)
animate(animated_plot)

This animation shows how crime counts by area changed across different years. It helps compare yearly changes in crime reports.

Closing Essay

The Los Angeles Police Department's "Crime Data from 2020 to Present" dataset contains reported crime incidents across the City of Los Angeles starting in 2020. The dataset includes variables related to crime type, victim demographics, geographic location, and time of occurrence. According to the Los Angeles Open Data Portal (https://catalog.data.gov/dataset/crime-data-from-2020-to-present), the dataset was created to comply with the FBI’s National Incident-Based Reporting System (NIBRS). The data was transcribed from original police crime reports, meaning some inaccuracies may exist due to manual data entry. Missing geographic coordinates are sometimes recorded as (0,0), and addresses are limited to nearby block locations to protect privacy.

I wanted to convert the scatterplot into a 12-hour time format, but I was unable to fully implement it correctly. Because of that, I kept the graph in the default 24-hour format.
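One possible way to get 12-hour labels is to leave the data in 24-hour form and change only the axis labels. The sketch below is an untested idea for this project rather than a finished solution: `hour_to_12h` is a hypothetical helper, the toy data is invented, and the six-hour break spacing is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical helper: turn an hour on the 0-24 scale into a 12-hour label.
hour_to_12h <- function(hour) {
  h      <- floor(hour) %% 24                    # whole hour, with 24 wrapping to 0
  suffix <- ifelse(h < 12, "AM", "PM")
  h12    <- ifelse(h %% 12 == 0, 12, h %% 12)    # 0 and 12 both display as 12
  paste(h12, suffix)
}

hour_to_12h(c(0, 6, 12, 18, 23))

# Toy stand-in for crime_sample with invented values.
toy <- data.frame(
  hour     = c(0.5, 8.45, 12, 18.3, 23.1),
  vict_age = c(25, 31, 47, 63, 35)
)

# The data stays in 24-hour form; only the axis labels change.
p <- ggplot(toy, aes(x = hour, y = vict_age)) +
  geom_point(color = "cyan") +
  scale_x_continuous(
    breaks = seq(0, 24, by = 6),
    labels = hour_to_12h,
    limits = c(0, 24)
  ) +
  labs(x = "Time of occurrence", y = "Victim age") +
  theme_classic()
```

Because `labels` accepts a function, the same `scale_x_continuous()` call could be added to the existing scatterplot without changing `crime_sample`.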

Works Cited

City of Los Angeles. “Crime Data from 2020 to Present.” Data.gov, https://catalog.data.gov/dataset/crime-data-from-2020-to-present.

AI Use Attribution Statement

Field            Value
Title            Project 3
Creator          Deepsagar Devkota
Context          DATA 110
Document Type    Student assignment
AI Permission    AI-NO
AI Categories    None

AI Tools Used

  • claude.ai — 2026-05-07 — Debugging
  • claude.ai — 2026-05-07 — Creating Animation

AI Prompt

For Debugging - Fix this code for Map

For Animation - Create me a Animation Graph showing the crime animation year

Human Role

To debug my map, I wrote the code myself and sent it to Claude with a prompt asking it to debug the code.

To create the animation, I took the code returned for my prompt and modified it by adding a theme.

Notes

I used AI to debug my map and to create an animation. I tried searching for animation cheat sheets but had no luck, so I used AI for the animation.


Generated with AI Attribution Generator