JLiang_Project_2

Load packages and data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(alluvial) 
library(ggalluvial)
library(GGally)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
library(leaflet)
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(knitr)

crime_md_1 <- read_csv("Violent_Crime___Property_Crime_by_County__1975_to_Present_20241113.csv")
Rows: 1104 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): JURISDICTION
dbl (37): YEAR, POPULATION, MURDER, RAPE, ROBBERY, AGG. ASSAULT, B & E, LARC...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

What are the counties in the data set?

unique(crime_md_1$JURISDICTION) # column names are still upper case at this point

Name change from CAPS to lower case

names(crime_md_1) <- tolower(names(crime_md_1))
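
R column names are case sensitive, so a column called `jurisdiction` only exists once the names have been lower-cased. A minimal sketch on a made-up data frame (the `toy` values are assumptions for illustration, not rows from the Maryland file):

```r
# Toy data frame standing in for the raw CSV (values invented for illustration)
toy <- data.frame(
  JURISDICTION = c("A County", "B County", "A County"),
  YEAR = c(1975, 1975, 1976),
  stringsAsFactors = FALSE
)

# Before the rename there is no lower-case column, so the lookup yields NULL
missing_before <- is.null(toy$jurisdiction)

# After the rename the lower-case name resolves
names(toy) <- tolower(names(toy))
counties <- unique(toy$jurisdiction)
counties
```

Running the county lookup after the rename, or using the upper-case name before it, avoids the empty result.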

Choose specific columns before converting from wide to long format

new_crim_md <- crime_md_1 |>
  select("jurisdiction", "year", "population", "murder", "rape", "robbery", "agg. assault", "b & e", "larceny theft", "m/v theft")|>
  group_by(jurisdiction, year)
head(new_crim_md)
# A tibble: 6 × 10
# Groups:   jurisdiction, year [6]
  jurisdiction     year population murder  rape robbery `agg. assault` `b & e`
  <chr>           <dbl>      <dbl>  <dbl> <dbl>   <dbl>          <dbl>   <dbl>
1 Allegany County  1975      79655      3     5      20            114     669
2 Allegany County  1976      83923      2     2      24             59     581
3 Allegany County  1977      82102      3     7      32             85     592
4 Allegany County  1978      79966      1     2      18             81     539
5 Allegany County  1979      79721      1     7      18             84     502
6 Allegany County  1980      80461      2    12      26             79     541
# ℹ 2 more variables: `larceny theft` <dbl>, `m/v theft` <dbl>

Convert from wide to long

crime_md_long <- new_crim_md |>
  pivot_longer(
    cols = 4:10,
    names_to = "crimetype",
    values_to = "crimecount")
crime_md_long
# A tibble: 7,728 × 5
# Groups:   jurisdiction, year [1,104]
   jurisdiction     year population crimetype     crimecount
   <chr>           <dbl>      <dbl> <chr>              <dbl>
 1 Allegany County  1975      79655 murder                 3
 2 Allegany County  1975      79655 rape                   5
 3 Allegany County  1975      79655 robbery               20
 4 Allegany County  1975      79655 agg. assault         114
 5 Allegany County  1975      79655 b & e                669
 6 Allegany County  1975      79655 larceny theft       1425
 7 Allegany County  1975      79655 m/v theft             93
 8 Allegany County  1976      83923 murder                 2
 9 Allegany County  1976      83923 rape                   2
10 Allegany County  1976      83923 robbery               24
# ℹ 7,718 more rows
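
Selecting the columns to pivot by position (`4:10`) works here, but it silently changes meaning if the column order ever changes. As a hedged alternative, the columns can be selected by name. The sketch below uses a small invented tibble (`toy_wide`) with only three crime columns, not the real data:

```r
library(dplyr)
library(tidyr)

# Invented wide table with the same shape as new_crim_md, but fewer crime columns
toy_wide <- tibble(
  jurisdiction = c("A County", "B County"),
  year = c(1975, 1975),
  population = c(1000, 2000),
  murder = c(1, 2),
  rape = c(3, 4),
  `m/v theft` = c(5, 6)
)

# Pivot every column from murder through m/v theft, selected by name
toy_long <- toy_wide |>
  pivot_longer(
    cols = murder:`m/v theft`,
    names_to = "crimetype",
    values_to = "crimecount"
  )
toy_long
```

Each input row becomes one output row per crime column, so two rows and three crime columns give six rows.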

I want to explore 2 specific crimes

#Larceny is the most common crime but we will look at murder and rape

two_crim <- crime_md_long|>
  filter(crimetype %in% c("murder", "rape"))
two_crim
# A tibble: 2,208 × 5
# Groups:   jurisdiction, year [1,104]
   jurisdiction     year population crimetype crimecount
   <chr>           <dbl>      <dbl> <chr>          <dbl>
 1 Allegany County  1975      79655 murder             3
 2 Allegany County  1975      79655 rape               5
 3 Allegany County  1976      83923 murder             2
 4 Allegany County  1976      83923 rape               2
 5 Allegany County  1977      82102 murder             3
 6 Allegany County  1977      82102 rape               7
 7 Allegany County  1978      79966 murder             1
 8 Allegany County  1978      79966 rape               2
 9 Allegany County  1979      79721 murder             1
10 Allegany County  1979      79721 rape               7
# ℹ 2,198 more rows
sum(is.na(two_crim))
[1] 0

Graph the number of cases of rape and murder in MD

plot1 <- two_crim |>
  ggplot() +
  geom_bar(aes(x = year, y = crimecount, fill = crimetype),
           position = "dodge", stat = "identity") +
  labs(fill = "Crime Type",
       y = "Number of Incidents",
       title = "Incidences of Rape and Murder in MD Counties Between 1975-2020",
       caption = "Source: opendata.maryland.gov")
  
plot1

Top 5 counties

counties_md_2 <- two_crim |>
  group_by(jurisdiction)|>
  summarize(sum = sum(crimecount)) |>
  slice_max(order_by = sum, n = 5) # summarize() drops the single grouping, so this keeps the 5 jurisdictions with the largest totals overall
counties_md_2
# A tibble: 5 × 2
  jurisdiction             sum
  <chr>                  <dbl>
1 Baltimore City         31973
2 Prince George's County 18811
3 Baltimore County       11357
4 Montgomery County       9105
5 Anne Arundel County     5831
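
`slice_max()` works per group on a grouped table, but `summarize()` removes the last grouping variable, so with a single grouping column the result is ungrouped and the ranking is global. A small sketch on invented data (the `toy` table and its values are assumptions for illustration):

```r
library(dplyr)

# Invented incident counts for three jurisdictions
toy <- tibble(
  jurisdiction = c("A", "A", "B", "B", "C", "C"),
  crimecount   = c(10, 20, 5, 5, 40, 1)
)

# One total per jurisdiction; the single grouping level is dropped here
totals <- toy |>
  group_by(jurisdiction) |>
  summarize(total = sum(crimecount))

# Check that the summary is no longer grouped
still_grouped <- is_grouped_df(totals)

# Rank across all jurisdictions and keep the two largest totals
top2 <- totals |>
  slice_max(order_by = total, n = 2)
top2
```

The totals are 30, 10, and 41, so the two rows kept are C and A.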

Visualize the top 5 as an alluvial plot

top_five <- two_crim |>
  filter(jurisdiction %in% c("Baltimore City", "Prince George's County", "Baltimore County", "Montgomery County", "Anne Arundel County"))|>
  group_by(jurisdiction, year)|>
  summarize(sum = sum(crimecount))
`summarise()` has grouped output by 'jurisdiction'. You can override using the
`.groups` argument.
top_five
# A tibble: 230 × 3
# Groups:   jurisdiction [5]
   jurisdiction         year   sum
   <chr>               <dbl> <dbl>
 1 Anne Arundel County  1975    90
 2 Anne Arundel County  1976    87
 3 Anne Arundel County  1977   134
 4 Anne Arundel County  1978    85
 5 Anne Arundel County  1979    97
 6 Anne Arundel County  1980   137
 7 Anne Arundel County  1981   108
 8 Anne Arundel County  1982   102
 9 Anne Arundel County  1983    84
10 Anne Arundel County  1984   113
# ℹ 220 more rows
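
The `summarise()` message is informational: with two grouping variables, the result stays grouped by the first one. If that leftover grouping is not wanted, one option is to set `.groups = "drop"`. A minimal sketch on invented data (the `toy` table is an assumption, not the Maryland data):

```r
library(dplyr)

# Invented counts: two crime rows for A in 1975, one for A in 1976, one for B
toy <- tibble(
  jurisdiction = c("A", "A", "A", "B"),
  year         = c(1975, 1975, 1976, 1975),
  crimecount   = c(1, 2, 3, 4)
)

# Sum within jurisdiction and year, then return an ungrouped tibble
yearly <- toy |>
  group_by(jurisdiction, year) |>
  summarize(total = sum(crimecount), .groups = "drop")
yearly
```

Setting `.groups` explicitly also silences the message.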
twoCrim_alluv <- top_five |> 
  ggplot(aes(x = year, y = sum, alluvium = jurisdiction)) + 
  geom_alluvium(aes(fill = jurisdiction), color = "white", width = .1, alpha = .7, decreasing = FALSE) + 
  scale_x_continuous(limits = c(1975, 2020)) + 
  labs(title = "Incidences of Rape and Murder in MD Counties Between 1975-2020", 
       y = "Number of Incidences", 
       fill = "Jurisdiction", # the fill maps to jurisdiction, not crime type
       caption = "Source: opendata.maryland.gov") +
  theme_dark() +
  scale_fill_viridis_d(option = "turbo")

twoCrim_alluv

Explore the relationship between crimes

ggpairs(new_crim_md, columns = 4:10)
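
The coefficients printed in the upper panels of a `ggpairs()` plot are Pearson correlations by default, so any single value can be cross-checked with base R's `cor()`. The sketch below uses invented numbers (the `toy` data frame is an assumption, not the Maryland data) only to show the call:

```r
# Invented counts for three crime columns; check.names = FALSE keeps the spaces
toy <- data.frame(
  `b & e`         = c(10, 20, 30, 40),
  `larceny theft` = c(21, 39, 62, 80),
  murder          = c(3, 1, 4, 1),
  check.names = FALSE
)

# Pairwise Pearson correlations between all numeric columns
cor_matrix <- cor(toy)

# Pull out one pair by column name
be_larceny <- cor_matrix["b & e", "larceny theft"]
round(be_larceny, 3)
```

On the real data, the same lookup on the seven crime columns would give the value shown in the matching `ggpairs()` panel.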

Let’s add the locations

library(tidygeocoder)
# Tried geocoder... kind of worked, but it was mistaking a couple of counties for counties outside of MD

#data_with_location <- crime_md_long |>
  #geocode(jurisdiction, method = 'osm',lat = latitude , long = longitude)|>
  #filter(year== 2020)|>
  #filter(crimetype == "agg. assault")
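
One possible reason for the out-of-state matches is that a bare county name such as "Washington County" exists in many states. A hedged sketch of a workaround is to geocode a fuller address string. The `", Maryland, USA"` suffix and the helper name `geocode_counties` are assumptions for illustration, and the helper is only defined here, not called, because it needs the tidygeocoder package and network access:

```r
# Toy table standing in for the distinct jurisdictions in crime_md_long
toy <- data.frame(
  jurisdiction = c("Allegany County", "Washington County"),
  stringsAsFactors = FALSE
)

# Build a less ambiguous address (the suffix is an assumption)
toy$address <- paste0(toy$jurisdiction, ", Maryland, USA")

# Hypothetical helper: geocode the address column with the OSM service
geocode_counties <- function(df) {
  tidygeocoder::geocode(df, address = address, method = "osm",
                        lat = latitude, long = longitude)
}

toy$address
```

Geocoding the 24 distinct jurisdictions once, rather than every row of the long table, would also keep the number of requests small.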


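# crime_md_long is still grouped by jurisdiction and year, so summarize() totals within each county
data_with_location <- crime_md_long |>
  filter(year == 2020) |>
  filter(crimetype %in% c("murder", "rape", "agg. assault")) |>
  summarize(total_violent_crimes = sum(crimecount))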
`summarise()` has grouped output by 'jurisdiction'. You can override using the
`.groups` argument.
data_with_location
# A tibble: 24 × 3
# Groups:   jurisdiction [24]
   jurisdiction         year total_violent_crimes
   <chr>               <dbl>                <dbl>
 1 Allegany County      2020                  189
 2 Anne Arundel County  2020                 1460
 3 Baltimore City       2020                 6039
 4 Baltimore County     2020                 2947
 5 Calvert County       2020                  127
 6 Caroline County      2020                   56
 7 Carroll County       2020                  168
 8 Cecil County         2020                  235
 9 Charles County       2020                  444
10 Dorchester County    2020                  172
# ℹ 14 more rows
lat = c(39.630,38.994,39.290,39.301, 38.535,38.872,39.575,39.574, 38.474, 38.422, 39.472, 39.529, 39.536, 39.215, 39.230, 39.140, 38.830, 39.030, 38.080,  38.216, 38.780, 39.613,  38.394, 38.230)
data_with_location$lat <- lat
# All Maryland longitudes are negative (west of Greenwich); Cecil County's value was missing its minus sign
long = c(-78.6900, -76.568, -76.612, -76.611, -76.5301, -75.832, -76.996, -75.946, -77.014, -76.083, -77.398, -79.273, -76.299, -76.886, -76.100, -77.200, -76.8500, -76.080, -75.853, -76.529, -76.1320, -77.699, -75.667, -75.2800)
data_with_location$long <- long
data_with_location
# A tibble: 24 × 5
# Groups:   jurisdiction [24]
   jurisdiction         year total_violent_crimes   lat  long
   <chr>               <dbl>                <dbl> <dbl> <dbl>
 1 Allegany County      2020                  189  39.6 -78.7
 2 Anne Arundel County  2020                 1460  39.0 -76.6
 3 Baltimore City       2020                 6039  39.3 -76.6
 4 Baltimore County     2020                 2947  39.3 -76.6
 5 Calvert County       2020                  127  38.5 -76.5
 6 Caroline County      2020                   56  38.9 -75.8
 7 Carroll County       2020                  168  39.6 -77.0
 8 Cecil County         2020                  235  39.6 -75.9
 9 Charles County       2020                  444  38.5 -77.0
10 Dorchester County    2020                  172  38.4 -76.1
# ℹ 14 more rows
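
Assigning coordinates as bare vectors depends on the rows staying in exactly the same order. A hedged alternative is to keep the coordinates in a small lookup table and join on the county name, with a sanity check on the sign of the longitudes. The sketch below uses two counties only; `coords` and `located` are names made up for illustration:

```r
library(dplyr)

# Two rows standing in for data_with_location
crimes <- tibble(
  jurisdiction = c("Allegany County", "Cecil County"),
  total_violent_crimes = c(189, 235)
)

# Lookup table keyed by name; row order no longer matters
coords <- tibble(
  jurisdiction = c("Cecil County", "Allegany County"),
  lat  = c(39.574, 39.630),
  long = c(-75.946, -78.690)
)

# Attach coordinates by matching on the county name
located <- crimes |>
  left_join(coords, by = "jurisdiction")

# Every Maryland longitude should be negative and every latitude filled in
stopifnot(all(located$long < 0), !anyNA(located$lat))
located
```

A check like the `stopifnot()` line catches a dropped minus sign before the point lands on the wrong side of the globe.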

Map time

popup <- paste0(
  "<b>Year: </b>", data_with_location$year, "<br>",
  "<b>County: </b>", data_with_location$jurisdiction, "<br>",
  "<b>Incidences: </b>", data_with_location$total_violent_crimes, "<br>")
leaflet() |>
  setView(lng = -76.61, lat = 39.30, zoom = 8) |> # MD
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    data = data_with_location,
    radius = data_with_location$total_violent_crimes * 2,
    color = "#619CFF",
    fillColor = "#F8766D",
    fillOpacity = .8,
    popup = popup)
Assuming "long" and "lat" are longitude and latitude, respectively

Essay

My data set is a collection of Violent Crime & Property Crime by County from 1975 to the Present, provided by the Maryland Statistical Analysis Center (MSAC) within the Governor’s Office of Crime Control and Prevention (GOCCP). The raw data has 1,104 observations and 38 variables. Before going further, I kept the jurisdiction, year, population, and seven crime-count columns and converted the data from wide to long format. In the end, the data contained 7,728 observations and 5 variables.

Out of the 7 crime types, I chose murder and rape for my first visualization. I wanted to focus on them because they are crimes that deserve attention, especially since they are taking place in our home state of MD.

In the second visual, the alluvial plot depicts the 5 jurisdictions with the highest combined occurrence of murder and rape. Surprisingly, Montgomery County was on the list.

In the correlation plot, I was curious about how the counts of each crime relate to one another. I was surprised by how closely breaking and entering and larceny theft follow a straight line, with a correlation of .956.

Lastly, I wanted to use GIS mapping to visualize the total number of violent crimes in each county in 2020. Here, violent crimes mean murder, rape, and aggravated assault. Unfortunately, my data did not contain latitude and longitude values. I looked for an easier way to obtain them for each county. Geocoding with tidygeocoder was one attempt, but some of the returned values were wrong because they fell outside of MD. In the end, I had to look up the values manually and add them to my data set.

It was a fun project that I thoroughly enjoyed!