Rows: 1104 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): JURISDICTION
dbl (37): YEAR, POPULATION, MURDER, RAPE, ROBBERY, AGG. ASSAULT, B & E, LARC...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
What are the counties in the data set?
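# The column names are still upper case at this point, so `crime_md_1$jurisdiction`
# returns NULL with an "Unknown or uninitialised column" warning.
# Use the original upper-case name here, or run this after the rename below.
unique(crime_md_1$JURISDICTION)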
Name change from CAPS to lower case
names(crime_md_1) <- tolower(names(crime_md_1))
Choose specific columns to convert from wide to long format
# A tibble: 7,728 × 5
# Groups: jurisdiction, year [1,104]
jurisdiction year population crimetype crimecount
<chr> <dbl> <dbl> <chr> <dbl>
1 Allegany County 1975 79655 murder 3
2 Allegany County 1975 79655 rape 5
3 Allegany County 1975 79655 robbery 20
4 Allegany County 1975 79655 agg. assault 114
5 Allegany County 1975 79655 b & e 669
6 Allegany County 1975 79655 larceny theft 1425
7 Allegany County 1975 79655 m/v theft 93
8 Allegany County 1976 83923 murder 2
9 Allegany County 1976 83923 rape 2
10 Allegany County 1976 83923 robbery 24
# ℹ 7,718 more rows
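The reshaping code itself is not shown above. A minimal sketch of how such a wide-to-long conversion could look, assuming `tidyr::pivot_longer()` was used; the object names `crime_wide` and `crime_long_sketch` are hypothetical, and the single input row reuses the Allegany County 1975 counts printed above:

```r
library(dplyr)
library(tidyr)

# One row in the wide layout, using the Allegany County 1975 counts shown above.
# The real data has many more columns; only the seven crime counts are kept here.
crime_wide <- tibble(
  jurisdiction    = "Allegany County",
  year            = 1975,
  population      = 79655,
  murder          = 3,
  rape            = 5,
  robbery         = 20,
  `agg. assault`  = 114,
  `b & e`         = 669,
  `larceny theft` = 1425,
  `m/v theft`     = 93
)

# Stack the seven crime columns into crimetype/crimecount pairs,
# then group by jurisdiction and year as in the printed tibble.
crime_long_sketch <- crime_wide |>
  pivot_longer(
    cols      = murder:`m/v theft`,
    names_to  = "crimetype",
    values_to = "crimecount"
  ) |>
  group_by(jurisdiction, year)

crime_long_sketch
```

Each wide row becomes seven long rows, which is consistent with 1,104 rows turning into 7,728.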
I want to explore 2 specific crimes
# Larceny is the most common crime, but we will look at murder and rape
two_crim <- crime_md_long |>
  filter(crimetype %in% c("murder", "rape"))
two_crim
# A tibble: 2,208 × 5
# Groups: jurisdiction, year [1,104]
jurisdiction year population crimetype crimecount
<chr> <dbl> <dbl> <chr> <dbl>
1 Allegany County 1975 79655 murder 3
2 Allegany County 1975 79655 rape 5
3 Allegany County 1976 83923 murder 2
4 Allegany County 1976 83923 rape 2
5 Allegany County 1977 82102 murder 3
6 Allegany County 1977 82102 rape 7
7 Allegany County 1978 79966 murder 1
8 Allegany County 1978 79966 rape 2
9 Allegany County 1979 79721 murder 1
10 Allegany County 1979 79721 rape 7
# ℹ 2,198 more rows
# Count missing values across the whole table
sum(is.na(two_crim))
[1] 0
Graph the number of cases of rape and murder in MD
plot1 <- two_crim |>
  ggplot() +
  geom_bar(aes(x = year, y = crimecount, fill = crimetype),
           position = "dodge", stat = "identity") +
  labs(fill = "Crime Type",
       y = "Number of Incidents",
       title = "Incidences of Rape and Murder in MD Counties Between 1975-2020",
       caption = "Source: opendata.maryland.gov")
plot1
Top 5 counties
counties_md_2 <- two_crim |>
  group_by(jurisdiction) |>
  summarize(sum = sum(crimecount)) |>
  # summarize() drops the single grouping level, so slice_max() here
  # returns the 5 jurisdictions with the largest totals overall
  slice_max(order_by = sum, n = 5)
counties_md_2
# A tibble: 5 × 2
jurisdiction sum
<chr> <dbl>
1 Baltimore City 31973
2 Prince George's County 18811
3 Baltimore County 11357
4 Montgomery County 9105
5 Anne Arundel County 5831
`summarise()` has grouped output by 'jurisdiction'. You can override using the
`.groups` argument.
top_five
# A tibble: 230 × 3
# Groups: jurisdiction [5]
jurisdiction year sum
<chr> <dbl> <dbl>
1 Anne Arundel County 1975 90
2 Anne Arundel County 1976 87
3 Anne Arundel County 1977 134
4 Anne Arundel County 1978 85
5 Anne Arundel County 1979 97
6 Anne Arundel County 1980 137
7 Anne Arundel County 1981 108
8 Anne Arundel County 1982 102
9 Anne Arundel County 1983 84
10 Anne Arundel County 1984 113
# ℹ 220 more rows
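The code that builds `top_five` is not shown. Given the printed shape (one row per jurisdiction and year, grouped by jurisdiction), it was presumably a filter on the top counties followed by a yearly sum. A minimal sketch under that assumption; `two_crim_sketch`, `keep`, and `top_five_sketch` are hypothetical names, and the input reuses the first Allegany County rows of `two_crim` as a stand-in because those are the only per-crime rows printed:

```r
library(dplyr)

# Stand-in for two_crim: the Allegany County 1975-1976 rows printed earlier.
two_crim_sketch <- tibble(
  jurisdiction = "Allegany County",
  year         = c(1975, 1975, 1976, 1976),
  crimetype    = c("murder", "rape", "murder", "rape"),
  crimecount   = c(3, 5, 2, 2)
)

# In the real analysis this would presumably be counties_md_2$jurisdiction.
keep <- c("Allegany County")

# Keep only the selected counties, then add murder and rape together per year.
top_five_sketch <- two_crim_sketch |>
  filter(jurisdiction %in% keep) |>
  group_by(jurisdiction, year) |>
  summarize(sum = sum(crimecount))

top_five_sketch
```

Summarizing over two grouping variables leaves the result grouped by `jurisdiction`, which matches both the `summarise()` message and the `# Groups: jurisdiction [5]` header above.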
# geom_alluvium() comes from the ggalluvial package
twoCrim_alluv <- top_five |>
  ggplot(aes(x = year, y = sum, alluvium = jurisdiction)) +
  geom_alluvium(aes(fill = jurisdiction), color = "white",
                width = .1, alpha = .7, decreasing = FALSE) +
  scale_x_continuous(limits = c(1975, 2020)) +
  labs(title = "Incidences of Rape and Murder in MD Counties Between 1975-2020",
       y = "Number of Incidences",
       fill = "Jurisdiction",  # the fill maps to jurisdiction, not crimetype
       caption = "Source: opendata.maryland.gov") +
  theme_dark() +
  scale_fill_viridis_d(option = "turbo")
twoCrim_alluv
Explore the relationship between crimes
# ggpairs() comes from the GGally package
ggpairs(new_crim_md, columns = 4:10)
Let’s add the locations
library(tidygeocoder)
# Tried the geocoder... it kind of worked, but it mistook a couple of counties for counties outside MD
# data_with_location <- crime_md_long |>
#   geocode(jurisdiction, method = 'osm', lat = latitude, long = longitude) |>
#   filter(year == 2020) |>
#   filter(crimetype == "agg. assault")
data_with_location <- crime_md_long |>
  filter(year == 2020) |>
  filter(crimetype %in% c("murder", "rape", "agg. assault")) |>
  summarize(total_violent_crimes = sum(crimecount))
`summarise()` has grouped output by 'jurisdiction'. You can override using the
`.groups` argument.
data_with_location
# A tibble: 24 × 3
# Groups: jurisdiction [24]
jurisdiction year total_violent_crimes
<chr> <dbl> <dbl>
1 Allegany County 2020 189
2 Anne Arundel County 2020 1460
3 Baltimore City 2020 6039
4 Baltimore County 2020 2947
5 Calvert County 2020 127
6 Caroline County 2020 56
7 Carroll County 2020 168
8 Cecil County 2020 235
9 Charles County 2020 444
10 Dorchester County 2020 172
# ℹ 14 more rows
Assuming "long" and "lat" are longitude and latitude, respectively
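The step that attaches coordinates is not shown. Since the coordinates were entered by hand, one way to do it is a small lookup table joined on `jurisdiction`. This is only a sketch: `violent_2020`, `county_coords`, and `located` are hypothetical names, the coordinates are rough illustrative values rather than the ones actually used, and the bounding box is an approximate outline of Maryland.

```r
library(dplyr)
library(tibble)

# Two rows copied from the data_with_location output above.
violent_2020 <- tibble(
  jurisdiction         = c("Allegany County", "Baltimore City"),
  year                 = 2020,
  total_violent_crimes = c(189, 6039)
)

# Hand-entered lookup table; coordinates are rough, illustrative values only.
county_coords <- tribble(
  ~jurisdiction,      ~lat,  ~long,
  "Allegany County",  39.6,  -78.7,
  "Baltimore City",   39.3,  -76.6
)

# Attach coordinates, then flag any point outside a rough Maryland bounding box.
# A check like this would also catch geocoder results that land in another state.
located <- violent_2020 |>
  left_join(county_coords, by = "jurisdiction") |>
  mutate(in_md = between(lat, 37.8, 39.8) & between(long, -79.6, -75.0))

located
```

A join keeps the coordinates in one place, so a wrong value only has to be corrected once.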
Essay
My data set is a collection of Violent Crime & Property Crime by County from 1975 to the Present, provided by the Maryland Statistical Analysis Center (MSAC) within the Governor’s Office of Crime Control and Prevention (GOCCP). The raw data has 1104 observations and 38 variables. Before going further, I had to convert the data from wide to long format, keeping only the seven crime count columns. In the end, the data contained 7728 observations and 5 variables.
Out of the 7 crime types, I chose murder and rape for my first visualization. I wanted to focus on these two crimes because they need to be emphasized, especially since they are taking place in our home state of MD.
In the second visual, the alluvial plot depicts the 5 counties with the highest combined totals of murder and rape. Surprisingly, Montgomery County was on the list.
In the correlation matrix, I was curious how the crime types relate to one another based on the number of occurrences. I was surprised by how closely breaking and entering and larceny theft follow a straight line, with a correlation of .956.
Lastly, I wanted to use GIS mapping to visualize the total number of violent crimes in each county in 2020. Here, violent crimes are murder, rape, and aggravated assault. Unfortunately, my data did not contain latitude and longitude values. I looked for an easier way to obtain them for each county. Geocoding with tidygeocoder was one attempt, but some of the returned locations were incorrect because they fell outside of MD. In the end, I had to look up the values manually and add them to my data set.
It was a fun project that I thoroughly enjoyed!