How does the crime rate change over time in Maryland? I will focus on the five most populated counties in Maryland: Montgomery County, Howard County, Prince George’s County, Anne Arundel County, and Baltimore County. This data set tracks the amount of crime in each Maryland county. It covers violent crime (murder, rape, etc.) and property crime (motor vehicle theft, breaking and entering, etc.) from 1975 to 2020. It also tracks the population, the year-over-year percent change for each crime, and the number of crimes per 100,000 people. To find the crime rate, I will use grand_total, which is the total of all crimes counted together, divide it by the population, and multiply by 100,000. I will look at how the crime rate changes over the 46 years of data (1975 through 2020) that this data set covers.
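As a quick check of the formula described above, here is a minimal sketch that applies it to a single row. The helper name crime_rate_per_100k is my own and is not part of the data set or any package. The inputs are the Allegany County 1975 values (grand_total of 2329 and population of 79655) that appear in the data preview later in this document.

```r
# Hypothetical helper (the name is an assumption, not from the data set):
# crime rate per 100,000 residents = (total crimes / population) * 100,000
crime_rate_per_100k <- function(total_crimes, population) {
  (total_crimes / population) * 100000
}

# Allegany County, 1975, as printed in the data preview
rate <- crime_rate_per_100k(total_crimes = 2329, population = 79655)
round(rate)  # 2924
```

The rounded result agrees with the first value of the data set's own overall crime rate column, which suggests the formula lines up with how the published rate was calculated.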
First, set the working directory and read in the data set. Next, I use tolower and gsub to clean up the column names, replacing spaces, periods, slashes, and similar characters with underscores. Then look at the structure and dimensions of the data set. After that, use filter to keep only the five counties we are looking at: Montgomery, Prince George’s, Howard, Anne Arundel, and Baltimore County. Then use the mutate function to create the crime rate by dividing the grand total of crime by the population and multiplying by 100,000. Using the select function, we can keep only jurisdiction, year, population, and crime rate, which helps with the visualization. Finally, check for any NA values and visualize the data using a line graph.
# load the libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/rjzavaleta/Downloads/Data 101")
crime_county <- read_csv("Violent_Crime___Property_Crime_by_County__1975_to_Present.csv")
# cleaning
names(crime_county) <- tolower(names(crime_county))
names(crime_county) <- gsub(" ","_",names(crime_county))
names(crime_county) <- gsub("[(). //-]", "_", names(crime_county))
head(crime_county)
## # A tibble: 6 × 38
## jurisdiction year population murder rape robbery agg__assault `b_&_e`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Allegany County 1975 79655 3 5 20 114 669
## 2 Allegany County 1976 83923 2 2 24 59 581
## 3 Allegany County 1977 82102 3 7 32 85 592
## 4 Allegany County 1978 79966 1 2 18 81 539
## 5 Allegany County 1979 79721 1 7 18 84 502
## 6 Allegany County 1980 80461 2 12 26 79 541
## # ℹ 30 more variables: larceny_theft <dbl>, m_v_theft <dbl>, grand_total <dbl>,
## # percent_change <dbl>, violent_crime_total <dbl>,
## # violent_crime_percent <dbl>, violent_crime_percent_change <dbl>,
## # property_crime_totals <dbl>, property_crime_percent <dbl>,
## # property_crime_percent_change <dbl>,
## # `overall_crime_rate_per_100,000_people` <dbl>,
## # `overall_percent_change_per_100,000_people` <dbl>, …
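To show what the cleaning step does to individual column names, here is a small sketch that runs the same kind of tolower and gsub calls on a few of the raw headers listed in the column specification. The vector raw_names and the variable clean_names_sketch are names I made up for illustration, and the pattern is a slightly simplified version of the one used above (one slash instead of two), which should behave the same way.

```r
# A few raw column headers copied from the column specification
raw_names <- c("AGG. ASSAULT", "B & E", "M/V THEFT", "GRAND TOTAL")

# Lowercase first, then replace parentheses, periods, spaces, slashes,
# and hyphens with underscores (commas and & are left untouched)
clean_names_sketch <- tolower(raw_names)
clean_names_sketch <- gsub("[(). /-]", "_", clean_names_sketch)

clean_names_sketch
# "agg__assault" "b_&_e" "m_v_theft" "grand_total"
```

Because `&` and `,` are not in the pattern, names such as `b_&_e` still need backticks when used in code, which is why they are printed with backticks in the output above.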
# check the dataset
str(crime_county)
## spc_tbl_ [1,104 × 38] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ jurisdiction : chr [1:1104] "Allegany County" "Allegany County" "Allegany County" "Allegany County" ...
## $ year : num [1:1104] 1975 1976 1977 1978 1979 ...
## $ population : num [1:1104] 79655 83923 82102 79966 79721 ...
## $ murder : num [1:1104] 3 2 3 1 1 2 11 1 5 2 ...
## $ rape : num [1:1104] 5 2 7 2 7 12 13 18 9 15 ...
## $ robbery : num [1:1104] 20 24 32 18 18 26 24 18 19 6 ...
## $ agg__assault : num [1:1104] 114 59 85 81 84 79 101 80 89 67 ...
## $ b_&_e : num [1:1104] 669 581 592 539 502 541 539 447 347 361 ...
## $ larceny_theft : num [1:1104] 1425 1384 1390 1390 1611 ...
## $ m_v_theft : num [1:1104] 93 73 102 100 99 108 88 55 67 68 ...
## $ grand_total : num [1:1104] 2329 2125 2211 2131 2322 ...
## $ percent_change : num [1:1104] NA -8.8 4 -3.6 9 6.5 0 -11.5 -11 -4.7 ...
## $ violent_crime_total : num [1:1104] 142 87 127 102 110 119 149 117 122 90 ...
## $ violent_crime_percent : num [1:1104] 6.1 4.1 5.7 4.8 4.7 4.8 6 5.3 6.3 4.8 ...
## $ violent_crime_percent_change : num [1:1104] NA -38.7 46 -19.7 7.8 8.2 25.2 -21.5 4.3 -26.2 ...
## $ property_crime_totals : num [1:1104] 2187 2038 2084 2029 2212 ...
## $ property_crime_percent : num [1:1104] 93.9 95.9 94.3 95.2 95.3 95.2 94 94.7 93.7 95.2 ...
## $ property_crime_percent_change : num [1:1104] NA -6.8 2.3 -2.6 9 6.5 -1.3 -10.8 -11.9 -3.2 ...
## $ overall_crime_rate_per_100,000_people : num [1:1104] 2924 2532 2693 2665 2913 ...
## $ overall_percent_change_per_100,000_people : num [1:1104] NA -13.4 6.4 -1 9.3 5.6 -1.7 -11.6 -11.8 -2.6 ...
## $ violent_crime_rate_per_100,000_people : num [1:1104] 178 104 155 128 138 ...
## $ violent_crime_rate_percent_change_per_100,000_people : num [1:1104] NA -41.8 49.2 -17.5 8.2 7.2 23.2 -21.6 3.3 -24.6 ...
## $ property_crime_rate_per_100,000_people : num [1:1104] 2746 2428 2538 2537 2775 ...
## $ property_crime_rate_percent_change_per_100,000_people: num [1:1104] NA -11.6 4.5 0 9.4 5.5 -2.9 -10.9 -12.7 -1.1 ...
## $ murder_per_100,000_people : num [1:1104] 3.8 2.4 3.7 1.3 1.3 2.5 13.5 1.2 6.1 2.5 ...
## $ rape_per_100,000_people : num [1:1104] 6.3 2.4 8.5 2.5 8.8 14.9 15.9 22 10.9 18.6 ...
## $ robbery_per_100,000_people : num [1:1104] 25.1 28.6 39 22.5 22.6 32.3 29.3 22 23 7.4 ...
## $ agg__assault_per_100,000_people : num [1:1104] 143.1 70.3 103.5 101.3 105.4 ...
## $ b_&_e_per_100,000_people : num [1:1104] 840 692 721 674 630 ...
## $ larceny_theft_per_100,000_people : num [1:1104] 1789 1649 1693 1738 2021 ...
## $ m_v_theft_per_100,000_people : num [1:1104] 117 87 124 125 124 ...
## $ murder__rate_percent_change_per_100,000_people : num [1:1104] NA -36.7 53.3 -65.8 0.3 ...
## $ rape_rate_percent_change_per_100,000_people : num [1:1104] NA -62 257.8 -70.7 251.1 ...
## $ robbery_rate_percent_change_per_100,000_people : num [1:1104] NA 13.9 36.3 -42.2 0.3 43.1 -9.2 -25.1 4.6 -67.7 ...
## $ agg__assault__rate_percent_change_per_100,000_people : num [1:1104] NA -50.9 47.3 -2.2 4 -6.8 25.8 -20.9 10.2 -23.1 ...
## $ b_&_e_rate_percent_change_per_100,000_people : num [1:1104] NA -17.6 4.2 -6.5 -6.6 6.8 -2 -17.1 -23.1 6.3 ...
## $ larceny_theft__rate_percent_change_per_100,000_people: num [1:1104] NA -7.8 2.7 2.7 16.3 4.9 -2.1 -7.6 -10.9 -3.2 ...
## $ m_v_theft__rate_percent_change_per_100,000_people : num [1:1104] NA -25.5 42.8 0.7 -0.7 8.1 -19.8 -37.6 20.7 3.7 ...
## - attr(*, "spec")=
## .. cols(
## .. JURISDICTION = col_character(),
## .. YEAR = col_double(),
## .. POPULATION = col_double(),
## .. MURDER = col_double(),
## .. RAPE = col_double(),
## .. ROBBERY = col_double(),
## .. `AGG. ASSAULT` = col_double(),
## .. `B & E` = col_double(),
## .. `LARCENY THEFT` = col_double(),
## .. `M/V THEFT` = col_double(),
## .. `GRAND TOTAL` = col_double(),
## .. `PERCENT CHANGE` = col_double(),
## .. `VIOLENT CRIME TOTAL` = col_double(),
## .. `VIOLENT CRIME PERCENT` = col_double(),
## .. `VIOLENT CRIME PERCENT CHANGE` = col_double(),
## .. `PROPERTY CRIME TOTALS` = col_double(),
## .. `PROPERTY CRIME PERCENT` = col_double(),
## .. `PROPERTY CRIME PERCENT CHANGE` = col_double(),
## .. `OVERALL CRIME RATE PER 100,000 PEOPLE` = col_double(),
## .. `OVERALL PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `VIOLENT CRIME RATE PER 100,000 PEOPLE` = col_double(),
## .. `VIOLENT CRIME RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `PROPERTY CRIME RATE PER 100,000 PEOPLE` = col_double(),
## .. `PROPERTY CRIME RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `MURDER PER 100,000 PEOPLE` = col_double(),
## .. `RAPE PER 100,000 PEOPLE` = col_double(),
## .. `ROBBERY PER 100,000 PEOPLE` = col_double(),
## .. `AGG. ASSAULT PER 100,000 PEOPLE` = col_double(),
## .. `B & E PER 100,000 PEOPLE` = col_double(),
## .. `LARCENY THEFT PER 100,000 PEOPLE` = col_double(),
## .. `M/V THEFT PER 100,000 PEOPLE` = col_double(),
## .. `MURDER RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `RAPE RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `ROBBERY RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `AGG. ASSAULT RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `B & E RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `LARCENY THEFT RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double(),
## .. `M/V THEFT RATE PERCENT CHANGE PER 100,000 PEOPLE` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
dim(crime_county)
## [1] 1104 38
top_five <- crime_county |>
  filter(jurisdiction %in% c("Howard County", "Montgomery County", "Prince George's County", "Baltimore County", "Anne Arundel County")) |> # keep only the five counties we are looking at
  mutate(crime_population_rate = (grand_total / population) * 100000) |> # create a variable that shows the crime rate per 100,000 people
  select(jurisdiction, year, population, crime_population_rate) # keep only jurisdiction, year, population, and crime rate
head(top_five)
## # A tibble: 6 × 4
## jurisdiction year population crime_population_rate
## <chr> <dbl> <dbl> <dbl>
## 1 Anne Arundel County 1975 331390 6760.
## 2 Anne Arundel County 1976 340345 5507.
## 3 Anne Arundel County 1977 347538 5322.
## 4 Anne Arundel County 1978 363169 4714.
## 5 Anne Arundel County 1979 361749 4825.
## 6 Anne Arundel County 1980 370099 5489.
colSums(is.na(top_five)) # count the NA values in each column
## jurisdiction year population
## 0 0 0
## crime_population_rate
## 0
options(scipen = 999)
top_five |>
  ggplot(aes(x = year, y = crime_population_rate, fill = jurisdiction, colour = jurisdiction)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  labs(title = "Crime Rate Per 100,000 in Maryland (MOCO, PG, HOWARD, AA, BALT)")
Based on the graph, we can see a downward trend in the crime rate in all five counties. At the beginning and through most of the graph, Prince George’s County had the highest crime rate, but by the end Baltimore County has the highest crime rate per 100,000. Howard County and Montgomery County usually have the lowest crime rates, while Anne Arundel County is usually in the middle of the five. In all five counties there is a big spike in crime around 1980, and what happened during this period could be researched further. However, the bigger question is how the five counties were able to bring the crime rate down, whether through more law enforcement, new safety rules and laws, or something different entirely. I believe that is the most important thing to research after seeing this graph.
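One possible way to put numbers on a trend like the one described above is to summarise the first rate, last rate, percent change, and peak year for each county. The sketch below is only an illustration: it uses just the six Anne Arundel County rows printed by head(top_five), with rates rounded as they are displayed, so it covers 1975 to 1980 and not the full period. The names aa_sample and trend_summary are my own. Running the same summarise step on the full top_five table would give one row per county.

```r
library(dplyr)

# First six Anne Arundel County rows as printed by head(top_five).
# Rates are the rounded values shown in the output, not exact values.
aa_sample <- tibble(
  jurisdiction = "Anne Arundel County",
  year = 1975:1980,
  crime_population_rate = c(6760, 5507, 5322, 4714, 4825, 5489)
)

# trend_summary is a hypothetical name used only for this sketch
trend_summary <- aa_sample |>
  arrange(jurisdiction, year) |> # make sure first() and last() follow year order
  group_by(jurisdiction) |>
  summarise(
    first_rate = first(crime_population_rate),
    last_rate = last(crime_population_rate),
    pct_change = (last_rate - first_rate) / first_rate * 100, # percent change from first to last year
    peak_year = year[which.max(crime_population_rate)] # year with the highest rate
  )

trend_summary
```

For this small sample, the rate falls by roughly 19 percent between 1975 and 1980, and the highest rate is in 1975. The same columns computed on all years would make it easier to compare how far each county's rate fell.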
To find the populations of the counties: https://worldpopulationreview.com/us-counties/maryland
To find the formula of crime rate: https://www.criminaljustice.ny.gov/crimnet/ojsa/countycrimestats.htm
The visualization techniques came from past knowledge in DATA-110.