This dataset, from opendata.maryland.gov, reports counts of violent crime and property crime for each municipality in Maryland. Alongside the population of each city and county, it lists counts for individual offenses such as murder, rape, robbery, and breaking and entering (B & E). For this project I use two columns: the property crime rate per 100,000 people and the violent crime rate per 100,000 people. Because the dataset covers so many cities and counties, I narrow the focus to Montgomery County and Prince George's County in 2020. I then fit a linear regression of violent crime rate on property crime rate for the jurisdictions in these two counties and compare the two counties.
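Per-100,000 rates put places of very different sizes on the same scale. The sketch below shows that conversion, assuming the dataset's rate columns follow the usual definition (count ÷ population × 100,000). The helper `rate_per_100k` and the example numbers are hypothetical and are not taken from the Maryland data.

```r
# Hypothetical helper: converts a raw crime count into a rate per 100,000 residents,
# assuming the standard definition rate = count / population * 100,000.
rate_per_100k <- function(count, population) {
  stopifnot(all(population > 0))  # a rate is undefined for a zero population
  count / population * 100000
}

# Hypothetical example values, not from the dataset:
# a town of 25,000 people with 50 property crimes, and a city of 150,000 with 300.
rate_per_100k(c(50, 300), c(25000, 150000))
#> [1] 200 200
```

Both places come out at 200 per 100,000 even though one has six times the population of the other. This normalization is why the analysis compares rates rather than raw counts.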
# load the libraries
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 4.0.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
library(ggthemes)
Warning: package 'ggthemes' was built under R version 4.5.2
library(ggrepel)
Warning: package 'ggrepel' was built under R version 4.5.2
# read the dataset (the CSV file is expected to be in the working directory)
dataset <- read_csv("Violent_Crime_&_Property_Crime_by_Municipality__2000_to_Present_20260323.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 4284 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): JURISDICTION, COUNTY, PERCENT CHANGE, VIOLENT CRIME PERCENT, VIOLE...
dbl (4): YEAR, MURDER, RAPE, RAPE PER 100,000 PEOPLE
num (18): POPULATION, ROBBERY, AGG. ASSAULT, B & E, LARCENY THEFT, M/V THEFT...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# first plot: every Maryland jurisdiction in 2020, colored by county
p1 <- ggplot(dataset_2020, aes(x = property_crime_rate_per_100000_people,
                               y = violent_crime_rate_per_100000_people)) +
  geom_point(aes(colour = county), size = 1.5) +
  labs(title = "Property Crime Vs. Violent Crime In Maryland \n Per 100,000",
       caption = "Source: opendata.maryland.gov",
       x = "Property Crime rates in Maryland per 100,000 (2020)",
       y = "Violent Crime rates in Maryland per 100,000 (2020)") +
  # theme_tufte() replaces any earlier complete theme, so the base size is set here
  theme_tufte(base_size = 12)
p1
# add jurisdiction labels to the points
p2 <- ggplot(dataset_2020, aes(x = property_crime_rate_per_100000_people,
                               y = violent_crime_rate_per_100000_people)) +
  geom_point(aes(colour = county), size = 1.5) +
  geom_text_repel(aes(label = jurisdiction), nudge_x = 0.5, size = 1.8) +
  labs(title = "Property Crime Vs. Violent Crime In Maryland \n Per 100,000",
       caption = "Source: opendata.maryland.gov",
       x = "Property Crime rates in Maryland per 100,000 (2020)",
       y = "Violent Crime rates in Maryland per 100,000 (2020)") +
  theme_tufte(base_size = 12)
p2
Warning: ggrepel: 114 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
# focus on only Montgomery County and Prince George's County
moco_pg <- dataset_2020 |>
  filter(county %in% c("Montgomery", "Prince George's"))
head(moco_pg)
# similar graph showing only Montgomery County (MOCO) and Prince George's County (PG)
p6 <- ggplot(moco_pg, aes(x = property_crime_rate_per_100000_people,
                          y = violent_crime_rate_per_100000_people,
                          label = jurisdiction)) +
  geom_point(aes(colour = county), size = 2) +
  geom_text_repel(aes(label = jurisdiction), nudge_x = 0.5, size = 2) +
  labs(title = "Property Crime Vs. Violent Crime In Maryland \n Per 100,000 (MOCO and PG)",
       caption = "Source: opendata.maryland.gov",
       x = "Property Crime rates in Maryland per 100,000 (2020)",
       y = "Violent Crime rates in Maryland per 100,000 (2020)") +
  theme_tufte(base_size = 12) +
  scale_color_brewer(palette = "Oranges")
p6
Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
# linear model: add one ordinary least-squares line fitted to both counties together.
# inherit.aes = FALSE keeps the text-only `label` aesthetic out of the smoother,
# which avoids the "aesthetics were dropped" warning.
p7 <- p6 +
  geom_smooth(aes(x = property_crime_rate_per_100000_people,
                  y = violent_crime_rate_per_100000_people),
              inherit.aes = FALSE, method = "lm", formula = y ~ x)
p7
Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
# linear regression of violent crime rate on property crime rate: lm(y ~ x)
fit5 <- lm(violent_crime_rate_per_100000_people ~ property_crime_rate_per_100000_people,
           data = moco_pg)
summary(fit5)
Call:
lm(formula = violent_crime_rate_per_100000_people ~ property_crime_rate_per_100000_people,
data = moco_pg)
Residuals:
Min 1Q Median 3Q Max
-481.22 -128.20 -25.07 63.13 526.85
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.34135 49.99854 0.947 0.349
property_crime_rate_per_100000_people 0.12119 0.02164 5.600 1.39e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 196.2 on 43 degrees of freedom
Multiple R-squared: 0.4217, Adjusted R-squared: 0.4083
F-statistic: 31.36 on 1 and 43 DF, p-value: 1.393e-06
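The summary output can be checked with a little arithmetic. This sketch uses only the numbers printed above, with n = 45 inferred from the 43 residual degrees of freedom plus the 2 estimated coefficients. It shows how the slope's t value, the F-statistic, and the adjusted R² relate, and it turns the fitted line into a prediction. The property crime rate of 2,000 per 100,000 is an arbitrary illustrative value and does not correspond to any particular jurisdiction in the data.

```r
# Values copied from summary(fit5) above
intercept <- 47.34135
slope     <- 0.12119
slope_se  <- 0.02164
r_squared <- 0.4217
n         <- 43 + 2   # residual degrees of freedom + number of coefficients
p         <- 1        # one predictor (property crime rate)

# t statistic for the slope: estimate divided by its standard error
t_value <- slope / slope_se

# With a single predictor, the overall F-statistic equals t squared
f_stat <- t_value^2

# Adjusted R^2 penalizes R^2 for the number of predictors used
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Predicted violent crime rate at an illustrative property crime rate of 2,000 per 100,000
predicted <- intercept + slope * 2000

round(c(t = t_value, F = f_stat, adj_R2 = adj_r_squared, predicted = predicted), 4)
```

These hand-computed values agree with the printed summary (t ≈ 5.60, F ≈ 31.36, adjusted R² ≈ 0.408), and the prediction works out to roughly 290 violent crimes per 100,000. The identity F = t² holds only for a model with one predictor, so this check would not carry over if more variables were added.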
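To clean the dataset, I applied tolower() and gsub() to the column names. This converted them to lowercase, removed periods and commas, and replaced spaces with underscores, so that, for example, the two rate columns became property_crime_rate_per_100000_people and violent_crime_rate_per_100000_people. I then used filter() to keep only the rows for 2020 and, for the focused plots and the model, only Montgomery County and Prince George's County. The visualizations suggest that Prince George's County jurisdictions mostly have higher crime rates than those in Montgomery County: their points, shown in a different color, generally sit higher on the violent crime axis. Across the 45 jurisdictions in the model, the linear regression shows a statistically significant positive association between the two rates (slope ≈ 0.121, p = 1.39e-06). In other words, each additional 100 property crimes per 100,000 is associated with about 12 more violent crimes per 100,000. The relationship is far from exact, though: property crime rate explains only about 42% of the variation in violent crime rate (R² = 0.42). Because this is an observational comparison, it shows that the two rates tend to rise and fall together, not that one causes the other.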
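The code for the cleaning step described above is not shown. The sketch below illustrates what it could look like, assuming the raw names are lowercased, periods and commas are dropped, and runs of whitespace become underscores. `clean_names_sketch` is a hypothetical helper, and the actual gsub() patterns used for this project may differ. The input names are column headers that appear in the readr column specification above. The 2020 filter is left as a comment because it needs the loaded data.

```r
# Hypothetical helper sketching the column-name cleanup described in the text:
# lowercase everything, drop periods and commas, then turn whitespace into underscores.
clean_names_sketch <- function(x) {
  x <- tolower(x)                    # "AGG. ASSAULT" -> "agg. assault"
  x <- gsub("[.,]", "", x)           # remove periods and commas: "100,000" -> "100000"
  x <- gsub("[[:space:]]+", "_", x)  # collapse each run of whitespace into one underscore
  x
}

# Raw column names as they appear in the readr column specification
raw_names <- c("JURISDICTION", "COUNTY", "YEAR", "AGG. ASSAULT", "RAPE PER 100,000 PEOPLE")
clean_names_sketch(raw_names)
# returns c("jurisdiction", "county", "year", "agg_assault", "rape_per_100000_people")

# With the real data loaded, the cleanup and the 2020 filter might look like:
# names(dataset) <- clean_names_sketch(names(dataset))
# dataset_2020 <- dplyr::filter(dataset, year == 2020)
```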