This analysis applies the data transformation and visualization techniques from Chapters 2 and 3 of Basic Statistics Using R for Crime Analysis (Choi, 2025) to the 2018 Pennsylvania Uniform Crime Report (UCR) dataset.
The UCR is compiled by the FBI from reports submitted by more than 18,000 police departments across the United States. This dataset covers Part 1 offenses — the most serious index crimes — reported by Pennsylvania municipalities in 2018.
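Part 1 Violent Crimes include: murder and nonnegligent manslaughter, rape, robbery, and aggravated assault.
Part 1 Property Crimes include: burglary, larceny-theft, motor vehicle theft, and arson.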
Note on data limitations: The UCR only counts crimes reported to the police. Unreported crime — the so-called “dark figure of crime” — is not captured here.
Following the approach in Chapter 3, we install and load the
readxl package to import the Excel file, and
tidyverse for data manipulation and visualization.
# Install packages if not already installed
# install.packages("readxl")
# install.packages("tidyverse")
# install.packages("rstudioapi") # used to resolve the working directory
library(readxl)
library(tidyverse)
library(rstudioapi)
We use read_excel() from the readxl package, consistent with Chapter 3’s approach to reading .xlsx files. Note the X prefix convention: a syntactic R object name cannot begin with a digit, so the imported object is named X2018_UCR_PA.
## # A tibble: 6 × 12
## City Population `Violent\r\ncrime` Murder and\r\nnonneg…¹ Rape Robbery
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abington T… 55631 44 1 6 12
## 2 Adamstown 1857 3 0 0 0
## 3 Adams Town… 14105 3 0 0 0
## 4 Adams Town… 5581 0 0 0 0
## 5 Akron 4015 7 0 1 0
## 6 Albion 1466 0 0 0 0
## # ℹ abbreviated name: ¹`Murder and\r\nnonnegligent\r\nmanslaughter`
## # ℹ 6 more variables: `Aggravated\r\nassault` <dbl>, `Property\r\ncrime` <dbl>,
## # Burglary <dbl>, `Larceny-\r\ntheft` <dbl>,
## # `Motor\r\nvehicle\r\ntheft` <dbl>, Arson <dbl>
## [1] 989 12
The dataset has 989 municipalities and 12 variables. Each row represents a city or township, consistent with the UCR’s unit of analysis (unlike survey data where each row represents an individual respondent).
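The X prefix rule mentioned earlier can be illustrated with base R's make.names(), which converts a string into a syntactic name. This is only a sketch: the file name 2018_UCR_PA.xlsx is an assumption inferred from the object name X2018_UCR_PA (the source does not state the path), and the import step runs only if that file and the readxl package are actually present.

```r
# Hypothetical file name, inferred from the object name used in this analysis;
# the actual path is not given in the source.
xlsx_name <- "2018_UCR_PA.xlsx"

# Strip the extension, then convert the remainder to a syntactic R name.
# make.names() prepends "X" when the string starts with a digit.
object_name <- make.names(tools::file_path_sans_ext(xlsx_name))
print(object_name)

# Import only when the file and readxl are available, so the sketch is safe to run anywhere.
if (file.exists(xlsx_name) && requireNamespace("readxl", quietly = TRUE)) {
  ucr_raw <- readxl::read_excel(xlsx_name)
  print(dim(ucr_raw))  # rows (municipalities) and columns (variables)
}
```

The variable ucr_raw is a placeholder name chosen for this sketch; the analysis itself stores the data in X2018_UCR_PA.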
## City Population Violent\r\ncrime
## Length:989 Min. : 132 Min. : 0.00
## Class :character 1st Qu.: 2066 1st Qu.: 1.00
## Mode :character Median : 4320 Median : 5.00
## Mean : 10054 Mean : 34.16
## 3rd Qu.: 9088 3rd Qu.: 15.00
## Max. :1586916 Max. :14420.00
## Murder and\r\nnonnegligent\r\nmanslaughter Rape
## Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.0000 Median : 0.000
## Mean : 0.6977 Mean : 2.971
## 3rd Qu.: 0.0000 3rd Qu.: 1.000
## Max. :351.0000 Max. :1095.000
## Robbery Aggravated\r\nassault Property\r\ncrime Burglary
## Min. : 0.000 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 9.0 1st Qu.: 1.00
## Median : 0.000 Median : 4.00 Median : 40.0 Median : 5.00
## Mean : 9.449 Mean : 21.05 Mean : 164.6 Mean : 21.42
## 3rd Qu.: 2.000 3rd Qu.: 11.00 3rd Qu.: 105.0 3rd Qu.: 12.00
## Max. :5262.000 Max. :7712.00 Max. :49145.0 Max. :6497.00
## Larceny-\r\ntheft Motor\r\nvehicle\r\ntheft Arson
## Min. : 0.0 Min. : 0.00 Min. : 0.000
## 1st Qu.: 7.0 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 32.0 Median : 1.00 Median : 0.000
## Mean : 131.3 Mean : 11.84 Mean : 1.147
## 3rd Qu.: 89.0 3rd Qu.: 4.00 3rd Qu.: 0.000
## Max. :36968.0 Max. :5680.00 Max. :430.000
Observation: The variable names contain embedded carriage-return and newline characters (\r\n) from the Excel formatting, for example Violent\r\ncrime. These must be renamed before analysis, as shown in Chapter 3.
Using rename_with() from dplyr (part of tidyverse), we replace all twelve column names in one operation. The replacement is positional, so the new names must be listed in the same order as the original columns. This approach avoids issues with backtick-escaped rename() calls, since the exact newline escape sequence can vary depending on the operating system and how Excel wrote the file.
X2018_UCR_PA_cleaned <- X2018_UCR_PA %>%
rename_with(~ c(
"City", "Population",
"violent.crime", "murder.manslaughter",
"rape", "robbery", "aggravated.assault",
"property.crime", "burglary", "larceny.theft",
"motor.theft", "arson"
))
# Confirm cleaned names
names(X2018_UCR_PA_cleaned)
## [1] "City" "Population" "violent.crime"
## [4] "murder.manslaughter" "rape" "robbery"
## [7] "aggravated.assault" "property.crime" "burglary"
## [10] "larceny.theft" "motor.theft" "arson"
## City Population violent.crime murder.manslaughter
## Length:989 Min. : 132 Min. : 0.00 Min. : 0.0000
## Class :character 1st Qu.: 2066 1st Qu.: 1.00 1st Qu.: 0.0000
## Mode :character Median : 4320 Median : 5.00 Median : 0.0000
## Mean : 10054 Mean : 34.16 Mean : 0.6977
## 3rd Qu.: 9088 3rd Qu.: 15.00 3rd Qu.: 0.0000
## Max. :1586916 Max. :14420.00 Max. :351.0000
## rape robbery aggravated.assault property.crime
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 9.0
## Median : 0.000 Median : 0.000 Median : 4.00 Median : 40.0
## Mean : 2.971 Mean : 9.449 Mean : 21.05 Mean : 164.6
## 3rd Qu.: 1.000 3rd Qu.: 2.000 3rd Qu.: 11.00 3rd Qu.: 105.0
## Max. :1095.000 Max. :5262.000 Max. :7712.00 Max. :49145.0
## burglary larceny.theft motor.theft arson
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 7.0 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 5.00 Median : 32.0 Median : 1.00 Median : 0.000
## Mean : 21.42 Mean : 131.3 Mean : 11.84 Mean : 1.147
## 3rd Qu.: 12.00 3rd Qu.: 89.0 3rd Qu.: 4.00 3rd Qu.: 0.000
## Max. :6497.00 Max. :36968.0 Max. :5680.00 Max. :430.000
All variables now display clean, dot-separated names and proper numeric summary statistics (minimum, quartiles, mean, median, maximum).
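As a hedged alternative to listing names by position, the embedded line breaks can also be removed by pattern. The sketch below uses only base R and a handful of the raw headers copied from the head() output; the variable names raw_names and clean_names are introduced here for illustration.

```r
# A few of the raw headers as printed by head(), including the "\r\n" sequences.
raw_names <- c(
  "City",
  "Violent\r\ncrime",
  "Murder and\r\nnonnegligent\r\nmanslaughter",
  "Larceny-\r\ntheft",
  "Motor\r\nvehicle\r\ntheft"
)

# Replace every run of non-letter characters (spaces, hyphens, \r, \n) with one dot,
# then lower-case the result.
clean_names <- tolower(gsub("[^A-Za-z]+", ".", raw_names))
print(clean_names)
```

This pattern-based approach does not depend on column order, but it yields longer names (for example murder.and.nonnegligent.manslaughter rather than murder.manslaughter) and lower-cases City, so it is not a drop-in replacement for the names used in the rest of this analysis.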
The crime rate is a standardized measure: the number of Part 1 offenses per 100,000 residents. This allows meaningful comparison across municipalities of very different sizes.
\[\text{Crime Rate} = \frac{\text{Violent Crime} + \text{Property Crime}}{\text{Population}} \times 100{,}000\]
Using the mutate() function from dplyr:
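X2018_UCR_PA_cleaned <- X2018_UCR_PA_cleaned %>%
  mutate(
    # Overall Part 1 crime rate per 100,000 residents
    crime.rate = ((violent.crime + property.crime) / Population) * 100000,
    # Component rates; the grouped summary and violent-crime histogram below use these
    violent.crime.rate = (violent.crime / Population) * 100000,
    property.crime.rate = (property.crime / Population) * 100000
  )
# View city and crime rate side by side
X2018_UCR_PA_cleaned %>%
  select(City, Population, violent.crime, property.crime, crime.rate) %>%
  head(10)
## # A tibble: 10 × 5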
## City Population violent.crime property.crime crime.rate
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Abington Township, Montgo… 55631 44 949 1785.
## 2 Adamstown 1857 3 14 915.
## 3 Adams Township, Butler Co… 14105 3 46 347.
## 4 Adams Township, Cambria C… 5581 0 11 197.
## 5 Akron 4015 7 34 1021.
## 6 Albion 1466 0 13 887.
## 7 Alburtis 2663 3 5 300.
## 8 Aldan 4157 3 100 2478.
## 9 Aleppo Township 1876 0 4 213.
## 10 Aliquippa 8946 50 81 1464.
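The printed rates can be checked by hand against the formula. The following sketch recomputes three rows from the table using base R only; the helper name crime_rate and the data frame checks are assumptions made for this illustration.

```r
# Part 1 crime rate per 100,000 residents, following the formula above.
crime_rate <- function(violent, property, population) {
  (violent + property) / population * 100000
}

# Counts copied from the printed rows for Abington Township, Adamstown, and Aliquippa.
checks <- data.frame(
  city = c("Abington Township", "Adamstown", "Aliquippa"),
  violent = c(44, 3, 50),
  property = c(949, 14, 81),
  population = c(55631, 1857, 8946)
)
checks$crime.rate <- round(crime_rate(checks$violent, checks$property, checks$population))
print(checks)
```

The rounded results (1785, 915, and 1464) match the crime.rate column shown above.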
Using arrange() with desc() to rank
municipalities from highest to lowest:
arranged_data <- X2018_UCR_PA_cleaned %>%
select(City, Population, crime.rate) %>%
arrange(desc(crime.rate))
# Top 15 municipalities by overall crime rate
head(arranged_data, 15)
## # A tibble: 15 × 3
## City Population crime.rate
## <chr> <dbl> <dbl>
## 1 Wilkes-Barre Township 2889 17757.
## 2 Frazer Township 1130 14425.
## 3 Eddystone 2412 10904.
## 4 Homestead 3162 9962.
## 5 Southwest Regional, Washington County 132 6818.
## 6 Muncy Township 1063 6679.
## 7 Tullytown 1839 6417.
## 8 McKees Rocks 5929 6409.
## 9 Union Township, Lawrence County 4921 6198.
## 10 Uniontown 9751 6061.
## 11 East Rochester 537 5400.
## 12 West Lebanon Township 822 5353.
## 13 Upland 3250 5015.
## 14 Arnold 4868 4786.
## 15 Chester 34087 4767.
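For readers more familiar with base R, the same descending sort can be sketched with order(). The data frame below holds four rounded rates copied from the table above, deliberately out of order; it is a toy cross-check, not the full dataset.

```r
# Rounded rates copied from the printed top-15 table, deliberately out of order.
rates <- data.frame(
  City = c("Chester", "Wilkes-Barre Township", "Eddystone", "Frazer Township"),
  crime.rate = c(4767, 17757, 10904, 14425)
)

# order() with decreasing = TRUE plays the role of arrange(desc(crime.rate)).
ranked <- rates[order(rates$crime.rate, decreasing = TRUE), ]
rownames(ranked) <- NULL
print(ranked)
```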
# Bottom 10 municipalities (excluding zeroes which may indicate non-reporting)
X2018_UCR_PA_cleaned %>%
select(City, Population, crime.rate) %>%
filter(crime.rate > 0) %>%
arrange(crime.rate) %>%
head(10)
## # A tibble: 10 × 3
## City Population crime.rate
## <chr> <dbl> <dbl>
## 1 Scott Township, Lackawanna County 4753 21.0
## 2 Mahoning Township, Lawrence County 2911 34.4
## 3 Shade Township 2602 38.4
## 4 Lamar Township 2552 39.2
## 5 Ryan Township 2510 39.8
## 6 Liberty 2477 40.4
## 7 Cornwall 4326 46.2
## 8 Jackson Township, Cambria County 4083 49.0
## 9 Lykens 1770 56.5
## 10 Orangeville Area 1731 57.8
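Several municipalities at the extremes of these rankings have small populations, and the per-100,000 scaling magnifies each reported offense there. A rough sketch of that arithmetic follows, using populations copied from the tables above; the helper name rate_per_offense is introduced here for illustration only.

```r
# How much one additional reported offense changes the rate per 100,000.
rate_per_offense <- function(population) 100000 / population

# Populations copied from the ranking tables: Southwest Regional (132),
# Wilkes-Barre Township (2,889), and Chester (34,087).
populations <- c(132, 2889, 34087)
print(round(rate_per_offense(populations), 1))
```

In a municipality of 132 residents a single offense moves the rate by roughly 758 per 100,000, compared with about 3 per 100,000 in Chester, so rankings of very small places should be read with caution.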
Following Chapter 3, we use cut() to categorize
municipalities by population size into five ordered groups:
breaks <- c(0, 10000, 50000, 100000, 500000, Inf)
labels <- c("Small", "Medium", "Large", "Very Large", "Metropolitan")
X2018_UCR_PA_cleaned <- X2018_UCR_PA_cleaned %>%
mutate(
population.category = cut(
Population,
breaks = breaks,
labels = labels,
include.lowest = TRUE
)
)
# Distribution of municipalities by population category
summary(X2018_UCR_PA_cleaned$population.category)
## Small Medium Large Very Large Metropolitan
## 763 209 14 2 1
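It may help to see how cut() treats values on a boundary. By default its intervals are closed on the right, so a population of exactly 10,000 falls in Small and 10,001 in Medium. The sketch below reuses the same breaks and labels on a few test values; test_pop is a made-up vector, apart from 132 and 1,586,916, which are the minimum and maximum populations in the summary.

```r
breaks <- c(0, 10000, 50000, 100000, 500000, Inf)
labels <- c("Small", "Medium", "Large", "Very Large", "Metropolitan")

# Test populations: the smallest and largest values in the summary,
# plus values on either side of the 10,000 boundary.
test_pop <- c(132, 10000, 10001, 1586916)

categories <- cut(test_pop, breaks = breaks, labels = labels, include.lowest = TRUE)
print(data.frame(population = test_pop, category = categories))
```

Here include.lowest = TRUE only affects a value equal to the lowest break (0); it does not change how the interior boundaries are assigned.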
Using group_by() and summarize() to compute the number of municipalities, average crime rates, and total offense counts by population category:
crime.table <- X2018_UCR_PA_cleaned %>%
group_by(population.category) %>%
summarize(
n_municipalities = n(),
avg.crime.rate = mean(crime.rate, na.rm = TRUE),
avg.violent.rate = mean(violent.crime.rate, na.rm = TRUE),
avg.property.rate = mean(property.crime.rate, na.rm = TRUE),
total.violent = sum(violent.crime, na.rm = TRUE),
total.property = sum(property.crime, na.rm = TRUE)
)
crime.table
## # A tibble: 5 × 7
## population.category n_municipalities avg.crime.rate avg.violent.rate
## <fct> <int> <dbl> <dbl>
## 1 Small 763 1184. 192.
## 2 Medium 209 1466. 177.
## 3 Large 14 1917. 323.
## 4 Very Large 2 3125. 459.
## 5 Metropolitan 1 4006. 909.
## # ℹ 3 more variables: avg.property.rate <dbl>, total.violent <dbl>,
## # total.property <dbl>
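Interpretation: In this table the average crime rate rises with each population category, from about 1,184 per 100,000 in Small municipalities to about 4,006 in the single Metropolitan municipality. Because the rates are standardized per 100,000 residents, the increase is not a mechanical effect of having more residents. The Very Large and Metropolitan categories contain only two and one municipalities respectively, so their averages should be read with caution.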
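Note that mean(crime.rate) gives every municipality equal weight regardless of its population. A pooled rate, total offenses divided by total population, weights by size and can differ noticeably. The sketch below contrasts the two on two rows from the earlier output; grouping these two municipalities together is purely hypothetical and does not correspond to any of the population categories.

```r
# Two municipalities from the printed rows: Abington Township and Adamstown.
toy <- data.frame(
  population = c(55631, 1857),
  offenses = c(44 + 949, 3 + 14)  # violent + property counts
)

# Unweighted: average the two municipality-level rates.
mean_of_rates <- mean(toy$offenses / toy$population * 100000)

# Pooled: total offenses over total population.
pooled_rate <- sum(toy$offenses) / sum(toy$population) * 100000

print(round(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate), 1))
```

The unweighted mean is about 1,350 per 100,000 while the pooled rate is about 1,757, because the larger municipality dominates the pooled figure. Which of the two is appropriate depends on whether the unit of interest is the municipality or the resident.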
All visualizations use ggplot2 (part of tidyverse), following the template introduced in Chapter 2:
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
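One way to read the template is to fill each placeholder literally. The sketch below does so with a small data frame of rounded averages copied from the printed crime.table output; avg_rates and p are names chosen for this illustration, and the plot is assigned to an object rather than drawn.

```r
library(ggplot2)

# Rounded average crime rates by category, copied from the printed crime.table output.
category_levels <- c("Small", "Medium", "Large", "Very Large", "Metropolitan")
avg_rates <- data.frame(
  population.category = factor(category_levels, levels = category_levels),
  avg.crime.rate = c(1184, 1466, 1917, 3125, 4006)
)

# <DATA> = avg_rates, <GEOM_FUNCTION> = geom_col, <MAPPINGS> = the x and y aesthetics.
p <- ggplot(data = avg_rates) +
  geom_col(mapping = aes(x = population.category, y = avg.crime.rate))

# Calling print(p) would draw the chart on the active graphics device.
```

The plots in this analysis pipe the data into ggplot() and place aes() in that call instead, which is equivalent for a single-layer plot.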
A histogram is appropriate for continuous variables like crime rate.
Per Chapter 3, we use geom_histogram() with
theme_minimal() for a clean appearance.
X2018_UCR_PA_cleaned %>%
ggplot(aes(x = crime.rate, fill = after_stat(count))) +  # after_stat() replaces the deprecated ..count.. notation
geom_histogram(bins = 40, color = "white") +
scale_fill_gradient(low = "steelblue", high = "darkred") +
labs(
x = "Crime Rate (per 100,000 residents)",
y = "Number of Municipalities",
title = "Distribution of Overall Crime Rates Across Pennsylvania Municipalities (2018)",
fill = "Count"
) +
theme_minimal()
Interpretation: The distribution is strongly right-skewed: most Pennsylvania municipalities have relatively low crime rates, while a small number of outliers drive the upper tail. This is a common pattern in crime data.
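A quick numeric signature of right skew is a mean that exceeds the median. As a minimal sketch, the comparison below uses the ten rounded crime rates from the first rows printed earlier; this is a small illustrative sample, not the full dataset.

```r
# Rounded crime rates from the first ten rows printed earlier.
rates <- c(1785, 915, 347, 197, 1021, 887, 300, 2478, 213, 1464)

# In a right-skewed sample the mean is pulled above the median by the large values.
print(c(mean = mean(rates), median = median(rates)))
```

The full-data summary later in this analysis shows the same pattern, with a mean of about 1,261 against a median of about 905.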
Bar charts are suited for categorical variables, as covered in Chapter 2’s discussion of the RACE variable in the GSS data.
pop.bar <- X2018_UCR_PA_cleaned %>%
filter(!is.na(population.category)) %>%
ggplot(aes(x = population.category, fill = population.category)) +
geom_bar() +
scale_fill_manual(
values = c("Small" = "steelblue",
"Medium" = "darkcyan",
"Large" = "goldenrod3",
"Very Large" = "darkorange",
"Metropolitan" = "darkred"),
guide = "none"
) +
labs(
x = "Population Category",
y = "Number of Municipalities",
title = "Pennsylvania Municipalities by Population Category (2018)"
) +
theme_minimal()
pop.bar
crime.table %>%
filter(!is.na(population.category)) %>%
ggplot(aes(x = population.category,
y = avg.crime.rate,
fill = population.category)) +
geom_col() +
scale_fill_manual(
values = c("Small" = "steelblue",
"Medium" = "darkcyan",
"Large" = "goldenrod3",
"Very Large" = "darkorange",
"Metropolitan" = "darkred"),
guide = "none"
) +
labs(
x = "Population Category",
y = "Average Crime Rate (per 100,000)",
title = "Average Crime Rate by Municipality Population Category (2018)"
) +
theme_minimal()
crime.table %>%
filter(!is.na(population.category)) %>%
select(population.category, avg.violent.rate, avg.property.rate) %>%
pivot_longer(
cols = c(avg.violent.rate, avg.property.rate),
names_to = "crime.type",
values_to = "avg.rate"
) %>%
mutate(crime.type = recode(crime.type,
"avg.violent.rate" = "Violent Crime",
"avg.property.rate" = "Property Crime"
)) %>%
ggplot(aes(x = population.category, y = avg.rate, fill = crime.type)) +
geom_col(position = "stack") +
scale_fill_manual(values = c("Violent Crime" = "darkred",
"Property Crime" = "steelblue")) +
labs(
x = "Population Category",
y = "Average Rate (per 100,000)",
fill = "Crime Type",
title = "Average Violent vs. Property Crime Rates by Population Category (2018)"
) +
theme_minimal()
X2018_UCR_PA_cleaned %>%
filter(violent.crime.rate > 0) %>%
ggplot(aes(x = violent.crime.rate, fill = after_stat(count))) +  # after_stat() replaces the deprecated ..count.. notation
geom_histogram(bins = 35, color = "white") +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(
x = "Violent Crime Rate (per 100,000 residents)",
y = "Number of Municipalities",
title = "Distribution of Violent Crime Rates — Pennsylvania (2018)",
fill = "Count"
) +
theme_minimal()
X2018_UCR_PA_cleaned %>%
filter(!is.na(crime.rate)) %>%
arrange(desc(crime.rate)) %>%
slice_head(n = 20) %>%
mutate(City = fct_reorder(City, crime.rate)) %>%
ggplot(aes(x = City, y = crime.rate, fill = crime.rate)) +
geom_col() +
coord_flip() +
scale_fill_gradient(low = "steelblue", high = "darkred", guide = "none") +
labs(
x = NULL,
y = "Crime Rate (per 100,000 residents)",
title = "Top 20 Pennsylvania Municipalities by Crime Rate (2018)"
) +
theme_minimal()
# Summary statistics for the crime rate variable
crime_summary <- X2018_UCR_PA_cleaned %>%
summarize(
n_municipalities = n(),
mean_crime_rate = round(mean(crime.rate, na.rm = TRUE), 1),
median_crime_rate = round(median(crime.rate, na.rm = TRUE), 1),
max_crime_rate = round(max(crime.rate, na.rm = TRUE), 1),
min_crime_rate = round(min(crime.rate[crime.rate > 0], na.rm = TRUE), 1),
sd_crime_rate = round(sd(crime.rate, na.rm = TRUE), 1)
)
crime_summary
## # A tibble: 1 × 6
## n_municipalities mean_crime_rate median_crime_rate max_crime_rate
## <int> <dbl> <dbl> <dbl>
## 1 989 1261. 905. 17757
## # ℹ 2 more variables: min_crime_rate <dbl>, sd_crime_rate <dbl>
## Highest crime rate municipality:
X2018_UCR_PA_cleaned %>%
arrange(desc(crime.rate)) %>%
select(City, Population, crime.rate) %>%
slice(1)
## # A tibble: 1 × 3
## City Population crime.rate
## <chr> <dbl> <dbl>
## 1 Wilkes-Barre Township 2889 17757.
Choi, J. (2025). Basic statistics using R for crime analysis. PA-ADOPT / West Chester University. CC BY-SA 4.0.
Federal Bureau of Investigation. (2018). Uniform Crime Report: Crime in the United States, 2018. U.S. Department of Justice.