1 Introduction

This analysis applies the data transformation and visualization techniques from Chapters 2 and 3 of Basic Statistics Using R for Crime Analysis (Choi, 2025) to the 2018 Pennsylvania Uniform Crime Report (UCR) dataset.

The UCR is compiled by the FBI from reports submitted by more than 18,000 police departments across the United States. This dataset covers Part 1 offenses — the most serious index crimes — reported by Pennsylvania municipalities in 2018.

Part 1 Violent Crimes include:

  • Murder / Non-negligent manslaughter
  • Rape
  • Robbery
  • Aggravated assault

Part 1 Property Crimes include:

  • Burglary
  • Larceny-theft
  • Motor vehicle theft
  • Arson

Note on data limitations: The UCR only counts crimes reported to the police. Unreported crime — the so-called “dark figure of crime” — is not captured here.


2 Setup: Loading Required Packages

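Following the approach in Chapter 3, we load the readxl package to import the Excel file and tidyverse for data manipulation and visualization, installing them first if necessary. The rstudioapi package is loaded as well; the code comment notes it is used to resolve the working directory.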

# Install packages if not already installed
# install.packages("readxl")
# install.packages("tidyverse")
# install.packages("rstudioapi")  # used to resolve the working directory

library(readxl)
library(tidyverse)
library(rstudioapi)

3 Importing and Inspecting the Data

3.1 Importing the Data

We use read_excel() from the readxl package, consistent with Chapter 3’s approach to reading .xlsx files. The object name carries an X prefix because syntactic object names in R cannot begin with a digit.

X2018_UCR_PA <- read_excel("C:/Users/qgree/OneDrive/Desktop/Drexel Winter Quarter Classes 2025-2026/CJS-310/Week 3/2018.UCR.PA.xlsx")
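
The X prefix mirrors what base R's make.names() does when it converts an arbitrary string into a syntactic name. A minimal sketch, using an illustrative input string (an assumption for demonstration; the import step itself does not call make.names()):

```r
# make.names() converts arbitrary strings into syntactically valid R names.
# A name may not start with a digit, so "X" is prepended.
raw_name   <- "2018_UCR_PA"   # illustrative input string
valid_name <- make.names(raw_name)
print(valid_name)
```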

3.2 Viewing the Data Structure

# Display the first six rows
head(X2018_UCR_PA)
## # A tibble: 6 × 12
##   City        Population `Violent\r\ncrime` Murder and\r\nnonneg…¹  Rape Robbery
##   <chr>            <dbl>              <dbl>                  <dbl> <dbl>   <dbl>
## 1 Abington T…      55631                 44                      1     6      12
## 2 Adamstown         1857                  3                      0     0       0
## 3 Adams Town…      14105                  3                      0     0       0
## 4 Adams Town…       5581                  0                      0     0       0
## 5 Akron             4015                  7                      0     1       0
## 6 Albion            1466                  0                      0     0       0
## # ℹ abbreviated name: ¹​`Murder and\r\nnonnegligent\r\nmanslaughter`
## # ℹ 6 more variables: `Aggravated\r\nassault` <dbl>, `Property\r\ncrime` <dbl>,
## #   Burglary <dbl>, `Larceny-\r\ntheft` <dbl>,
## #   `Motor\r\nvehicle\r\ntheft` <dbl>, Arson <dbl>
# Check dimensions: rows = municipalities, columns = variables
dim(X2018_UCR_PA)
## [1] 989  12

The dataset has 989 municipalities and 12 variables. Each row represents a city or township, consistent with the UCR’s unit of analysis (unlike survey data where each row represents an individual respondent).

3.3 Summary Statistics — Initial Inspection

summary(X2018_UCR_PA)
##      City             Population       Violent\r\ncrime 
##  Length:989         Min.   :    132   Min.   :    0.00  
##  Class :character   1st Qu.:   2066   1st Qu.:    1.00  
##  Mode  :character   Median :   4320   Median :    5.00  
##                     Mean   :  10054   Mean   :   34.16  
##                     3rd Qu.:   9088   3rd Qu.:   15.00  
##                     Max.   :1586916   Max.   :14420.00  
##  Murder and\r\nnonnegligent\r\nmanslaughter      Rape         
##  Min.   :  0.0000                           Min.   :   0.000  
##  1st Qu.:  0.0000                           1st Qu.:   0.000  
##  Median :  0.0000                           Median :   0.000  
##  Mean   :  0.6977                           Mean   :   2.971  
##  3rd Qu.:  0.0000                           3rd Qu.:   1.000  
##  Max.   :351.0000                           Max.   :1095.000  
##     Robbery         Aggravated\r\nassault Property\r\ncrime    Burglary      
##  Min.   :   0.000   Min.   :   0.00       Min.   :    0.0   Min.   :   0.00  
##  1st Qu.:   0.000   1st Qu.:   1.00       1st Qu.:    9.0   1st Qu.:   1.00  
##  Median :   0.000   Median :   4.00       Median :   40.0   Median :   5.00  
##  Mean   :   9.449   Mean   :  21.05       Mean   :  164.6   Mean   :  21.42  
##  3rd Qu.:   2.000   3rd Qu.:  11.00       3rd Qu.:  105.0   3rd Qu.:  12.00  
##  Max.   :5262.000   Max.   :7712.00       Max.   :49145.0   Max.   :6497.00  
##  Larceny-\r\ntheft Motor\r\nvehicle\r\ntheft     Arson        
##  Min.   :    0.0   Min.   :   0.00           Min.   :  0.000  
##  1st Qu.:    7.0   1st Qu.:   0.00           1st Qu.:  0.000  
##  Median :   32.0   Median :   1.00           Median :  0.000  
##  Mean   :  131.3   Mean   :  11.84           Mean   :  1.147  
##  3rd Qu.:   89.0   3rd Qu.:   4.00           3rd Qu.:  0.000  
##  Max.   :36968.0   Max.   :5680.00           Max.   :430.000

Observation: The variable names contain embedded line-break sequences (\r\n, a carriage return followed by a newline) carried over from the Excel formatting, for example Violent\r\ncrime. These must be renamed before analysis, as shown in Chapter 3.


4 Data Cleaning: Renaming Variables (Chapter 3)

Using rename_with() from dplyr (part of tidyverse), we replace all twelve column names in one operation. This approach avoids issues with backtick-escaped rename() calls, since the exact newline escape sequence can vary depending on the operating system and how Excel wrote the file.

X2018_UCR_PA_cleaned <- X2018_UCR_PA %>%
  rename_with(~ c(
    "City", "Population",
    "violent.crime", "murder.manslaughter",
    "rape", "robbery", "aggravated.assault",
    "property.crime", "burglary", "larceny.theft",
    "motor.theft", "arson"
  ))

# Confirm cleaned names
names(X2018_UCR_PA_cleaned)
##  [1] "City"                "Population"          "violent.crime"      
##  [4] "murder.manslaughter" "rape"                "robbery"            
##  [7] "aggravated.assault"  "property.crime"      "burglary"           
## [10] "larceny.theft"       "motor.theft"         "arson"
summary(X2018_UCR_PA_cleaned)
##      City             Population      violent.crime      murder.manslaughter
##  Length:989         Min.   :    132   Min.   :    0.00   Min.   :  0.0000   
##  Class :character   1st Qu.:   2066   1st Qu.:    1.00   1st Qu.:  0.0000   
##  Mode  :character   Median :   4320   Median :    5.00   Median :  0.0000   
##                     Mean   :  10054   Mean   :   34.16   Mean   :  0.6977   
##                     3rd Qu.:   9088   3rd Qu.:   15.00   3rd Qu.:  0.0000   
##                     Max.   :1586916   Max.   :14420.00   Max.   :351.0000   
##       rape             robbery         aggravated.assault property.crime   
##  Min.   :   0.000   Min.   :   0.000   Min.   :   0.00    Min.   :    0.0  
##  1st Qu.:   0.000   1st Qu.:   0.000   1st Qu.:   1.00    1st Qu.:    9.0  
##  Median :   0.000   Median :   0.000   Median :   4.00    Median :   40.0  
##  Mean   :   2.971   Mean   :   9.449   Mean   :  21.05    Mean   :  164.6  
##  3rd Qu.:   1.000   3rd Qu.:   2.000   3rd Qu.:  11.00    3rd Qu.:  105.0  
##  Max.   :1095.000   Max.   :5262.000   Max.   :7712.00    Max.   :49145.0  
##     burglary       larceny.theft      motor.theft          arson        
##  Min.   :   0.00   Min.   :    0.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:   1.00   1st Qu.:    7.0   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :   5.00   Median :   32.0   Median :   1.00   Median :  0.000  
##  Mean   :  21.42   Mean   :  131.3   Mean   :  11.84   Mean   :  1.147  
##  3rd Qu.:  12.00   3rd Qu.:   89.0   3rd Qu.:   4.00   3rd Qu.:  0.000  
##  Max.   :6497.00   Max.   :36968.0   Max.   :5680.00   Max.   :430.000

All columns now have clean names (dot-separated where a name has more than one word), and the eleven numeric columns show the same summary statistics as before (minimum, quartiles, median, mean, maximum).
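
Positional renaming depends on the columns arriving in the expected order. One alternative, sketched here in base R, derives the new names from the originals instead. The sample names below are assumed from the head() output in Section 3.2, and the results differ from the hand-written names used above (for example, murder.and.nonnegligent.manslaughter rather than murder.manslaughter, and city rather than City).

```r
# A sample of raw names as printed by head() (assumed, not the full set of twelve)
raw_names <- c("City", "Violent\r\ncrime",
               "Murder and\r\nnonnegligent\r\nmanslaughter",
               "Larceny-\r\ntheft")

# Lower-case the names, then collapse every run of non-alphanumeric
# characters (spaces, hyphens, \r, \n) into a single dot.
clean_names <- gsub("[^a-z0-9]+", ".", tolower(raw_names))
print(clean_names)
```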


5 Creating New Variables (Chapter 3)

5.1 Computing Crime Rate

The crime rate is a standardized measure: the number of Part 1 offenses per 100,000 residents. This allows meaningful comparison across municipalities of very different sizes.

\[\text{Crime Rate} = \frac{\text{Violent Crime} + \text{Property Crime}}{\text{Population}} \times 100{,}000\]

Using the mutate() function from dplyr:

X2018_UCR_PA_cleaned <- X2018_UCR_PA_cleaned %>%
  mutate(
    crime.rate = ((violent.crime + property.crime) / Population) * 100000
  )
# View city and crime rate side by side
X2018_UCR_PA_cleaned %>%
  select(City, Population, violent.crime, property.crime, crime.rate) %>%
  head(10)
## # A tibble: 10 × 5
##    City                       Population violent.crime property.crime crime.rate
##    <chr>                           <dbl>         <dbl>          <dbl>      <dbl>
##  1 Abington Township, Montgo…      55631            44            949      1785.
##  2 Adamstown                        1857             3             14       915.
##  3 Adams Township, Butler Co…      14105             3             46       347.
##  4 Adams Township, Cambria C…       5581             0             11       197.
##  5 Akron                            4015             7             34      1021.
##  6 Albion                           1466             0             13       887.
##  7 Alburtis                         2663             3              5       300.
##  8 Aldan                            4157             3            100      2478.
##  9 Aleppo Township                  1876             0              4       213.
## 10 Aliquippa                        8946            50             81      1464.
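
The formula can be checked by hand against a few rows of this table. The sketch below wraps it in a helper, rate_per_100k(), which is a hypothetical name introduced only for this illustration, and applies it to rows 1, 2, and 8.

```r
# Hypothetical helper (not part of the original analysis):
# offenses per 100,000 residents
rate_per_100k <- function(offenses, population) {
  offenses / population * 100000
}

# Violent + property counts and populations for
# Abington Township, Adamstown, and Aldan (from the table above)
offenses   <- c(44 + 949, 3 + 14, 3 + 100)
population <- c(55631, 1857, 4157)

rates <- rate_per_100k(offenses, population)
print(round(rates))
```

Rounded to whole numbers, the three results agree with the crime.rate column shown above.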

5.2 Computing Separate Violent and Property Crime Rates

X2018_UCR_PA_cleaned <- X2018_UCR_PA_cleaned %>%
  mutate(
    violent.crime.rate  = (violent.crime  / Population) * 100000,
    property.crime.rate = (property.crime / Population) * 100000
  )

6 Summary Statistics by Arrangement and Grouping (Chapter 3)

6.1 Highest Crime Rate Municipalities

Using arrange() with desc() to rank municipalities from highest to lowest:

arranged_data <- X2018_UCR_PA_cleaned %>%
  select(City, Population, crime.rate) %>%
  arrange(desc(crime.rate))

# Top 15 municipalities by overall crime rate
head(arranged_data, 15)
## # A tibble: 15 × 3
##    City                                  Population crime.rate
##    <chr>                                      <dbl>      <dbl>
##  1 Wilkes-Barre Township                       2889     17757.
##  2 Frazer Township                             1130     14425.
##  3 Eddystone                                   2412     10904.
##  4 Homestead                                   3162      9962.
##  5 Southwest Regional, Washington County        132      6818.
##  6 Muncy Township                              1063      6679.
##  7 Tullytown                                   1839      6417.
##  8 McKees Rocks                                5929      6409.
##  9 Union Township, Lawrence County             4921      6198.
## 10 Uniontown                                   9751      6061.
## 11 East Rochester                               537      5400.
## 12 West Lebanon Township                        822      5353.
## 13 Upland                                      3250      5015.
## 14 Arnold                                      4868      4786.
## 15 Chester                                    34087      4767.
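
Rates for very small populations can swing widely on a handful of offenses. As a rough check, the offense count behind a rate can be recovered by inverting the formula from Section 5.1. The sketch below uses three rows copied from the ranking above; because the displayed rates are rounded, the recovered counts are approximate.

```r
# Three rows copied from the ranking above (rates rounded as displayed)
top <- data.frame(
  City       = c("Wilkes-Barre Township",
                 "Southwest Regional, Washington County",
                 "Chester"),
  Population = c(2889, 132, 34087),
  crime.rate = c(17757, 6818, 4767)
)

# Invert rate = offenses / population * 100000 to recover the offense count
top$implied.offenses <- round(top$crime.rate * top$Population / 100000)
print(top)
```

Southwest Regional's rate of roughly 6,818 per 100,000, for instance, corresponds to about 9 recorded offenses among 132 residents.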

6.2 Lowest Crime Rate Municipalities

# Bottom 10 municipalities (excluding zeroes which may indicate non-reporting)
X2018_UCR_PA_cleaned %>%
  select(City, Population, crime.rate) %>%
  filter(crime.rate > 0) %>%
  arrange(crime.rate) %>%
  head(10)
## # A tibble: 10 × 3
##    City                               Population crime.rate
##    <chr>                                   <dbl>      <dbl>
##  1 Scott Township, Lackawanna County        4753       21.0
##  2 Mahoning Township, Lawrence County       2911       34.4
##  3 Shade Township                           2602       38.4
##  4 Lamar Township                           2552       39.2
##  5 Ryan Township                            2510       39.8
##  6 Liberty                                  2477       40.4
##  7 Cornwall                                 4326       46.2
##  8 Jackson Township, Cambria County         4083       49.0
##  9 Lykens                                   1770       56.5
## 10 Orangeville Area                         1731       57.8

6.3 Population Category Variable (cut() Function)

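Following Chapter 3, we use cut() to categorize municipalities by population size into five groups, with factor levels running from smallest to largest. Because cut() closes intervals on the right by default, a municipality with exactly 10,000 residents falls in Small: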

breaks <- c(0, 10000, 50000, 100000, 500000, Inf)
labels <- c("Small", "Medium", "Large", "Very Large", "Metropolitan")

X2018_UCR_PA_cleaned <- X2018_UCR_PA_cleaned %>%
  mutate(
    population.category = cut(
      Population,
      breaks        = breaks,
      labels        = labels,
      include.lowest = TRUE
    )
  )

# Distribution of municipalities by population category
summary(X2018_UCR_PA_cleaned$population.category)
##        Small       Medium        Large   Very Large Metropolitan 
##          763          209           14            2            1
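
To see where boundary values land, here is a small sketch with illustrative populations. The vector toy_pop is hypothetical; 132, 55631, and 1586916 are values that appear in the summaries above, while 10000 and 10001 are chosen to sit on and just past a break point.

```r
breaks <- c(0, 10000, 50000, 100000, 500000, Inf)
labels <- c("Small", "Medium", "Large", "Very Large", "Metropolitan")

# Illustrative populations, including one exactly on a break and one just above it
toy_pop <- c(132, 10000, 10001, 55631, 1586916)

# Intervals are right-closed by default: (0, 10000], (10000, 50000], ...
toy_cat <- cut(toy_pop, breaks = breaks, labels = labels, include.lowest = TRUE)
print(data.frame(toy_pop, toy_cat))
```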

6.4 Group-Level Summary Statistics (group_by)

Using group_by() and summarize() to compute average crime rates and total offense counts by population category:

crime.table <- X2018_UCR_PA_cleaned %>%
  group_by(population.category) %>%
  summarize(
    n_municipalities   = n(),
    avg.crime.rate     = mean(crime.rate,     na.rm = TRUE),
    avg.violent.rate   = mean(violent.crime.rate,  na.rm = TRUE),
    avg.property.rate  = mean(property.crime.rate, na.rm = TRUE),
    total.violent      = sum(violent.crime,   na.rm = TRUE),
    total.property     = sum(property.crime,  na.rm = TRUE)
  )

crime.table
## # A tibble: 5 × 7
##   population.category n_municipalities avg.crime.rate avg.violent.rate
##   <fct>                          <int>          <dbl>            <dbl>
## 1 Small                            763          1184.             192.
## 2 Medium                           209          1466.             177.
## 3 Large                             14          1917.             323.
## 4 Very Large                         2          3125.             459.
## 5 Metropolitan                       1          4006.             909.
## # ℹ 3 more variables: avg.property.rate <dbl>, total.violent <dbl>,
## #   total.property <dbl>

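Interpretation: In this dataset, the average overall crime rate rises with each population category, from about 1,184 per 100,000 in Small municipalities to about 4,006 in the single Metropolitan municipality. Because the rates are standardized per 100,000 residents, this pattern is not simply an artifact of larger places recording more offenses. Two cautions apply: the Very Large and Metropolitan categories contain only two and one municipalities, so their averages rest on very few observations, and the trend is not uniform across crime types (the average violent crime rate is slightly lower in Medium than in Small municipalities).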
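
Note that mean(crime.rate) weights every municipality equally regardless of size, so a category average can differ from the pooled rate obtained by dividing total offenses by total population. A minimal sketch using the first two rows of the table in Section 5.1, chosen only for illustration:

```r
# Abington Township and Adamstown: violent + property offenses, and population
offenses   <- c(44 + 949, 3 + 14)
population <- c(55631, 1857)

# Unweighted mean of the two municipal rates (what mean(crime.rate) computes)
unweighted_mean <- mean(offenses / population * 100000)

# Pooled rate: all offenses divided by all residents
pooled_rate <- sum(offenses) / sum(population) * 100000

print(round(c(unweighted = unweighted_mean, pooled = pooled_rate), 1))
```

Here the pooled rate is higher because the much larger municipality also has the higher rate. Which figure is appropriate depends on whether the question concerns the typical municipality or the typical resident.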


7 Data Visualization (Chapters 2 & 3)

All visualizations use ggplot2 (part of tidyverse), following the template introduced in Chapter 2:

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

7.1 Distribution of Crime Rates — Histogram (geom_histogram)

A histogram is appropriate for continuous variables like crime rate. Per Chapter 3, we use geom_histogram() with theme_minimal() for a clean appearance.

X2018_UCR_PA_cleaned %>%
  # after_stat(count) is the current form of the deprecated ..count.. notation
  ggplot(aes(x = crime.rate, fill = after_stat(count))) +
  geom_histogram(bins = 40, color = "white") +
  scale_fill_gradient(low = "steelblue", high = "darkred") +
  labs(
    x     = "Crime Rate (per 100,000 residents)",
    y     = "Number of Municipalities",
    title = "Distribution of Overall Crime Rates Across Pennsylvania Municipalities (2018)",
    fill  = "Count"
  ) +
  theme_minimal()

Interpretation: The distribution is strongly right-skewed — most Pennsylvania municipalities have relatively low crime rates, while a small number of outliers drive the upper tail. This is a common pattern in crime data.
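
Right skew can also be checked numerically: a long upper tail pulls the mean above the median. A small sketch, using the ten rates displayed in Section 5.1 plus the highest rate from Section 6.1 (a hand-picked sample for illustration, not the full dataset):

```r
# Ten rates from the first rows of the data, rounded as displayed
rates <- c(1785, 915, 347, 197, 1021, 887, 300, 2478, 213, 1464)

# Add the single highest rate in the data to mimic the long upper tail
rates_with_outlier <- c(rates, 17757)

# The extreme value moves the mean far more than the median
print(c(mean = mean(rates), median = median(rates)))
print(c(mean = mean(rates_with_outlier), median = median(rates_with_outlier)))
```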

7.2 Distribution of Municipalities by Population Category — Bar Chart (geom_bar)

Bar charts are suited for categorical variables, as covered in Chapter 2’s discussion of the RACE variable in the GSS data.

pop.bar <- X2018_UCR_PA_cleaned %>%
  filter(!is.na(population.category)) %>%
  ggplot(aes(x = population.category, fill = population.category)) +
  geom_bar() +
  scale_fill_manual(
    values = c("Small"        = "steelblue",
               "Medium"       = "darkcyan",
               "Large"        = "goldenrod3",
               "Very Large"   = "darkorange",
               "Metropolitan" = "darkred"),
    guide = "none"  # guide = FALSE is deprecated in current ggplot2
  ) +
  labs(
    x     = "Population Category",
    y     = "Number of Municipalities",
    title = "Pennsylvania Municipalities by Population Category (2018)"
  ) +
  theme_minimal()

pop.bar

7.3 Average Crime Rate by Population Category — Bar Chart

crime.table %>%
  filter(!is.na(population.category)) %>%
  ggplot(aes(x = population.category,
             y = avg.crime.rate,
             fill = population.category)) +
  geom_col() +
  scale_fill_manual(
    values = c("Small"        = "steelblue",
               "Medium"       = "darkcyan",
               "Large"        = "goldenrod3",
               "Very Large"   = "darkorange",
               "Metropolitan" = "darkred"),
    guide = "none"  # guide = FALSE is deprecated in current ggplot2
  ) +
  labs(
    x     = "Population Category",
    y     = "Average Crime Rate (per 100,000)",
    title = "Average Crime Rate by Municipality Population Category (2018)"
  ) +
  theme_minimal()

7.4 Violent vs. Property Crime Rates — Stacked Comparison

crime.table %>%
  filter(!is.na(population.category)) %>%
  select(population.category, avg.violent.rate, avg.property.rate) %>%
  pivot_longer(
    cols      = c(avg.violent.rate, avg.property.rate),
    names_to  = "crime.type",
    values_to = "avg.rate"
  ) %>%
  mutate(crime.type = recode(crime.type,
    "avg.violent.rate"  = "Violent Crime",
    "avg.property.rate" = "Property Crime"
  )) %>%
  ggplot(aes(x = population.category, y = avg.rate, fill = crime.type)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = c("Violent Crime" = "darkred",
                               "Property Crime" = "steelblue")) +
  labs(
    x     = "Population Category",
    y     = "Average Rate (per 100,000)",
    fill  = "Crime Type",
    title = "Average Violent vs. Property Crime Rates by Population Category (2018)"
  ) +
  theme_minimal()

7.5 Distribution of Violent Crime Rate — Histogram

X2018_UCR_PA_cleaned %>%
  filter(violent.crime.rate > 0) %>%
  # after_stat(count) is the current form of the deprecated ..count.. notation
  ggplot(aes(x = violent.crime.rate, fill = after_stat(count))) +
  geom_histogram(bins = 35, color = "white") +
  scale_fill_gradient(low = "lightyellow", high = "darkred") +
  labs(
    x     = "Violent Crime Rate (per 100,000 residents)",
    y     = "Number of Municipalities",
    title = "Distribution of Violent Crime Rates — Pennsylvania (2018)",
    fill  = "Count"
  ) +
  theme_minimal()

7.6 Top 20 Municipalities by Overall Crime Rate

X2018_UCR_PA_cleaned %>%
  filter(!is.na(crime.rate)) %>%
  arrange(desc(crime.rate)) %>%
  slice_head(n = 20) %>%
  mutate(City = fct_reorder(City, crime.rate)) %>%
  ggplot(aes(x = City, y = crime.rate, fill = crime.rate)) +
  geom_col() +
  coord_flip() +
  scale_fill_gradient(low = "steelblue", high = "darkred", guide = "none") +
  labs(
    x     = NULL,
    y     = "Crime Rate (per 100,000 residents)",
    title = "Top 20 Pennsylvania Municipalities by Crime Rate (2018)"
  ) +
  theme_minimal()


8 Key Findings

# Summary statistics for the crime rate variable
crime_summary <- X2018_UCR_PA_cleaned %>%
  summarize(
    n_municipalities   = n(),
    mean_crime_rate    = round(mean(crime.rate, na.rm = TRUE), 1),
    median_crime_rate  = round(median(crime.rate, na.rm = TRUE), 1),
    max_crime_rate     = round(max(crime.rate, na.rm = TRUE), 1),
    min_crime_rate     = round(min(crime.rate[crime.rate > 0], na.rm = TRUE), 1),
    sd_crime_rate      = round(sd(crime.rate, na.rm = TRUE), 1)
  )

crime_summary
## # A tibble: 1 × 6
##   n_municipalities mean_crime_rate median_crime_rate max_crime_rate
##              <int>           <dbl>             <dbl>          <dbl>
## 1              989           1261.              905.          17757
## # ℹ 2 more variables: min_crime_rate <dbl>, sd_crime_rate <dbl>
cat("Highest crime rate municipality:\n")
## Highest crime rate municipality:
X2018_UCR_PA_cleaned %>%
  arrange(desc(crime.rate)) %>%
  select(City, Population, crime.rate) %>%
  slice(1)
## # A tibble: 1 × 3
##   City                  Population crime.rate
##   <chr>                      <dbl>      <dbl>
## 1 Wilkes-Barre Township       2889     17757.
  • The dataset covers 989 Pennsylvania municipalities.
  • The mean overall crime rate is 1261.1 per 100,000, with a median of 904.8, confirming the right-skewed distribution seen in the histogram.
  • The standard deviation of 1348.7 reflects substantial variability across municipalities.
  • The overwhelming majority of municipalities (763 of 989, about 77%) fall in the Small population category (10,000 residents or fewer).
  • Property crime consistently drives the overall crime rate across all population categories, far exceeding violent crime rates.

9 References

Choi, J. (2025). Basic statistics using R for crime analysis. PA-ADOPT / West Chester University. CC BY-SA 4.0.

Federal Bureau of Investigation. (2018). Uniform Crime Report: Crime in the United States, 2018. U.S. Department of Justice.