Part 2: Load Required Packages

# install.packages("tidyverse")
# install.packages("crimedata")

library(tidyverse)  # attaches ggplot2, dplyr, and the other core tidyverse packages
library(crimedata)

Part 3: Explore Available Crime Data

# View datasets bundled with the crimedata package
data(package = "crimedata")

# List all available city/year combinations
list_crime_data()
##                city        years
## 1        All cities 2007 to 2022
## 2            Austin 2007 to 2022
## 3            Boston 2016 to 2022
## 4         Charlotte 2017 to 2022
## 5           Chicago 2007 to 2022
## 6  Colorado Springs 2016 to 2022
## 7           Detroit 2009 to 2022
## 8        Fort Worth 2007 to 2019
## 9           Houston 2019 to 2022
## 10      Kansas City 2009 to 2022
## 11      Los Angeles 2010 to 2022
## 12       Louisville 2007 to 2022
## 13          Memphis 2007 to 2022
## 14             Mesa 2016 to 2022
## 15      Minneapolis 2019 to 2022
## 16        Nashville 2013 to 2022
## 17         New York 2007 to 2022
## 18    San Francisco 2007 to 2022
## 19          Seattle 2008 to 2022
## 20         St Louis 2008 to 2020
## 21           Tucson 2009 to 2020
## 22   Virginia Beach 2013 to 2021

Part 4: Retrieve Crime Data

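For this assignment, I am using Colorado Springs crime data for the year 2018. The call below requests the package's sample extract (type = "sample") rather than the full set of recorded incidents, so the counts in later parts describe a sample of offenses, not every offense reported that year.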

crime_data <- get_crime_data(
  years = 2018,
  cities = "Colorado Springs",
  type = "sample",
  cache = TRUE,
  quiet = !interactive(),
  output = "tbl"
)
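
Before exploring the data, it can help to confirm that the download really covers the requested city and year. The helper below is a minimal sketch: summarise_scope() is a hypothetical name introduced here (it is not part of crimedata), and it assumes the city_name and date_single columns that appear in the output in Part 5.

```r
# Hypothetical helper (not part of crimedata): report the row count, cities,
# and calendar years present in a data frame returned by get_crime_data().
summarise_scope <- function(data) {
  list(
    n_rows = nrow(data),
    cities = sort(unique(as.character(data$city_name))),
    years  = sort(unique(as.integer(format(data$date_single, "%Y"))))
  )
}

# Runs only when crime_data from the chunk above is available.
if (exists("crime_data")) {
  str(summarise_scope(crime_data))
}
```

If the request worked as intended, the list should name a single city and a single year.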

Part 5: Explore the Dataset

# View the structure of the dataset
str(crime_data)
## tibble [408 × 12] (S3: tbl_df/tbl/data.frame)
##  $ uid            : int [1:408] 6955875 6955999 6956071 6956094 6956127 6956231 6956246 6956627 6956840 6956895 ...
##  $ city_name      : Factor w/ 1 level "Colorado Springs": 1 1 1 1 1 1 1 1 1 1 ...
##  $ offense_code   : Factor w/ 28 levels "12B","13A","13B",..: 28 13 20 7 9 28 8 27 10 11 ...
##  $ offense_type   : Factor w/ 28 levels "aggravated assault",..: 3 2 8 20 21 3 19 27 24 25 ...
##  $ offense_group  : Factor w/ 18 levels "all other offenses",..: 1 12 6 4 12 1 4 17 12 12 ...
##  $ offense_against: Factor w/ 4 levels "other","persons",..: 1 3 3 3 3 1 3 4 3 3 ...
##  $ date_single    : POSIXct[1:408], format: "2018-01-01 02:00:00" "2018-01-02 08:30:00" ...
##  $ date_start     : POSIXct[1:408], format: "2018-01-01 01:45:00" "2018-01-02 07:30:00" ...
##  $ date_end       : POSIXct[1:408], format: "2018-01-01 02:15:00" "2018-01-02 09:30:00" ...
##  $ longitude      : num [1:408] -105 -105 -105 -105 -105 ...
##  $ latitude       : num [1:408] 38.9 38.8 38.9 38.8 38.9 ...
##  $ census_block   : chr [1:408] "080410037021024" "080410019002023" "080410003011026" "080410029002001" ...

# A tidyverse-friendly overview of the dataset
glimpse(crime_data)
## Rows: 408
## Columns: 12
## $ uid             <int> 6955875, 6955999, 6956071, 6956094, 6956127, 6956231, …
## $ city_name       <fct> Colorado Springs, Colorado Springs, Colorado Springs, …
## $ offense_code    <fct> 90Z, 23H, 290, 22A, 23C, 90Z, 22B, 90J, 23D, 23F, 23C,…
## $ offense_type    <fct> all other offenses, all other larceny, destruction/dam…
## $ offense_group   <fct> all other offenses, larceny/theft offenses, destructio…
## $ offense_against <fct> other, property, property, property, property, other, …
## $ date_single     <dttm> 2018-01-01 02:00:00, 2018-01-02 08:30:00, 2018-01-03 …
## $ date_start      <dttm> 2018-01-01 01:45:00, 2018-01-02 07:30:00, 2018-01-03 …
## $ date_end        <dttm> 2018-01-01 02:15:00, 2018-01-02 09:30:00, 2018-01-03 …
## $ longitude       <dbl> -104.8492, -104.7859, -104.8033, -104.8155, -104.7236,…
## $ latitude        <dbl> 38.92980, 38.84235, 38.87513, 38.80984, 38.85346, 38.8…
## $ census_block    <chr> "080410037021024", "080410019002023", "080410003011026…

Part 6: Bar Chart of Offense Types

ggplot(crime_data, aes(x = offense_type)) +
  geom_bar(fill = "steelblue", color = "white") +
  labs(
    title = "Counts of Offense Types",
    subtitle = "Colorado Springs, 2018",
    x = "Offense Type",
    y = "Number of Incidents"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
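
The chart above draws the bars in factor-level order, which makes the most and least common offenses harder to pick out. As an optional variation, the sketch below orders the bars by count and flips the axes so the long labels read horizontally. plot_offense_counts() is a hypothetical helper name; fct_infreq() and fct_rev() come from forcats, which tidyverse attaches.

```r
library(ggplot2)
library(forcats)

# Hypothetical helper: bar chart of offense_type with bars ordered by count.
# fct_infreq() orders levels from most to least frequent; fct_rev() reverses
# that order so the most frequent level ends up at the top after coord_flip().
plot_offense_counts <- function(data) {
  ggplot(data, aes(x = fct_rev(fct_infreq(offense_type)))) +
    geom_bar(fill = "steelblue") +
    coord_flip() +
    labs(x = "Offense Type", y = "Number of Incidents") +
    theme_minimal()
}

# Runs only when crime_data from Part 4 is available.
if (exists("crime_data")) {
  print(plot_offense_counts(crime_data))
}
```

The counts themselves are unchanged; only the ordering and orientation of the bars differ from the original chart.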

Part 7: Written Interpretation

Which offense types are most and least common?

Based on the bar chart, the most common offense category in Colorado Springs in 2018 is larceny/theft, which produces the tallest bar by a significant margin. Other frequent categories include motor vehicle theft and burglary. The least common offense types, those with the shortest bars, include homicide and arson, which, while serious in nature, occur far less frequently than property crimes in the dataset.
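
Reading bar heights by eye is approximate, so a frequency table is a useful cross-check on the ranking above. This is a minimal sketch: tabulate_offenses() is a hypothetical helper name, and it relies only on dplyr's count() and mutate().

```r
library(dplyr)

# Hypothetical helper: incidents per offense_type, most common first,
# with each type's share of all incidents.
# Note: count() drops factor levels with zero incidents by default, so an
# offense that never occurs in the sample will not appear in the table.
tabulate_offenses <- function(data) {
  data %>%
    count(offense_type, name = "n_incidents", sort = TRUE) %>%
    mutate(share = n_incidents / sum(n_incidents))
}

# Runs only when crime_data from Part 4 is available.
if (exists("crime_data")) {
  offense_counts <- tabulate_offenses(crime_data)
  print(head(offense_counts, 5))  # most common offense types
  print(tail(offense_counts, 5))  # least common offense types present
}
```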

What patterns stand out in the visualization?

The most striking pattern is the dominance of property crimes relative to violent crimes. When offense types are sorted by frequency, the counts fall off steeply, with a small number of categories accounting for the large majority of incidents. This is consistent with what criminologists generally observe: property crimes tend to outnumber violent crimes in most U.S. cities by a wide margin. The visual gap between the most common and least common offense types is substantial and immediately apparent from the chart.
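
The property-versus-violent comparison can also be checked directly, because the dataset includes an offense_against column (levels such as "persons" and "property" appear in the Part 5 output). The sketch below is one way to do that; share_by_target() is a hypothetical helper name, and treating "persons" as a stand-in for violent crime is my own simplifying assumption.

```r
library(dplyr)

# Hypothetical helper: share of incidents by what the offense was against
# (for example persons or property), largest share first.
share_by_target <- function(data) {
  data %>%
    count(offense_against, name = "n_incidents") %>%
    mutate(share = n_incidents / sum(n_incidents)) %>%
    arrange(desc(share))
}

# Runs only when crime_data from Part 4 is available.
if (exists("crime_data")) {
  print(share_by_target(crime_data))
}
```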

Why is a bar chart more appropriate than a histogram for offense_type?

A histogram is designed for continuous numeric variables, where data are grouped into equally spaced bins along a number line. offense_type, however, is a categorical variable: it consists of named crime categories with no inherent numeric value or meaningful ordering. A bar chart is the correct choice for categorical data because each bar represents a distinct, discrete group, and the height of the bar simply reflects the count of observations in that category. Using a histogram here would be conceptually inappropriate, as it would impose a quantitative structure on data that are fundamentally qualitative.
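
ggplot2 reflects this distinction in practice: as far as I know, its binning statistic refuses a discrete x variable. The sketch below tests that by trying to build a histogram and reporting whether it succeeds. histogram_builds() is a hypothetical helper name, and the exact error message may differ between ggplot2 versions, so the code only records success or failure.

```r
library(ggplot2)

# Hypothetical helper: TRUE if geom_histogram() can be built for the named
# column, FALSE if ggplot2 raises an error while building the plot.
histogram_builds <- function(data, column) {
  p <- ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(bins = 10)
  tryCatch(
    {
      ggplot_build(p)
      TRUE
    },
    error = function(e) FALSE
  )
}

# Runs only when crime_data from Part 4 is available.
if (exists("crime_data")) {
  histogram_builds(crime_data, "latitude")      # continuous numeric column
  histogram_builds(crime_data, "offense_type")  # categorical factor column
}
```

A numeric column such as latitude should build, while the offense_type factor should not, which matches the reasoning above.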