library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(crimedata)
data(package = "crimedata")
list_crime_data()
##                city        years
## 1        All cities 2007 to 2022
## 2            Austin 2007 to 2022
## 3            Boston 2016 to 2022
## 4         Charlotte 2017 to 2022
## 5           Chicago 2007 to 2022
## 6  Colorado Springs 2016 to 2022
## 7           Detroit 2009 to 2022
## 8        Fort Worth 2007 to 2019
## 9           Houston 2019 to 2022
## 10      Kansas City 2009 to 2022
## 11      Los Angeles 2010 to 2022
## 12       Louisville 2007 to 2022
## 13          Memphis 2007 to 2022
## 14             Mesa 2016 to 2022
## 15      Minneapolis 2019 to 2022
## 16        Nashville 2013 to 2022
## 17         New York 2007 to 2022
## 18    San Francisco 2007 to 2022
## 19          Seattle 2008 to 2022
## 20         St Louis 2008 to 2020
## 21           Tucson 2009 to 2020
## 22   Virginia Beach 2013 to 2021
sf2020 <- get_crime_data(
  years = 2020,
  cities = "San Francisco",
  type = "sample",
  cache = TRUE,
  quiet = !interactive(),
  output = "tbl"
)
str(sf2020)
## tibble [834 × 10] (S3: tbl_df/tbl/data.frame)
##  $ uid            : int [1:834] 26160368 26160379 26160440 26160468 26160519 26160915 26161216 26161341 26161395 26161397 ...
##  $ city_name      : Factor w/ 1 level "San Francisco": 1 1 1 1 1 1 1 1 1 1 ...
##  $ offense_code   : Factor w/ 36 levels "100","11D","12A",..: 36 26 18 14 15 27 36 16 15 24 ...
##  $ offense_type   : Factor w/ 36 levels "aggravated assault",..: 3 30 2 28 31 9 3 32 31 36 ...
##  $ offense_group  : Factor w/ 21 levels "all other offenses",..: 1 19 14 14 14 7 1 14 14 12 ...
##  $ offense_against: Factor w/ 4 levels "other","persons",..: 1 3 3 3 3 3 1 3 3 3 ...
##  $ date_single    : POSIXct[1:834], format: "2020-01-01 02:04:00" "2020-01-01 02:44:00" ...
##  $ longitude      : num [1:834] -122 -122 -122 -122 -122 ...
##  $ latitude       : num [1:834] 37.8 37.8 37.8 37.8 37.8 ...
##  $ census_block   : chr [1:834] "060750151003000" "060750119012000" "060750125024000" "060750117003013" ...
glimpse(sf2020)
## Rows: 834
## Columns: 10
## $ uid             <int> 26160368, 26160379, 26160440, 26160468, 26160519, 2616…
## $ city_name       <fct> San Francisco, San Francisco, San Francisco, San Franc…
## $ offense_code    <fct> 90Z, 280, 23H, 23C, 23D, 290, 90Z, 23F, 23D, 26E, 23F,…
## $ offense_type    <fct> all other offenses, stolen property offenses, all othe…
## $ offense_group   <fct> all other offenses, stolen property offenses, larceny/…
## $ offense_against <fct> other, property, property, property, property, propert…
## $ date_single     <dttm> 2020-01-01 02:04:00, 2020-01-01 02:44:00, 2020-01-01 …
## $ longitude       <dbl> -122.4221, -122.4139, -122.4126, -122.4052, -122.4008,…
## $ latitude        <dbl> 37.78894, 37.79046, 37.78393, 37.78869, 37.78770, 37.7…
## $ census_block    <chr> "060750151003000", "060750119012000", "060750125024000…
view(sf2020)
head(sf2020)
## # A tibble: 6 × 10
##        uid city_name     offense_code offense_type offense_group offense_against
##      <int> <fct>         <fct>        <fct>        <fct>         <fct>          
## 1 26160368 San Francisco 90Z          all other o… all other of… other          
## 2 26160379 San Francisco 280          stolen prop… stolen prope… property       
## 3 26160440 San Francisco 23H          all other l… larceny/thef… property       
## 4 26160468 San Francisco 23C          shoplifting  larceny/thef… property       
## 5 26160519 San Francisco 23D          theft from … larceny/thef… property       
## 6 26160915 San Francisco 290          destruction… destruction/… property       
## # ℹ 4 more variables: date_single <dttm>, longitude <dbl>, latitude <dbl>,
## #   census_block <chr>
ggplot(sf2020, aes(x = offense_type)) +
  geom_bar() +
  labs(
    title = "Counts of Offense Types",
    x = "Offense Type",
    y = "Number of Incidents"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
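
Bar heights can be hard to read exactly when there are many categories, so a sorted table of counts is one way to check which offense types are most and least common. This is a minimal sketch: `count_offenses()` is a hypothetical helper name, not part of crimedata, and it assumes a data frame with an `offense_type` column such as `sf2020`.

```r
library(dplyr)

# Hypothetical helper (not part of crimedata): tabulate incidents per offense
# type, most common first. Assumes `data` has an `offense_type` column.
count_offenses <- function(data) {
  data |>
    count(offense_type, name = "n_incidents", sort = TRUE)
}

# Usage with the sample downloaded above, if it exists in the session.
if (exists("sf2020")) {
  offense_counts <- count_offenses(sf2020)
  print(head(offense_counts, 5))  # most common offense types
  print(tail(offense_counts, 5))  # least common offense types
}
```

Offense types with no incidents in the sample are dropped from the table, because `count()` only returns groups that actually occur in the data.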

Part 7: Written Interpretation

Which offense types are most and least common? What patterns stand out in the visualization? Why is a bar chart more appropriate than a histogram for offense_type?

The most common offense in this sample of San Francisco crimes from 2020 was theft from a motor vehicle (except theft of motor vehicle parts or accessories). There were also large numbers of larcenies and residential burglaries. From my background knowledge, there were more property crimes during COVID, so this pattern makes sense. It also makes me want to create a separate plot comparing crimes against property with crimes against persons.

Across the whole visualization, most offense types have extremely low counts, which makes it hard to compare them with the few types that have larger counts. Reordering the bars by count would make it easier to compare the most and least common offense types.

A bar chart is more appropriate than a histogram here because offense_type is qualitative data: it is categorical, not continuous. A histogram makes more sense for a continuous numerical variable, such as the distribution of offenders' ages within a data set.
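
The reordering idea could be sketched as follows. This is an illustration under stated assumptions, not a definitive version: `plot_offense_counts()` is a hypothetical helper name, and it assumes a data frame with an `offense_type` column such as `sf2020`. It uses `fct_infreq()` from forcats to order the categories by frequency and flips the axes so the long offense labels are readable without rotation.

```r
library(ggplot2)
library(forcats)

# Hypothetical helper (not part of crimedata): bar chart of offense types
# ordered by frequency. Assumes `data` has a factor or character column
# named `offense_type`.
plot_offense_counts <- function(data) {
  # fct_infreq() orders levels from most to least frequent; fct_rev() flips
  # that order so the most common type ends up at the top after coord_flip().
  ggplot(data, aes(x = fct_rev(fct_infreq(offense_type)))) +
    geom_bar() +
    coord_flip() +
    labs(
      title = "Counts of Offense Types, Ordered by Frequency",
      x = "Offense Type",
      y = "Number of Incidents"
    ) +
    theme_minimal()
}

# Usage with the sample downloaded above, if it exists in the session.
if (exists("sf2020")) {
  print(plot_offense_counts(sf2020))
}
```

To compare crimes against property with crimes against persons, one option would be to add `fill = offense_against` inside `aes()`, since that column is already present in the data.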