During 2023 in the United States, there has been concern of a growing trend teenagers are engaging in on Tiktok that involves stealing cars for joy rides. It’ll take time to gather enough data about this Tiktok trend to understand the motivations behind it better, but perhaps we can use the Fatal Encounters database to observe trends of teenage vehicle incidents that resulted in death from a different angle.
A goal of this data synthesis is to look at the trend of teenage recklessness while driving, specifically police officer-involved vehicle pursuits resulting in death.
DISCLAIMER The project described in this paper relies on data from survey(s) administered by the National Officer-Involved Homicide Database, which is maintained by the Center for Economic and Social Research (CESR) at the University of Southern California. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of USC or NOIHD.
install.packages("haven")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("skimr")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
Several packages are needed for the visualizations, but it’s not necessary to show the full list here.
The data set is from Fatal Encounters.
load('census_teen_data.RData') # for census tract
load('all_FE_data.RData') # combined Fatal Encounters data sets
The following code chunk only needed one time to load into the global environment; the data will be later to load it locally each time.
# Exported one .csv tab at a time, since the columns do not match
FE_responses <- read.csv('formResponses_FatalEncounters.csv')
FE_population <- read.csv('population_FatalEncounters.csv')
FE_type <- read.csv('threatassess_FatalEncounters.csv')
I can use the following code chunk to check a little bit of my data before doing analysis, but it’s long so I’ll comment it out. I’ll show this data tidied and with more context below on the page.
skim(FE_responses)
| Name | FE_responses |
| Number of rows | 31498 |
| Number of columns | 36 |
| _______________________ | |
| Column type frequency: | |
| character | 28 |
| logical | 1 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Name | 0 | 1.00 | 4 | 82 | 0 | 29859 | 0 |
| Age | 0 | 1.00 | 0 | 5 | 1221 | 112 | 0 |
| Gender | 0 | 1.00 | 0 | 11 | 144 | 4 | 0 |
| Race | 0 | 1.00 | 0 | 57 | 1 | 12 | 0 |
| Race.with.imputations | 862 | 0.97 | 0 | 23 | 6 | 10 | 0 |
| Imputation.probability | 881 | 0.97 | 0 | 19 | 3 | 6614 | 0 |
| URL.of.image..PLS.NO.HOTLINKS. | 0 | 1.00 | 0 | 373 | 16773 | 14668 | 0 |
| Date.of.injury.resulting.in.death..month.day.year. | 0 | 1.00 | 10 | 10 | 0 | 7736 | 0 |
| Location.of.injury..address. | 0 | 1.00 | 0 | 74 | 556 | 28893 | 0 |
| Location.of.death..city. | 0 | 1.00 | 0 | 30 | 36 | 6340 | 0 |
| State | 0 | 1.00 | 0 | 2 | 1 | 52 | 0 |
| Location.of.death..county. | 0 | 1.00 | 0 | 33 | 15 | 1536 | 0 |
| Full.Address | 0 | 1.00 | 0 | 103 | 1 | 29709 | 0 |
| Latitude | 0 | 1.00 | 0 | 17 | 1 | 29515 | 0 |
| Agency.or.agencies.involved | 0 | 1.00 | 0 | 266 | 78 | 6829 | 0 |
| Highest.level.of.force | 0 | 1.00 | 0 | 33 | 4 | 19 | 0 |
| Name.Temporary | 0 | 1.00 | 0 | 58 | 25969 | 5284 | 0 |
| Armed.Unarmed | 0 | 1.00 | 0 | 19 | 14419 | 10 | 0 |
| Alleged.weapon | 0 | 1.00 | 0 | 35 | 14421 | 269 | 0 |
| Aggressive.physical.movement | 0 | 1.00 | 0 | 42 | 14418 | 32 | 0 |
| Fleeing.Not.fleeing | 0 | 1.00 | 0 | 42 | 14419 | 26 | 0 |
| Description.Temp | 0 | 1.00 | 0 | 2239 | 27431 | 3870 | 0 |
| URL.Temp | 0 | 1.00 | 0 | 723 | 28281 | 3066 | 0 |
| Brief.description | 0 | 1.00 | 0 | 2239 | 2 | 29883 | 0 |
| Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS | 0 | 1.00 | 0 | 89 | 3 | 156 | 0 |
| Intended.use.of.force..Developing. | 0 | 1.00 | 0 | 22 | 3 | 9 | 0 |
| Supporting.document.link | 0 | 1.00 | 0 | 438 | 2 | 29269 | 0 |
| Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS | 0 | 1.00 | 0 | 19 | 62 | 5 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| X | 31498 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Unique.ID | 1 | 1.00 | 15749.00 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
| Location.of.death..zip.code. | 182 | 0.99 | 58352.53 | 27966.03 | 1013.00 | 33147 | 60649.00 | 85033.00 | 99921.00 | ▃▇▃▆▇ |
| Longitude | 1 | 1.00 | -95.40 | 16.30 | -165.59 | -111 | -90.56 | -82.57 | -67.27 | ▁▁▅▇▇ |
| UID.Temporary | 25969 | 0.18 | 15464.08 | 6559.72 | 9759.00 | 11156 | 12549.00 | 19240.00 | 30340.00 | ▇▁▁▁▂ |
| X.1 | 31497 | 0.00 | 10895.00 | NA | 10895.00 | 10895 | 10895.00 | 10895.00 | 10895.00 | ▁▁▇▁▁ |
| Unique.ID.formula | 31496 | 0.00 | 29497.00 | 2828.43 | 27497.00 | 28497 | 29497.00 | 30497.00 | 31497.00 | ▇▁▁▁▇ |
| Unique.identifier..redundant. | 1 | 1.00 | 15749.00 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
This is an easier way for me to load census variable codes locally, so I can easily pick variables that relate to the study.
# Using tidycensus library
# Search for available variables
variables <- load_variables(2019, "acs5", cache = TRUE)
# View the variable names and descriptions
head(variables)
Again, the primary tract for this study is of the southwest states of Arizona, New Mexico, Texas, and Oklahoma.
(The comments can be removed from other regions to focus on others or all of them, but there was too much data to reasonably load for the purpose of this project goal.)
This, along with the census variables I chose create the tract for if I need to map data points. Potentially relevant variables include vehicles per household, households where grandparents are the primary guardian, and population aged 12 to 17.
# Define states in the East and South regions
# southeast_states <- c("AR", "LA", "MS", "TN", "AL", "KY", "WV", "DC", "VA", "NC", "SC", "GA", "FL")
#
# northeast_states <- c("NY", "PA", "NJ", "MA", "CT", "RI", "VT", "NH", "ME", "MD", "DE")
#
# west_states <- c("WA", "OR", "CA", "HI", "AK", "ID", "NV", "MT", "WY", "UT", "CO")
#
# midwest_states <- c("ND", "SD", "NE", "KS", "MN", "IA", "MO", "WI", "IL", "MI", "IN", "OH")
southwest_states <- c("AZ", "NM", "TX", "OK")
# territories_us <- c("PR", "GU", "AS", "VI", "MP") #Puerto Rico, Guam, American Samoa, U.S. Virgin Islands, and Northern Mariana Islands
## In case I want to include all states and territories
# us_states_and_territories <- c(southeast_states, northeast_states, west_states, midwest_states, southwest_states, territories_us)
census_var <- c(hhincome = 'B19019_001',
foodstamps.status = "B22003_001",
workclass.from16 = "B24081_001",
incometopoverty.under18 = 'B05010_001',
geomobility = "B07001_001",
modecommutework.16to19 = "B08101_002",
modecommutework.poverty = "B08122_001",
vehicles.household = "B08201_001",
population.12to14 = "B09001_008",
population.15to17 = "B09001_009",
householdtype.under18 = "B09005_001",
grandhguardian = "B10050_001", # grandparent guardian
grandemployment = "B10058_001", # ..by employment
grade9to12.poverty = "B14006_018",
notenrolled.school = "B14007_019",
allocenrolled.school = "B99141_001"
)
census_tract <- get_acs(geography = "tract", state = southwest_states,
output = "wide", geometry = TRUE, year = 2020,
variables = census_var)
## Getting data from the 2016-2020 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Fetching tract data by state and combining the result.
census_tract
# GEOID and Unique.ID are the values that are important to make sure they're distinct
census_tract_clean <- census_tract %>% distinct(.)
census_tract[!duplicated(census_tract$GEOID),]
FE_responses_clean <- FE_responses %>% distinct(.)
FE_responses[!duplicated(FE_responses$Unique.ID),]
# Can't have NA ages because we're looking at teenager data
FE_responses_clean <- FE_responses_clean %>%
filter(!is.na(Age)) %>%
mutate(Age = as.numeric(Age))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion
skim(FE_responses_clean)
| Name | FE_responses_clean |
| Number of rows | 31498 |
| Number of columns | 36 |
| _______________________ | |
| Column type frequency: | |
| character | 27 |
| logical | 1 |
| numeric | 8 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Name | 0 | 1.00 | 4 | 82 | 0 | 29859 | 0 |
| Gender | 0 | 1.00 | 0 | 11 | 144 | 4 | 0 |
| Race | 0 | 1.00 | 0 | 57 | 1 | 12 | 0 |
| Race.with.imputations | 862 | 0.97 | 0 | 23 | 6 | 10 | 0 |
| Imputation.probability | 881 | 0.97 | 0 | 19 | 3 | 6614 | 0 |
| URL.of.image..PLS.NO.HOTLINKS. | 0 | 1.00 | 0 | 373 | 16773 | 14668 | 0 |
| Date.of.injury.resulting.in.death..month.day.year. | 0 | 1.00 | 10 | 10 | 0 | 7736 | 0 |
| Location.of.injury..address. | 0 | 1.00 | 0 | 74 | 556 | 28893 | 0 |
| Location.of.death..city. | 0 | 1.00 | 0 | 30 | 36 | 6340 | 0 |
| State | 0 | 1.00 | 0 | 2 | 1 | 52 | 0 |
| Location.of.death..county. | 0 | 1.00 | 0 | 33 | 15 | 1536 | 0 |
| Full.Address | 0 | 1.00 | 0 | 103 | 1 | 29709 | 0 |
| Latitude | 0 | 1.00 | 0 | 17 | 1 | 29515 | 0 |
| Agency.or.agencies.involved | 0 | 1.00 | 0 | 266 | 78 | 6829 | 0 |
| Highest.level.of.force | 0 | 1.00 | 0 | 33 | 4 | 19 | 0 |
| Name.Temporary | 0 | 1.00 | 0 | 58 | 25969 | 5284 | 0 |
| Armed.Unarmed | 0 | 1.00 | 0 | 19 | 14419 | 10 | 0 |
| Alleged.weapon | 0 | 1.00 | 0 | 35 | 14421 | 269 | 0 |
| Aggressive.physical.movement | 0 | 1.00 | 0 | 42 | 14418 | 32 | 0 |
| Fleeing.Not.fleeing | 0 | 1.00 | 0 | 42 | 14419 | 26 | 0 |
| Description.Temp | 0 | 1.00 | 0 | 2239 | 27431 | 3870 | 0 |
| URL.Temp | 0 | 1.00 | 0 | 723 | 28281 | 3066 | 0 |
| Brief.description | 0 | 1.00 | 0 | 2239 | 2 | 29883 | 0 |
| Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS | 0 | 1.00 | 0 | 89 | 3 | 156 | 0 |
| Intended.use.of.force..Developing. | 0 | 1.00 | 0 | 22 | 3 | 9 | 0 |
| Supporting.document.link | 0 | 1.00 | 0 | 438 | 2 | 29269 | 0 |
| Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS | 0 | 1.00 | 0 | 19 | 62 | 5 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| X | 31498 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Unique.ID | 1 | 1.00 | 15749.00 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
| Age | 1223 | 0.96 | 35.28 | 13.83 | 0.08 | 25 | 33.00 | 44.00 | 107.00 | ▂▇▃▁▁ |
| Location.of.death..zip.code. | 182 | 0.99 | 58352.53 | 27966.03 | 1013.00 | 33147 | 60649.00 | 85033.00 | 99921.00 | ▃▇▃▆▇ |
| Longitude | 1 | 1.00 | -95.40 | 16.30 | -165.59 | -111 | -90.56 | -82.57 | -67.27 | ▁▁▅▇▇ |
| UID.Temporary | 25969 | 0.18 | 15464.08 | 6559.72 | 9759.00 | 11156 | 12549.00 | 19240.00 | 30340.00 | ▇▁▁▁▂ |
| X.1 | 31497 | 0.00 | 10895.00 | NA | 10895.00 | 10895 | 10895.00 | 10895.00 | 10895.00 | ▁▁▇▁▁ |
| Unique.ID.formula | 31496 | 0.00 | 29497.00 | 2828.43 | 27497.00 | 28497 | 29497.00 | 30497.00 | 31497.00 | ▇▁▁▁▇ |
| Unique.identifier..redundant. | 1 | 1.00 | 15749.00 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
Age no longer includes missing or empty values.
save(census_var, census_tract_clean, file = 'census_teen_data.RData')
save(FE_population, FE_responses_clean, FE_type, file = 'all_FE_data.RData')
To try and find correlations, let’s visualize the data sets.
Looking at the data, less than 25% of the deadly vehicle encounters with police involved teens.
When we look at the encounters separated by teen ages from 13 to 19, we can see that a majority of the incidents are represented by teens of the legal driving age, who happen to be eligible to work and might be responsible for their own commute to work or activities more than a 13-year-old would.
# Create a new column 'age_category' based on the 'Age' column
FE_responses_clean <- FE_responses_clean %>%
mutate(age_category = ifelse(Age >= 13 & Age <= 19, "teen", "not_teen"))
# Filter incident data
vehicle_data <- FE_responses_clean %>%
filter(Highest.level.of.force == "Vehicle")
# circle graph
pie_chart <- ggplot(data = vehicle_data) +
geom_bar(mapping = aes(x = "", fill = age_category), width = 1, stat = "count") +
coord_polar(theta = "y") +
labs(fill = "Age Category") +
scale_fill_discrete(name = "Age Category") +
theme_minimal() +
# ggdark::dark_theme_gray() +
ggtitle("Vehicle Incidents by Age Category (Total)") +
annotate("text", x = 1, y = 1, label = paste("Total Count:", nrow(vehicle_data)), vjust = 1.5, hjust = 0.5, size = 4)
# Bar chart
teen_vehicle_bar_chart <- ggplot(data = vehicle_data %>% filter(age_category == "teen")) +
geom_bar(mapping = aes(x = as.factor(Age), fill = as.factor(Age)), width = 0.7, stat = "count") +
labs(fill = "Age") +
scale_fill_discrete(name = "Age") +
theme_minimal() +
ggtitle("Teen Vehicle Incidents by Age") +
annotate("text", x = 1, y = 1, label = paste("Total Count:", sum(FE_responses_clean$age_category == "teen")), vjust = 1.5, hjust = 0.5, size = 4)
# Set up the layout for two plots side by side
par(mfrow = c(1, 2))
print(pie_chart)
print(teen_vehicle_bar_chart)
Most of the incidents from this criteria are vehicle pursuits, as opposed to pursuits on foot or other means the vehicle was abandonded during the pursuits. The next chart indicates that deadly force was not reported as an intention in most cases.
From these insights, we still cannot immediately correlate these incidents with joyriding and car theft encouraged by the TikTok challenges.
# both Pursuit and non-Pursuit
teen_vehicle_data <- FE_responses_clean %>%
filter(age_category == "teen" & Highest.level.of.force == "Vehicle")
# distribution of intended uses of force
comparison_pie_chart <- ggplot(data = teen_vehicle_data) +
geom_bar(mapping = aes(x = "", fill = factor(Intended.use.of.force..Developing.)), width = 1, stat = "count") +
coord_polar(theta = "y") +
labs(fill = "Intended Use of Force") +
scale_fill_discrete(name = "Intended Use of Force") +
theme_minimal() +
ggtitle("Teen Vehicle Incidents by Intended Use of Force") +
annotate("text", x = 1, y = 1, label = paste("Total Count:", nrow(teen_vehicle_data)), vjust = 1.5, hjust = 0.5, size = 4)
print(comparison_pie_chart)
As we can see, vehicles are highly represented, but this is still not as
common than the gunshots category for incidents involving teens and
police officers.
# Compare all teen death categories
teen_data <- FE_responses_clean %>%
filter(age_category == "teen")
bar_plot <- ggplot(data = teen_data) +
geom_bar(mapping = aes(x = Highest.level.of.force, fill = Highest.level.of.force), position = "stack") +
labs(fill = "Highest Level of Force") +
scale_fill_discrete(name = "Highest Level of Force") +
theme_minimal() +
# ggdark::dark_theme_gray() +
ggtitle("Comparison of Teen Incidents by Highest Level of Force") +
coord_flip() # switch x and y axes
print(bar_plot)
So, we cannot immediately correlate these incidents with joyriding and
car theft encouraged by the TikTok challenges. We may need to look at
trends by year to see if there has been a significant increase, then
we’d need to collect and compare recent data from possibly surveys that
aim to understand the intent behind teen participation in the car theft
challenges.