Context

During 2023 in the United States, there has been concern of a growing trend teenagers are engaging in on Tiktok that involves stealing cars for joy rides. It’ll take time to gather enough data about this Tiktok trend to understand the motivations behind it better, but perhaps we can use the Fatal Encounters database to observe trends of teenage vehicle incidents that resulted in death from a different angle.

A goal of this data synthesis is to look at the trend of teenage recklessness while driving, specifically police officer-involved vehicle pursuits resulting in death.

DISCLAIMER The project described in this paper relies on data from survey(s) administered by the National Officer-Involved Homicide Database, which is maintained by the Center for Economic and Social Research (CESR) at the University of Southern California. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of USC or NOIHD.

Setup

install.packages("haven")

## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)

install.packages("skimr")

## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)

Several packages are needed for the visualizations, but it’s not necessary to show the full list here.

The data set is from Fatal Encounters.

load('census_teen_data.RData') # for census tract
load('all_FE_data.RData') # combined Fatal Encounters data sets

The following code chunk only needed one time to load into the global environment; the data will be later to load it locally each time.

# Exported one .csv tab at a time, since the columns do not match
FE_responses <- read.csv('formResponses_FatalEncounters.csv')

FE_population <- read.csv('population_FatalEncounters.csv')

FE_type <- read.csv('threatassess_FatalEncounters.csv')

I can use the following code chunk to check a little bit of my data before doing analysis, but it’s long so I’ll comment it out. I’ll show this data tidied and with more context below on the page.

skim(FE_responses)

Data summary
Name	FE_responses
Number of rows	31498
Number of columns	36
_______________________
Column type frequency:
character	28
logical	1
numeric	7
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique
Name	0	1.00	4	82	0	29859
Age	0	1.00	0	5	1221	112
Gender	0	1.00	0	11	144	4
Race	0	1.00	0	57	1	12
Race.with.imputations	862	0.97	0	23	6	10
Imputation.probability	881	0.97	0	19	3	6614
URL.of.image..PLS.NO.HOTLINKS.	0	1.00	0	373	16773	14668
Date.of.injury.resulting.in.death..month.day.year.	0	1.00	10	10	0	7736
Location.of.injury..address.	0	1.00	0	74	556	28893
Location.of.death..city.	0	1.00	0	30	36	6340
State	0	1.00	0	2	1	52
Location.of.death..county.	0	1.00	0	33	15	1536
Full.Address	0	1.00	0	103	1	29709
Latitude	0	1.00	0	17	1	29515
Agency.or.agencies.involved	0	1.00	0	266	78	6829
Highest.level.of.force	0	1.00	0	33	4	19
Name.Temporary	0	1.00	0	58	25969	5284
Armed.Unarmed	0	1.00	0	19	14419	10
Alleged.weapon	0	1.00	0	35	14421	269
Aggressive.physical.movement	0	1.00	0	42	14418	32
Fleeing.Not.fleeing	0	1.00	0	42	14419	26
Description.Temp	0	1.00	0	2239	27431	3870
URL.Temp	0	1.00	0	723	28281	3066
Brief.description	0	1.00	0	2239	2	29883
Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS	0	1.00	0	89	3	156
Intended.use.of.force..Developing.	0	1.00	0	22	3	9
Supporting.document.link	0	1.00	0	438	2	29269
Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS	0	1.00	0	19	62	5

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
X	31498	0	NaN	:

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Unique.ID	1	1.00	15749.00	9092.55	1.00	7875	15749.00	23623.00	31497.00	▇▇▇▇▇
Location.of.death..zip.code.	182	0.99	58352.53	27966.03	1013.00	33147	60649.00	85033.00	99921.00	▃▇▃▆▇
Longitude	1	1.00	-95.40	16.30	-165.59	-111	-90.56	-82.57	-67.27	▁▁▅▇▇
UID.Temporary	25969	0.18	15464.08	6559.72	9759.00	11156	12549.00	19240.00	30340.00	▇▁▁▁▂
X.1	31497	0.00	10895.00	NA	10895.00	10895	10895.00	10895.00	10895.00	▁▁▇▁▁
Unique.ID.formula	31496	0.00	29497.00	2828.43	27497.00	28497	29497.00	30497.00	31497.00	▇▁▁▁▇
Unique.identifier..redundant.	1	1.00	15749.00	9092.55	1.00	7875	15749.00	23623.00	31497.00	▇▇▇▇▇

This is an easier way for me to load census variable codes locally, so I can easily pick variables that relate to the study.

# Using tidycensus library

# Search for available variables
variables <- load_variables(2019, "acs5", cache = TRUE)

# View the variable names and descriptions
head(variables)

Prepare data

Again, the primary tract for this study is of the southwest states of Arizona, New Mexico, Texas, and Oklahoma.

(The comments can be removed from other regions to focus on others or all of them, but there was too much data to reasonably load for the purpose of this project goal.)

This, along with the census variables I chose create the tract for if I need to map data points. Potentially relevant variables include vehicles per household, households where grandparents are the primary guardian, and population aged 12 to 17.

# Define states in the East and South regions
# southeast_states <- c("AR", "LA", "MS", "TN", "AL", "KY", "WV", "DC", "VA", "NC", "SC", "GA", "FL")
# 
# northeast_states <- c("NY", "PA", "NJ", "MA", "CT", "RI", "VT", "NH", "ME", "MD", "DE")
# 
# west_states <- c("WA", "OR", "CA", "HI", "AK", "ID", "NV", "MT", "WY", "UT", "CO")
# 
# midwest_states <- c("ND", "SD", "NE", "KS", "MN", "IA", "MO", "WI", "IL", "MI", "IN", "OH")

southwest_states <- c("AZ", "NM", "TX", "OK")

# territories_us <- c("PR", "GU", "AS", "VI", "MP") #Puerto Rico, Guam, American Samoa, U.S. Virgin Islands, and Northern Mariana Islands

## In case I want to include all states and territories
# us_states_and_territories <- c(southeast_states, northeast_states, west_states, midwest_states, southwest_states, territories_us)


census_var <- c(hhincome = 'B19019_001',
                foodstamps.status = "B22003_001",
                workclass.from16 = "B24081_001", 
                incometopoverty.under18 = 'B05010_001', 
                geomobility = "B07001_001",
                modecommutework.16to19 = "B08101_002",
                modecommutework.poverty = "B08122_001",
                vehicles.household = "B08201_001",
                population.12to14 = "B09001_008",
                population.15to17 = "B09001_009",
                householdtype.under18 = "B09005_001",
                grandhguardian = "B10050_001", # grandparent guardian
                grandemployment = "B10058_001", # ..by employment
                grade9to12.poverty = "B14006_018",
                notenrolled.school = "B14007_019",
                allocenrolled.school = "B99141_001"
                )

census_tract <- get_acs(geography = "tract", state = southwest_states,
                 output = "wide", geometry = TRUE, year = 2020,
                 variables = census_var)

## Getting data from the 2016-2020 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

## Fetching tract data by state and combining the result.

census_tract

Delete duplicate data

# GEOID and Unique.ID are the values that are important to make sure they're distinct
census_tract_clean <- census_tract %>% distinct(.)
census_tract[!duplicated(census_tract$GEOID),]

FE_responses_clean <- FE_responses %>% distinct(.)
FE_responses[!duplicated(FE_responses$Unique.ID),]

Drop empty elements

# Can't have NA ages because we're looking at teenager data
FE_responses_clean <- FE_responses_clean %>%
  filter(!is.na(Age)) %>%
  mutate(Age = as.numeric(Age))

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

skim(FE_responses_clean)

Data summary
Name	FE_responses_clean
Number of rows	31498
Number of columns	36
_______________________
Column type frequency:
character	27
logical	1
numeric	8
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique
Name	0	1.00	4	82	0	29859
Gender	0	1.00	0	11	144	4
Race	0	1.00	0	57	1	12
Race.with.imputations	862	0.97	0	23	6	10
Imputation.probability	881	0.97	0	19	3	6614
URL.of.image..PLS.NO.HOTLINKS.	0	1.00	0	373	16773	14668
Date.of.injury.resulting.in.death..month.day.year.	0	1.00	10	10	0	7736
Location.of.injury..address.	0	1.00	0	74	556	28893
Location.of.death..city.	0	1.00	0	30	36	6340
State	0	1.00	0	2	1	52
Location.of.death..county.	0	1.00	0	33	15	1536
Full.Address	0	1.00	0	103	1	29709
Latitude	0	1.00	0	17	1	29515
Agency.or.agencies.involved	0	1.00	0	266	78	6829
Highest.level.of.force	0	1.00	0	33	4	19
Name.Temporary	0	1.00	0	58	25969	5284
Armed.Unarmed	0	1.00	0	19	14419	10
Alleged.weapon	0	1.00	0	35	14421	269
Aggressive.physical.movement	0	1.00	0	42	14418	32
Fleeing.Not.fleeing	0	1.00	0	42	14419	26
Description.Temp	0	1.00	0	2239	27431	3870
URL.Temp	0	1.00	0	723	28281	3066
Brief.description	0	1.00	0	2239	2	29883
Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS	0	1.00	0	89	3	156
Intended.use.of.force..Developing.	0	1.00	0	22	3	9
Supporting.document.link	0	1.00	0	438	2	29269
Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS	0	1.00	0	19	62	5

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
X	31498	0	NaN	:

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Unique.ID	1	1.00	15749.00	9092.55	1.00	7875	15749.00	23623.00	31497.00	▇▇▇▇▇
Age	1223	0.96	35.28	13.83	0.08	25	33.00	44.00	107.00	▂▇▃▁▁
Location.of.death..zip.code.	182	0.99	58352.53	27966.03	1013.00	33147	60649.00	85033.00	99921.00	▃▇▃▆▇
Longitude	1	1.00	-95.40	16.30	-165.59	-111	-90.56	-82.57	-67.27	▁▁▅▇▇
UID.Temporary	25969	0.18	15464.08	6559.72	9759.00	11156	12549.00	19240.00	30340.00	▇▁▁▁▂
X.1	31497	0.00	10895.00	NA	10895.00	10895	10895.00	10895.00	10895.00	▁▁▇▁▁
Unique.ID.formula	31496	0.00	29497.00	2828.43	27497.00	28497	29497.00	30497.00	31497.00	▇▁▁▁▇
Unique.identifier..redundant.	1	1.00	15749.00	9092.55	1.00	7875	15749.00	23623.00	31497.00	▇▇▇▇▇

Age no longer includes missing or empty values.

Save data

save(census_var, census_tract_clean, file = 'census_teen_data.RData')

save(FE_population, FE_responses_clean, FE_type, file = 'all_FE_data.RData')

Visualize

To try and find correlations, let’s visualize the data sets.

Involving teenagers and vehicles

Looking at the data, less than 25% of the deadly vehicle encounters with police involved teens.

When we look at the encounters separated by teen ages from 13 to 19, we can see that a majority of the incidents are represented by teens of the legal driving age, who happen to be eligible to work and might be responsible for their own commute to work or activities more than a 13-year-old would.

# Create a new column 'age_category' based on the 'Age' column
FE_responses_clean <- FE_responses_clean %>%
  mutate(age_category = ifelse(Age >= 13 & Age <= 19, "teen", "not_teen"))

# Filter incident data
vehicle_data <- FE_responses_clean %>%
  filter(Highest.level.of.force == "Vehicle")

# circle graph
pie_chart <- ggplot(data = vehicle_data) +
  geom_bar(mapping = aes(x = "", fill = age_category), width = 1, stat = "count") +
  coord_polar(theta = "y") +
  labs(fill = "Age Category") +
  scale_fill_discrete(name = "Age Category") +
  theme_minimal() +
  # ggdark::dark_theme_gray() +
  ggtitle("Vehicle Incidents by Age Category (Total)") +
  annotate("text", x = 1, y = 1, label = paste("Total Count:", nrow(vehicle_data)), vjust = 1.5, hjust = 0.5, size = 4)

# Bar chart
teen_vehicle_bar_chart <- ggplot(data = vehicle_data %>% filter(age_category == "teen")) +
  geom_bar(mapping = aes(x = as.factor(Age), fill = as.factor(Age)), width = 0.7, stat = "count") +
  labs(fill = "Age") +
  scale_fill_discrete(name = "Age") +
  theme_minimal() + 
  ggtitle("Teen Vehicle Incidents by Age") +
  annotate("text", x = 1, y = 1, label = paste("Total Count:", sum(FE_responses_clean$age_category == "teen")), vjust = 1.5, hjust = 0.5, size = 4)


# Set up the layout for two plots side by side
par(mfrow = c(1, 2))


print(pie_chart)

print(teen_vehicle_bar_chart)

Most of the incidents from this criteria are vehicle pursuits, as opposed to pursuits on foot or other means the vehicle was abandonded during the pursuits. The next chart indicates that deadly force was not reported as an intention in most cases.

From these insights, we still cannot immediately correlate these incidents with joyriding and car theft encouraged by the TikTok challenges.

# both Pursuit and non-Pursuit
teen_vehicle_data <- FE_responses_clean %>%
  filter(age_category == "teen" & Highest.level.of.force == "Vehicle")

# distribution of intended uses of force
comparison_pie_chart <- ggplot(data = teen_vehicle_data) +
  geom_bar(mapping = aes(x = "", fill = factor(Intended.use.of.force..Developing.)), width = 1, stat = "count") +
  coord_polar(theta = "y") +
  labs(fill = "Intended Use of Force") +
  scale_fill_discrete(name = "Intended Use of Force") +
  theme_minimal() +
  ggtitle("Teen Vehicle Incidents by Intended Use of Force") +
  annotate("text", x = 1, y = 1, label = paste("Total Count:", nrow(teen_vehicle_data)), vjust = 1.5, hjust = 0.5, size = 4)

print(comparison_pie_chart)

As we can see, vehicles are highly represented, but this is still not as common than the gunshots category for incidents involving teens and police officers.

# Compare all teen death categories
teen_data <- FE_responses_clean %>%
  filter(age_category == "teen")


bar_plot <- ggplot(data = teen_data) +
  geom_bar(mapping = aes(x = Highest.level.of.force, fill = Highest.level.of.force), position = "stack") +
  labs(fill = "Highest Level of Force") +
  scale_fill_discrete(name = "Highest Level of Force") +
  theme_minimal() +
  # ggdark::dark_theme_gray() +
  ggtitle("Comparison of Teen Incidents by Highest Level of Force") +
  coord_flip() # switch x and y axes
  


print(bar_plot)

So, we cannot immediately correlate these incidents with joyriding and car theft encouraged by the TikTok challenges. We may need to look at trends by year to see if there has been a significant increase, then we’d need to collect and compare recent data from possibly surveys that aim to understand the intent behind teen participation in the car theft challenges.

Major Assignment 1

Jalisa Smith

2023-10-04