The project explores the fatal encounters data in Hawaii, specifically in Honolulu County. Its aim is to analyze whether the compositions of ethnicities in a Tract has an effect on the number of fatal encounters in the given Tract.
Steps:
Explore the fatal encounters data through visualizations
Retrieve Census data for Tract boundaries and other variables, and calculate the ethnic composition for each Tract
Spatial join the fatal encounters data with the Census data to count the number of fatalities in each Tract
Run regression models comparing the compositions of ethnicities and the number of fatal encounters, among other variables
data <- read.csv("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/FATAL_ENCOUNTERS.csv")
data_HI <- data%>%
filter(State == 'HI')
HI_dropna <- data_HI %>%
drop_na(Longitude)%>%
drop_na(Latitude)
epsg_id <- 4326
# Converting into a sf object
HI_sf <- HI_dropna %>%
st_as_sf(coords=c("Longitude", "Latitude"), crs = epsg_id)
HI_sf <- HI_sf %>%
mutate(Aggressive.physical.movement = ifelse(Aggressive.physical.movement == "", NA, Aggressive.physical.movement))
tmap_mode("view")
map <- tm_shape(HI_sf[!is.na(HI_sf$Race),]) +
tm_dots('Race', shape = 2, alpha = 1, size = 0.05, col="Race",id="Race",popup.vars=c( "Race", "Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city."))
#+
# tm_shape(HI_sf[is.na(HI_sf$Race),]) +
# tm_dots('Race', shape = 5, alpha = 0.3, size = 0.02,col="Race",id="Race",popup.vars=c( "Race", #"Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city."))
bb <- c(-160.5, 18.5, -154.5, 22.5)
# Adjust the map's bounding box
map <- map + tm_view(bbox = bb)
# Plot the map
map
race_count <- HI_sf %>%
group_by(Race) %>%
summarise(count = n())%>%
st_drop_geometry()
ggplot(race_count, aes(x = count, y = Race)) +
geom_bar(stat = "identity")
kable(head(race_count))
| Race | count |
|---|---|
| African-American/Black | 2 |
| Asian/Pacific Islander | 66 |
| European-American/White | 23 |
| Hispanic/Latino | 6 |
| Race unspecified | 19 |
HI_tract <- suppressMessages(
get_acs(geography = "tract", # or "block group", "county", "state" etc.
state = "HI",
variables = c(hhincome = 'B19019_001',
race.tot = "B02001_001",
race.native.hawaiian = 'B01001E_001',
race.asian = 'B01001D_001',
race.white = "B02001_002",
race.black = "B02001_003",
trans.total = "B08006_001",
trans.car = "B08006_002",
trans.drovealone = "B08006_003",
trans.carpooled = "B08006_004",
trans.pubtrans = "B08006_008",
trans.bicycle = "B08006_014",
trans.walk = "B08006_015",
trans.WfH = "B08006_017",
med_housexp = "B25104_001",
med_realestate_taxes = "B25103_001"
),
year = 2019,
survey = "acs5", # American Community Survey 5-year estimate
geometry = TRUE, # returns sf objects
output = "wide") # wide vs. long
)
HI_tract <- HI_tract %>%
select(GEOID,
hhincome = hhincomeE, # New name = old name
race.tot = race.totE,
race.white = race.whiteE,
race.asian = race.asianE,
race.native.hawaiian = race.native.hawaiianE,
race.black = race.blackE,
trans.total = trans.totalE,
trans.car = trans.carE,
trans.drovealone = trans.drovealoneE,
trans.carpooled = trans.carpooledE,
trans.pubtrans = trans.pubtransE,
trans.bicycle = trans.bicycleE,
trans.walk = trans.walkE,
trans.WfH = trans.WfHE,
Med_HHExp = med_housexpE,
med_RETaxes = med_realestate_taxesE)
epsg_id <- 4326
HI_tract_sf <- HI_tract %>%
st_as_sf(coords=c("coordinates.longitude", "coordinates.latitude"), crs = epsg_id)
is_empty <- st_is_empty(HI_tract_sf[, "geometry"])
HI_tract_sf_filtered <- HI_tract_sf %>%
filter(!is_empty)
save(HI_tract_sf_filtered,file="/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/tracts.RData")
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/tracts.RData")
HI_tract_sf_filtered$perc.asian <- round((HI_tract_sf_filtered$race.asian/HI_tract_sf_filtered$race.tot)*100,3)
HI_tract_sf_filtered$perc.hawaiian <- round((HI_tract_sf_filtered$race.native.hawaiian/HI_tract_sf_filtered$race.tot)*100,3)
HI_tract_sf_filtered$perc.white <- round((HI_tract_sf_filtered$race.white/HI_tract_sf_filtered$race.tot)*100,3)
HI_tract_sf_filtered$perc.asian.hawaiian <- round(((HI_tract_sf_filtered$race.native.hawaiian+HI_tract_sf_filtered$race.asian)/HI_tract_sf_filtered$race.tot)*100,3)
majority_counts <- as.data.frame(table(HI_tract_sf_filtered$majority))%>%
rename(Ethnicity = Var1)
save(majority_counts,file="/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/majority.RData")
kable(head(majority_counts))
bb_honolulu <- c(-158.1950, 21.40, -157.560, 21.50)
honolulu <- HI_tract_sf_filtered %>%
filter(str_detect(GEOID, "^15003"))
HI_sf <- HI_sf%>%
filter(Location.of.death..county.== 'Honolulu')
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/majority.RData")
HI_tract_sf_filtered <- honolulu %>%
mutate(
majority = case_when(
perc.white >= perc.asian & perc.white >= perc.hawaiian ~ "White",
(perc.asian + perc.hawaiian) >= perc.white ~ "Asian/Hawaiian",
TRUE ~ "Other"
)
)
tmap_mode("view")
tm_basemap("Esri.WorldTopoMap")+tm_shape(HI_tract_sf_filtered) + tm_polygons(col='majority', alpha=0.7,palette = "Set2") +tm_shape(HI_sf[!is.na(HI_sf$Race),]) +
tm_dots('Race', palette = "Set1",shape = 2, alpha = 1, size = 0.05, col="Race",id="Race",popup.vars=c( "Race", "Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city.")) + tm_view(bbox = bb_honolulu)
# Assuming 'GEOID' is the column that contains the GEOID values
st_crs(honolulu) <- st_crs(HI_sf) <- st_crs("EPSG:4326")
# Calculate the intersection of geometries between 'honolulu' and 'HI_sf'
result_df <- data.frame()
# Iterate through each row in 'honolulu'
for (i in 1:nrow(HI_sf)) {
# Get the geometry for the current row in 'honolulu'
honolulu_row <- HI_sf[i, ]
# Calculate the intersection of the current geometry with 'honolulu'
intersection <- st_intersection(honolulu_row, honolulu)
# Add the result to the 'result_df'
result_df <- rbind(result_df, intersection)
}
save(result_df, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/result_df.RData")
count <- result_df %>%
group_by(GEOID) %>%
summarise(count = n())%>%
st_drop_geometry()
merged_count <- left_join(HI_tract_sf_filtered, count, by = "GEOID")
merged_count <- merged_count%>%
st_as_sf()
st_crs(merged_count) <- st_crs("EPSG:4326")
save(merged_count,file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/merged.RData")
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/merged.RData")
tmap_mode("view")
tm_basemap("Esri.WorldTopoMap")+tm_shape(merged_count) + tm_polygons(col='count', alpha=0.7,palette = "Reds") +tm_shape(HI_sf[!is.na(HI_sf$Race),]) +
tm_dots('Race', palette = "Set1",shape = 2, alpha = 1, size = 0.05, col="Race",id="Race",popup.vars=c( "Race", "Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city.")) + tm_view(bbox = bb_honolulu)
count_white <- merged_count%>%
filter(majority=="White")%>%
st_drop_geometry()
count_ah <- merged_count%>%
filter(majority=="Asian/Hawaiian")%>%
st_drop_geometry()
# Replace NAs with 0 in data frame
count_white[is.na(count_white)] <- 0
count_ah[is.na(count_ah)] <- 0
t_test_result <- t.test(count_white$count, count_ah$count)
t_test_result
##
## Welch Two Sample t-test
##
## data: count_white$count and count_ah$count
## t = -1.7464, df = 172.44, p-value = 0.08252
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.30533304 0.01866638
## sample estimates:
## mean of x mean of y
## 0.2166667 0.3600000
From the Two-Sample t-test with a p-value of 0.08252, I found that the number of fatal encounters is statistically significantly different between tracts with different ethnic majorities with a 90% CI. In other words, tracts with a majority of Asian/Hawaiian population have a higher rate of fatal encounters.
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/result_df.RData")
custom_breaks <- c(0, 25, 50, 75, 100)
custom_labels <- c("0-25", "25-50", "50-75", "75-100")
result_df$asian.hawaiian.categories <- cut(result_df$perc.asian.hawaiian, breaks = custom_breaks, labels = custom_labels)
result_df$white.categories <- cut(result_df$perc.white, breaks = custom_breaks, labels = custom_labels)
category_counts <- result_df %>%
count(asian.hawaiian.categories)
category_counts$white.categories <- result_df %>%
count(white.categories)
result_df <- as.data.frame(result_df)
merged_df <- left_join(result_df, category_counts, by = "asian.hawaiian.categories")
merged_df <- merged_df %>%
mutate(Alleged.weapon = ifelse(Alleged.weapon == "", NA, Alleged.weapon))%>%
mutate(Fleeing.Not.fleeing = ifelse(Fleeing.Not.fleeing == "", NA, Fleeing.Not.fleeing))
graph1 <- ggplot(category_counts, aes(x = n, y = asian.hawaiian.categories)) +
geom_bar(stat = "identity") + labs(title = "Number of Deaths in Tracts with \nDifferent Percentage of Asian/Hawaiian Residents", x = "Number of deaths", y = "% Asian/Hawaiian")+theme(axis.text.x = element_text(angle = 45, hjust = 1),plot.title = element_text(size = 9))
graph2 <- ggplot(category_counts, aes(x = category_counts$white.categories$n, y = category_counts$white.categories$white.categories)) +
geom_bar(stat = "identity") + labs(title = "Number of Deaths in Tracts with \nDifferent Percentage of White Residents", x = "Number of deaths", y = "% White")+theme(axis.text.x = element_text(angle = 45, hjust = 1),plot.title = element_text(size = 9))
graph3 <- ggplot(merged_df, aes(x = white.categories.x, y = asian.hawaiian.categories)) +
geom_count() + labs(title = "Number of Deaths in Neighborhoods with \nDifferent Ethnic Compositions", x = "% White", y = "% Asian/Hawaiian")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
grid.arrange(graph1, graph2, graph3, ncol = 2)
merged_df$combined <- paste(result_df$white.categories, result_df$asian.hawaiian.categories, sep = "&")
plot1 <- ggplot(merged_df, aes(x = combined, y = Race)) +
geom_count() + labs(title = "Race of Victim for Tracts with Different Percentage of Asian/Hawaiian Residents", x = "% White & % Asian/Hawaiian", y = "Race of victim")+theme(axis.text.x = element_text(angle = 45, hjust = 1),plot.title = element_text(size = 9))
plot1
#kable(head(merged_df))
GGally::ggpairs(merged_df[, c("hhincome","Med_HHExp", "perc.asian.hawaiian", "perc.white")])
After observing the relationships between independent variables, I omitted the perc.white variable from the model since it seemed to be highly correlated to the perc.asian.hawaiian variable.
model <- lm(n~Location.of.death..city.+hhincome+Med_HHExp+perc.asian.hawaiian +Armed.Unarmed,data=merged_df)
summary(model)
##
## Call:
## lm(formula = n ~ Location.of.death..city. + hhincome + Med_HHExp +
## perc.asian.hawaiian + Armed.Unarmed, data = merged_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.526 -1.818 0.000 2.134 7.919
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 4.534e+01 3.573e+00 12.689
## Location.of.death..city.Ewa Beach -5.223e+00 3.257e+00 -1.604
## Location.of.death..city.Haleiwa -1.513e+01 4.998e+00 -3.027
## Location.of.death..city.Honolulu -3.761e+00 2.227e+00 -1.689
## Location.of.death..city.Kailua -4.370e+01 5.137e+00 -8.506
## Location.of.death..city.Kaneohe -7.572e+00 3.308e+00 -2.289
## Location.of.death..city.Kapolei -8.460e+00 3.051e+00 -2.773
## Location.of.death..city.Laie -9.491e+00 4.771e+00 -1.989
## Location.of.death..city.Mililani -3.453e+00 3.661e+00 -0.943
## Location.of.death..city.Nanakuli 2.282e+00 4.709e+00 0.485
## Location.of.death..city.Pearl City -8.447e+00 3.444e+00 -2.453
## Location.of.death..city.Schofield Barracks -4.546e+01 5.117e+00 -8.885
## Location.of.death..city.Wahiawa -6.604e+00 3.630e+00 -1.820
## Location.of.death..city.Waianae -3.160e+00 3.077e+00 -1.027
## Location.of.death..city.Waiʻanae -2.834e+00 2.989e+00 -0.948
## Location.of.death..city.Waimanalo -5.046e+00 3.217e+00 -1.569
## Location.of.death..city.Waipahu -9.820e+00 2.829e+00 -3.471
## hhincome 3.186e-05 2.135e-05 1.492
## Med_HHExp 2.284e-03 6.407e-04 3.565
## perc.asian.hawaiian -3.555e-01 3.910e-02 -9.092
## Armed.UnarmedArmed 1.028e+00 1.473e+00 0.698
## Armed.UnarmedUnarmed 1.056e+00 1.363e+00 0.775
## Armed.UnarmedUncertain 2.196e+00 3.561e+00 0.617
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## Location.of.death..city.Ewa Beach 0.115101
## Location.of.death..city.Haleiwa 0.003895 **
## Location.of.death..city.Honolulu 0.097434 .
## Location.of.death..city.Kailua 2.77e-11 ***
## Location.of.death..city.Kaneohe 0.026358 *
## Location.of.death..city.Kapolei 0.007785 **
## Location.of.death..city.Laie 0.052153 .
## Location.of.death..city.Mililani 0.350226
## Location.of.death..city.Nanakuli 0.630007
## Location.of.death..city.Pearl City 0.017710 *
## Location.of.death..city.Schofield Barracks 7.35e-12 ***
## Location.of.death..city.Wahiawa 0.074808 .
## Location.of.death..city.Waianae 0.309438
## Location.of.death..city.Waiʻanae 0.347635
## Location.of.death..city.Waimanalo 0.122979
## Location.of.death..city.Waipahu 0.001078 **
## hhincome 0.141947
## Med_HHExp 0.000812 ***
## perc.asian.hawaiian 3.57e-12 ***
## Armed.UnarmedArmed 0.488652
## Armed.UnarmedUnarmed 0.441885
## Armed.UnarmedUncertain 0.540346
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.111 on 50 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.8198, Adjusted R-squared: 0.7406
## F-statistic: 10.34 on 22 and 50 DF, p-value: 6.767e-12
# Breusch-Pagan test
bptest(model)
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 27.762, df = 22, p-value = 0.1837
Breusch-Pagan test can be used to test the presence of heteroskedasticity in residuals from a regression. In this model, we fail to reject the null hypothesis, thus the residuals are not heteroskedastic.
I noticed that a significant majority of individuals involved in fatal encounters were of Asian/Hawaiian descent, comprising 66 out of the total 116 observed cases in Hawaii. This represents 56.9% of all fatal encounters in the state.
From the Two-Sample t-test with a p-value of 0.003731, I found that the number of fatal encounters is statistically significantly different between tracts with different ethnic majorities. Specifically, tracts with a majority of Asian/Hawaiian population have a higher rate of fatal encounters than tracts with majority White residents. This was clear in the ethnic majority and fatal encounter count maps.
Upon examining the ethnic composition of various tracts, I discovered that the highest number of fatal encounters occurred in tracts where the population was characterized by 0-25% White residents and 50-75% Asian/Hawaiian residents. In these tracts, there were instances of victims from Hispanic/Latino, European-American/White, and Asian/Hawaiian backgrounds. However, it’s noteworthy that the majority of victims in these tracts were of Asian/Hawaiian descent, which aligns with the broader trend observed across all of Honolulu.
Lastly, through linear regression, I found that Med_HHExp (Median Household Expenditure) and perc.asian.hawaiian were statistically significant in determining the number of fatal encounters in Honolulu. In addition, the location of death (city) had statistically significant results as well, indicating cities such as Kailua, Schofield Barracks, Haleiwa, Kapolei, and Waipahu sees more fatal encounters than some other cities in Honolulu County. Some of the coefficients seem to be inverse of the relationship I had expected, which suggests that some further model corrections may be necessary.