Major Assignment1- Fatal Encounters in Honolulu County, HI

Overview

The project explores the fatal encounters data in Hawaii, specifically in Honolulu County. Its aim is to analyze whether the compositions of ethnicities in a Tract has an effect on the number of fatal encounters in the given Tract.

Steps:

Explore the fatal encounters data through visualizations
Retrieve Census data for Tract boundaries and other variables, and calculate the ethnic composition for each Tract
Spatial join the fatal encounters data with the Census data to count the number of fatalities in each Tract
Run regression models comparing the compositions of ethnicities and the number of fatal encounters, among other variables

Visualization

Map the Fatal Encounters Data

data <- read.csv("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/FATAL_ENCOUNTERS.csv")
data_HI <- data%>%
  filter(State == 'HI')

HI_dropna <- data_HI %>% 
  drop_na(Longitude)%>%
  drop_na(Latitude)

epsg_id <- 4326
# Converting into a sf object
HI_sf <- HI_dropna %>% 
  st_as_sf(coords=c("Longitude", "Latitude"), crs = epsg_id)

HI_sf <- HI_sf %>%
  mutate(Aggressive.physical.movement = ifelse(Aggressive.physical.movement == "", NA, Aggressive.physical.movement))

tmap_mode("view")
map <- tm_shape(HI_sf[!is.na(HI_sf$Race),]) +
  tm_dots('Race', shape = 2, alpha = 1, size = 0.05, col="Race",id="Race",popup.vars=c( "Race", "Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city.")) 
#+
#  tm_shape(HI_sf[is.na(HI_sf$Race),]) + 
#  tm_dots('Race', shape = 5, alpha = 0.3, size = 0.02,col="Race",id="Race",popup.vars=c( "Race", #"Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city.")) 

bb <- c(-160.5, 18.5, -154.5, 22.5)  
# Adjust the map's bounding box
map <- map + tm_view(bbox = bb)
# Plot the map
map

Analyze the Ethnicities of the Victims of Fatal Encounters

race_count <- HI_sf %>%
  group_by(Race) %>%
  summarise(count = n())%>%
  st_drop_geometry()

ggplot(race_count, aes(x = count, y = Race)) +
  geom_bar(stat = "identity")

kable(head(race_count))

Race	count
African-American/Black	2
Asian/Pacific Islander	66
European-American/White	23
Hispanic/Latino	6
Race unspecified	19

Get the Census Data

HI_tract <- suppressMessages(
  get_acs(geography = "tract", # or "block group", "county", "state" etc.
          state = "HI",
          variables = c(hhincome = 'B19019_001',
                        race.tot = "B02001_001",
                        race.native.hawaiian = 'B01001E_001',
                        race.asian = 'B01001D_001',
                        race.white = "B02001_002",
                        race.black = "B02001_003",
                        trans.total = "B08006_001",
                        trans.car = "B08006_002",
                        trans.drovealone = "B08006_003",
                        trans.carpooled = "B08006_004", 
                        trans.pubtrans = "B08006_008", 
                        trans.bicycle = "B08006_014",
                        trans.walk = "B08006_015",
                        trans.WfH = "B08006_017",
                        med_housexp = "B25104_001",
                        med_realestate_taxes = "B25103_001"
          ),
          year = 2019,
          survey = "acs5", # American Community Survey 5-year estimate
          geometry = TRUE, # returns sf objects
          output = "wide") # wide vs. long
)

HI_tract <- HI_tract %>%
  select(GEOID,
         hhincome = hhincomeE, # New name = old name
         race.tot = race.totE,
         race.white = race.whiteE,
         race.asian = race.asianE,
         race.native.hawaiian = race.native.hawaiianE,
         race.black = race.blackE,
         trans.total = trans.totalE,
         trans.car = trans.carE,
         trans.drovealone = trans.drovealoneE,
         trans.carpooled = trans.carpooledE,
         trans.pubtrans = trans.pubtransE,
         trans.bicycle = trans.bicycleE,
         trans.walk = trans.walkE,
         trans.WfH = trans.WfHE,
         Med_HHExp = med_housexpE,
         med_RETaxes = med_realestate_taxesE)


epsg_id <- 4326
HI_tract_sf <- HI_tract %>% 
  st_as_sf(coords=c("coordinates.longitude", "coordinates.latitude"), crs = epsg_id)

is_empty <- st_is_empty(HI_tract_sf[, "geometry"])

HI_tract_sf_filtered <- HI_tract_sf %>%
  filter(!is_empty)

save(HI_tract_sf_filtered,file="/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/tracts.RData")

Calculate the percentage of Asian and Native Hawaiians Relative to the Total Population of the Tract

load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/tracts.RData")
HI_tract_sf_filtered$perc.asian <- round((HI_tract_sf_filtered$race.asian/HI_tract_sf_filtered$race.tot)*100,3)

HI_tract_sf_filtered$perc.hawaiian <- round((HI_tract_sf_filtered$race.native.hawaiian/HI_tract_sf_filtered$race.tot)*100,3)

HI_tract_sf_filtered$perc.white <- round((HI_tract_sf_filtered$race.white/HI_tract_sf_filtered$race.tot)*100,3)

HI_tract_sf_filtered$perc.asian.hawaiian <- round(((HI_tract_sf_filtered$race.native.hawaiian+HI_tract_sf_filtered$race.asian)/HI_tract_sf_filtered$race.tot)*100,3)

Calculate the Majority Ethnicity for Each Tract

majority_counts <- as.data.frame(table(HI_tract_sf_filtered$majority))%>%
  rename(Ethnicity = Var1)
save(majority_counts,file="/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/majority.RData")
kable(head(majority_counts))

Focus on Honolulu and Visualize the Ethnic Majorities for Each Tract

bb_honolulu <- c(-158.1950, 21.40, -157.560, 21.50)
honolulu <- HI_tract_sf_filtered %>%
  filter(str_detect(GEOID, "^15003"))

HI_sf <- HI_sf%>%
  filter(Location.of.death..county.== 'Honolulu')

load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/majority.RData")
HI_tract_sf_filtered <- honolulu %>%
  mutate(
    majority = case_when(
      perc.white >= perc.asian & perc.white >= perc.hawaiian ~ "White",
      (perc.asian + perc.hawaiian) >= perc.white ~ "Asian/Hawaiian",
      TRUE ~ "Other"
    )
  )


tmap_mode("view")
tm_basemap("Esri.WorldTopoMap")+tm_shape(HI_tract_sf_filtered) + tm_polygons(col='majority', alpha=0.7,palette = "Set2") +tm_shape(HI_sf[!is.na(HI_sf$Race),]) +
  tm_dots('Race', palette = "Set1",shape = 2, alpha = 1, size = 0.05, col="Race",id="Race",popup.vars=c( "Race", "Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city.")) + tm_view(bbox = bb_honolulu)

Merge the Census data to the Fatality data

# Assuming 'GEOID' is the column that contains the GEOID values

st_crs(honolulu) <- st_crs(HI_sf) <- st_crs("EPSG:4326")

# Calculate the intersection of geometries between 'honolulu' and 'HI_sf'
result_df <- data.frame()
# Iterate through each row in 'honolulu'
for (i in 1:nrow(HI_sf)) {
  # Get the geometry for the current row in 'honolulu'
  honolulu_row <- HI_sf[i, ]
  
  # Calculate the intersection of the current geometry with 'honolulu'
  intersection <- st_intersection(honolulu_row, honolulu)
  
  # Add the result to the 'result_df'
  result_df <- rbind(result_df, intersection)
}

save(result_df, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/result_df.RData")

Count the Number of Fatal Encounters for Each Tract

count <- result_df %>%
  group_by(GEOID) %>%
  summarise(count = n())%>%
  st_drop_geometry()

merged_count <- left_join(HI_tract_sf_filtered, count, by = "GEOID")
merged_count <- merged_count%>%
  st_as_sf()
st_crs(merged_count) <- st_crs("EPSG:4326")
save(merged_count,file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/merged.RData")

load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/merged.RData")
tmap_mode("view")
tm_basemap("Esri.WorldTopoMap")+tm_shape(merged_count) + tm_polygons(col='count', alpha=0.7,palette = "Reds") +tm_shape(HI_sf[!is.na(HI_sf$Race),]) +
  tm_dots('Race', palette = "Set1",shape = 2, alpha = 1, size = 0.05, col="Race",id="Race",popup.vars=c( "Race", "Age", "Aggressive.physical.movement", "Location.of.death..county.","Location.of.death..city.")) + tm_view(bbox = bb_honolulu)

Analyze Whether the Number of Fatal Encounters is Significantly Different Between Tracts with Majority White vs. Majority Asian/Hawaiian

count_white <- merged_count%>%
  filter(majority=="White")%>%
  st_drop_geometry()
count_ah <- merged_count%>%
  filter(majority=="Asian/Hawaiian")%>%
  st_drop_geometry()

# Replace NAs with 0 in data frame
count_white[is.na(count_white)] <- 0
count_ah[is.na(count_ah)] <- 0
t_test_result <- t.test(count_white$count, count_ah$count)
t_test_result

## 
##  Welch Two Sample t-test
## 
## data:  count_white$count and count_ah$count
## t = -1.7464, df = 172.44, p-value = 0.08252
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.30533304  0.01866638
## sample estimates:
## mean of x mean of y 
## 0.2166667 0.3600000

From the Two-Sample t-test with a p-value of 0.08252, I found that the number of fatal encounters is statistically significantly different between tracts with different ethnic majorities with a 90% CI. In other words, tracts with a majority of Asian/Hawaiian population have a higher rate of fatal encounters.

Categorize the Ethnicity Percentage for Data Visualization

load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major1/result_df.RData")
custom_breaks <- c(0, 25, 50, 75, 100)
custom_labels <- c("0-25", "25-50", "50-75", "75-100")
result_df$asian.hawaiian.categories <- cut(result_df$perc.asian.hawaiian, breaks = custom_breaks, labels = custom_labels)
result_df$white.categories <- cut(result_df$perc.white, breaks = custom_breaks, labels = custom_labels)

category_counts <- result_df %>%
  count(asian.hawaiian.categories)

category_counts$white.categories <- result_df %>%
  count(white.categories)

result_df <- as.data.frame(result_df)
merged_df <- left_join(result_df, category_counts, by = "asian.hawaiian.categories")
merged_df <- merged_df %>%
  mutate(Alleged.weapon = ifelse(Alleged.weapon == "", NA, Alleged.weapon))%>%
  mutate(Fleeing.Not.fleeing = ifelse(Fleeing.Not.fleeing == "", NA, Fleeing.Not.fleeing))

graph1 <- ggplot(category_counts, aes(x = n, y = asian.hawaiian.categories)) +
  geom_bar(stat = "identity") + labs(title = "Number of Deaths in Tracts with \nDifferent Percentage of Asian/Hawaiian Residents", x = "Number of deaths", y = "% Asian/Hawaiian")+theme(axis.text.x = element_text(angle = 45, hjust = 1),plot.title = element_text(size = 9))
graph2 <- ggplot(category_counts, aes(x = category_counts$white.categories$n, y = category_counts$white.categories$white.categories)) +
  geom_bar(stat = "identity") + labs(title = "Number of Deaths in Tracts with \nDifferent Percentage of White Residents", x = "Number of deaths", y = "% White")+theme(axis.text.x = element_text(angle = 45, hjust = 1),plot.title = element_text(size = 9))

graph3 <- ggplot(merged_df, aes(x = white.categories.x, y = asian.hawaiian.categories)) +
  geom_count() + labs(title = "Number of Deaths in Neighborhoods with \nDifferent Ethnic Compositions", x = "% White", y = "% Asian/Hawaiian")+theme(axis.text.x = element_text(angle = 45, hjust = 1))

grid.arrange(graph1, graph2, graph3, ncol = 2)

Relationship Between Ethnic Composition of Tract and Race of Victim

merged_df$combined <- paste(result_df$white.categories, result_df$asian.hawaiian.categories, sep = "&")

plot1 <- ggplot(merged_df, aes(x = combined, y = Race)) +
  geom_count() + labs(title = "Race of Victim for Tracts with Different Percentage of Asian/Hawaiian Residents", x = "% White & % Asian/Hawaiian", y = "Race of victim")+theme(axis.text.x = element_text(angle = 45, hjust = 1),plot.title = element_text(size = 9))

plot1

Linear Regression

#kable(head(merged_df))
GGally::ggpairs(merged_df[, c("hhincome","Med_HHExp", "perc.asian.hawaiian", "perc.white")])

After observing the relationships between independent variables, I omitted the perc.white variable from the model since it seemed to be highly correlated to the perc.asian.hawaiian variable.

model <- lm(n~Location.of.death..city.+hhincome+Med_HHExp+perc.asian.hawaiian +Armed.Unarmed,data=merged_df)
summary(model)

## 
## Call:
## lm(formula = n ~ Location.of.death..city. + hhincome + Med_HHExp + 
##     perc.asian.hawaiian + Armed.Unarmed, data = merged_df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.526 -1.818  0.000  2.134  7.919 
## 
## Coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                 4.534e+01  3.573e+00  12.689
## Location.of.death..city.Ewa Beach          -5.223e+00  3.257e+00  -1.604
## Location.of.death..city.Haleiwa            -1.513e+01  4.998e+00  -3.027
## Location.of.death..city.Honolulu           -3.761e+00  2.227e+00  -1.689
## Location.of.death..city.Kailua             -4.370e+01  5.137e+00  -8.506
## Location.of.death..city.Kaneohe            -7.572e+00  3.308e+00  -2.289
## Location.of.death..city.Kapolei            -8.460e+00  3.051e+00  -2.773
## Location.of.death..city.Laie               -9.491e+00  4.771e+00  -1.989
## Location.of.death..city.Mililani           -3.453e+00  3.661e+00  -0.943
## Location.of.death..city.Nanakuli            2.282e+00  4.709e+00   0.485
## Location.of.death..city.Pearl City         -8.447e+00  3.444e+00  -2.453
## Location.of.death..city.Schofield Barracks -4.546e+01  5.117e+00  -8.885
## Location.of.death..city.Wahiawa            -6.604e+00  3.630e+00  -1.820
## Location.of.death..city.Waianae            -3.160e+00  3.077e+00  -1.027
## Location.of.death..city.Waiʻanae           -2.834e+00  2.989e+00  -0.948
## Location.of.death..city.Waimanalo          -5.046e+00  3.217e+00  -1.569
## Location.of.death..city.Waipahu            -9.820e+00  2.829e+00  -3.471
## hhincome                                    3.186e-05  2.135e-05   1.492
## Med_HHExp                                   2.284e-03  6.407e-04   3.565
## perc.asian.hawaiian                        -3.555e-01  3.910e-02  -9.092
## Armed.UnarmedArmed                          1.028e+00  1.473e+00   0.698
## Armed.UnarmedUnarmed                        1.056e+00  1.363e+00   0.775
## Armed.UnarmedUncertain                      2.196e+00  3.561e+00   0.617
##                                            Pr(>|t|)    
## (Intercept)                                 < 2e-16 ***
## Location.of.death..city.Ewa Beach          0.115101    
## Location.of.death..city.Haleiwa            0.003895 ** 
## Location.of.death..city.Honolulu           0.097434 .  
## Location.of.death..city.Kailua             2.77e-11 ***
## Location.of.death..city.Kaneohe            0.026358 *  
## Location.of.death..city.Kapolei            0.007785 ** 
## Location.of.death..city.Laie               0.052153 .  
## Location.of.death..city.Mililani           0.350226    
## Location.of.death..city.Nanakuli           0.630007    
## Location.of.death..city.Pearl City         0.017710 *  
## Location.of.death..city.Schofield Barracks 7.35e-12 ***
## Location.of.death..city.Wahiawa            0.074808 .  
## Location.of.death..city.Waianae            0.309438    
## Location.of.death..city.Waiʻanae           0.347635    
## Location.of.death..city.Waimanalo          0.122979    
## Location.of.death..city.Waipahu            0.001078 ** 
## hhincome                                   0.141947    
## Med_HHExp                                  0.000812 ***
## perc.asian.hawaiian                        3.57e-12 ***
## Armed.UnarmedArmed                         0.488652    
## Armed.UnarmedUnarmed                       0.441885    
## Armed.UnarmedUncertain                     0.540346    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.111 on 50 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.8198, Adjusted R-squared:  0.7406 
## F-statistic: 10.34 on 22 and 50 DF,  p-value: 6.767e-12

# Breusch-Pagan test
bptest(model)

## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 27.762, df = 22, p-value = 0.1837

Breusch-Pagan test can be used to test the presence of heteroskedasticity in residuals from a regression. In this model, we fail to reject the null hypothesis, thus the residuals are not heteroskedastic.

Conclusion

I noticed that a significant majority of individuals involved in fatal encounters were of Asian/Hawaiian descent, comprising 66 out of the total 116 observed cases in Hawaii. This represents 56.9% of all fatal encounters in the state.

From the Two-Sample t-test with a p-value of 0.003731, I found that the number of fatal encounters is statistically significantly different between tracts with different ethnic majorities. Specifically, tracts with a majority of Asian/Hawaiian population have a higher rate of fatal encounters than tracts with majority White residents. This was clear in the ethnic majority and fatal encounter count maps.

Upon examining the ethnic composition of various tracts, I discovered that the highest number of fatal encounters occurred in tracts where the population was characterized by 0-25% White residents and 50-75% Asian/Hawaiian residents. In these tracts, there were instances of victims from Hispanic/Latino, European-American/White, and Asian/Hawaiian backgrounds. However, it’s noteworthy that the majority of victims in these tracts were of Asian/Hawaiian descent, which aligns with the broader trend observed across all of Honolulu.

Lastly, through linear regression, I found that Med_HHExp (Median Household Expenditure) and perc.asian.hawaiian were statistically significant in determining the number of fatal encounters in Honolulu. In addition, the location of death (city) had statistically significant results as well, indicating cities such as Kailua, Schofield Barracks, Haleiwa, Kapolei, and Waipahu sees more fatal encounters than some other cities in Honolulu County. Some of the coefficients seem to be inverse of the relationship I had expected, which suggests that some further model corrections may be necessary.