The exploration below analyzes the racial and ethnic composition of New York City at the census-tract level. Four groups are examined: Hispanic or Latino, Black-alone (not Hispanic or Latino), Asian-alone (not Hispanic or Latino), and White-alone (not Hispanic or Latino).

Code

# Load packages
library(tidyverse)
library(tidycensus)
library(scales)
library(sf)
library(RColorBrewer)
library(knitr)
options(scipen = 999)


# Import spatial data
#   Import borough shapefiles from NYC Open Data
boros <- st_read("~/Documents/DUE/Fall 2022/Methods/methods1/main_data/raw/geo/Borough Boundaries.geojson")

#   Import Neighborhood Tabulation Areas for NYC
nabes <- st_read("~/Documents/DUE/Fall 2022/Methods/methods1/main_data/raw/geo/nynta2020.shp")


# List 2020 census variables
#   Create table of all variables in the 2020 redistricting file
pl_2020 <- load_variables(2020, "pl", cache = TRUE)
#     Only data for redistricting is available so far
#       The 2020 Census Redistricting Data (P.L. 94-171) Summary Files
#       "pl" selects that file | cache = TRUE keeps a local copy of the variable table


# Import table of:
#   RACE
#     P1_001N = !!Total:
#     P2_002N = !!Total:!!Hispanic or Latino
#     P2_006N = !!Total:!!Not Hispanic or Latino:!!Population of one race:!!Black or African American alone
#     P2_008N = !!Total:!!Not Hispanic or Latino:!!Population of one race:!!Asian alone
#     P2_005N = !!Total:!!Not Hispanic or Latino:!!Population of one race:!!White alone

raw_race <- get_decennial(geography = "tract",
                          variables = c(`Total` = "P1_001N",
                                        `Hispanic or Latino` = "P2_002N",
                                        `Black-alone` = "P2_006N",
                                        `Asian-alone` = "P2_008N",
                                        `White-alone` = "P2_005N"),
                          state = "NY",
                          county = c("New York", "Kings", "Bronx", "Richmond",
                                     "Queens"),
                          geometry = TRUE,
                          year = 2020,
                          output = "wide")


# Create a new dataframe with each group's share of the total population
race <- raw_race %>% 
  mutate(`Hispanic or Latino (%)` = `Hispanic or Latino`/`Total`,
         `Black-alone (%)` = `Black-alone`/`Total`,
         `Asian-alone (%)` = `Asian-alone`/`Total`,
         `White-alone (%)` = `White-alone`/`Total`)


# Process the N/As
#   Redefine the 26 N/As
#     They are currently NaN (not a number), as produced by dividing 0 by 0
#     Convert them to actual NAs
race <- raw_race %>% 
  mutate(`Hispanic or Latino (%)` = `Hispanic or Latino`/`Total`,
         `Hispanic or Latino (%)` = ifelse(is.nan(`Hispanic or Latino (%)`),
                                                NA, `Hispanic or Latino (%)`)) %>%
  mutate(`Black-alone (%)` = `Black-alone`/`Total`,
         `Black-alone (%)` = ifelse(is.nan(`Black-alone (%)`),
                                           NA, `Black-alone (%)`)) %>%
  mutate(`Asian-alone (%)` = `Asian-alone`/`Total`,
         `Asian-alone (%)` = ifelse(is.nan(`Asian-alone (%)`),
                                           NA, `Asian-alone (%)`)) %>% 
  mutate(`White-alone (%)` = `White-alone`/`Total`,
         `White-alone (%)` = ifelse(is.nan(`White-alone (%)`),
                                    NA, `White-alone (%)`))


# Select census tracts
#   Check projection of census tract data
#     st_crs() -> prints the projection of a spatial data frame in the console
st_crs(race)
#       EPSG (code for projection) = 4269

#   Check projection of NTA data
st_crs(nabes)
#     EPSG = 2263

#   Transform projection
#     When working with NYC data, the projection should be 2263
#       st_transform() -> changes the projection of a spatial data frame
race_2263 <- st_transform(race, 2263)

#   Check projection again (to make sure it worked)
st_crs(race_2263)
#     EPSG = 2263


# Select the fields from NTA to add to the race data
#   Remove unnecessary fields in the neighbourhood shapefile
nabes_selected <- nabes %>%
  select(BoroName, NTA2020, NTAName)


# Perform spatial join
race_nabes <- race_2263 %>%
  st_join(nabes_selected, 
          left = TRUE, # left -> makes it a left join, so all census tracts are kept
          join = st_intersects, # join -> matches a tract to a neighbourhood if they intersect
          largest = TRUE) # largest -> if a census tract overlaps more than one neighbourhood, join it to the one with the largest overlap
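
As a small illustration of the `largest = TRUE` rule used in the join above, the sketch below builds two toy neighbourhoods and one toy tract that straddles their shared edge. The coordinates, the `square()` helper, and the names "West", "East", and "toy" are made up for this example and are not part of the NYC data.

```r
library(sf)

# Hypothetical helper: an axis-aligned square polygon
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two adjacent toy neighbourhoods sharing the edge x = 10
hoods <- st_sf(
  NTAName = c("West", "East"),
  geometry = st_sfc(square(0, 0, 10), square(10, 0, 10), crs = 2263)
)

# One toy tract spanning x = 7 to 11:
# 3 units of its width fall in "West", 1 unit in "East"
tract <- st_sf(
  GEOID = "toy",
  geometry = st_sfc(square(7, 2, 4), crs = 2263)
)

# With largest = TRUE the tract receives a single row,
# taken from the neighbourhood it overlaps most
joined <- st_join(tract, hoods,
                  join = st_intersects,
                  left = TRUE,
                  largest = TRUE)

print(joined$NTAName)
```

Without `largest = TRUE`, a tract that intersects two neighbourhoods would appear once per match, which would double count its population in any later summary.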

Methods

Purpose

The purpose of this exploration is to study the share of the Hispanic or Latino, Black, Asian, and White populations in New York City.

Sources

  • Total Population: Decennial US Census, 2020
  • Hispanic or Latino Population: Decennial US Census, 2020
  • Black-alone (not Hispanic or Latino) Population: Decennial US Census, 2020
  • Asian-alone (not Hispanic or Latino) Population: Decennial US Census, 2020
  • White-alone (not Hispanic or Latino) Population: Decennial US Census, 2020

Methodology

The following variables are calculated:

  • Hispanic or Latino (%) = Hispanic or Latino/Total
  • Black-alone (%) = Black-alone/Total
  • Asian-alone (%) = Asian-alone/Total
  • White-alone (%) = White-alone/Total
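
The ratios above can be sketched on a toy table as follows. The numbers are invented for illustration, the `share()` helper is hypothetical, and base R is used so the block runs without any packages. It also shows why tracts with a total population of zero need special handling: 0/0 returns NaN, which is converted to NA.

```r
# Toy tract counts (made-up numbers, not census data)
tracts <- data.frame(
  Total = c(200, 0, 50),
  `Hispanic or Latino` = c(50, 0, 10),
  `Black-alone` = c(100, 0, 5),
  check.names = FALSE
)

# Hypothetical helper: share of a group in the total population.
# Tracts with no residents produce 0/0 = NaN, which is recoded to NA.
share <- function(count, total) {
  pct <- count / total
  pct[is.nan(pct)] <- NA_real_
  pct
}

# Add one "(%)" column per group
groups <- c("Hispanic or Latino", "Black-alone")
for (g in groups) {
  tracts[[paste0(g, " (%)")]] <- share(tracts[[g]], tracts$Total)
}

print(tracts)
```

The second row, a tract with no residents, ends up with NA in both share columns rather than NaN.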

The calculation is deliberately simple and can be developed further should more detailed research be necessary. The findings serve as a baseline view of the racial composition of New York City, which can be used to answer or support socio-demographic questions, such as those related to redlining.

Results

Summary Tables

By NTA

nyc_race <- st_drop_geometry(race_nabes) %>% 
  group_by(NTAName) %>% 
  summarise(Borough = first(BoroName),
            `Total` = sum(`Total`),
            `Total Hispanic or Latino` = sum(`Hispanic or Latino`),
            `Total Black-alone` = sum(`Black-alone`),
            `Total Asian-alone` = sum(`Asian-alone`),
            `Total White-alone` = sum(`White-alone`)) %>% 
  mutate(`Total Hispanic or Latino (%)` = percent(`Total Hispanic or Latino`/`Total`, accuracy = 1L),
         `Total Black-alone (%)` = percent(`Total Black-alone`/`Total`, accuracy = 1L),
         `Total Asian-alone (%)` = percent(`Total Asian-alone`/`Total`, accuracy = 1L),
         `Total White-alone (%)` = percent(`Total White-alone`/`Total`, accuracy = 1L)) %>% 
  select(Borough, NTAName, Total, `Total Hispanic or Latino (%)`, `Total Black-alone (%)`,
         `Total Asian-alone (%)`, `Total White-alone (%)`)

By County

nyc_county_race <- st_drop_geometry(race_nabes) %>% 
  group_by(BoroName) %>% 
  summarise(Borough = first(BoroName),
            `Total` = sum(`Total`),
            `Total Hispanic or Latino` = sum(`Hispanic or Latino`),
            `Total Black-alone` = sum(`Black-alone`),
            `Total Asian-alone` = sum(`Asian-alone`),
            `Total White-alone` = sum(`White-alone`)) %>% 
  mutate(`Total Hispanic or Latino (%)` = `Total Hispanic or Latino`/`Total`,
         `Total Black-alone (%)` = `Total Black-alone`/`Total`,
         `Total Asian-alone (%)` = `Total Asian-alone`/`Total`,
         `Total White-alone (%)` = `Total White-alone`/`Total`) %>% 
  select(Borough, Total, `Total Hispanic or Latino (%)`, `Total Black-alone (%)`,
         `Total Asian-alone (%)`, `Total White-alone (%)`)
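
Both summary tables above sum the tract counts first and only then divide by the summed total, rather than averaging the tract-level percentages. A minimal sketch with invented numbers shows why the two approaches can differ when tracts vary in size. The values and the neighbourhood name "A" are assumptions for illustration only.

```r
# Two toy tracts in the same neighbourhood (made-up numbers)
tracts <- data.frame(
  NTAName = c("A", "A"),
  Total   = c(1000, 100),
  Asian   = c(100, 90)
)

# Pooled share: sum the counts, then divide (the approach in the tables above)
pooled <- sum(tracts$Asian) / sum(tracts$Total)

# Unweighted mean of the tract shares: treats both tracts as equally large
mean_of_shares <- mean(tracts$Asian / tracts$Total)

print(c(pooled = pooled, mean_of_shares = mean_of_shares))
```

Here the pooled share is 190/1100, while the unweighted mean is 0.5, because the small tract counts as much as the large one. Summing counts first keeps each resident weighted equally.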

Maps

Hispanic or Latino (%)

Black-alone (%)

Asian-alone (%)

White-alone (%)

Based on the four maps above, the White-alone population makes up the dominant share in most New York City neighbourhoods. The Black-alone map shows distinct concentrations in East Flatbush, Brooklyn and Jamaica, Queens. The Hispanic or Latino population is heavily concentrated in Corona, Queens. One stop away from Corona is Flushing, where the census records a high share of Asian-alone residents, along with a dense concentration in Sunset Park, Brooklyn.