Exploring race demographics in New York City

This document outlines a first exploration of the distribution of racial demographics in New York City. The goal is to map how these groups are distributed across the city at the census tract level.

Methodology

This analysis is designed as a quick calculation to determine whether further research is warranted. To estimate the percentage of each racial and ethnic category in the city, we use the following variables:

Total Population = P1_001N
Total Hispanic or Latino population = P2_002N
Total Black-alone population (not Hispanic or Latino) = P2_006N
Total Asian-alone population (not Hispanic or Latino) = P2_008N
Total White-alone population (not Hispanic or Latino) = P2_005N

Sources:

All variables are taken from the Decennial US Census, 2020.
Spatial data is imported from:
- NYC Planning for the 2020 Neighborhood tabulation areas https://www.nyc.gov/site/planning/data-maps/open-data/census-download-metadata.page
- NYC Open Data for the NYC Borough Boundaries https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm

We calculated the following variables for the analysis:

Percentage of Hispanic or Latino population = total Hispanic or Latino population/total population
Percentage of White-alone population = total White-alone population/total population
Percentage of Asian-alone population = total Asian-alone population/total population
Percentage of Black-alone population = total Black-alone population/total population

A simple sum was also used to calculate the summary statistics.

Data Processing

In the first step of this analysis, we loaded all of the libraries the script needs to run. We then downloaded and examined the variables available in the 2020 Decennial Census and identified the codes needed for the analysis.

library(tidyverse)
library(tidycensus)
library(sf)
library(scales)
library(viridis)
options(scipen = 999)

# List every variable in the 2020 redistricting ("pl") file, caching the result locally
pl_201620 <- load_variables(2020, "pl", cache = TRUE)

After selecting the variables from the Decennial Census, we imported them for processing. Since our objective is to create maps of New York City, the import was set at the census tract level with New York specified as the state. We also downloaded the tract geometries (see ‘geometry’) so that the data can be mapped later.

# Tract-level counts for New York State (FIPS 36), one column per variable
raw_race_2020 <- get_decennial(geography = "tract",
                               variables = c(total_pop = "P1_001N",
                                             total_hisp = "P2_002N",
                                             total_white = "P2_005N",
                                             total_black = "P2_006N",
                                             total_asian = "P2_008N"),
                               state = "36",
                               geometry = TRUE,
                               year = 2020,
                               output = "wide")

To see the racial distribution more clearly, we first process the data and calculate the percentage of each group: Hispanic, Black, Asian and white.

In the processed dataset, a number of tracts have a total population of 0, which produces NaN when the percentage is calculated. We convert these values to NA so that they are treated as missing and can be omitted from later calculations. We also keep only the five counties that make up New York City, and split the NAME column into tract, county and state for easier processing.

race_2020_tracs <- raw_race_2020 %>% 
  # NAME holds "tract, county, state" in a single string
  separate(NAME, into = c("tract", "county", "state"), sep = ", ") %>% 
  # Keep the five counties (boroughs) of New York City
  filter(county %in% c("Kings County", "Queens County", "Bronx County",
                       "New York County", "Richmond County")) %>% 
  # Share of each group; tracts with zero population give NaN, recoded to NA
  mutate(pct_white = round(total_white/total_pop, 3),
         pct_white = ifelse(is.nan(pct_white), NA, pct_white),
         pct_hisp = round(total_hisp/total_pop, 3),
         pct_hisp = ifelse(is.nan(pct_hisp), NA, pct_hisp),
         pct_black = round(total_black/total_pop, 3),
         pct_black = ifelse(is.nan(pct_black), NA, pct_black),
         pct_asian = round(total_asian/total_pop, 3),
         pct_asian = ifelse(is.nan(pct_asian), NA, pct_asian))
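
The NaN handling above can be illustrated with a small toy example. The numbers below are made up for illustration and are not census values. It is only a sketch of what happens when a tract has zero population: R evaluates 0/0 as NaN, and the ifelse() step recodes it to a plain NA.

```r
# Toy counts for three hypothetical tracts; the second one is unpopulated
total_pop   <- c(4000, 0, 2500)
total_white <- c(1000, 0, 500)

# 0/0 evaluates to NaN in R, so the unpopulated tract has no defined share
pct_white_raw <- round(total_white / total_pop, 3)

# Recode NaN to NA so missing shares carry a single, explicit marker
pct_white <- ifelse(is.nan(pct_white_raw), NA, pct_white_raw)

is.nan(pct_white_raw)   # flags only the unpopulated tract
is.na(pct_white)        # the same tract is now a regular NA
```

Note that is.nan() would not catch a nonzero count divided by zero (which gives Inf), but that case should not arise here because each group count is a subset of the total population.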

For the creation of the maps, we will also need to import the spatial data:

boros <- st_read("~/Desktop/methods1/main_data/raw/geo/BoroughBoundaries.geojson")

nabes <- st_read("~/Desktop/methods1/main_data/raw/geo/nynta2020_22b/nynta2020.shp")

The first step is to make sure that the datasets share the same projection, and to convert them if they don’t. In this case, the census tract data and the borough boundaries are not in the projection we need, so we convert both to EPSG:2263 (NAD83 / New York Long Island, US feet).

race_2020_tracts_2263 <- st_transform(race_2020_tracs, 2263)

boros_2263 <- st_transform(boros, 2263)
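
Before transforming, it can help to confirm whether two layers actually differ in projection. The sketch below uses a single made-up point rather than the real layers, and the helper name same_crs is hypothetical; it only shows how st_crs() can be compared and how st_transform() changes the result.

```r
library(sf)

# One illustrative point in longitude/latitude (EPSG:4326)
pt_lonlat <- st_sfc(st_point(c(-73.9857, 40.7484)), crs = 4326)

# The same point reprojected to EPSG:2263, as done for the tract and borough layers
pt_2263 <- st_transform(pt_lonlat, 2263)

# Hypothetical helper: TRUE when two sf/sfc objects share a coordinate reference system
same_crs <- function(a, b) st_crs(a) == st_crs(b)

same_crs(pt_lonlat, pt_2263)                      # the two layers differ
same_crs(st_transform(pt_lonlat, 2263), pt_2263)  # after conversion they match
```

The same comparison could be run on the real objects, for example same_crs(race_2020_tracts_2263, nabes), before attempting the spatial join.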

Once the layers share the same projection, we keep only the fields we need from the neighborhood shapefile and then spatially join it to the tract data.

nabes_selected <- nabes %>%
  select(BoroCode, BoroName, NTA2020, NTAName)

race_nabes <- race_2020_tracts_2263 %>%
  st_join(nabes_selected, 
          left = TRUE,
          join = st_intersects,
          largest = TRUE)
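
The effect of largest = TRUE in the join above can be seen in a toy example. The rectangles and names below are invented for illustration, and rect() is a hypothetical helper. The sketch shows that a tract overlapping two neighborhoods produces two rows in a plain intersects join, but only one row (the neighborhood with the largest overlap) when largest = TRUE is set.

```r
library(sf)

# Hypothetical helper: builds an axis-aligned rectangle polygon
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Two made-up neighborhoods side by side
nabes_toy <- st_sf(NTAName = c("West", "East"),
                   geometry = st_sfc(rect(0, 0, 10, 10),
                                     rect(10, 0, 20, 10), crs = 2263))

# One made-up tract straddling the boundary, mostly inside "West"
tract_toy <- st_sf(tract = "T1",
                   geometry = st_sfc(rect(7, 2, 11, 8), crs = 2263))

# Plain intersects join: the tract is duplicated, once per neighborhood it touches
all_matches <- st_join(tract_toy, nabes_toy, join = st_intersects)

# largest = TRUE: one row per tract, matched to the neighborhood with most overlap
largest_match <- st_join(tract_toy, nabes_toy, largest = TRUE)
```

Keeping one row per tract matters later, because duplicated tracts would be counted twice when populations are summed by borough.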

After the datasets are joined, we can map New York City, producing one map per racial or ethnic group.

hisp_pop <- ggplot()  + 
  geom_sf(data = race_nabes, mapping = aes(fill = pct_hisp), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       palette = "Greens",
                       na.value = "transparent",
                       name="Percent Hispanic Population (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(title = "NYC, Percentage of Hispanic Population by Census Tract",
  caption = "Source: Decennial Census 2020"
  ) + 
  geom_sf(data = boros_2263 %>% filter(boro_name == "Staten Island" | boro_name == "Bronx" | boro_name == "Manhattan" |
                                         boro_name == "Brooklyn" | boro_name == "Queens"), 
          color = "black", fill = NA, lwd = 0.5)

white_pop <- ggplot()  + 
  geom_sf(data = race_nabes, mapping = aes(fill = pct_white), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       palette = "Blues",
                       na.value = "transparent",
                       name="Percent White Population (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(title = "NYC, Percentage of White Population by Census Tract",
       caption = "Source: Decennial Census 2020"
  ) + 
  geom_sf(data = boros_2263 %>% filter(boro_name == "Staten Island" | boro_name == "Bronx" | boro_name == "Manhattan" |
                                         boro_name == "Brooklyn" | boro_name == "Queens"), 
          color = "black", fill = NA, lwd = 0.5)

asian_pop <- ggplot()  + 
  geom_sf(data = race_nabes, mapping = aes(fill = pct_asian), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       palette = "Reds",
                       na.value = "transparent",
                       name="Percent Asian Population (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(title = "NYC, Percentage of Asian Population by Census Tract",
       caption = "Source: Decennial Census 2020"
  ) + 
  geom_sf(data = boros_2263 %>% filter(boro_name == "Staten Island" | boro_name == "Bronx" | boro_name == "Manhattan" |
                                         boro_name == "Brooklyn" | boro_name == "Queens"), 
          color = "black", fill = NA, lwd = 0.5)

black_pop <- ggplot()  + 
  geom_sf(data = race_nabes, mapping = aes(fill = pct_black), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       palette = "Purples",
                       na.value = "transparent",
                       name="Percent Black Population (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(title = "NYC, Percentage of Black Population by Census Tract",
       caption = "Source: Decennial Census 2020"
  ) + 
  geom_sf(data = boros_2263 %>% filter(boro_name == "Staten Island" | boro_name == "Bronx" | boro_name == "Manhattan" |
                                         boro_name == "Brooklyn" | boro_name == "Queens"), 
          color = "black", fill = NA, lwd = 0.5)
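
The four map definitions above differ only in the column mapped, the palette and the group name. One possible way to reduce the repetition is a small helper function. The function make_share_map and its arguments are hypothetical and not part of the original script; this is a sketch that assumes the tract and borough layers are already in the same projection, and it leaves any borough filtering to the caller.

```r
library(ggplot2)
library(scales)
library(sf)

# Hypothetical helper: one choropleth of a share column with borough outlines on top
make_share_map <- function(tracts, boroughs, fill_col, palette, group_label) {
  ggplot() +
    # Tract polygons filled by the chosen share column, no visible tract borders
    geom_sf(data = tracts, mapping = aes(fill = .data[[fill_col]]),
            color = "#ffffff", lwd = 0) +
    theme_void() +
    scale_fill_distiller(breaks = c(0, .2, .4, .6, .8, 1),
                         direction = 1,
                         palette = palette,
                         na.value = "transparent",
                         name = paste0("Percent ", group_label, " Population (%)"),
                         labels = percent_format(accuracy = 1)) +
    labs(title = paste0("NYC, Percentage of ", group_label,
                        " Population by Census Tract"),
         caption = "Source: Decennial Census 2020") +
    # Borough boundaries drawn as black outlines
    geom_sf(data = boroughs, color = "black", fill = NA, lwd = 0.5)
}

# Possible usage with the objects built earlier in this document:
# hisp_pop  <- make_share_map(race_nabes, boros_2263, "pct_hisp",  "Greens",  "Hispanic")
# white_pop <- make_share_map(race_nabes, boros_2263, "pct_white", "Blues",   "White")
# asian_pop <- make_share_map(race_nabes, boros_2263, "pct_asian", "Reds",    "Asian")
# black_pop <- make_share_map(race_nabes, boros_2263, "pct_black", "Purples", "Black")
```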

After creating the maps, we build a new dataset of summary statistics describing the racial distribution of the population in each borough of New York City.

race_stats <- st_drop_geometry(race_nabes) %>% 
  # group_by() already keeps county as a column, so it is not re-created in summarise()
  group_by(county) %>% 
  summarise(total_pop = sum(total_pop, na.rm = TRUE),
            total_white = sum(total_white, na.rm = TRUE),
            total_hisp = sum(total_hisp, na.rm = TRUE), 
            total_black = sum(total_black, na.rm = TRUE),
            total_asian = sum(total_asian, na.rm = TRUE)) %>% 
  # Borough-level shares, formatted as whole percentages
  mutate(pct_white = percent(total_white/total_pop, accuracy = 1), 
         pct_hisp = percent(total_hisp/total_pop, accuracy = 1),
         pct_black = percent(total_black/total_pop, accuracy = 1),
         pct_asian = percent(total_asian/total_pop, accuracy = 1))
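
The summary above sums the counts within each borough and only then divides, rather than averaging the tract-level percentages. The toy example below, with made-up numbers for two hypothetical tracts, sketches why the two approaches can differ when tracts have very different populations.

```r
library(dplyr)

# Two made-up tracts in one hypothetical county, with very different sizes
toy <- tibble(county = c("A", "A"),
              total_pop = c(100, 900),
              total_hisp = c(50, 90))

toy_stats <- toy %>% 
  group_by(county) %>% 
  summarise(
    # Unweighted mean of the tract shares: the small tract counts as much as the large one
    mean_of_pcts = mean(total_hisp / total_pop),
    # Pooled share: sum the counts first, then divide (the approach used for race_stats)
    pooled_pct = sum(total_hisp) / sum(total_pop)
  )

toy_stats
```

Here the unweighted mean is 0.30 while the pooled share is 0.14, because the pooled version weights each tract by its population.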

The Results

After processing the census data and mapping it onto New York City, our summary statistics show that the boroughs differ sharply in racial and ethnic composition. For example, the Bronx has a very low percentage of white residents and very high percentages of Hispanic and Black residents (55% and 28%, respectively). In contrast, Manhattan and Staten Island have predominantly white populations, at 47% and 56%. Brooklyn and Queens appear more balanced, although Queens has the highest percentage of Asian residents of any borough, at 27%.

race_stats

It is also interesting to note how these groups are distributed within the boroughs themselves. The white population is more concentrated in southern Manhattan and on the west side of Queens and Brooklyn, which are spatially closer to Manhattan. Areas further away from southern Manhattan, including the Bronx and the eastern parts of Brooklyn and Queens, have a much lower concentration. Staten Island also has a very high concentration of white residents, with the other groups very low in comparison.

white_pop

This pattern is further supported by the distribution of the Black, Hispanic and Asian populations. The Black population is visibly concentrated in areas that almost look like islands or ‘pockets’: along the eastern edge of Queens and Brooklyn closer to the Rockaways, and in the northern part of the Bronx. The Hispanic population is more concentrated in the South Bronx and Upper Manhattan, as well as in the areas surrounding Flushing Bay (Elmhurst, Jackson Heights, Corona). Finally, the Asian population is more concentrated in Flushing and eastern Queens and in southern Brooklyn.

black_pop

hisp_pop

asian_pop