This analysis explores U.S. citizenship in Corona, Queens, as part of a broader study of the neighbourhood's socio-demographic conditions. It is divided into three categories: U.S. citizenship, U.S. citizenship by naturalisation, and non-U.S. citizenship. The results are also compared with neighbouring districts in Community Boards 3 and 4.

Data Processing Preparation

To prepare for the analysis, we load the required packages before proceeding with the data exploration.

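library(tidyverse)     # data wrangling (dplyr, ggplot2, ...)
library(tidycensus)    # ACS data from the Census API
library(scales)        # percentage formatting
library(sf)            # spatial data
library(RColorBrewer)  # colour palettes for maps
library(knitr)         # tables
options(scipen = 999)  # avoid scientific notation in printed numbers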

We then load the list of variables for the 2016-2020 American Community Survey (ACS) 5-year estimates.

acs201620 <- load_variables(2020, "acs5", cache = TRUE)

Next, we import the spatial data: the borough boundaries from NYC Open Data and the 2020 Neighborhood Tabulation Areas (NTAs) for NYC.

boros <- st_read("raw/geo/Borough Boundaries.geojson")

nabes <- st_read("raw/geo/nynta2020.shp")

Data Processing

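For our exploration, we choose the ACS table Nativity and Citizenship Status in the United States (B05001) from the imported variables and create a new dataframe with its tract-level estimates for Queens. Note that B05001_002 counts only U.S. citizens born in the United States, so the three categories used here do not include citizens born in Puerto Rico, in the U.S. Island Areas, or abroad to American parents, and their percentages do not sum to 100%.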

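# Tract-level ACS 2016-2020 estimates for Queens; output = "wide" gives one
# column per variable with an E (estimate) and M (margin of error) suffix
raw_citizenship <- get_acs(geography = "tract",
                           variables = c(`Total Population` = "B05001_001",
                                         `U.S. Citizen` = "B05001_002",
                                         `U.S. Citizen (Naturalisation)` = "B05001_005",
                                         `Non-U.S. Citizen` = "B05001_006"),
                           state = "NY",
                           county = "Queens",
                           geometry = TRUE,
                           year = 2020,
                           output = "wide")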
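A quick arithmetic check with the Census Tract 69 estimates from the table below shows how many residents fall outside the three requested categories. The variable names in this sketch are illustrative only; the comments list the B05001 lines each value corresponds to.

```r
# Census Tract 69, Queens: estimates from the table below
total       <- 3785  # B05001_001: total population
us_born     <- 2492  # B05001_002: U.S. citizen, born in the United States
naturalised <-  740  # B05001_005: U.S. citizen by naturalisation
non_citizen <-  465  # B05001_006: not a U.S. citizen

# Residents not covered by the three categories: citizens born in
# Puerto Rico or the U.S. Island Areas (B05001_003) or born abroad
# to American parents (B05001_004)
other_citizens <- total - (us_born + naturalised + non_citizen)
other_citizens                                  # 88
(us_born + naturalised + non_citizen) / total   # about 0.977
```

If a complete citizen share were needed, B05001_003 and B05001_004 could be added to the variables vector and combined with B05001_002.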

We then use the raw dataframe to compute each citizenship category as a share of the total population.

citizenship <- raw_citizenship %>% 
  mutate(`U.S. Citizenship Percentage` = `U.S. CitizenE`/`Total PopulationE`,
         `U.S. Citizenship (Naturalisation) Percentage` = `U.S. Citizen (Naturalisation)E`/`Total PopulationE`,
         `Non-U.S. Citizenship Percentage` = `Non-U.S. CitizenE`/`Total PopulationE`)

Notice that the first row of the table below shows NaN ("not a number"). This is the result of dividing zero by zero: Census Tract 50 has an estimated total population of 0. We want to convert these NaNs to NA (missing) values.

| GEOID | NAME | Total PopulationE | Total PopulationM | U.S. CitizenE | U.S. CitizenM | U.S. Citizen (Naturalisation)E | U.S. Citizen (Naturalisation)M | Non-U.S. CitizenE | Non-U.S. CitizenM | geometry | U.S. Citizenship Percentage | U.S. Citizenship (Naturalisation) Percentage | Non-U.S. Citizenship Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36081005000 | Census Tract 50, Queens County, New York | 0 | 12 | 0 | 12 | 0 | 12 | 0 | 12 | MULTIPOLYGON (((-73.85767 4… | NaN | NaN | NaN |
| 36081006900 | Census Tract 69, Queens County, New York | 3785 | 598 | 2492 | 513 | 740 | 228 | 465 | 194 | MULTIPOLYGON (((-73.92511 4… | 0.6583884 | 0.1955086 | 0.1228534 |
| 36081007500 | Census Tract 75, Queens County, New York | 3982 | 681 | 2430 | 509 | 1026 | 283 | 464 | 307 | MULTIPOLYGON (((-73.92915 4… | 0.6102461 | 0.2576595 | 0.1165244 |
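The NaN in the first row comes from 0/0. The base R snippet below illustrates how R distinguishes NaN from NA, which is why is.na() can be used to find these rows while is.nan() targets only the undefined division results.

```r
x <- 0 / 0      # zero population divided by zero population
is.nan(x)       # TRUE: the result is "not a number"
is.na(x)        # TRUE: is.na() also flags NaN
is.nan(NA)      # FALSE: NA means "missing", not "not a number"
is.infinite(1 / 0)  # TRUE: a non-zero count over zero gives Inf, not NaN
```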

Processing NaN Values

We need to convert the NaNs to NA. First, we use is.na() to look at the affected rows; is.na() returns TRUE for NaN as well as for NA.

# Keep the tracts where all three shares are NaN/NA (here, the zero-population tracts)
na_tracts <- citizenship %>% 
  filter(is.na(`U.S. Citizenship Percentage`),
         is.na(`U.S. Citizenship (Naturalisation) Percentage`),
         is.na(`Non-U.S. Citizenship Percentage`))

We now recompute the percentages and convert the NaNs to NA with the code chunk below.

citizenship <- raw_citizenship %>% 
  mutate(`U.S. Citizenship Percentage` = `U.S. CitizenE`/`Total PopulationE`,
         `U.S. Citizenship Percentage` = ifelse(is.nan(`U.S. Citizenship Percentage`),
                                                NA, `U.S. Citizenship Percentage`)) %>%
  mutate(`Non-U.S. Citizenship Percentage` = `Non-U.S. CitizenE`/`Total PopulationE`,
         `Non-U.S. Citizenship Percentage` = ifelse(is.nan(`Non-U.S. Citizenship Percentage`),
                                                NA, `Non-U.S. Citizenship Percentage`)) %>% 
  mutate(`U.S. Citizenship (Naturalisation) Percentage` = `U.S. Citizen (Naturalisation)E`/`Total PopulationE`,
         `U.S. Citizenship (Naturalisation) Percentage` = ifelse(is.nan(`U.S. Citizenship (Naturalisation) Percentage`),
                                                    NA, `U.S. Citizenship (Naturalisation) Percentage`))

| GEOID | NAME | Total PopulationE | Total PopulationM | U.S. CitizenE | U.S. CitizenM | U.S. Citizen (Naturalisation)E | U.S. Citizen (Naturalisation)M | Non-U.S. CitizenE | Non-U.S. CitizenM | geometry | U.S. Citizenship Percentage | Non-U.S. Citizenship Percentage | U.S. Citizenship (Naturalisation) Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36081005000 | Census Tract 50, Queens County, New York | 0 | 12 | 0 | 12 | 0 | 12 | 0 | 12 | MULTIPOLYGON (((-73.85767 4… | NA | NA | NA |
| 36081006900 | Census Tract 69, Queens County, New York | 3785 | 598 | 2492 | 513 | 740 | 228 | 465 | 194 | MULTIPOLYGON (((-73.92511 4… | 0.6583884 | 0.1228534 | 0.1955086 |
| 36081007500 | Census Tract 75, Queens County, New York | 3982 | 681 | 2430 | 509 | 1026 | 283 | 464 | 307 | MULTIPOLYGON (((-73.92915 4… | 0.6102461 | 0.1165244 | 0.2576595 |
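An alternative, sketched below on the assumption that a zero total should always yield a missing share, is to guard the division up front instead of producing NaN and replacing it afterwards. safe_share() is a hypothetical helper, not part of any package; inside mutate(), dplyr::if_else() could serve the same purpose.

```r
# Hypothetical helper: divide, returning NA (not NaN) when the total is zero
safe_share <- function(num, den) {
  ifelse(den > 0, num / den, NA_real_)
}

# Tracts 50 and 69 from the table above
x <- safe_share(c(0, 2492), c(0, 3785))
x  # NA, then about 0.658
```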

Selecting Census Tract and Spatial Join

Next, we select the census tracts in Corona.
Before that, both spatial layers must use the same projection, because st_join() requires the two layers to share a coordinate reference system (CRS).

We check the projection of the census tracts with st_crs(), which prints a spatial dataframe's coordinate reference system to the console. It shows that the EPSG code (the identifier for the projection) is 4269 (NAD83).

st_crs(citizenship)

Next, we check the projection of the NTA data, which shows that the EPSG code is 2263.

st_crs(nabes)

When working with New York City data, we want the projection to be EPSG 2263 (NAD83 / New York Long Island, in US survey feet), so we reproject the tracts with st_transform().

citizenship_2263 <- st_transform(citizenship, 2263)

We check the projection again to confirm that the EPSG code is now 2263.

st_crs(citizenship_2263)

Next, we keep only the NTA fields we want to attach to the citizenship data, dropping the unnecessary fields from the neighbourhood shapefile.

nabes_selected <- nabes %>%
  select(BoroName, NTA2020, NTAName)  # sf keeps the geometry column automatically

We can now perform the spatial join.

citizenship_nabes <- citizenship_2263 %>%
  st_join(nabes_selected, 
          left = TRUE, # left join: all census tracts are kept
          join = st_intersects, # a tract matches an NTA if their geometries intersect
          largest = TRUE) # if a tract overlaps several NTAs, assign it to the one with the largest overlap
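Conceptually, largest = TRUE resolves a tract that intersects several NTAs by intersecting the geometries and keeping the NTA with the largest overlapping area. The base R sketch below mimics only that final selection step on a made-up table of overlap areas; the tracts, NTAs, and areas are hypothetical, and sf computes the actual geometric intersections itself.

```r
# Hypothetical overlap areas (square feet) between tracts and NTAs
overlaps <- data.frame(
  tract = c("A", "A", "B", "B", "B"),
  nta   = c("Corona", "North Corona", "Elmhurst", "Corona", "North Corona"),
  area  = c(30000, 120000, 5000, 80000, 79000)
)

# For each tract, keep the row with the largest overlap
pick_largest <- function(df) df[which.max(df$area), ]
assigned <- do.call(rbind, lapply(split(overlaps, overlaps$tract), pick_largest))
assigned[, c("tract", "nta")]  # A -> North Corona, B -> Corona
```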

Selecting Corona

We keep the tracts assigned to the Corona and North Corona NTAs.

corona <- citizenship_nabes %>% 
  filter(NTAName %in% c("Corona", "North Corona"))

Output

Maps

In this step, we map each of the three categories.

The map above shows the percentage of U.S. citizenship in Corona by census tract. In general, the U.S. citizenship rate in Corona is below 50%, with higher percentages concentrated along the bottom edge and in the top-left corner of the neighbourhood.

The share of naturalised U.S. citizens in Corona is low, averaging about 20%, with a high of 32% in Census Tract 401.
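An "average" share across tracts can mean either the mean of the tract percentages or the pooled share of all residents, and the two differ because tracts have different populations. The base R illustration below uses the three tract rows from the earlier tables (they are not Corona tracts) purely to show the difference; which definition the figures above use is not stated in the source.

```r
# Tracts 50, 69 and 75 from the earlier tables (Tract 50 has zero population)
total       <- c(0, 3785, 3982)
naturalised <- c(0,  740, 1026)

share <- ifelse(total > 0, naturalised / total, NA_real_)

# Unweighted: every tract counts equally; empty tracts are dropped
unweighted <- mean(share, na.rm = TRUE)      # about 0.2266
# Population-weighted: pool the counts before dividing
weighted   <- sum(naturalised) / sum(total)  # about 0.2274
```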

In contrast to the maps above, the non-U.S. citizenship rate in Corona is higher on average, peaking at 70%, especially in the centre of the neighbourhood. The lowest percentage appears at the bottom-left of the map, in the tract containing the LeFrak City apartments.


Summary Statistics for Corona and Neighbouring Districts (CB 3 and 4)

With these insights into citizenship in Corona, we can compare its rates with those of the other neighbourhoods in Community Boards 3 and 4.

qcb3and4_citizenship_nabes_stats <- st_drop_geometry(citizenship_nabes) %>% 
  # keep the CB 3 and 4 neighbourhoods of interest
  filter(NTAName %in% c("Corona", "North Corona", "East Elmhurst",
                        "Jackson Heights", "Elmhurst")) %>% 
  group_by(NTAName) %>% 
  # sum the tract estimates within each NTA
  summarise(Borough = first(BoroName),
            `Est. Total Population` = sum(`Total PopulationE`),
            `Est. Total U.S. Citizenship` = sum(`U.S. CitizenE`),
            `Est. Total U.S. Citizenship (Naturalisation)` = sum(`U.S. Citizen (Naturalisation)E`),
            `Est. Total Non-U.S. Citizenship` = sum(`Non-U.S. CitizenE`)) %>% 
  mutate(`Est. U.S. Citizenship Percentage` = percent(`Est. Total U.S. Citizenship`/`Est. Total Population`, accuracy = 1L),
         `Est. U.S. Citizenship (Naturalisation) Percentage` = percent(`Est. Total U.S. Citizenship (Naturalisation)`/`Est. Total Population`, accuracy = 1L),
         `Est. Non-U.S. Citizenship Percentage` = percent(`Est. Total Non-U.S. Citizenship`/`Est. Total Population`, accuracy = 1L))
| NTAName | Borough | Est. Total Population | Est. Total U.S. Citizenship | Est. Total U.S. Citizenship (Naturalisation) | Est. Total Non-U.S. Citizenship | Est. U.S. Citizenship Percentage | Est. U.S. Citizenship (Naturalisation) Percentage | Est. Non-U.S. Citizenship Percentage |
|---|---|---|---|---|---|---|---|---|
| Corona | Queens | 69333 | 27728 | 16108 | 24183 | 40% | 23% | 35% |
| East Elmhurst | Queens | 24321 | 10537 | 7492 | 5806 | 43% | 31% | 24% |
| Elmhurst | Queens | 98560 | 31225 | 31629 | 33757 | 32% | 32% | 34% |
| Jackson Heights | Queens | 89628 | 36016 | 25695 | 25539 | 40% | 29% | 28% |
| North Corona | Queens | 39263 | 13955 | 7552 | 17113 | 36% | 19% | 44% |
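The summary sums the tract estimates (the E columns) but sets their margins of error (the M columns) aside. As a sketch of how that uncertainty could be carried through, the base R functions below implement the Census Bureau's approximation formulas for the margin of error (MOE) of a sum and of a derived proportion; tidycensus offers moe_sum() and moe_prop() for the same purpose. The example uses the Tract 69 and 75 rows from the earlier tables rather than the NTA totals, ignores the special handling of zero estimates, and, like all ACS MOEs, refers to the 90% confidence level.

```r
# Approximate MOE of a sum of ACS estimates: root of the summed squared MOEs
moe_of_sum <- function(moes) sqrt(sum(moes^2))

# Approximate MOE of a proportion num/den, where num is a subset of den
moe_of_prop <- function(num, den, moe_num, moe_den) {
  p <- num / den
  radicand <- moe_num^2 - p^2 * moe_den^2
  # If the radicand is negative, fall back to the ratio formula
  if (radicand < 0) radicand <- moe_num^2 + p^2 * moe_den^2
  sqrt(radicand) / den
}

# Tracts 69 and 75 from the earlier tables
total_est  <- c(3785, 3982); total_moe  <- c(598, 681)
noncit_est <- c(465, 464);   noncit_moe <- c(194, 307)

moe_total  <- moe_of_sum(total_moe)     # about 906
moe_noncit <- moe_of_sum(noncit_moe)    # about 363
p     <- sum(noncit_est) / sum(total_est)  # about 0.120
moe_p <- moe_of_prop(sum(noncit_est), sum(total_est), moe_noncit, moe_total)  # about 0.045
```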

The table above shows that Corona (North Corona and Corona combined) has the highest percentage of non-U.S. citizens among the CB 3 and 4 neighbourhoods. Moreover, Corona is home to a notable number of undocumented residents, who are likely to be under-represented in survey responses, so the actual share of non-U.S. citizens in Corona is probably higher than these estimates indicate.
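The combined Corona figure should be computed by pooling the counts of the two NTAs rather than averaging their percentages (35% and 44%), since the NTAs differ in population. A small base R check using the counts from the summary table:

```r
# Counts from the summary table
pop    <- c(Corona = 69333, `North Corona` = 39263, `East Elmhurst` = 24321,
            `Jackson Heights` = 89628, Elmhurst = 98560)
noncit <- c(Corona = 24183, `North Corona` = 17113, `East Elmhurst` = 5806,
            `Jackson Heights` = 25539, Elmhurst = 33757)

# Pool the two Corona NTAs before dividing
corona <- c("Corona", "North Corona")
corona_share <- sum(noncit[corona]) / sum(pop[corona])
round(100 * corona_share)   # 38

# Per-NTA shares, matching the table's rounded percentages
round(100 * noncit / pop)
```

The pooled 38% is below the simple average of 39.5% because the larger Corona NTA has the lower rate; it still exceeds the rate of every other neighbourhood in the table.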

