This analysis explores U.S. citizenship in Corona, Queens, as part of a
broader study of the neighbourhood's socio-demographic conditions. It is
divided into three categories: U.S. citizens born in the United States,
U.S. citizens by naturalisation, and non-U.S. citizens. The results are
also compared with neighbouring districts in Community Boards 3 and 4.
Data Processing Preparation
To prepare for the analysis, we first load the packages used in the
data exploration.
library(tidyverse)
library(tidycensus)
library(scales)
library(sf)
library(RColorBrewer)
library(knitr)
options(scipen = 999)
We then load the variable list for the 2016-2020 American Community
Survey (ACS) 5-year estimates.
acs201620 <- load_variables(2020, "acs5", cache = TRUE)
Next, we import the spatial data: the borough boundaries from NYC Open
Data (a GeoJSON file) and the 2020 Neighborhood Tabulation Areas (NTAs)
for NYC (a shapefile).
boros <- st_read("raw/geo/Borough Boundaries.geojson")
nabes <- st_read("raw/geo/nynta2020.shp")
Data Processing
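For our exploration, we look through the imported ACS variables, choose
table B05001 (Nativity and Citizenship Status in the United States) and
download the following variables into a new data frame:
- B05001_001 = Estimate!!Total:
- B05001_002 = Estimate!!Total:!!U.S. citizen, born in the United States
- B05001_005 = Estimate!!Total:!!U.S. citizen by naturalization
- B05001_006 = Estimate!!Total:!!Not a U.S. citizen
Note that B05001_002 covers only citizens born in the United States.
Citizens born in Puerto Rico or the U.S. Island Areas (B05001_003) and
citizens born abroad to American parents (B05001_004) are not requested,
so the three shares computed below do not add up to 100%.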
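As a quick sanity check on these categories, the sketch below adds up the three requested counts for tracts 69 and 75 (estimates copied from the output tables further down) and compares them with each tract's total. The gap should correspond to the citizens in B05001_003 and B05001_004, which the query leaves out. Only base R is used, and the vector names are illustrative.

```r
# Estimates for two tracts, copied from the output tables in this report
total       <- c(tract_69 = 3785, tract_75 = 3982)  # B05001_001
us_born     <- c(tract_69 = 2492, tract_75 = 2430)  # B05001_002
naturalised <- c(tract_69 = 740,  tract_75 = 1026)  # B05001_005
non_citizen <- c(tract_69 = 465,  tract_75 = 464)   # B05001_006

# Residents not covered by the three requested categories
# (expected to be the B05001_003 and B05001_004 groups)
residual <- total - (us_born + naturalised + non_citizen)
residual  # tract_69 = 88, tract_75 = 62

# Share of each tract's population that the three categories cover
coverage <- round((us_born + naturalised + non_citizen) / total, 3)
coverage  # tract_69 = 0.977, tract_75 = 0.984
```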
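raw_citizenship <- get_acs(geography = "tract",
                           variables = c(`Total Population` = "B05001_001",
                                         `U.S. Citizen` = "B05001_002",
                                         `U.S. Citizen (Naturalisation)` = "B05001_005",
                                         `Non-U.S. Citizen` = "B05001_006"),
                           state = "NY",
                           county = "Queens",
                           geometry = TRUE,   # include tract boundaries with the estimates
                           year = 2020,       # 2016-2020 ACS 5-year estimates
                           output = "wide")   # one column per variable, suffixed E (estimate) and M (margin of error)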
We then compute, for each tract, the share of the total population in
each citizenship category.
citizenship <- raw_citizenship %>%
mutate(`U.S. Citizenship Percentage` = `U.S. CitizenE`/`Total PopulationE`,
`U.S. Citizenship (Naturalisation) Percentage` = `U.S. Citizen (Naturalisation)E`/`Total PopulationE`,
`Non-U.S. Citizenship Percentage` = `Non-U.S. CitizenE`/`Total PopulationE`)
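Because `output = "wide"` also returns margins of error (the columns ending in M), one optional extension, not part of the original analysis, is to attach a margin of error to each share. The minimal sketch below does this for tract 69, assuming the `moe_prop()` helper that tidycensus provides for derived proportions and using the estimates shown in the table below.

```r
library(tidycensus)

# Tract 69 estimates (E) and margins of error (M), copied from the table below
us_born_e <- 2492  # U.S. CitizenE
us_born_m <- 513   # U.S. CitizenM
total_e   <- 3785  # Total PopulationE
total_m   <- 598   # Total PopulationM

share <- us_born_e / total_e  # about 0.658, matching the table

# moe_prop(num, denom, moe_num, moe_denom): margin of error of a derived proportion
share_moe <- moe_prop(us_born_e, total_e, us_born_m, total_m)

# Rough interval for the share; ACS margins are published at the 90% confidence level
share_interval <- c(lower = share - share_moe, upper = share + share_moe)
```

For this tract the margin comes out at roughly 9 percentage points, which is worth keeping in mind when comparing neighbouring tracts on the maps.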
Notice that the table below shows NaN for Census Tract 50. Its total
population estimate is zero, so every share is 0/0, which R returns as
NaN (not a number) rather than NA (missing). We want to recode these
NaN values as NA.
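| GEOID | NAME | Total PopulationE | Total PopulationM | U.S. CitizenE | U.S. CitizenM | U.S. Citizen (Naturalisation)E | U.S. Citizen (Naturalisation)M | Non-U.S. CitizenE | Non-U.S. CitizenM | geometry | U.S. Citizenship Percentage | U.S. Citizenship (Naturalisation) Percentage | Non-U.S. Citizenship Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36081005000 | Census Tract 50, Queens County, New York | 0 | 12 | 0 | 12 | 0 | 12 | 0 | 12 | MULTIPOLYGON (((-73.85767 4… | NaN | NaN | NaN |
| 36081006900 | Census Tract 69, Queens County, New York | 3785 | 598 | 2492 | 513 | 740 | 228 | 465 | 194 | MULTIPOLYGON (((-73.92511 4… | 0.6583884 | 0.1955086 | 0.1228534 |
| 36081007500 | Census Tract 75, Queens County, New York | 3982 | 681 | 2430 | 509 | 1026 | 283 | 464 | 307 | MULTIPOLYGON (((-73.92915 4… | 0.6102461 | 0.2576595 | 0.1165244 |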
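The NaN values in the table above are what R returns for 0/0. This small base-R sketch (values chosen for illustration) shows why `is.na()` is enough to find them, and why `is.nan()` is needed to recode only the NaNs.

```r
# 0/0 is undefined, so R returns NaN; a non-zero numerator over zero gives Inf
zero_over_zero <- 0 / 0   # NaN
twelve_over_zero <- 12 / 0  # Inf

x <- c(0 / 0, NA, 0.658)
is.na(x)   # TRUE  TRUE FALSE: is.na() also flags NaN
is.nan(x)  # TRUE FALSE FALSE: is.nan() flags only NaN

# Recode NaN as NA while leaving real values untouched
x_clean <- ifelse(is.nan(x), NA, x)
```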
Processing NaN Values
First, we use is.na(), which returns TRUE for both NA and NaN, to find
the tracts whose shares are missing.
na_tracts <- citizenship %>%
filter(is.na(`U.S. Citizenship Percentage`)) %>%
filter(is.na(`U.S. Citizenship (Naturalisation) Percentage`)) %>%
filter(is.na(`Non-U.S. Citizenship Percentage`))
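The three chained `filter()` calls keep tracts where all three shares are missing. Assuming dplyr 1.0.4 or later, the same condition can be written once with `if_all()`. The toy tibble below is a sketch that reuses the share values from the table above rather than the full data.

```r
library(dplyr)

# Toy shares for three tracts, copied from the table above;
# only Census Tract 50 has a zero population
shares <- tibble(
  NAME = c("Census Tract 50", "Census Tract 69", "Census Tract 75"),
  `U.S. Citizenship Percentage` = c(NaN, 0.6583884, 0.6102461),
  `U.S. Citizenship (Naturalisation) Percentage` = c(NaN, 0.1955086, 0.2576595),
  `Non-U.S. Citizenship Percentage` = c(NaN, 0.1228534, 0.1165244)
)

# Keep rows where every column ending in "Percentage" is NA or NaN,
# equivalent to chaining three filter(is.na(...)) calls
na_rows <- shares %>%
  filter(if_all(ends_with("Percentage"), is.na))
```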
We now recompute the shares and convert each NaN to NA with ifelse()
and is.nan(), as in the code chunk below.
citizenship <- raw_citizenship %>%
mutate(`U.S. Citizenship Percentage` = `U.S. CitizenE`/`Total PopulationE`,
`U.S. Citizenship Percentage` = ifelse(is.nan(`U.S. Citizenship Percentage`),
NA, `U.S. Citizenship Percentage`)) %>%
mutate(`Non-U.S. Citizenship Percentage` = `Non-U.S. CitizenE`/`Total PopulationE`,
`Non-U.S. Citizenship Percentage` = ifelse(is.nan(`Non-U.S. Citizenship Percentage`),
NA, `Non-U.S. Citizenship Percentage`)) %>%
mutate(`U.S. Citizenship (Naturalisation) Percentage` = `U.S. Citizen (Naturalisation)E`/`Total PopulationE`,
`U.S. Citizenship (Naturalisation) Percentage` = ifelse(is.nan(`U.S. Citizenship (Naturalisation) Percentage`),
NA, `U.S. Citizenship (Naturalisation) Percentage`))
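The chunk above repeats the same divide-then-recode pattern three times. A more compact alternative, sketched here under the assumption that a small helper function is acceptable, wraps the pattern once; `safe_share()` is a hypothetical name, and the counts are copied from the tract tables in this section.

```r
library(dplyr)

# Hypothetical helper: divide, then turn 0/0 (NaN) into NA
safe_share <- function(count, total) {
  share <- count / total
  replace(share, is.nan(share), NA_real_)
}

# Counts for three tracts, copied from the tables in this section
raw <- tibble(
  NAME = c("Census Tract 50", "Census Tract 69", "Census Tract 75"),
  `Total PopulationE` = c(0, 3785, 3982),
  `U.S. CitizenE` = c(0, 2492, 2430),
  `U.S. Citizen (Naturalisation)E` = c(0, 740, 1026),
  `Non-U.S. CitizenE` = c(0, 465, 464)
)

shares <- raw %>%
  mutate(`U.S. Citizenship Percentage` = safe_share(`U.S. CitizenE`, `Total PopulationE`),
         `U.S. Citizenship (Naturalisation) Percentage` =
           safe_share(`U.S. Citizen (Naturalisation)E`, `Total PopulationE`),
         `Non-U.S. Citizenship Percentage` = safe_share(`Non-U.S. CitizenE`, `Total PopulationE`))
```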
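| GEOID | NAME | Total PopulationE | Total PopulationM | U.S. CitizenE | U.S. CitizenM | U.S. Citizen (Naturalisation)E | U.S. Citizen (Naturalisation)M | Non-U.S. CitizenE | Non-U.S. CitizenM | geometry | U.S. Citizenship Percentage | Non-U.S. Citizenship Percentage | U.S. Citizenship (Naturalisation) Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36081005000 | Census Tract 50, Queens County, New York | 0 | 12 | 0 | 12 | 0 | 12 | 0 | 12 | MULTIPOLYGON (((-73.85767 4… | NA | NA | NA |
| 36081006900 | Census Tract 69, Queens County, New York | 3785 | 598 | 2492 | 513 | 740 | 228 | 465 | 194 | MULTIPOLYGON (((-73.92511 4… | 0.6583884 | 0.1228534 | 0.1955086 |
| 36081007500 | Census Tract 75, Queens County, New York | 3982 | 681 | 2430 | 509 | 1026 | 283 | 464 | 307 | MULTIPOLYGON (((-73.92915 4… | 0.6102461 | 0.1165244 | 0.2576595 |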
Selecting Census Tract and Spatial Join
Next, we select the census tracts for Corona. Before doing so, we have
to transform the tracts into the same projection as the NTA layer so the
two can be joined and mapped.
We check the coordinate reference system (CRS) of the census tracts with
st_crs(), which prints the CRS of a spatial data frame in the console.
It shows that the EPSG code is 4269 (NAD83 geographic coordinates, i.e.
longitude and latitude).
st_crs(citizenship)
Checking the CRS of the NTA data shows that its EPSG code is 2263.
st_crs(nabes)
For New York City data we want the projection to be EPSG:2263 (NAD83 /
New York Long Island, measured in US survey feet), which also matches
the NTA layer, so we reproject the tracts with st_transform().
citizenship_2263 <- st_transform(citizenship, 2263)
Checking the CRS again confirms that the EPSG code is now 2263.
st_crs(citizenship_2263)
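To see what `st_transform()` does, the sketch below reprojects a single illustrative point near Corona (coordinates chosen by hand, not taken from the data) from EPSG:4269 to EPSG:2263 and reads the EPSG code from the CRS object instead of the printed output.

```r
library(sf)

# Illustrative point in NAD83 longitude/latitude (EPSG:4269), not from the data
pt_4269 <- st_sfc(st_point(c(-73.86, 40.745)), crs = 4269)

# Reproject to NAD83 / New York Long Island (EPSG:2263)
pt_2263 <- st_transform(pt_4269, 2263)

st_crs(pt_4269)$epsg  # 4269
st_crs(pt_2263)$epsg  # 2263

# Coordinates are now planar x/y in US survey feet rather than degrees
xy <- st_coordinates(pt_2263)
```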
We keep only the NTA fields we want to add to the citizenship data and
drop the other fields in the neighbourhood shapefile.
nabes_selected <- nabes %>%
  select(BoroName, NTA2020, NTAName)  # the geometry column is kept automatically
We can now perform the spatial join.
citizenship_nabes <- citizenship_2263 %>%
  st_join(nabes_selected,
          left = TRUE,           # left join: every census tract is kept
          join = st_intersects,  # match tracts and NTAs that intersect
          largest = TRUE)        # if a tract overlaps several NTAs, assign it to the one with the largest overlap
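The effect of `largest = TRUE` is easiest to see on toy data. In the sketch below, a hypothetical square tract straddles two hypothetical neighbourhoods, with 30% of its area in "A" and 70% in "B"; the `square()` helper is illustrative, and EPSG:2263 is used so that areas are planar.

```r
library(sf)

# Hypothetical helper: an axis-aligned square polygon
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Toy tract spanning x = 0..10; neighbourhood A covers x < 3, B covers x > 3
tract <- st_sf(tract = "T1", geometry = st_sfc(square(0, 0, 10, 10), crs = 2263))
hoods <- st_sf(NTAName = c("A", "B"),
               geometry = st_sfc(square(-5, -5, 3, 15), square(3, -5, 15, 15),
                                 crs = 2263))

# Plain intersects join: the tract is duplicated, one row per neighbourhood
all_matches <- st_join(tract, hoods, join = st_intersects, left = TRUE)

# largest = TRUE: keep only the neighbourhood with the largest overlap (B)
largest_match <- st_join(tract, hoods, join = st_intersects, left = TRUE,
                         largest = TRUE)
```

Without `largest = TRUE`, tracts on NTA borders would be counted twice in the summary statistics later on.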
Selecting Corona
corona <- citizenship_nabes %>%
filter(NTAName == "Corona" | NTAName == "North Corona")
Output
Maps
For this step, we produce a map for each category.

The map above shows the percentage of U.S.-born citizens in Corona by
census tract. In general, this share is below 50%, with higher
percentages concentrated along the lower edge and in the top-left corner
of the neighbourhood.

The percentage of naturalised U.S. citizens in Corona is low, averaging
about 20%, with a maximum of 32% in census tract 401.

In contrast to the maps above, the share of non-U.S. citizens in Corona
is higher on average, reaching 70%, especially in the central part of
the neighbourhood. The lowest percentage appears in the bottom-left of
the map, which is the LeFrak City Apartments.
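The code that produced the maps above is not shown in this section. One way to draw such a choropleth is ggplot2's `geom_sf()` with a ColorBrewer palette; the sketch below assumes that approach and places the non-citizen shares of tracts 50, 69 and 75 (from the tables above) on made-up squares, so the shapes are not real tract boundaries. With the real data, `toy_tracts` would be replaced by `corona` and the fill by one of its percentage columns.

```r
library(sf)
library(ggplot2)
library(scales)

# Hypothetical helper: an axis-aligned square polygon
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Toy stand-in for `corona`: real tract shares on made-up geometries
toy_tracts <- st_sf(
  non_citizen_share = c(NA, 0.1228534, 0.1165244),
  geometry = st_sfc(square(0, 0, 1, 1), square(1, 0, 2, 1), square(2, 0, 3, 1),
                    crs = 2263)
)

citizenship_map <- ggplot(toy_tracts) +
  geom_sf(aes(fill = non_citizen_share), colour = "white") +
  scale_fill_distiller(palette = "Blues", direction = 1,  # darker = higher share
                       labels = percent, na.value = "grey80",  # NA tracts in grey
                       name = "Non-U.S. citizens") +
  theme_void()
```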
Summary Statistics for Corona and Neighbouring Districts (CB 3 and 4)
Having examined citizenship within Corona, we now compare its rates with
those of the other neighbourhoods in Community Boards 3 and 4.
qcb3and4_citizenship_nabes_stats <- st_drop_geometry(citizenship_nabes) %>%
  # keep only the CB 3 and CB 4 neighbourhoods of interest
  filter(NTAName %in% c("Corona", "North Corona", "East Elmhurst",
                        "Jackson Heights", "Elmhurst")) %>%
  group_by(NTAName) %>%
  # sum the tract estimates within each NTA
  summarise(Borough = first(BoroName),
            `Est. Total Population` = sum(`Total PopulationE`),
            `Est. Total U.S. Citizenship` = sum(`U.S. CitizenE`),
            `Est. Total U.S. Citizenship (Naturalisation)` = sum(`U.S. Citizen (Naturalisation)E`),
            `Est. Total Non-U.S. Citizenship` = sum(`Non-U.S. CitizenE`)) %>%
  # format the NTA-level shares as whole percentages
  mutate(`Est. U.S. Citizenship Percentage` = percent(`Est. Total U.S. Citizenship`/`Est. Total Population`, accuracy = 1L),
         `Est. U.S. Citizenship (Naturalisation) Percentage` = percent(`Est. Total U.S. Citizenship (Naturalisation)`/`Est. Total Population`, accuracy = 1L),
         `Est. Non-U.S. Citizenship Percentage` = percent(`Est. Total Non-U.S. Citizenship`/`Est. Total Population`, accuracy = 1L))
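| NTAName | Borough | Est. Total Population | Est. Total U.S. Citizenship | Est. Total U.S. Citizenship (Naturalisation) | Est. Total Non-U.S. Citizenship | Est. U.S. Citizenship Percentage | Est. U.S. Citizenship (Naturalisation) Percentage | Est. Non-U.S. Citizenship Percentage |
|---|---|---|---|---|---|---|---|---|
| Corona | Queens | 69333 | 27728 | 16108 | 24183 | 40% | 23% | 35% |
| East Elmhurst | Queens | 24321 | 10537 | 7492 | 5806 | 43% | 31% | 24% |
| Elmhurst | Queens | 98560 | 31225 | 31629 | 33757 | 32% | 32% | 34% |
| Jackson Heights | Queens | 89628 | 36016 | 25695 | 25539 | 40% | 29% | 28% |
| North Corona | Queens | 39263 | 13955 | 7552 | 17113 | 36% | 19% | 44% |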
From the table above, we can see that North Corona (44%) and Corona
(35%) have the highest percentages of non-U.S. citizens among the CB 3
and 4 neighbourhoods listed. Moreover, we understand that Corona has a
number of undocumented residents who may not be fully captured by the
survey, so the actual rate of non-U.S. citizens in Corona may be higher
than the estimate indicates.
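The comparison can also be checked for Corona as a whole by pooling the two NTAs. Summing their counts gives (24183 + 17113) / (69333 + 39263), about 38%, which is still above Elmhurst's 34%. A minimal dplyr sketch, with totals copied from the summary table above:

```r
library(dplyr)

# Population and non-citizen totals from the summary table above
cb34 <- tibble(
  NTAName = c("Corona", "East Elmhurst", "Elmhurst", "Jackson Heights", "North Corona"),
  total = c(69333, 24321, 98560, 89628, 39263),
  non_citizen = c(24183, 5806, 33757, 25539, 17113)
)

# Pool Corona and North Corona by summing counts before taking the ratio
combined <- cb34 %>%
  mutate(area = if_else(NTAName %in% c("Corona", "North Corona"),
                        "Corona (combined)", NTAName)) %>%
  group_by(area) %>%
  summarise(share = sum(non_citizen) / sum(total)) %>%
  arrange(desc(share))
```

Summing counts, rather than averaging the two percentages, weights each NTA by its population, which is why the pooled figure sits closer to Corona's 35% than to North Corona's 44%.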
