This document outlines a first exploration of how the residents of Corona, Queens, use different means of transportation to get to work. The final goal of this analysis is to explore the accessibility and quality of the transportation available to Corona’s residents, as Corona has a notorious lack of accessibility and low quality of mobility in comparison to the rest of the city.

For this initial analysis, we’re looking at the three main means of transportation to work: public, private, and bicycle. The output for this stage will be maps showing the percentage of use of each means, by census tract within the neighbourhood. Through summary statistics, we will also observe how these percentages differ from the rest of the borough.

A first look at the data

In the first step of this analysis, we loaded all of the libraries our script needs to run. We then downloaded and examined the variables available in the latest 5-year ACS (2016-2020), and from there chose a set of variables that we could use for the analysis.

library(tidycensus)
library(tidyverse)
library(sf)
library(viridis)
library(scales)
acs201620 <- load_variables(2020, "acs5", cache = TRUE)

After examining the variables in the ACS, we determined the variables that we would use for this exploration and imported them for processing. Since our objective is to create maps of the specific area of Corona, we set the import at the census tract level with Queens as the county, and also downloaded the tract geometries (see ‘geometry’) for the eventual maps.

raw_transport <- get_acs(geography = "tract",
                        variables = c(means_total = "B08301_001",
                                      means_private = "B08301_002",
                                      means_public = "B08301_010",
                                      means_bike = "B08301_018"),
                        state='NY',
                        county = 'Queens',
                        geometry = TRUE,
                        year = 2020,
                        output = "wide")

To see the means of transportation more clearly, we will first process the data and calculate the percentage of each means of transportation: public, private, and bicycle.

transport <- raw_transport %>% 
  mutate(pct_private = means_privateE/means_totalE,
         pct_public = means_publicE/means_totalE,
         pct_bike = means_bikeE/means_totalE)

When looking at our newly processed dataset, we can see a number of rows that have a total of 0 and NaN percentages. NaN (not a number) is still a numeric value in R: it is the result of dividing 0 by 0, which happens here because the denominator (total workers in the tract) is 0. To process these values correctly, we need to convert them to regular missing values (NA).

Looking at the NAs before processing is important to understand why and where they are, and the impact they have on our data.
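As a quick check of how R treats a division by zero, the sketch below uses made-up commuter counts (the vectors `workers_total` and `workers_public` are hypothetical and not part of the ACS download). It only illustrates the behaviour described above under those assumptions.

```r
# Hypothetical counts for three tracts; the second tract has no commuters.
workers_total  <- c(120, 0, 80)
workers_public <- c(60, 0, 20)

# 0 / 0 yields NaN, while the vector as a whole stays numeric.
share_public <- workers_public / workers_total

is.numeric(share_public)  # TRUE
is.nan(share_public)      # FALSE  TRUE FALSE
is.na(share_public)       # FALSE  TRUE FALSE (is.na() also flags NaN)

# Replace NaN with NA so the value is treated as an ordinary missing value.
share_clean <- ifelse(is.nan(share_public), NA, share_public)
mean(share_clean, na.rm = TRUE)  # 0.375
```

Because `is.na()` returns TRUE for NaN as well, filtering on `is.na()` finds these tracts both before and after the conversion.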

na_tracts <- transport %>% 
  filter(is.na(pct_private)) %>% 
  filter(is.na(pct_public)) %>% 
  filter(is.na(pct_bike))

We will convert these values to numeric NAs, and once we map them, we can determine if we should remove them entirely.

transport <- raw_transport %>% 
  mutate(pct_private = means_privateE/means_totalE,
         pct_private = ifelse(is.nan(pct_private), NA, pct_private)) %>% 
  mutate(pct_public = means_publicE/means_totalE,
         pct_public = ifelse(is.nan(pct_public), NA, pct_public)) %>% 
  mutate(pct_bike = means_bikeE/means_totalE,
         pct_bike = ifelse(is.nan(pct_bike), NA, pct_bike))

Mapping the data

The tract geometries that come with the census data do not include neighbourhood or borough boundaries, so to add them to the map, we need to import them from a separate source. In this case, we’re importing the 2020 Neighborhood Tabulation Areas from NYC Planning, and the NYC Borough Boundaries from NYC Open Data.

See the sources for the spatial data below:

https://www.nyc.gov/site/planning/data-maps/open-data/census-download-metadata.page

https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm

boros <- st_read("~/Desktop/methods1/main_data/raw/geo/BoroughBoundaries.geojson")

nabes <- st_read("~/Desktop/methods1/main_data/raw/geo/nynta2020_22b/nynta2020.shp")  

When making the map, we’ll add the Borough Boundaries and the Neighborhoods from the imported spatial data, and will filter to only show Queens. We’ll make one map for each means of transportation to work: private, public, and bicycle.

Percentage of people using private transportation as a means to commute to work:

ggplot()  + 
  geom_sf(data = transport, mapping = aes(fill = pct_private), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent of Private Transportation (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Queens, Private transportation by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes %>% filter(BoroName == "Queens"), 
          color = "gray", fill = NA, lwd = 0.25) + 
  geom_sf(data = boros %>% filter(boro_name == "Queens"), 
          color = "black", fill = NA, lwd = .5)

Percentage of people using public transportation as a means to commute to work:

ggplot()  + 
  geom_sf(data = transport, mapping = aes(fill = pct_public), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent of Public Transportation (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Queens, Public transportation by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes %>% filter(BoroName == "Queens"), 
          color = "gray", fill = NA, lwd = 0.25) + 
  geom_sf(data = boros %>% filter(boro_name == "Queens"), 
          color = "black", fill = NA, lwd = .5)

Percentage of people using bicycle as a means to commute to work:

ggplot()  + 
  geom_sf(data = transport, mapping = aes(fill = pct_bike), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent of Bicycle Transportation (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Queens, Bicycle transportation by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes %>% filter(BoroName == "Queens"), 
          color = "gray", fill = NA, lwd = 0.25) + 
  geom_sf(data = boros %>% filter(boro_name == "Queens"), 
          color = "black", fill = NA, lwd = .5)
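The three Queens maps differ only in the fill column, the legend name, and the title. As a minimal sketch, and not part of the original script, the shared styling could live in one function. The name `plot_share_map()` and its arguments are hypothetical; the ggplot2 and scales calls are the same ones used in the maps above.

```r
library(ggplot2)
library(sf)
library(scales)

# Hypothetical helper: one choropleth for the column named in `fill_col`.
# `outlines` is an optional list of boundary layers drawn on top, each given
# as list(data = <sf object>, color = <colour>, lwd = <line width>).
plot_share_map <- function(data, fill_col, legend_name, title,
                           outlines = list()) {
  p <- ggplot() +
    geom_sf(data = data, mapping = aes(fill = .data[[fill_col]]),
            color = "#ffffff", lwd = 0) +
    theme_void() +
    scale_fill_distiller(breaks = c(0, .2, .4, .6, .8, 1),
                         direction = 1,
                         na.value = "transparent",
                         name = legend_name,
                         labels = percent_format(accuracy = 1L)) +
    labs(title = title,
         caption = "Source: American Community Survey, 2016-20")

  # Add each boundary layer without a fill so the choropleth stays visible.
  for (layer in outlines) {
    p <- p + geom_sf(data = layer$data, color = layer$color,
                     fill = NA, lwd = layer$lwd)
  }
  p
}

# Usage with the objects from this document (not run here):
# plot_share_map(transport, "pct_public",
#                "Percent of Public Transportation (%)",
#                "Queens, Public transportation by Census Tract",
#                outlines = list(
#                  list(data = nabes %>% filter(BoroName == "Queens"),
#                       color = "gray", lwd = 0.25),
#                  list(data = boros %>% filter(boro_name == "Queens"),
#                       color = "black", lwd = .5)))
```

Keeping the styling in one place means a change to the palette or the breaks applies to every map at once.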

Understanding Corona

To create a map zoomed in on Corona, we first need to identify which neighbourhood each census tract is in, as the map we have now is organized by census tracts. To identify them, we will use a spatial join.

The first step is making sure that both datasets are in the same projection, and converting one of them if they aren’t. In this case, the census tract data is in a different projection, so we’ll convert it to the appropriate one.

transport_2263 <- st_transform(transport, 2263)
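One way to confirm whether two layers share a projection before joining them is to compare their coordinate reference systems with `st_crs()`. The sketch below uses two made-up single-point layers (`pt_4269` and `pt_2263` are hypothetical stand-ins for the tract and neighbourhood data) and assumes the tract layer arrives in NAD83 (EPSG:4269), which is what tidycensus normally returns.

```r
library(sf)

# Hypothetical layers: one point in NAD83 longitude/latitude (EPSG:4269)
# and the same point in NY State Plane Long Island, feet (EPSG:2263).
pt_4269 <- st_sf(id = 1,
                 geometry = st_sfc(st_point(c(-73.86, 40.75)), crs = 4269))
pt_2263 <- st_transform(pt_4269, 2263)

# The two layers describe the same place but use different projections.
same_crs <- st_crs(pt_4269) == st_crs(pt_2263)
same_crs  # FALSE

# Transform only when needed, taking the target CRS from the other layer
# rather than typing the EPSG code by hand.
pt_aligned <- pt_4269
if (st_crs(pt_aligned) != st_crs(pt_2263)) {
  pt_aligned <- st_transform(pt_aligned, st_crs(pt_2263))
}
st_crs(pt_aligned) == st_crs(pt_2263)  # TRUE
```

With the real data, the same comparison would be `st_crs(transport) == st_crs(nabes)`.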

After both of these are adjusted to the same projection, we’ll filter out the unnecessary fields in the neighbourhood shapefile, and make the spatial join between the dataframes afterwards.

nabes_selected <- nabes %>%
  select(BoroCode, BoroName, NTA2020, NTAName)
transport_nabes <- transport_2263 %>%
  st_join(nabes_selected, 
          left = TRUE,
          join = st_intersects,
          largest = TRUE)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
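To illustrate what `largest = TRUE` does in the join, the sketch below builds one made-up tract that straddles two made-up neighbourhoods (all geometries and the names `tract`, `hoods`, "A", and "B" are hypothetical). With a plain `st_intersects` join the tract matches both neighbourhoods; with `largest = TRUE` it keeps only the one it overlaps the most.

```r
library(sf)

# Helper for this example only: an axis-aligned rectangle as an sf polygon.
rect <- function(xmin, xmax, ymin = 0, ymax = 10) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# One tract spanning x = 0..10; neighbourhood A covers x = -5..3 and
# neighbourhood B covers x = 3..20, so the tract overlaps B far more than A.
tract <- st_sf(tract_id = "T1",
               geometry = st_sfc(rect(0, 10), crs = 2263))
hoods <- st_sf(NTAName = c("A", "B"),
               geometry = st_sfc(rect(-5, 3), rect(3, 20), crs = 2263))

# Plain intersects join: one output row per matching neighbourhood.
all_matches <- st_join(tract, hoods, join = st_intersects, left = TRUE)
nrow(all_matches)  # 2

# largest = TRUE: one row per tract, matched to the largest overlap.
# suppressWarnings() hides the "spatially constant" warning shown above.
best_match <- suppressWarnings(
  st_join(tract, hoods, join = st_intersects, left = TRUE, largest = TRUE)
)
best_match$NTAName  # "B"
```

This is why each census tract ends up assigned to a single neighbourhood even when tract and neighbourhood boundaries do not line up exactly.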

After the datasets are joined, we can proceed with making a specific choropleth of Corona. We’ll first make a dataset specifically filtering Corona, and will make the map with it afterwards. We’ll make one map for each means of transportation to work: private, public, and bicycle.

Percentage of people using private transportation as a means to commute to work:

corona <- transport_nabes %>% 
  filter(NTAName == "Corona" | NTAName == "North Corona")

ggplot()  + 
  geom_sf(data = corona, mapping = aes(fill = pct_private), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent Private Transportation (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Corona, Percentage of Private transportation by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes %>% filter(NTAName == "Corona" | NTAName == "North Corona"), 
          color = "black", fill = NA, lwd = 0.5)

Percentage of people using public transportation as a means to commute to work:

ggplot()  + 
  geom_sf(data = corona, mapping = aes(fill = pct_public), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent Public Transportation (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Corona, Percentage of Public transportation by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes %>% filter(NTAName == "Corona" | NTAName == "North Corona"), 
          color = "black", fill = NA, lwd = 0.5)

Percentage of people using bicycle as a means to commute to work:

ggplot()  + 
  geom_sf(data = corona, mapping = aes(fill = pct_bike), 
          color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent Bicycle Transportation (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Corona, Percentage of Bicycle transportation by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes %>% filter(NTAName == "Corona" | NTAName == "North Corona"), 
          color = "black", fill = NA, lwd = 0.5)

After creating the maps, we’ll make a new dataset calculating summary statistics about Corona and its surrounding neighbourhoods.

corona_and_wider_stats <- st_drop_geometry(transport_nabes) %>% 
  # Keep Corona and its surrounding neighbourhoods before grouping
  filter(NTAName %in% c("Corona", "North Corona", "East Elmhurst",
                        "Jackson Heights", "Elmhurst")) %>% 
  group_by(NTAName) %>% 
  summarise(Borough = first(BoroName),
            # B08301_001 counts workers, so total_pop is a total of workers
            total_pop = sum(means_totalE),
            total_means_private = sum(means_privateE),
            total_means_public = sum(means_publicE),
            total_means_bike = sum(means_bikeE)) %>% 
  # Shares are computed from the summed counts, not averaged across tracts
  mutate(est_pct_public = total_means_public/total_pop,
         est_pct_private = total_means_private/total_pop,
         est_pct_bike = total_means_bike/total_pop)

corona_and_wider_stats
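The summary above sums the counts within each neighbourhood and then divides, instead of averaging the tract percentages. The sketch below, with made-up numbers for two tracts of very different size (the data frame `tracts` and its columns are hypothetical), shows why the two approaches can give different answers.

```r
# Hypothetical neighbourhood with one large tract and one small tract.
tracts <- data.frame(
  workers_total  = c(1000, 50),
  workers_public = c(600, 5)
)

# Pooled share: sum the counts first, then divide (the approach used above).
pooled_share <- sum(tracts$workers_public) / sum(tracts$workers_total)
pooled_share  # 605 / 1050, about 0.576

# Mean of tract shares: every tract counts equally, whatever its size.
mean_of_shares <- mean(tracts$workers_public / tracts$workers_total)
mean_of_shares  # (0.6 + 0.1) / 2 = 0.35
```

The pooled share reflects the share of all workers in the neighbourhood, while the mean of tract shares gives the small tract the same weight as the large one.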