Homework 3: Analysis of DSNY Curbside Compost Districts

DSNY Rollouts

The NYC Department of Sanitation (DSNY) began the Curbside Composting program in 2013 as a pilot. Each “rollout” included the delivery of specially designed brown bins for food and yard waste to every residential building with 1-9 units within selected Community Districts. Any building with more than 9 units had to enroll online and request the number of bins it required. Because the density of apartment buildings is so high in all of Manhattan and in the Central and South Bronx, those areas are deemed “Enrollment Districts”. Rollouts do not occur there, so they are excluded from this analysis. Following the delivery of the brown bins, each district received weekly pickup of food and yard waste alongside its normal recycling schedule (referred to as “service”).

The Pause

The program continued to expand in 2014 and 2015, with a large non-pilot expansion in 2017. In 2018, after one district received service, the expansion schedule was paused for a number of reasons. It has remained paused to the present day, causing widespread public confusion about whether the program will continue.

So…who got compost service? Who didn’t?

Outreach is conducted daily in districts with service. Since the program is new to many people and is voluntary, outreach and education are a necessity. A common question is “why did all the ‘white’ neighborhoods get service first?”. Since only roughly half of all districts received service before the expansion was paused, one might wonder which districts received service and which are still waiting. And, whether explicit or implicit, does the split between the two groups correlate with race or, by extension, income? Whether the claim holds is unknown and, even if it does, intent cannot be proven. The analysis below uses open data to check it.

The Data

First we will gather some data.

For the order of rollouts and the list of districts currently receiving service, all data was pulled from the Organics Collection Reports, available through nyc.gov.

Next, for social and economic data including mutually exclusive race and median household income data, we will use NYC Planning Population FactFinder. All data from here is collected from 2013-2017 Census Data.

To keep the data set manageable, I compiled only the necessary information into a CSV file.

To get started, we first need to install all necessary packages.

# Packages
install.packages('tidyverse')

## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)

install.packages('janitor')

## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)

install.packages('lubridate')

## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)

install.packages('sf')

## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)

library(tidyverse) # (includes dplyr, ggplot2, and more)

## ── Attaching packages ───────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0

## ── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(janitor) # load for clean_names()

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(lubridate) # for working with date columns

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

library(sf) # for simple maps later

## Linking to GEOS 3.5.1, GDAL 2.2.2, PROJ 4.9.2
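
Calling install.packages() on every run reinstalls packages that are already present. As a minimal sketch (the helper name missing_packages is hypothetical and not part of the original script), one could install only what is missing:

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(c("tidyverse", "janitor", "lubridate", "sf"))

# Only install interactively, and only when something is actually missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The interactive() guard is a design choice for this sketch: it keeps a knit or batch run from triggering downloads.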

Then we load our data set, CDRace. This has racial identification and median household income for all districts, their service status, and when they received service.

CDRace <- read_csv('Income_and_Race_Data_1.csv') %>% janitor::clean_names()

## Parsed with column specification:
## cols(
##   `Community District` = col_character(),
##   Borough = col_character(),
##   District_Abbr = col_character(),
##   `Percent White` = col_double(),
##   `Percent Non-White` = col_double(),
##   `Median Income` = col_number(),
##   `Has Service (1 = Yes, 2 = No)` = col_double(),
##   `Service Received` = col_character()
## )

#Make sure that the columns were read correctly (a function I used in undergrad a lot)

sapply(CDRace, class)

##     community_district                borough          district_abbr 
##            "character"            "character"            "character" 
##          percent_white      percent_non_white          median_income 
##              "numeric"              "numeric"              "numeric" 
## has_service_1_yes_2_no       service_received 
##              "numeric"            "character"
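
Checking classes confirms the column types but not the values. As a rough sketch, a few value checks could be added. The toy rows below are made up, check_columns is a hypothetical helper, and the rule that the two percentages sum to about 100 is an assumption about the CSV:

```r
# Toy rows standing in for CDRace (values are made up for illustration)
toy <- data.frame(
  percent_white = c(62.1, 18.4),
  percent_non_white = c(37.9, 81.6),
  has_service_1_yes_2_no = c(1, 2)
)

# Hypothetical helper: stop with an error if a column looks wrong
check_columns <- function(df) {
  stopifnot(
    is.numeric(df$percent_white),
    is.numeric(df$percent_non_white),
    all(df$has_service_1_yes_2_no %in% c(1, 2)),
    # assumption: the two shares should sum to 100, allowing for rounding
    all(abs(df$percent_white + df$percent_non_white - 100) < 0.5)
  )
  invisible(TRUE)
}

check_columns(toy)
```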

Now we will add 3 columns to make data analysis a little bit easier to read.

A column to get the Community Districts into an abbreviated form. Ex: Brooklyn 2 -> BK02.
A column to change “has_service_1_yes_2_no” into “Service Status” with options “has service” and “no service” for each community district
A column that takes the year that the district received service and removes the season, since examining season for our question at hand would seem a bit excessive. Ex: Spring 2014 -> 2014

We will use the mutate function 3 times and pipe %>% them together into a new dataframe called CDRaceAbbr

# Mutate Community Districts into abbreviated AA## format 
CDRaceAbbr <- CDRace %>%  mutate(
  boro_abbrev = recode(
    borough, 
    "Manhattan" = "MN", 
    "Bronx" = "BX", 
    "Brooklyn" = "BK", 
    "Queens" = "Q", 
    "Staten Island" = "SI",
    .default = NA_character_
  ),
  cd_num = str_sub(community_district, 1, 2),
  CDNum = str_c(boro_abbrev, cd_num)) %>% # no separator, so "BK" + "11" gives "BK11"
  
  # Recode the 1/2 flag into a readable Service Status
  mutate(
    Service_status = recode(
      has_service_1_yes_2_no, 
      "1" = "Has Service", 
      "2" = "No Service", 
      .default = NA_character_
    )) %>% 
  
  # Reduce season + year to just the year to make grouping easier
  mutate(
    Service_Year = recode(
      service_received, 
      "Spring 2013" = "2013", 
      "Spring 2014" = "2014", 
      "Spring 2017" = "2017",
      "Spring 2018" = "2018",
      "Summer 2017" = "2017",
      "Fall 2015" = "2015",
      "Fall 2017" = "2017",
      .default = NA_character_
    ))
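
The season-to-year recode above lists every season by hand. As an alternative sketch (not the original approach), the four-digit year can be pulled out with a regular expression, which would also cover a season missing from the list. The input vector here is made up for illustration:

```r
library(stringr)

# Made-up examples of the service_received column
service_received <- c("Spring 2013", "Summer 2017", "Fall 2015", NA)

# Extract the first run of four digits; entries with no year stay NA
service_year <- str_extract(service_received, "\\d{4}")
service_year
```

The result stays a character vector, matching the type of Service_Year in the original code.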

Visuals

Now we are ready to run some preliminary graphs. First to get a sense for some quick answers to our main question, let’s look at the racial breakdown (%) for both districts with service and those without. For the sake of simplicity, racial breakdown will be categorized as majority (white - exclusively) and minority (non-white - exclusively).

ggplot(CDRaceAbbr, aes(CDNum, percent_white)) +
  geom_point(color = "red") +
  facet_wrap(~ Service_status, nrow = 2) +
  labs(title = "Community Districts With and Without Compost Service v. Race",
       x = "Community District", y = "Percent of Population that Identifies as White") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Just by eyeballing, we can see that there may be some validity to the statement. More of the districts whose population is at least 50% white appear in the “Has Service” panel than in the “No Service” panel.

Income

Let’s see if we see the same trend when we look at median household income. To do so, we will make some income brackets to make the data easier to visualize by again using mutate and adding cut and breaks.

First, we will look at median household income against white population to see whether the two are correlated, since we cannot assume that a relationship exists.

CDRaceWithIncomeLevels <- mutate(
  CDRaceAbbr,
  IncomeLevel=cut(
    median_income,
    breaks=c(0, 50000, 90000, 150000),
    labels=c("< 50K","< 90K","< 150K")
  )
)
#Graph this
CD_Race_Income <- ggplot(CDRaceWithIncomeLevels, aes(CDNum,percent_white))
CD_Race_Income + geom_bar(stat = "identity", aes(fill = IncomeLevel), position = "dodge") +
  theme_classic() +
  labs(
    title = "Median Household Income of Community Districts with Racial Breakdown",
    x = "Community Districts",
    y = "Percent of Population that Identifies as White")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Looks like there is some correlation between higher income and white population, but it is not especially strong. Let’s take a closer look at just the districts with service.

# Same Graph, but Just Districts with Service
Districts_With_Service <- CDRaceWithIncomeLevels %>% 
  filter(Service_status == "Has Service")

ggplot(Districts_With_Service, aes(CDNum, percent_white)) +
  # map fill by bare column name; `$` inside aes() bypasses the plot's data
  geom_bar(stat = "identity", aes(fill = IncomeLevel), position = "dodge") +
  theme_classic() +
  labs(
    title = "Median Household Income of Community Districts",
    subtitle = "with Racial Breakdown of Serviced Districts",
    x = "Community Districts",
    y = "Percent of Population that Identifies as White") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

And compare that with districts with no service:

#Same Graph, but Districts With NO Service

Districts_Without_Service <- CDRaceWithIncomeLevels %>% 
  filter(Service_status == "No Service")

ggplot(Districts_Without_Service, aes(CDNum, percent_white)) +
  # map fill by bare column name; `$` inside aes() bypasses the plot's data
  geom_bar(stat = "identity", aes(fill = IncomeLevel), position = "dodge") +
  theme_classic() +
  labs(
    title = "Median Household Income of Community Districts",
    subtitle = "with Racial Breakdown of Unserviced Districts",
    x = "Community Districts",
    y = "Percent of Population that Identifies as White") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Income Findings

Looking at both graphs together, we can see that there are more low-income districts (red bars) without service than with service, and more high-income districts (blue bars) with service than without.
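
The same comparison can be read off a cross-tabulation instead of bar colors. A minimal base R sketch, using made-up incomes and statuses in place of CDRaceWithIncomeLevels:

```r
# Made-up districts (not the real data)
toy <- data.frame(
  median_income = c(42000, 48000, 75000, 120000, 95000, 61000),
  Service_status = c("No Service", "No Service", "Has Service",
                     "Has Service", "Has Service", "No Service")
)

# Same brackets as above; cut() intervals are right-closed, e.g. (0, 50000]
toy$IncomeLevel <- cut(
  toy$median_income,
  breaks = c(0, 50000, 90000, 150000),
  labels = c("< 50K", "< 90K", "< 150K")
)

# Rows are income brackets, columns are service status
income_by_service <- table(toy$IncomeLevel, toy$Service_status)
income_by_service
```

Applied to the real data frame, the same table() call would give the counts behind the red and blue bars.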

Year Service Received

Zooming in on those districts with service again, let’s see when each got service in comparison to the data we have above.

Rollout_Order <- Districts_With_Service %>% 
  arrange(Service_Year)

ggplot(Rollout_Order, aes(Service_Year, percent_white)) +
  geom_text(aes(label = CDNum)) + # map the label inside aes() so it follows the plot data
  theme_classic() +
  labs(
    title = "Order of Service Received v. White Population of Districts",
    x = "Year Received",
    y = "Percent of Population that Identifies as White")

Map Visualization

Now we can look at the same racial data but on a map of the community districts.

We will read in the shapefiles available from NYC Planning.

cd_shapes <- read_sf("nycd_19c/nycd.shp", crs = 2263) %>% 
  janitor::clean_names()

## Warning: st_crs<- : replacing crs does not reproject data; use st_transform
## for that

cd_shapes

## Simple feature collection with 71 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 913175.1 ymin: 120121.9 xmax: 1067383 ymax: 272844.3
## epsg (SRID):    2263
## proj4string:    +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
## # A tibble: 71 x 4
##    boro_cd shape_leng shape_area                                   geometry
##      <int>      <dbl>      <dbl>            <MULTIPOLYGON [US_survey_foot]>
##  1     311     51550. 103177786. (((991748.4 161085, 991861 160304.4, 9919…
##  2     313     65822.  88195686. (((988770.8 156350.5, 988958.2 156202.4, …
##  3     312     52246.  99525500. (((992187.4 175455.5, 992239.1 175223.2, …
##  4     304     37008.  56663217. (((1012966 187886.9, 1012949 187765.4, 10…
##  5     206     35876.  42664311. (((1019708 246708.1, 1019689 246596.2, 10…
##  6     226     32820.  50566410. (((1020768 268271.1, 1020752 268224.4, 10…
##  7     317     43327.  93810207. (((1009903 176534.5, 1009643 176315.8, 10…
##  8     404     37018.  65739662. (((1026508 208553.9, 1026369 208464.6, 10…
##  9     316     32998.  51768907. (((1010864 186749.4, 1011019 186632, 1010…
## 10     412     65928. 267333268. (((1039267 182097.9, 1039271 182116.3, 10…
## # … with 61 more rows

To get the data above to match the shapefile, we need to use mutate again to get the districts into numerical code form. Ex: Bronx 02 -> 202

cd_race <- Districts_With_Service %>%  
  mutate(
    # refer to the column by name; inside mutate() there is no need for `$`
    boro = recode(
      borough, 
      "Manhattan" = 1, 
      "Bronx" = 2, 
      "Brooklyn" = 3, 
      "Queens" = 4, 
      "Staten Island" = 5
    ),
    boro_cd = as.integer(str_c(boro, cd_num)) # need integer to match cd_shapes format
  )

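Then we use left_join, keeping cd_shapes on the left so that every district polygon is retained. Districts with no match in cd_race get NA values.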

cd_race_shape <- left_join(cd_shapes, cd_race, by = "boro_cd")
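
Because a left join fills unmatched rows with NA without complaint, it can help to list the keys that fail to match before mapping. The sketch below uses made-up key tables, not the real cd_shapes and cd_race, and anti_join is just one way to do the check:

```r
library(dplyr)

# Made-up stand-ins: three shape keys, two data rows (one with no shape)
shapes <- tibble(boro_cd = c(311L, 304L, 226L))
race   <- tibble(boro_cd = c(311L, 999L), percent_white = c(70, 20))

# Shapes that will show NA after the left join
shapes_without_data <- anti_join(shapes, race, by = "boro_cd")

# Data rows that the left join silently drops
data_without_shape <- anti_join(race, shapes, by = "boro_cd")

shapes_without_data
data_without_shape
```

An empty data_without_shape would suggest that every district in the CSV found its polygon.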

And make maps!

Districts with Service

cd_race_shapes_service <- cd_race_shape %>% 
  select(-starts_with("shape"), percent_white, Service_Year,IncomeLevel)  

cd_race_shapes_service

## Simple feature collection with 71 features and 16 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 913175.1 ymin: 120121.9 xmax: 1067383 ymax: 272844.3
## epsg (SRID):    2263
## proj4string:    +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
## # A tibble: 71 x 17
##    boro_cd                  geometry community_distr… borough district_abbr
##      <int> <MULTIPOLYGON [US_survey> <chr>            <chr>   <chr>        
##  1     311 (((991748.4 161085, 9918… 11 Brooklyn      Brookl… BK11         
##  2     313 (((988770.8 156350.5, 98… 13 Brooklyn      Brookl… BK13         
##  3     312 (((992187.4 175455.5, 99… 12 Brooklyn      Brookl… BK12         
##  4     304 (((1012966 187886.9, 101… <NA>             <NA>    <NA>         
##  5     206 (((1019708 246708.1, 101… <NA>             <NA>    <NA>         
##  6     226 (((1020768 268271.1, 102… <NA>             <NA>    <NA>         
##  7     317 (((1009903 176534.5, 100… <NA>             <NA>    <NA>         
##  8     404 (((1026508 208553.9, 102… <NA>             <NA>    <NA>         
##  9     316 (((1010864 186749.4, 101… 16 Brooklyn      Brookl… BK16         
## 10     412 (((1039267 182097.9, 103… <NA>             <NA>    <NA>         
## # … with 61 more rows, and 12 more variables: percent_white <dbl>,
## #   percent_non_white <dbl>, median_income <dbl>,
## #   has_service_1_yes_2_no <dbl>, service_received <chr>,
## #   boro_abbrev <chr>, cd_num <chr>, CDNum <chr>, Service_status <chr>,
## #   Service_Year <chr>, IncomeLevel <fct>, boro <dbl>

# Map of Percent White in Serviced Districts 
cd_race_shapes_service_ordered <- cd_race_shapes_service %>% 
  mutate(percent_white = as.ordered(percent_white)) # make ordered factor for mapping

ggplot(cd_race_shapes_service_ordered) +
  aes(fill = percent_white) +
  geom_sf() +
  theme_void() +
  labs(
    title = "Community Districts with DSNY Compost Service",
    subtitle = "v. White Population",
    fill = "Percent of District that Identifies as White",
    caption = "Sources: 1) nyc.gov/organics
    2) NYC Planning Population FactFinder")
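
Converting percent_white to an ordered factor gives every distinct value its own legend entry. One alternative, offered only as a sketch and not as the original method, is to leave the column numeric and use a continuous fill scale. The three squares below are made-up geometry standing in for the district shapes, and unit_square is a hypothetical helper:

```r
library(sf)
library(ggplot2)

# Hypothetical helper: unit square with its lower-left corner at (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Made-up data: the NA mimics a district with no match after the join
toy_map <- st_sf(
  percent_white = c(12.5, NA, 71.3),
  geometry = st_sfc(unit_square(0), unit_square(1), unit_square(2))
)

toy_plot <- ggplot(toy_map) +
  geom_sf(aes(fill = percent_white)) +         # numeric column, continuous scale
  scale_fill_viridis_c(na.value = "grey90") +  # unmatched districts drawn in grey
  theme_void() +
  labs(fill = "Percent White")

toy_plot
```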

Districts Without Service (we’ll do a separate left_join with cd_shapes for this one)

cd_race_no_service <- Districts_Without_Service %>%  
  mutate(
    # refer to the column by name; inside mutate() there is no need for `$`
    boro = recode(
      borough, 
      "Manhattan" = 1, 
      "Bronx" = 2, 
      "Brooklyn" = 3, 
      "Queens" = 4, 
      "Staten Island" = 5
    ),
    boro_cd = as.integer(str_c(boro, cd_num)) # need integer to match cd_shapes format
  )

#Left Join Again
cd_race_shape_no_service <- left_join(cd_shapes, cd_race_no_service, by = "boro_cd")

cd_race_shapes_no_service <- cd_race_shape_no_service %>% 
  select(-starts_with("shape"), percent_white, Service_Year,IncomeLevel)  

cd_race_shapes_no_service

## Simple feature collection with 71 features and 16 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 913175.1 ymin: 120121.9 xmax: 1067383 ymax: 272844.3
## epsg (SRID):    2263
## proj4string:    +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
## # A tibble: 71 x 17
##    boro_cd                  geometry community_distr… borough district_abbr
##      <int> <MULTIPOLYGON [US_survey> <chr>            <chr>   <chr>        
##  1     311 (((991748.4 161085, 9918… <NA>             <NA>    <NA>         
##  2     313 (((988770.8 156350.5, 98… <NA>             <NA>    <NA>         
##  3     312 (((992187.4 175455.5, 99… <NA>             <NA>    <NA>         
##  4     304 (((1012966 187886.9, 101… 04 Brooklyn      Brookl… BK04         
##  5     206 (((1019708 246708.1, 101… <NA>             <NA>    <NA>         
##  6     226 (((1020768 268271.1, 102… <NA>             <NA>    <NA>         
##  7     317 (((1009903 176534.5, 100… 17 Brooklyn      Brookl… BK17         
##  8     404 (((1026508 208553.9, 102… 04 Queens        Queens  Q04          
##  9     316 (((1010864 186749.4, 101… <NA>             <NA>    <NA>         
## 10     412 (((1039267 182097.9, 103… 12 Queens        Queens  Q12          
## # … with 61 more rows, and 12 more variables: percent_white <dbl>,
## #   percent_non_white <dbl>, median_income <dbl>,
## #   has_service_1_yes_2_no <dbl>, service_received <chr>,
## #   boro_abbrev <chr>, cd_num <chr>, CDNum <chr>, Service_status <chr>,
## #   Service_Year <chr>, IncomeLevel <fct>, boro <dbl>

# Map of Percent White in Unserviced Districts 
cd_race_shapes_no_service_ordered <- cd_race_shapes_no_service %>% 
  mutate(percent_white = as.ordered(percent_white)) # make ordered factor for mapping

ggplot(cd_race_shapes_no_service_ordered) +
  aes(fill = percent_white) +
  geom_sf() +
  theme_void() +
  labs(
    title = "Community Districts without DSNY Compost Service",
    subtitle = "v. White Population",
    fill = "Percent of District that Identifies as White",
    caption = "Sources: 1) nyc.gov/organics
    2) NYC Planning Population FactFinder")


Table Count

Lastly, we can draw up quick tables to show exactly what we see happening on the maps translated into numbers.

# Categorize CDs by white population share vs Service Status
# (cut() intervals are right-closed, so exactly 50% falls in the first bin)
CDRacePopulationCategory <- mutate(
  CDRaceWithIncomeLevels,
  WhitePopulation=cut(
    percent_white,
    breaks=c( 0, 50, 100),
    labels=c("Less than 50%","Greater than 50%")
  )
)


# Serviced Districts
Served <- CDRacePopulationCategory %>% 
  filter(Service_status == "Has Service") %>% 
  group_by(WhitePopulation) %>% 
  tally()


# Non-Serviced Districts
Unserved <- CDRacePopulationCategory %>% 
  filter(Service_status == "No Service") %>% 
  group_by(WhitePopulation) %>% 
  tally()

Districts With Service

Served

## # A tibble: 2 x 2
##   WhitePopulation      n
##   <fct>            <int>
## 1 Less than 50%       16
## 2 Greater than 50%     7

Districts Without Service

Unserved

## # A tibble: 2 x 2
##   WhitePopulation      n
##   <fct>            <int>
## 1 Less than 50%       14
## 2 Greater than 50%     2
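
Raw counts are hard to compare because the two groups differ in size. A small sketch, using only the counts printed in the two tables above, converts them into the share of each group that has service. The exact test at the end is an optional extra, not part of the original analysis:

```r
# Counts copied from the Served and Unserved tables above
counts <- matrix(
  c(16, 7,    # Has Service: less than 50% white, greater than 50% white
    14, 2),   # No Service:  less than 50% white, greater than 50% white
  nrow = 2,
  dimnames = list(
    WhitePopulation = c("Less than 50%", "Greater than 50%"),
    Service = c("Has Service", "No Service")
  )
)

# Share of each group with service, as a percentage
share_served <- round(100 * counts[, "Has Service"] / rowSums(counts), 1)
share_served
# Less than 50%: 53.3, Greater than 50%: 77.8

# With counts this small, an exact test is one way to ask whether the gap
# could plausibly be chance; stats:: is spelled out because janitor masks it
test_result <- stats::fisher.test(counts)
test_result$p.value
```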

Conclusion

As we can see from the above graphs, tables, and maps, among districts that are more than 50% “white”, more received compost service than did not (7 versus 2, a difference of 5). Among districts that are at least 50% minority, roughly the same number were serviced as non-serviced (16 versus 14).

To answer the initial question, we can say that majority-“white” neighborhoods were more likely to have received service than others, but we cannot say why. Again, this could be a coincidence, as there are only a finite number of ways that the expansion of the program could have occurred. Regardless of intention, it is important for environmental justice, and for the reputation of the program and the agency, that statistics like these are taken into account when continuing this or any future program.