R Week 8 Assignment

Author

Daniel Danovich

Published

March 27, 2026

R Spatial Lab Assignment #2

Task 1: Load the Week 7 spatial objects

I loaded the three sf objects saved in Week 7, including ZIP code polygons, health facilities, and retail food stores. These saved objects are required for the Week 8 assignment.

Output:

# load week 7 sf objects
load("data/R-Spatial_II_Lab/week7_spatial_objects.RData")

# inspect the loaded objects
str(nyc_zip_sf)
Classes 'sf' and 'data.frame':  263 obs. of  13 variables:
 $ ZIPCODE   : chr  "11436" "11213" "11212" "11225" ...
 $ BLDGZIP   : chr  "0" "0" "0" "0" ...
 $ PO_NAME   : chr  "Jamaica" "Brooklyn" "Brooklyn" "Brooklyn" ...
 $ POPULATION: num  18681 62426 83866 56527 72280 ...
 $ AREA      : num  22699295 29631004 41972104 23698630 36868799 ...
 $ STATE     : chr  "NY" "NY" "NY" "NY" ...
 $ COUNTY    : chr  "Queens" "Kings" "Kings" "Kings" ...
 $ ST_FIPS   : chr  "36" "36" "36" "36" ...
 $ CTY_FIPS  : chr  "081" "047" "047" "047" ...
 $ URL       : chr  "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" ...
 $ SHAPE_AREA: num  0 0 0 0 0 0 0 0 0 0 ...
 $ SHAPE_LEN : num  0 0 0 0 0 0 0 0 0 0 ...
 $ geometry  :sfc_POLYGON of length 263; first list element: List of 1
  ..$ : num [1:159, 1:2] 1038098 1038142 1038171 1038280 1038521 ...
  ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:12] "ZIPCODE" "BLDGZIP" "PO_NAME" "POPULATION" ...
str(health_fac_sf)
sf [3,843 × 35] (S3: sf/tbl_df/tbl/data.frame)
 $ Facility ID                 : num [1:3843] 204 620 1156 2589 3455 ...
 $ Facility Name               : chr [1:3843] "Hospice at Lourdes" "Charles T Sitrin Health Care Center Inc" "East Side Nursing Home" "Wellsville Manor Care Center" ...
 $ Short Description           : chr [1:3843] "HSPC" "NH" "NH" "NH" ...
 $ Description                 : chr [1:3843] "Hospice" "Residential Health Care Facility - SNF" "Residential Health Care Facility - SNF" "Residential Health Care Facility - SNF" ...
 $ Facility Open Date          : chr [1:3843] "06/01/1985" "02/01/1989" "08/01/1979" "02/01/1989" ...
 $ Facility Address 1          : chr [1:3843] "4102 Old Vestal Road" "2050 Tilden Avenue" "62 Prospect St" "4192A Bolivar Road" ...
 $ Facility Address 2          : chr [1:3843] NA NA NA NA ...
 $ Facility City               : chr [1:3843] "Vestal" "New Hartford" "Warsaw" "Wellsville" ...
 $ Facility State              : chr [1:3843] "New York" "New York" "New York" "New York" ...
 $ Facility Zip Code           : chr [1:3843] "13850" "13413" "14569" "14895" ...
 $ Facility Phone Number       : num [1:3843] 6.08e+09 3.16e+09 5.86e+09 5.86e+09 7.17e+09 ...
 $ Facility Fax Number         : num [1:3843] NA NA NA NA NA ...
 $ Facility Website            : chr [1:3843] NA NA NA NA ...
 $ Facility County Code        : num [1:3843] 3 32 60 2 14 ...
 $ Facility County             : chr [1:3843] "Broome" "Oneida" "Wyoming" "Allegany" ...
 $ Regional Office ID          : num [1:3843] 3 3 1 1 1 7 1 7 5 7 ...
 $ Regional Office             : chr [1:3843] "Central New York Regional Office" "Central New York Regional Office" "Western Regional Office - Buffalo" "Western Regional Office - Buffalo" ...
 $ Main Site Name              : chr [1:3843] NA NA NA NA ...
 $ Main Site Facility ID       : num [1:3843] NA NA NA NA NA ...
 $ Operating Certificate Number: chr [1:3843] "0301501F" "3227304N" "6027303N" "0228305N" ...
 $ Operator Name               : chr [1:3843] "Our Lady of Lourdes Memorial Hospital Inc" "Charles T Sitrin Health Care Center, Inc" "East Side Nursing Home Inc" "Wellsville Manor LLC" ...
 $ Operator Address 1          : chr [1:3843] "169 Riverside Drive" "Box 1000 Tilden Avenue" "62 Prospect Street" "4192a Bolivar Road" ...
 $ Operator Address 2          : chr [1:3843] NA NA NA NA ...
 $ Operator City               : chr [1:3843] "Binghamton" "New Hartford" "Warsaw" "Wellsville" ...
 $ Operator State              : chr [1:3843] "New York" "New York" "New York" "New York" ...
 $ Operator Zip Code           : chr [1:3843] "13905" "13413" "14569" "14897" ...
 $ Cooperator Name             : chr [1:3843] NA NA NA NA ...
 $ Cooperator Address          : chr [1:3843] NA NA NA NA ...
 $ Cooperator Address 2        : chr [1:3843] NA NA NA NA ...
 $ Cooperator City             : chr [1:3843] NA NA NA NA ...
 $ Cooperator State            : chr [1:3843] "New York" "New York" "New York" "New York" ...
 $ Cooperator Zip Code         : chr [1:3843] NA NA NA NA ...
 $ Ownership Type              : chr [1:3843] "Not for Profit Corporation" "Not for Profit Corporation" "Business Corporation" "LLC" ...
 $ Facility Location           : chr [1:3843] "(42.097095, -75.975243)" "(43.05497, -75.228828)" "(42.738979, -78.12867)" "(42.126461, -77.967834)" ...
 $ geometry                    :sfc_POINT of length 3843; first list element:  'XY' num [1:2] -76 42.1
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:34] "Facility ID" "Facility Name" "Short Description" "Description" ...
str(retail_food_sf)
sf [11,301 × 17] (S3: sf/tbl_df/tbl/data.frame)
 $ County            : chr [1:11301] "Bronx" "Bronx" "Bronx" "Bronx" ...
 $ License Number    : chr [1:11301] "734149" "606221" "606228" "723375" ...
 $ Operation Type    : chr [1:11301] "Store" "Store" "Store" "Store" ...
 $ Establishment Type: chr [1:11301] "JAC" "JAC" "JAC" "JAC" ...
 $ Entity Name       : chr [1:11301] "7 ELEVEN FOOD STORE #37933H" "1001 SAN MIGUEL FOOD CENTER INC" "1029 FOOD PLAZA INC" "1078 DELI GROCERY CORP" ...
 $ DBA Name          : chr [1:11301] NA "1001 SAN MIGUEL FD CNTR" "1029 FOOD PLAZA" "1078 DELI GROCERY" ...
 $ Street Number     : chr [1:11301] "500" "1001" "122" "1078" ...
 $ Street Name       : chr [1:11301] "BAYCHESTER AVE" "SHERIDAN AVE" "E 181ST ST" "EAST 165TH STREET" ...
 $ Address Line 2    : logi [1:11301] NA NA NA NA NA NA ...
 $ Address Line 3    : logi [1:11301] NA NA NA NA NA NA ...
 $ City              : chr [1:11301] "BRONX" "BRONX" "BRONX" "BRONX" ...
 $ State             : chr [1:11301] "NY" "NY" "NY" "NY" ...
 $ Zip Code          : num [1:11301] 10475 10456 10453 10459 10456 ...
 $ Square Footage    : num [1:11301] 0 1100 2000 1200 1500 2400 1000 1200 3400 500 ...
 $ Location          : chr [1:11301] "500 BAYCHESTER AVE\nBRONX, NY 10475\n(40.869156, -73.831875)" "1001 SHERIDAN AVE\nBRONX, NY 10456\n(40.829061, -73.919613)" "122 E 181ST ST\nBRONX, NY 10453\n(40.854755, -73.902853)" "1078 EAST 165TH STREET\nBRONX, NY 10459\n(40.825105, -73.890589)" ...
 $ coords_text       : chr [1:11301] "40.869156, -73.831875" "40.829061, -73.919613" "40.854755, -73.902853" "40.825105, -73.890589" ...
 $ geometry          :sfc_POINT of length 11301; first list element:  'XY' num [1:2] -73.8 40.9
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:16] "County" "License Number" "Operation Type" "Establishment Type" ...

Task 2: Read and join one COVID by ZCTA dataset

I read one COVID testing dataset by modified ZIP code tabulation area and joined it to the ZIP polygon layer using the ZIP code field.

Output:

# read the chosen COVID dataset
covid_raw <- read_csv(
  "data/R-Spatial_II_Lab/tests-by-zcta_2020_04_19.csv",
  show_col_types = FALSE
)

# make join fields the same type
nyc_zip_sf <- nyc_zip_sf %>%
  mutate(ZIPCODE = as.character(ZIPCODE))

covid_raw <- covid_raw %>%
  mutate(MODZCTA = as.character(MODZCTA))

# join COVID data to ZIP polygons
zip_covid_sf <- nyc_zip_sf %>%
  left_join(covid_raw, by = c("ZIPCODE" = "MODZCTA"))

# inspect the joined result
names(zip_covid_sf)
 [1] "ZIPCODE"           "BLDGZIP"           "PO_NAME"          
 [4] "POPULATION"        "AREA"              "STATE"            
 [7] "COUNTY"            "ST_FIPS"           "CTY_FIPS"         
[10] "URL"               "SHAPE_AREA"        "SHAPE_LEN"        
[13] "Positive"          "Total"             "zcta_cum.perc_pos"
[16] "geometry"         

Task 3: Count retail food stores by ZIP code

I used a spatial join to assign retail food store points to ZIP polygons and then counted the number of stores in each ZIP code.

Output:

# make sure CRS matches before the spatial join
retail_food_sf <- st_transform(retail_food_sf, st_crs(nyc_zip_sf))

# spatially join retail food points to ZIP polygons
zip_food_count <- st_join(
  nyc_zip_sf,
  retail_food_sf,
  join = st_contains,
  left = TRUE
) %>%
  st_drop_geometry() %>%
  group_by(ZIPCODE) %>%
  summarise(food_store_count = sum(!is.na(`License Number`)))

zip_food_count
# A tibble: 248 × 2
   ZIPCODE food_store_count
   <chr>              <int>
 1 00083                  0
 2 10001                 54
 3 10002                199
 4 10003                 94
 5 10004                 10
 6 10005                  9
 7 10006                  5
 8 10007                 20
 9 10009                 84
10 10010                 48
# ℹ 238 more rows

Task 4: Count nursing homes by ZIP code

I filtered the health facilities layer to nursing homes and used a spatial join to count nursing homes in each ZIP code.

Output:

# make sure CRS matches before the spatial join
health_fac_sf <- st_transform(health_fac_sf, st_crs(nyc_zip_sf))

# keep only nursing homes
nursing_home_sf <- health_fac_sf %>%
  filter(`Short Description` == "NH")

# spatially join nursing homes to ZIP polygons
zip_health_count <- st_join(
  nyc_zip_sf,
  nursing_home_sf,
  join = st_contains,
  left = TRUE
) %>%
  st_drop_geometry() %>%
  group_by(ZIPCODE) %>%
  summarise(nursing_home_count = sum(!is.na(`Facility ID`)))

zip_health_count
# A tibble: 248 × 2
   ZIPCODE nursing_home_count
   <chr>                <int>
 1 00083                    0
 2 10001                    0
 3 10002                    2
 4 10003                    0
 5 10004                    0
 6 10005                    0
 7 10006                    0
 8 10007                    0
 9 10009                    0
10 10010                    0
# ℹ 238 more rows

Task 5: Read ACS demographic data and census tract polygons

I read the ACS demographic file and the census tract shapefile, then created a tract ID field so the ACS table could be joined to the tract polygons.

Output:

# read ACS demographic data
acs_clean <- readLines("data/R-Spatial_II_Lab/ACSDP5Y2018.DP05_data_with_overlays_2020-04-22T132935.csv") %>%
  magrittr::extract(-2) %>%
  textConnection() %>%
  read.csv(header = TRUE, quote = "\"") %>%
  dplyr::select(
    GEO_ID,
    total_pop = DP05_0001E,
    elderly_pop = DP05_0024E,
    white_pop = DP05_0037E,
    black_pop = DP05_0038E,
    asian_pop = DP05_0067E,
    hispanic_pop = DP05_0071E
  ) %>%
  mutate(censusCode = stringr::str_sub(GEO_ID, -9, -1))

# read census tract shapefile
tract_sf <- st_read(
  "data/R-Spatial_II_Lab/census_tracts/geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp",
  quiet = TRUE
)

# create tract join key
tract_sf <- tract_sf %>%
  mutate(
    cntyFIPS = case_when(
      boro_name == "Bronx" ~ "005",
      boro_name == "Brooklyn" ~ "047",
      boro_name == "Manhattan" ~ "061",
      boro_name == "Queens" ~ "081",
      boro_name == "Staten Island" ~ "085"
    ),
    tractFIPS = paste0(cntyFIPS, ct2010)
  )

# join ACS table to tract polygons
tract_demo_sf <- merge(
  tract_sf,
  acs_clean,
  by.x = "tractFIPS",
  by.y = "censusCode"
)

names(tract_demo_sf)
 [1] "tractFIPS"    "boro_code"    "boro_ct201"   "boro_name"    "cdeligibil"  
 [6] "ct2010"       "ctlabel"      "ntacode"      "ntaname"      "puma"        
[11] "shape_area"   "shape_leng"   "cntyFIPS"     "GEO_ID"       "total_pop"   
[16] "elderly_pop"  "white_pop"    "black_pop"    "asian_pop"    "hispanic_pop"
[21] "geometry"    

Task 6: Aggregate tract demographics to ZIP codes

I converted the tract polygons to centroids and then aggregated demographic variables to ZIP code areas using a spatial join.

Output:

# transform tract layer to match ZIP CRS
tract_demo_sf <- st_transform(tract_demo_sf, st_crs(nyc_zip_sf))

# spatially join tract centroids to ZIP polygons and summarize
zip_demo <- st_join(
  nyc_zip_sf,
  st_centroid(tract_demo_sf),
  join = st_contains,
  left = TRUE
) %>%
  st_drop_geometry() %>%
  group_by(ZIPCODE) %>%
  summarise(
    total_population = sum(total_pop, na.rm = TRUE),
    elderly_population = sum(elderly_pop, na.rm = TRUE),
    white_population = sum(white_pop, na.rm = TRUE),
    black_population = sum(black_pop, na.rm = TRUE),
    asian_population = sum(asian_pop, na.rm = TRUE),
    hispanic_population = sum(hispanic_pop, na.rm = TRUE)
  )

zip_demo
# A tibble: 248 × 7
   ZIPCODE total_population elderly_population white_population black_population
   <chr>              <int>              <int>            <int>            <int>
 1 00083                  3                  0                1                2
 2 10001              19146               2500            12485             1092
 3 10002              74310              15815            24033             5969
 4 10003              53487               6296            40503             3130
 5 10004               1731                 52             1207               71
 6 10005               8809                209             6419              185
 7 10006               4639                 66             3130               52
 8 10007              15510               2134             7912             1008
 9 10009              58683               8907            35952             5745
10 10010              28129               3469            21438             1159
# ℹ 238 more rows
# ℹ 2 more variables: asian_population <int>, hispanic_population <int>

Task 7: Create one final ZIP-level sf object

I joined the COVID data, retail food store counts, nursing home counts, and demographic summaries into one final ZIP-level sf object.

Output:

# combine all ZIP-level information
final_zip_sf <- zip_covid_sf %>%
  left_join(zip_food_count, by = "ZIPCODE") %>%
  left_join(zip_health_count, by = "ZIPCODE") %>%
  left_join(zip_demo, by = "ZIPCODE")

# inspect final object
names(final_zip_sf)
 [1] "ZIPCODE"             "BLDGZIP"             "PO_NAME"            
 [4] "POPULATION"          "AREA"                "STATE"              
 [7] "COUNTY"              "ST_FIPS"             "CTY_FIPS"           
[10] "URL"                 "SHAPE_AREA"          "SHAPE_LEN"          
[13] "Positive"            "Total"               "zcta_cum.perc_pos"  
[16] "food_store_count"    "nursing_home_count"  "total_population"   
[19] "elderly_population"  "white_population"    "black_population"   
[22] "asian_population"    "hispanic_population" "geometry"           
str(final_zip_sf)
Classes 'sf' and 'data.frame':  263 obs. of  24 variables:
 $ ZIPCODE            : chr  "11436" "11213" "11212" "11225" ...
 $ BLDGZIP            : chr  "0" "0" "0" "0" ...
 $ PO_NAME            : chr  "Jamaica" "Brooklyn" "Brooklyn" "Brooklyn" ...
 $ POPULATION         : num  18681 62426 83866 56527 72280 ...
 $ AREA               : num  22699295 29631004 41972104 23698630 36868799 ...
 $ STATE              : chr  "NY" "NY" "NY" "NY" ...
 $ COUNTY             : chr  "Queens" "Kings" "Kings" "Kings" ...
 $ ST_FIPS            : chr  "36" "36" "36" "36" ...
 $ CTY_FIPS           : chr  "081" "047" "047" "047" ...
 $ URL                : chr  "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" ...
 $ SHAPE_AREA         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ SHAPE_LEN          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Positive           : num  342 972 1086 814 1163 ...
 $ Total              : num  567 1653 1793 1359 1967 ...
 $ zcta_cum.perc_pos  : num  60.3 58.8 60.6 59.9 59.1 ...
 $ food_store_count   : int  3 118 187 103 135 236 176 73 128 144 ...
 $ nursing_home_count : int  0 1 1 0 1 3 3 0 1 0 ...
 $ total_population   : int  22377 66602 73069 60958 67426 103729 85562 74162 77494 83040 ...
 $ elderly_population : int  2456 7662 9518 7592 8144 12278 10192 10368 11808 11083 ...
 $ white_population   : int  1192 16483 5278 15300 40315 16173 59375 27580 55294 47554 ...
 $ black_population   : int  13972 43625 59541 40048 5045 69058 890 37835 4325 845 ...
 $ asian_population   : int  2626 1836 1330 2272 14695 5207 19828 4451 12744 26910 ...
 $ hispanic_population: int  3226 6738 13900 5293 11169 18244 9982 5860 7536 10986 ...
 $ geometry           :sfc_POLYGON of length 263; first list element: List of 1
  ..$ : num [1:159, 1:2] 1038098 1038142 1038171 1038280 1038521 ...
  ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:23] "ZIPCODE" "BLDGZIP" "PO_NAME" "POPULATION" ...

Task 8: Verify and save the final result

I mapped one variable from the final object to verify the result and then saved the final ZIP-level sf object into an .RData file.

Map: Food store count by ZIP

# make a static map of food store counts by ZIP
plot(final_zip_sf["food_store_count"])

Output:

# save the final ZIP-level sf object
save(
  final_zip_sf,
  file = "output/week8_final_zip_sf.RData"
)