R Week 8 Assignment

R Spatial Lab Assignment # 2

Don’t use a single chunk for the entire assignment. Break it into multiple chunks.

task 1: Join the COVID-19 data to the NYC zip code area data (sf or sp polygons).

load("../Week7/Lab1_sf_files.RData")
ls()

## [1] "nyc_retail_sf"            "nys_health_facilities_sf"
## [3] "postal_areas"

colnames(postal_areas)

##  [1] "ZIPCODE"    "BLDGZIP"    "PO_NAME"    "POPULATION" "AREA"      
##  [6] "STATE"      "COUNTY"     "ST_FIPS"    "CTY_FIPS"   "URL"       
## [11] "SHAPE_AREA" "SHAPE_LEN"  "geometry"

covid <- readr::read_csv("R-Spatial_II_Lab/tests-by-zcta_2020_04_12.csv", lazy = FALSE)

## Rows: 178 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): MODZCTA, Positive, Total, zcta_cum.perc_pos
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(covid)

## spc_tbl_ [178 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ MODZCTA          : num [1:178] NA 10001 10002 10003 10004 ...
##  $ Positive         : num [1:178] 1934 211 539 279 23 ...
##  $ Total            : num [1:178] 2082 448 1024 662 59 ...
##  $ zcta_cum.perc_pos: num [1:178] 92.9 47.1 52.6 42.1 39 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   MODZCTA = col_double(),
##   ..   Positive = col_double(),
##   ..   Total = col_double(),
##   ..   zcta_cum.perc_pos = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

class(postal_areas$ZIPCODE)

## [1] "character"

class(covid$MODZCTA)

## [1] "numeric"

dplyr::left_join(postal_areas %>% 
                   dplyr::mutate(ZIPCODE=as.numeric(as.character(ZIPCODE))), 
                 covid, 
                 by = c('ZIPCODE' = 'MODZCTA')) -> zip_covid_merge

names(zip_covid_merge)

##  [1] "ZIPCODE"           "BLDGZIP"           "PO_NAME"          
##  [4] "POPULATION"        "AREA"              "STATE"            
##  [7] "COUNTY"            "ST_FIPS"           "CTY_FIPS"         
## [10] "URL"               "SHAPE_AREA"        "SHAPE_LEN"        
## [13] "Positive"          "Total"             "zcta_cum.perc_pos"
## [16] "geometry"

zip_covid_merge[1]

## Simple feature collection with 263 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 913129 ymin: 120020.9 xmax: 1067494 ymax: 272710.9
## Projected CRS: NAD83 / New York Long Island (ftUS)
## First 10 features:
##    ZIPCODE                       geometry
## 1    11436 POLYGON ((1038098 188138.4,...
## 2    11213 POLYGON ((1001614 186926.4,...
## 3    11212 POLYGON ((1011174 183696.3,...
## 4    11225 POLYGON ((995908.4 183617.6...
## 5    11218 POLYGON ((991997.1 176307.5...
## 6    11226 POLYGON ((994821.5 177865.7...
## 7    11219 POLYGON ((987286.4 173946.5...
## 8    11210 POLYGON ((995796 171110.1, ...
## 9    11230 POLYGON ((994099.3 171240.7...
## 10   11204 POLYGON ((989500.2 170730.2...

task 2: Aggregate the NYC food retails store data (points) to the zip code data, so that we know how many retail stores in each zip code area. Note that not all locations are for food retail. And we need to choose the specific types according to the data.

st_crs(zip_covid_merge)

## Coordinate Reference System:
##   User input: NAD83 / New York Long Island (ftUS) 
##   wkt:
## PROJCRS["NAD83 / New York Long Island (ftUS)",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["SPCS83 New York Long Island zone (US survey foot)",
##         METHOD["Lambert Conic Conformal (2SP)",
##             ID["EPSG",9802]],
##         PARAMETER["Latitude of false origin",40.1666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-74,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",41.0333333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",40.6666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",984250,
##             LENGTHUNIT["US survey foot",0.304800609601219],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["US survey foot",0.304800609601219],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["US survey foot",0.304800609601219]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["US survey foot",0.304800609601219]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk."],
##         BBOX[40.47,-74.26,41.3,-71.8]],
##     ID["EPSG",2263]]

st_crs(nyc_retail_sf)

## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)"],
##         MEMBER["World Geodetic System 1984 (G730)"],
##         MEMBER["World Geodetic System 1984 (G873)"],
##         MEMBER["World Geodetic System 1984 (G1150)"],
##         MEMBER["World Geodetic System 1984 (G1674)"],
##         MEMBER["World Geodetic System 1984 (G1762)"],
##         MEMBER["World Geodetic System 1984 (G2139)"],
##         MEMBER["World Geodetic System 1984 (G2296)"],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ENSEMBLEACCURACY[2.0]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

nyc_retail_sf <- st_transform(nyc_retail_sf, st_crs(zip_covid_merge))
zip_food_join <- sf::st_join(zip_covid_merge, nyc_retail_sf, join = sf::st_contains)
names(zip_food_join)

##  [1] "ZIPCODE"            "BLDGZIP"            "PO_NAME"           
##  [4] "POPULATION"         "AREA"               "STATE"             
##  [7] "COUNTY"             "ST_FIPS"            "CTY_FIPS"          
## [10] "URL"                "SHAPE_AREA"         "SHAPE_LEN"         
## [13] "Positive"           "Total"              "zcta_cum.perc_pos" 
## [16] "County"             "License.Number"     "Operation.Type"    
## [19] "Establishment.Type" "Entity.Name"        "DBA.Name"          
## [22] "Street.Number"      "Street.Name"        "Address.Line.2"    
## [25] "Address.Line.3"     "City"               "State"             
## [28] "Zip.Code"           "Square.Footage"     "Location"          
## [31] "Coords"             "geometry"

unique(nyc_retail_sf$Establishment.Type[!is.na(nyc_retail_sf$Establishment.Type)])

##  [1] "JAC"    "A"      "JACD"   "JACDK"  "JAD"    "JABCHK" "JACHK"  "JABC"  
##  [9] "JAZ"    "JABCK"  "JACK"   "JACDHK" "JABH"   "JACH"   "JACDE"  "JABCH" 
## [17] "JABCDH" "JABK"   "JABHK"  "JABCD"  "JACG"   "JACDH"  "JADHK"  "JKA"   
## [25] "JADK"   "JAB"    "JAHK"   "JABCDK" "JACZ"   "JAK"    "JADO"   "JDA"

unique(nyc_retail_sf$Operation.Type[!is.na(nyc_retail_sf$Operation.Type)])

## [1] "Store"

food_count <- zip_food_join %>% dplyr::group_by(ZIPCODE) %>% dplyr::summarise(num_food_store = sum(!is.na(Operation.Type)))

zip_food_count <- sf::st_join(zip_covid_merge, food_count, by = "ZIPCODE")

task 3: Aggregate the NYC health facilities (points) to the zip code data. Similarly, choose appropriate subtypes such as nursing homes from the facilities.

names(nys_health_facilities_sf)

##  [1] "Facility ID"                  "Facility Name"               
##  [3] "Short Description"            "Description"                 
##  [5] "Facility Open Date"           "Facility Address 1"          
##  [7] "Facility Address 2"           "Facility City"               
##  [9] "Facility State"               "Facility Zip Code"           
## [11] "Facility Phone Number"        "Facility Fax Number"         
## [13] "Facility Website"             "Facility County Code"        
## [15] "Facility County"              "Regional Office ID"          
## [17] "Regional Office"              "Main Site Name"              
## [19] "Main Site Facility ID"        "Operating Certificate Number"
## [21] "Operator Name"                "Operator Address 1"          
## [23] "Operator Address 2"           "Operator City"               
## [25] "Operator State"               "Operator Zip Code"           
## [27] "Cooperator Name"              "Cooperator Address"          
## [29] "Cooperator Address 2"         "Cooperator City"             
## [31] "Cooperator State"             "Cooperator Zip Code"         
## [33] "Ownership Type"               "Facility Location"           
## [35] "geometry"

unique(nys_health_facilities_sf$`Short Description`[!is.na(nys_health_facilities_sf$`Short Description`)])

##  [1] "HSPC"    "NH"      "DTC"     "CHHA"    "HOSP-EC" "DTC-EC"  "HOSP-SB"
##  [8] "HOSP"    "ADHCP"   "LTHHCP"

unique(nys_health_facilities_sf$`Facility County`[!is.na(nys_health_facilities_sf$`Facility County`)])

##  [1] "Broome"         "Oneida"         "Wyoming"        "Allegany"      
##  [5] "Erie"           "Nassau"         "New York"       "Madison"       
##  [9] "Kings"          "Suffolk"        "Orange"         "Franklin"      
## [13] "Cattaraugus"    "Monroe"         "Clinton"        "Westchester"   
## [17] "Albany"         "Bronx"          "Chemung"        "Otsego"        
## [21] "Dutchess"       "Niagara"        "Queens"         "Jefferson"     
## [25] "Rensselaer"     "Richmond"       "Ontario"        "Livingston"    
## [29] "Wayne"          "Columbia"       "Putnam"         "Oswego"        
## [33] "Essex"          "Ulster"         "Onondaga"       "Saint Lawrence"
## [37] "Rockland"       "Washington"     "Montgomery"     "Steuben"       
## [41] "Schuyler"       "Warren"         "Genesee"        "Tompkins"      
## [45] "Saratoga"       "Fulton"         "Chautauqua"     "Schenectady"   
## [49] "Orleans"        "Cortland"       "Herkimer"       "Schoharie"     
## [53] "Cayuga"         "Sullivan"       "Yates"          "Delaware"      
## [57] "Lewis"          "Chenango"       "Tioga"          "Greene"        
## [61] "Seneca"         "Hamilton"

nyc_health_facilities <- nys_health_facilities_sf %>% dplyr::filter(`Facility County` %in% c("New York", "Kings", "Bronx", "Queens", "Richmond"))

st_crs(zip_food_count)

## Coordinate Reference System:
##   User input: NAD83 / New York Long Island (ftUS) 
##   wkt:
## PROJCRS["NAD83 / New York Long Island (ftUS)",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["SPCS83 New York Long Island zone (US survey foot)",
##         METHOD["Lambert Conic Conformal (2SP)",
##             ID["EPSG",9802]],
##         PARAMETER["Latitude of false origin",40.1666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-74,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",41.0333333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",40.6666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",984250,
##             LENGTHUNIT["US survey foot",0.304800609601219],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["US survey foot",0.304800609601219],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["US survey foot",0.304800609601219]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["US survey foot",0.304800609601219]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk."],
##         BBOX[40.47,-74.26,41.3,-71.8]],
##     ID["EPSG",2263]]

st_crs(nyc_health_facilities)

## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)"],
##         MEMBER["World Geodetic System 1984 (G730)"],
##         MEMBER["World Geodetic System 1984 (G873)"],
##         MEMBER["World Geodetic System 1984 (G1150)"],
##         MEMBER["World Geodetic System 1984 (G1674)"],
##         MEMBER["World Geodetic System 1984 (G1762)"],
##         MEMBER["World Geodetic System 1984 (G2139)"],
##         MEMBER["World Geodetic System 1984 (G2296)"],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ENSEMBLEACCURACY[2.0]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

nyc_health_facilities <- st_transform(nyc_health_facilities, st_crs(zip_food_count))

nyc_hosp <- nyc_health_facilities %>% dplyr::filter(`Short Description` == "HOSP")
zip_food_health <- sf::st_join(zip_food_count, nyc_hosp, join = sf::st_contains)
names(zip_food_health)

##  [1] "ZIPCODE.x"                    "BLDGZIP"                     
##  [3] "PO_NAME"                      "POPULATION"                  
##  [5] "AREA"                         "STATE"                       
##  [7] "COUNTY"                       "ST_FIPS"                     
##  [9] "CTY_FIPS"                     "URL"                         
## [11] "SHAPE_AREA"                   "SHAPE_LEN"                   
## [13] "Positive"                     "Total"                       
## [15] "zcta_cum.perc_pos"            "ZIPCODE.y"                   
## [17] "num_food_store"               "Facility ID"                 
## [19] "Facility Name"                "Short Description"           
## [21] "Description"                  "Facility Open Date"          
## [23] "Facility Address 1"           "Facility Address 2"          
## [25] "Facility City"                "Facility State"              
## [27] "Facility Zip Code"            "Facility Phone Number"       
## [29] "Facility Fax Number"          "Facility Website"            
## [31] "Facility County Code"         "Facility County"             
## [33] "Regional Office ID"           "Regional Office"             
## [35] "Main Site Name"               "Main Site Facility ID"       
## [37] "Operating Certificate Number" "Operator Name"               
## [39] "Operator Address 1"           "Operator Address 2"          
## [41] "Operator City"                "Operator State"              
## [43] "Operator Zip Code"            "Cooperator Name"             
## [45] "Cooperator Address"           "Cooperator Address 2"        
## [47] "Cooperator City"              "Cooperator State"            
## [49] "Cooperator Zip Code"          "Ownership Type"              
## [51] "Facility Location"            "geometry"

health_count <- zip_food_health %>% dplyr::group_by(ZIPCODE.x) %>% dplyr::summarise(num_hosp = sum(!is.na(`Short Description`)))

zip_food_health_by_hospitals <- sf::st_join(zip_food_count, health_count, by= "ZIPCODE")
names(zip_food_health_by_hospitals)

##  [1] "ZIPCODE.x.x"       "BLDGZIP"           "PO_NAME"          
##  [4] "POPULATION"        "AREA"              "STATE"            
##  [7] "COUNTY"            "ST_FIPS"           "CTY_FIPS"         
## [10] "URL"               "SHAPE_AREA"        "SHAPE_LEN"        
## [13] "Positive"          "Total"             "zcta_cum.perc_pos"
## [16] "ZIPCODE.y"         "num_food_store"    "ZIPCODE.x.y"      
## [19] "num_hosp"          "geometry"

task 4: Join the Census ACS population, race, and age data to the NYC Planning Census Tract Data.

nycCensus <- sf::st_read(
  "./R-Spatial_II_Lab/2010 Census Tracts/geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp"
)

## Reading layer `geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a' from data source `/Users/areeba/Desktop/R/R-spatial/Week8/R-Spatial_II_Lab/2010 Census Tracts/geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 2165 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS:  WGS84(DD)

str(nycCensus)

## Classes 'sf' and 'data.frame':   2165 obs. of  12 variables:
##  $ boro_code : chr  "5" "1" "1" "1" ...
##  $ boro_ct201: chr  "5000900" "1009800" "1010000" "1010200" ...
##  $ boro_name : chr  "Staten Island" "Manhattan" "Manhattan" "Manhattan" ...
##  $ cdeligibil: chr  "E" "I" "I" "I" ...
##  $ ct2010    : chr  "000900" "009800" "010000" "010200" ...
##  $ ctlabel   : chr  "9" "98" "100" "102" ...
##  $ ntacode   : chr  "SI22" "MN19" "MN19" "MN17" ...
##  $ ntaname   : chr  "West New Brighton-New Brighton-St. George" "Turtle Bay-East Midtown" "Turtle Bay-East Midtown" "Midtown-Midtown South" ...
##  $ puma      : chr  "3903" "3808" "3808" "3807" ...
##  $ shape_area: num  2497010 1906016 1860938 1860993 1864600 ...
##  $ shape_leng: num  7729 5534 5692 5688 5693 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 2165; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:28, 1:2] -74.1 -74.1 -74.1 -74.1 -74.1 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:11] "boro_code" "boro_ct201" "boro_name" "cdeligibil" ...

nycCensus %<>% dplyr::mutate(cntyFIPS = case_when(
  boro_name == 'Bronx' ~ '005',
  boro_name == 'Brooklyn' ~ '047',
  boro_name == 'Manhattan' ~ '061',
  boro_name == 'Queens' ~ '081',
  boro_name == 'Staten Island' ~ '085'),
  tractFIPS = paste(cntyFIPS, ct2010, sep='')
)

acsData <- readLines("./R-Spatial_II_Lab/ACSDP5Y2018.DP05_data_with_overlays_2020-04-22T132935.csv") %>%
  magrittr::extract(-2) %>% 
  textConnection() %>%
  read.csv(header=TRUE, quote= "\"") %>%
  dplyr::select(GEO_ID, 
                totPop = DP05_0001E, elderlyPop = DP05_0024E, # >= 65
                malePop = DP05_0002E, femalePop = DP05_0003E,  
                whitePop = DP05_0037E, blackPop = DP05_0038E,
                asianPop = DP05_0067E, hispanicPop = DP05_0071E,
                adultPop = DP05_0021E, citizenAdult = DP05_0087E) %>%
  dplyr::mutate(censusCode = stringr::str_sub(GEO_ID, -9,-1));

acsData %>%
  magrittr::extract(1:10,)

##                  GEO_ID totPop elderlyPop malePop femalePop whitePop blackPop
## 1  1400000US36005000100   7080         51    6503       577     1773     4239
## 2  1400000US36005000200   4542        950    2264      2278     2165     1279
## 3  1400000US36005000400   5634        710    2807      2827     2623     1699
## 4  1400000US36005001600   5917        989    2365      3552     2406     2434
## 5  1400000US36005001900   2765         76    1363      1402      585     1041
## 6  1400000US36005002000   9409        977    4119      5290     3185     4487
## 7  1400000US36005002300   4600        648    2175      2425      479     2122
## 8  1400000US36005002400    172          0     121        51       69       89
## 9  1400000US36005002500   5887        548    2958      2929      903     1344
## 10 1400000US36005002701   2868        243    1259      1609      243      987
##    asianPop hispanicPop adultPop citizenAdult censusCode
## 1       130        2329     6909         6100  005000100
## 2       119        3367     3582         2952  005000200
## 3       226        3873     4507         4214  005000400
## 4        68        3603     4416         3851  005001600
## 5       130        1413     2008         1787  005001900
## 6        29        5905     6851         6170  005002000
## 7        27        2674     3498         3056  005002300
## 8        14           0      131           42  005002400
## 9        68        4562     4237         2722  005002500
## 10        0        1985     1848         1412  005002701

popData <- merge(nycCensus, acsData, by.x ='tractFIPS', by.y = 'censusCode')

sum(popData$totPop)

## [1] 8443713

names(popData)

##  [1] "tractFIPS"    "boro_code"    "boro_ct201"   "boro_name"    "cdeligibil"  
##  [6] "ct2010"       "ctlabel"      "ntacode"      "ntaname"      "puma"        
## [11] "shape_area"   "shape_leng"   "cntyFIPS"     "GEO_ID"       "totPop"      
## [16] "elderlyPop"   "malePop"      "femalePop"    "whitePop"     "blackPop"    
## [21] "asianPop"     "hispanicPop"  "adultPop"     "citizenAdult" "geometry"

task 5: Aggregate the ACS census data to zip code area data.

popNYC <- sf::st_transform(popData, st_crs(zip_covid_merge)) 

popData %>% magrittr::extract('elderlyPop') %>% plot(breaks = 'jenks')

str(zip_food_health_by_hospitals)

## Classes 'sf' and 'data.frame':   7744 obs. of  20 variables:
##  $ ZIPCODE.x.x      : num  11436 11436 11436 11436 11436 ...
##  $ BLDGZIP          : chr  "0" "0" "0" "0" ...
##  $ PO_NAME          : chr  "Jamaica" "Jamaica" "Jamaica" "Jamaica" ...
##  $ POPULATION       : num  18681 18681 18681 18681 18681 ...
##  $ AREA             : num  22699295 22699295 22699295 22699295 22699295 ...
##  $ STATE            : chr  "NY" "NY" "NY" "NY" ...
##  $ COUNTY           : chr  "Queens" "Queens" "Queens" "Queens" ...
##  $ ST_FIPS          : chr  "36" "36" "36" "36" ...
##  $ CTY_FIPS         : chr  "081" "081" "081" "081" ...
##  $ URL              : chr  "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" ...
##  $ SHAPE_AREA       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SHAPE_LEN        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Positive         : num  269 269 269 269 269 269 269 269 269 269 ...
##  $ Total            : num  412 412 412 412 412 412 412 412 412 412 ...
##  $ zcta_cum.perc_pos: num  65.3 65.3 65.3 65.3 65.3 ...
##  $ ZIPCODE.y        : num  11420 11420 11420 11420 11420 ...
##  $ num_food_store   : int  3 3 3 3 3 3 0 0 0 0 ...
##  $ ZIPCODE.x.y      : num  11420 11430 11433 11434 11435 ...
##  $ num_hosp         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ geometry         :sfc_POLYGON of length 7744; first list element: List of 1
##   ..$ : num [1:159, 1:2] 1038098 1038142 1038171 1038280 1038521 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:19] "ZIPCODE.x.x" "BLDGZIP" "PO_NAME" "POPULATION" ...

covidPopZipNYC <- sf::st_join(zip_covid_merge, 
                              popNYC %>% sf::st_centroid(), 
                              join = st_contains) %>% 
  group_by(ZIPCODE, PO_NAME, POPULATION, COUNTY, Positive, Total) %>% 
  summarise(totPop = sum(totPop),
            malePctg = sum(malePop)/totPop*100, 
            asianPop = sum(asianPop),
            blackPop = sum(blackPop),
            hispanicPop = sum(hispanicPop),
            whitePop = sum(whitePop))

## Warning: st_centroid assumes attributes are constant over geometries

## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by ZIPCODE, PO_NAME, POPULATION, COUNTY,
##   Positive, and Total.
## ℹ Output is grouped by ZIPCODE, PO_NAME, POPULATION, COUNTY, and Positive.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(ZIPCODE, PO_NAME, POPULATION, COUNTY, Positive,
##   Total))` for per-operation grouping (`?dplyr::dplyr_by`) instead.

covidPopZipNYC %>% head()

## Simple feature collection with 6 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 971132.6 ymin: 188447.3 xmax: 998309.7 ymax: 230942.5
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 6 × 13
## # Groups:   ZIPCODE, PO_NAME, POPULATION, COUNTY, Positive [6]
##   ZIPCODE PO_NAME      POPULATION COUNTY Positive Total totPop malePctg asianPop
##     <dbl> <chr>             <dbl> <chr>     <dbl> <dbl>  <int>    <dbl>    <int>
## 1      83 Central Park         25 New Y…       NA    NA      3      0          0
## 2   10001 New York          22413 New Y…      211   448  19146     51.2     4837
## 3   10002 New York          81305 New Y…      539  1024  74310     48.4    32149
## 4   10003 New York          55878 New Y…      279   662  53487     50.3     8027
## 5   10004 New York           2187 New Y…       23    59     NA     NA         NA
## 6   10005 New York           8107 New Y…       38   116   8809     42.8     1974
## # ℹ 4 more variables: blackPop <int>, hispanicPop <int>, whitePop <int>,
## #   geometry <GEOMETRY [US_survey_foot]>

sum(covidPopZipNYC$totPop, na.rm = T)

## [1] 8395306

covidPopZipNYC %>% dplyr::filter(is.na(totPop))

## Simple feature collection with 68 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 971132.6 ymin: 151085.5 xmax: 1067494 ymax: 263362
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 68 × 13
## # Groups:   ZIPCODE, PO_NAME, POPULATION, COUNTY, Positive [68]
##    ZIPCODE PO_NAME  POPULATION COUNTY   Positive Total totPop malePctg asianPop
##  *   <dbl> <chr>         <dbl> <chr>       <dbl> <dbl>  <int>    <dbl>    <int>
##  1   10004 New York       2187 New York       23    59     NA       NA       NA
##  2   10020 New York          0 New York       NA    NA     NA       NA       NA
##  3   10041 New York          0 New York       NA    NA     NA       NA       NA
##  4   10043 New York          0 New York       NA    NA     NA       NA       NA
##  5   10045 New York          0 New York       NA    NA     NA       NA       NA
##  6   10047 New York          0 New York       NA    NA     NA       NA       NA
##  7   10048 New York          0 New York       NA    NA     NA       NA       NA
##  8   10055 New York         12 New York       NA    NA     NA       NA       NA
##  9   10075 New York      25203 New York      262   564     NA       NA       NA
## 10   10080 New York          0 New York       NA    NA     NA       NA       NA
## # ℹ 58 more rows
## # ℹ 4 more variables: blackPop <int>, hispanicPop <int>, whitePop <int>,
## #   geometry <GEOMETRY [US_survey_foot]>

plot(covidPopZipNYC["Positive"], breaks='jenks')

plot(covidPopZipNYC["asianPop"], breaks='jenks')

R Week 8 Assignment

Areeba Ghumman

3/20/2026

Explanation of the template

R Spatial Lab Assignment # 2

task 1: Join the COVID-19 data to the NYC zip code area data (sf or sp polygons).

task 2: Aggregate the NYC food retails store data (points) to the zip code data, so that we know how many retail stores in each zip code area. Note that not all locations are for food retail. And we need to choose the specific types according to the data.

task 3: Aggregate the NYC health facilities (points) to the zip code data. Similarly, choose appropriate subtypes such as nursing homes from the facilities.

task 4: Join the Census ACS population, race, and age data to the NYC Planning Census Tract Data.

task 5: Aggregate the ACS census data to zip code area data.