R Week 08 Assignment

R Spatial Lab Assignment # 2

Don’t use a single chunk for the entire assignment. Break it into multiple chunks.

task 1:

smpCode <- "hello, R markdown and RPubs!"

cat(smpCode)

## hello, R markdown and RPubs!

task 2:

Quarto markdown is different from R markdown in terms of chunk options. See chunk options at Quarto website.

print("This is the new code chunk options available in Quarto Markdown")

## [1] "This is the new code chunk options available in Quarto Markdown"

task 3: Load all necessary data

# Load data
ZipCodessf <- st_read("C:\\Users\\zim13\\Downloads\\Section_07\\Data\\R-Spatial_I_Lab\\ZIP_CODE_040114\\ZIP_CODE_040114.shp")

## Reading layer `ZIP_CODE_040114' from data source 
##   `C:\Users\zim13\Downloads\Section_07\Data\R-Spatial_I_Lab\ZIP_CODE_040114\ZIP_CODE_040114.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 263 features and 12 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 913129 ymin: 120020.9 xmax: 1067494 ymax: 272710.9
## Projected CRS: NAD83 / New York Long Island (ftUS)

FoodStoressf <- st_read('C:\\Users\\zim13\\Downloads\\Section_07\\Section_08\\R-Spatial_II_Lab\\R-Spatial_II_Lab\\nycFoodStore.shp')

## Reading layer `nycFoodStore' from data source 
##   `C:\Users\zim13\Downloads\Section_07\Section_08\R-Spatial_II_Lab\R-Spatial_II_Lab\nycFoodStore.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 11300 features and 16 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.2484 ymin: 40.50782 xmax: -73.67061 ymax: 40.91008
## Geodetic CRS:  WGS 84

CensusNYC <- st_read("C:\\Users\\zim13\\Downloads\\Section_07\\Section_08\\R-Spatial_II_Lab\\R-Spatial_II_Lab\\2010 Census Tracts\\geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp")

## Reading layer `geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a' from data source `C:\Users\zim13\Downloads\Section_07\Section_08\R-Spatial_II_Lab\R-Spatial_II_Lab\2010 Census Tracts\geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 2165 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS:  WGS84(DD)

HealthFacilitiesCsv <- read.csv("C:\\Users\\zim13\\Downloads\\Section_07\\Section_08\\R-Spatial_II_Lab\\R-Spatial_II_Lab\\NYS_Health_Facility.csv")
HealthFacilitiesCsv <- HealthFacilitiesCsv %>% filter(between(Facility.Longitude, -79.8, -71.8) & between(Facility.Latitude, 40.0, 45.5)) 
HealthFacilitiesSF <- st_as_sf(HealthFacilitiesCsv, coords = c("Facility.Longitude", "Facility.Latitude"), crs = 2263)

CovidData <- read.csv("C:\\Users\\zim13\\Downloads\\Section_07\\Section_08\\R-Spatial_II_Lab\\R-Spatial_II_Lab\\tests-by-zcta_2021_04_23.csv")
CovidClean <- CovidData %>% dplyr::select(MODIFIED_ZCTA, COVID_CASE_COUNT, COVID_CASE_RATE, PERCENT_POSITIVE, TOTAL_COVID_TESTS) %>% group_by(MODIFIED_ZCTA) %>% summarize(COVID_CASE_COUNT = sum(COVID_CASE_COUNT, na.rm = TRUE), COVID_CASE_RATE = mean(COVID_CASE_RATE, na.rm = TRUE), PERCENT_POSITIVE = mean(PERCENT_POSITIVE, na.rm = TRUE), TOTAL_COVID_TESTS = sum(TOTAL_COVID_TESTS, na.rm = TRUE))

ACS <- read.csv("C:\\Users\\zim13\\Downloads\\Section_07\\Section_08\\R-Spatial_II_Lab\\R-Spatial_II_Lab\\ACSDP5Y2018.DP05_data_with_overlays_2020-04-22T132935.csv")

task 4: Join the COVID-19 data to the NYC zip code area data (sf or sp polygons)

# Have both be characters so we can join them
CovidData$MODIFIED_ZCTA <- as.character(CovidData$MODIFIED_ZCTA)

# Left join by zipcode since its conssitent across both
ZipCodessf <- ZipCodessf %>%
  left_join(CovidData, by = c("ZIPCODE" = "MODIFIED_ZCTA"))
colnames(ZipCodessf)

##  [1] "ZIPCODE"           "BLDGZIP"           "PO_NAME"          
##  [4] "POPULATION"        "AREA"              "STATE"            
##  [7] "COUNTY"            "ST_FIPS"           "CTY_FIPS"         
## [10] "URL"               "SHAPE_AREA"        "SHAPE_LEN"        
## [13] "NEIGHBORHOOD_NAME" "BOROUGH_GROUP"     "label"            
## [16] "lat"               "lon"               "COVID_CASE_COUNT" 
## [19] "COVID_CASE_RATE"   "POP_DENOMINATOR"   "COVID_DEATH_COUNT"
## [22] "COVID_DEATH_RATE"  "PERCENT_POSITIVE"  "TOTAL_COVID_TESTS"
## [25] "geometry"

task 5: Aggregate the NYC food retails store data (points) to the zip code data, so that we know how many retail stores in each zip code area. Note that not all locations are for food retail. And we need to choose the specific types according to the data.

# Filter
FSfiltered <- FoodStoressf %>%
  filter(Oprtn_T == "Store")

#Double Check
colnames(FoodStoressf)

##  [1] "ï__Cnty"  "Lcns_Nm"  "Oprtn_T"  "Estbl_T"  "Entty_N"  "DBA_Nam" 
##  [7] "Strt_Nmb" "Stret_Nm" "Add_L_2"  "Add_L_3"  "City"     "State"   
## [13] "Zip_Cod"  "Sqr_Ftg"  "Locatin"  "Coords"   "geometry"

# Group zip codes
FSZip <- FoodStoressf %>% st_drop_geometry() %>% group_by(Zip_Cod) %>% summarize(store_count = n())

# Turn to character to join
FSZip <- FSZip %>% mutate(Zip_Cod = as.character(Zip_Cod))

CovidClean <- CovidClean %>% mutate(MODIFIED_ZCTA = as.character(MODIFIED_ZCTA))

ZipCovidsf <- ZipCodessf %>% left_join(CovidClean, by = c("ZIPCODE" = "MODIFIED_ZCTA"))

colnames(ZipCovidsf)

##  [1] "ZIPCODE"             "BLDGZIP"             "PO_NAME"            
##  [4] "POPULATION"          "AREA"                "STATE"              
##  [7] "COUNTY"              "ST_FIPS"             "CTY_FIPS"           
## [10] "URL"                 "SHAPE_AREA"          "SHAPE_LEN"          
## [13] "NEIGHBORHOOD_NAME"   "BOROUGH_GROUP"       "label"              
## [16] "lat"                 "lon"                 "COVID_CASE_COUNT.x" 
## [19] "COVID_CASE_RATE.x"   "POP_DENOMINATOR"     "COVID_DEATH_COUNT"  
## [22] "COVID_DEATH_RATE"    "PERCENT_POSITIVE.x"  "TOTAL_COVID_TESTS.x"
## [25] "COVID_CASE_COUNT.y"  "COVID_CASE_RATE.y"   "PERCENT_POSITIVE.y" 
## [28] "TOTAL_COVID_TESTS.y" "geometry"

mapview(ZipCovidsf, 
        zcol = "COVID_CASE_COUNT.x",        
        legend = TRUE,                   
        col.regions = colorRampPalette(c("lightblue", "yellow", "red"))(100))

## Warning: Found less unique colors (100) than unique zcol values (174)! 
## Interpolating color vector to match number of zcol values.

Task 6: Aggregate the NYC health facilities (points) to the zip code data. Similarly, choose appropriate subtypes such as nursing homes from the facilities.

# Filter nursing homes
NursingHomes <- HealthFacilitiesCsv %>%
  dplyr::filter(`Short.Description` == "NH") %>%
  dplyr::group_by(`Facility.Zip.Code`)

# Combine 
ZipCovidNHsf <- ZipCovidsf %>%
  left_join(NursingHomes, by = c("ZIPCODE" = "Facility.Zip.Code"))

## Warning in sf_column %in% names(g): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 6 of `x` matches multiple rows in `y`.
## ℹ Row 5 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

# Check, looks right
mapview(ZipCovidNHsf, zcol = "ZIPCODE", legend = TRUE)

Task 7: Join the Census ACS population, race, and age data to the NYC Planning Census Tract Data.

# Turn into spatial object
st_as_sf(CensusNYC)

## Simple feature collection with 2165 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS:  WGS84(DD)
## First 10 features:
##    boro_code boro_ct201     boro_name cdeligibil ct2010 ctlabel ntacode
## 1          5    5000900 Staten Island          E 000900       9    SI22
## 2          1    1009800     Manhattan          I 009800      98    MN19
## 3          1    1010000     Manhattan          I 010000     100    MN19
## 4          1    1010200     Manhattan          I 010200     102    MN17
## 5          1    1010400     Manhattan          I 010400     104    MN17
## 6          1    1011300     Manhattan          I 011300     113    MN17
## 7          1    1011402     Manhattan          I 011402  114.02    MN40
## 8          1    1013000     Manhattan          I 013000     130    MN40
## 9          1    1014000     Manhattan          I 014000     140    MN40
## 10         1    1014801     Manhattan          I 014801  148.01    MN40
##                                      ntaname puma shape_area shape_leng
## 1  West New Brighton-New Brighton-St. George 3903  2497009.7   7729.017
## 2                    Turtle Bay-East Midtown 3808  1906016.4   5534.200
## 3                    Turtle Bay-East Midtown 3808  1860938.4   5692.169
## 4                      Midtown-Midtown South 3807  1860992.7   5687.802
## 5                      Midtown-Midtown South 3807  1864600.4   5693.036
## 6                      Midtown-Midtown South 3807  1890907.3   5699.861
## 7              Upper East Side-Carnegie Hill 3805  1063547.4   4125.256
## 8              Upper East Side-Carnegie Hill 3805  1918144.5   5807.973
## 9              Upper East Side-Carnegie Hill 3805  1925984.2   5820.816
## 10             Upper East Side-Carnegie Hill 3805   559216.2   3135.951
##                          geometry
## 1  MULTIPOLYGON (((-74.07921 4...
## 2  MULTIPOLYGON (((-73.96433 4...
## 3  MULTIPOLYGON (((-73.96802 4...
## 4  MULTIPOLYGON (((-73.97124 4...
## 5  MULTIPOLYGON (((-73.97446 4...
## 6  MULTIPOLYGON (((-73.98412 4...
## 7  MULTIPOLYGON (((-73.96476 4...
## 8  MULTIPOLYGON (((-73.96148 4...
## 9  MULTIPOLYGON (((-73.95495 4...
## 10 MULTIPOLYGON (((-73.95398 4...

# Double check
str(CensusNYC)

## Classes 'sf' and 'data.frame':   2165 obs. of  12 variables:
##  $ boro_code : chr  "5" "1" "1" "1" ...
##  $ boro_ct201: chr  "5000900" "1009800" "1010000" "1010200" ...
##  $ boro_name : chr  "Staten Island" "Manhattan" "Manhattan" "Manhattan" ...
##  $ cdeligibil: chr  "E" "I" "I" "I" ...
##  $ ct2010    : chr  "000900" "009800" "010000" "010200" ...
##  $ ctlabel   : chr  "9" "98" "100" "102" ...
##  $ ntacode   : chr  "SI22" "MN19" "MN19" "MN17" ...
##  $ ntaname   : chr  "West New Brighton-New Brighton-St. George" "Turtle Bay-East Midtown" "Turtle Bay-East Midtown" "Midtown-Midtown South" ...
##  $ puma      : chr  "3903" "3808" "3808" "3807" ...
##  $ shape_area: num  2497010 1906016 1860938 1860993 1864600 ...
##  $ shape_leng: num  7729 5534 5692 5688 5693 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 2165; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:28, 1:2] -74.1 -74.1 -74.1 -74.1 -74.1 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:11] "boro_code" "boro_ct201" "boro_name" "cdeligibil" ...

# Replace boro names with FIP
CensusNYC %<>% dplyr::mutate(cntyFIPS = case_when(
  boro_name == 'Bronx' ~ '005', 
  boro_name == 'Brooklyn' ~ '047',
  boro_name == 'Manhattan' ~ '061',
  boro_name == 'Queens' ~ '081',
  boro_name == 'Staten Island' ~ '085'
  ),
  tractFIPS = paste(cntyFIPS, ct2010, sep='')
)

# Replace 2nd line
ACS <- readLines("C:\\Users\\zim13\\Downloads\\Section_07\\Section_08\\R-Spatial_II_Lab\\R-Spatial_II_Lab\\ACSDP5Y2018.DP05_data_with_overlays_2020-04-22T132935.csv") %>%
  magrittr::extract(-2) %>% 
  textConnection() %>%
  read.csv(header=TRUE, quote= "\"") %>%
  dplyr::select(GEO_ID, 
                totPop = DP05_0001E, elderlyPop = DP05_0024E, # >= 65
                malePop = DP05_0002E, femalePop = DP05_0003E,  
                whitePop = DP05_0037E, blackPop = DP05_0038E,
                asianPop = DP05_0067E, hispanicPop = DP05_0071E,
                adultPop = DP05_0021E, citizenAdult = DP05_0087E) %>%
  dplyr::mutate(censusCode = stringr::str_sub(GEO_ID, -9,-1));

# List first 10
ACS %>%
  magrittr::extract(1:10,)

##                  GEO_ID totPop elderlyPop malePop femalePop whitePop blackPop
## 1  1400000US36005000100   7080         51    6503       577     1773     4239
## 2  1400000US36005000200   4542        950    2264      2278     2165     1279
## 3  1400000US36005000400   5634        710    2807      2827     2623     1699
## 4  1400000US36005001600   5917        989    2365      3552     2406     2434
## 5  1400000US36005001900   2765         76    1363      1402      585     1041
## 6  1400000US36005002000   9409        977    4119      5290     3185     4487
## 7  1400000US36005002300   4600        648    2175      2425      479     2122
## 8  1400000US36005002400    172          0     121        51       69       89
## 9  1400000US36005002500   5887        548    2958      2929      903     1344
## 10 1400000US36005002701   2868        243    1259      1609      243      987
##    asianPop hispanicPop adultPop citizenAdult censusCode
## 1       130        2329     6909         6100  005000100
## 2       119        3367     3582         2952  005000200
## 3       226        3873     4507         4214  005000400
## 4        68        3603     4416         3851  005001600
## 5       130        1413     2008         1787  005001900
## 6        29        5905     6851         6170  005002000
## 7        27        2674     3498         3056  005002300
## 8        14           0      131           42  005002400
## 9        68        4562     4237         2722  005002500
## 10        0        1985     1848         1412  005002701

# Merge ACS data to census
POPdata <- merge(CensusNYC, ACS, by.x ='tractFIPS', by.y = 'censusCode')

# verify 
sum(POPdata$totPop)

## [1] 8443713

# Check gps notation
st_crs(POPdata)

## Coordinate Reference System:
##   User input: WGS84(DD) 
##   wkt:
## GEOGCRS["WGS84(DD)",
##     DATUM["WGS84",
##         ELLIPSOID["WGS84",6378137,298.257223563,
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]]]

# Match CRS
POPnyc <- sf::st_transform(POPdata, st_crs(ZipCovidNHsf))

# Check with elder population
POPdata %>% magrittr::extract('elderlyPop') %>% plot(breaks = 'jenks')

Task 8: Aggregate the ACS census data to zip code area data.

# Aggregate to zip code
ZipCovidNHPopNYCSF <- sf::st_join(ZipCovidNHsf, 
                              POPnyc %>% sf::st_centroid(), # this essentially converts census tracts to points
                              join = st_contains) %>% 
  group_by(ZIPCODE, PO_NAME, POPULATION, COUNTY, COVID_CASE_COUNT.y, TOTAL_COVID_TESTS.y) %>% # use names(zpNYC) and names(popNYC) to see what we have
  summarise(totPop = sum(totPop),
            malePctg = sum(malePop)/totPop*100, # note the totPop is the newly calculated
            asianPop = sum(asianPop),
            blackPop = sum(blackPop),
            hispanicPop = sum(hispanicPop),
            whitePop = sum(whitePop))

## Warning: st_centroid assumes attributes are constant over geometries

## `summarise()` has grouped output by 'ZIPCODE', 'PO_NAME', 'POPULATION',
## 'COUNTY', 'COVID_CASE_COUNT.y'. You can override using the `.groups` argument.

# Check
ZipCovidNHPopNYCSF %>% head()

## Simple feature collection with 6 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 971132.6 ymin: 188447.3 xmax: 998309.7 ymax: 230942.5
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 6 × 13
## # Groups:   ZIPCODE, PO_NAME, POPULATION, COUNTY, COVID_CASE_COUNT.y [6]
##   ZIPCODE PO_NAME      POPULATION COUNTY  COVID_CASE_COUNT.y TOTAL_COVID_TESTS.y
##   <chr>   <chr>             <dbl> <chr>                <int>               <int>
## 1 00083   Central Park         25 New Yo…                 NA                  NA
## 2 10001   New York          22413 New Yo…               1542               20158
## 3 10002   New York          81305 New Yo…               5902               48197
## 4 10003   New York          55878 New Yo…               2803               41076
## 5 10004   New York           2187 New Yo…                247                3599
## 6 10005   New York           8107 New Yo…                413                6102
## # ℹ 7 more variables: totPop <int>, malePctg <dbl>, asianPop <int>,
## #   blackPop <int>, hispanicPop <int>, whitePop <int>,
## #   geometry <GEOMETRY [US_survey_foot]>

# Check and verify the data again
sum(ZipCovidNHPopNYCSF$totPop, na.rm = T)

## [1] 14200777

# Find NA
ZipCovidNHPopNYCSF %>% dplyr::filter(is.na(totPop))

## Simple feature collection with 68 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 971132.6 ymin: 151085.5 xmax: 1067494 ymax: 263362
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 68 × 13
## # Groups:   ZIPCODE, PO_NAME, POPULATION, COUNTY, COVID_CASE_COUNT.y [68]
##    ZIPCODE PO_NAME  POPULATION COUNTY   COVID_CASE_COUNT.y TOTAL_COVID_TESTS.y
##  * <chr>   <chr>         <dbl> <chr>                 <int>               <int>
##  1 10004   New York       2187 New York                247                3599
##  2 10020   New York          0 New York                 NA                  NA
##  3 10041   New York          0 New York                 NA                  NA
##  4 10043   New York          0 New York                 NA                  NA
##  5 10045   New York          0 New York                 NA                  NA
##  6 10047   New York          0 New York                 NA                  NA
##  7 10048   New York          0 New York                 NA                  NA
##  8 10055   New York         12 New York                 NA                  NA
##  9 10075   New York      25203 New York               1453               18391
## 10 10080   New York          0 New York                 NA                  NA
## # ℹ 58 more rows
## # ℹ 7 more variables: totPop <int>, malePctg <dbl>, asianPop <int>,
## #   blackPop <int>, hispanicPop <int>, whitePop <int>,
## #   geometry <GEOMETRY [US_survey_foot]>

# Map
mapview(ZipCovidNHPopNYCSF["COVID_CASE_COUNT.y"], breaks='jenks')

# Plot
plot(ZipCovidNHPopNYCSF["asianPop"], breaks='jenks')

R Week 08 Assignment

Zim Zim

4/03/2025

Explanation of the template

R Spatial Lab Assignment # 2

task 1:

task 2:

task 3: Load all necessary data

task 4: Join the COVID-19 data to the NYC zip code area data (sf or sp polygons)

task 5: Aggregate the NYC food retails store data (points) to the zip code data, so that we know how many retail stores in each zip code area. Note that not all locations are for food retail. And we need to choose the specific types according to the data.

Task 6: Aggregate the NYC health facilities (points) to the zip code data. Similarly, choose appropriate subtypes such as nursing homes from the facilities.

Task 7: Join the Census ACS population, race, and age data to the NYC Planning Census Tract Data.

Task 8: Aggregate the ACS census data to zip code area data.