R Week 08 Assignment

Author

Victoria Campbell

Published

February 22, 2025

Explanation of the Site

This Rpub includes the code used for week 08’ lab.

In the following R code chunk, load_packages is the code chunk name. include=FALSE suggests that the code chunk will run, but the code itself and its outputs will not be included in the rendered HTML. echo=TRUE in the following code chunk suggests that the code and results from running the code will be included in the rendered HTML.

R Spatial Lab Assignment # 8

Don’t use a single chunk for the entire assignment. Break it into multiple chunks.

task 1:

Join the COVID-19 data to the NYC zip code area data (sf or sp polygons).

Show the code
##04, 12, 2020 data
nyc412<- readr::read_csv("C:/Users/campb/Downloads/R-Spatial_II_Lab/tests-by-zcta_2020_04_12.csv", lazy = FALSE)
Rows: 178 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): MODZCTA, Positive, Total, zcta_cum.perc_pos

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Show the code
str(nyc412)
spc_tbl_ [178 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ MODZCTA          : num [1:178] NA 10001 10002 10003 10004 ...
 $ Positive         : num [1:178] 1934 211 539 279 23 ...
 $ Total            : num [1:178] 2082 448 1024 662 59 ...
 $ zcta_cum.perc_pos: num [1:178] 92.9 47.1 52.6 42.1 39 ...
 - attr(*, "spec")=
  .. cols(
  ..   MODZCTA = col_double(),
  ..   Positive = col_double(),
  ..   Total = col_double(),
  ..   zcta_cum.perc_pos = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
Show the code
#Step 2, Reading in the shapefile to a sf. 
nyctracts2010 <- st_read("D:/R-Spatial/Data/R-Spatial_II_Lab/geo_export_213dae92-dbda-4a0a-bf58-bfee89c37e04.shp")
Reading layer `geo_export_213dae92-dbda-4a0a-bf58-bfee89c37e04' from data source `D:\R-Spatial\Data\R-Spatial_II_Lab\geo_export_213dae92-dbda-4a0a-bf58-bfee89c37e04.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 178 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
Show the code
#Merging the census data into the census tract.
nyc412 <- nyc412 %>%
  rename(zcta = MODZCTA)

nyc_covid <- base::merge(nyctracts2010, nyc412, by.x = "zcta", by.y = "zcta")
names(nyc_covid) 
[1] "zcta"              "modzcta"           "label"            
[4] "pop_est"           "Positive"          "Total"            
[7] "zcta_cum.perc_pos" "geometry"         

task 2:

Aggregate the NYC food retails store data (points) to the zip code data, so that we know how many retail stores in each zip code area. Note that not all locations are for food retail. And we need to choose the specific types according to the data.

Show the code
#Aggregating the Data
NYS_Food_Retail_Stores <- st_read("D:/R-Spatial/Data/R-Spatial_II_Lab/nycFoodStore.shp")
Reading layer `nycFoodStore' from data source 
  `D:\R-Spatial\Data\R-Spatial_II_Lab\nycFoodStore.shp' using driver `ESRI Shapefile'
Simple feature collection with 11300 features and 16 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -74.2484 ymin: 40.50782 xmax: -73.67061 ymax: 40.91008
Geodetic CRS:  WGS 84
Show the code
nyc_covid %>% 
  mutate(tract_area = st_area(geometry)) %>% 
  head()
Simple feature collection with 6 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.01799 ymin: 40.69978 xmax: -73.9714 ymax: 40.74542
Geodetic CRS:  WGS 84
   zcta modzcta label pop_est Positive Total zcta_cum.perc_pos
1 10002   10002 10002   74993      539  1024             52.64
2 10003   10003 10003   54682      279   662             42.15
3 10004   10004 10004    3028       23    59             38.98
4 10006   10006 10006    3454       10    43             23.26
5 10009   10009 10009   57925      381   851             44.77
6 10010   10010 10010   33730      166   468             35.47
                        geometry      tract_area
1 MULTIPOLYGON (((-73.9975 40... 2278152.0 [m^2]
2 MULTIPOLYGON (((-73.98864 4... 1496239.8 [m^2]
3 MULTIPOLYGON (((-74.00827 4...  381499.7 [m^2]
4 MULTIPOLYGON (((-74.01251 4...  251562.0 [m^2]
5 MULTIPOLYGON (((-73.98864 4... 1569385.4 [m^2]
6 MULTIPOLYGON (((-73.97979 4...  976211.8 [m^2]
Show the code
nyc_covid %>% 
  mutate(tract_area = st_area(geometry)) %>% 
  st_transform(4326) %>%
  st_join(NYS_Food_Retail_Stores) %>%
  head()
Simple feature collection with 6 features and 24 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -73.9975 ymin: 40.70912 xmax: -73.97351 ymax: 40.72414
Geodetic CRS:  WGS 84
     zcta modzcta label pop_est Positive Total zcta_cum.perc_pos    tract_area
1   10002   10002 10002   74993      539  1024             52.64 2278152 [m^2]
1.1 10002   10002 10002   74993      539  1024             52.64 2278152 [m^2]
1.2 10002   10002 10002   74993      539  1024             52.64 2278152 [m^2]
1.3 10002   10002 10002   74993      539  1024             52.64 2278152 [m^2]
1.4 10002   10002 10002   74993      539  1024             52.64 2278152 [m^2]
1.5 10002   10002 10002   74993      539  1024             52.64 2278152 [m^2]
     ï__Cnty Lcns_Nm Oprtn_T Estbl_T                        Entty_N
1   New York  628095   Store     JAC                FERNANDEZ JORGE
1.1 New York  629082   Store     JAC           124 DELI GROCERY INC
1.2 New York  628413   Store     JAC              208 RIVINGTON INC
1.3 New York  620979   Store    JACK 47 DIVISION STREET TRADING INC
1.4 New York  627543   Store     JAC            47 PITT GROCERY INC
1.5 New York  701940   Store     JAC    51 EAST BROADWAY MARKET INC
                    DBA_Nam Strt_Nmb     Stret_Nm Add_L_2 Add_L_3     City
1   122 SUFFOLK STOP 1 DELI      122   SUFFOLK ST      NA      NA NEW YORK
1.1        124 DELI GROCERY      124   E BROADWAY      NA      NA NEW YORK
1.2           208 RIVINGTON      208 RIVINGTON ST      NA      NA NEW YORK
1.3  47 DIVISION ST TRADING       47  DIVISION ST      NA      NA NEW YORK
1.4         47 PITT GROCERY       47      PITT ST      NA      NA NEW YORK
1.5 51 EAST BROADWAY MARKET       51   E BROADWAY      NA      NA NEW YORK
    State Zip_Cod Sqr_Ftg
1      NY   10002   1,000
1.1    NY   10002   1,500
1.2    NY   10002       0
1.3    NY   10002   2,000
1.4    NY   10002   1,500
1.5    NY   10002   2,500
                                                         Locatin
1    122 SUFFOLK ST\nNEW YORK, NY 10002\n(40.719204, -73.985959)
1.1  124 E BROADWAY\nNEW YORK, NY 10002\n(40.713896, -73.992096)
1.2 208 RIVINGTON ST\nNEW YORK, NY 10002\n(40.718419, -73.98291)
1.3  47 DIVISION ST\nNEW YORK, NY 10002\n(40.714205, -73.995252)
1.4      47 PITT ST\nNEW YORK, NY 10002\n(40.717311, -73.982991)
1.5     51 E BROADWAY\nNEW YORK, NY 10002\n(40.7136, -73.995653)
                   Coords                       geometry
1   40.719204, -73.985959 MULTIPOLYGON (((-73.9975 40...
1.1 40.713896, -73.992096 MULTIPOLYGON (((-73.9975 40...
1.2  40.718419, -73.98291 MULTIPOLYGON (((-73.9975 40...
1.3 40.714205, -73.995252 MULTIPOLYGON (((-73.9975 40...
1.4 40.717311, -73.982991 MULTIPOLYGON (((-73.9975 40...
1.5   40.7136, -73.995653 MULTIPOLYGON (((-73.9975 40...
Show the code
nyc_food_sf <- nyctracts2010 %>% 
  mutate(tract_area = st_area(geometry)) %>%
  st_transform(4326) %>%
  st_join(NYS_Food_Retail_Stores) %>%
  group_by(zcta)

task 3:

Aggregate the NYC health facilities (points) to the zip code data. Similarly, choose appropriate subtypes such as nursing homes from the facilities.

Show the code
#Aggregating the health data
NYS_health<- read_csv("D:/R-Spatial/Data/R-Spatial_II_Lab/NYS_Health_Facility.csv")
Rows: 3990 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (28): Facility Name, Short Description, Description, Facility Open Date,...
dbl  (8): Facility ID, Facility Phone Number, Facility Fax Number, Facility ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Show the code
NYS_HealthAg <- NYS_health %>%
  dplyr::filter(`Short Description` == "NH") %>%
  dplyr::group_by(`Facility Zip Code`)

nyc_healthjoin <- nyctracts2010 %>%
  left_join(NYS_HealthAg, by = c("zcta" = "Facility Zip Code"))

task 4:

Join the Census ACS population, race, and age data to the NYC Planning Census Tract Data.

Show the code
#Join the Census ACS population, race, and age data to the NYC Planning Census Tract Data.
nycCensus <- sf::st_read("C:/Users/campb/Downloads/R-Spatial_II_Lab/2010 Census Tracts/geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp", stringsAsFactors = FALSE)
Reading layer `geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a' from data source `C:\Users\campb\Downloads\R-Spatial_II_Lab\2010 Census Tracts\geo_export_1dc7b645-647b-4806-b9a0-7b79660f120a.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 2165 features and 11 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS84(DD)
Show the code
str(nycCensus)
Classes 'sf' and 'data.frame':  2165 obs. of  12 variables:
 $ boro_code : chr  "5" "1" "1" "1" ...
 $ boro_ct201: chr  "5000900" "1009800" "1010000" "1010200" ...
 $ boro_name : chr  "Staten Island" "Manhattan" "Manhattan" "Manhattan" ...
 $ cdeligibil: chr  "E" "I" "I" "I" ...
 $ ct2010    : chr  "000900" "009800" "010000" "010200" ...
 $ ctlabel   : chr  "9" "98" "100" "102" ...
 $ ntacode   : chr  "SI22" "MN19" "MN19" "MN17" ...
 $ ntaname   : chr  "West New Brighton-New Brighton-St. George" "Turtle Bay-East Midtown" "Turtle Bay-East Midtown" "Midtown-Midtown South" ...
 $ puma      : chr  "3903" "3808" "3808" "3807" ...
 $ shape_area: num  2497010 1906016 1860938 1860993 1864600 ...
 $ shape_leng: num  7729 5534 5692 5688 5693 ...
 $ geometry  :sfc_MULTIPOLYGON of length 2165; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:28, 1:2] -74.1 -74.1 -74.1 -74.1 -74.1 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:11] "boro_code" "boro_ct201" "boro_name" "cdeligibil" ...
Show the code
nycCensus %>% dplyr::mutate(cntyFIPS = case_when(
  boro_name == 'Bronx' ~ '005',
  boro_name == 'Brooklyn' ~ '047',
  boro_name == 'Manhattan' ~ '061',
  boro_name == 'Queens' ~ '081',
  boro_name == 'Staten Island' ~ '085'),
  tractFIPS = paste(cntyFIPS, ct2010, sep='')
)
Simple feature collection with 2165 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS84(DD)
First 10 features:
   boro_code boro_ct201     boro_name cdeligibil ct2010 ctlabel ntacode
1          5    5000900 Staten Island          E 000900       9    SI22
2          1    1009800     Manhattan          I 009800      98    MN19
3          1    1010000     Manhattan          I 010000     100    MN19
4          1    1010200     Manhattan          I 010200     102    MN17
5          1    1010400     Manhattan          I 010400     104    MN17
6          1    1011300     Manhattan          I 011300     113    MN17
7          1    1011402     Manhattan          I 011402  114.02    MN40
8          1    1013000     Manhattan          I 013000     130    MN40
9          1    1014000     Manhattan          I 014000     140    MN40
10         1    1014801     Manhattan          I 014801  148.01    MN40
                                     ntaname puma shape_area shape_leng
1  West New Brighton-New Brighton-St. George 3903  2497009.7   7729.017
2                    Turtle Bay-East Midtown 3808  1906016.4   5534.200
3                    Turtle Bay-East Midtown 3808  1860938.4   5692.169
4                      Midtown-Midtown South 3807  1860992.7   5687.802
5                      Midtown-Midtown South 3807  1864600.4   5693.036
6                      Midtown-Midtown South 3807  1890907.3   5699.861
7              Upper East Side-Carnegie Hill 3805  1063547.4   4125.256
8              Upper East Side-Carnegie Hill 3805  1918144.5   5807.973
9              Upper East Side-Carnegie Hill 3805  1925984.2   5820.816
10             Upper East Side-Carnegie Hill 3805   559216.2   3135.951
                         geometry cntyFIPS tractFIPS
1  MULTIPOLYGON (((-74.07921 4...      085 085000900
2  MULTIPOLYGON (((-73.96433 4...      061 061009800
3  MULTIPOLYGON (((-73.96802 4...      061 061010000
4  MULTIPOLYGON (((-73.97124 4...      061 061010200
5  MULTIPOLYGON (((-73.97446 4...      061 061010400
6  MULTIPOLYGON (((-73.98412 4...      061 061011300
7  MULTIPOLYGON (((-73.96476 4...      061 061011402
8  MULTIPOLYGON (((-73.96148 4...      061 061013000
9  MULTIPOLYGON (((-73.95495 4...      061 061014000
10 MULTIPOLYGON (((-73.95398 4...      061 061014801
Show the code
acsData <- readLines("C:/Users/campb/Downloads/R-Spatial_II_Lab/ACSDP5Y2018.DP05_data_with_overlays_2020-04-22T132935.csv") %>%
  magrittr::extract(-2) %>% 
  textConnection() %>%
  read.csv(header=TRUE, quote= "\"") %>%
  dplyr::select(GEO_ID, 
                totPop = DP05_0001E, elderlyPop = DP05_0024E, # >= 65
                malePop = DP05_0002E, femalePop = DP05_0003E,  
                whitePop = DP05_0037E, blackPop = DP05_0038E,
                asianPop = DP05_0067E, hispanicPop = DP05_0071E,
                adultPop = DP05_0021E, citizenAdult = DP05_0087E) %>%
  dplyr::mutate(censusCode = stringr::str_sub(GEO_ID, -9,-1));

acsData %>%
  magrittr::extract(1:10,)
                 GEO_ID totPop elderlyPop malePop femalePop whitePop blackPop
1  1400000US36005000100   7080         51    6503       577     1773     4239
2  1400000US36005000200   4542        950    2264      2278     2165     1279
3  1400000US36005000400   5634        710    2807      2827     2623     1699
4  1400000US36005001600   5917        989    2365      3552     2406     2434
5  1400000US36005001900   2765         76    1363      1402      585     1041
6  1400000US36005002000   9409        977    4119      5290     3185     4487
7  1400000US36005002300   4600        648    2175      2425      479     2122
8  1400000US36005002400    172          0     121        51       69       89
9  1400000US36005002500   5887        548    2958      2929      903     1344
10 1400000US36005002701   2868        243    1259      1609      243      987
   asianPop hispanicPop adultPop citizenAdult censusCode
1       130        2329     6909         6100  005000100
2       119        3367     3582         2952  005000200
3       226        3873     4507         4214  005000400
4        68        3603     4416         3851  005001600
5       130        1413     2008         1787  005001900
6        29        5905     6851         6170  005002000
7        27        2674     3498         3056  005002300
8        14           0      131           42  005002400
9        68        4562     4237         2722  005002500
10        0        1985     1848         1412  005002701
Show the code
# Merge (JOIN) ACS data to the census tracts
popData <- merge(nycCensus, acsData, by.x ='ct2010', by.y = 'censusCode')

task 5:

Aggregate the ACS census data to zip code area data.

Show the code
popData <- st_transform(popData, st_crs(nyc_covid))

covidPopZipNYC <- st_join(
  nyc_covid,
  st_centroid(popData),
  join = st_contains
) %>%
  group_by(modzcta, label, pop_est, Positive, Total, zcta_cum.perc_pos) %>%
  summarise(
    totPop = sum(totPop, na.rm = TRUE),
    malePctg = sum(malePop, na.rm = TRUE) / totPop * 100,
    asianPop = sum(asianPop, na.rm = TRUE),
    blackPop = sum(blackPop, na.rm = TRUE),
    hispanicPop = sum(hispanicPop, na.rm = TRUE),
    whitePop = sum(whitePop, na.rm = TRUE),
    .groups = "drop"
  )