Methods 1, Week 10

In-class exercise, prep

I like to store my spatial data separately from my tabular data to keep things neat.

Create a new folder called geo inside each of these two folders:
- part2/data/raw/geo
- part2/data/processing/geo
Download the shapefile of the 2020 Neighborhood Tabulation Areas from NYC Planning
- Unzip and save to your part2/data/raw/geo folder
Download the geojson of the NYC Borough Boundaries from NYC Open Data
- Move it to your part2/data/raw/geo folder

Outline

New functions and concepts list
Managing missing values in calculations
Assignment questions and overview
Import spatial data from other sources
Spatial join
Homework

New functions and concepts

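na.rm = TRUE: asks a function to ignore NA values in its calculation
is.na(): tests if a value is NA
is.nan(): tests if a value is NaN (Not a Number)
multiple geoms in one ggplot
st_join(): joins two spatial data frames based on their spatial relationship
st_crs(): prints the coordinate reference system (projection) of a spatial data frame
st_transform(): changes the projection of a spatial data frame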

Missing values

A missing value is a way to signal an absence of information in a dataset.

Common reasons for missing values:

the information is not available for that area
the information is not reliable for that area
there was an error in data collection or processing
- data joined improperly
- join ids don’t match in all areas

Missing values are a part of messy, real-world data. Understanding how missing data are defined in R and how to perform operations with them will be a critical component of your data cleaning and analysis work.

Missing values in R

In R, missing values typically look like an NA appearing in a variable, a vector, or a dataframe:

# create a vector of numeric values with one NA value
vector1 <- c(4, 6, 2, 8, NA, 9)

# view structure of vector1
str(vector1)

 num [1:6] 4 6 2 8 NA 9

You may also encounter missing values in datasets that aren’t NA.
Sometimes the creators of a dataset will use a numeric value to indicate missing data (such as 999) or a string of characters (such as "N/A" or "–").
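
A minimal sketch of why placeholder values matter, using made-up vectors (the names ages and cities and the 999 code are illustrative and do not come from the class data). R treats a placeholder as a real value until you recode it to NA:

```r
# made-up survey ages where 999 was used to mean "no answer"
ages <- c(34, 999, 27, 51, 999)

# R treats the placeholder as a real number, so the mean is badly inflated
mean(ages)

# recode the placeholder to NA so R knows the value is missing
ages_clean <- ages
ages_clean[ages_clean == 999] <- NA
ages_clean

# the same idea works for text placeholders
cities <- c("Abbeville", "N/A", "Crowley")
cities_clean <- cities
cities_clean[cities_clean == "N/A"] <- NA
is.na(cities_clean)
```

The slides that follow show tidyverse ways to do the same recoding on a full dataset.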

NA example

library(tidyverse)

# load the propublica desegregation data in part 1 (you'll need to use a different filepath than me)
deseg_pp_raw <- read_csv("part1/data/raw/invol_data_propublica.csv")

District.Name	City	State	Year.Lifted	Year.Placed
Abbeville 60	Abbeville	SC	1984	N/A
Aberdeen School Dist	Aberdeen	MS	STILL OPEN	1969
Acadia Parish	Crowley	LA	1981	N/A
Affton 101	St Louis	MO	1999	N/A
Alabaster City	Alabaster	AL	STILL OPEN	1963

“N/A” as a missing value can make your dataset messier than it needs to be.

glimpse(deseg_pp_raw)

Rows: 769
Columns: 5
$ District.Name <chr> "Abbeville 60", "Aberdeen School Dist", "Acadia Parish",…
$ City          <chr> "Abbeville", "Aberdeen", "Crowley", "St Louis", "Alabast…
$ State         <chr> "SC", "MS", "LA", "MO", "AL", "FL", "NC", "TN", "TX", "A…
$ Year.Lifted   <chr> "1984", "STILL OPEN", "1981", "1999", "STILL OPEN", "197…
$ Year.Placed   <chr> "N/A", "1969", "N/A", "N/A", "1963", "N/A", "N/A", "1966…

The Year.Placed column should probably be numeric, but “N/A” makes R assume each value in that column is a character.

Managing the messiness of missing values

As you begin to work with a new dataset, you should always investigate and document the following:

Are missing values present in my data?
Where are the missing values?
How will these missing values affect my analysis?
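
One possible way to answer the first two questions, sketched in base R on a made-up data frame (toy and its column names are hypothetical):

```r
# made-up data frame with missing values in two columns
toy <- data.frame(
  district = c("A", "B", "C", "D"),
  year_placed = c(1969, NA, 1963, NA),
  year_lifted = c(1984, 1981, NA, 1999)
)

# are any missing values present?
anyNA(toy)

# where are they? count the NAs in each column
colSums(is.na(toy))

# what share of each column is missing?
colMeans(is.na(toy))

# which rows have at least one NA?
toy[!complete.cases(toy), ]
```

The third question (how the missing values affect your analysis) depends on the dataset and cannot be answered by code alone.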

If missing values are not represented by NA, you can:

Define the NA value while reading data into R
Change the data type of a column to force NA conversion
Use ifelse() to redefine the value as NA

Define the `NA` value during import

# define the dataset's NA value during data import
deseg_pp_clean_na <- read_csv("part1/data/raw/invol_data_propublica.csv", 
                              na = "N/A")

District.Name	City	State	Year.Lifted	Year.Placed
Abbeville 60	Abbeville	SC	1984	NA
Aberdeen School Dist	Aberdeen	MS	STILL OPEN	1969
Acadia Parish	Crowley	LA	1981	NA
Affton 101	St Louis	MO	1999	NA
Alabaster City	Alabaster	AL	STILL OPEN	1963

glimpse(deseg_pp_clean_na)

Rows: 769
Columns: 5
$ District.Name <chr> "Abbeville 60", "Aberdeen School Dist", "Acadia Parish",…
$ City          <chr> "Abbeville", "Aberdeen", "Crowley", "St Louis", "Alabast…
$ State         <chr> "SC", "MS", "LA", "MO", "AL", "FL", "NC", "TN", "TX", "A…
$ Year.Lifted   <chr> "1984", "STILL OPEN", "1981", "1999", "STILL OPEN", "197…
$ Year.Placed   <dbl> NA, 1969, NA, NA, 1963, NA, NA, 1966, NA, NA, NA, 1968, …

note: the Year.Placed column is now a numeric column

Changing the data type to force conversion

# change a column's data type from character to numeric
# (values that are not numbers, like "N/A", become NA with a coercion warning)
deseg_conversion_clean <- deseg_pp_raw |>
  mutate(Year.Placed = as.numeric(Year.Placed))

District.Name	City	State	Year.Lifted	Year.Placed
Abbeville 60	Abbeville	SC	1984	NA
Aberdeen School Dist	Aberdeen	MS	STILL OPEN	1969
Acadia Parish	Crowley	LA	1981	NA
Affton 101	St Louis	MO	1999	NA
Alabaster City	Alabaster	AL	STILL OPEN	1963

glimpse(deseg_conversion_clean)

Rows: 769
Columns: 5
$ District.Name <chr> "Abbeville 60", "Aberdeen School Dist", "Acadia Parish",…
$ City          <chr> "Abbeville", "Aberdeen", "Crowley", "St Louis", "Alabast…
$ State         <chr> "SC", "MS", "LA", "MO", "AL", "FL", "NC", "TN", "TX", "A…
$ Year.Lifted   <chr> "1984", "STILL OPEN", "1981", "1999", "STILL OPEN", "197…
$ Year.Placed   <dbl> NA, 1969, NA, NA, 1963, NA, NA, 1966, NA, NA, NA, 1968, …

note: the Year.Placed column is now numeric and the “N/A” values were converted to NA.
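
Forced conversion turns every value that is not a number into NA, not only the "N/A" placeholder. In a column like Year.Lifted, "STILL OPEN" would become NA too, even though it is real information. A minimal sketch of one way to check what would be lost before converting, using a made-up vector in the style of that column:

```r
# a few made-up values in the style of the Year.Lifted column
year_lifted <- c("1984", "STILL OPEN", "1981", "N/A")

# try the conversion (suppressWarnings hides the "NAs introduced by coercion" warning)
as_number <- suppressWarnings(as.numeric(year_lifted))
as_number

# list the distinct values that the conversion would turn into NA
lost_values <- unique(year_lifted[is.na(as_number) & !is.na(year_lifted)])
lost_values
```

If lost_values contains anything other than the dataset's missing-value placeholder, decide how to handle those values before forcing the conversion.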

Use `ifelse()` to redefine value of NA values

# replace the "N/A" strings with NA (the column keeps its character data type)
deseg_conversion_clean_ifelse <- deseg_pp_raw |>
  mutate(Year.Placed = ifelse(Year.Placed == "N/A", 
                              NA, 
                              Year.Placed))

District.Name	City	State	Year.Lifted	Year.Placed
Abbeville 60	Abbeville	SC	1984	NA
Aberdeen School Dist	Aberdeen	MS	STILL OPEN	1969
Acadia Parish	Crowley	LA	1981	NA
Affton 101	St Louis	MO	1999	NA
Alabaster City	Alabaster	AL	STILL OPEN	1963

glimpse(deseg_conversion_clean_ifelse)

Rows: 769
Columns: 5
$ District.Name <chr> "Abbeville 60", "Aberdeen School Dist", "Acadia Parish",…
$ City          <chr> "Abbeville", "Aberdeen", "Crowley", "St Louis", "Alabast…
$ State         <chr> "SC", "MS", "LA", "MO", "AL", "FL", "NC", "TN", "TX", "A…
$ Year.Lifted   <chr> "1984", "STILL OPEN", "1981", "1999", "STILL OPEN", "197…
$ Year.Placed   <chr> NA, "1969", NA, NA, "1963", NA, NA, "1966", NA, NA, NA, …

note: the Year.Placed column is still character and the “N/A” values were converted to NA.

NA values in calculations

It’s important not to ignore missing values when you are trying to run calculations with your data. It’s so important that R will not let you ignore them:

Calculate the median year a desegregation order was placed:

# attempt to calculate the median year deseg orders were placed
median(deseg_pp_clean_na$Year.Placed)

[1] NA

If there is even one NA in the column, calculations such as median() return NA.

Ignore NA values in calculations

To run a calculation with NA values, you will need to include an optional argument found in many R functions:

na.rm = TRUE
Adding this argument to a function will ask R to ignore the NA values in the operation of that function.

Recalculate the median year a desegregation order was placed:

# recalculate the median year deseg orders were placed, ignoring NA values
median(deseg_pp_clean_na$Year.Placed, na.rm = TRUE)

[1] 1969

CAUTION: Do not get in the habit of adding na.rm = TRUE to functions the first time you run them. Be sure to first understand the nature of your missing data before you start ignoring it in your calculations.
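
One way to follow that advice is to count the missing values before you ignore them. A minimal sketch on a made-up vector (years is hypothetical, not the ProPublica column):

```r
# made-up vector of years with some missing values
years <- c(1969, NA, 1963, 1966, NA, 1968)

# how many values are missing, and what share of the vector is that?
sum(is.na(years))
mean(is.na(years))

# without na.rm the result is NA
median(years)

# only after looking at the missing values, ignore them in the calculation
median(years, na.rm = TRUE)
```

Here one third of the values are missing, so the median of 1967 describes only the four years that were recorded.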

Assignment questions

Assignment example - % West Indian in BK

library(tidyverse)
library(tidycensus)
library(sf)
library(scales)
library(viridis)

# load all acs variables
acs201620 <- load_variables(2020, "acs5", cache = TRUE)

## Import table of PEOPLE REPORTING ANCESTRY: B04006
raw_ancestry <- get_acs(geography = "tract", 
                        variables = c(ancestry_pop = "B04006_001",
                                      west_indian = "B04006_094"), 
                        state = "NY",
                        county = "Kings",
                        geometry = TRUE, 
                        year = 2020,
                        output = "wide") 

west_indian <- raw_ancestry |> 
  mutate(pct_west_indian = west_indianE/ancestry_popE)

GEOID	NAME	ancestry_popE	ancestry_popM	west_indianE	west_indianM	geometry	pct_west_indian
36047060600	Census Tract 606, Kings County, New York	2830	443	0	12	MULTIPOLYGON (((-73.96035 4…	0
36047005602	Census Tract 56.02, Kings County, New York	1787	386	0	12	MULTIPOLYGON (((-74.03707 4…	0

Explore data

Explore the values to map them effectively

## check the values of percent west indian to see how to map
summary(west_indian$pct_west_indian)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
0.000000 0.003073 0.026961 0.117700 0.180517 0.645272       26

hist(west_indian$pct_west_indian)

Handle the 26 NA’s

First, look at the rows with NA value.

You can filter using the is.na() function

na_tracts <- west_indian |> 
  filter(is.na(pct_west_indian))

GEOID	NAME	ancestry_popM	west_indianM	geometry	pct_west_indian
36047001804	Census Tract 18.04, Kings County, New York	12	12	MULTIPOLYGON (((-74.03297 4…	NaN
36047070602	Census Tract 706.02, Kings County, New York	12	12	MULTIPOLYGON (((-73.90467 4…	NaN
36047008600	Census Tract 86, Kings County, New York	12	12	MULTIPOLYGON (((-74.00566 4…	NaN
36047040700	Census Tract 407, Kings County, New York	12	12	MULTIPOLYGON (((-73.90449 4…	NaN
36047031402	Census Tract 314.02, Kings County, New York	12	12	MULTIPOLYGON (((-73.9985 40…	NaN

Redefine the 26 NA’s

In this case, the NA’s are actually NaNs

NaN means Not a Number: the denominator (population) is 0, and 0 divided by 0 is undefined
Let’s convert these to actual NAs
Once we map them, we can determine if we should remove them entirely

west_indian <- raw_ancestry |> 
  mutate(pct_west_indian = west_indianE/ancestry_popE,
         pct_west_indian = ifelse(is.nan(pct_west_indian), NA, pct_west_indian))
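
A small base R sketch of the difference between NA and NaN, using made-up numbers (west_indian_count and total_pop are illustrative vectors, not the ACS data):

```r
# dividing zero by zero produces NaN, not NA
west_indian_count <- c(10, 0, 0)
total_pop <- c(100, 50, 0)
share <- west_indian_count / total_pop
share

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
is.na(c(1, NA, NaN))
is.nan(c(1, NA, NaN))

# convert NaN to NA, the same way the mutate() above does
share_clean <- ifelse(is.nan(share), NA, share)
share_clean
```

This is why filter(is.na(pct_west_indian)) found the NaN rows: is.na() counts NaN as missing too.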

% West Indian map

% West Indian map code

ggplot(data = west_indian, mapping = aes(fill = pct_west_indian))  + 
  geom_sf(color = "#ffffff") +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "#fafafa",
                       name="Percent West Indian Ancestry (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Brooklyn, West Indian Ancestry by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  )

I can see that I don’t want to display the NA values - they are parks and water… later

Import spatial data from other sources

Not all spatial data comes from the census. To import spatial data (usually shapefiles or geojsons) from any source:

download the data to your computer
use the sf function st_read() to read it in

NYC sources:

New York City Planning Department has many NYC shapefiles on their Bytes of the Big Apple site
NYC Open Data has tons of data, spatial and tabular
NHGIS, from IPUMS, is another source of spatial census data.

Import spatial data

## import borough boundaries (geojson) from NYC Open Data
boros <- st_read("part2/data/raw/geo/BoroughBoundaries.geojson")

Reading layer `BoroughBoundaries' from data source 
  `/Users/sarahodges/Documents/spatial/NewSchool/methods1-materials-fall2024/methods1-slides/part2/data/raw/geo/BoroughBoundaries.geojson' 
  using driver `GeoJSON'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84

## import Neighborhood Tabulation Areas for NYC
nabes <- st_read("part2/data/raw/geo/nynta2020_22b/nynta2020.shp")

Reading layer `nynta2020' from data source 
  `/Users/sarahodges/Documents/spatial/NewSchool/methods1-materials-fall2024/methods1-slides/part2/data/raw/geo/nynta2020_22b/nynta2020.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 262 features and 11 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 913175.1 ymin: 120128.4 xmax: 1067383 ymax: 272844.3
Projected CRS: NAD83 / New York Long Island (ftUS)

Add Borough boundaries to the map, code

ggplot(data = west_indian, mapping = aes(fill = pct_west_indian))  + 
  geom_sf(color = "#ffffff", 
          lwd = 0) + # removes the census tract outline
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent West Indian Ancestry (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Brooklyn, West Indian Ancestry by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = boros, color = "black", fill = NA, lwd = .5)

Add Borough boundaries to the map

Filter to show Brooklyn only, code

ggplot(data = west_indian, mapping = aes(fill = pct_west_indian))  + 
  geom_sf(color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent West Indian Ancestry (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Brooklyn, West Indian Ancestry by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = boros |> filter(boro_name == "Brooklyn"), 
          color = "black", fill = NA, lwd = .5)

Filter to show Brooklyn only

Add neighborhoods too, code

ggplot(data = west_indian, mapping = aes(fill = pct_west_indian))  + 
  geom_sf(color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent West Indian Ancestry (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Brooklyn, West Indian Ancestry by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes |> filter(BoroName == "Brooklyn"), 
          color = "gray", fill = NA, lwd = 0.25) + 
  geom_sf(data = boros |> filter(boro_name == "Brooklyn"), 
          color = "black", fill = NA, lwd = .5)

Add neighborhoods too

Select census tracts in one neighborhood

You can use a spatial join from the sf package to identify what neighborhood each census tract is in so you can:

filter to create a data frame of census tracts in a particular neighborhood
create a map of census tracts in a particular neighborhood
calculate statistics by neighborhood

Spatial joins don’t need a common id. This operation joins data based on their spatial relationship.

The st_join documentation gives details on the requirements.
Most importantly, both datasets must be in the same projection.

Projections

A projection is the set of equations used to translate the round earth into a flat map.

Projections are complicated and you’ll learn much more in Methods 3!

The steps to perform a spatial join

First, check that the two spatial data frames overlap in space.
- Our map above shows that they do!
Second, check to see if their projections match
- Use st_crs() to print their projections in the console
- If they aren’t in the same projection, use st_transform() to project them
  - (Really, New York City data should be projected to 2263)
Third, select the fields you want to add to your spatial data frame
Fourth, join the data

Check projection of census tract data

st_crs(west_indian)

Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

There is a lot of info about the projection. The key information is in the last line.

EPSG (the numeric code that identifies a projection) = 4269

Check projection of NTA data

st_crs(nabes)

Coordinate Reference System:
  User input: NAD83 / New York Long Island (ftUS) 
  wkt:
PROJCRS["NAD83 / New York Long Island (ftUS)",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 New York Long Island zone (US Survey feet)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",40.1666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-74,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",41.0333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",40.6666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",984250,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk."],
        BBOX[40.47,-74.26,41.3,-71.8]],
    ID["EPSG",2263]]

Transform projection

If you are working with New York City data, you want the projection to be 2263. So we’ll transform the west_indian census tract data into 2263.

west_indian_2263 <- st_transform(west_indian, 2263)

Check the projections to make sure it worked!

st_crs(west_indian_2263)

Coordinate Reference System:
  User input: EPSG:2263 
  wkt:
PROJCRS["NAD83 / New York Long Island (ftUS)",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 New York Long Island zone (US Survey feet)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",40.1666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-74,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",41.0333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",40.6666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",984250,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk."],
        BBOX[40.47,-74.26,41.3,-71.8]],
    ID["EPSG",2263]]
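
Reading the full printout is not the only way to check a projection. The sketch below shows a few shorter checks from the sf package on a single made-up point (pt_4269 and its coordinates are illustrative, roughly in Brooklyn):

```r
library(sf)

# one made-up point, stored as longitude/latitude in NAD83 (EPSG:4269)
pt_4269 <- st_sfc(st_point(c(-73.95, 40.65)), crs = 4269)

# reproject to New York Long Island state plane (EPSG:2263, units are feet)
pt_2263 <- st_transform(pt_4269, 2263)

# print just the EPSG code instead of the whole description
st_crs(pt_4269)$epsg
st_crs(pt_2263)$epsg

# FALSE once the coordinates are in feet rather than degrees
st_is_longlat(pt_2263)

# compare two projections directly; FALSE means one needs st_transform() before a join
st_crs(pt_4269) == st_crs(pt_2263)
```

The same comparison works on full spatial data frames, for example st_crs(west_indian_2263) == st_crs(nabes).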

Select the fields from NTA to add to west_indian

#remove unnecessary fields in the neighborhood shapefile
nabes_selected <- nabes |>
  select(BoroCode, BoroName, NTA2020, NTAName)

Perform the spatial join

west_indian_nabes <- west_indian_2263 |>
  st_join(nabes_selected, 
          left = TRUE,
          join = st_intersects,
          largest = TRUE)

left: makes it a left join (meaning all census tracts are kept, even if they match no neighborhood)
join: sets the spatial test used to match features; st_intersects means “if they intersect”
largest: if a census tract overlaps with more than one neighborhood, the neighborhood with the largest overlap is joined
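
To see what largest = TRUE changes, here is a toy sketch with two made-up square neighborhoods and two made-up square tracts. The helper make_square() and all names and coordinates are hypothetical, chosen so that tract B straddles the neighborhood border:

```r
library(sf)

# hypothetical helper: square polygon from its lower-left corner and side length
make_square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# two toy neighborhoods side by side, in EPSG:2263
toy_nabes <- st_sf(
  nabe_name = c("West", "East"),
  geometry = st_sfc(make_square(0, 0, 10), make_square(10, 0, 10), crs = 2263)
)

# tract A sits inside West; tract B overlaps West a little and East a bit more
toy_tracts <- st_sf(
  tract_id = c("A", "B"),
  geometry = st_sfc(make_square(2, 2, 4), make_square(8, 2, 5), crs = 2263)
)

# without largest = TRUE, tract B matches both neighborhoods and appears twice
joined_all <- st_join(toy_tracts, toy_nabes, left = TRUE, join = st_intersects)
nrow(joined_all)

# with largest = TRUE, each tract keeps only the neighborhood it overlaps most
# (sf may warn that attributes are assumed constant over each geometry)
joined_largest <- st_join(toy_tracts, toy_nabes, left = TRUE,
                          join = st_intersects, largest = TRUE)
joined_largest$nabe_name
```

Duplicated tracts would be counted twice in any neighborhood totals, which is one reason to set largest = TRUE when joining polygons to polygons.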

library(knitr)
kable(head(west_indian_nabes, n=2))

GEOID	NAME	ancestry_popE	ancestry_popM	west_indianE	west_indianM	pct_west_indian	BoroCode	BoroName	NTA2020	NTAName	geometry
36047060600	Census Tract 606, Kings County, New York	2830	443	0	12	0	3	Brooklyn	BK1503	Sheepshead Bay-Manhattan Beach-Gerritsen Beach	MULTIPOLYGON (((995262.8 15…
36047005602	Census Tract 56.02, Kings County, New York	1787	386	0	12	0	3	Brooklyn	BK1001	Bay Ridge	MULTIPOLYGON (((973958.5 16…

Create a map of Crown Heights North

Create a map of Crown Heights North, code

ggplot(data = west_indian_nabes |> 
                filter(NTAName == "Crown Heights (North)"),
  mapping = aes(fill = pct_west_indian))  + 
  geom_sf(color = "#ffffff",
          lwd = 0) +
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       na.value = "transparent",
                       name="Percent West Indian Ancestry (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Crown Heights (North), West Indian Ancestry by Census Tract",
    caption = "Source: American Community Survey, 2016-20"
  ) + 
  geom_sf(data = nabes |> filter(NTAName == "Crown Heights (North)"), 
          color = "black", fill = NA, lwd = 0.5)

Calculate summary statistics too

west_indian_nabe_stats <- st_drop_geometry(west_indian_nabes) |> 
  group_by(NTAName) |> 
  summarise(Borough = first(BoroName),
            `Est. Total Population` = sum(ancestry_popE),
            # sum the estimate column (E), not the margin of error column (M)
            `Est. Total West Indian Population` = sum(west_indianE)) |> 
  mutate(`Est. Percent West Indian Ancestry` = percent(`Est. Total West Indian Population`/`Est. Total Population`, accuracy = 1))

note: be careful to sum the estimate column (west_indianE). Summing the margin of error column (west_indianM) by mistake produces the table below, where Brooklyn Navy Yard shows 12 West Indian residents in a population of 0 and a percentage of Inf.

NTAName	Borough	Est. Total Population	Est. Total West Indian Population	Est. Percent West Indian Ancestry
Barren Island-Floyd Bennett Field	Brooklyn	26	12	46%
Bath Beach	Brooklyn	32716	324	1%
Bay Ridge	Brooklyn	80183	1437	2%
Bedford-Stuyvesant (East)	Brooklyn	86869	5935	7%
Bedford-Stuyvesant (West)	Brooklyn	83717	4048	5%
Bensonhurst	Brooklyn	96331	756	1%
Borough Park	Brooklyn	78836	299	0%
Brighton Beach	Brooklyn	29819	120	0%
Brooklyn Heights	Brooklyn	23874	383	2%
Brooklyn Navy Yard	Brooklyn	0	12	Inf
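
The Brooklyn Navy Yard row shows what happens when a neighborhood total is 0: the division cannot produce a meaningful percentage. A minimal sketch of one way to guard against that, on made-up numbers (toy_tracts and its columns are hypothetical), is to divide only when the denominator is above 0 and record NA otherwise:

```r
library(dplyr)

# made-up tract-level estimates; the "Navy Yard" tract has no residents
toy_tracts <- tibble(
  nabe = c("Bay Ridge", "Bay Ridge", "Navy Yard"),
  total_pop = c(2000, 3000, 0),
  west_indian = c(40, 60, 0)
)

toy_stats <- toy_tracts |>
  group_by(nabe) |>
  summarise(total_pop = sum(total_pop),
            west_indian = sum(west_indian)) |>
  # keep the share only where the population is above 0; otherwise use NA
  mutate(pct_west_indian = if_else(total_pop > 0,
                                   west_indian / total_pop,
                                   NA_real_))

toy_stats
```

Whether to show such neighborhoods as NA or drop them from the table is a judgment call that depends on what the table is for.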

Assignment 10a

Complete the in-class assignment and submit your script to CANVAS.

Assignment 10b: Neighborhood maps!

Make at least 3 census tract-level maps of one neighborhood in NYC. Along with each map, create formatted summary tables that compare your neighborhood with other neighborhoods in the same boro. Upload your script to CANVAS.
