2026-4-27

Groups!

##           group 1           group 2           group 3         group 4
## 1    Myoung, Sein       Ong, Alyssa     Moore, Allana Batson, Anthony
## 2                 Wolfenstein, Luci       Qin, Celine    Barga, Jolie
## 3 Bell, Mary Rose    Knowles, Genny Mahoney, Brigette  Devir, Lindsey
## 4   Leahy, Olivia   Kang, Christine       Pham, Canon            <NA>
##           group 5
## 1   Pacheco, Alex
## 2    Mendoza, Ava
## 3     Smith, Reid
## 4 Randall, Javion

Warm-up

  • What spatial units are shown in the following maps?
  • What spatial units are theoretically important?
  • Are all of the maps “spatial”?

Today’s Class

  • Warm-up: spatial data
  • Why Spatial Data?
  • The Logic of Simple Features
  • Activity: Mapping Points to Polygons

Wednesday’s Class

  • Using tidycensus to gather spatial information
  • Using tigris to gather boundaries
  • Joining Points and Polygons
  • Interactive Maps (if time)

Office Hours

  • Office Hours: Wednesday 1:30pm-3:00pm (Tyler) or Friday 1:30pm-3:30pm by appointment
  • Tuesdays, 10:30am-12:00pm (Yao)

Miscellaneous

  • Slides published on syllabus page
  • PSet rubric information will be provided moving forward
  • Set warning = FALSE in a chunk header to suppress that chunk's warnings in the knitted output
  • PSets: conceptual discussion encouraged, work should be individual
  • Final Project: Collaboration encouraged, track individual contributions (e.g. through GitHub)
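
For the warning = FALSE tip above, a minimal sketch of a related option: instead of editing each chunk header, the default can be set once for every chunk that follows. This assumes the document is rendered with knitr (as R Markdown files are); placing the call in a setup chunk near the top is a common convention, not a course requirement.

```r
library(knitr)

# Set the default for every chunk that follows in the document;
# a single chunk can still override it in its own header
opts_chunk$set(warning = FALSE)

# Confirm the current default
opts_chunk$get("warning")
```

Suppressing warnings only hides them in the output, so it is worth reading them once before turning them off.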

Why Spatial Data?

  • Spatial data corresponds to specific geophysical locations (e.g., latitude/longitude, boundaries)
  • Spatial data preserves geographic relationships between objects (e.g. distances)
  • Some data can be “spatial” or “aspatial”

What is Spatial Data?

  • Spatial data corresponds to specific geophysical locations (e.g., latitude/longitude, boundaries)
  • Spatial data preserves geographic relationships between objects (e.g. distances)
  • Some data can be “spatial” or “aspatial”

What is spatial data?

  • In pairs: which of the following are spatial data?
  • A list of 50 US states
  • Longitude/Latitude of post box locations
  • Residential addresses
  • Neighborhoods in San Francisco
  • Boundaries of a river
  • Directions from your apartment to class

What is spatial data?

  • In pairs: which of the following are spatial data?
  • A list of 50 US states (definitely aspatial)
  • Longitude/Latitude of post box locations (definitely spatial)
  • Residential addresses (could be either)
  • Neighborhoods in San Francisco (aspatial, unless boundaries are included)
  • Boundaries of a river (spatial)
  • Directions from your apartment to class (could be either)

What is Spatial Data?

  • Spatial data corresponds to specific geophysical locations (e.g., latitude/longitude, boundaries)
  • Spatial data preserves geographic relationships between objects (e.g. distances)
  • Some data can be represented in a “spatial” or “aspatial” manner
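
To illustrate how the same records can be held in an aspatial or a spatial form, here is a small sketch using the sf package. The post box coordinates and the names post_boxes and post_boxes_sf are made up for illustration; EPSG:4326 (WGS 84 longitude/latitude) is an assumed CRS.

```r
library(sf)

# Hypothetical post box locations stored as an ordinary ("aspatial") table
post_boxes <- data.frame(
  id  = c("A", "B", "C"),
  lon = c(-122.42, -122.41, -122.40),
  lat = c(37.77, 37.78, 37.79)
)

# Promote the same rows to spatial data by declaring which columns are
# coordinates and which CRS they use (EPSG:4326 = WGS 84 longitude/latitude)
post_boxes_sf <- st_as_sf(post_boxes, coords = c("lon", "lat"), crs = 4326)

# Geographic relationships are now available, e.g. pairwise distances
dists <- st_distance(post_boxes_sf)
dists
```

The table holds the same numbers in both forms; only the spatial version knows that they are locations, which is what makes distance calculations possible.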

Spatial and Non-Spatial Data

Geographic Information Systems

  • To represent locations on the curved Earth on a flat map, a “coordinate reference system” (CRS) is used
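
A rough sketch of what a coordinate reference system does in practice, using sf. The location is hypothetical, and the two CRS codes (EPSG:4326 for longitude/latitude, EPSG:3857 for Web Mercator) are chosen only for illustration.

```r
library(sf)

# One hypothetical location given as longitude/latitude (EPSG:4326, WGS 84)
pt <- st_sfc(st_point(c(-122.42, 37.77)), crs = 4326)

# Re-express the same location in a projected CRS measured in meters
# (EPSG:3857, Web Mercator, chosen only for illustration)
pt_projected <- st_transform(pt, 3857)

st_is_longlat(pt)            # TRUE: coordinates are angles on the globe
st_is_longlat(pt_projected)  # FALSE: coordinates are positions on a flat plane
st_coordinates(pt_projected) # same place, very different numbers
```

The point has not moved; only the system used to write down its position has changed.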

Raster vs Vector Data

  • Which is preferred?
  • Generally vector: it stores exact locations and boundaries, and more spatial detail is usually preferable

Air Quality Across Time and Space

  • Environmental hazards often require spatial analysis

Snow’s Cholera Map

  • In 1854, London suffered a cholera outbreak
  • Unknown origin/cause
  • Snow’s spatial data analysis (and visualization) pointed to a water pump (Broad St Pump)
  • Further analysis showed different rates of infection for those living in areas serviced by different water providers
  • Interactive version here

Producing Environmental Hazards

  • HOLC (Home Owners' Loan Corporation) neighborhood designations still matter today!
  • Neighborhoods with lower HOLC designations are more polluted today (Lane et al., 2022)

Activity

Using the data handouts:

  • What can you tell me about differences in air pollution exposure across neighborhoods?

The Modifiable Areal Unit Problem

  • The MAUP: the boundaries we choose could influence our findings!
  • Example: segregation
  • What is another example where this problem may matter?
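
A toy version of the segregation example, in base R, may help make the MAUP concrete. The eight households, their group labels, and the helper share_a() are all hypothetical; the point is only that the same households give different answers under different boundaries.

```r
# Hypothetical households along one street: position and group membership
households <- data.frame(
  position = 1:8,
  group    = c("a", "a", "a", "a", "b", "b", "b", "b")
)

# Share of group "a" in each areal unit defined by a set of break points
share_a <- function(breaks) {
  unit <- cut(households$position, breaks = breaks)
  tapply(households$group == "a", unit, mean)
}

# Boundary choice 1: one unit covering the whole street
one_unit <- share_a(c(0, 8))

# Boundary choice 2: two units split down the middle
two_units <- share_a(c(0, 4, 8))

# Boundary choice 3: three units, with the middle one straddling both groups
three_units <- share_a(c(0, 2, 6, 8))

one_unit
two_units
three_units
```

With one unit the street looks evenly mixed, with two units it looks completely segregated, and with three units the middle unit looks integrated. Nothing about the households changed between the three runs.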

The Modifiable Areal Unit Problem

  • The MAUP: the boundaries we choose could influence our findings!
  • How would we solve the MAUP?

The Modifiable Areal Unit Problem

  • We can’t always solve the MAUP! But we can produce better analyses by:
  • Drawing meaningful units (i.e. ones that matter theoretically)
  • Using data that are as spatially detailed as possible

The Logic of Simple Features

What Are Simple Features?

  • What is a simple feature?
  • Two elements:
  • Geometry noting its location in space
  • Attributes noting all other properties
  • Advantages: flexible, reproducible, join and analyze data all in one place
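
The two elements can be seen by building a small simple features object by hand. This is a sketch with made-up air quality monitors: the names, readings, and coordinates are hypothetical, and EPSG:4326 is an assumed CRS.

```r
library(sf)

# Attributes: an ordinary data frame (hypothetical air quality monitors)
attributes_df <- data.frame(
  monitor = c("north", "south"),
  pm25    = c(8.1, 12.4)
)

# Geometry: one location per row, with a CRS
geometry <- st_sfc(
  st_point(c(-122.42, 37.80)),
  st_point(c(-122.41, 37.74)),
  crs = 4326
)

# A simple features object binds the two together, row by row
monitors <- st_sf(attributes_df, geometry = geometry)
monitors
```

Because the attributes and the geometry travel together in one object, filtering or joining the rows keeps each location attached to its data.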

What are Attributes?

  • A dataframe with any kind of information
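
One way to look at the attributes on their own is to strip the geometry from an sf object. The sketch below uses the North Carolina counties file that ships with sf; the object name attrs is arbitrary.

```r
library(sf)

# Load the example counties file bundled with sf (quiet = TRUE hides the log)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Drop the geometry column to leave only the attributes
attrs <- st_drop_geometry(nc)

class(attrs)  # a plain data frame, no longer an sf object
dim(attrs)    # one row per county, one column per attribute
```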

What is Geometry?

  • There were 3 types of geometries in the activity. What were they?

What is Geometry?

  • Points!
  • Lines!
  • Polygons!
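
Each of the three geometry types can be constructed directly in sf. The coordinates below are arbitrary and carry no CRS; they are only meant to show how the shapes are defined.

```r
library(sf)

# A point: a single coordinate pair
pt <- st_point(c(0, 0))

# A line: an ordered sequence of coordinates (one row per vertex)
ln <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))

# A polygon: a closed ring, so the first and last vertices must match
pg <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# Collect them and ask sf what each one is
geoms <- st_sfc(pt, ln, pg)
st_geometry_type(geoms)
```

Calling plot(geoms) draws all three on the same axes, which is a quick way to check that each shape is what was intended.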

Recap

  • Spatial data corresponds to geophysical locations, preserves geographic relationships
  • E.g. GIS, raster or vector data
  • The modifiable areal unit problem (MAUP):
  • The way we draw boundaries matters for statistical aggregation!
  • Three types of geometries:
  • Points, lines, and polygons

Miscellaneous

  • PSets: conceptual discussion encouraged, work should be individual
  • Final Project: Collaboration encouraged, track individual contributions (e.g. through GitHub)
  • Coming: final project feedback, midway course feedback, possible guest speaker

Warm-up

  • For Whiteboards: work in pairs/small groups!
  • From the air pollution exercise:
  • What steps would you take to assess differences in neighborhood pollution levels?
  • What data would you need?

What Are Simple Features?

  • Let’s get started!
library(sf)

# load a simple features object
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source 
##   `/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/sf/shape/nc.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27

What Are Simple Features?

  • You might notice “CRS” in sf objects
  • What’s this? Coordinate Reference System
  • Defines how coordinates map to locations on the Earth, globally or for a region (NAD27 covers North America)
# look at nc
head(nc)
## Simple feature collection with 6 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## Geodetic CRS:  NAD27
##    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091     1
## 2 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487     0
## 3 0.143     1.630  1828    1828       Surry 37171  37171       86  3188     5
## 4 0.070     2.968  1831    1831   Currituck 37053  37053       27   508     1
## 5 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421     9
## 6 0.097     1.670  1833    1833    Hertford 37091  37091       46  1452     7
##   NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
## 2      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
## 3     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
## 4     123   830     2     145 MULTIPOLYGON (((-76.00897 3...
## 5    1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
## 6     954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...

What Are Simple Features?

  • Take a look!
View(nc)

What Are Simple Features?

  • Is our object a dataframe?

What Are Simple Features?

  • Is our object a dataframe?
class(nc)
## [1] "sf"         "data.frame"
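
Because the object is both "sf" and "data.frame", ordinary data frame operations should work on it. A short sketch, assuming the same bundled North Carolina file; the threshold of 1000 births and the object names are arbitrary.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Ordinary data frame subsetting works on an sf object
big_counties <- nc[nc$BIR74 > 1000, ]
class(big_counties)

# The geometry column is "sticky": selecting one attribute keeps it attached
names_only <- nc[, "NAME"]
names(names_only)
```

The sticky geometry is why a subset of an sf object can still be mapped without any extra steps.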

What Are Simple Features?

  • Try plotting!
  • Whiteboards: One variable is not like the others … which is it?
# plot all vars
plot(nc)

# map one attribute; nc$AREA alone is a plain numeric vector with no geometry
plot(nc["AREA"])