Conduct vector geoprocessing operations such as buffering, intersecting, and dissolving using sf
Learn how to spatially interpolate extensive variables using the interpolate_aw() function
Data
Historical Manila GIS data come from the Transit System in Metro Manila Project hosted at the University of the Philippines Dilman. Thanks to their hard work and commitment to open data, we have the following very interesting datasets to work with:
manila_central_city_historical_pop.gpkg: Neighborhood boundaries in the central city of 19th century Manila, along with population data
Manila_region_historical_pop.gpkg: District boundaries for greater Manila metropolitan area, with population data for several time periods
manila_tranvia_1905.gpkg: Tranvia lines in Manila’s central city as of 1905
manila_malabon_line_1905: First Tranvia line in metro Manila, opened in 1888
manila_sigurd_grava_rail_plan: Routes of rail lines proposed in a 1972 transportation plan for Manila by Sigurd Grava
manila_UTSMMA_rail_plan.gpkg: Routes of rail lines proposed in the Urban Transportation Study in the Manila Metropolitan Area, carried out by the government with support from Japan in 1973
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
require(sf)
Loading required package: sf
Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
require(maptiles)
Loading required package: maptiles
Warning: package 'maptiles' was built under R version 4.3.3
require(terra)
Loading required package: terra
Warning: package 'terra' was built under R version 4.3.3
terra 1.7.78
Attaching package: 'terra'
The following object is masked from 'package:tidyr':
extract
require(tidyterra)
Loading required package: tidyterra
Warning: package 'tidyterra' was built under R version 4.3.3
Attaching package: 'tidyterra'
The following object is masked from 'package:stats':
filter
require(ggspatial)
Loading required package: ggspatial
require(osmdata)
Loading required package: osmdata
Warning: package 'osmdata' was built under R version 4.3.3
Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
Public Transportation in Metro Manila
Evidence of settlement in area at least as early as 900 CE
Commercial center and fort at Tondo by 1300s
Taken over by Sultanate of Brunei (from Borneo) in early 1500s
Spanish establish colonial capital in 1571
With exception of two-year period of British control during Seven Years War, under Spanish control until Spanish-American War in 1898
Spanish establish fort at mouth of Pasig River, which becomes core of colonial town
View of Manila, circa 1665, by Johannes Vingboons, from Wikimedia Commons
Map of Manila in 1766, housed at National Library of Spain, from Wikimedia Commons
Segment of French map of Manila Bay, published 1810, probably representing late-1700s, David Rumsey Map Collection
Segment of French map of Manial, published in 1860, David Rumsey Map Collection
Tranvia is the main public transportation system for decades
Schematic of a Tranvia car from The Street Railway Journal, 1898, from Wikimedia Commons
Tranvia car interior, 1920s, from Wikimedia Commons
Reading in the data
Let’s read in all these cool datasets on historical Manila and have a look:
#| output: falsesetwd("G:/My Drive/My Classes/World Cities/Labs/Colonialism and Transportation")man_cent_city_pop <-st_read("manila_central_city_historical_pop.gpkg")
Reading layer `shp_old_manila_districts_quezon_city_2016__oldmanilashp' from data source `G:\My Drive\My Classes\World Cities\Labs\Colonialism and Transportation\manila_central_city_historical_pop.gpkg'
using driver `GPKG'
Simple feature collection with 14 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 278137.2 ymin: 1608191 xmax: 292969.5 ymax: 1619315
Projected CRS: WGS 84 / UTM zone 51N
Reading layer `output' from data source
`G:\My Drive\My Classes\World Cities\Labs\Colonialism and Transportation\manila_UTSMMA_rail_plan.gpkg'
using driver `GPKG'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 280883.8 ymin: 1605383 xmax: 293191.8 ymax: 1625895
Projected CRS: WGS 84 / UTM zone 51N
Working with line data in sf
First, let’s have a look at the different rail routes we have available:
#Just using the largest layer to set the bounding box:bbox <-st_bbox(man_grava)#Getting a basemap from OpenStreetMap:manila_osm <-get_tiles(bbox, provider="OpenStreetMap")#Because R will draw layers in the order they are written, we generally#want to code our layers in reverse order of extent (i.e., most to least#extensive):ggplot(data=manila_osm) +geom_spatraster_rgb(data=manila_osm) +geom_sf(data=man_grava, color ="blue", linewidth =0.75) +geom_sf(data=man_utsmma, color ="yellow", linewidth =0.75) +geom_sf(data=man_malabon, color ="darkgreen", linewidth =0.5) +geom_sf(data=man_tranvia, color ="darkgreen", linewidth =0.5) +coord_sf(crs =st_crs(manila_osm)) +theme_minimal()
So we can see that the line plans are similar, but not identical. We could also try plotting this while making the Sigmund Grava lines wider, so we could see them even in the sections where they overlap with the UTSMMA plan lines.
You might also have noticed that we’re doing some work with the display colors for the lines in the above code. Previously, we were using colors to symbolize variables, so we were assigning them through an aes() function. Here, we’re just directly telling ggplot() what colors to use for each layer.
One cool thing about doing this in RStudio is that it will actually highlight color words with the color that will show up, giving you a bit of a preview of what the plot will look like.
There are lots of ways to reference colors in R, including commonly used RGB and Hex codes. When starting out, though, I tend to like to just use the color words. You can find all the color words for Rat this link.
Task 1: Create a new map showing the line features in the above map in different colors, with a different basemap.
Reshaping and tidying data in tidyverse
Now let’s have a look at the two polygon objects we read in. First, here’s a look at the data they contain:
man_cent_city_pop
Simple feature collection with 14 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 278137.2 ymin: 1608191 xmax: 292969.5 ymax: 1619315
Projected CRS: WGS 84 / UTM zone 51N
First 10 features:
OBJECTID_1 OBJECTID BRGY_CODE RegNum
1 1 3818 133908010 NATIONAL CAPITAL REGION
2 2 3815 133914033 NATIONAL CAPITAL REGION
3 3 3641 133905028 NATIONAL CAPITAL REGION
4 4 4624 137606008 NATIONAL CAPITAL REGION
5 5 3990 133903018 NATIONAL CAPITAL REGION
6 6 3639 133906059 NATIONAL CAPITAL REGION
7 7 3997 133912034 NATIONAL CAPITAL REGION
8 8 3657 133901120 NATIONAL CAPITAL REGION
9 9 3809 133902008 NATIONAL CAPITAL REGION
10 10 4529 133904001 NATIONAL CAPITAL REGION
RegName RegShort ProvName MuniName
1 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
2 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
3 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
4 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION PATEROS
5 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
6 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
7 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
8 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
9 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
10 NATIONAL CAPITAL REGION NCR NATIONAL CAPITAL REGION CITY OF MANILA
BrgyName CODE DISTRICT DISTRICT_O Shape_Leng
1 Barangay 668 133908010 ERMITA ERMITA 7405.860
2 Barangay 777 133914033 SANTA ANA SANTA ANA 11966.471
3 Barangay 326 133905028 SANTA CRUZ STA. CRUZ 13286.208
4 Santo Rosario-Kanluran 137606008 PATEROS PATEROS 7393.443
5 Barangay 309 133903018 QUIAPO QUIAPO 4580.653
6 Barangay 453 133906059 SAMPALOC SAMPALOC 15781.047
7 Barangay 867 133912034 PANDACAN PANDACAN 7240.278
8 Barangay 120 133901120 TONDO TONDO 28456.008
9 Barangay 294 133902008 BINONDO BINONDO 3842.775
10 Barangay 268 133904001 SAN NICOLAS SAN NICOLAS 8525.471
Shape_Area NIPADIST Pop_1876 Pop_1818 geom
1 1397424.4 0 7051 3510 MULTIPOLYGON (((283305.2 16...
2 3260249.9 0 2554 4355 MULTIPOLYGON (((284105.8 16...
3 2941221.3 1 11300 5331 MULTIPOLYGON (((283108.2 16...
4 1955653.8 0 7522 3840 MULTIPOLYGON (((291724.9 16...
5 741649.4 0 6480 3468 MULTIPOLYGON (((282495.3 16...
6 7300498.9 1 7090 3357 MULTIPOLYGON (((285671.9 16...
7 1521772.5 0 4874 2469 MULTIPOLYGON (((285516.2 16...
8 9783185.5 1 21038 14610 MULTIPOLYGON (((282935.3 16...
9 595745.6 0 23467 21386 MULTIPOLYGON (((281963.8 16...
10 1703069.6 NA NA NA MULTIPOLYGON (((281646.5 16...
This is pretty cool! We can see the populations within the central city in the early and late 1800s, and we can see populations for districts across the region at several time points in the 20th century.
It might be interesting to look at these population maps side-by-side, and there is a nice trick, called facetting, that allows you to use ggplot() to do things like that. Facetting allows you to split a data table (including spatial data) into multiple plots based on a categorical variable.
To use facetting, however, we need to have our data in a long format. Right now, our data are in a wide format.
What’s the difference? Well, in a wide format, each unit in the data appears in only one row. In the man_cent_city_pop object, for example, each neighborhood shows up only once, and we have differenct columns showing population in different years. In a long format, by contrast, lines would be grouped by units and years, so each unit would appear for each year we have an observation.
To make this clearer, here’s an example of data in a wide format:
Person
Sleepiness at 08:00 AM
Sleepiness at 10:00 AM
Sleepiness at 12:00 PM
Sleepiness at 02:00 PM
Caleb
Sleepy
Alert
Very Alert
Very Sleepy
Rodrigo
Alert
Very Alert
Sleepy
Alert
And here’re the same data in a long format:
Person
Time
Sleepiness
Caleb
08:00 AM
Sleepy
Caleb
10:00 AM
Alert
Caleb
12:00 PM
Very Alert
Caleb
02:00 PM
Very Sleepy
Rodrigo
08:00 AM
Alert
Rodrigo
10:00 AM
Very Alert
Rodrigo
12:00 PM
Sleepy
Rodrigo
02:00 PM
Alert
The good news is that it’s pretty easy to get data into a long format using a tidyverse function called pivot_longer(). Here’s how it works:
man_cent_city_pop_long <- man_cent_city_pop %>%#select() allows you to grab only specific columns from a table:select(DISTRICT, Pop_1876, Pop_1818) %>%#we just need to tell it what columns to pivotpivot_longer(cols =c("Pop_1876", "Pop_1818"))man_cent_city_pop_long
So we could work from there, but it would be nice to get rid of those Pop_ prefixes in front of the population values. Luckily, tidyverse has a ton of helpful functions for dealing with text entries (we call them strings in R) in data.
We have at least two options for extracting the actual years from the name column in the man_cent_city_pop_long object. On the one hand, we could use the str_remove() function, which will erase specified text from a string. On the other, we could use str_sub(), which extracts text in a specified position in a string. Let’s look at how they both work:
#we can use the $ operator to extract a specific column from a tableman_cent_city_pop_long$name %>%str_remove(fixed("Pop_")) #fixed() tells R to remove EXACTLY the text
man_cent_city_pop_long$name %>%#str_sub extracts characters based on their position in a string; because#the year follows four characters - Pop_, we want to fifth through eighth#charactersstr_sub(5,8)
You can also use str_sub() to count backwards from the final character in a string, which counts as -1. So we could also accomplish our objective this way:
Simple feature collection with 28 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 278137.2 ymin: 1608191 xmax: 292969.5 ymax: 1619315
Projected CRS: WGS 84 / UTM zone 51N
# A tibble: 28 × 5
DISTRICT geom name value year
* <chr> <MULTIPOLYGON [m]> <chr> <int> <chr>
1 ERMITA (((283305.2 1614147, 283304.8 1614108, 283251.5… Pop_… 7051 1876
2 ERMITA (((283305.2 1614147, 283304.8 1614108, 283251.5… Pop_… 3510 1818
3 SANTA ANA (((284105.8 1612536, 284111.9 1612559, 284112.2… Pop_… 2554 1876
4 SANTA ANA (((284105.8 1612536, 284111.9 1612559, 284112.2… Pop_… 4355 1818
5 SANTA CRUZ (((283108.2 1617889, 283053.9 1617891, 283050.1… Pop_… 11300 1876
6 SANTA CRUZ (((283108.2 1617889, 283053.9 1617891, 283050.1… Pop_… 5331 1818
7 PATEROS (((291724.9 1609870, 291787.3 1609868, 291799.6… Pop_… 7522 1876
8 PATEROS (((291724.9 1609870, 291787.3 1609868, 291799.6… Pop_… 3840 1818
9 QUIAPO (((282495.3 1614532, 282449.1 1614504, 282422.2… Pop_… 6480 1876
10 QUIAPO (((282495.3 1614532, 282449.1 1614504, 282422.2… Pop_… 3468 1818
# ℹ 18 more rows
Facetting in ggplot()
Okay, now we’ve got our variable for facetting, let’s make our plot:
ggplot(data = man_cent_city_pop_long, aes(fill = value)) +geom_sf() +scale_fill_viridis_c("Population") +#there are two ways to facet: grid and wrap; grid is best if we're grouping#by two variables, so we'll use wrap here. We define the facets using a ~.#We can also set the number of columns and rows to use, though with only#two time points, there isn't much need here:facet_wrap(~year) +annotation_scale(location ="br") +theme_minimal()
Task 2: Using these techniques, make a series of maps showing populations in the metropolitan Manila region over the 20th Century using the data in the man_reg_pop object. When you create the long version of the man_reg_pop object, be sure to call it man_reg_pop_long, because the rest of the code will assume that’s what you’ve named it. Your result should look something like this:
Vector geoprocessing with the sf package
Okay, so now we’ve practiced our map skills a bit more, let’s get into the real focus of today’s tutorial. We’re going to look at how we can chain together several different geoprocessing tools to try to analyze how well Manila’s residents have been served by rail transportation over time.
Objective: Estimate the share of Manila’s population that was within the walkshed of Tranvia lines in late 19th century Manila.
Walkshed or pedestrian sheds are the areas within an acceptable walking distance of a site of interest, such as a rail line. Based on this recent article in the Philippine Transportation Journal the typical walkshed in Manila is probably about 800 meters, which turns out to be a pretty common walkshed distance in other cities, as well.
How we do it:
Loading required package: DiagrammeR
Warning: package 'DiagrammeR' was built under R version 4.3.3
Okay, so here we go!
Buffering in sf
Buffering in the sf package is handled with the st_buffer() function. It takes two arguments: first, the layer whose features you want to buffer, and, second, the distance of the buffer, in the units used by the layer’s projection, unless the layer is in latitude and longitude, in which case the units will be in meters.
Let’s use the st_crs() function to check the object’s projection:
st_crs(man_tranvia)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Okay; this is in WGS 1984, so in this case, the units are meters:
tranvia_walkshed <-st_buffer(man_tranvia, dist =800)#Simple way to make a quick plot to check how things are going:plot(st_geometry(tranvia_walkshed))
Hmmm . . . why all the different circles? Well, st_buffer() buffers each individual object in any layer you run it on, and the Tranvia layer is broken into lots of little line segments:
plot(man_tranvia[,c("Name", "geom")])
Because all those colors are considered different entities, we need to dissolve the Tranvia layer before buffering it. Dissolving can be accomplished with the summarise() function, which is a general-use function for aggregating data in tables. Just so you understand how it works, let’s demonstrate its most common use, before using it to dissolve our lines.
In this example, we’ll use the group_by() function to split the man_reg_pop_long data into groups by year and the summarise() function to add up the population in each year so we can see changes in Greater Manila’s population over time:
man_reg_pop_long %>%group_by(year) %>%#This next line removes the polygon data from the table, becuase we don't#need it right now:st_drop_geometry() %>%summarise(pop =sum(value))
# A tibble: 6 × 2
year pop
<chr> <dbl>
1 1903 327283
2 1939 993889
3 1948 1569128
4 1960 2462488
5 1970 3966695
6 1975 4970006
In addition to allowing us to summarize values across groups, summarise() will also aggregate the geometries for each observation in a group if the table it is applied to is an sf object. Let’s say we wanted to use this to aggregate all the polygons in the man_cent_city_pop object:
Notice that this time we didn’t use group_by(). If we don’t have groups. summarise() will just dissolve the entire object into a single polygon or line.
So now we know how to address the dissolving issue. Here we go:
Now that we’ve constructed our buffer, we’re ready to compute the original areas of the central Manila districts. The reason that we’re doing this computation is that we’re going to use the proportion of the area of a district within the Tranvia walkshed to estimate the population there. Now, we should be very clear that this approach assumes that population is spread evenly throughout each district, which is quite the assumption. Still, this is often the best technique we have available for using data collected for specific areal units to estimate values for other units.
To compute areas of sf objects, we just need to use the st_area() function. The function will return a vector of computed areas in map units. So let’s check this object’s projection, just to be safe:
st_crs(man_cent_city_pop)
Coordinate Reference System:
User input: WGS 84 / UTM zone 51N
wkt:
PROJCRS["WGS 84 / UTM zone 51N",
BASEGEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 51N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",123,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Between 120°E and 126°E, northern hemisphere between equator and 84°N, onshore and offshore. China. Indonesia. Japan. North Korea. Philippines. Russian Federation. South Korea. Taiwan."],
BBOX[0,120,84,126]],
ID["EPSG",32651]]
Okay, so this is in a system called Universal Trans Mercator, or UTM. It’s basically a large grid superimposed on the planet, where each part of the grid has its own projection in meters. It’s often used for GPS applications.
Because we’re working with a projection in meters, the areas will be in square meters:
Now we’re ready to intersect the Tranvia buffer and the districts. Intersection is done with the st_intersection() function and returns an object that has data from both intersected objects:
Error in geos_op2_geom("intersection", x, y, ...): st_crs(x) == st_crs(y) is not TRUE
head(tranvia_walkshed_int)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'tranvia_walkshed_int' not found
Reprojecting in sf
Oh, right! We forgot that we have one object that is in WGS 1984 and one that is in a local UTM projection. Before we can intersect them, we’ll need to get them in the same projection.
To change an sf object’s projection, we use the st_transform() function. All we need to do is tell it what object we want to reproject, and what CRS we want to reproject it into.
Because one of our projections is in meters and we’ll be doing further spatial analysis, we should reproject that object that is currently in WGS 1984 into the same projection as the object that’s in meters. Here’s how:
tranvia_walkshed <- tranvia_walkshed %>%st_transform(crs =st_crs(man_cent_city_pop))#Check the resultst_crs(tranvia_walkshed)
Coordinate Reference System:
User input: WGS 84 / UTM zone 51N
wkt:
PROJCRS["WGS 84 / UTM zone 51N",
BASEGEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 51N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",123,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Between 120°E and 126°E, northern hemisphere between equator and 84°N, onshore and offshore. China. Indonesia. Japan. North Korea. Philippines. Russian Federation. South Korea. Taiwan."],
BBOX[0,120,84,126]],
ID["EPSG",32651]]
You’ll notice that we used st_crs() within the st_transform() function to extract the CRS information for our target object. That’s an easy way to make sure that you are telling Rexactly which CRS you want to reproject into.
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
head(tranvia_walkshed_int)
Simple feature collection with 6 features and 18 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 281798.2 ymin: 1611805 xmax: 287101.9 ymax: 1618602
Projected CRS: WGS 84 / UTM zone 51N
OBJECTID_1 OBJECTID BRGY_CODE RegNum RegName
1 1 3818 133908010 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
2 2 3815 133914033 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
3 3 3641 133905028 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
4 5 3990 133903018 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
5 6 3639 133906059 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
6 7 3997 133912034 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
RegShort ProvName MuniName BrgyName CODE
1 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 668 133908010
2 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 777 133914033
3 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 326 133905028
4 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 309 133903018
5 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 453 133906059
6 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 867 133912034
DISTRICT DISTRICT_O Shape_Leng Shape_Area NIPADIST Pop_1876 Pop_1818
1 ERMITA ERMITA 7405.860 1397424.4 0 7051 3510
2 SANTA ANA SANTA ANA 11966.471 3260249.9 0 2554 4355
3 SANTA CRUZ STA. CRUZ 13286.208 2941221.3 1 11300 5331
4 QUIAPO QUIAPO 4580.653 741649.4 0 6480 3468
5 SAMPALOC SAMPALOC 15781.047 7300498.9 1 7090 3357
6 PANDACAN PANDACAN 7240.278 1521772.5 0 4874 2469
area_dist geom
1 1397424.4 [m^2] MULTIPOLYGON (((283304.8 16...
2 3260249.9 [m^2] MULTIPOLYGON (((285085.2 16...
3 2941221.3 [m^2] POLYGON ((282862.8 1618400,...
4 741649.4 [m^2] MULTIPOLYGON (((282449.1 16...
5 7300498.9 [m^2] MULTIPOLYGON (((287018.4 16...
6 1521772.5 [m^2] MULTIPOLYGON (((285915.9 16...
Shoot! Now we have another problem - we have an invalid polygon in our intersection layer. Invalid polygons are pretty common - they usually happen when someone accidentally puts a polygon vertex inside the polygon itself. Fortunately, we can use the st_make_valid() function to try to fix this:
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
head(tranvia_walkshed_int)
Simple feature collection with 6 features and 18 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 281798.2 ymin: 1611805 xmax: 287101.9 ymax: 1618602
Projected CRS: WGS 84 / UTM zone 51N
OBJECTID_1 OBJECTID BRGY_CODE RegNum RegName
1 1 3818 133908010 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
2 2 3815 133914033 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
3 3 3641 133905028 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
4 5 3990 133903018 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
5 6 3639 133906059 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
6 7 3997 133912034 NATIONAL CAPITAL REGION NATIONAL CAPITAL REGION
RegShort ProvName MuniName BrgyName CODE
1 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 668 133908010
2 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 777 133914033
3 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 326 133905028
4 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 309 133903018
5 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 453 133906059
6 NCR NATIONAL CAPITAL REGION CITY OF MANILA Barangay 867 133912034
DISTRICT DISTRICT_O Shape_Leng Shape_Area NIPADIST Pop_1876 Pop_1818
1 ERMITA ERMITA 7405.860 1397424.4 0 7051 3510
2 SANTA ANA SANTA ANA 11966.471 3260249.9 0 2554 4355
3 SANTA CRUZ STA. CRUZ 13286.208 2941221.3 1 11300 5331
4 QUIAPO QUIAPO 4580.653 741649.4 0 6480 3468
5 SAMPALOC SAMPALOC 15781.047 7300498.9 1 7090 3357
6 PANDACAN PANDACAN 7240.278 1521772.5 0 4874 2469
area_dist geom
1 1397424.4 [m^2] MULTIPOLYGON (((282863.7 16...
2 3260249.9 [m^2] MULTIPOLYGON (((285085.2 16...
3 2941221.3 [m^2] POLYGON ((282862.8 1618400,...
4 741649.4 [m^2] MULTIPOLYGON (((282422.2 16...
5 7300498.9 [m^2] MULTIPOLYGON (((287018.4 16...
6 1521772.5 [m^2] MULTIPOLYGON (((285915.9 16...
Great! Now we need to compute the proportion of the original district’s area that is within the walkshed and multiply the district population by that proportion to estimate the population of each district in the walkshed. Since we’re looking at the Tranvia system as it was circa 1905, let’s use the Pop_1876 column, as this is the closest population estimate we have:
See that NA in there? That’s a missing value - this is an area that we just didn’t have population data for in the two time periods.
So now we can get our (very!) rough estimate by adding up the walkshed populations. We can then compare that to the total population to get some context:
#na.rm = TRUE tells R to ignore missing valuessum(tranvia_walkshed_int$walkshed_pop_1876, na.rm=TRUE)
86429.79 [1]
sum(tranvia_walkshed_int$Pop_1876, na.rm=TRUE)
[1] 100803
So this looks pretty good - about 86% of the central city’s population are estimated to be within the Tranvia walkshed.
But what about the rest of the area? The man_reg_pop object has population estimates for municipalities in greater Manila as of 1903 - how were they served by the Tranvia as it was in 1905? The following codeblock reproduces the method we just used for the central city to look at the whole region using the 1903 data. I’m putting it all together in one code block so you can have a better sense of how it flows:
#| error: true#This object is in WGS 1984, so we need to project it in order to compute#areas properly:man_reg_pop <- man_reg_pop %>%st_transform(crs =st_crs(tranvia_walkshed))#Now we can compute the areas; unfortunately, it's hard to use st_area()#in a pipeline, which is why I broke the flow here:man_reg_pop_walkshed <- man_reg_pop %>%mutate(area_muni =st_area(man_reg_pop)) %>%st_intersection(tranvia_walkshed)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Shoot another invalid polygon! Well, we know how to fix that, now:
#This object is in WGS 1984, so we need to project it in order to compute#areas properly:man_reg_pop <- man_reg_pop %>%st_transform(crs =st_crs(tranvia_walkshed)) %>%st_make_valid()#Now we can compute the areas; unfortunately, it's hard to use st_area()#in a pipeline, which is why I broke the flow here:man_reg_pop_walkshed <- man_reg_pop %>%mutate(area_muni =st_area(man_reg_pop)) %>%st_intersection(tranvia_walkshed)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Task 3: Wow! That was a lot of stuff. Take the code in the previous codeblock and add line-by-line annotations to it explaining what each line is doing and why. Be sure to note any places where things are slightly different from the way we did it in the prevous section.
Areal interpolation using the st_interpolate_aw() function
In the previous section, we explored the areal interpolation process step by step, in part because it’s a great way to demonstrate how to perform a bunch of the basic vector geoprocessing operations in R. But, as you can see from the code block above, it does require quite a few steps.
Fortunately, there is a function st_interpolate_aw(), that does most of the work for us. Here’re the function’s arguments: * x: the sf object whose data we want to interpolate to another geometry (so in this case, this would be man_reg_pop) * to: the sf object to whose geometries we want to interpolate the data from “x” (in this case, tranvia_walkshed) * extensive: a TRUE/FALSE flag that indicates whether or not the data to interpolate are extensive, which means they are a simple count, like population or (if we set it to FALSE), intensive, which means it is a rate, like percentage (we’ll look at intensive variables more in future tutorials)
Okay, let’s test this out:
#Notice that we need to specify what variable or variables we want to#interpolate from the sf object by extracting only those columns:man_reg_pop_walkshed <-st_interpolate_aw(x = man_reg_pop[,"P1903"],to = tranvia_walkshed,extensive =TRUE)
Warning in st_interpolate_aw.sf(x = man_reg_pop[, "P1903"], to =
tranvia_walkshed, : st_interpolate_aw assumes attributes are constant or
uniform over areas of x
#The output object will keep the original object's variable name:man_reg_pop_walkshed$P1903
[1] 175687.2
As you can see, the output calclation is identical to the long method. So cool!
Task 4: Using the methods you have been taught so far, estimate the percentage of the greater Manila population that would have been in the walkshed of the light rail lines proposed in both the Sigurd Grava and UTSMMA plans. How do these compare with how the Tranvia was doing as of 1905?
Task 5: Create a polished map comparing the Tranvia and the 1970s plans.