2 + 2 #Any part of a line in R that starts with a # will be ignored with the code is run.[1] 4
#We use lines (or parts of lines) that start with # to make notes in the code.In this tutorial, have a gentle introduction to coding in R and help you get familiar with how the system works.
It is important to remember that writing code in R is very tricky - every capital letter and punctuation mark needs to be in the right place, or the code won’t work.
Luckily, the easiest way to learn to code in R is to look at examples and then to switch out parts of those examples to see what happens. You can use that strategy to learn how the R language works for the tasks that you are trying to accomplish. That’s the strategy that I’m going to be using to teach you in this class.
The interesting thing about R is that pretty much everything you need can be written in R scripts. The R script is a series of commands for R that it follows to conduct analysis, build maps, and so on.
Scripts are really useful because they are also a record of every step of your analysis. That means that if you wind up making a mistake that you don’t notice until much later, all you need to do is find the source of the problem in your script, fix it, and rerun your code from there. That’s a lot better than having to redo all your work from scratch!
I’m using a special format of R scripts called a Quarto Document, which allows you to combine the script with HTML text. When I generate the HTML, it runs all the code blocks in the document in order and puts the results below. That means you can check what you’re getting when you run the code against what I got when I ran it.
Quarto Documents are handy for making class materials, but they can also be used to make general websites or even presentations. We’ll see how to use those in this class.
R languageR code in RStudioRSome of the data for this tutorial will be downloaded directly using R, but there are two datasets that we will reading in from a file:
These Geotiffs are available for download in this zip file.
Let’s start with your first line of code. You can copy and paste the following into your script editor in RStudio to run it:
2 + 2 #Any part of a line in R that starts with a # will be ignored with the code is run.[1] 4
#We use lines (or parts of lines) that start with # to make notes in the code.Once you have the code pasted in your script editor, you highlight the code with the cursor and then click “Run”, which is in the upper right-hand corner of this script editor window, to run it. Try running the code above now.
Congratulations! That’s your first code. This is pretty basic. Among many other things, R works as a simple calculator. It also obeys standard order of operations rules. Run the lines below for examples:
2 + (3 * 2)[1] 8
(2 + 3) * 2[1] 10
4 / 2[1] 2
(4 + 2) / 2[1] 3
4 + (2 / 2)[1] 5
Okay, so you get the idea!
R is an object-oriented programming language. What that means is that R works by creating snippets of information in the computer memory that we call “objects” and then doing things to those objects.
I often think about the data that I am working on almost like a physical structure that I’m using R to build and manipulate. Another helpful metaphor might be to think of objects as little digital creatures that you use R to communicate with.
However it feels easiest to think about it, let’s introduce your first object by running the following line:
x <- 5The <- sign in that line is the assignment function. In R, functions are pieces of code that do things to objects. In those terms, all the mathematical symbols we used above would count as functions.
The assignment function takes whatever is on the right-hand side of the function and creates an object according to the instructions on the left-hand side that contains the thing on the right-hand side.
In this case, we created an object we called x and gave it the content of the number 5. We can then type in the name of the object to ask R to print back its contents:
x[1] 5
Importantly, R is case sensitive. That means it cares whether or not letters are capitalized. Try running this:
XError in eval(expr, envir, enclos): object 'X' not found
R should have given you this message: “Error: object ‘X’ not found”. When R says an object is not found, it’s telling you that something that you think is read into the environment (that is, the collection of objects that R has in memory at the moment) isn’t actually there.
If you look in the Environment tab in RStudio, which is probably to the right of your script editing window, you can see a list of all the objects that you have created in the environment. You should see that there is an object called “x” there. Notice that “x” isn’t capitalized. That’s why R can’t find it. R considers x and X to be completely different.
You can pack pretty much anything that R can hold in memory into an object. We’ll be putting whole layers of spatial data into objects soon! For now, let me just show you a bit of how this works:
x <- c(1, 5, 10)In this line, we used the c, or concatenate function, which takes a list of items separated by commas and puts them into a single object called a vector (not to be confused with the idea of a vector layer in GIS), which is just a line of data:
x[1] 1 5 10
The parentheses and commas are really, really important in R. Commas separate different inputs into a function. We call these separate inputs arguments. The parentheses tell R where the arguments for each function start and end.
See what happens if we mess this up:
x <- c(1 5, 10)
x <- c(1, 5) 10Error: <text>:1:10: unexpected numeric constant
1: x <- c(1 5
^
Both of those situations cause R to say Error: unexpected . . .. When you see an error that starts that way, it’s R’s way of telling you that there’s something wrong with the way you typed your code and that things are out of order.
RStudio actually helps you keep track of coding problems. In the first of the two lines above, for example, you should see a red underline under 5. That’s RStudio telling you that something fishy is going on with your code there (in this case, you’re missing the comma after 1).
Okay, let’s get our x back in working order:
x <- c(1, 2, 3, 4, 5, 10)Once you’ve created a vector, you can use functions to perform operations on all the elements of the vector at once:
x <- c(1, 5, 10)
x + 2[1] 3 7 12
x * 2[1] 2 10 20
x ^ 2[1] 1 25 100
x / 2[1] 0.5 2.5 5.0
You can also use statistical functions on the whole vector. Here are some common handy examples:
min(x)[1] 1
mean(x)[1] 5.333333
median(x)[1] 5
max(x)[1] 10
sd(x) # Standard Deviation[1] 4.50925
summary(x) Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 3.000 5.000 5.333 7.500 10.000
To turn R into a powerful GIS engine, we need to add some packages Packages in R add additional functionality to your code, similar to plugins in QGIS. The code below will help you install and load the packages we’ll be using today:
require(rnaturalearth) # Allows you to import vector layers from Natural EarthLoading required package: rnaturalearth
Warning: package 'rnaturalearth' was built under R version 4.3.3
require(tidyverse) # Useful data cleaning, management, and plotting functionsLoading required package: tidyverse
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
require(sf) # Functions for vector data reading, writing, and geoprocessingLoading required package: sf
Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
require(ggmap) # Functions for making cool mapsLoading required package: ggmap
ℹ Google's Terms of Service: <https://mapsplatform.google.com>
Stadia Maps' Terms of Service: <https://stadiamaps.com/terms-of-service/>
OpenStreetMap's Tile Usage Policy: <https://operations.osmfoundation.org/policies/tiles/>
ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
require(terra) # Functions for raster input, output, and geoprocessingLoading required package: terra
Warning: package 'terra' was built under R version 4.3.3
terra 1.7.78
Attaching package: 'terra'
The following object is masked from 'package:ggmap':
inset
The following object is masked from 'package:tidyr':
extract
R reads in data using information on the paths to files on your computer. It’s helpful to set the working directory, which gives R a default folder for reading and writing data for the session.
You can set the working directory to the folder where your script and data are located using the below code. But how do you find the PATH_TO_FILE? What I recommend for beginners is to save your R script and all the data you are using into a single folder somewhere on your computer. Then, if you are in RStudio, you can click Session -> Set Working Directory -> To Source File Location. RStudio will automatically generate a line of code like the one below. Then you can copy and paste that code into your script:
setwd("/cloud/project/Intro_Geospatial_in_R_Data")Error in setwd("/cloud/project/Intro_Geospatial_in_R_Data"): cannot change working directory
Once you have set your working directory, you can read in the PM2.5 raster layer:
pm25 <- rast("GlobalPM252020.tif")
plot(pm25)Map algebra is very straightforward in the terra package, which is now the main workhorse package for working with raster data in R. terra allows you to treat the entire raster as if it were just another value, so you can perform operations on each pixel in a raster by typing some simple code.
For example, we can use a conditional statement to create a new raster showing only the parts of the world that fail to meet the US EPA’s maximum safe standard for PM 2.5 annual average levels (12 micrograms per cubic meter):
abvepa <- pm25 >= 12
plot(abvepa)We can also rescale the PM2.5 raster in terms of the percentage of the EPA’s maximum:
epapct <- (pm25 / 12) * 100
plot(epapct)Now we’re going to use our population raster to estimate exposure to unsafe levels of PM 2.5. To start, we need to read in the gridded population layer:
pop <- rast("gpw2015.tif")
plot(pop)You’ll notice that it’s difficult to see much here – that’s because global population is very unevenly distributed, so some pixels in the raster have huge numbers of people, while many have very low values. One way to address this would be to covert these into orders of magnitude by computing a logarithm.
We can chain together series of functions where the output of one function is used as the input of another. This is accomplished with the pipeline operator: %>%. If you place the pipeline after a function, it will take that function’s output and feed it in as the first argument in the following function.
Using pipelines is the preferred way to write R code with sequences of operations because it makes the step clearer, almost like making your own flowchart in the code.
pop %>% #We start with the object that has the data we will be using
log(base=10) %>% #Then we do one operation
plot() #And then anotherBecause we’ll be making comparisons across rasters, we need to make sure that the two raster datasets are in the same coordinate reference system (CRS).
The pm25 raster is in latitude and longitude with a WGS 1984 datum, and we’ll set that first. We can look up WGS 1984 at (spatialreference.org)[spatialreference.org], and we can see that it has an EPSG code of 4326.
We can use the crs() function to set or print out CRS information for a terra raster object, like this:
crs(pm25) <- "epsg:4326"
crs(pm25)[1] "GEOGCRS[\"WGS 84\",\n ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n MEMBER[\"World Geodetic System 1984 (Transit)\"],\n MEMBER[\"World Geodetic System 1984 (G730)\"],\n MEMBER[\"World Geodetic System 1984 (G873)\"],\n MEMBER[\"World Geodetic System 1984 (G1150)\"],\n MEMBER[\"World Geodetic System 1984 (G1674)\"],\n MEMBER[\"World Geodetic System 1984 (G1762)\"],\n MEMBER[\"World Geodetic System 1984 (G2139)\"],\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[2.0]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"World.\"],\n BBOX[-90,-180,90,180]],\n ID[\"EPSG\",4326]]"
Now we need to check that the pop raster is in the same CRS:
crs(pop)[1] "GEOGCRS[\"WGS 84\",\n ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n MEMBER[\"World Geodetic System 1984 (Transit)\"],\n MEMBER[\"World Geodetic System 1984 (G730)\"],\n MEMBER[\"World Geodetic System 1984 (G873)\"],\n MEMBER[\"World Geodetic System 1984 (G1150)\"],\n MEMBER[\"World Geodetic System 1984 (G1674)\"],\n MEMBER[\"World Geodetic System 1984 (G1762)\"],\n MEMBER[\"World Geodetic System 1984 (G2139)\"],\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[2.0]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"World.\"],\n BBOX[-90,-180,90,180]],\n ID[\"EPSG\",4326]]"
Great! They are both in WGS 1984, so we can safely continue with our analysis. Resample the raster to ensure matching resolution:
Remember that we had an object above called abvepa, which covered all the areas where average annual PM 2.5 levels were above US EPA standards? Well, that is a binary (1/0) raster, there the 1 values are the places that above the EPA limits.
We can therefore use some map algebra to make a new raster that shows population in places above the EPA’s safe limits of PM 2.5, again converting population to powers of ten:
popabvepa <- pop * abvepaError: [*] extents do not match
plot(popabvepa)Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'popabvepa' not found
Now you should have come across the same error as above. What this error is telling you is that, despite that the rasters are in the same projection, they don’t cover exactly the same area, which they have to if we are to conduct map algebra with both of them.
We can have a look at the extent of each raster using the ext() function:
ext(abvepa)SpatExtent : -179.999996947, 179.999996947, -55.000000762, 67.999996947 (xmin, xmax, ymin, ymax)
ext(pop)SpatExtent : -180, 180, -90, 89.9999999999999 (xmin, xmax, ymin, ymax)
When you compare the two rasters, is clear that abvepa covers less of the planet’s surface than the pop raster does.
That means we’ll need to crop() (that’s what terra calls clipping) the pop raster to match the abvepa raster before continuing.
pop_crop <- crop(pop, abvepa) #You start with the raster you are cropping, followed
#by the raster you are using to crop it.Now we should be able to do our map algebra:
popabvepa <- pop_crop * abvepaError: [*] number of rows and/or columns do not match
plot(popabvepa)Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'popabvepa' not found
Uh-oh! It looks like we’ve got another problem. The issue here is that to perform map algebra, rasters must have the same resolution, and overlapping pixels. That means that we need to convert one of the rasters to match the other.
To change the resolution of a raster, we use the resample() function, which applies your choice of several algorithms for estimating the value of one raster in the pixels of another. This is basically how we get rasters to talk to each other.
pop_resample <- resample(pop_crop, abvepa)Now we can do our calculation:
pop_abvepa <- log(pop_resample, base=10) * abvepa
plot(pop_abvepa)Now, we did this all step-by-step just so you could see how it works, but we could also have done all that as a pipeline, like this:
pop_resample <- pop %>%
crop(abvepa) %>%
resample(abvepa) %>%
log(base=10)
pop_abvepa <- pop_resample * abvepa
plot(pop_abvepa)So now you’ve got a sense of how raster data work in R, so it’s time to move on to vector data. Just like terra is the workhorse for raster data in R, the Simple Features, or sf, package is the workhorse for vector data, so we’ll be using it a bit here.
The nice thing is that sf and terra are designed to play well together, so it’s pretty simple to combine both raster and vector data in any projects that you work on that use R if you’re working with those two packages.
First, we need some vector data to work with. We’ll use the rnaturalearth package to import some country boundaries directly from (https://www.naturalearthdata.com/)[https://www.naturalearthdata.com/], which provides some free layers for basic mapping.
We can use the ne_countries() function to import country boundaries from Natural Earth:
countries <- ne_countries(scale = "large", returnclass = "sf")Let’s have a look at what an sf object is made of. The nice thing is that R basically treats it like any other table with data in it. It’s just that one of the columns in the table contains the information required to draw the feature whose data make up the row in question.
Here’s how you can look at the first few lines of any tabular object in R:
head(countries)Simple feature collection with 6 features and 168 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -109.4537 ymin: -55.9185 xmax: 140.9776 ymax: 7.35578
Geodetic CRS: WGS 84
featurecla scalerank labelrank sovereignt sov_a3 adm0_dif level
1 Admin-0 country 0 2 Indonesia IDN 0 2
2 Admin-0 country 0 3 Malaysia MYS 0 2
3 Admin-0 country 0 2 Chile CHL 0 2
4 Admin-0 country 0 3 Bolivia BOL 0 2
5 Admin-0 country 0 2 Peru PER 0 2
6 Admin-0 country 0 2 Argentina ARG 0 2
type tlc admin adm0_a3 geou_dif geounit gu_a3 su_dif
1 Sovereign country 1 Indonesia IDN 0 Indonesia IDN 0
2 Sovereign country 1 Malaysia MYS 0 Malaysia MYS 0
3 Sovereign country 1 Chile CHL 0 Chile CHL 0
4 Sovereign country 1 Bolivia BOL 0 Bolivia BOL 0
5 Sovereign country 1 Peru PER 0 Peru PER 0
6 Sovereign country 1 Argentina ARG 0 Argentina ARG 0
subunit su_a3 brk_diff name name_long brk_a3 brk_name brk_group
1 Indonesia IDN 0 Indonesia Indonesia IDN Indonesia <NA>
2 Malaysia MYS 0 Malaysia Malaysia MYS Malaysia <NA>
3 Chile CHL 0 Chile Chile CHL Chile <NA>
4 Bolivia BOL 0 Bolivia Bolivia BOL Bolivia <NA>
5 Peru PER 0 Peru Peru PER Peru <NA>
6 Argentina ARG 0 Argentina Argentina ARG Argentina <NA>
abbrev postal formal_en formal_fr name_ciawf note_adm0
1 Indo. INDO Republic of Indonesia <NA> Indonesia <NA>
2 Malay. MY Malaysia <NA> Malaysia <NA>
3 Chile CL Republic of Chile <NA> Chile <NA>
4 Bolivia BO Plurinational State of Bolivia <NA> Bolivia <NA>
5 Peru PE Republic of Peru <NA> Peru <NA>
6 Arg. AR Argentine Republic <NA> Argentina <NA>
note_brk name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13
1 <NA> Indonesia <NA> 6 6 6 11
2 <NA> Malaysia <NA> 2 4 3 6
3 <NA> Chile <NA> 5 1 5 9
4 <NA> Bolivia <NA> 1 5 2 3
5 <NA> Peru <NA> 4 4 4 11
6 <NA> Argentina <NA> 3 1 3 13
pop_est pop_rank pop_year gdp_md gdp_year economy
1 270625568 17 2019 1119190 2019 4. Emerging region: MIKT
2 31949777 15 2019 364681 2019 6. Developing region
3 18952038 14 2019 282318 2019 5. Emerging region: G20
4 11513100 14 2019 40895 2019 5. Emerging region: G20
5 32510453 15 2019 226848 2019 5. Emerging region: G20
6 44938712 15 2019 445445 2019 5. Emerging region: G20
income_grp fips_10 iso_a2 iso_a2_eh iso_a3 iso_a3_eh iso_n3
1 4. Lower middle income ID ID ID IDN IDN 360
2 3. Upper middle income MY MY MY MYS MYS 458
3 3. Upper middle income CI CL CL CHL CHL 152
4 4. Lower middle income BL BO BO BOL BOL 068
5 3. Upper middle income PE PE PE PER PER 604
6 3. Upper middle income AR AR AR ARG ARG 032
iso_n3_eh un_a3 wb_a2 wb_a3 woe_id woe_id_eh woe_note
1 360 360 ID IDN 23424846 23424846 Exact WOE match as country
2 458 458 MY MYS 23424901 23424901 Exact WOE match as country
3 152 152 CL CHL 23424782 23424782 Exact WOE match as country
4 068 068 BO BOL 23424762 23424762 Exact WOE match as country
5 604 604 PE PER 23424919 23424919 Exact WOE match as country
6 032 032 AR ARG 23424747 23424747 Exact WOE match as country
adm0_iso adm0_diff adm0_tlc adm0_a3_us adm0_a3_fr adm0_a3_ru adm0_a3_es
1 IDN <NA> IDN IDN IDN IDN IDN
2 MYS <NA> MYS MYS MYS MYS MYS
3 CHL <NA> CHL CHL CHL CHL CHL
4 BOL <NA> BOL BOL BOL BOL BOL
5 PER <NA> PER PER PER PER PER
6 ARG <NA> ARG ARG ARG ARG ARG
adm0_a3_cn adm0_a3_tw adm0_a3_in adm0_a3_np adm0_a3_pk adm0_a3_de adm0_a3_gb
1 IDN IDN IDN IDN IDN IDN IDN
2 MYS MYS MYS MYS MYS MYS MYS
3 CHL CHL CHL CHL CHL CHL CHL
4 BOL BOL BOL BOL BOL BOL BOL
5 PER PER PER PER PER PER PER
6 ARG ARG ARG ARG ARG ARG ARG
adm0_a3_br adm0_a3_il adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma adm0_a3_pt
1 IDN IDN IDN IDN IDN IDN IDN
2 MYS MYS MYS MYS MYS MYS MYS
3 CHL CHL CHL CHL CHL CHL CHL
4 BOL BOL BOL BOL BOL BOL BOL
5 PER PER PER PER PER PER PER
6 ARG ARG ARG ARG ARG ARG ARG
adm0_a3_ar adm0_a3_jp adm0_a3_ko adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
1 IDN IDN IDN IDN IDN IDN IDN
2 MYS MYS MYS MYS MYS MYS MYS
3 CHL CHL CHL CHL CHL CHL CHL
4 BOL BOL BOL BOL BOL BOL BOL
5 PER PER PER PER PER PER PER
6 ARG ARG ARG ARG ARG ARG ARG
adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se adm0_a3_bd adm0_a3_ua adm0_a3_un
1 IDN IDN IDN IDN IDN IDN -99
2 MYS MYS MYS MYS MYS MYS -99
3 CHL CHL CHL CHL CHL CHL -99
4 BOL BOL BOL BOL BOL BOL -99
5 PER PER PER PER PER PER -99
6 ARG ARG ARG ARG ARG ARG -99
adm0_a3_wb continent region_un subregion
1 -99 Asia Asia South-Eastern Asia
2 -99 Asia Asia South-Eastern Asia
3 -99 South America Americas South America
4 -99 South America Americas South America
5 -99 South America Americas South America
6 -99 South America Americas South America
region_wb name_len long_len abbrev_len tiny homepart min_zoom
1 East Asia & Pacific 9 9 5 -99 1 0
2 East Asia & Pacific 8 8 6 -99 1 0
3 Latin America & Caribbean 5 5 5 -99 1 0
4 Latin America & Caribbean 7 7 7 -99 1 0
5 Latin America & Caribbean 4 4 4 -99 1 0
6 Latin America & Caribbean 9 9 4 -99 1 0
min_label max_label label_x label_y ne_id wikidataid name_ar
1 1.7 6.7 101.89295 -0.954404 1159320845 Q252 إندونيسيا
2 3.0 8.0 113.83708 2.528667 1159321083 Q833 ماليزيا
3 1.7 6.7 -72.31887 -38.151771 1159320493 Q298 تشيلي
4 3.0 7.5 -64.59343 -16.666015 1159320439 Q750 بوليفيا
5 2.0 7.0 -72.90016 -12.976679 1159321163 Q419 بيرو
6 2.0 7.0 -64.17333 -33.501159 1159320331 Q414 الأرجنتين
name_bn name_de name_en name_es name_fa name_fr name_el
1 ইন্দোনেশিয়া Indonesien Indonesia Indonesia اندونزی Indonésie Ινδονησία
2 মালয়েশিয়া Malaysia Malaysia Malasia مالزی Malaisie Μαλαισία
3 চিলি Chile Chile Chile شیلی Chili Χιλή
4 বলিভিয়া Bolivien Bolivia Bolivia بولیوی Bolivie Βολιβία
5 পেরু Peru Peru Perú پرو Pérou Περού
6 আর্জেন্টিনা Argentinien Argentina Argentina آرژانتین Argentine Αργεντινή
name_he name_hi name_hu name_id name_it name_ja name_ko
1 אינדונזיה इंडोनेशिया Indonézia Indonesia Indonesia インドネシア 인도네시아
2 מלזיה मलेशिया Malajzia Malaysia Malaysia マレーシア 말레이시아
3 צ'ילה चिली Chile Chili Cile チリ 칠레
4 בוליביה बोलिविया Bolívia Bolivia Bolivia ボリビア 볼리비아
5 פרו पेरू Peru Peru Perù ペルー 페루
6 ארגנטינה अर्जेण्टीना Argentína Argentina Argentina アルゼンチン 아르헨티나
name_nl name_pl name_pt name_ru name_sv name_tr name_uk
1 Indonesië Indonezja Indonésia Индонезия Indonesien Endonezya Індонезія
2 Maleisië Malezja Malásia Малайзия Malaysia Malezya Малайзія
3 Chili Chile Chile Чили Chile Şili Чилі
4 Bolivia Boliwia Bolívia Боливия Bolivia Bolivya Болівія
5 Peru Peru Peru Перу Peru Peru Перу
6 Argentinië Argentyna Argentina Аргентина Argentina Arjantin Аргентина
name_ur name_vi name_zh name_zht fclass_iso tlc_diff
1 انڈونیشیا Indonesia 印度尼西亚 印度尼西亞 Admin-0 country <NA>
2 ملائیشیا Malaysia 马来西亚 馬來西亞 Admin-0 country <NA>
3 چلی Chile 智利 智利 Admin-0 country <NA>
4 بولیویا Bolivia 玻利维亚 玻利維亞 Admin-0 country <NA>
5 پیرو Peru 秘鲁 秘魯 Admin-0 country <NA>
6 ارجنٹائن Argentina 阿根廷 阿根廷 Admin-0 country <NA>
fclass_tlc fclass_us fclass_fr fclass_ru fclass_es fclass_cn fclass_tw
1 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
2 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
3 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
4 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
5 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
6 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
fclass_in fclass_np fclass_pk fclass_de fclass_gb fclass_br fclass_il
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
fclass_ps fclass_sa fclass_eg fclass_ma fclass_pt fclass_ar fclass_jp
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
fclass_ko fclass_vn fclass_tr fclass_id fclass_pl fclass_gr fclass_it
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
fclass_nl fclass_se fclass_bd fclass_ua geometry
1 <NA> <NA> <NA> <NA> MULTIPOLYGON (((117.7036 4....
2 <NA> <NA> <NA> <NA> MULTIPOLYGON (((117.7036 4....
3 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-69.51009 -...
4 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-69.51009 -...
5 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-69.51009 -...
6 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-67.1939 -2...
Because the countries object is derived from a dataset intended for general purpose cartography, it has a bunch of columns listing the countries’ names in different languages. It also has a variety of regional and income classifications, population and GDP estimates and groupings that you can use for different color palettes to avoid plotting neighboring countries with the same color.
Admittedly, we don’t need most of this stuff, but we get it anyway, because Natural Earth data are intended to be useful for anybody, anywhere, who wants to make some simple maps.
Okay, so let’s have a look at what this looks like:
plot(countries[,c("sov_a3", "geometry")]) # Just picking one value from the table.# Otherwise, R will make a map of up to the first 9 variables - waaaay too much.We’ll be spending more time on vector data when we look at overlay functions next week, so let’s just wrap up for today with one example of the type of analysis you can do with vector and raster data using the terra package.
The pop_abvepa raster shows us the estimated population per pixel for every pixel that is above the US EPA’s recommended PM 2.5 limit. That means that we could add up all the pixels in a country to get an estimated number of people in the country living in areas that are, on average, above the limit.
Here’s how we can compute zonal statistics using the terra package. First, we need to convert our countries object from an sf object to the form of vector data terra uses, which is called a SpatVector:
countries_spatvec <- vect(countries)Now, because pop_abvepa is in orders of magnitude, we need to convert it back into regular numbers:
pop_abvepa_nums <- 10^pop_abvepaAnd now we can sum up those values within countries using the zonal function:
pop_abvepa_countries <- zonal(pop_abvepa_nums, countries_spatvec,
fun = "sum", na.rm=TRUE)
head(pop_abvepa_countries) gpw2015
1 34116802
2 3308483
3 3013887
4 1847165
5 5142253
6 6531246
So this gives you a table with one column: the population estimates corresponding to the order of the rows in countries. But what are those-Inf values? That’s what we get back if we have no pixels corresponding to a zone, so these are countries where the value is zero.
To compare them to the countries, though, we need to add them back into the countries object. Here’s how we can do that:
countries$pop_abvepa <- pop_abvepa_countries$gpw2015The above code creates a new column in the countries object, called pop_abvepa, and copies the pop_avepa_countries vector there.
You can see the new column by running:
head(countries)Simple feature collection with 6 features and 169 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -109.4537 ymin: -55.9185 xmax: 140.9776 ymax: 7.35578
Geodetic CRS: WGS 84
featurecla scalerank labelrank sovereignt sov_a3 adm0_dif level
1 Admin-0 country 0 2 Indonesia IDN 0 2
2 Admin-0 country 0 3 Malaysia MYS 0 2
3 Admin-0 country 0 2 Chile CHL 0 2
4 Admin-0 country 0 3 Bolivia BOL 0 2
5 Admin-0 country 0 2 Peru PER 0 2
6 Admin-0 country 0 2 Argentina ARG 0 2
type tlc admin adm0_a3 geou_dif geounit gu_a3 su_dif
1 Sovereign country 1 Indonesia IDN 0 Indonesia IDN 0
2 Sovereign country 1 Malaysia MYS 0 Malaysia MYS 0
3 Sovereign country 1 Chile CHL 0 Chile CHL 0
4 Sovereign country 1 Bolivia BOL 0 Bolivia BOL 0
5 Sovereign country 1 Peru PER 0 Peru PER 0
6 Sovereign country 1 Argentina ARG 0 Argentina ARG 0
subunit su_a3 brk_diff name name_long brk_a3 brk_name brk_group
1 Indonesia IDN 0 Indonesia Indonesia IDN Indonesia <NA>
2 Malaysia MYS 0 Malaysia Malaysia MYS Malaysia <NA>
3 Chile CHL 0 Chile Chile CHL Chile <NA>
4 Bolivia BOL 0 Bolivia Bolivia BOL Bolivia <NA>
5 Peru PER 0 Peru Peru PER Peru <NA>
6 Argentina ARG 0 Argentina Argentina ARG Argentina <NA>
abbrev postal formal_en formal_fr name_ciawf note_adm0
1 Indo. INDO Republic of Indonesia <NA> Indonesia <NA>
2 Malay. MY Malaysia <NA> Malaysia <NA>
3 Chile CL Republic of Chile <NA> Chile <NA>
4 Bolivia BO Plurinational State of Bolivia <NA> Bolivia <NA>
5 Peru PE Republic of Peru <NA> Peru <NA>
6 Arg. AR Argentine Republic <NA> Argentina <NA>
note_brk name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13
1 <NA> Indonesia <NA> 6 6 6 11
2 <NA> Malaysia <NA> 2 4 3 6
3 <NA> Chile <NA> 5 1 5 9
4 <NA> Bolivia <NA> 1 5 2 3
5 <NA> Peru <NA> 4 4 4 11
6 <NA> Argentina <NA> 3 1 3 13
pop_est pop_rank pop_year gdp_md gdp_year economy
1 270625568 17 2019 1119190 2019 4. Emerging region: MIKT
2 31949777 15 2019 364681 2019 6. Developing region
3 18952038 14 2019 282318 2019 5. Emerging region: G20
4 11513100 14 2019 40895 2019 5. Emerging region: G20
5 32510453 15 2019 226848 2019 5. Emerging region: G20
6 44938712 15 2019 445445 2019 5. Emerging region: G20
income_grp fips_10 iso_a2 iso_a2_eh iso_a3 iso_a3_eh iso_n3
1 4. Lower middle income ID ID ID IDN IDN 360
2 3. Upper middle income MY MY MY MYS MYS 458
3 3. Upper middle income CI CL CL CHL CHL 152
4 4. Lower middle income BL BO BO BOL BOL 068
5 3. Upper middle income PE PE PE PER PER 604
6 3. Upper middle income AR AR AR ARG ARG 032
iso_n3_eh un_a3 wb_a2 wb_a3 woe_id woe_id_eh woe_note
1 360 360 ID IDN 23424846 23424846 Exact WOE match as country
2 458 458 MY MYS 23424901 23424901 Exact WOE match as country
3 152 152 CL CHL 23424782 23424782 Exact WOE match as country
4 068 068 BO BOL 23424762 23424762 Exact WOE match as country
5 604 604 PE PER 23424919 23424919 Exact WOE match as country
6 032 032 AR ARG 23424747 23424747 Exact WOE match as country
adm0_iso adm0_diff adm0_tlc adm0_a3_us adm0_a3_fr adm0_a3_ru adm0_a3_es
1 IDN <NA> IDN IDN IDN IDN IDN
2 MYS <NA> MYS MYS MYS MYS MYS
3 CHL <NA> CHL CHL CHL CHL CHL
4 BOL <NA> BOL BOL BOL BOL BOL
5 PER <NA> PER PER PER PER PER
6 ARG <NA> ARG ARG ARG ARG ARG
adm0_a3_cn adm0_a3_tw adm0_a3_in adm0_a3_np adm0_a3_pk adm0_a3_de adm0_a3_gb
1 IDN IDN IDN IDN IDN IDN IDN
2 MYS MYS MYS MYS MYS MYS MYS
3 CHL CHL CHL CHL CHL CHL CHL
4 BOL BOL BOL BOL BOL BOL BOL
5 PER PER PER PER PER PER PER
6 ARG ARG ARG ARG ARG ARG ARG
adm0_a3_br adm0_a3_il adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma adm0_a3_pt
1 IDN IDN IDN IDN IDN IDN IDN
2 MYS MYS MYS MYS MYS MYS MYS
3 CHL CHL CHL CHL CHL CHL CHL
4 BOL BOL BOL BOL BOL BOL BOL
5 PER PER PER PER PER PER PER
6 ARG ARG ARG ARG ARG ARG ARG
adm0_a3_ar adm0_a3_jp adm0_a3_ko adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
1 IDN IDN IDN IDN IDN IDN IDN
2 MYS MYS MYS MYS MYS MYS MYS
3 CHL CHL CHL CHL CHL CHL CHL
4 BOL BOL BOL BOL BOL BOL BOL
5 PER PER PER PER PER PER PER
6 ARG ARG ARG ARG ARG ARG ARG
adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se adm0_a3_bd adm0_a3_ua adm0_a3_un
1 IDN IDN IDN IDN IDN IDN -99
2 MYS MYS MYS MYS MYS MYS -99
3 CHL CHL CHL CHL CHL CHL -99
4 BOL BOL BOL BOL BOL BOL -99
5 PER PER PER PER PER PER -99
6 ARG ARG ARG ARG ARG ARG -99
adm0_a3_wb continent region_un subregion
1 -99 Asia Asia South-Eastern Asia
2 -99 Asia Asia South-Eastern Asia
3 -99 South America Americas South America
4 -99 South America Americas South America
5 -99 South America Americas South America
6 -99 South America Americas South America
region_wb name_len long_len abbrev_len tiny homepart min_zoom
1 East Asia & Pacific 9 9 5 -99 1 0
2 East Asia & Pacific 8 8 6 -99 1 0
3 Latin America & Caribbean 5 5 5 -99 1 0
4 Latin America & Caribbean 7 7 7 -99 1 0
5 Latin America & Caribbean 4 4 4 -99 1 0
6 Latin America & Caribbean 9 9 4 -99 1 0
min_label max_label label_x label_y ne_id wikidataid name_ar
1 1.7 6.7 101.89295 -0.954404 1159320845 Q252 إندونيسيا
2 3.0 8.0 113.83708 2.528667 1159321083 Q833 ماليزيا
3 1.7 6.7 -72.31887 -38.151771 1159320493 Q298 تشيلي
4 3.0 7.5 -64.59343 -16.666015 1159320439 Q750 بوليفيا
5 2.0 7.0 -72.90016 -12.976679 1159321163 Q419 بيرو
6 2.0 7.0 -64.17333 -33.501159 1159320331 Q414 الأرجنتين
name_bn name_de name_en name_es name_fa name_fr name_el
1 ইন্দোনেশিয়া Indonesien Indonesia Indonesia اندونزی Indonésie Ινδονησία
2 মালয়েশিয়া Malaysia Malaysia Malasia مالزی Malaisie Μαλαισία
3 চিলি Chile Chile Chile شیلی Chili Χιλή
4 বলিভিয়া Bolivien Bolivia Bolivia بولیوی Bolivie Βολιβία
5 পেরু Peru Peru Perú پرو Pérou Περού
6 আর্জেন্টিনা Argentinien Argentina Argentina آرژانتین Argentine Αργεντινή
name_he name_hi name_hu name_id name_it name_ja name_ko
1 אינדונזיה इंडोनेशिया Indonézia Indonesia Indonesia インドネシア 인도네시아
2 מלזיה मलेशिया Malajzia Malaysia Malaysia マレーシア 말레이시아
3 צ'ילה चिली Chile Chili Cile チリ 칠레
4 בוליביה बोलिविया Bolívia Bolivia Bolivia ボリビア 볼리비아
5 פרו पेरू Peru Peru Perù ペルー 페루
6 ארגנטינה अर्जेण्टीना Argentína Argentina Argentina アルゼンチン 아르헨티나
name_nl name_pl name_pt name_ru name_sv name_tr name_uk
1 Indonesië Indonezja Indonésia Индонезия Indonesien Endonezya Індонезія
2 Maleisië Malezja Malásia Малайзия Malaysia Malezya Малайзія
3 Chili Chile Chile Чили Chile Şili Чилі
4 Bolivia Boliwia Bolívia Боливия Bolivia Bolivya Болівія
5 Peru Peru Peru Перу Peru Peru Перу
6 Argentinië Argentyna Argentina Аргентина Argentina Arjantin Аргентина
name_ur name_vi name_zh name_zht fclass_iso tlc_diff
1 انڈونیشیا Indonesia 印度尼西亚 印度尼西亞 Admin-0 country <NA>
2 ملائیشیا Malaysia 马来西亚 馬來西亞 Admin-0 country <NA>
3 چلی Chile 智利 智利 Admin-0 country <NA>
4 بولیویا Bolivia 玻利维亚 玻利維亞 Admin-0 country <NA>
5 پیرو Peru 秘鲁 秘魯 Admin-0 country <NA>
6 ارجنٹائن Argentina 阿根廷 阿根廷 Admin-0 country <NA>
fclass_tlc fclass_us fclass_fr fclass_ru fclass_es fclass_cn fclass_tw
1 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
2 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
3 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
4 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
5 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
6 Admin-0 country <NA> <NA> <NA> <NA> <NA> <NA>
fclass_in fclass_np fclass_pk fclass_de fclass_gb fclass_br fclass_il
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
fclass_ps fclass_sa fclass_eg fclass_ma fclass_pt fclass_ar fclass_jp
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
fclass_ko fclass_vn fclass_tr fclass_id fclass_pl fclass_gr fclass_it
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
fclass_nl fclass_se fclass_bd fclass_ua geometry
1 <NA> <NA> <NA> <NA> MULTIPOLYGON (((117.7036 4....
2 <NA> <NA> <NA> <NA> MULTIPOLYGON (((117.7036 4....
3 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-69.51009 -...
4 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-69.51009 -...
5 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-69.51009 -...
6 <NA> <NA> <NA> <NA> MULTIPOLYGON (((-67.1939 -2...
pop_abvepa
1 34116802
2 3308483
3 3013887
4 1847165
5 5142253
6 6531246
Now we can even use that column to create a new column with the values in orders of magnitue:
countries$pop_abvepa_log10 <- log(countries$pop_abvepa, base=10)Now let’s look at the result:
plot(countries[,c("pop_abvepa_log10", "geometry")])We also can export geospatial layer from R for further analysis or visualization in QGIS or other software:
#TO EXPORT terra RASTER OBJECTS:
writeRaster(pop_abvepa, filename="Pop_Above_EPA_Lims.tif", overwrite=TRUE)
#TO EXPORT sf VECTOR OBJECTS
st_write(countries, "Pop_Above_EPA_Lims.gpkg", append=FALSE)Deleting layer `Pop_Above_EPA_Lims' using driver `GPKG'
Writing layer `Pop_Above_EPA_Lims' to data source
`Pop_Above_EPA_Lims.gpkg' using driver `GPKG'
Writing 258 features with 170 fields and geometry type Multi Polygon.