In this lab and later labs, we’ll be using a set of R packages called the tidyverse. Packages are collections of code, written by other R users, that add useful functions to R. The code below loads them (they’re already installed for you, but you need to tell R you want to use them in each session). The tidyverse includes packages that help you make pretty graphs (ggplot2), organize data (dplyr), and more. We also need another package, RCurl, to help us import data from the Macrostrat API.
DO NOT COPY AND PASTE THE CODE BELOW - TYPE IT OUT! This seems silly, but the typing part really makes a difference…
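library(RCurl)      # provides getURL() for downloading data from a web address
library(tidyverse)  # loads dplyr, ggplot2, tibble, and other tidyverse packages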
You’ve already become familiar with looking at Macrostrat data through the API by entering a URL in a web browser. Now we’ll learn how to import those data directly into R. Note below that we’ve specified the format as csv in the URL; getURL() downloads that CSV as text, and then a separate command, read.csv(), turns the text into an R data frame. Click on “liths” in the Global Environment panel on the right of your RStudio screen and see what it looks like. The last line prints liths as a “tibble”, which is basically just a nicer way of displaying big data frames (it doesn’t change liths itself).
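x <- getURL("https://macrostrat.org/api/defs/lithologies?all&format=csv&response=long")  # download the CSV as text
liths <- read.csv(text = x)  # parse the text into a data frame
as_tibble(liths)             # print it as a tibble (liths itself is unchanged)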
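If you want to see what read.csv(text = x) is doing without calling the API, here is a small sketch that runs it on a short, hand-written CSV string standing in for the text getURL() returns. The column names and rows are made up for illustration and only loosely mimic the real response.

```r
library(tibble)

# Hypothetical stand-in for the CSV text returned by getURL()
# (made-up rows; the real response has many more rows and columns)
x_toy <- "lith_id,name,type
1,sandstone,siliciclastic
2,wackestone,carbonate
3,basalt,volcanic"

# read.csv(text = ...) parses a character string instead of a file
liths_toy <- read.csv(text = x_toy)

# as_tibble() returns a tibble for display; liths_toy stays a plain
# data frame unless you assign the result back with <-
as_tibble(liths_toy)
class(liths_toy)
```

If you’d rather keep the tibble version, write liths_toy <- as_tibble(liths_toy).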
Earlier I asked you how many different rock names are in Macrostrat’s lithology definitions route. Answering this question is SUPER easy now that we have the data in R:
count(liths)
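count(liths) returns a one-row data frame whose n column is the number of rows. That equals the number of different rock names only if each name appears once, so a quick cross-check is n_distinct() on the name column. The sketch below uses a made-up data frame; that the real liths table has a column called name is an assumption worth checking in your Environment panel.

```r
library(dplyr)

# Made-up toy table; "limestone" appears twice on purpose
toy <- data.frame(name = c("limestone", "limestone", "shale", "basalt"),
                  type = c("carbonate", "carbonate", "siliciclastic", "volcanic"))

count(toy)            # one row; column n = number of rows (4)
nrow(toy)             # the same number as a plain integer
n_distinct(toy$name)  # number of unique names (3)
```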
We might instead want to know how many lithologies there are in each type (a bigger category). To do that, we’ll use a pipe. Tidyverse code relies heavily on pipes, written with the symbols %>%. A pipe says “take this data, then do this thing to it, then do this thing to it”, so you can read %>% as “then do…”. Below, we’re saying: take the liths data frame, then count how many lithologies are in each type category.
liths %>% count(type)
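A pipe just passes the thing on its left into the first argument of the function on its right, so a piped call and a nested call do the same work, and you can keep chaining “then” steps. Here is a sketch on a made-up data frame.

```r
library(dplyr)

# Made-up toy data for illustration
toy <- data.frame(type = c("carbonate", "carbonate", "siliciclastic", "volcanic"))

piped  <- toy %>% count(type)  # "take toy, then count by type"
nested <- count(toy, type)     # the same call written without a pipe
identical(piped, nested)       # TRUE

# Chaining: take toy, then count by type, then sort largest first
ranked <- toy %>% count(type) %>% arrange(desc(n))
ranked
```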
For part 2, I asked you to pick two different sedimentary lithologies (or groups of lithologies) in Macrostrat. I’m going to use “wackestone” as an example here, partly because it doesn’t meet the criterion of having been assigned to at least 40 units.
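To check the 40-unit criterion in R rather than by eye, filter() can keep only the lithologies above the cutoff. This is a sketch that assumes your liths table has a column holding the number of units per lithology; I’ve called it t_units here, but check the actual column names in your liths data before relying on it. The numbers below are made up.

```r
library(dplyr)

# Made-up counts; t_units is an assumed column name for
# "number of units assigned to this lithology"
toy_liths <- data.frame(name = c("limestone", "wackestone", "chert"),
                        type = c("carbonate", "carbonate", "chemical"),
                        t_units = c(500, 12, 45))

# Keep only lithologies assigned to at least 40 units
big <- toy_liths %>% filter(t_units >= 40)
big
```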
x <- getURL("https://macrostrat.org/api/units?&project_id=1,7&lith=wackestone&format=csv&response=long")  # units in projects 1 and 7 containing wackestone
wack <- read.csv(text = x)
as_tibble(wack)
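Since you picked two lithologies, you’ll run this import at least twice. One way to avoid retyping the long URL is to build it with paste0(). The units_url() function below is a hypothetical helper, not part of any package; it only assembles the same URL pattern used above, and URLencode() handles lithology names that contain spaces.

```r
# Hypothetical helper: builds the units URL used above for any lithology name
units_url <- function(lith) {
  paste0("https://macrostrat.org/api/units?&project_id=1,7&lith=",
         utils::URLencode(lith),  # e.g. spaces become %20
         "&format=csv&response=long")
}

units_url("wackestone")
units_url("quartz arenite")
```

With RCurl loaded, you could then write something like other <- read.csv(text = getURL(units_url("your lithology here"))).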
Now we can look at some features of this lithology. For example, what environments is this lithology found in and how many occurrences are there in each environment?
wack %>% count(environ)
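count() can also sort its output, and a mutate() step can turn counts into proportions, which makes it easier to compare your two lithologies even if one has many more units. Here is a sketch on made-up environments.

```r
library(dplyr)

# Made-up environments for illustration
toy_wack <- data.frame(environ = c("reef", "open shallow subtidal", "reef",
                                   "lagoonal", "reef"))

env_summary <- toy_wack %>%
  count(environ, sort = TRUE) %>%  # most common environment first
  mutate(prop = n / sum(n))        # share of all occurrences
env_summary
```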
Let’s visualize this now with a ggplot figure. Note that geom_bar() does the counting for you (the job ‘count’ did before), so we don’t need to ask it to count again.
wack %>% ggplot(aes(x = environ)) + geom_bar()
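To convince yourself that geom_bar() really does the counting, you can compare it with counting first and drawing the pre-computed heights with geom_col(). The sketch below uses made-up data and layer_data(), which shows the numbers ggplot2 will actually draw.

```r
library(dplyr)
library(ggplot2)

# Made-up environments for illustration
toy_wack <- data.frame(environ = c("reef", "lagoonal", "reef", "basinal"))

# geom_bar() counts the rows in each category for you
p_bar <- toy_wack %>% ggplot(aes(x = environ)) + geom_bar()

# Equivalent: count first, then draw the heights you already have with geom_col()
p_col <- toy_wack %>%
  count(environ) %>%
  ggplot(aes(x = environ, y = n)) + geom_col()

# Bar heights ggplot2 computed for each version
layer_data(p_bar)$y
layer_data(p_col)$y
```

geom_col() is the one to reach for when your data already contain the counts.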
Another question we can ask is how this lithology is distributed through time. Is it even? Uneven? Why? To find out, we can make a histogram of the bottom (oldest) age of units containing wackestone. Ages are in millions of years (Ma), so binwidth=100 groups the units into 100-million-year bins. To keep the convention of having the present on the right, we add scale_x_reverse(), which reverses the x-axis so that older ages appear on the left.
wack %>% ggplot(aes(x = b_age)) + geom_histogram(binwidth = 100) + scale_x_reverse()
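Where the bin edges fall matters when you read the histogram. Unless told otherwise, ggplot2 chooses the bin positions itself, and the edges typically do not land on round ages; adding boundary = 0 pins an edge at 0 Ma, so the bins run 0 to 100 Ma, 100 to 200 Ma, and so on. The sketch below compares the two on made-up ages. scale_x_reverse() is left off so the bin edges print as ordinary positive ages; it doesn’t change which units share a bin.

```r
library(ggplot2)

# Made-up bottom ages (Ma) for illustration
toy_ages <- data.frame(b_age = c(15, 95, 105, 250, 260, 480))

p_default  <- ggplot(toy_ages, aes(x = b_age)) +
  geom_histogram(binwidth = 100)
p_boundary <- ggplot(toy_ages, aes(x = b_age)) +
  geom_histogram(binwidth = 100, boundary = 0)  # edges at 0, 100, 200 Ma, ...

# Compare the bin edges and counts ggplot2 computed
layer_data(p_default)[, c("xmin", "xmax", "count")]
layer_data(p_boundary)[, c("xmin", "xmax", "count")]
```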
Now let’s think about paleogeography: where are the units that contain wackestone now, where were they when they formed, and how far have they moved? There are many ways to look at this; we’ll just try a simple one now, which compares the present-day latitude of each unit (clat) with its paleolatitude (we’ll use the paleolatitude of the bottom of the unit, b_plat). Because latitudes are negative in the Southern Hemisphere and a unit may have moved either north or south, we take the absolute value of the difference, so every unit gets a positive number of degrees of latitudinal change that we can compare. See the code below: we’re remaking the data frame ‘wack’ with a new variable called ‘latdiff’, which we’re defining as the absolute value of the difference between the paleolatitude and the present-day latitude. mutate() is the command for making new columns that are calculated from other columns.
wack <- wack %>% mutate(latdiff = abs(b_plat - clat))
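A hand-worked check helps here. For a unit now at 35°N (clat = 35) whose bottom formed at 5°S (b_plat = -5), b_plat - clat is -40, and abs() turns that into 40 degrees of latitudinal change. The sketch below repeats the same mutate() on three made-up units; keep in mind this measures only north–south change, so east–west motion isn’t captured.

```r
library(dplyr)

# Made-up units: present-day latitude (clat) and paleolatitude of the
# unit bottom (b_plat), both in degrees (negative = Southern Hemisphere)
toy_units <- data.frame(unit = c("A", "B", "C"),
                        clat = c(40, 35, -20),
                        b_plat = c(10, -5, -30))

toy_units <- toy_units %>% mutate(latdiff = abs(b_plat - clat))
toy_units$latdiff  # 30 40 10
```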
Once you run this code, reopen ‘wack’, scroll to the right, and find the new variable column we’ve created! Looking at just the numbers is hard, so let’s make another figure comparing ‘latdiff’ with ‘b_age’:
wack %>% ggplot(aes(x = b_age, y = latdiff)) + geom_point() + scale_x_reverse()
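When you use a figure like this in your write-up, axis labels with units make it much easier to read. Here is a sketch of the same plot with labs() added, run on made-up values standing in for wack after the mutate() step.

```r
library(dplyr)
library(ggplot2)

# Made-up values standing in for wack after the mutate() step
toy_units <- data.frame(b_age = c(450, 300, 120),
                        latdiff = c(55, 30, 12))

p <- toy_units %>%
  ggplot(aes(x = b_age, y = latdiff)) +
  geom_point() +
  scale_x_reverse() +
  labs(x = "Age of unit bottom (Ma)",
       y = "Change in latitude (degrees)")

p                     # draws the plot
nrow(layer_data(p))   # one point per unit: 3
```

With the real data, ggplot2 drops rows where b_age or latdiff is missing and prints a warning saying how many it removed.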