| Your written answers to the questions can be brief, but be prepared to discuss your findings in detail during your tutorial meeting. You may work with your partner but you are not required to, you can submit one joint problem set or individual ones. |
First, if you haven’t installed the package tidyverse yet, you’ll need to do that. Go to the top bar and go to Tools->Install Packages and in the serach box type tidyverse then click install.
Next up, we have to load the package tidyverse (pre-written packages of code of that we’ll use to analyze and plot our data)
library(tidyverse)
I’m going to walk you through how to get data from the PBDB and analyze it - after you walk through the following steps, you’ll then go back and adjust the code (as per my instructions below) to do new / different analyses, and answer the questions I pose below.
We’re going to look at all the available fossil data from the Ordovician-Silurian and we’ll use the PBDB diversity download interface to make this a little easier. We’re going to do this at the level of genera. Take a look at the URL below - the URL code contains all the information we’re asking PBDB for - diversity data, at the level of genera, and in the Cambrian-Ordovician time intervals.
URLs need to be entered into R with NO breaks, i.e., spaces or returns - if your data aren’t showing up in the “global environment” panel at the upper right corner of R Studio, then your data hasn’t been loaded in - check the URL and make sure it is a continuous line with no breaks
genera<-read.csv("https://paleobiodb.org/data1.2/occs/diversity.csv?count=genera&interval=Cambrian,Ordovician")
Once you have this loaded in, go take a look at the data (over in global environment) and note how the data is organized, what the column headers are, etc.
• Interval_no: this is just a PBDB code which you can basically ignore.
• Interval_name: the time interval - you downloaded data at the Epoch level
• max_ma: the maximum age of the time interval in millions of years
• min_ma: the minimum age of the time interval in millions of years
• X_Ft: number of taxa crossing a top stratigraphic boundary and first appearing in interval
• X_bL: number of taxa crossing a bottom stratigraphic boundary and last appearing in interval
• X_FL: number of taxa with first and last appearance within the interval
• X_bt: number of taxa crossing both bottom and top interval boundaries
• sampled_in_bin: number of unique taxa sampled in each time bin
• n_occs: this is the number of fossil occurrences in each time bin — there may be multiple occurrences for a single taxon.
Now we need to make some new columns. I’ll show you how to make one, and you can do the rest. Here’s the code for making a new column that’s the midpoint between the min and max ages for each time bin in your data.
genera<-genera %>% mutate(mean_age=((max_ma+min_ma)/2))
mutate is a command that makes a new variable out of existing variables - in this case, the average between two values to get a midpoint.
Again, I’ll remind you how to make a figure of these data, then you can use this code to help you make the rest of your figures.
Here, I’ll show you how to make a figure of the variable Sampled in Bin which is basically just how many genera are found in each time bin (no range through assumed, etc)
p<-ggplot(genera, aes(x=mean_age, y=sampled_in_bin))+geom_point()+scale_x_reverse()+geom_line()
p
If we want to add in some lines to help us remember where the major
extinctions / Period boundaries are we can do it like this (this line is
the Cambrian-Ordovician boundary, go to https://stratigraphy.org/timescale/ to see them all)
p<-p+geom_vline(xintercept=485, size=1, alpha=.3)
p
This is the global data - but what about what Finnegan et al looked at, which was just North America? To do that, we need to add additional code to the URL above. The easiest way to get the right URL is to head to https://paleobiodb.org/classic/displayDownloadGenerator and fill out the form with the right time periods and geographic locations.
Check off “diversity over time” under “what do you want to download”, then make sure taxonomic resolution is set to genera (no need to put anything in the taxa to include box - leaving it empty will download everything), then under select by time put in your starting and ending Periods (i.e., Ordovician and Silurian) and put temporal resoluton to ‘stage’ which is the best we can do in the Paleozoic. Then in location, you can select a specific continent, or even a country! Last thing to do is UNCLICK “Include metadata at the beginning of the output” at the bottom. Now scroll back up, copy the URL that has been generated, and replace the URL in the code above with the new one, making sure to keep the quotes around it in the code. Once you get the hang of this, you can edit the URL directly instead of copying and pasting.
Are the similar? Different? How? Would this affect the results of Finnegan et al 2012? Does it change your thoughts on the validity of their analyses?
Diversity for Phylum Cnidaria Diversity for Phylum Chordata Diversity for two other clades (taxonomic groups - can be phyla, class, or orders – but probably don’t go lower than orders) of interest to you (go back to the readings to think about whcih ones you’d like to examine)
Now we’re going to shift gears away from looking at diversity to looking at occurrence data - occurence data will give you a spreadsheet where each row is a fossil occurence (a fossil someone found and described). Download occurrence data for the Eifelian (Middle Devonian) to Tournaisian (early Carboniferous). Do this just for North America, include all taxa and lump by genus, and under “choose output options” click “ecospace”, “classification”, “time binning” and “paleoenvironment” – you MUST unclick “include metadata at the beginning of the output” under “choode output options” or it will include header rows that will confuse R. Grab the URL and load it in!
The example I’m giving you here will just look at bivalves - walk through it and then answer the questions below.
bivalve_eco<-read.csv("https://paleobiodb.org/data1.2/occs/list.csv?base_name=Bivalvia&taxon_reso=genus&interval=Eifelian,Tournaisian&cc=NOA&show=class,ecospace,timebins,env")
Click on ‘bivalve_eco’ over in the Environment panel and scroll over to take a look at all the column headers to see what kind of information has been downloaded.
So now we can look at how life habits (how organims live, where they live, what they do) have changed over time. One way to do this is to group by the max_ma of each occurrence, and then count how many life habit are in each category and each time bin.
bivalve_eco %>% group_by(max_ma) %>% count(life_habit)
## # A tibble: 53 × 3
## # Groups: max_ma [13]
## max_ma life_habit n
## <dbl> <chr> <int>
## 1 354. "" 8
## 2 354. "deep infaunal" 1
## 3 354. "epifaunal" 4
## 4 354. "infaunal" 11
## 5 359. "" 23
## 6 359. "deep infaunal" 1
## 7 359. "epifaunal" 2
## 8 359. "infaunal" 14
## 9 359. "low-level epifaunal" 2
## 10 365. "" 36
## # … with 43 more rows
This is sort of hard to interpret, however, so we can use a plot to make it easier to see what’s going on. Here, I’ve done the same code as above, and then added a ggplot pipe to the end. My x axis is just max_ma, my y axis is the count of how many occurences are in each life habit, and I’ve color coded it by life habit so we can see all the different ones more clearly.
bivalve_eco %>% group_by(max_ma) %>% count(life_habit) %>% ggplot(aes(x=max_ma, y=n, colour=life_habit))+geom_point()+scale_x_reverse()
This is cool, but there is still a lot to look at here and it’s hard to
pull apart a single life habit. So we can do one more step, a
filter, and specify the life habit we want to look at, in this
case, the deep infaunal habitat.
bivalve_eco %>% group_by(max_ma) %>% count(life_habit) %>% filter(life_habit=="deep infaunal") %>% ggplot(aes(x=max_ma, y=n, colour=life_habit))+geom_point()+scale_x_reverse()
Much nicer!! However, this makes it hard to see how the relative
proportions of different life habits have changed over time. To do this,
we’ll bin the data by the time_bin variable, which is at the stage
level. This code below will order the time_bins from oldest to youngest
so they’ll plot correctly. The code tells R to order the factor (a
categorical variable) time_bins by the variable max_ma, and then reverse
the order so they plot in the correct order with the oldest on the
left.
bivalve_eco$time_bins<-(fct_reorder(bivalve_eco$time_bins, bivalve_eco$max_ma, max,.desc=TRUE))
Next, we’ll get rid of the rows that don’t specify any ecological information and don’t have a time bin specified either
bivalve_eco <- bivalve_eco %>%
filter(life_habit != "") %>%
filter(time_bins !="") %>%
droplevels()
Now we plot! First we will group by time bins and count the number of taxa in each life habit in each time bin. Then, we plot, and we’re making a stacked proportional bar plot here to show relative proportions of different types of ecological modes. Lastly, we rotate the x-axis labels 90 degrees so that we can read them!
bivalve_eco %>% group_by(time_bins) %>% count(life_habit)%>%
ggplot(aes(fill=life_habit, y=n, x=time_bins)) +
geom_bar(position="fill", stat="identity")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
**If for some reason the ordering isn’t working for you using the coding
above, try this - add %>% mutate(time_bins = fct_reorder(time_bins,
desc(max_ma))) %>% into your ggplot code ) - so it’ll look like
bivalve_eco %>% mutate(time_bins = fct_reorder(time_bins, desc(max_ma))) %>% group_by(time_bins) %>% count(life_habit)%>% ggplot(aes(fill=life_habit, y=n, x=time_bins)) + geom_bar(position=“fill”, stat=“identity”)+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))