Other things we can do with PBDB and R

Here’s another example of how we can use the PBDB to look at how bivalves have changed the depth of their burrowing over the Phanerozoic. We’re going to download data, and this time we’re adding some terms at the end, including ecospace. Once you download this data, take a look at it over in Environment so you can see the extra columns you’ve downloaded in addition to the regular occurrence data.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

bivalve_eco<-read.csv("https://paleobiodb.org/data1.2/occs/list.csv?base_name=Bivalvia&taxon_reso=family&interval=Devonian,Jurassic&show=class,ecospace,timebins")

Once you run this code yourself, make sure to click on the bivalve_eco Data over in Global Environment and see what it looks like. Scroll over to the right to see all the columns that you’ve downloaded.

So now we can look at how life habits have changed over time. One way to do this is to group by the max_ma of each occurrence, and then count how many life habit are in each category and each time bin.

bivalve_eco_summary<-bivalve_eco %>% group_by(early_interval, max_ma)%>% dplyr::count(life_habit)

This is sort of hard to interpret, however, so we can use a plot to make it easier to see what’s going on. Here, I’ve done the same code as above, and then added a ggplot pipe to the end. My x axis is just max_ma, my y axis is the count of how many occurences are in each life habit, and I’ve color coded it by life habit so we can see all the different ones more clearly.

bivalve_eco_summary %>% ggplot(aes(x=max_ma, y=n, colour=life_habit))+geom_point()+scale_x_reverse()

This is cool, but there is still a lot to look at here and it’s hard to pull apart a single life habit. So we can do one more step, a filter, and specify the life habit we want to look at:

bivalve_eco_summary %>% filter(life_habit=="shallow infaunal") %>% ggplot(aes(x=max_ma, y=n, colour=life_habit))+geom_point()+scale_x_reverse()

Much nicer!!

But what if we want to see how the relative proportion of life habits has changed over time? For that, we can use a 100% stacked bar plot - see the code below.

bivalve_eco_summary %>% ggplot(aes(x=max_ma, fill=life_habit))+geom_bar(position="fill")+scale_x_reverse()

Hmm - this is sort of hard to see, in part because our x-axis scale is continuous, so our bars are too narrow! If we use stage name instead of max_ma, that will help, BUT we need to make sure that these are in the right order! So below we use the commant fct_reorder to reorder our early_interval factor by max_ma, and in reverse order so that the present is towards the right.

bivalve_eco_summary %>% ggplot(aes(x=fct_reorder(early_interval, max_ma, .desc=TRUE), fill=life_habit))+geom_bar(position="fill")

That’s better! But what if we are only interested in a limited time range? We can filter by max_ma to just look at 100 million years, see below:

bivalve_eco_summary %>% filter(max_ma<300, max_ma>200) %>% ggplot(aes(x=fct_reorder(early_interval, max_ma, .desc=TRUE), fill=life_habit))+geom_bar(position="fill")

Even better! But there are a lot of life habits, and some of them are pretty similar - can we combining variables to have fewer? First we need to see all the factor categories in a column

unique(bivalve_eco_summary$life_habit)

##  [1] "boring"                                  
##  [2] "deep infaunal"                           
##  [3] "epifaunal"                               
##  [4] "infaunal"                                
##  [5] "low-level epifaunal"                     
##  [6] "semi-infaunal"                           
##  [7] "epifaunal, gregarious"                   
##  [8] "shallow infaunal"                        
##  [9] "nektobenthic"                            
## [10] "intermediate-level epifaunal, gregarious"

Now we want to combine factors to have fewer variables - here, I’ve just combined the epifaunal ones into one larger category using the function fct_collapse

bivalve_eco_summary$life_habit<-fct_collapse(bivalve_eco_summary$life_habit,
  epifaunal = c("epifaunal", "low-level epifaunal", "intermediate-level epifaunal, gregarious", "epifaunal, gregarious"))

Now when we plot, we can see a bit more of what’s going on

bivalve_eco_summary %>% filter(max_ma<300, max_ma>200) %>% ggplot(aes(x=fct_reorder(early_interval, max_ma, .desc=TRUE), fill=life_habit))+geom_bar(position="fill")

It’s pretty hard to see the X-axis! So we need to rotate the axes so that we can see better:

bivalve_eco_summary %>% filter(max_ma<300, max_ma>200) %>% ggplot(aes(x=fct_reorder(early_interval, max_ma, .desc=TRUE), fill=life_habit))+geom_bar(position="fill")+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))