R and Tidyverse

In this lab and subsequent labs, we’ll be using a set of R packages called the tidyverse. Packages are bundles of code that other people have written to do useful things. The code below loads them (they’re already installed for you, but you need to tell R you want to use them). The tidyverse contains packages that help you make pretty graphs, organize data, and more.

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Tidyverse packages all work with what are called pipes, which are written with this set of symbols: %>%. A pipe says “take this data, then do this thing to it, then do this other thing to it”. The %>% is basically a way of saying “then do…”
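Here is a tiny sketch of what a pipe does, using made-up numbers rather than real data. The names ages, nested, and piped are just for illustration. Both versions do the same thing; the piped one reads left to right.

```r
library(dplyr)  # loaded as part of the tidyverse; makes %>% available

ages <- c(4, 9, 16)  # made-up numbers, just for illustration

# Nested version: read from the inside out
nested <- sum(sqrt(ages))

# Piped version: "take ages, then take square roots, then add them up"
piped <- ages %>% sqrt() %>% sum()

nested == piped  # TRUE: both are 2 + 3 + 4 = 9
```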

As we move through the semester we’ll learn more about specific functions. You’ll use a few below like group_by, summarise, and count.

Get some data

First, we need some data! We’re going to pull data from the PBDB (the Paleobiology Database) using this handy dandy URL data grabber. It’s going to get us the data we want. Try copying and pasting the URL from the code below into a browser. What happens? We could save this file and upload it, or we could just let R do the work for us.

We’re going to look at all genera within the family Canidae (doggies).

doggies<-read.csv("https://paleobiodb.org/data1.2/occs/list.csv?&base_name=Canidae&taxon_reso=lump_genus")

So now we have a data frame called doggies. Go take a look at it (under Environment, click on “doggies”). What do you see? A LOT of data, like what we had to deal with in Excel. We’re going to use R to help us wrangle this data with a little more ease (hopefully). The added benefit vs. Excel is that we can swap out the URL above and do this for ANY other set of data from the PBDB by just running the same code, which is basically magic.

Organize that data

Now we want to group by name, so we can learn about each genus as a whole, and then summarize the resulting data with a new variable we’ll call ‘ext’ (for extant). It holds the minimum min_ma for each name. In other words: go through all the data, find the youngest (minimum) min_ma for any occurrence of a name, and tell me what it is:

living<-doggies %>% group_by(accepted_name) %>% summarise(ext=min(min_ma))
living
## # A tibble: 66 × 2
##    accepted_name     ext
##    <chr>           <dbl>
##  1 Aelurodon      3.6   
##  2 Aenocyon       0.0117
##  3 Alopex         0.0117
##  4 Archaeocyon   18.5   
##  5 Borophagus     1.4   
##  6 Caedocyon     18.5   
##  7 Canis          0     
##  8 Carpocyon      4.7   
##  9 Cerdocyon      0     
## 10 Chailicyon    37.7   
## # ℹ 56 more rows
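To see what group_by and summarise are doing, it can help to try them on a data frame small enough to check by eye. This is only a sketch: the genus names and ages in toy_occs are invented, and only the column names accepted_name and min_ma match the PBDB download.

```r
library(dplyr)

# Made-up occurrences: three for "Genus_A", two for "Genus_B"
toy_occs <- data.frame(
  accepted_name = c("Genus_A", "Genus_A", "Genus_A", "Genus_B", "Genus_B"),
  min_ma        = c(3.6, 0, 1.4, 18.5, 23.0)
)

toy_living <- toy_occs %>%
  group_by(accepted_name) %>%    # one group per genus
  summarise(ext = min(min_ma))   # youngest minimum age within each group

toy_living
# Genus_A gets ext = 0 (one occurrence reaches the present); Genus_B gets ext = 18.5
```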

This gives the minimum min_ma for each accepted_name. If we add one more pipe and count how many of these minimums are exactly equal to zero, we can figure out how many genera in Canidae are still alive today!

count_living<-doggies %>% group_by(accepted_name) %>% summarise(ext=min(min_ma)) %>% count(ext==0)
count_living
## # A tibble: 2 × 2
##   `ext == 0`     n
##   <lgl>      <int>
## 1 FALSE         50
## 2 TRUE          16
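Counting a TRUE/FALSE condition like this gives one row per answer. If you only want the single number of TRUEs, one possible alternative is to add up the condition, since R treats TRUE as 1 and FALSE as 0. The sketch below uses an invented three-genus table (toy_living), not the real Canidae data.

```r
library(dplyr)

# Made-up summary table: two genera reach the present, one does not
toy_living <- data.frame(
  accepted_name = c("Genus_A", "Genus_B", "Genus_C"),
  ext           = c(0, 18.5, 0)
)

# Same approach as above: one row for FALSE, one row for TRUE
toy_counts <- toy_living %>% count(ext == 0)
toy_counts

# Alternative: sum() counts the TRUEs directly
n_extant <- sum(toy_living$ext == 0)
n_extant  # 2
```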

OK, so next we not only want to know who’s alive and who’s not, but we also want to know how long each genus lived for and when: its range. In order to do this, we need to get the max and min for each genus, and then combine them in a new data frame.

minage<-doggies %>% group_by(accepted_name) %>% summarise(min_age=min(min_ma))
 
maxage<-doggies %>% group_by(accepted_name) %>% summarise(max_age=max(max_ma))
 
dogrange<-cbind(minage, maxage)

dogrange
##      accepted_name min_age   accepted_name max_age
## 1        Aelurodon  3.6000       Aelurodon 23.0300
## 2         Aenocyon  0.0117        Aenocyon  3.6000
## 3           Alopex  0.0117          Alopex  5.3330
## 4      Archaeocyon 18.5000     Archaeocyon 31.8000
## 5       Borophagus  1.4000      Borophagus 23.0300
## 6        Caedocyon 18.5000       Caedocyon 29.5000
## 7            Canis  0.0000           Canis 29.5000
## 8        Carpocyon  4.7000       Carpocyon 16.3000
## 9        Cerdocyon  0.0000       Cerdocyon  9.4000
## 10      Chailicyon 37.7100      Chailicyon 56.0000
## 11      Chrysocyon  0.0000      Chrysocyon  4.7000
## 12       Cormocyon  5.3330       Cormocyon 41.2000
## 13        Cubacyon  0.0000        Cubacyon  0.1290
## 14            Cuon  0.0117            Cuon 16.3000
## 15    Cynarctoides 12.5000    Cynarctoides 29.5000
## 16       Cynarctus  4.7000       Cynarctus 23.0300
## 17      Cynodesmus 18.5000      Cynodesmus 31.8000
## 18     Cynotherium  0.0117     Cynotherium  3.6000
## 19       Desmocyon 16.3000       Desmocyon 29.5000
## 20        Dusicyon  0.0000        Dusicyon  3.6000
## 21     Ectopocynus 16.3000     Ectopocynus 31.8000
## 22     Enhydrocyon 18.5000     Enhydrocyon 31.8000
## 23         Epicyon  4.7000         Epicyon 23.0300
## 24          Eucyon  0.7740          Eucyon 12.5000
## 25      Euoplocyon 12.5000      Euoplocyon 18.5000
## 26       Ferrucyon  1.4000       Ferrucyon  4.7000
## 27     Hesperocyon 18.5000     Hesperocyon 46.2000
## 28       Hydrocyon  0.0000       Hydrocyon  0.0117
## 29       Leptocyon  4.7000       Leptocyon 29.5000
## 30       Lupulella  0.0000       Lupulella  5.3330
## 31           Lupus  0.7740           Lupus  2.5800
## 32       Lycalopex  0.0000       Lycalopex 23.0300
## 33          Lycaon  0.0000          Lycaon  3.6000
## 34         Lycorus  0.0117         Lycorus  2.5800
## 35        Mesocyon 16.3000        Mesocyon 33.9000
## 36       Metalopex  4.7000       Metalopex 18.5000
## 37   Metatomarctus  5.3330   Metatomarctus 23.0300
## 38  Microtomarctus  5.3330  Microtomarctus 18.5000
## 39        Nurocyon  3.1000        Nurocyon  5.3330
## 40     Nyctereutes  0.0000     Nyctereutes 11.6300
## 41      Osbornodon 12.5000      Osbornodon 33.9000
## 42       Otarocyon 18.5000       Otarocyon 37.0000
## 43         Otocyon  0.0000         Otocyon  3.6000
## 44       Oxetocyon 29.5000       Oxetocyon 31.8000
## 45   Paracynarctus  5.3330   Paracynarctus 23.0300
## 46 Paraenhydrocyon 18.5000 Paraenhydrocyon 29.5000
## 47   Paratomarctus  4.7000   Paratomarctus 23.0300
## 48       Philotrox 18.5000       Philotrox 29.5000
## 49       Phlaocyon 16.3000       Phlaocyon 31.8000
## 50  Prohesperocyon 33.9000  Prohesperocyon 37.0000
## 51     Protepicyon 12.5000     Protepicyon 16.3000
## 52       Protocyon  0.0000       Protocyon  3.6000
## 53    Protomarctus 12.5000    Protomarctus 20.4400
## 54     Prototocyon  0.0117     Prototocyon  3.6000
## 55     Psalidocyon 12.5000     Psalidocyon 16.3000
## 56       Rhizocyon 24.7000       Rhizocyon 29.5000
## 57       Schaeffia  0.0000       Schaeffia  3.6000
## 58        Sinicuon  0.7740        Sinicuon  5.3330
## 59        Speothos  0.0000        Speothos  2.5800
## 60    Sunkahetanka 18.5000    Sunkahetanka 29.5000
## 61      Tephrocyon 12.5000      Tephrocyon 18.5000
## 62    Theriodictis  0.0117    Theriodictis  4.7000
## 63       Tomarctus  9.4000       Tomarctus 18.5000
## 64         Urocyon  0.0000         Urocyon 12.5000
## 65          Vulpes  0.0000          Vulpes 12.5000
## 66        Xenocyon  0.0117        Xenocyon  5.3330

We have an annoying duplicate column of names (the third column), so we’ll delete it.

dogrange<-dogrange[,-3]
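Gluing the two tables together with cbind works here because both summaries list the genera in the same order. A possible alternative, sketched below on invented data, is to ask summarise for both numbers at once, so there is never a second name column to delete. The values in toy_occs are made up; only the column names match the PBDB download.

```r
library(dplyr)

# Made-up occurrences for two hypothetical genera
toy_occs <- data.frame(
  accepted_name = c("Genus_A", "Genus_A", "Genus_B", "Genus_B"),
  min_ma        = c(0, 1.4, 18.5, 23.0),
  max_ma        = c(0.0117, 2.58, 23.0, 29.5)
)

toy_range <- toy_occs %>%
  group_by(accepted_name) %>%
  summarise(min_age = min(min_ma),   # youngest age for the genus
            max_age = max(max_ma))   # oldest age for the genus

toy_range
# Genus_A: min_age 0, max_age 2.58; Genus_B: min_age 18.5, max_age 29.5
```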

Next we want to add an age range column, which is just the max age minus the min age. The mutate function gives us a new column, which we’ll call range.

dogrange<-dogrange %>% mutate(range = max_age - min_age)

Now we’ll make another new column, the mean age, for plotting purposes. It’s just the average of the max and min age.

dogrange<-dogrange %>% mutate(mean=((max_age+min_age)/2))
dogrange
##      accepted_name min_age max_age   range     mean
## 1        Aelurodon  3.6000 23.0300 19.4300 13.31500
## 2         Aenocyon  0.0117  3.6000  3.5883  1.80585
## 3           Alopex  0.0117  5.3330  5.3213  2.67235
## 4      Archaeocyon 18.5000 31.8000 13.3000 25.15000
## 5       Borophagus  1.4000 23.0300 21.6300 12.21500
## 6        Caedocyon 18.5000 29.5000 11.0000 24.00000
## 7            Canis  0.0000 29.5000 29.5000 14.75000
## 8        Carpocyon  4.7000 16.3000 11.6000 10.50000
## 9        Cerdocyon  0.0000  9.4000  9.4000  4.70000
## 10      Chailicyon 37.7100 56.0000 18.2900 46.85500
## 11      Chrysocyon  0.0000  4.7000  4.7000  2.35000
## 12       Cormocyon  5.3330 41.2000 35.8670 23.26650
## 13        Cubacyon  0.0000  0.1290  0.1290  0.06450
## 14            Cuon  0.0117 16.3000 16.2883  8.15585
## 15    Cynarctoides 12.5000 29.5000 17.0000 21.00000
## 16       Cynarctus  4.7000 23.0300 18.3300 13.86500
## 17      Cynodesmus 18.5000 31.8000 13.3000 25.15000
## 18     Cynotherium  0.0117  3.6000  3.5883  1.80585
## 19       Desmocyon 16.3000 29.5000 13.2000 22.90000
## 20        Dusicyon  0.0000  3.6000  3.6000  1.80000
## 21     Ectopocynus 16.3000 31.8000 15.5000 24.05000
## 22     Enhydrocyon 18.5000 31.8000 13.3000 25.15000
## 23         Epicyon  4.7000 23.0300 18.3300 13.86500
## 24          Eucyon  0.7740 12.5000 11.7260  6.63700
## 25      Euoplocyon 12.5000 18.5000  6.0000 15.50000
## 26       Ferrucyon  1.4000  4.7000  3.3000  3.05000
## 27     Hesperocyon 18.5000 46.2000 27.7000 32.35000
## 28       Hydrocyon  0.0000  0.0117  0.0117  0.00585
## 29       Leptocyon  4.7000 29.5000 24.8000 17.10000
## 30       Lupulella  0.0000  5.3330  5.3330  2.66650
## 31           Lupus  0.7740  2.5800  1.8060  1.67700
## 32       Lycalopex  0.0000 23.0300 23.0300 11.51500
## 33          Lycaon  0.0000  3.6000  3.6000  1.80000
## 34         Lycorus  0.0117  2.5800  2.5683  1.29585
## 35        Mesocyon 16.3000 33.9000 17.6000 25.10000
## 36       Metalopex  4.7000 18.5000 13.8000 11.60000
## 37   Metatomarctus  5.3330 23.0300 17.6970 14.18150
## 38  Microtomarctus  5.3330 18.5000 13.1670 11.91650
## 39        Nurocyon  3.1000  5.3330  2.2330  4.21650
## 40     Nyctereutes  0.0000 11.6300 11.6300  5.81500
## 41      Osbornodon 12.5000 33.9000 21.4000 23.20000
## 42       Otarocyon 18.5000 37.0000 18.5000 27.75000
## 43         Otocyon  0.0000  3.6000  3.6000  1.80000
## 44       Oxetocyon 29.5000 31.8000  2.3000 30.65000
## 45   Paracynarctus  5.3330 23.0300 17.6970 14.18150
## 46 Paraenhydrocyon 18.5000 29.5000 11.0000 24.00000
## 47   Paratomarctus  4.7000 23.0300 18.3300 13.86500
## 48       Philotrox 18.5000 29.5000 11.0000 24.00000
## 49       Phlaocyon 16.3000 31.8000 15.5000 24.05000
## 50  Prohesperocyon 33.9000 37.0000  3.1000 35.45000
## 51     Protepicyon 12.5000 16.3000  3.8000 14.40000
## 52       Protocyon  0.0000  3.6000  3.6000  1.80000
## 53    Protomarctus 12.5000 20.4400  7.9400 16.47000
## 54     Prototocyon  0.0117  3.6000  3.5883  1.80585
## 55     Psalidocyon 12.5000 16.3000  3.8000 14.40000
## 56       Rhizocyon 24.7000 29.5000  4.8000 27.10000
## 57       Schaeffia  0.0000  3.6000  3.6000  1.80000
## 58        Sinicuon  0.7740  5.3330  4.5590  3.05350
## 59        Speothos  0.0000  2.5800  2.5800  1.29000
## 60    Sunkahetanka 18.5000 29.5000 11.0000 24.00000
## 61      Tephrocyon 12.5000 18.5000  6.0000 15.50000
## 62    Theriodictis  0.0117  4.7000  4.6883  2.35585
## 63       Tomarctus  9.4000 18.5000  9.1000 13.95000
## 64         Urocyon  0.0000 12.5000 12.5000  6.25000
## 65          Vulpes  0.0000 12.5000 12.5000  6.25000
## 66        Xenocyon  0.0117  5.3330  5.3213  2.67235
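As a quick check on the arithmetic, here is the first row of the table (Aelurodon) worked on its own. This sketch also shows that mutate can take both new columns in a single call; the name one_genus is just for illustration.

```r
library(dplyr)

# Aelurodon's min and max ages, copied from the output above
one_genus <- data.frame(accepted_name = "Aelurodon",
                        min_age = 3.6,
                        max_age = 23.03)

one_genus <- one_genus %>%
  mutate(range = max_age - min_age,         # 23.03 - 3.6 = 19.43
         mean  = (max_age + min_age) / 2)   # (23.03 + 3.6) / 2 = 13.315

one_genus
```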

Now let’s say we want to know how many dog genera are around today, and also have a fossil record that’s older than the Quaternary (which began 2.58 million years ago).

dogrange%>% count(min_age== 0 & max_age >2.58)
##   min_age == 0 & max_age > 2.58  n
## 1                         FALSE 53
## 2                          TRUE 13

Cool! So now we know that 13 living genera have a fossil record going back more than 2.58 million years.
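The count tells us how many, but not which ones. One way to see the names, sketched here on just four rows copied from the dogrange output above, is to swap count for filter, which keeps only the rows where the condition is TRUE. The names dog_subset and old_and_living are just for illustration.

```r
library(dplyr)

# Four rows copied from the dogrange output above
dog_subset <- data.frame(
  accepted_name = c("Aelurodon", "Canis", "Cubacyon", "Hydrocyon"),
  min_age       = c(3.6, 0, 0, 0),
  max_age       = c(23.03, 29.5, 0.129, 0.0117)
)

# Keep genera that reach the present AND have a record older than 2.58
old_and_living <- dog_subset %>% filter(min_age == 0 & max_age > 2.58)

old_and_living$accepted_name  # "Canis"
```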

Figures

Let’s try to make a figure of these ranges now. If we graph it now, it won’t be in the right order. We want it to look nice, with the ranges lined up in order of age. In order to do this, we have to do some annoying R stuff that I’m not going to get into. The code is below if you want to check it out.

dogrange$accepted_name <- factor(dogrange$accepted_name, levels = dogrange$accepted_name[order(dogrange$min_age)])
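If you are curious, here is roughly what that line is doing, shown on an invented three-genus table (toy). By default R sorts names alphabetically; turning the names into a factor lets us set the order ourselves, and ggplot then draws them in that order starting from the bottom of the y-axis.

```r
# Made-up genera and ages, just for illustration
toy <- data.frame(
  accepted_name = c("Genus_A", "Genus_B", "Genus_C"),
  min_age       = c(18.5, 0, 4.7)
)

# Default order is alphabetical
levels(factor(toy$accepted_name))  # "Genus_A" "Genus_B" "Genus_C"

# order() gives the row positions from smallest to largest min_age,
# and we use the names in that order as the factor levels
toy$accepted_name <- factor(toy$accepted_name,
                            levels = toy$accepted_name[order(toy$min_age)])

levels(toy$accepted_name)  # "Genus_B" "Genus_C" "Genus_A"
```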

This gets it in the right order! Now we can plot our ranges using ggplot. The dataframe we’re calling is dogrange, and we want the x-axis to be time, which we’re using mean for. We want the names (each genus) along the y-axis. In order to get ranges we have to do something a little fiddly which is to pretend they are actually error bars, and use that part of ggplot to make this work, just telling it that our error bars go from our max to our min (centered around our mean, which is our x-value!). As paleontologists, we like to put the present on the right, so we’ll flip the axis here too.

ggplot(dogrange, aes(x=mean, y=accepted_name))+geom_point(shape=1, size=.1)+geom_errorbarh(aes(xmin = min_age,xmax = max_age))+scale_x_reverse()
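The error-bar trick is one way to draw ranges. Another possible approach, sketched below on four rows copied from the dogrange output, is geom_segment, which draws a plain line from one x value to another and does not need the mean column. The axis labels are my own wording, and the name dog_subset is just for illustration.

```r
library(ggplot2)

# Four rows copied from the dogrange output above
dog_subset <- data.frame(
  accepted_name = c("Aelurodon", "Canis", "Cubacyon", "Hydrocyon"),
  min_age       = c(3.6, 0, 0, 0),
  max_age       = c(23.03, 29.5, 0.129, 0.0117)
)

p <- ggplot(dog_subset) +
  # one horizontal line per genus, from its oldest to its youngest age
  geom_segment(aes(x = max_age, xend = min_age,
                   y = accepted_name, yend = accepted_name)) +
  scale_x_reverse() +   # present on the right
  labs(x = "Age (millions of years ago)", y = "Genus")

p  # printing the object draws the plot
```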

Try it with Cetaceans!

Now it’s your turn: do the same things for our Cetacean example! All you have to do to start is go up to the URL above and change the text “Canidae” to “Cetacea”. Type the code out. Do not copy and paste from this document.
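Since only one word of the URL changes, the swap can also be written as a small helper function. This is just a sketch: pbdb_genus_url is a made-up name, not part of any package, and it simply pastes the taxon name into the same URL used for Canidae.

```r
# Hypothetical helper: builds the same PBDB URL as above for any taxon name
pbdb_genus_url <- function(taxon) {
  paste0("https://paleobiodb.org/data1.2/occs/list.csv?&base_name=",
         taxon,
         "&taxon_reso=lump_genus")
}

pbdb_genus_url("Cetacea")

# To download the data (needs an internet connection), you could then run:
# whales <- read.csv(pbdb_genus_url("Cetacea"))
```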

Read the lab to see what you need to do next and what to turn in.