In this lab and later labs, we’ll be using a set of R packages called the tidyverse. Packages are collections of code, written by other people, that do useful things. The code below loads them (they’re already installed for you, but you need to tell R that you want to use them). The tidyverse contains packages that help you make pretty graphs, organize data, and more.
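knitr::opts_chunk$set(echo = TRUE)  # R Markdown setup: show the code in the knitted document
library(tidyverse)  # load the core tidyverse packages listed in the output below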
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Tidyverse code is usually written with what’s called a pipe: the set of symbols %>%. A pipe says “take this data, then do this thing to it, then do this other thing to it.” You can read %>% as “then do…”
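To see what “then do…” means in practice, here is a small sketch comparing a nested call with a piped one. It uses a made-up vector of ages and base R’s native pipe, |> (available in R 4.1 or newer), which works like %>% for a simple chain like this; with the tidyverse loaded, writing %>% in place of |> gives the same result.

```r
# A made-up vector of ages in millions of years (Ma), purely for illustration
ages <- c(3.6, 0.0117, 18.5, 0)

# Nested version: read from the inside out
nested <- head(sort(ages), 2)

# Piped version: read left to right as "take ages, then sort them, then keep the first 2"
piped <- ages |> sort() |> head(2)

print(nested)  # the two youngest ages
print(piped)   # same result, written as a chain
```

Both versions do the same work; the piped one just reads in the order the steps happen, which is why the tidyverse leans on it.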
As we move through the semester, we’ll learn more about specific functions. You’ll use a few of them below, such as group_by, summarise, and count.
First, we need some data! We’re going to pull data from the PBDB (the Paleobiology Database) using this handy dandy URL data grabber, which gets us the data we want. Try copying and pasting the URL in the code below into a browser. What happens? We could save that file and upload it, or we could just let R do the work for us.
We’re going to look at all genera within the family Canidae (the dog family, or doggies).
doggies <- read.csv("https://paleobiodb.org/data1.2/occs/list.csv?&base_name=Canidae&taxon_reso=lump_genus")  # download Canidae occurrence records from the PBDB into a data frame
So now we have a data frame called doggies. Go take a look at it (under Environment, click on “doggies”). What do you see? A LOT of data, like what we had to deal with in Excel. We’re going to use R to help us wrangle this data with a little more ease (hopefully). The added benefit over Excel is that we can swap out the URL above and do this for ANY other set of data from the PBDB just by running the same code, which is basically magic.
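Because only the taxon name changes from one dataset to the next, one way to make that swap less error-prone is to build the URL from its pieces. This is a minimal sketch: pbdb_url() is a hypothetical helper of our own, not part of any package, and it only rebuilds the exact URL used above with a different base_name.

```r
# Hypothetical helper: rebuild the PBDB occurrence URL used above for any taxon name
pbdb_url <- function(taxon) {
  paste0(
    "https://paleobiodb.org/data1.2/occs/list.csv?&base_name=", taxon,
    "&taxon_reso=lump_genus"
  )
}

pbdb_url("Canidae")  # the same URL that was passed to read.csv() above
pbdb_url("Cetacea")  # the same query for whales, dolphins, and porpoises
```

With this, doggies <- read.csv(pbdb_url("Canidae")) would download the same data as before.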
Now we want to group the data by name (accepted_name), so we can learn about each genus as a whole. Then we summarise each group into a new variable we’ll call ext (for extant): the smallest min_ma for that name. In other words, go through all the occurrences of each genus, find the youngest (minimum) minimum age, and tell me what it is:
living <- doggies %>%
  group_by(accepted_name) %>%    # one group per genus
  summarise(ext = min(min_ma))   # youngest minimum age (Ma) in each group
living
## # A tibble: 66 × 2
## accepted_name ext
## <chr> <dbl>
## 1 Aelurodon 3.6
## 2 Aenocyon 0.0117
## 3 Alopex 0.0117
## 4 Archaeocyon 18.5
## 5 Borophagus 1.4
## 6 Caedocyon 18.5
## 7 Canis 0
## 8 Carpocyon 4.7
## 9 Cerdocyon 0
## 10 Chailicyon 37.7
## # ℹ 56 more rows
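Conceptually, group_by plus summarise(ext = min(min_ma)) splits the rows by name and applies min() to each piece. Here is the same split-apply idea in base R with tapply(), on a tiny hypothetical occurrence table (the rows are invented for illustration, not real PBDB records):

```r
# Hypothetical occurrence table: several rows (fossil occurrences) per genus
occ <- data.frame(
  accepted_name = c("Canis", "Canis", "Aelurodon", "Aelurodon", "Aelurodon"),
  min_ma = c(0, 5.3, 3.6, 10.3, 15.97)
)

# Split min_ma by genus, then take the minimum of each group
ext <- tapply(occ$min_ma, occ$accepted_name, min)
ext  # one value per genus, with names sorted alphabetically
```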
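This gives the minimum min_ma for each accepted_name. If we add one more pipe and count how many of these minimums are exactly zero, we can figure out how many genera in Canidae have a record reaching the present day, which is our stand-in for being alive today! Keep in mind that this is an approximation: a living genus whose youngest PBDB occurrence is slightly older than 0 (like Cuon, the dhole, at 0.0117 Ma in the output above) gets counted as extinct.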
count_living <- doggies %>% group_by(accepted_name) %>% summarise(ext = min(min_ma)) %>%
  count(ext == 0)  # tally genera whose youngest age is (TRUE) or is not (FALSE) exactly 0 Ma
count_living
## # A tibble: 2 × 2
## `ext == 0` n
## <lgl> <int>
## 1 FALSE 50
## 2 TRUE 16
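count(ext == 0) works because the comparison turns each minimum into TRUE or FALSE, and count then tallies each value. In base R, table() gives the same kind of tally, and sum() counts just the TRUEs, since R treats TRUE as 1 and FALSE as 0. A small sketch using five ext values copied from the living output above:

```r
# ext values for five genera, copied from the `living` tibble above
toy_ext <- c(Aelurodon = 3.6, Aenocyon = 0.0117, Alopex = 0.0117, Canis = 0, Cerdocyon = 0)

is_zero <- toy_ext == 0  # TRUE where the youngest occurrence is at 0 Ma
table(is_zero)           # tally of FALSE and TRUE, like count(ext == 0)
sum(is_zero)             # number of TRUEs only
```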
OK, so next we want to know not only who’s alive and who’s not, but also when each genus lived and for how long: its range. To do this, we need the youngest age (the minimum min_ma) and the oldest age (the maximum max_ma) for each genus, and then we combine them in a new data frame.
minage <- doggies %>% group_by(accepted_name) %>% summarise(min_age = min(min_ma))  # youngest age per genus
maxage <- doggies %>% group_by(accepted_name) %>% summarise(max_age = max(max_ma))  # oldest age per genus
dogrange <- cbind(minage, maxage)  # glue the two tables side by side
dogrange
## accepted_name min_age accepted_name max_age
## 1 Aelurodon 3.6000 Aelurodon 23.0300
## 2 Aenocyon 0.0117 Aenocyon 3.6000
## 3 Alopex 0.0117 Alopex 5.3330
## 4 Archaeocyon 18.5000 Archaeocyon 31.8000
## 5 Borophagus 1.4000 Borophagus 23.0300
## 6 Caedocyon 18.5000 Caedocyon 29.5000
## 7 Canis 0.0000 Canis 29.5000
## 8 Carpocyon 4.7000 Carpocyon 16.3000
## 9 Cerdocyon 0.0000 Cerdocyon 9.4000
## 10 Chailicyon 37.7100 Chailicyon 56.0000
## 11 Chrysocyon 0.0000 Chrysocyon 4.7000
## 12 Cormocyon 5.3330 Cormocyon 41.2000
## 13 Cubacyon 0.0000 Cubacyon 0.1290
## 14 Cuon 0.0117 Cuon 16.3000
## 15 Cynarctoides 12.5000 Cynarctoides 29.5000
## 16 Cynarctus 4.7000 Cynarctus 23.0300
## 17 Cynodesmus 18.5000 Cynodesmus 31.8000
## 18 Cynotherium 0.0117 Cynotherium 3.6000
## 19 Desmocyon 16.3000 Desmocyon 29.5000
## 20 Dusicyon 0.0000 Dusicyon 3.6000
## 21 Ectopocynus 16.3000 Ectopocynus 31.8000
## 22 Enhydrocyon 18.5000 Enhydrocyon 31.8000
## 23 Epicyon 4.7000 Epicyon 23.0300
## 24 Eucyon 0.7740 Eucyon 12.5000
## 25 Euoplocyon 12.5000 Euoplocyon 18.5000
## 26 Ferrucyon 1.4000 Ferrucyon 4.7000
## 27 Hesperocyon 18.5000 Hesperocyon 46.2000
## 28 Hydrocyon 0.0000 Hydrocyon 0.0117
## 29 Leptocyon 4.7000 Leptocyon 29.5000
## 30 Lupulella 0.0000 Lupulella 5.3330
## 31 Lupus 0.7740 Lupus 2.5800
## 32 Lycalopex 0.0000 Lycalopex 23.0300
## 33 Lycaon 0.0000 Lycaon 3.6000
## 34 Lycorus 0.0117 Lycorus 2.5800
## 35 Mesocyon 16.3000 Mesocyon 33.9000
## 36 Metalopex 4.7000 Metalopex 18.5000
## 37 Metatomarctus 5.3330 Metatomarctus 23.0300
## 38 Microtomarctus 5.3330 Microtomarctus 18.5000
## 39 Nurocyon 3.1000 Nurocyon 5.3330
## 40 Nyctereutes 0.0000 Nyctereutes 11.6300
## 41 Osbornodon 12.5000 Osbornodon 33.9000
## 42 Otarocyon 18.5000 Otarocyon 37.0000
## 43 Otocyon 0.0000 Otocyon 3.6000
## 44 Oxetocyon 29.5000 Oxetocyon 31.8000
## 45 Paracynarctus 5.3330 Paracynarctus 23.0300
## 46 Paraenhydrocyon 18.5000 Paraenhydrocyon 29.5000
## 47 Paratomarctus 4.7000 Paratomarctus 23.0300
## 48 Philotrox 18.5000 Philotrox 29.5000
## 49 Phlaocyon 16.3000 Phlaocyon 31.8000
## 50 Prohesperocyon 33.9000 Prohesperocyon 37.0000
## 51 Protepicyon 12.5000 Protepicyon 16.3000
## 52 Protocyon 0.0000 Protocyon 3.6000
## 53 Protomarctus 12.5000 Protomarctus 20.4400
## 54 Prototocyon 0.0117 Prototocyon 3.6000
## 55 Psalidocyon 12.5000 Psalidocyon 16.3000
## 56 Rhizocyon 24.7000 Rhizocyon 29.5000
## 57 Schaeffia 0.0000 Schaeffia 3.6000
## 58 Sinicuon 0.7740 Sinicuon 5.3330
## 59 Speothos 0.0000 Speothos 2.5800
## 60 Sunkahetanka 18.5000 Sunkahetanka 29.5000
## 61 Tephrocyon 12.5000 Tephrocyon 18.5000
## 62 Theriodictis 0.0117 Theriodictis 4.7000
## 63 Tomarctus 9.4000 Tomarctus 18.5000
## 64 Urocyon 0.0000 Urocyon 12.5000
## 65 Vulpes 0.0000 Vulpes 12.5000
## 66 Xenocyon 0.0117 Xenocyon 5.3330
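We have an annoying duplicate column of names (accepted_name appears twice), so we’ll delete the second copy, which is column 3:
dogrange <- dogrange[, -3]  # drop column 3, the duplicate accepted_name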
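cbind() simply glues two tables side by side, so it only lines up correctly because both tables come out of summarise sorted by accepted_name in the same order. A safer alternative, sketched here in base R with merge() on two tiny tables built from values in the output above, matches rows by name and keeps a single name column, so there is nothing to delete afterward. (In dplyr, left_join(minage, maxage, by = "accepted_name") does the same job.)

```r
# Two small summary tables (values copied from dogrange above), deliberately in different row orders
toy_min <- data.frame(accepted_name = c("Aelurodon", "Canis"), min_age = c(3.6, 0))
toy_max <- data.frame(accepted_name = c("Canis", "Aelurodon"), max_age = c(29.5, 23.03))

# merge() matches rows by accepted_name instead of by position
toy_range <- merge(toy_min, toy_max, by = "accepted_name")
toy_range

# cbind(toy_min, toy_max) would instead pair Aelurodon's min_age with Canis's max_age
```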
Next we want to add an age range column, which is just the max age minus the min age. The mutate function adds a new column, which we’ll call range:
dogrange <- dogrange %>% mutate(range = max_age - min_age)
Now we’ll make another new column, mean, for plotting purposes: the midpoint between the max and min age.
dogrange <- dogrange %>% mutate(mean = (max_age + min_age) / 2)
dogrange
## accepted_name min_age max_age range mean
## 1 Aelurodon 3.6000 23.0300 19.4300 13.31500
## 2 Aenocyon 0.0117 3.6000 3.5883 1.80585
## 3 Alopex 0.0117 5.3330 5.3213 2.67235
## 4 Archaeocyon 18.5000 31.8000 13.3000 25.15000
## 5 Borophagus 1.4000 23.0300 21.6300 12.21500
## 6 Caedocyon 18.5000 29.5000 11.0000 24.00000
## 7 Canis 0.0000 29.5000 29.5000 14.75000
## 8 Carpocyon 4.7000 16.3000 11.6000 10.50000
## 9 Cerdocyon 0.0000 9.4000 9.4000 4.70000
## 10 Chailicyon 37.7100 56.0000 18.2900 46.85500
## 11 Chrysocyon 0.0000 4.7000 4.7000 2.35000
## 12 Cormocyon 5.3330 41.2000 35.8670 23.26650
## 13 Cubacyon 0.0000 0.1290 0.1290 0.06450
## 14 Cuon 0.0117 16.3000 16.2883 8.15585
## 15 Cynarctoides 12.5000 29.5000 17.0000 21.00000
## 16 Cynarctus 4.7000 23.0300 18.3300 13.86500
## 17 Cynodesmus 18.5000 31.8000 13.3000 25.15000
## 18 Cynotherium 0.0117 3.6000 3.5883 1.80585
## 19 Desmocyon 16.3000 29.5000 13.2000 22.90000
## 20 Dusicyon 0.0000 3.6000 3.6000 1.80000
## 21 Ectopocynus 16.3000 31.8000 15.5000 24.05000
## 22 Enhydrocyon 18.5000 31.8000 13.3000 25.15000
## 23 Epicyon 4.7000 23.0300 18.3300 13.86500
## 24 Eucyon 0.7740 12.5000 11.7260 6.63700
## 25 Euoplocyon 12.5000 18.5000 6.0000 15.50000
## 26 Ferrucyon 1.4000 4.7000 3.3000 3.05000
## 27 Hesperocyon 18.5000 46.2000 27.7000 32.35000
## 28 Hydrocyon 0.0000 0.0117 0.0117 0.00585
## 29 Leptocyon 4.7000 29.5000 24.8000 17.10000
## 30 Lupulella 0.0000 5.3330 5.3330 2.66650
## 31 Lupus 0.7740 2.5800 1.8060 1.67700
## 32 Lycalopex 0.0000 23.0300 23.0300 11.51500
## 33 Lycaon 0.0000 3.6000 3.6000 1.80000
## 34 Lycorus 0.0117 2.5800 2.5683 1.29585
## 35 Mesocyon 16.3000 33.9000 17.6000 25.10000
## 36 Metalopex 4.7000 18.5000 13.8000 11.60000
## 37 Metatomarctus 5.3330 23.0300 17.6970 14.18150
## 38 Microtomarctus 5.3330 18.5000 13.1670 11.91650
## 39 Nurocyon 3.1000 5.3330 2.2330 4.21650
## 40 Nyctereutes 0.0000 11.6300 11.6300 5.81500
## 41 Osbornodon 12.5000 33.9000 21.4000 23.20000
## 42 Otarocyon 18.5000 37.0000 18.5000 27.75000
## 43 Otocyon 0.0000 3.6000 3.6000 1.80000
## 44 Oxetocyon 29.5000 31.8000 2.3000 30.65000
## 45 Paracynarctus 5.3330 23.0300 17.6970 14.18150
## 46 Paraenhydrocyon 18.5000 29.5000 11.0000 24.00000
## 47 Paratomarctus 4.7000 23.0300 18.3300 13.86500
## 48 Philotrox 18.5000 29.5000 11.0000 24.00000
## 49 Phlaocyon 16.3000 31.8000 15.5000 24.05000
## 50 Prohesperocyon 33.9000 37.0000 3.1000 35.45000
## 51 Protepicyon 12.5000 16.3000 3.8000 14.40000
## 52 Protocyon 0.0000 3.6000 3.6000 1.80000
## 53 Protomarctus 12.5000 20.4400 7.9400 16.47000
## 54 Prototocyon 0.0117 3.6000 3.5883 1.80585
## 55 Psalidocyon 12.5000 16.3000 3.8000 14.40000
## 56 Rhizocyon 24.7000 29.5000 4.8000 27.10000
## 57 Schaeffia 0.0000 3.6000 3.6000 1.80000
## 58 Sinicuon 0.7740 5.3330 4.5590 3.05350
## 59 Speothos 0.0000 2.5800 2.5800 1.29000
## 60 Sunkahetanka 18.5000 29.5000 11.0000 24.00000
## 61 Tephrocyon 12.5000 18.5000 6.0000 15.50000
## 62 Theriodictis 0.0117 4.7000 4.6883 2.35585
## 63 Tomarctus 9.4000 18.5000 9.1000 13.95000
## 64 Urocyon 0.0000 12.5000 12.5000 6.25000
## 65 Vulpes 0.0000 12.5000 12.5000 6.25000
## 66 Xenocyon 0.0117 5.3330 5.3213 2.67235
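As a quick check on the two new columns, here is the arithmetic for the first row, Aelurodon, using its min_age and max_age from the table above:

```r
aelu_min <- 3.6    # Aelurodon's youngest age (Ma)
aelu_max <- 23.03  # Aelurodon's oldest age (Ma)

aelu_range <- aelu_max - aelu_min     # duration of the range
aelu_mid <- (aelu_max + aelu_min) / 2 # midpoint, used as the x position in the plot

aelu_range  # 19.43
aelu_mid    # 13.315
```

Both values match the range and mean columns in the first row of the table.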
Now let’s say we want to know how many dog genera are still around today and also have a fossil record older than the Quaternary, which began 2.58 million years ago. The & means both conditions must be true.
dogrange %>% count(min_age == 0 & max_age > 2.58)
## min_age == 0 & max_age > 2.58 n
## 1 FALSE 53
## 2 TRUE 13
Cool! So now we know that 13 living genera (using our min_age == 0 definition) have a record going back more than 2.58 million years.
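Here is how that combined condition plays out on four genera copied from dogrange above. Note the strict >: a genus whose oldest age is exactly 2.58 Ma, like Speothos, is not counted.

```r
# Four genera copied from dogrange above: youngest and oldest ages (Ma)
toy <- data.frame(
  accepted_name = c("Canis", "Lycaon", "Speothos", "Aelurodon"),
  min_age = c(0, 0, 0, 3.6),
  max_age = c(29.5, 3.6, 2.58, 23.03)
)

# Both parts must be TRUE: reaches the present AND is older than 2.58 Ma
keep <- with(toy, min_age == 0 & max_age > 2.58)
toy$accepted_name[keep]  # Canis and Lycaon
sum(keep)                # 2
```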
Let’s try to make a figure of these ranges now. If we graph it as is, the genera won’t be in a useful order. We want it to look nice, with the ranges lined up by age (here, sorted by each genus’s youngest age, min_age). To do this, we have to do some annoying R stuff that I’m not going to get into here; the code is below if you want to check it out.
dogrange$accepted_name <- factor(dogrange$accepted_name, levels = dogrange$accepted_name[order(dogrange$min_age)])
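The “annoying R stuff” is about factor levels: ggplot2 places categories on a discrete axis in the order of the factor’s levels, and by default those levels are alphabetical. The line above replaces them with the names sorted by min_age. A small sketch on three genera from the table above shows the difference:

```r
# Three genera copied from dogrange above, in alphabetical order as summarise returns them
toy <- data.frame(
  accepted_name = c("Aelurodon", "Canis", "Chailicyon"),
  min_age = c(3.6, 0, 37.71)
)

# Default: factor levels are sorted alphabetically
levels(factor(toy$accepted_name))

# Same trick as above: levels taken in order of increasing min_age
toy$accepted_name <- factor(toy$accepted_name,
                            levels = toy$accepted_name[order(toy$min_age)])
levels(toy$accepted_name)  # Canis (0), Aelurodon (3.6), Chailicyon (37.71)
```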
This gets them in the right order! Now we can plot our ranges using ggplot. The data frame we’re plotting is dogrange, and we want the x-axis to be time, for which we’re using mean. We want the names (each genus) along the y-axis. To get ranges, we have to do something a little fiddly: pretend they are actually error bars and use that part of ggplot to draw them, telling it that each error bar runs from min_age to max_age (centered on our mean, which is our x-value!). As paleontologists, we like to put the present on the right, so we’ll also reverse the x-axis with scale_x_reverse.
ggplot(dogrange, aes(x = mean, y = accepted_name)) + geom_point(shape = 1, size = 0.1) +
  geom_errorbarh(aes(xmin = min_age, xmax = max_age)) + scale_x_reverse()  # reversed x-axis puts 0 Ma on the right
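If you’re curious what the error-bar trick is really drawing, here is the same kind of range chart made with base R graphics instead of ggplot2 (a swapped-in technique, not the lab’s method). Each range is just a horizontal segment from max_age to min_age, and the x-axis limits are given in reverse so the present sits on the right. It uses three genera copied from dogrange above.

```r
# Three genera copied from the dogrange output above
toy <- data.frame(
  accepted_name = c("Canis", "Aelurodon", "Chailicyon"),
  min_age = c(0, 3.6, 37.71),
  max_age = c(29.5, 23.03, 56)
)
y <- seq_len(nrow(toy))  # one row (y position) per genus

# Reversed x limits put the present (0 Ma) on the right, like scale_x_reverse()
x_limits <- rev(range(c(toy$min_age, toy$max_age)))

pdf(NULL)  # draw to a throwaway device; remove this line and dev.off() to see the plot
par(mar = c(4, 7, 1, 1))  # widen the left margin to fit the genus names
plot(NULL, xlim = x_limits, ylim = c(0.5, nrow(toy) + 0.5),
     xlab = "Age (Ma)", ylab = "", yaxt = "n")
# One horizontal segment per genus, from its oldest age to its youngest age
segments(x0 = toy$max_age, y0 = y, x1 = toy$min_age, y1 = y)
axis(2, at = y, labels = toy$accepted_name, las = 1)
invisible(dev.off())
```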
Now it’s your turn: do the same things for our cetacean example! To start, all you have to do is go up to the URL above and change the text “Canidae” to “Cetacea”. Type the code out; do not copy and paste it from this document.
Read the lab to see what you need to do next and what to turn in.