ggpubr - allometric data

Allometric data: a classic case of regression, using logs and a non-linear model too

library(compbio4all)

Vocab

Learning objectives

Introduction

This Software Checkpoint will have you practice using ggpubr.

Preliminaries

Download packages

Only do this once, then comment it out of the script. You probably already did this in the previous Code Checkpoint.

#install.packages("ggplot2")
#install.packages("ggpubr")

Load the libraries

library(ggplot2)
library(ggpubr)

Load the msleep data

The mammals dataset is a classic dataset in the MASS package. msleep, which ships with ggplot2, is an updated version of the data that includes more numeric data (e.g. hours of sleep) and categorical data (e.g. whether a species is endangered).

data(msleep)
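
Before plotting, it can help to take a quick look at what was loaded. The following is a minimal sketch using only base R functions; the name na_counts is introduced here for illustration. The NA counts it reports line up with the "Removed ... rows" warnings that appear with the plots later on.

```r
library(ggplot2)   # msleep ships with ggplot2
data(msleep)

dim(msleep)        # number of rows and columns
names(msleep)      # column names

# Count the missing values (NAs) in each column
na_counts <- colSums(is.na(msleep))
na_counts
```

Columns with a count of zero can be plotted without any rows being dropped.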

Make a basic ggpubr scatterplot

ggpubr syntax

ggpubr does NOT use formula notation like base R functions. You have to explicitly define a y and an x variable. Additionally, the variables MUST be in quotes.

Scatter plots in ggpubr

Ignore any warnings; they are due to NAs in the data.

ggscatter(y = "sleep_rem",
          x = "sleep_total",
          data = msleep)
## Warning: Removed 22 rows containing missing values (geom_point).
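
To see the contrast with formula notation, here is a sketch of the same scatterplot made both ways. Base R's plot() accepts a formula with unquoted column names, while ggscatter() takes the column names as quoted strings. The object name p is introduced here for illustration only.

```r
library(ggplot2)   # provides the msleep data
library(ggpubr)
data(msleep)

# Base R: formula notation (y ~ x), column names NOT in quotes
plot(sleep_rem ~ sleep_total, data = msleep)

# ggpubr: y and x are named explicitly, column names in quotes
p <- ggscatter(y = "sleep_rem",
               x = "sleep_total",
               data = msleep)
p
```

Saving the plot to an object is optional; calling ggscatter() on its own prints the plot directly.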

Adding a line of best fit

Ignore any warnings; they are due to NAs in the data.

ggscatter(y = "sleep_rem",
          x = "sleep_total",
          add = "reg.line",  # line of best fit
          data = msleep)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 22 rows containing non-finite values (stat_smooth).
## Warning: Removed 22 rows containing missing values (geom_point).
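
To see the numbers behind a line of best fit, a straight line can also be fit directly with base R's lm(), which uses formula notation. This is a sketch for inspecting the intercept and slope, on the assumption that the plotted line is an ordinary least-squares fit; it is not a description of ggpubr's internals. The name fit is introduced here for illustration.

```r
library(ggplot2)   # provides the msleep data
data(msleep)

# Fit a simple linear regression; rows with NAs are dropped automatically
fit <- lm(sleep_rem ~ sleep_total, data = msleep)

coef(fit)   # intercept and slope of the fitted line
nobs(fit)   # number of rows actually used in the fit
```

The value from nobs() should equal the total number of rows minus the number of rows removed in the warnings.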

Adding a data ellipse

ggscatter(y = "sleep_rem",
          x = "sleep_total",
          ellipse = TRUE,   # data ellipse
          data = msleep)
## Warning: Removed 22 rows containing non-finite values (stat_ellipse).
## Warning: Removed 22 rows containing missing values (geom_point).

Add a correlation coefficient

Adding cor.coef = TRUE displays the correlation coefficient, as well as a p-value for the significance of the correlation coefficient (testing the hypothesis that it is 0).

ggscatter(y = "sleep_rem",
          x = "sleep_total",
          cor.coef = TRUE,
          data = msleep)
## Warning: Removed 22 rows containing non-finite values (stat_cor).
## Warning: Removed 22 rows containing missing values (geom_point).
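
The correlation coefficient and its p-value can also be computed outside of the plot with base R's cor.test(). This sketch assumes the default Pearson correlation is what is wanted; the name ct is introduced here for illustration.

```r
library(ggplot2)   # provides the msleep data
data(msleep)

# Pearson correlation test; incomplete pairs are dropped automatically
ct <- cor.test(msleep$sleep_rem, msleep$sleep_total)

ct$estimate   # correlation coefficient (r)
ct$p.value    # p-value for the null hypothesis that the correlation is 0
```

Unlike ggscatter(), cor.test() takes the columns themselves (selected with $) rather than quoted column names.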

Allometry

NOTE: A question about the basic biological issues related to allometry could appear on the next test.

The mammals and msleep data are often used to display the concept of allometric relationships. “Allometry, in its broadest sense, describes how the characteristics of living creatures change with size” (Shingleton 2010). Ecologists and evolutionary biologists are often interested in how different morphological, physiological, ecological, and life history factors vary with size. For example, larger organisms typically have proportionally smaller brains, have fewer offspring, and live longer, and these relationships are often linear when plotted on a log-log scale.

Allometry is not an inherently computational discipline, though it can involve a lot of math. One area of interest to ecologists is how things like metabolic rate and energy consumption vary as organisms increase in size.

Allometry research is often used to inform computational research, especially simulation models. For example, allometric models can be used to predict the size and growth patterns of trees in models that simulate the growth of forests.

Allometry and log scales

Taking the natural log of data is often used to re-scale it or make relationships linear. Allometric data is often plotted on a log-log scale: both the x and the y variables are logged.
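
The reason logging both variables can make a relationship linear can be sketched with the standard allometric power law. Here y is the trait of interest, x is body size, and a and b are constants; these symbols are introduced for illustration only.

```latex
y = a\,x^{b}
\quad\Longrightarrow\quad
\log y = \log a + b \log x
```

On the log-log scale this is the equation of a straight line with intercept log a and slope b.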

To do this, we’ll make new columns. Let’s take the log of brain weight (brainwt) and body weight (bodywt).

First, brain weight. We can make a new column using the $ operator. This operator can be used to select a single column, like this:

mean(msleep$brainwt, na.rm = TRUE)
## [1] 0.2815814

It can also be used to create a new variable in a dataframe; here, I make a new column “brainwt_log” that does not yet exist in the msleep dataframe.

msleep$brainwt_log <- log(msleep$brainwt)

The same thing for bodywt:

msleep$bodywt_log <- log(msleep$bodywt)

Make an allometric plot

ggpubr does NOT use formula notation like base R functions. You have to explicitly define a y and an x variable. Additionally, the variables MUST be in quotes.

Note: Ignore any warnings; these are due to NAs in the data.

ggscatter(y = "brainwt_log",
          x = "bodywt_log",
          data = msleep)
## Warning: Removed 27 rows containing missing values (geom_point).
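
On a log-log scale, the slope of a straight-line fit estimates how quickly brain weight scales with body weight. The following is a minimal sketch using base R's lm(); the name allo_fit is introduced here for illustration, and the block recreates the logged columns so that it runs on its own.

```r
library(ggplot2)   # provides the msleep data
data(msleep)

# Log-transform both variables (natural log)
msleep$brainwt_log <- log(msleep$brainwt)
msleep$bodywt_log  <- log(msleep$bodywt)

# Linear regression on the log-log scale; rows with NAs are dropped
allo_fit <- lm(brainwt_log ~ bodywt_log, data = msleep)

coef(allo_fit)   # the bodywt_log coefficient is the estimated scaling slope
nobs(allo_fit)   # number of rows used in the fit
```

A slope below 1 means brain weight increases more slowly than body weight, so larger species have proportionally smaller brains.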

Change color by a categorical variable

ggscatter(y = "brainwt_log",
          x = "bodywt_log",
          color = "vore",    # color points by diet category
          data = msleep)
## Warning: Removed 27 rows containing missing values (geom_point).

Change color by a continuous variable

ggscatter(y = "brainwt_log",
          x = "bodywt_log",
          color = "sleep_total",
          data = msleep)
## Warning: Removed 27 rows containing missing values (geom_point).

Add a smoother

A smoother is a data exploration tool which helps you visualize trends in the data. There are many ways to calculate them, but conceptually they work by doing something akin to taking a weighted average of sets of adjacent points. A simple type is a loess smoother, which can easily be added in ggpubr.

ggscatter(y = "brainwt",
          x = "bodywt_log",
          add  = "loess",
          data = msleep)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing missing values (geom_point).
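
A loess smoother can also be computed directly with base R's loess() function to see the smoothed values themselves. This is a sketch using loess()'s default settings, which may differ from the settings used in the plot above; the names d, lo, and smooth are introduced here for illustration.

```r
library(ggplot2)   # provides the msleep data
data(msleep)
msleep$bodywt_log <- log(msleep$bodywt)

# Keep only the two columns needed and drop rows with NAs
d <- as.data.frame(msleep[, c("brainwt", "bodywt_log")])
d <- na.omit(d)

# Fit the smoother and store the smoothed value for each point
lo <- loess(brainwt ~ bodywt_log, data = d)
d$smooth <- predict(lo)

# Sort by x so the smoothed curve is drawn from left to right
d <- d[order(d$bodywt_log), ]
plot(brainwt ~ bodywt_log, data = d)
lines(d$bodywt_log, d$smooth)
```

Each smoothed value is a locally weighted estimate based on nearby points, which is why the curve can bend to follow the data.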

Task

If you have a problem, make sure that things are in quotes as appropriate, and that there is a comma at the end of each line as needed. Note that cor.coef = TRUE doesn’t use quotes.

msleep$sleep_total_log <- log(msleep$sleep_total)
ggscatter(y = "sleep_total_log",
          x = "brainwt_log",
          add = "reg.line",
          cor.coef = TRUE,
          color = "bodywt_log",
          data = msleep)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing non-finite values (stat_cor).
## Warning: Removed 27 rows containing missing values (geom_point).