Allometric data: a classic case for regression, log transformations, and non-linear models
library(compbio4all)
This Software Checkpoint will give you practice using ggpubr.
Install the packages only once, then comment out the install.packages() lines in your script. You probably already did this in the previous Code Checkpoint.
#install.packages("ggplot2")
#install.packages("ggpubr")
library(ggplot2)
library(ggpubr)
The mammals dataset is a classic dataset in the MASS package. msleep, which comes with ggplot2, is an updated version of the data that includes more numeric data (e.g., hours of sleep) and categorical data (e.g., whether a species is endangered).
data(msleep)
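Several plots below print warnings such as "Removed 22 rows containing missing values". These come from NAs in msleep. The sketch below, which uses only base R and assumes ggplot2 is installed, counts the NAs in each column and the rows that have both values needed for the first scatterplot.

```r
# Load msleep from ggplot2
msleep <- ggplot2::msleep

# Number of missing values (NA) in every column
na_counts <- colSums(is.na(msleep))
print(na_counts)

# Rows that have both sleep_total and sleep_rem, i.e. the points ggscatter can draw
n_complete <- sum(complete.cases(msleep[, c("sleep_total", "sleep_rem")]))
cat("Rows usable for sleep_rem vs sleep_total:", n_complete, "\n")
```

Columns with a non-zero count are the ones that trigger the "Removed ... rows" warnings when used in a plot.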
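ggpubr does NOT use the formula notation (y ~ x) of base R functions such as plot() and lm(). You have to explicitly define a y and an x variable, and the variable names MUST be in quotes.
Ignore any warnings about removed rows; they are caused by missing values (NAs) in the data.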
ggscatter(y = "sleep_rem",
x = "sleep_total",
data = msleep)
## Warning: Removed 22 rows containing missing values (geom_point).
Adding add = "reg.line" draws a line of best fit. Again, ignore the warnings.
ggscatter(y = "sleep_rem",
x = "sleep_total",
add = "reg.line", # line of best fit
data = msleep)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 22 rows containing non-finite values (stat_smooth).
## Warning: Removed 22 rows containing missing values (geom_point).
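For comparison, here is the same relationship in base R's formula notation. The output above shows that ggpubr draws the line with geom_smooth() and formula y ~ x; assuming the default linear-model method, lm() below should give the intercept and slope of that line. This is a minimal sketch, not a description of ggpubr's internals.

```r
msleep <- ggplot2::msleep

# Formula notation: response ~ predictor, with UNquoted column names.
# Rows with NA in either variable are dropped (the default na.omit behavior).
fit <- lm(sleep_rem ~ sleep_total, data = msleep)

coef(fit)               # intercept and slope of the line of best fit
summary(fit)$r.squared  # proportion of variance in sleep_rem explained
nobs(fit)               # number of rows actually used in the fit
```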
ggscatter(y = "sleep_rem",
x = "sleep_total",
ellipse = TRUE, # data ellipse
data = msleep)
## Warning: Removed 22 rows containing non-finite values (stat_ellipse).
## Warning: Removed 22 rows containing missing values (geom_point).
Adding cor.coef = TRUE displays the correlation coefficient (R) on the plot, along with a p-value for the significance of the correlation coefficient (testing the null hypothesis that it is 0).
ggscatter(y = "sleep_rem",
x = "sleep_total",
cor.coef = TRUE,
data = msleep)
## Warning: Removed 22 rows containing non-finite values (stat_cor).
## Warning: Removed 22 rows containing missing values (geom_point).
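The numbers printed by cor.coef = TRUE can be checked with base R's cor.test(). This sketch assumes ggpubr's default Pearson correlation; if you change the method in ggscatter, change it here too.

```r
msleep <- ggplot2::msleep

# Pearson correlation and a test of H0: correlation = 0.
# cor.test() drops pairs where either value is NA.
ct <- cor.test(msleep$sleep_total, msleep$sleep_rem, method = "pearson")

ct$estimate  # correlation coefficient R
ct$p.value   # p-value for H0: R = 0
```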
NOTE: A question about the basic biological issues related to allometry could appear on the next test.
The mammals and msleep datasets are often used to illustrate the concept of allometric relationships. “Allometry, in its broadest sense, describes how the characteristics of living creatures change with size” (Shingleton 2010). Ecologists and evolutionary biologists are often interested in how different morphological, physiological, ecological, and life-history traits vary with size. For example, larger organisms typically have relatively smaller brains for their body size, fewer offspring, and longer lives, and these relationships are often linear when plotted on a log-log scale.
Allometry is not an inherently computational discipline, though it can involve a lot of math. One area of interest to ecologists is how things like metabolic rate and energy consumption vary as organisms increase in size.
Allometry research is often used to inform computational research, especially simulation models. For example, allometric models can be used to predict the size and growth patterns of trees in models that simulate the growth of forests.
Taking the natural log of data is a common way to rescale it or to make curved relationships linear. Allometric data are often plotted on a log-log scale, meaning both the x and the y variables are logged.
To do this, we’ll make new columns containing the log of brain weight (brainwt) and body weight (bodywt).
First, brain weight. We can make a new column using the $ operator. This operator can be used to select a single column, like this:
mean(msleep$brainwt, na.rm = TRUE)
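Why does logging both axes make allometric relationships linear? An allometric relationship is usually a power law, y = a * x^b. Taking logs of both sides gives log(y) = log(a) + b * log(x), a straight line with slope b. The sketch below uses simulated, hypothetical data (a = 0.01, b = 0.75, log-normal noise; none of these come from msleep) to show that a linear fit on the log-log scale recovers b, and that the base of the log changes the intercept but not the slope.

```r
set.seed(1)

# Hypothetical power law y = a * x^b with multiplicative (log-normal) noise
a <- 0.01
b <- 0.75
x <- exp(runif(200, log(0.01), log(1000)))  # sizes spanning 5 orders of magnitude
y <- a * x^b * exp(rnorm(200, sd = 0.1))

# On the log-log scale the power law is a straight line:
# log(y) = log(a) + b * log(x)
fit_ln  <- lm(log(y) ~ log(x))
fit_l10 <- lm(log10(y) ~ log10(x))

coef(fit_ln)      # slope near b = 0.75, intercept near log(a)
coef(fit_l10)[2]  # same slope with base-10 logs
```

Because converting natural logs to base-10 logs divides both log(x) and log(y) by the same constant, the slope (the allometric exponent) is identical for either base.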
## [1] 0.2815814
It can also be used to create a new variable in a dataframe; here, I make a new column “brainwt_log” that does not yet exist in the msleep dataframe.
msleep$brainwt_log <- log(msleep$brainwt)
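A short sketch of what the $ operator is doing here, using a fresh copy of msleep. The [[ ]] operator is a base R equivalent for a single column, and log() leaves NAs as NA, so the new column has exactly as many missing values as brainwt.

```r
msleep <- ggplot2::msleep

# $ and [[ ]] both extract one column as a plain vector
same_column <- identical(msleep$brainwt, msleep[["brainwt"]])

# Assigning to a name that does not exist yet creates a new column.
# log() is the natural log in R; log(NA) is NA.
msleep$brainwt_log <- log(msleep$brainwt)

same_column
sum(is.na(msleep$brainwt_log)) == sum(is.na(msleep$brainwt))
```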
Do the same thing for bodywt:
msleep$bodywt_log <- log(msleep$bodywt)
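With both logged columns, the slope of a straight-line fit estimates the allometric exponent b in brainwt = a * bodywt^b. This is a minimal sketch that rebuilds the two columns so it runs on its own; a slope below 1 would mean brain weight increases proportionally more slowly than body weight.

```r
msleep <- ggplot2::msleep
msleep$brainwt_log <- log(msleep$brainwt)
msleep$bodywt_log  <- log(msleep$bodywt)

# Linear model on the log-log scale; rows with NA brainwt are dropped
allo_fit <- lm(brainwt_log ~ bodywt_log, data = msleep)

coef(allo_fit)                     # intercept = log(a), slope = b
confint(allo_fit)["bodywt_log", ]  # 95% confidence interval for b
```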
Note: Ignore any warnings about removed rows; these are due to NAs in the data.
ggscatter(y = "brainwt_log",
x = "bodywt_log",
data = msleep)
## Warning: Removed 27 rows containing missing values (geom_point).
ggscatter(y = "brainwt_log",
x = "bodywt_log",
color = "vore", # color points by diet category
data = msleep)
## Warning: Removed 27 rows containing missing values (geom_point).
ggscatter(y = "brainwt_log",
x = "bodywt_log",
color = "sleep_total", # numeric variable, shown as a color gradient
data = msleep)
## Warning: Removed 27 rows containing missing values (geom_point).
A smoother is a data exploration tool that helps you visualize trends in the data. There are many ways to calculate them, but conceptually they work by doing something akin to taking a weighted average of sets of adjacent points. A common type is the loess (locally weighted regression) smoother, which can easily be added in ggpubr.
ggscatter(y = "brainwt_log",
x = "bodywt_log",
add = "loess",
data = msleep)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing missing values (geom_point).
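The loess smoother can also be fit directly with base R's loess() function. In this sketch each local fit uses a fraction of the points (span = 0.75, which is loess()'s default) and weights them by their distance from the target x value. The plotted curve in ggpubr should look similar, but treat this as an illustration of the technique, not a guaranteed reproduction of ggpubr's settings.

```r
msleep <- ggplot2::msleep
d <- data.frame(brainwt_log = log(msleep$brainwt),
                bodywt_log  = log(msleep$bodywt))
d <- d[complete.cases(d), ]  # keep only rows with both values

# Local weighted regression of log brain weight on log body weight
smooth_fit <- loess(brainwt_log ~ bodywt_log, data = d, span = 0.75)

# Evaluate the smoother on a grid inside the observed range of x
grid <- data.frame(bodywt_log = seq(quantile(d$bodywt_log, 0.05),
                                    quantile(d$bodywt_log, 0.95),
                                    length.out = 50))
grid$fitted <- predict(smooth_fit, newdata = grid)
head(grid)
```

Smaller span values follow the points more closely; larger values produce a smoother curve.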
If you get an error, check that variable names are in quotes and that every argument line except the last ends with a comma. Note that cor.coef = TRUE doesn’t use quotes, because TRUE is a logical value rather than a column name.
msleep$sleep_total_log <- log(msleep$sleep_total)
ggscatter(y = "sleep_total_log",
x = "brainwt_log",
add = "reg.line",
cor.coef = TRUE,
color = "bodywt_log",
data = msleep)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing non-finite values (stat_cor).
## Warning: Removed 27 rows containing missing values (geom_point).