It often happens that two variables are related by a so-called power-law relationship:
\[y=ax^b\]
where \(a\) and \(b\) are constants. Sometimes one has a theory that suggests that this should be the case, and sometimes one discovers that it is, and then wonders why.
In either case, one needs to be able determine whether the power-law relationship is in fact the case and, if so, to determine what the constants \(a\) and, expecially \(b\) the exponent might be.
As an example, here is some data for body mass and metabolic rate, as measured by heat produced by the body in a steady state, for different mammals.
| Mammal | Body Weight (kg) | metabolic rate (kcal/day) |
|---|---|---|
| Mouse | 0.02 | 3 |
| Rat | 0.2 | 30 |
| Guinea Pig | 0.25 | 32 |
| Cats | 2 | 100 |
| Rabbits | 3 | 120 |
| Macaque | 80 | 170 |
| Goat | 30 | 800 |
| Chimpanzee | 60 | 1000 |
| Sheep | 90 | 1100 |
| Steer | 300 | 7000 |
| Cow | 700 | 10000 |
| Elephant | 3000 | 40000 |
We wish to explore the relationship between the metabolic rate of an animal and its body weight.
If you have not done so already, create a folder called RStuff on your computer, and inside that create two others, one called ‘data’ and the other called ‘scripts’.
Make the Rstuff folder to be your Working Directory. You fan do this using Session / Set Working Directory / Choose Directory.
Create a script ( File / New File / R Script) called power_law andave it in the scripts folder.
We will almost laways use tidyversw(), so we need to load it into our R session. It is probaly a good idea to just do this as a matter of course, in every script you write.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.3
## ✓ tidyr 1.0.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
In R, create a vector each of numbers for the body weight (weight) and metabilic rate (R) , and a vector of the names of the mammals. Combine these vectors into a data frame called ‘metaRate’.
mammal<-c("Mouse","Rat","GuineaPig","Cats","Rabbits","Macaque","Goat","Chimpanzee","Sheep","Steer","Cow","Elephant")
R<-c(3.1,30.,32.,100.,120.,170.,800.,1000.,1100.,7000.,10000,40000)
weight<-c(0.02,0.2,0.25,2,3,8,30,60,90,300,700,3000)
metaRate<-tibble(mammal=mammal,weight=weight,R=R)
Save this as a csv file in your data folder. Since w have chosem the RStuff folder to be the working directory, we can do this using
write_csv(metaRate,"../data/metabolicRate.csv")
While we are at it, let us clean up our work space.
rm(list=ls())
You should now notice that the Environment window in RStudio, top-right, is empty. It is a good idea to clean up your work space from time to time, usually at the beginning of every session.
Now, read in the metabolicRate.csv file to a new data frame called “mr” using
mr<-read_csv("../data/metabolicRate.csv")
## Parsed with column specification:
## cols(
## mammal = col_character(),
## weight = col_double(),
## R = col_double()
## )
Check out the structure of this data frame. It should contain three variables and 12 observations.
glimpse(mr)
## Observations: 12
## Variables: 3
## $ mammal <chr> "Mouse", "Rat", "GuineaPig", "Cats", "Rabbits", "Macaque", "Go…
## $ weight <dbl> 0.02, 0.20, 0.25, 2.00, 3.00, 8.00, 30.00, 60.00, 90.00, 300.0…
## $ R <dbl> 3.1, 30.0, 32.0, 100.0, 120.0, 170.0, 800.0, 1000.0, 1100.0, 7…
Mathematical analysis suggests that metabolic rate and body weight should be related by a power law relation of the form
\[R=Aw^{\frac{2}{3}}\] where R is the metabolic rate, A is a constant and w is the body weight.
Let us see if the data agree with that. One way to do this is to plot the logarithm of both sides of the equation above:, \[\log (R)=\log(A)+\frac{2}{3}\log (w)\] If the power-law equation is correct, we should get a straight line if we plot \(\log(R)\) against \(\log(w)\).
To check this, we first create new columns in our data frame which are the log of the metabolic rate and the log of the weight. We use mutate() to do this. This one of several very useful commands in dplyr() for data manipulation.
mr<-mr %>% mutate(logR=log(R),logW=log(weight))
Then we use ggplot() to create the plot we want:
g<-ggplot(mr,aes(x=logW,y=logR))
g<-g+geom_point()
g<-g+geom_smooth(method=lm,se=FALSE)
g<-g+xlab('log weight')+
ylab('log metabolic rate')+
ggtitle ('log-log plot of metabolic rate vs body weight for a range of mammals')
g
Does that look linear? If not then the power-law model is probably wrong. If so then our model looks promising.
Does the gradient of the line agree with the prediction of the model?
First, look at the equation above and decide what you think the prediction is for the gradient of the line.
Now let us find out what the gradient actually is. To do this we use the powerful lm() command in R
lfit=lm(logR~logW,data=mr)
summary(lfit)
##
## Call:
## lm(formula = logR ~ logW, data = mr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5776 -0.2207 -0.0422 0.3780 0.4735
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.14308 0.13576 30.52 3.34e-11 ***
## logW 0.75516 0.03215 23.49 4.43e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3842 on 10 degrees of freedom
## Multiple R-squared: 0.9822, Adjusted R-squared: 0.9804
## F-statistic: 551.7 on 1 and 10 DF, p-value: 4.435e-10
Do the data follow a power-law relationship?
Is the exponent (2/3) as proposed. If not, what is it?
Now try the same for some turkey data
read_csv() to upload the file into a data frame called “turkey” This file contains some turkey weights (in pounds) and cooking times (in hours). Using the same methods as for the metabolic rate data, find out: