Fitting straight lines to power law data

mbh

13/12/27 - 12/03/2020

Power Law related data

It often happens that two variables are related by a so-called power-law relationship:

\[y=ax^b\]

where \(a\) and \(b\) are constants. Sometimes one has a theory that suggests that this should be the case, and sometimes one discovers that it is, and then wonders why.

In either case, one needs to be able determine whether the power-law relationship is in fact the case and, if so, to determine what the constants \(a\) and, expecially \(b\) the exponent might be.

Metabolic rate

As an example, here is some data for body mass and metabolic rate, as measured by heat produced by the body in a steady state, for different mammals.

Mammal Body Weight (kg) metabolic rate (kcal/day)
Mouse 0.02 3
Rat 0.2 30
Guinea Pig 0.25 32
Cats 2 100
Rabbits 3 120
Macaque 80 170
Goat 30 800
Chimpanzee 60 1000
Sheep 90 1100
Steer 300 7000
Cow 700 10000
Elephant 3000 40000

We wish to explore the relationship between the metabolic rate of an animal and its body weight.

Preliminaries

If you have not done so already, create a folder called RStuff on your computer, and inside that create two others, one called ‘data’ and the other called ‘scripts’.

Make the Rstuff folder to be your Working Directory. You fan do this using Session / Set Working Directory / Choose Directory.

Create a script ( File / New File / R Script) called power_law andave it in the scripts folder.

Load any packages we need

We will almost laways use tidyversw(), so we need to load it into our R session. It is probaly a good idea to just do this as a matter of course, in every script you write.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Save the data as a csv file

In R, create a vector each of numbers for the body weight (weight) and metabilic rate (R) , and a vector of the names of the mammals. Combine these vectors into a data frame called ‘metaRate’.

mammal<-c("Mouse","Rat","GuineaPig","Cats","Rabbits","Macaque","Goat","Chimpanzee","Sheep","Steer","Cow","Elephant")
R<-c(3.1,30.,32.,100.,120.,170.,800.,1000.,1100.,7000.,10000,40000)
weight<-c(0.02,0.2,0.25,2,3,8,30,60,90,300,700,3000)
metaRate<-tibble(mammal=mammal,weight=weight,R=R)

Save this as a csv file in your data folder. Since w have chosem the RStuff folder to be the working directory, we can do this using

write_csv(metaRate,"../data/metabolicRate.csv")

While we are at it, let us clean up our work space.

rm(list=ls())

You should now notice that the Environment window in RStudio, top-right, is empty. It is a good idea to clean up your work space from time to time, usually at the beginning of every session.

Now, read in the metabolicRate.csv file to a new data frame called “mr” using

mr<-read_csv("../data/metabolicRate.csv")
## Parsed with column specification:
## cols(
##   mammal = col_character(),
##   weight = col_double(),
##   R = col_double()
## )

Check out the structure of this data frame. It should contain three variables and 12 observations.

glimpse(mr)
## Observations: 12
## Variables: 3
## $ mammal <chr> "Mouse", "Rat", "GuineaPig", "Cats", "Rabbits", "Macaque", "Go…
## $ weight <dbl> 0.02, 0.20, 0.25, 2.00, 3.00, 8.00, 30.00, 60.00, 90.00, 300.0…
## $ R      <dbl> 3.1, 30.0, 32.0, 100.0, 120.0, 170.0, 800.0, 1000.0, 1100.0, 7…

Mathematical analysis suggests that metabolic rate and body weight should be related by a power law relation of the form

\[R=Aw^{\frac{2}{3}}\] where R is the metabolic rate, A is a constant and w is the body weight.

Let us see if the data agree with that. One way to do this is to plot the logarithm of both sides of the equation above:, \[\log (R)=\log(A)+\frac{2}{3}\log (w)\] If the power-law equation is correct, we should get a straight line if we plot \(\log(R)\) against \(\log(w)\).

To check this, we first create new columns in our data frame which are the log of the metabolic rate and the log of the weight. We use mutate() to do this. This one of several very useful commands in dplyr() for data manipulation.

mr<-mr %>% mutate(logR=log(R),logW=log(weight))

Then we use ggplot() to create the plot we want:

g<-ggplot(mr,aes(x=logW,y=logR))
g<-g+geom_point()
g<-g+geom_smooth(method=lm,se=FALSE)
g<-g+xlab('log weight')+
     ylab('log metabolic rate')+
     ggtitle ('log-log plot of metabolic rate vs body weight for a range of mammals')
g

Does that look linear? If not then the power-law model is probably wrong. If so then our model looks promising.

Does the gradient of the line agree with the prediction of the model?

First, look at the equation above and decide what you think the prediction is for the gradient of the line.

Now let us find out what the gradient actually is. To do this we use the powerful lm() command in R

lfit=lm(logR~logW,data=mr)
summary(lfit)
## 
## Call:
## lm(formula = logR ~ logW, data = mr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5776 -0.2207 -0.0422  0.3780  0.4735 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.14308    0.13576   30.52 3.34e-11 ***
## logW         0.75516    0.03215   23.49 4.43e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3842 on 10 degrees of freedom
## Multiple R-squared:  0.9822, Adjusted R-squared:  0.9804 
## F-statistic: 551.7 on 1 and 10 DF,  p-value: 4.435e-10

Do the data follow a power-law relationship?

Is the exponent (2/3) as proposed. If not, what is it?

Turkey Cooking Time

Now try the same for some turkey data

  1. Download the file “turkey.csv” from the Moodle site.
  2. Put it in the data folder inside your RStuff folder on your machine or elsewhere.
  3. Ensure that the working directory is still the RStuff folder.
  4. Use read_csv() to upload the file into a data frame called “turkey” This file contains some turkey weights (in pounds) and cooking times (in hours). Using the same methods as for the metabolic rate data, find out:
  5. Whether there is a power-law relationship between turkey cooking time and turkey weight.
  6. If there is, what is the exponent?
  7. Write an equation for turkey cooking time as a function of turkey weight
  8. Predict the cooking time for a 17 pound turkey.