ISDA609: Mathematical Modeling Techniques for Data Analytics \ Final Project Report

Introduction

In this project, I will investigate the relationship of the body weights of birds and their pulse rates.

Heart rate of birds- Warm-blooded animals use large quantities of energy to maintain body temperature because of the heat loss through the body surface. In fact, biologists believe that the primary energy drain on a resting warm blooded animal is maintenance of body temperature.

Problem Description

Construct a model relating blood flow through the heart to body weight.
The following data relate weights of some birds to their heart rate measured in beats per minute. Construct a model that relates heart rate to body weight. Discuss the assumptions of your model. Use the data to check your model.

bird<- c('Canary','Pigeon','Crow','Buzzard','Duck','Hen','Goose','Turkey','Ostrich')
body_weight<- c(20,300,341,658,1100,2000,2300,8750,71000)
pulse_rate<-c(1000, 185,378,300,190,312,240,193,60)
df<- data.frame(cbind(bird, body_weight,pulse_rate ))

knitr::kable(df)

bird	body_weight	pulse_rate
Canary	20	1000
Pigeon	300	185
Crow	341	378
Buzzard	658	300
Duck	1100	190
Hen	2000	312
Goose	2300	240
Turkey	8750	193
Ostrich	71000	60

Assumptions

It is well-known that small warm-blooded animals have much higher resting pulse rates than large ones. For examples, a canary, with a body mass of 20 grams, may have a heart rate of 1000 beats/minute, as compared to 70-80 beats/minute for a human being.
Warm-blooded animals at rest use large quantities of energy to maintain body temperature, due to heat loss through the body surface. Let us make the following assumptions about energy balance over a fixed time interval (1 minute, say):

A. Energy lost is proportional to the surface area of the body.
B. Energy gained is proportional to the amount of blood flowing through the lungs, the source of oxygen for the body.
C. Energy lost = energy gained for a body in temperature equilibrium.

Model Construction

E = energy
S = surface area
m = mass
W = weight of the animal
r = pulse rate
V = volume of the animal
b = volume of blood pumped by the heart in one stroke.
From assumption A, we can assume that the energy required to maintain the body temperature of a bird E is proportional to its body surface area S, i.e. \(E\propto S\)

Since blood flow through the lungs provides animals oxygen, larger animals have larger lungs and require more oxygen. Therefore, we assume that the amount of energy available is proportional to the blood flow b through the lungs, i.e \(E \propto b\)

Also, we assume that the least amount of blood needed to circulate, the amount of available energy will equal the amount of energy used to maintain the body temperature. This implies \(b \propto S\)

Since larger animals have larger hearts and larger lungs, we can assume the birds are geometrically similar. Thus, \(S\propto V^{\frac 2 3}\) where V is the volume of the animal.

Since \(V \propto m \propto W\) where m is the mass of the animal and W is the weight of the animal, we have \(S \propto W^{\frac 2 3}\) . This implies \(b \propto W^{\frac 2 3}\).

Also, since blood flow is determined by the total amount of blood pumped by the heart per unit time (minutes), and every time the heart beats the blood is pumped into the arteries, we have \(b \propto rV \propto rW\) where r is the pulse rate and Vh is heart volume. From \(b\propto W^{\frac 2 3}\) and \(b \propto rW\), we have \(W^{ \frac 2 3} \propto rW\), i.e. \(r \propto W^{\frac {-1} {3}}\).

Methodology

First we will be modeling using Geometric Similarity given our power curve model function
We will be using Least squares criterion to fit our curve model of the form \(y= Ax^n\) where n is fixed to a given collection of data points. In our case, our curve is \(r = Aw^{\frac {-1} {3}}\). Therefore we will use the least squares criterion to estimate \(A\) with \(n = -1/3\).
In addition we will use the Chebyshev criterion to find the bounds

Modeling using Geometric Similarity

body_weight<- c(20,300,341,658,1100,2000,2300,8750,71000)
pulse_rate<-c(1000, 185,378,300,190,312,240,193,60)
df<- data.frame(bird, body_weight,pulse_rate )

df$z = c(NA,tail(df$pulse_rate,-1)-head(df$pulse_rate,-1))
df$z2= c(NA,tail(df$body_weight,-1)-head(df$body_weight,-1))

# finding the slope
df$slope<- df$z / df$z2

slope<- mean(df$slope, na.rm= TRUE)
#slope

# calculating f(x) using the slope 
df$fx<- 0.148476*(df$body_weight)^(-1/3)

#########################################
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)
  
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  
  numPlots = length(plots)
  
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }
  
  if (numPlots==1) {
    print(plots[[1]])
    
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.2.5

p1<- ggplot(data = df, aes(x = df$body_weight, y = df$pulse_rate)) +
  geom_line()+
  ggtitle("Body Weight vs Pulse Rate (Original data)")

p2<- ggplot(data = df, aes(x = df$body_weight, y = df$fx)) +
  geom_line()+
  ggtitle("Body Weight vs Pulse Rate using Fitted model \n f(x) = 0.148476*(df$body_weight)^(-1/3)")

library(gridExtra)

## Warning: package 'gridExtra' was built under R version 3.2.5

multiplot(p1, p2, cols=1)

dfold<- df

Least squares criterion Methodology details

Recall model of form power curve \(f(x) = ax^n\) requires minimization of: \[ S = \sum_{i=1}^m[y_i -f(x_i)]^2\] \[ s = \sum_{i=1}^m[y_i -ax_i^n]^2\] solving the the equation for \(a\) yeilds: \[ a = \frac {\sum x_i^2y_i} {\sum x_i^{2n}} \]

Least squares criterion Model results (table)

bird<- c('Canary','Pigeon','Crow','Buzzard','Duck','Hen','Goose','Turkey','Ostrich')
body_weight<- c(20,300,341,658,1100,2000,2300,8750,71000)
pulse_rate<-c(1000, 185,378,300,190,312,240,193,60)
df<- data.frame(cbind(bird, body_weight,pulse_rate ))
library(varhandle)

## Warning: package 'varhandle' was built under R version 3.2.3

sum1<- sum(unfactor(df$body_weight)^(-1/3) * unfactor(df$pulse_rate))
sum2<- sum(unfactor(df$body_weight)^(-2/3))
a<- sum1/sum2
a

## [1] 2576.555

df$pulse_rate_fit<- round(a*(unfactor(df$body_weight)^(-1/3)), 2)
df

##      bird body_weight pulse_rate pulse_rate_fit
## 1  Canary          20       1000         949.21
## 2  Pigeon         300        185         384.89
## 3    Crow         341        378         368.80
## 4 Buzzard         658        300         296.23
## 5    Duck        1100        190         249.60
## 6     Hen        2000        312         204.50
## 7   Goose        2300        240         195.19
## 8  Turkey        8750        193         125.04
## 9 Ostrich       71000         60          62.22

Least squares criterion Model results (graphical)

library(reshape2)
library(ggplot2)

final <- data.frame( body_weight= df$body_weight, pulse_rate = df$pulse_rate, pulse_rate_fit = df$pulse_rate_fit)
#plot(final) 

DF <- melt(final, id= 'body_weight')

## Warning: attributes are not identical across measure variables; they will
## be dropped

ggplot(data = DF, aes(x = body_weight, y = value, color = variable)) +
  geom_area()

DF<- data.frame(DF)
DF<- unfactor(DF)
#str(DF)
DF$value<- as.numeric(DF$value)
ggplot(data = DF, aes(x = body_weight, y = value, color = variable)) +
  geom_line()

Visual comparison original data, least Square, and Geometric Similarity

Chebyshev Criterion model

Given N points (x_1, y_1),(x_2, y_2),(x_3, y_3), …., (x_n, y_n), to fit the model y=cx using Chebyshev criterion, we need to:
\[\epsilon = max_i\bigl| y_i - y(x_i) \bigr|\] \[ Goal: \ to \ minimize\ \epsilon\]

Chebyshev LP model Statement

\[r = max_i\bigl| y_i - cx_i \bigr|\] \[ Goal: to \ minimize\ r\] subject to: \[r-(1000 -c*20) >=0\] \[r+(1000 -c*20) >=0\] \[r-(185 -c*300) >=0\] \[r+(185 -c*300) >=0\] \[r-(378 -c*341) >=0\] \[r+(378 -c*341) >=0\] \[r-(300 -c*658) >=0\] \[r+(300 -c*658) >=0\] \[r-(190 -c*1100) >=0\] \[r+(190 -c*1100) >=0\] \[r-(312 -c*2000) >=0\] \[r+(312 -c*2000) >=0\] \[r-(240 -c*2300) >=0\] \[r+(240 -c*2300) >=0\] \[r-(193 -c*8750) >=0\] \[r+(193 -c*8750) >=0\] \[r-(60 -c*71000) >=0\] \[r+(60 -c*71000) >=0\]

Chebyshev Criterion Geometric interpretation

library(ggplot2)

fun1<- function(c) (1000-20*c)
fun2<- function(c) (185-300*c)
fun3<- function(c) (378-341*c)
fun4<- function(c) (300-658*c)
fun5<- function(c) (190-1100*c)
fun6<- function(c) (312-2000*c)
fun7<- function(c) (240-2300*c)
fun8<- function(c) (193-8750*c)
fun9<- function(c) (60-71000*c)

fun11<- function(c) 20*c-1000
fun12<- function(c) 300*c-185
fun13<- function(c) 341*c-378
fun14<- function(c) 658*c-300
fun15<- function(c) 1100*c-190
fun16<- function(c) 2000*c-312
fun17<- function(c) 2300*c-240
fun18<- function(c) 8750*c-193
fun19<- function(c) 71000*c-60


x1 = seq(-5,1000)
mydf = data.frame(x1, y1=fun1(x1), y2=fun2(x1),y3= fun3(x1), y4=fun4(x1), 
                  y5=fun5(x1),y6= fun6(x1),
                  y7=fun7(x1), y8=fun8(x1),y9= fun9(x1),
                  y11=fun11(x1), y12=fun12(x1),y13= fun13(x1), y14=fun14(x1), 
                  y15=fun15(x1),y16= fun16(x1),
                  y17=fun17(x1), y18=fun18(x1),y19= fun19(x1)
                  )
mydf <-  transform(mydf, z = pmax(y1,pmin(y2,y3, y4, y5, y6, y7, y8, y9,
                                          y12,y13, y14, y15, y16, y17, y18, y19)))
ggplot(mydf, aes(x = x1)) + 
  geom_line(aes(y = y1), colour = 'blue') +
  geom_line(aes(y = y2), colour = 'black') +
  geom_line(aes(y = y3), colour = 'red') +
  geom_line(aes(y = y4), colour = 'yellow') +
  geom_line(aes(y = y5), colour = 'green') +
  geom_line(aes(y = y6), colour = 'orange') +
  geom_line(aes(y = y7), colour = 'gray') +
  geom_line(aes(y = y8), colour = 'pink') +
  geom_line(aes(y = y9), colour = 'brown') +
  geom_line(aes(y = y11), colour = 'blue') +
  geom_line(aes(y = y12), colour = 'black') +
  geom_line(aes(y = y13), colour = 'red') +
  geom_line(aes(y = y14), colour = 'yellow') +
  geom_line(aes(y = y15), colour = 'green') +
  geom_line(aes(y = y16), colour = 'orange') +
  geom_line(aes(y = y17), colour = 'gray') +
  geom_line(aes(y = y18), colour = 'pink') +
  geom_line(aes(y = y19), colour = 'brown') +
  geom_ribbon(aes(ymin=y1,ymax = z), fill = 'gray60')+
  coord_flip()

From the above Chebyshev graph , the minmax is r+(60-c71000) and maxmin is r-(1000 -c20). Hence, setting r+(60-c71000) = 0 and r-(1000 -c20) = 0, where r = 1000 -c*20 Therefore, the optimal value of c = 67. The residual error is 340. Please note that the feasible region could not visibly be confirmed. Hence the maxmin was just visual approximation from the graph

Conclusion

For the majority of the birds, the model makes good predictions in both Least Square and Geometric Similarity, and support the biology that larger animals have slower pulse rate. However, this is not always true for all birds. The pulse rate of a Pigeon is almost the same as the pulse rate of a Turkey, but the weight of a Turkey is 29.16 times of the weight of a pigeon. This model reveals that assuming there is no measurement error, there is something else affecting pigeon’s pulse rate; thus further investigations and possible model refinement are needed for birds like pigeon.

However, a research by Florence Buchanan in his paper, The significance of the pulse rate in vertebrate animals University college, London 1910, explains the pigeon data discrepancy. The pigeon weights 300g and pulse rate of 185. However, if we examine its average dioxide per kilo per hour(in grams) which is 3.4 as compared to that of the goose 1.07 and yet body weight of 2300 and pulse rate of 240, we can derive that larger average dioxide per kilo per hour(in grams) leads to faster heart rates.. It is clearly demonstrated in by Florence Buchanan that larger average dioxide per kilo per hour (in grams) leads to faster heart rates. link: https://books.google.com/books?id=3jEsAAAAYAAJ&printsec=frontcover#v=onepage&q&f=false

Please note that Chebyshev criterion resulted in optimal value of 67; however, large error of 340 which is not acceptable. In fact the feasible region could not be geometrically confirmed.

ISDA609: Mathematical Modeling Techniques for Data Analytics \[ \ \] Final Project Report

Mohamed Elmoudni

May 28, 2016