Introduction to Regression Modeling


Data for today: comparative zoology of milk

How does milk composition vary with animal life history & biology?

Skibiel et al. 2013. The evolution of the nutrient composition of mammalian milks. Journal of Animal Ecology.

  • Table S1. Milk composition of mammals at mid-lactation
  • Table S2. Ecology and life histories of mammals whose milk composition has been described


The milk dataset

Look at data

# How much data?
dim(milk2)
## [1] 130  13

# Focal columns
# (use negative indexing to drop some)
summary(milk2[,-c(2,3,8,9)])
##            order     mass.female        gest.month.NUM    lacat.mo.NUM   
##  Artiodactyla :23   Min.   :        8   Min.   : 0.400   Min.   : 0.300  
##  Carnivora    :23   1st Qu.:      857   1st Qu.: 1.405   1st Qu.: 1.625  
##  Primates     :22   Median :     5716   Median : 5.000   Median : 4.500  
##  Rodentia     :17   Mean   :  2229475   Mean   : 5.624   Mean   : 6.092  
##  Chiroptera   :10   3rd Qu.:   107500   3rd Qu.: 8.365   3rd Qu.: 8.225  
##  Diprotodontia:10   Max.   :170000000   Max.   :21.460   Max.   :42.000  
##  (Other)      :25                                                        
##   mass.litter               diet     arid            biome    
##  Min.   :      0.3   carnivore:32   no :91   aquatic    : 22  
##  1st Qu.:     42.0   herbivore:61   yes:39   terrestrial:108  
##  Median :    423.5   omnivore :37                             
##  Mean   :  52563.8                                            
##  3rd Qu.:   7038.2                                            
##  Max.   :2272500.0                                            
##                                                               
##   fat.percent    
##  Min.   : 0.200  
##  1st Qu.: 4.575  
##  Median : 8.550  
##  Mean   :14.068  
##  3rd Qu.:17.575  
##  Max.   :61.100  
##  NA's   :2
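
Negative indexing such as `milk2[,-c(2,3,8,9)]` drops columns by position, so it silently selects the wrong columns if the column order ever changes. Below is a minimal sketch on a small hypothetical data frame (`toy`, not the real milk data). It compares dropping by position with the alternative of dropping by name through `setdiff()`.

```r
# Hypothetical toy data frame (NOT the real milk data); a few column names
# are borrowed from milk2 purely for illustration
toy <- data.frame(order       = c("Primates", "Rodentia", "Carnivora"),
                  spp         = c("sp_a", "sp_b", "sp_c"),
                  mass.female = c(5000, 30, 12000),
                  fat.percent = c(4.1, 12.3, 9.8))

# Negative indexing: drop columns 2 and 3 by position
by_position <- toy[, -c(2, 3)]

# Same result by name, which keeps working if the column order changes
drop_cols <- c("spp", "mass.female")
by_name   <- toy[, setdiff(names(toy), drop_cols)]

names(by_position)
identical(by_position, by_name)
```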

Focal orders in dataset

  • Artiodactyla = 23; even-toed ungulates
  • Carnivora = 23
  • Primates = 22
  • Rodentia = 17
  • Chiroptera = 10; bats
  • Diprotodontia = 10; marsupials
  • Cetacea = 6; whales and dolphins
  • Perissodactyla = 7; odd-toed ungulates
  • Lagomorpha = 3

Explore milk data with ggplot2

Look at % milk fat vs. mass of female

  • Plot is dominated by whales!
library(ggplot2)

#set font
theme_set(theme_bw(base_size = 18))

qplot(y = fat.percent,
      x = mass.female,
      data = milk2,
      main = "Regression data: continuous vs continuous",
      xlab = "Continuous x: mass of female",
      ylab = "Continuous y: % milk fat")
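
Recent ggplot2 releases (3.4.0 onward) deprecate `qplot()`. It still runs but emits a warning. The sketch below writes the same scatterplot with `ggplot()` + `aes()` + `geom_point()`. It uses a small hypothetical data frame (`toy`, invented values) in place of `milk2`.

```r
library(ggplot2)

# Hypothetical toy data (NOT the real milk data) with the same column names
toy <- data.frame(mass.female = c(8, 120, 900, 5700, 60000, 1.7e6),
                  fat.percent = c(22, 18, 12, 9, 6, 40))

# ggplot() version of the qplot() call: data + aesthetic mapping + a geom layer
p <- ggplot(toy, aes(x = mass.female, y = fat.percent)) +
  geom_point() +
  labs(title = "Regression data: continuous vs continuous",
       x = "Continuous x: mass of female",
       y = "Continuous y: % milk fat")
print(p)
```

With the real data, the first line would be `ggplot(milk2, aes(x = mass.female, y = fat.percent))`.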

Look at % milk fat vs. log(mass of female)

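  • Log-transforming body mass spreads the data out
  • NOTE: statisticians typically use the natural log = ln(), which is log() in R
  • Biologists, especially when allometry is involved, typically use base 10 = log10()
  • The original authors used log10, but I realized this late, so I’ll stick with ln()
  • The two give equivalent fits: log10(x) = ln(x)/ln(10), so switching bases rescales the slope by ln(10) ≈ 2.303 but leaves fitted values, R² and p-values unchanged. log10 can make biological interpretation easier, because one unit on the log10 scale is a 10-fold change in mass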
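
The equivalence of ln() and log10() in a regression can be checked directly. The sketch below uses simulated toy data (`mass`, `fat` are hypothetical, not the milk data). It fits the same simple regression with each base. Only the slope changes, by the factor ln(10); fitted values and R² match.

```r
# Hypothetical simulated data (NOT the real milk data): % fat declining with mass
set.seed(1)
mass <- 10^runif(40, min = 1, max = 8)           # masses spanning 7 orders of magnitude
fat  <- 30 - 1.5 * log(mass) + rnorm(40, sd = 2)

fit_ln    <- lm(fat ~ log(mass))    # natural log, ln()
fit_log10 <- lm(fat ~ log10(mass))  # base-10 log

# Slopes differ only by the constant factor ln(10), about 2.303
coef(fit_ln)[2] * log(10)
coef(fit_log10)[2]

# Fitted values and R^2 are identical
all.equal(fitted(fit_ln), fitted(fit_log10))
c(summary(fit_ln)$r.squared, summary(fit_log10)$r.squared)
```

The log10 slope reads as "change in % fat per 10-fold increase in female mass", which is why allometry papers tend to prefer it.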

Compare log10

  • The pattern is the same; only the numbers on the x-axis change (log10 = ln / 2.303)

Add “smoother”

  • geom = c("point", "smooth")
  • Multiple types of smoothers
    • loess
    • spline
    • GAM

qplot(y = fat.percent,
      x = log(mass.female),
      geom = c("point", "smooth"),
      data = milk2,
      main = "Regression data: continuous vs log(continuous)",
      xlab = "Continuous x: log(mass of female)",
      ylab = "Continuous y: % milk fat")
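
Two of the smoother types listed above can be fitted directly with base R's `stats` package. By default, ggplot2's smooth uses loess for data sets with fewer than 1,000 points. A GAM would need the mgcv package and is not shown. The sketch below uses simulated toy data (`x`, `y` are hypothetical) and overlays a loess curve and a smoothing spline.

```r
# Hypothetical simulated data (NOT the real milk data)
set.seed(2)
x <- sort(runif(60, 0, 15))                        # think of x as log(mass of female)
y <- 25 - 1.2 * x + 3 * sin(x / 2) + rnorm(60, sd = 1.5)

# loess: local polynomial regression; span controls how wiggly the curve is
lo <- loess(y ~ x, span = 0.75)

# Smoothing spline; amount of smoothing chosen by generalized cross-validation by default
sp <- smooth.spline(x, y)

plot(x, y, xlab = "log(mass of female)", ylab = "% milk fat")
lines(x, predict(lo), col = "blue", lwd = 2)       # loess fitted values at the observed x
lines(sp, col = "red", lwd = 2)                    # spline fit
legend("topright", legend = c("loess", "smoothing spline"),
       col = c("blue", "red"), lwd = 2)
```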

Look at % milk fat by a categorical variable (diet)
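
A numeric summary is a useful companion to this plot. The sketch below uses a small hypothetical data frame (`toy`, invented values) to compute the median % fat within each diet category. It includes `na.rm = TRUE` because the real `fat.percent` column has 2 NA's.

```r
# Hypothetical toy data (NOT the real milk data); one fat value is missing
toy <- data.frame(
  diet        = c("carnivore", "carnivore", "herbivore", "herbivore", "herbivore",
                  "omnivore", "omnivore"),
  fat.percent = c(30.2, 12.5, 4.1, 6.3, NA, 8.0, 10.4)
)

# Median % fat within each diet category, ignoring missing values
fat_by_diet <- tapply(toy$fat.percent, toy$diet, median, na.rm = TRUE)
fat_by_diet
```

With the real data the call would be `tapply(milk2$fat.percent, milk2$diet, median, na.rm = TRUE)`.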

Look at % milk fat by diet AND biome

  • “facets” are a powerful tool in ggplot
  • Turns out ALL of our aquatic spp are carnivores (sorry, no manatees)
  • This shows that biome and diet are “confounded”: we cannot separate the effect of living in water from the effect of eating meat
library(ggplot2)

qplot(y = fat.percent,
      x = diet,
      data = milk2,
      facets = ~ biome)
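
Confounding can also be spotted without a plot by cross-tabulating the two categorical variables. Empty cells show combinations that never occur in the data. The sketch uses a small hypothetical data frame (`toy`, invented values) built so that, as in the milk data, every aquatic species is a carnivore.

```r
# Hypothetical toy data (NOT the real milk data): all aquatic species are carnivores
toy <- data.frame(
  diet  = c("carnivore", "carnivore", "carnivore",   "herbivore",   "herbivore",   "omnivore"),
  biome = c("aquatic",   "aquatic",   "terrestrial", "terrestrial", "terrestrial", "terrestrial")
)

# Counts of each diet x biome combination; zeros mark combinations with no data
tab <- with(toy, table(diet, biome))
tab
```

On the real data the equivalent one-liner is `with(milk2, table(diet, biome))`.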

Look at % milk fat by diet AND biome, colored by body mass

  • The color scale is DOMINATED by a whale

Look at % milk fat by diet AND biome, colored by log(body mass)

  • Log-transforming body mass should spread the colors more evenly across species

Change to boxplot using “geoms”

Overlay raw data on boxplots
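
One common way to overlay raw data on boxplots in ggplot2 is to add a jitter layer on top of a boxplot layer. The sketch uses a small hypothetical data frame (`toy`, invented values). It hides the boxplot's own outlier points so that no observation is drawn twice. It also jitters sideways only, so the y values stay exact.

```r
library(ggplot2)

# Hypothetical toy data (NOT the real milk data)
toy <- data.frame(
  diet        = rep(c("carnivore", "herbivore", "omnivore"), each = 6),
  fat.percent = c(31, 8, 22, 45, 15, 27,   # carnivore
                  4, 6, 9, 3, 7, 5,        # herbivore
                  10, 12, 6, 15, 9, 11)    # omnivore
)

set.seed(3)  # jitter is random; fix the seed for a reproducible plot
p <- ggplot(toy, aes(x = diet, y = fat.percent)) +
  geom_boxplot(outlier.shape = NA) +                   # don't draw outliers twice
  geom_jitter(width = 0.15, height = 0, alpha = 0.6)   # raw data, spread sideways only
print(p)
```

Adding `+ facet_wrap(~ biome)` would split the panels by biome, as in the facetted plots above.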

Focal data subset: Primates & Relatives


Subset data

This is fairly fancy subsetting. Note the use of:

  • with()
  • which()
  • %in%

#Select subset
i.use <- with(milk2, which(order %in% c("Rodentia",
                                        "Primates",
                                        "Lagomorpha")))
#make subset using row indexing
milk3 <- milk2[i.use,]

write.csv(milk3, file = "Skibiel_clean_milk_focal_genera.csv")
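
The same subset can be built in several equivalent ways. The sketch below uses a small hypothetical data frame (`toy`, invented values) and compares three of them. The first is the `with()` + `which()` + `%in%` approach used above. The second is plain logical indexing, and the third is `subset()`. They agree here because `%in%` never returns NA: it returns FALSE for missing values.

```r
# Hypothetical toy data (NOT the real milk data)
toy <- data.frame(order       = c("Primates", "Carnivora", "Rodentia", "Cetacea", "Lagomorpha"),
                  fat.percent = c(4, 12, 10, 40, 14))
focal <- c("Rodentia", "Primates", "Lagomorpha")

# %in% gives one TRUE/FALSE per row
toy$order %in% focal

# 1) which() converts TRUE/FALSE into row numbers
i.use <- with(toy, which(order %in% focal))
sub1  <- toy[i.use, ]

# 2) Logical indexing directly
sub2 <- toy[toy$order %in% focal, ]

# 3) subset() convenience function
sub3 <- subset(toy, order %in% focal)

all.equal(sub1, sub2)
all.equal(sub1, sub3)
```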

Compare original data and our working subset

  • Original = 130 species
  • Working subset = 42 species
# Original data
dim(milk2)
## [1] 130  13

# Our working subset
dim(milk3)
## [1] 42 13

Plot working data:

milk fat.percent ~ female size

The big animal is a gorilla!

Plot logged data

milk fat.percent ~ log(female size)

Add smoother

% Fat generally declines as body mass increases

Add regression line

  • % Fat generally declines as body mass increases
  • The grey area is the “confidence band” for the fitted line (95% by default)
  • ggplot makes these very easy to add
  • This is harder to do with base R plot()
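
To see what ggplot does behind the scenes, the sketch below builds the same kind of figure in base R. It fits a line with `lm()` and gets a 95% confidence band for the mean from `predict(..., interval = "confidence")`. It then shades that band with `polygon()`. The data (`log_mass`, `fat`) are simulated and hypothetical, not the milk subset.

```r
# Hypothetical simulated data (NOT the real milk data)
set.seed(4)
log_mass <- runif(42, 2, 12)
fat      <- 20 - 1.2 * log_mass + rnorm(42, sd = 2.5)

fit <- lm(fat ~ log_mass)

# Predict the mean response and its 95% confidence interval on a grid of x values
grid <- data.frame(log_mass = seq(min(log_mass), max(log_mass), length.out = 100))
ci   <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(log_mass, fat, xlab = "log(mass of female)", ylab = "% milk fat")
polygon(c(grid$log_mass, rev(grid$log_mass)),   # band outline: lower edge, then upper edge reversed
        c(ci[, "lwr"], rev(ci[, "upr"])),
        col = "grey85", border = NA)
points(log_mass, fat)                           # redraw points on top of the band
lines(grid$log_mass, ci[, "fit"], lwd = 2)      # fitted regression line
```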

Alternate predictor: Duration of gestation

fat.percent ~ log(gest.month.NUM)

  • % Fat generally declines as gestation duration increases
  • The longer the pregnancy, the lower the milk fat

Duration of lactation

fat.percent ~ log(lacat.mo.NUM)

  • % Fat generally declines as lactation duration increases
  • The longer offspring are dependent on milk, the lower the milk fat

Mass of litter

fat.percent ~ log(mass.litter)

  • % Fat generally declines as litter mass increases
  • More or bigger babies, less milk fat
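
The four plots above each correspond to a separate simple regression of `fat.percent` on a logged predictor. The sketch below shows one way to fit all of them in a loop and tabulate slope and R². The data frame `toy` is simulated and hypothetical, built so that only female mass truly affects fat, so the numbers say nothing about real milk. In real data these life-history variables tend to be correlated with body mass. As a result, separate simple regressions do not isolate independent effects.

```r
# Hypothetical simulated data (NOT the real milk data); column names match milk2
set.seed(5)
n <- 40
toy <- data.frame(mass.female    = 10^runif(n, 1, 6),
                  gest.month.NUM = runif(n, 0.5, 20),
                  lacat.mo.NUM   = runif(n, 0.3, 40),
                  mass.litter    = 10^runif(n, -0.5, 5))
toy$fat.percent <- 25 - 1.5 * log(toy$mass.female) + rnorm(n, sd = 2)

predictors <- c("mass.female", "gest.month.NUM", "lacat.mo.NUM", "mass.litter")

# Fit fat.percent ~ log(predictor) separately for each predictor; keep slope and R^2
results <- t(sapply(predictors, function(v) {
  f   <- reformulate(sprintf("log(%s)", v), response = "fat.percent")
  fit <- lm(f, data = toy)
  c(slope = unname(coef(fit)[2]), r.squared = summary(fit)$r.squared)
}))
results
```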