In this regression tutorial we’ll
Skibiel et al 2013. Journal of Animal Ecology. The evolution of the nutrient composition of mammalian milks. 82: 1254-1264.
For my computer, this is the full “path” to my current R working directory.
setwd("C:/Users/lisanjie2/Desktop/TEACHING/1_STATS_CalU/1_STAT_CalU_2016_by_NLB/1_STAT_Duq_2017/HOMEWORK/Duq_Biostats_Homework_4_Milk_regression/homework4_RMD_vs2")
I have cleaned the data and set it up in a csv file saved into the working directory.
dat <- read.csv(file = "Skibiel_clean_milk.csv")
## order family spp mass.female
## 1 Artiodactyla Bovidae Bos frontalis 800000
## 2 Artiodactyla Bovidae Capra ibex 53000
## 3 Artiodactyla Bovidae Connocheates taurinus taurinus 170500
## 4 Artiodactyla Bovidae Connocheates gnou 200000
## 5 Artiodactyla Bovidae Damaliscus pygargus phillipsi 61000
## 6 Artiodactyla Bovidae Gazella dorcas 20600
## gestation.month lacatation.months mass.litter repro.output
## 1 9.02 4.5 26949 0.03
## 2 5.60 7.5 3489 0.07
## 3 8.32 8.0 17717 0.10
## 4 8.50 7.5 11110 0.06
## 5 8.00 4.0 6500 0.11
## 6 4.74 2.8 1771 0.09
## dev.stage.at.birth diet arid biome N lactation.stage.orig
## 1 3 herbivore no terrestrial 4+ <NA>
## 2 3 herbivore no terrestrial 24 30-60
## 3 3 herbivore yes terrestrial 5 150
## 4 3 herbivore yes terrestrial 3 150
## 5 3 herbivore yes terrestrial 4 150
## 6 3 herbivore yes terrestrial 16 30-60
## dry.matter fat protein sugar energy ref
## 1 20.0 7.0 6.3 5.2 1.21 Oftedal & Iverson (1995)
## 2 23.3 12.4 5.7 <NA> <NA> Oftedal & Iverson (1995)
## 3 13.4 7.5 4.1 5.3 1.13 Osthoff, Hugo & de Wit (2009a)
## 4 12.0 5.5 4.3 4.1 0.91 Osthoff, Hugo & de Wit (2009a)
## 5 16.0 8.6 5.6 4.9 1.31 Osthoff, Hugo & de Wit (2009a)
## 6 24.1 8.8 8.8 <NA> <NA> Oftedal & Iverson (1995)
## gest.month.NUM lacat.mo.NUM
## 1 9.02 4.5
## 2 5.60 7.5
## 3 8.32 8.0
## 4 8.50 7.5
## 5 8.00 4.0
## 6 4.74 2.8
## order family spp mass.female
## 125 Rodentia Muridae Pseudomys australis 65
## 126 Rodentia Muridae Rattus norvegicus 253
## 127 Rodentia Octodontidae Octodon degus 235
## 128 Rodentia Scuiridae Tamias amoenus 53
## 129 Rodentia Scuiridae Urocitellus columbianus 406
## 130 Soricomorpha11 Soricidae Crocidura russula 14
## gestation.month lacatation.months mass.litter repro.output
## 125 1.02 0.9 13 0.20
## 126 0.71 0.8 51 0.20
## 127 2.96 1.2 74 0.31
## 128 0.98 1.5 14 0.26
## 129 0.84 1.0 32 0.08
## 130 0.97 0.8 4 0.29
## dev.stage.at.birth diet arid biome N
## 125 0 herbivore yes terrestrial 7-Jun
## 126 0 omnivore no terrestrial 18-Mar
## 127 3 herbivore yes terrestrial 7
## 128 0 omnivore no terrestrial 11
## 129 0 herbivore no terrestrial 26
## 130 0 carnivore no terrestrial 3
## lactation.stage.orig dry.matter fat protein sugar energy
## 125 12-Jul 25.4 12.1 6.4 3.6 1.62
## 126 17-Aug 22.1 8.8 8.1* 3.8 1.43
## 127 15-21 30.5 20.1 4.4 2.7 2.2
## 128 15-20 36.7 21.7 8.1 4.3 2.62
## 129 19 29.9 9.2 10.7 3.4 1.6
## 130 12-Aug 51.0 30.0 9.4 3 3.4
## ref gest.month.NUM lacat.mo.NUM
## 125 Oftedal & Iverson (1995) 1.02 0.9
## 126 Oftedal & Iverson (1995) 0.71 0.8
## 127 Veloso & Kenagy (2005) 2.96 1.2
## 128 Veloso, Place & Kenagy (2003) 0.98 1.5
## 129 Skibiel & Hood (in press) 0.84 1.0
## 130 Oftedal & Iverson (1995) 0.97 0.8
## [1] 130 22
## order family spp
## Artiodactyla :23 Bovidae :13 Acomys cahirinus : 1
## Carnivora :23 Cercopithecidae: 8 Alces alces : 1
## Primates :22 Cervidae : 7 Aloutta palliata : 1
## Rodentia :17 Muridae : 7 Aloutta seniculus : 1
## Chiroptera :10 Otariidae : 7 Arctocephalus australis: 1
## Diprotodontia:10 Phocidae : 7 Arctocephalus gazella : 1
## (Other) :25 (Other) :81 (Other) :124
## mass.female gestation.month lacatation.months
## Min. : 8 Min. : 0.400 Min. : 0.300
## 1st Qu.: 857 1st Qu.: 1.405 1st Qu.: 1.625
## Median : 5716 Median : 5.000 Median : 4.500
## Mean : 2229475 Mean : 5.624 Mean : 6.092
## 3rd Qu.: 107500 3rd Qu.: 8.365 3rd Qu.: 8.225
## Max. :170000000 Max. :21.460 Max. :42.000
##
## mass.litter repro.output dev.stage.at.birth diet
## Min. : 0.3 Min. :0.00003 Min. :0.000 carnivore:32
## 1st Qu.: 42.0 1st Qu.:0.04000 1st Qu.:1.000 herbivore:61
## Median : 423.5 Median :0.08000 Median :2.000 omnivore :37
## Mean : 52563.8 Mean :0.10374 Mean :1.831
## 3rd Qu.: 7038.2 3rd Qu.:0.13750 3rd Qu.:3.000
## Max. :2272500.0 Max. :0.50000 Max. :4.000
##
## arid biome N lactation.stage.orig
## no :91 aquatic : 22 4 :13 14-Aug : 3
## yes:39 terrestrial:108 3 :11 150 : 3
## 6 :10 21-63 : 3
## 7 : 9 30-60 : 3
## 5 : 8 i : 3
## 24 : 5 (Other):113
## (Other):74 NA's : 2
## dry.matter fat protein sugar energy
## Min. : 8.80 Min. : 0.200 1.6* : 4 3 : 5 1.44 : 4
## 1st Qu.:16.27 1st Qu.: 4.575 1.5 : 3 4.5 : 5 0.81 : 3
## Median :22.75 Median : 8.550 10.3 : 3 5.3 : 5 1.13 : 3
## Mean :27.06 Mean :14.068 10.7 : 3 5.2 : 4 0.49 : 2
## 3rd Qu.:32.05 3rd Qu.:17.575 6.3 : 3 6.6 : 4 0.5 : 2
## Max. :71.10 Max. :61.100 7 : 3 (Other):92 (Other):101
## NA's :6 NA's :2 (Other):111 NA's :15 NA's : 15
## ref gest.month.NUM
## Oftedal & Iverson (1995) :99 Min. : 0.400
## Hood et al. (2001) : 4 1st Qu.: 1.405
## Osthoff, Hugo & de Wit (2009a) : 3 Median : 5.000
## Derrickson, Jerrard & Oftedal (1996): 2 Mean : 5.624
## Arnould & Boyd (1995) : 1 3rd Qu.: 8.365
## Arnould & Hindell (1999) : 1 Max. :21.460
## (Other) :20
## lacat.mo.NUM
## Min. : 0.300
## 1st Qu.: 1.625
## Median : 4.500
## Mean : 6.092
## 3rd Qu.: 8.225
## Max. :42.000
##
After working with the data a bit I realized I had 2 NA values in my “fat” column that should’t be there. I use the following code to
# Look at fat column
summary(dat$fat)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.200 4.575 8.550 14.070 17.580 61.100 2
# Find NAS
i.NA <- is.na(dat$fat)
#Identify the species w/ NAs
dat$spp[i.NA]
## [1] Hippotragus niger Cervus elaphus hispanicus
## 130 Levels: Acomys cahirinus Alces alces ... Zalophus californianus
# Extract the indexes of the appropriate species
i.H.niger <- which(dat$spp == "Hippotragus niger")
i.C.elaphus <- which(dat$spp == "Cervus elaphus hispanicus")
#Overwrite the bad values
dat[i.H.niger,"fat"] <- 5.0
dat[i.C.elaphus,"fat"] <- 12.6
There are letters, commas, and asterisks that were annotations in the original datafile. Easy to remove with find-replace, but also very easy with gsub(), with not chance of changing anything else in your spreadsheet you dont want
Look at all of the protein column
summary(dat$protein)
## 1.1* 1.3* 1.4 1.4* 1.5 1.5* 1.6 1.6* 1.7* 1.9 10 10.2,
## 1 2 1 1 3 1 2 4 1 1 2 1
## 10.3 10.5 10.7 11 11.1 11.2* 11.8 11.9 12.1 12.2 12.3 12.4
## 3 2 3 2 1 1 1 1 1 1 1 1
## 12.5 14.6 15.8 2.1 2.2 2.5 2.6 2.7 3 3.1 3.4 3.6
## 2 1 1 2 1 1 1 2 1 1 1 2
## 3.9* 4 4.1 4.2* 4.3 4.3* 4.4 4.5 4.6 4.8 4.8* 4.9
## 1 2 1 1 1 1 2 2 1 1 1 1
## 5.1 5.2* 5.4 5.5 5.6 5.6* 5.7 5.9 6 6.2 6.3 6.3*
## 1 1 1 1 2 1 2 1 1 1 3 1
## 6.4 6.5 6.6 6.7 6.8 6.9 7 7.1* 7.2 7.3 7.4 7.7
## 1 2 1 1 1 1 3 1 2 2 1 2
## 7.8 7.8S 8 8.1 8.1* 8.2 8.4 8.4* 8.5 8.6* 8.7* 8.8
## 1 1 2 1 1 2 1 1 1 1 1 1
## 8.9 9 9.1 9.2 9.2* 9.4 9.5 9.7 9.9*
## 1 2 1 1 1 2 3 1 1
i.letters <- grep("[a-zA-Z]",dat$protein)
dat$protein[i.letters]
## [1] 7.8S
## 93 Levels: 1.1* 1.3* 1.4 1.4* 1.5 1.5* 1.6 1.6* 1.7* 1.9 10 10.2, ... 9.9*
There’s jsut an “S”. I coudl find it like this with grep too
#these are equivalent for the simple case of a single letter
i.S <- grep("S",dat$protein) #no brackets
i.S <- grep("[S]",dat$protein)#brackets
dat$protein[i.S]
## [1] 7.8S
## 93 Levels: 1.1* 1.3* 1.4 1.4* 1.5 1.5* 1.6 1.6* 1.7* 1.9 10 10.2, ... 9.9*
dat$protein <- gsub("[a-zA-Z]", #pattern
"", #replace
dat$protein)#data
Check
grep("[a-zA-Z]",dat$protein)
## integer(0)
dat$protein <- gsub("\\*","",dat$protein)
Check
grep("\\*",dat$protein)
## integer(0)
Commas are also species characters
dat$protein <- gsub("\\,","",dat$protein)
Check
grep("\\,",dat$protein)
## integer(0)
These are frequent typos in data
grep(" ", dat$protein)
## integer(0)
None
dat$protein.num <- as.numeric(dat$protein)
Compare old and new columns
head(dat[,c("protein.num","protein")])
## protein.num protein
## 1 6.3 6.3
## 2 5.7 5.7
## 3 4.1 4.1
## 4 4.3 4.3
## 5 5.6 5.6
## 6 8.8 8.8
tail(dat[,c("protein.num","protein")])
## protein.num protein
## 125 6.4 6.4
## 126 8.1 8.1
## 127 4.4 4.4
## 128 8.1 8.1
## 129 10.7 10.7
## 130 9.4 9.4
Check other columns
summary(dat[,c("sugar","energy")])
## sugar energy
## 3 : 5 1.44 : 4
## 4.5 : 5 0.81 : 3
## 5.3 : 5 1.13 : 3
## 5.2 : 4 0.49 : 2
## 6.6 : 4 0.5 : 2
## (Other):92 (Other):101
## NA's :15 NA's : 15
dat$sugar <- gsub("[a-zA-Z<]","",dat$sugar)
Convert and check
dat$sugar.num <- as.numeric(dat$sugar)
## Warning: NAs introduced by coercion
head(dat[,c("sugar.num","sugar")])
## sugar.num sugar
## 1 5.2 5.2
## 2 NA <NA>
## 3 5.3 5.3
## 4 4.1 4.1
## 5 4.9 4.9
## 6 NA <NA>
#The followign are equivalent
summary(dat[,c("energy")])
## 0.36 0.45 0.46 0.47 0.49 0.5 0.52 0.54 0.57 0.65 0.68 0.71
## 1 1 1 1 2 2 2 1 2 1 1 1
## 0.72 0.78 0.81 0.82 0.84 0.9 0.91 0.92 0.93 0.96 0.98 1.03
## 1 1 3 1 1 1 2 1 1 1 1 2
## 1.06 1.08 1.09 1.1 1.11 1.13 1.16 1.17 1.19 1.2 1.21 1.23
## 2 1 1 1 1 3 2 2 1 1 2 1
## 1.25 1.26 1.27 1.28 1.31 1.36 1.37 1.41 1.42 1.43 1.44 1.48
## 1 1 1 1 1 2 2 1 1 1 4 1
## 1.5 1.52 1.6 1.61 1.62 1.64 1.65 1.68 1.71 1.79 1.8 1.83
## 1 2 1 1 1 1 1 1 1 2 1 1
## 1.88 1.95 1.96 2 2.06 2.07 2.1 2.18 2.2 2.34 2.45 2.57
## 1 1 1 1 1 1 2 1 1 1 1 1
## 2.59 2.61 2.62 2.65 2.76 2.82 2.85 2.94 3.29 3.4 3.49 3.64
## 1 1 1 1 1 1 1 1 1 2 1 1
## 3.73 4.47 4.71 5.23 5.36 5.41 5.69* 5.89 NA's
## 1 1 1 1 1 1 1 1 15
summary(dat$energy)
## 0.36 0.45 0.46 0.47 0.49 0.5 0.52 0.54 0.57 0.65 0.68 0.71
## 1 1 1 1 2 2 2 1 2 1 1 1
## 0.72 0.78 0.81 0.82 0.84 0.9 0.91 0.92 0.93 0.96 0.98 1.03
## 1 1 3 1 1 1 2 1 1 1 1 2
## 1.06 1.08 1.09 1.1 1.11 1.13 1.16 1.17 1.19 1.2 1.21 1.23
## 2 1 1 1 1 3 2 2 1 1 2 1
## 1.25 1.26 1.27 1.28 1.31 1.36 1.37 1.41 1.42 1.43 1.44 1.48
## 1 1 1 1 1 2 2 1 1 1 4 1
## 1.5 1.52 1.6 1.61 1.62 1.64 1.65 1.68 1.71 1.79 1.8 1.83
## 1 2 1 1 1 1 1 1 1 2 1 1
## 1.88 1.95 1.96 2 2.06 2.07 2.1 2.18 2.2 2.34 2.45 2.57
## 1 1 1 1 1 1 2 1 1 1 1 1
## 2.59 2.61 2.62 2.65 2.76 2.82 2.85 2.94 3.29 3.4 3.49 3.64
## 1 1 1 1 1 1 1 1 1 2 1 1
## 3.73 4.47 4.71 5.23 5.36 5.41 5.69* 5.89 NA's
## 1 1 1 1 1 1 1 1 15
#clean
dat$energy <- gsub("[a-zA-Z<]","",dat$energy)
#convert and check
dat$energy.num <- as.numeric(dat$energy)
## Warning: NAs introduced by coercion
head(dat[,c("energy.num","energy")])
## energy.num energy
## 1 1.21 1.21
## 2 NA <NA>
## 3 1.13 1.13
## 4 0.91 0.91
## 5 1.31 1.31
## 6 NA <NA>
write.csv(dat, "Skibiel_clean2_milk.csv")