I used the following code to list all data sets built into R.
#data(package = .packages(all.available = TRUE))
I chose ‘msleep’, which describes the sleep patterns of mammals along with several other important descriptors. I used this dataset for Discussion 11 when the assignment was to check the assumptions of the linear model. For this discussion, I’ll be adapting the columns into a multivariate model to predict the total sleep in hours.
print(head(msleep))
## # A tibble: 6 x 11
## name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Chee~ Acin~ carni Carn~ lc 12.1 NA NA 11.9
## 2 Owl ~ Aotus omni Prim~ <NA> 17 1.8 NA 7
## 3 Moun~ Aplo~ herbi Rode~ nt 14.4 2.4 NA 9.6
## 4 Grea~ Blar~ omni Sori~ lc 14.9 2.3 0.133 9.1
## 5 Cow Bos herbi Arti~ domesticated 4 0.7 0.667 20
## 6 Thre~ Brad~ herbi Pilo~ <NA> 14.4 2.2 0.767 9.6
## # ... with 2 more variables: brainwt <dbl>, bodywt <dbl>
colnames(msleep)
## [1] "name" "genus" "vore" "order" "conservation"
## [6] "sleep_total" "sleep_rem" "sleep_cycle" "awake" "brainwt"
## [11] "bodywt"
print(nrow(msleep))
## [1] 83
In my previous discusison, I divided brain weight by body weight to find the proportion of brain weight relative to body weight. I’ll do this again for the multiple regression. I also did a little research into the colleciton of this data, which is based on regressions in the following paper by Savage et al., https://www.pnas.org/content/104/3/1051. For this reason, I’m also going to include the root of body weight as an accepted measure of metabolism. I also expressed rem sleep as a proportion of total sleep.
msleep_df <- msleep
msleep_df$brainprop <- 100*(round(msleep$brainwt/msleep$bodywt,3))
msleep_df$remprop <- 100*(round(msleep$sleep_rem/msleep$sleep_total,3))
glimpse(msleep_df)
## Rows: 83
## Columns: 13
## $ name <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Greater s...
## $ genus <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bos", "...
## $ vore <chr> "carni", "omni", "herbi", "omni", "herbi", "herbi", "c...
## $ order <chr> "Carnivora", "Primates", "Rodentia", "Soricomorpha", "...
## $ conservation <chr> "lc", NA, "nt", "lc", "domesticated", NA, "vu", NA, "d...
## $ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1, 3.0...
## $ sleep_rem <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6, 0....
## $ sleep_cycle <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.3833333...
## $ awake <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9, 21.0...
## $ brainwt <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.07000...
## $ bodywt <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.490, 0...
## $ brainprop <dbl> NA, 3.2, NA, 1.5, 0.1, NA, NA, NA, 0.5, 0.7, 0.3, 0.8,...
## $ remprop <dbl> NA, 10.6, 16.7, 15.4, 17.5, 15.3, 16.1, NA, 28.7, NA, ...
100*round(sum(is.na(msleep_df$brainprop))/nrow(msleep_df),3)
## [1] 32.5
The most informative variables to explore are all except name and genus. I believe both of these are too specific to be a meaningful input. This means I’ll be considering an initial 9 variables to predict sleep total. Also of note are the three categorical variables that still have class of character. I will convert them to factors to detect their missing values.
msleep_df$vore <- factor(msleep_df$vore)
msleep_df$conservation <- factor(msleep_df$conservation)
msleep_df$order <- factor(msleep_df$order)
sleep_multi_lm <- lm(data = msleep_df, formula = sleep_total ~ order + conservation + vore + sleep_rem + sleep_cycle + awake + brainwt + bodywt + brainprop + sqrt(bodywt) + remprop)
summary(sleep_multi_lm)
##
## Call:
## lm(formula = sleep_total ~ order + conservation + vore + sleep_rem +
## sleep_cycle + awake + brainwt + bodywt + brainprop + sqrt(bodywt) +
## remprop, data = msleep_df)
##
## Residuals:
## ALL 20 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.400e+01 NA NA NA
## orderCarnivora -1.038e-16 NA NA NA
## orderChiroptera 4.598e-15 NA NA NA
## orderCingulata -9.134e-16 NA NA NA
## orderDidelphimorphia 2.094e-17 NA NA NA
## orderErinaceomorpha 3.508e-15 NA NA NA
## orderLagomorpha 2.365e-15 NA NA NA
## orderPerissodactyla -2.122e-15 NA NA NA
## orderRodentia 4.849e-15 NA NA NA
## orderSoricomorpha 7.170e-15 NA NA NA
## conservationen -1.563e-15 NA NA NA
## conservationlc -1.367e-15 NA NA NA
## conservationnt 8.609e-16 NA NA NA
## conservationvu 8.130e-16 NA NA NA
## voreherbi 3.925e-16 NA NA NA
## voreinsecti -6.255e-16 NA NA NA
## voreomni NA NA NA NA
## sleep_rem 9.821e-16 NA NA NA
## sleep_cycle 1.520e-14 NA NA NA
## awake -1.000e+00 NA NA NA
## brainwt 0.000e+00 NA NA NA
## bodywt NA NA NA NA
## brainprop NA NA NA NA
## sqrt(bodywt) NA NA NA NA
## remprop NA NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## (63 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 19 and 0 DF, p-value: NA
This initial summary has too few degrees of freedom to provide meaningful analysis. Sixty-three observations were excluded because of becaues of missing values. I will attempt to eliminate a variable if it is contributing too few values, and if that’s not sufficient I will provide an inferred values based on comparable species.
summary(msleep_df)
## name genus vore order
## Length:83 Length:83 carni :19 Rodentia :22
## Class :character Class :character herbi :32 Carnivora :12
## Mode :character Mode :character insecti: 5 Primates :12
## omni :20 Artiodactyla: 6
## NA's : 7 Soricomorpha: 5
## Cetacea : 3
## (Other) :23
## conservation sleep_total sleep_rem sleep_cycle
## cd : 2 Min. : 1.90 Min. :0.100 Min. :0.1167
## domesticated:10 1st Qu.: 7.85 1st Qu.:0.900 1st Qu.:0.1833
## en : 4 Median :10.10 Median :1.500 Median :0.3333
## lc :27 Mean :10.43 Mean :1.875 Mean :0.4396
## nt : 4 3rd Qu.:13.75 3rd Qu.:2.400 3rd Qu.:0.5792
## vu : 7 Max. :19.90 Max. :6.600 Max. :1.5000
## NA's :29 NA's :22 NA's :51
## awake brainwt bodywt brainprop
## Min. : 4.10 Min. :0.00014 Min. : 0.005 Min. :0.100
## 1st Qu.:10.25 1st Qu.:0.00290 1st Qu.: 0.174 1st Qu.:0.400
## Median :13.90 Median :0.01240 Median : 1.670 Median :0.700
## Mean :13.57 Mean :0.28158 Mean : 166.136 Mean :1.039
## 3rd Qu.:16.15 3rd Qu.:0.12550 3rd Qu.: 41.750 3rd Qu.:1.500
## Max. :22.10 Max. :5.71200 Max. :6654.000 Max. :4.000
## NA's :27 NA's :27
## remprop
## Min. : 3.7
## 1st Qu.:11.3
## Median :15.6
## Mean :17.4
## 3rd Qu.:22.7
## Max. :34.7
## NA's :22
According to the dataframe summary, there are 51 missing entries for sleep cycle. I’ll omit this column and see what the regression looks like.
sleep_multi_lm <- lm(data = msleep_df, formula = sleep_total ~ order + vore + conservation + awake + brainwt + bodywt + brainprop + sqrt(bodywt) + remprop)
summary(sleep_multi_lm)
## Warning in summary.lm(sleep_multi_lm): essentially perfect fit: summary may be
## unreliable
##
## Call:
## lm(formula = sleep_total ~ order + vore + conservation + awake +
## brainwt + bodywt + brainprop + sqrt(bodywt) + remprop, data = msleep_df)
##
## Residuals:
## 4 5 9 11 12 14 15
## 1.161e-15 -6.826e-17 -1.500e-16 1.752e-16 2.386e-15 -2.198e-15 5.738e-16
## 17 18 19 20 22 23 24
## -1.735e-15 -2.712e-31 1.319e-15 -6.409e-31 -7.396e-32 1.217e-16 -1.217e-16
## 25 26 28 32 33 40 42
## 1.726e-31 -1.726e-31 1.377e-16 1.230e-17 -1.319e-15 -7.642e-31 2.219e-31
## 48 49 62 64 67 71 74
## 2.219e-31 -1.070e-16 2.712e-31 -1.285e-15 -3.205e-31 1.098e-15 7.396e-32
## 77
## 2.465e-32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.400e+01 1.105e-14 2.172e+15 <2e-16 ***
## orderCarnivora 2.702e-15 6.323e-15 4.270e-01 0.6911
## orderChiroptera -1.831e-14 9.818e-15 -1.865e+00 0.1356
## orderCingulata -3.504e-15 7.071e-15 -4.960e-01 0.6462
## orderDidelphimorphia -1.490e-14 6.179e-15 -2.412e+00 0.0734 .
## orderErinaceomorpha -1.617e-15 6.971e-15 -2.320e-01 0.8280
## orderHyracoidea -4.859e-15 4.688e-15 -1.037e+00 0.3584
## orderLagomorpha -6.016e-15 4.376e-15 -1.375e+00 0.2412
## orderPerissodactyla -1.274e-14 5.090e-15 -2.503e+00 0.0665 .
## orderPrimates -1.028e-14 7.548e-15 -1.362e+00 0.2450
## orderRodentia -7.223e-15 4.778e-15 -1.512e+00 0.2051
## orderSoricomorpha -2.904e-15 8.137e-15 -3.570e-01 0.7392
## voreherbi 8.126e-15 7.171e-15 1.133e+00 0.3204
## voreinsecti 1.144e-14 7.200e-15 1.589e+00 0.1873
## voreomni 8.578e-15 7.612e-15 1.127e+00 0.3228
## conservationen -5.688e-16 3.503e-15 -1.620e-01 0.8789
## conservationlc -1.720e-15 2.052e-15 -8.380e-01 0.4491
## conservationnt -4.268e-15 2.967e-15 -1.439e+00 0.2236
## conservationvu 2.257e-14 8.744e-15 2.581e+00 0.0612 .
## awake -1.000e+00 4.679e-16 -2.137e+15 <2e-16 ***
## brainwt 6.408e-14 2.578e-14 2.486e+00 0.0678 .
## bodywt 2.908e-17 3.431e-17 8.480e-01 0.4444
## brainprop -8.122e-16 9.670e-16 -8.400e-01 0.4482
## sqrt(bodywt) -2.012e-15 1.341e-15 -1.500e+00 0.2080
## remprop -2.641e-16 2.120e-16 -1.246e+00 0.2809
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.328e-15 on 4 degrees of freedom
## (54 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.975e+30 on 24 and 4 DF, p-value: < 2.2e-16
While missing values have improved, the summary is giving me an additional error message that one variable, ‘awake’, may be too highly correlated to total sleep. Below is a simple linear regression of these two variables.
sleep_vs_awake <- lm(data = msleep_df, formula = sleep_total ~ awake)
summary(sleep_vs_awake)
##
## Call:
## lm(formula = sleep_total ~ awake, data = msleep_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.004529 -0.002016 -0.001256 0.000283 0.046893
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.9959187 0.0026771 8963 <2e-16 ***
## awake -0.9996104 0.0001876 -5329 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007563 on 81 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.84e+07 on 1 and 81 DF, p-value: < 2.2e-16
Of note is the intercept equal to almost 24 hours. I likely think this in an artifact that total sleep and awake make up two parts of a 24-hour period. I will omit ‘awake’ as a variable for this reason.
sleep_multi_lm <- lm(data = msleep_df, formula = sleep_total ~ order + vore + conservation + brainwt + bodywt + brainprop + sqrt(bodywt) + remprop)
summary(sleep_multi_lm)
##
## Call:
## lm(formula = sleep_total ~ order + vore + conservation + brainwt +
## bodywt + brainprop + sqrt(bodywt) + remprop, data = msleep_df)
##
## Residuals:
## 4 5 9 11 12 14 15
## 3.519e+00 1.969e-01 -5.733e-01 -4.338e-01 -1.571e+00 1.252e+00 -1.459e+00
## 17 18 19 20 22 23 24
## -2.060e+00 1.887e-15 -4.754e-01 -1.110e-16 -2.220e-16 -3.568e-01 3.568e-01
## 25 26 28 32 33 40 42
## -3.331e-16 4.441e-16 4.586e-01 1.147e-01 4.754e-01 1.776e-15 3.331e-16
## 48 49 62 64 67 71 74
## 6.661e-16 2.368e-01 -1.887e-15 -2.655e-01 1.332e-15 5.845e-01 2.220e-16
## 77
## -1.110e-16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.69000 7.64083 1.006 0.360
## orderCarnivora 3.93208 5.78213 0.680 0.527
## orderChiroptera 7.30460 8.79715 0.830 0.444
## orderCingulata 8.51145 5.58449 1.524 0.188
## orderDidelphimorphia 2.00259 5.83751 0.343 0.746
## orderErinaceomorpha -6.82913 5.92202 -1.153 0.301
## orderHyracoidea -3.24507 4.23889 -0.766 0.479
## orderLagomorpha 0.36607 4.17960 0.088 0.934
## orderPerissodactyla 3.43276 4.61658 0.744 0.491
## orderPrimates -0.89859 7.20309 -0.125 0.906
## orderRodentia 2.61859 4.41373 0.593 0.579
## orderSoricomorpha -4.27167 7.53924 -0.567 0.595
## voreherbi 0.78596 6.84473 0.115 0.913
## voreinsecti 2.25732 6.80705 0.332 0.754
## voreomni 5.91705 6.77759 0.873 0.423
## conservationen 1.80489 3.24975 0.555 0.603
## conservationlc 1.16388 1.89139 0.615 0.565
## conservationnt 0.94926 2.80365 0.339 0.749
## conservationvu -2.66780 8.27202 -0.323 0.760
## brainwt -13.64962 23.87114 -0.572 0.592
## bodywt 0.02903 0.03012 0.964 0.379
## brainprop -0.20850 0.91953 -0.227 0.830
## sqrt(bodywt) -0.72539 1.24033 -0.585 0.584
## remprop 0.08428 0.19910 0.423 0.690
##
## Residual standard error: 2.225 on 5 degrees of freedom
## (54 observations deleted due to missingness)
## Multiple R-squared: 0.9617, Adjusted R-squared: 0.7857
## F-statistic: 5.465 on 23 and 5 DF, p-value: 0.03382
Next, I noticed that several of the categorical values may benefit from treating them aas ordinal. In particular, conservation status varies based on the size and vulnerability of the species. Additionally, I’ll look closer at any ranking that could be included in the taxonomic order. I’ll also continue to work on the large amount of missing values.
levels(msleep_df$conservation)
## [1] "cd" "domesticated" "en" "lc" "nt"
## [6] "vu"
msleep_df$conservation <- replace_na(msleep_df$conservation, 'lc') #replace na values with 'least concern', as the original data source na meant that the species have not conservation status
msleep_df$conservation <- msleep_df$conservation %>% recode_factor('cd' = 'nt') #the two 'cd' values are giraffe and pilot whale, which are designated as nearly threatened
msleep_df$conservation <- factor(msleep_df$conservation, order = TRUE, levels = c("domesticated", 'lc', 'nt', 'vu', 'en')) #reordered conservation status by severity
summary(msleep_df$conservation)
## domesticated lc nt vu en
## 10 56 6 7 4
Next, I’m going to recode diet as expressed in the ‘vore’ column. I will order this to offer a general spectrum between herbivore and carnivore. I will recode insectivores to be considered carnivores in this instance. This assumption also explores the idea that nutrient density also says something about an organism’s metabolism and sleep needs.
levels(msleep_df$vore)
## [1] "carni" "herbi" "insecti" "omni"
msleep_df$vore <- msleep_df$vore %>% recode_factor('insecti' = 'carni') #recode insectivores as carnivores
msleep_df$vore <- factor(msleep_df$vore, order = TRUE, levels = c("herbi", 'omni', 'carni')) #reordered conservation status by severity
summary(msleep_df$vore)
## herbi omni carni NA's
## 32 20 24 7
Last, I’m going to explore the idea of making the ‘order’ column ordinal by making an assumption about each order containing morphologically similar organisms. To test this, I’m going to group by order, find the body weight, and generate a dot plot to see if these orders occupy distinct intervals of body weight.
levels(msleep_df$order)
## [1] "Afrosoricida" "Artiodactyla" "Carnivora" "Cetacea"
## [5] "Chiroptera" "Cingulata" "Didelphimorphia" "Diprotodontia"
## [9] "Erinaceomorpha" "Hyracoidea" "Lagomorpha" "Monotremata"
## [13] "Perissodactyla" "Pilosa" "Primates" "Proboscidea"
## [17] "Rodentia" "Scandentia" "Soricomorpha"
msleep_order_stats <- msleep_df %>%
select(order, bodywt) %>%
group_by(order) %>% #group by taxonomic designation
mutate(mean_wt = mean(bodywt)) %>% #find average weight by order
arrange(mean_wt) %>% #arrange by average weight
mutate(min_wt = min(bodywt)) %>% #find minimum weight by order
mutate(max_wt = max(bodywt)) %>% #find max weight by order
distinct(order, mean_wt, min_wt, max_wt) %>% #remove duplicate entries
mutate(order = factor(order, levels = .$order))
msleep_order_stats
## # A tibble: 19 x 4
## # Groups: order [19]
## order mean_wt min_wt max_wt
## <fct> <dbl> <dbl> <dbl>
## 1 Chiroptera 0.0165 0.01 0.023
## 2 Soricomorpha 0.0414 0.005 0.075
## 3 Scandentia 0.104 0.104 0.104
## 4 Rodentia 0.288 0.021 1.35
## 5 Erinaceomorpha 0.66 0.55 0.77
## 6 Afrosoricida 0.9 0.9 0.9
## 7 Didelphimorphia 1.03 0.37 1.7
## 8 Diprotodontia 1.36 1.1 1.62
## 9 Lagomorpha 2.5 2.5 2.5
## 10 Hyracoidea 3.06 2.62 3.6
## 11 Pilosa 3.85 3.85 3.85
## 12 Monotremata 4.5 4.5 4.5
## 13 Primates 13.9 0.2 62
## 14 Cingulata 31.8 3.5 60
## 15 Carnivora 57.7 2 163.
## 16 Artiodactyla 282. 14.8 900.
## 17 Perissodactyla 305. 187 521
## 18 Cetacea 342. 53.2 800
## 19 Proboscidea 4600. 2547 6654
msleep_order_vector <- as.character(msleep_order_stats$order) #save orders to vector for recoding purposes
ggplot(msleep_order_stats, aes(log(mean_wt), order)) +
geom_point() +
geom_point(aes(log(max_wt)), color = 'red') +
geom_point(aes(log(min_wt)), color = 'blue')
After all that, it appears that there’s a decent amount of overlap between the different orders. There are some outliers that appear distinct, but the largest orders, like rodentia, carnivora and primates, have overlap with each other. Ultimately, I don’t think that the order variable is a good one to model as I don’t have enough values to assume underlying normality. Using order as a variable didn’t pan out, so instead I’ll use order to interpolate some missing values for body weight and move onto backward elimination finally.
library(imputeTS) #interpolate
## Warning: package 'imputeTS' was built under R version 3.6.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
msleep_df$estimated_bodywt <- na.interpolation(msleep_df$bodywt, option = "linear")
## Warning: na.interpolation will be replaced by na_interpolation.
## Functionality stays the same.
## The new function name better fits modern R code style guidelines.
## Please adjust your code accordingly.
summary(msleep_df)
## name genus vore order
## Length:83 Length:83 herbi:32 Rodentia :22
## Class :character Class :character omni :20 Carnivora :12
## Mode :character Mode :character carni:24 Primates :12
## NA's : 7 Artiodactyla: 6
## Soricomorpha: 5
## Cetacea : 3
## (Other) :23
## conservation sleep_total sleep_rem sleep_cycle
## domesticated:10 Min. : 1.90 Min. :0.100 Min. :0.1167
## lc :56 1st Qu.: 7.85 1st Qu.:0.900 1st Qu.:0.1833
## nt : 6 Median :10.10 Median :1.500 Median :0.3333
## vu : 7 Mean :10.43 Mean :1.875 Mean :0.4396
## en : 4 3rd Qu.:13.75 3rd Qu.:2.400 3rd Qu.:0.5792
## Max. :19.90 Max. :6.600 Max. :1.5000
## NA's :22 NA's :51
## awake brainwt bodywt brainprop
## Min. : 4.10 Min. :0.00014 Min. : 0.005 Min. :0.100
## 1st Qu.:10.25 1st Qu.:0.00290 1st Qu.: 0.174 1st Qu.:0.400
## Median :13.90 Median :0.01240 Median : 1.670 Median :0.700
## Mean :13.57 Mean :0.28158 Mean : 166.136 Mean :1.039
## 3rd Qu.:16.15 3rd Qu.:0.12550 3rd Qu.: 41.750 3rd Qu.:1.500
## Max. :22.10 Max. :5.71200 Max. :6654.000 Max. :4.000
## NA's :27 NA's :27
## remprop estimated_bodywt
## Min. : 3.7 Min. : 0.005
## 1st Qu.:11.3 1st Qu.: 0.174
## Median :15.6 Median : 1.670
## Mean :17.4 Mean : 166.136
## 3rd Qu.:22.7 3rd Qu.: 41.750
## Max. :34.7 Max. :6654.000
## NA's :22
sleep_multi_lm <- lm(data = msleep_df, formula = sleep_total ~ vore + conservation + estimated_bodywt + sqrt(estimated_bodywt) + remprop)
summary(sleep_multi_lm)
##
## Call:
## lm(formula = sleep_total ~ vore + conservation + estimated_bodywt +
## sqrt(estimated_bodywt) + remprop, data = msleep_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.899 -2.466 -0.140 2.165 6.434
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.49366 1.70200 7.341 2.82e-09 ***
## vore.L 1.43765 0.89386 1.608 0.11460
## vore.Q 0.25338 0.90500 0.280 0.78075
## conservation.L 2.96008 1.85081 1.599 0.11659
## conservation.Q 2.51115 2.00372 1.253 0.21645
## conservation.C 2.93365 1.60895 1.823 0.07476 .
## conservation^4 2.08926 1.90782 1.095 0.27917
## estimated_bodywt 0.01432 0.01002 1.429 0.15986
## sqrt(estimated_bodywt) -0.76159 0.25874 -2.943 0.00507 **
## remprop 0.06723 0.07150 0.940 0.35204
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.431 on 46 degrees of freedom
## (27 observations deleted due to missingness)
## Multiple R-squared: 0.5455, Adjusted R-squared: 0.4565
## F-statistic: 6.134 on 9 and 46 DF, p-value: 1.252e-05
plot(sleep_multi_lm)
### Conclusions
The above model uses four variables, one of which is transformed, to predict total sleep duration. These variables explain about 45% of the variation of sleep duration. The largest outliers differ from the trendline by over 5 hours, or about 20% of the day. This is a lot of variation and suggests that sleep time isn’t purely related to indicators of metabolism.
The most significant contributor is the root of total body weight, which according to the sign of the coefficient has a negative correlation with sleep time. This may be explained that sleep is primarily a neurological function and increased body mass does not increase the demand for regeneration. Additionally, Savage et al mention that some of the largest mammals, marine animals like dolphins and whales, have very different sleep schedules compared to terrestrial counterparts. Instead of their entire brain sleeping at one time, one hemisphere will rest completely while the other remains altert. This phenomenon is called unihemispheric slow-wave sleep (USWS), and was compensated for in the original publication by dividing the sleep times by half. I think this behavior is unique and more difficult to model than by halving sleep time.
If this model were to be developed further, a dataset with more species and fewer missing values would support a stronger relationship. While there was overlap in size between orders like carnivora and primates, species of megafauna versus smaller mammals like bats and rodents are very distinct and could be treated as their own categories.