Sleep in Mammals

Author

Charlie Roth

Sleep in Mammals Dataset

The dataset I am exploring describes sleep in mammals. It covers 62 different species of mammals and records how each species sleeps. The variables include total hours sleeping (total_sleep), number of hours dreaming (dreaming), number of hours not dreaming (non_dreaming), how long each species lives in years (life_span), gestation of each species in days (gestation), and how much danger each species faces from other animals, not specifically other mammals (danger). I plan to explore the sleep types in species that sleep longer than 10 hours and have a longer life span. My source for this dataset is StatSci.org:

http://www.statsci.org/data/general/sleep.txt

Load Packages and Data

I loaded the package ‘tidyverse’, then loaded my dataset using read_csv and named it “mammals”. I then used the head() function to show the first 6 observations in my dataset.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

mammals <- read_csv("mammals.csv")

Rows: 62 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): species
dbl (10): body_wt, brain_wt, non_dreaming, dreaming, total_sleep, life_span,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(mammals)

# A tibble: 6 × 11
  species body_wt brain_wt non_dreaming dreaming total_sleep life_span gestation
  <chr>     <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>     <dbl>
1 Africa… 6654      5712           NA       NA           3.3      38.6       645
2 Africa…    1         6.6          6.3      2           8.3       4.5        42
3 Arctic…    3.38     44.5         NA       NA          12.5      14          60
4 Arctic…    0.92      5.7         NA       NA          16.5      NA          25
5 Asiane… 2547      4603            2.1      1.8         3.9      69         624
6 Baboon    10.6     180.           9.1      0.7         9.8      27         180
# ℹ 3 more variables: predation <dbl>, exposure <dbl>, danger <dbl>

Checking for NAs

I used the function anyNA() to check whether the variables I’m going to use contain any NAs. It returns TRUE if a variable has at least one NA and FALSE if it has none.

anyNA(mammals$dreaming)

[1] TRUE

anyNA(mammals$species)

[1] FALSE

anyNA(mammals$life_span)

[1] TRUE

anyNA(mammals$danger)

[1] FALSE

anyNA(mammals$gestation)

[1] TRUE

anyNA(mammals$total_sleep)

[1] TRUE
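The same check can also be run on every column at once. The sketch below is only an illustration: `toy` is a made-up data frame standing in for `mammals` (its values are not from the dataset), and `colSums(is.na())` counts the NAs per column instead of only reporting TRUE or FALSE.

```r
# Toy data frame standing in for `mammals`; the values are made up for illustration
toy <- data.frame(
  species     = c("a", "b", "c"),
  dreaming    = c(2.0, NA, 0.7),
  total_sleep = c(8.3, 12.5, NA),
  danger      = c(3, 1, 4)
)

# is.na() marks each missing cell, and colSums() adds the marks up per column
na_counts <- colSums(is.na(toy))
na_counts
```

A column with a count of 0 would return FALSE from anyNA(), and any count above 0 would return TRUE.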

Removing NAs

I used filter(!is.na()) to remove the rows that have an NA in any of the variables that returned TRUE above. I saved the result as a new dataset called “mammals2”.

mammals2 <- mammals |>
  filter(!is.na(dreaming), !is.na(life_span), !is.na(gestation), !is.na(total_sleep))
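An alternative that might read a little shorter is drop_na() from tidyr, which drops every row that has an NA in any of the listed columns. This is a sketch on a made-up data frame (`toy` and its values are assumptions for illustration, not the real data).

```r
library(tidyr)

# Toy data frame; the values are made up for illustration
toy <- data.frame(
  species   = c("a", "b", "c", "d"),
  dreaming  = c(2.0, NA, 0.7, 1.8),
  life_span = c(4.5, 14, 27, NA),
  exposure  = c(1, 2, NA, 5)  # not listed in drop_na(), so its NA is left alone
)

# Keep only the rows where both dreaming and life_span are present
toy2 <- drop_na(toy, dreaming, life_span)
toy2
```

Only the listed columns are checked, so a row with an NA in an unlisted column such as `exposure` is kept, which matches how the filter(!is.na()) call treats columns it does not mention.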

Mutating Variable

I used the function mutate() to change the variable ‘danger’ from quantitative to categorical. 1 = “Least Danger”, 2 = “Less Danger”, 3 = “Intermediate Danger”, 4 = “More Danger”, and 5 = “Most Danger”.

mammals3 <- mammals2 |>
  mutate(
    danger = factor(danger, levels = 1:5, labels = c("Least Danger", "Less Danger", "Intermediate Danger", "More Danger", "Most Danger"))
  )
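To confirm that the recoding worked, the levels and the count in each level can be checked. The sketch below uses a short made-up vector (`danger_raw` is hypothetical, not the real column) with the same factor() call as above.

```r
# Made-up danger scores for illustration; the real column has one score per species
danger_raw <- c(1, 3, 5, 2, 3)

danger_fct <- factor(
  danger_raw,
  levels = 1:5,
  labels = c("Least Danger", "Less Danger", "Intermediate Danger",
             "More Danger", "Most Danger")
)

# The labels appear in order from 1 to 5
levels(danger_fct)

# table() counts each level, including levels that do not occur in the data
danger_counts <- table(danger_fct)
danger_counts
```

Because the levels are set explicitly, a category with no species still shows up with a count of 0 and keeps its place in the legend order.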

Linear Regression

I used the function cor() to find the correlation coefficient between life span and gestation. Then I used lm(y ~ x) to fit a linear regression and find the slope, intercept, and p-value.

cor(mammals3$life_span, mammals3$gestation)

[1] 0.6463887

mammals_lm <- lm(life_span ~ gestation, data = mammals3)
summary(mammals_lm)


Call:
lm(formula = life_span ~ gestation, data = mammals3)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.108  -7.067  -3.542   4.694  66.590 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.06229    3.46419   1.750   0.0878 .  
gestation    0.10242    0.01912   5.358 3.76e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.65 on 40 degrees of freedom
Multiple R-squared:  0.4178,    Adjusted R-squared:  0.4033 
F-statistic: 28.71 on 1 and 40 DF,  p-value: 3.762e-06

The correlation coefficient is 0.65, which shows a moderate positive correlation. The equation for this linear regression is: life_span = 0.102(gestation) + 6.06. The p-value for the slope has three asterisks, which means it is below 0.001 and suggests the relationship is statistically significant. However, we have to take into account the adjusted R-squared, which shows that only about 40% of the variation in life span is explained by the model.
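As a worked example of reading the model, the sketch below plugs the intercept and slope reported in the summary() output into the regression equation for the Baboon row shown by head() (gestation of 180 days, life span of 27 years). It also squares the correlation coefficient, which for a regression with one predictor should match the Multiple R-squared. The variable names are only for illustration.

```r
# Coefficients copied from the summary() output
intercept <- 6.06229
slope     <- 0.10242

# Baboon values copied from the head() output
baboon_gestation <- 180
baboon_life_span <- 27

# Predicted life span from the regression equation
predicted <- intercept + slope * baboon_gestation
predicted

# Residual: how far the observed life span is from the prediction
residual <- baboon_life_span - predicted
residual

# Squaring the correlation coefficient gives the Multiple R-squared
r <- 0.6463887
r_squared <- r^2
r_squared
```

The prediction comes out to roughly 24.5 years, about 2.5 years below the Baboon's recorded life span, and the squared correlation rounds to the 0.4178 reported by summary().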

Filtering Out

I used filter() to keep only the species of mammals that sleep more than 10 hours and have a life span of more than 8 years.

mammals4 <- mammals3 |>
  filter(total_sleep > 10 & life_span > 8)

After filtering I got 9 observations that fit the requirements I set. I will use these 9 species as the categories on my x axis.

Chart

I made a bar chart with species as the x axis, the hours dreaming as the y axis, and the fill is danger. I changed the theme to classic and changed the labels of the species to make them capitalized and have spaces in between.

plot1 <- mammals4 |>
  ggplot(aes(species, dreaming, fill = danger)) +
  geom_col() +
  theme_classic() +
  scale_x_discrete(labels = c("Big\nBrown Bat", "Cat", "Galago\n(Bush Baby)", "Ground\nSquirrel", "Little\nBrown Bat", "Owl\nMonkey\n(Night Monkey)", "Patas\nMonkey", "Phalanger", "Vervet")) +
  scale_fill_manual(values = c("#FFCDB2", "#FFB8B2", "#E5989D", "#B5838D", "#6D6875")) +
  labs(x = "Species of Mammal",
       y = "Hours of Dreaming",
       title = "Mammals' Hours of Dreaming and\nHow Much Danger They Might Experience",
       fill = "Danger",
       caption = "Sleep in mammals: ecological and constitutional correlates from StatSci.org") +
  theme(legend.position = c(0.75, .8))
plot1

I cleaned up the dataset by first removing NAs from the variables I used. Then I kept only the species of mammals that sleep more than 10 hours and have a life span of more than 8 years. My bar chart shows the 9 species that meet those requirements on the x axis. The y axis is hours dreaming for each species, and the legend shows how much danger each species might experience from other animals. I noticed when filtering that mammal species that sleep more and live longer tend to be in less danger from other animals. You can see this in my chart because every species with a ‘Most Danger’ label was filtered out. The chart also suggests a relationship between how much a species dreams and how much danger it is in. For example, the Vervet and Patas Monkey, which are the only species in the ‘More Danger’ category in my chart, dream less than an hour. I would have liked to compare dreaming hours to total hours of sleep, but I couldn’t figure out how to do that in a way that was coherent and that I liked.
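One possible way to relate dreaming hours to total hours of sleep is to compute the share of sleep spent dreaming with mutate(). This is only a sketch under stated assumptions: the three rows reuse the dreaming and total_sleep values printed by head(), the two species whose names are truncated in that output are given placeholder names, and `dreaming_share` is a new column name made up for illustration.

```r
library(dplyr)

# Values copied from the head() output; "Species A" and "Species B" are
# placeholders for the two species whose names are truncated there
sleep_toy <- tibble(
  species     = c("Species A", "Species B", "Baboon"),
  dreaming    = c(2.0, 1.8, 0.7),
  total_sleep = c(8.3, 3.9, 9.8)
)

# Share of total sleep spent dreaming, as a proportion between 0 and 1
sleep_share <- sleep_toy |>
  mutate(dreaming_share = dreaming / total_sleep)

sleep_share
```

A column like this could be mapped to the y axis in place of dreaming, so each bar would show the proportion of sleep spent dreaming instead of the raw hours.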