The dataset I am exploring describes sleep in 62 species of mammals. Its variables include total hours of sleep (total_sleep), hours of dreaming sleep (dreaming), hours of non-dreaming sleep (non_dreaming), life span in years (life_span), gestation time in days (gestation), and how much danger each species faces from other animals, not only other mammals (danger). I plan to explore the sleep types of species that sleep longer than 10 hours and have a longer life span. The dataset comes from StatSci.org:
http://www.statsci.org/data/general/sleep.txt
Load Packages and Data
I loaded the package ‘tidyverse’, then loaded my dataset using read_csv and named it “mammals”. I then used the head() function to show the first 6 observations in my dataset.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
mammals <- read_csv("mammals.csv")
Rows: 62 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): species
dbl (10): body_wt, brain_wt, non_dreaming, dreaming, total_sleep, life_span,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(mammals)
# A tibble: 6 × 11
species body_wt brain_wt non_dreaming dreaming total_sleep life_span gestation
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Africa… 6654 5712 NA NA 3.3 38.6 645
2 Africa… 1 6.6 6.3 2 8.3 4.5 42
3 Arctic… 3.38 44.5 NA NA 12.5 14 60
4 Arctic… 0.92 5.7 NA NA 16.5 NA 25
5 Asiane… 2547 4603 2.1 1.8 3.9 69 624
6 Baboon 10.6 180. 9.1 0.7 9.8 27 180
# ℹ 3 more variables: predation <dbl>, exposure <dbl>, danger <dbl>
Checking for NAs
I used the function anyNA() to check each variable I’m going to use for missing values. It returns TRUE if the variable contains at least one NA and FALSE otherwise.
anyNA(mammals$dreaming)
[1] TRUE
anyNA(mammals$species)
[1] FALSE
anyNA(mammals$life_span)
[1] TRUE
anyNA(mammals$danger)
[1] FALSE
anyNA(mammals$gestation)
[1] TRUE
anyNA(mammals$total_sleep)
[1] TRUE
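The same check can be run on every column at once. The sketch below is only an illustration: `toy` is a made-up stand-in for `mammals` with invented values, and `na_counts` is a hypothetical name. It counts the NAs per column rather than only reporting whether any exist.

```r
# Toy stand-in for `mammals`; the values are invented for illustration only.
toy <- data.frame(
  species     = c("A", "B", "C", "D"),
  dreaming    = c(2.0, NA, 0.7, NA),
  life_span   = c(4.5, 14, NA, 27),
  total_sleep = c(8.3, 12.5, 9.8, 3.3)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(toy))
na_counts
```

Applied to the real data, `colSums(is.na(mammals))` would show how many rows each later filter is going to drop.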
Removing NAs
I used filter(!is.na()) to drop the rows that have an NA in any of the variables that returned TRUE above. I saved the result as a new dataset called “mammals2”.
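The chunk for this step is not shown, so the following is a minimal sketch of what such a filter might look like, assuming the four variables that returned TRUE above are the ones being cleaned. `toy` and `toy2` are hypothetical names and the values are invented.

```r
library(dplyr)

# Toy stand-in for `mammals`; the values are invented for illustration only.
toy <- data.frame(
  species     = c("A", "B", "C", "D", "E"),
  dreaming    = c(2.0, NA, 0.7, 1.8, 3.9),
  life_span   = c(4.5, 14, NA, 69, 27),
  gestation   = c(42, 60, 25, 624, NA),
  total_sleep = c(8.3, 12.5, 16.5, 3.9, NA)
)

# Keep only the rows where none of the four variables is NA.
toy2 <- toy |>
  filter(
    !is.na(dreaming),
    !is.na(life_span),
    !is.na(gestation),
    !is.na(total_sleep)
  )
toy2
```

Only species A and D survive here, because every other row has at least one NA. `tidyr::drop_na(dreaming, life_span, gestation, total_sleep)` should give the same result in a single call.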
I used the function mutate() to change the variable ‘danger’ from quantitative to categorical: 1 = “Least Danger”, 2 = “Less Danger”, 3 = “Intermediate Danger”, 4 = “More Danger”, and 5 = “Most Danger”.
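One way to do this recoding is with factor() inside mutate(). This is a sketch under assumptions, not the original chunk: `toy` and `toy3` are hypothetical names, and the data are invented. The label text is taken from the mapping described above.

```r
library(dplyr)

# Toy stand-in with one species per danger score (invented data).
toy <- data.frame(
  species = c("A", "B", "C", "D", "E"),
  danger  = c(1, 2, 3, 4, 5)
)

# Turn the 1-5 score into an ordered set of category labels.
toy3 <- toy |>
  mutate(
    danger = factor(
      danger,
      levels = 1:5,
      labels = c("Least Danger", "Less Danger", "Intermediate Danger",
                 "More Danger", "Most Danger")
    )
  )
levels(toy3$danger)
```

Setting `levels` explicitly keeps the categories in order from least to most danger, so a plot legend lists them in that order rather than alphabetically.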
I used the function cor() to find the correlation coefficient between life span and gestation. Then I used lm(y ~ x) to fit a linear regression and summary() to find its slope, intercept, and p-value.
cor(mammals3$life_span, mammals3$gestation)
[1] 0.6463887
mammals_lm <- lm(life_span ~ gestation, data = mammals3)
summary(mammals_lm)
Call:
lm(formula = life_span ~ gestation, data = mammals3)
Residuals:
Min 1Q Median 3Q Max
-23.108 -7.067 -3.542 4.694 66.590
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.06229 3.46419 1.750 0.0878 .
gestation 0.10242 0.01912 5.358 3.76e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.65 on 40 degrees of freedom
Multiple R-squared: 0.4178, Adjusted R-squared: 0.4033
F-statistic: 28.71 on 1 and 40 DF, p-value: 3.762e-06
The correlation coefficient is 0.65, which shows a moderate positive correlation. The equation for this linear regression is: life_span = 0.102(gestation) + 6.06. The gestation coefficient has three asterisks (p = 3.76e-06), which means the relationship is statistically significant, but we have to take into account the adjusted R-squared, which shows that the model explains only about 40% of the variation in life span.
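To make the equation concrete, the sketch below plugs the coefficients from the summary() output into a small helper. `predict_life_span` is a hypothetical name, and the 180-day gestation is just an example input. The last lines check that, in a regression with one predictor, R-squared is the square of the correlation coefficient.

```r
# Coefficients copied from the summary(mammals_lm) output above.
intercept <- 6.06229
slope     <- 0.10242

# Hypothetical helper: predicted life span (years) for a gestation length (days).
predict_life_span <- function(gestation_days) {
  intercept + slope * gestation_days
}

# Example input chosen for illustration: a 180-day gestation.
pred_180 <- predict_life_span(180)
pred_180

# With a single predictor, multiple R-squared equals the squared correlation.
r <- 0.6463887
r_squared <- r^2
round(r_squared, 4)
```

With the fitted model object available, `predict(mammals_lm, newdata = data.frame(gestation = 180))` would give the same prediction without copying the coefficients by hand.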
Filtering Out
I used filter() to keep only the species of mammals that sleep more than 10 hours and have a life span of more than 8 years.
After filtering I got 9 observations that fit the requirements I set. I will use these 9 species as the categories on my x axis.
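The filter chunk is not shown, so here is a minimal sketch of the two conditions described above. `toy` and `toy4` are hypothetical names and the values are invented.

```r
library(dplyr)

# Toy stand-in for the cleaned dataset (invented values).
toy <- data.frame(
  species     = c("A", "B", "C", "D", "E"),
  total_sleep = c(12.5, 8.3, 16.5, 10.9, 19.4),
  life_span   = c(14, 4.5, 6, 28, 19)
)

# Conditions separated by commas must both be true for a row to be kept.
toy4 <- toy |>
  filter(total_sleep > 10, life_span > 8)
toy4
```

Species B is dropped for sleeping too little and species C for its short life span, which leaves A, D, and E.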
Chart
I made a bar chart with species as the x axis, the hours dreaming as the y axis, and the fill is danger. I changed the theme to classic and changed the labels of the species to make them capitalized and have spaces in between.
plot1 <- mammals4 |>
  ggplot(aes(species, dreaming, fill = danger)) +
  geom_col() +
  theme_classic() +
  scale_x_discrete(labels = c("Big\nBrown Bat", "Cat", "Galago\n(Bush Baby)",
                              "Ground\nSquirrel", "Little\nBrown Bat",
                              "Owl\nMonkey\n(Night Monkey)", "Patas\nMonkey",
                              "Phalanger", "Vervet")) +
  scale_fill_manual(values = c("#FFCDB2", "#FFB8B2", "#E5989D", "#B5838D", "#6D6875")) +
  labs(
    x = "Species of Mammal",
    y = "Hours of Dreaming",
    title = "Mammals' Hours of Dreaming and\nHow Much Danger They Might Experience",
    fill = "Danger",
    caption = "Sleep in mammals: ecological and constitutional correlates from StatSci.org"
  ) +
  theme(legend.position = c(0.75, .8))
plot1
I cleaned up the dataset by first removing NAs, where necessary, from the variables that I used. Then I kept only the species of mammals that sleep more than 10 hours and have a life span of more than 8 years. My bar chart shows the 9 species that fit these requirements on the x axis and the hours of dreaming for each species on the y axis. The legend shows how much danger each species might experience from other animals. I noticed when filtering that mammal species that sleep more and live longer tend to be in less danger from other animals. You can see this in my chart because none of the species that remained after filtering has a ‘Most Danger’ label. The chart also suggests a relationship between how much a species dreams and how much danger it is in, with species in more danger dreaming less. For example, the Vervet and the Patas Monkey, which are the only species in the ‘More Danger’ category in my chart, each dream less than an hour. I would have liked to compare dreaming hours with total hours of sleep, but I couldn’t figure out how to do that in a way that was coherent and that I liked.
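One possible way to relate dreaming hours to total sleep is to compute the share of sleep spent dreaming and plot that share instead of the raw hours. This is only a sketch of the idea: `toy` and `dream_share` are hypothetical names and the values are invented.

```r
library(dplyr)

# Toy stand-in for the filtered dataset (invented values).
toy <- data.frame(
  species     = c("A", "B", "C"),
  dreaming    = c(2, 1, 3),
  total_sleep = c(8, 10, 12)
)

# Share of total sleep spent dreaming, between 0 and 1.
toy_share <- toy |>
  mutate(dream_share = dreaming / total_sleep)
toy_share
```

Mapping `dream_share` to the y axis in place of `dreaming` would keep the same bar chart layout while showing dreaming relative to total sleep.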