library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)

msleep <- read.csv("C:/Users/ABHIRAM/Downloads/msleep.csv")

Response Variable:

I am going to choose the “sleep_total” column as the response variable. It represents the total sleep time for each animal.

Explanatory Categorical Variable:

I’ll select the “vore” column, which indicates the diet of the animals. This categorical variable may influence the total sleep time.

Null Hypothesis for ANOVA:

Null Hypothesis (H0): The mean total sleep time is the same for every diet group in “vore.”

Alternative Hypothesis (Ha): At least one diet group has a mean total sleep time that differs from the others.

ANOVA test:

library(dplyr)

data <- msleep 

# Performing ANOVA
anova_result <- aov(sleep_total ~ vore, data = data)
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## vore         3  133.7   44.57   2.235 0.0914 .
## Residuals   72 1435.7   19.94                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 7 observations deleted due to missingness

Interpreting ANOVA Results:

The ANOVA gives an F-statistic of 2.235 on 3 and 72 degrees of freedom, with a p-value of 0.0914. We reject the null hypothesis only if the p-value is below the chosen significance level. At a significance level of 0.05, the p-value is larger, so we fail to reject the null hypothesis: these data do not provide sufficient evidence that mean total sleep time differs among diet groups. Note that 7 animals with a missing “vore” value were dropped from the test.
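The decision rule above can also be applied in code rather than read off the printed table. The following is a minimal sketch, assuming a one-way `aov()` fit; `anova_decision` is a hypothetical helper name, and the `toy` data frame holds made-up values that are not taken from msleep.

```r
# Hypothetical helper (not part of the original analysis): pulls the F
# statistic and p-value for the first term of an aov() fit and applies
# the decision rule at significance level `alpha`.
anova_decision <- function(fit, alpha = 0.05) {
  tab <- summary(fit)[[1]]          # ANOVA table as a data frame
  f_value <- tab[["F value"]][1]    # row 1 is the grouping term
  p_value <- tab[["Pr(>F)"]][1]
  list(f_value = f_value, p_value = p_value, reject_null = p_value < alpha)
}

# Made-up illustrative values, NOT taken from msleep.
toy <- data.frame(
  sleep_total = c(10, 12, 14, 9, 11, 13, 15, 17, 19, 8, 10, 12),
  vore = rep(c("carni", "herbi", "insecti", "omni"), each = 3)
)

toy_fit <- aov(sleep_total ~ vore, data = toy)
toy_decision <- anova_decision(toy_fit)
print(toy_decision)
```

With the fit from this report, the equivalent call would be `anova_decision(anova_result)`.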

Another Continuous Variable:

I’ll choose the “awake” column as another continuous variable that might influence total sleep time. There should be a roughly linear relationship between these two variables.

Linear Regression Model:

# Performing linear regression
lm_result <- lm(sleep_total ~ awake, data = data)
summary(lm_result)
## 
## Call:
## lm(formula = sleep_total ~ awake, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.004529 -0.002016 -0.001256  0.000283  0.046893 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 23.9959187  0.0026771    8963   <2e-16 ***
## awake       -0.9996104  0.0001876   -5329   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007563 on 81 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.84e+07 on 1 and 81 DF,  p-value: < 2.2e-16

Interpreting Regression Results:

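The fitted model is sleep_total = 23.996 - 0.9996 × awake. The intercept (about 24) is the expected “sleep_total” when “awake” is zero, and the coefficient for “awake” (about -1) means that each additional hour awake corresponds to one hour less sleep. Both coefficients are highly significant (p < 2e-16), R-squared is 1, and the residual standard error is only 0.0076 hours. A fit this exact suggests that “awake” is essentially 24 minus “sleep_total,” so the model mostly restates how the two columns are related by definition rather than revealing an independent influence on sleep.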
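An R-squared of 1 with an intercept near 24 and a slope near -1 is worth a quick sanity check: if the two columns simply add up to a full day, the regression cannot tell us anything new. The sketch below shows one way to check this, assuming a data frame with `sleep_total` and `awake` columns; `daily_total_range` is a hypothetical helper and the `toy` rows are made-up values.

```r
# Hypothetical helper: returns the smallest and largest value of
# sleep_total + awake, ignoring rows where either value is missing.
daily_total_range <- function(df) {
  range(df$sleep_total + df$awake, na.rm = TRUE)
}

# Made-up illustrative rows, NOT taken from msleep.
toy <- data.frame(
  sleep_total = c(12.1, 17.0, 14.4, 2.7),
  awake       = c(11.9,  7.0,  9.6, 21.35)
)

toy_range <- daily_total_range(toy)
print(toy_range)
```

On the report's data the call would be `daily_total_range(data)`. If both values come out close to 24, “awake” is a restatement of “sleep_total” and a different explanatory variable would be more informative.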

Including Another Variable:

I’ll add “vore” from the ANOVA test to see whether it improves the regression model.

# Performing multiple linear regression
mlm_result <- lm(sleep_total ~ awake + vore, data = data)
summary(mlm_result)
## 
## Call:
## lm(formula = sleep_total ~ awake + vore, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.008264 -0.002809 -0.000728  0.001458  0.042631 
## 
## Coefficients:
##               Estimate Std. Error   t value Pr(>|t|)    
## (Intercept) 23.9993141  0.0032797  7317.526   <2e-16 ***
## awake       -0.9995634  0.0002029 -4926.645   <2e-16 ***
## voreherbi   -0.0056405  0.0022344    -2.524   0.0138 *  
## voreinsecti -0.0032696  0.0039752    -0.822   0.4135    
## voreomni    -0.0050225  0.0024664    -2.036   0.0455 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007691 on 71 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 6.633e+06 on 4 and 71 DF,  p-value: < 2.2e-16

Interpreting Multiple Regression Results:

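With “vore” added, the coefficient for “awake” is essentially unchanged (-0.9996, p < 2e-16). Each “vore” coefficient compares one diet group with the reference group, which is the diet level not listed in the table. The coefficients for herbivores (-0.0056, p = 0.0138) and omnivores (-0.0050, p = 0.0455) are statistically significant at the 0.05 level, while the coefficient for insectivores (-0.0033, p = 0.4135) is not. These differences amount to only a few thousandths of an hour, however, and the residual standard error does not improve (0.007691 versus 0.007563 for the simple model), so “vore” adds essentially nothing once “awake” is in the model. This model is also fit on 76 animals rather than 83, because the 7 rows with a missing “vore” value are dropped.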
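To judge formally whether “vore” improves the model, the two regressions can be compared with an F-test using `anova()`. Both models must be fit to the same rows for that comparison, and the outputs above show 83 observations for the simple model but 76 for the multiple model. The following is a minimal sketch of one way to handle this, assuming the same column names; `compare_with_vore` is a hypothetical helper and the `toy` data frame holds made-up values.

```r
# Hypothetical helper: refits both models on rows with a known diet so
# that they use the same observations, then compares them with an F-test.
compare_with_vore <- function(df) {
  complete <- df[!is.na(df$vore), ]   # drop rows with missing vore
  simple <- lm(sleep_total ~ awake, data = complete)
  full   <- lm(sleep_total ~ awake + vore, data = complete)
  anova(simple, full)
}

# Made-up illustrative values, NOT taken from msleep.
toy <- data.frame(
  sleep_total = c(12.1, 10.4, 13.5, 4.0, 3.0, 5.3, 19.7,
                  18.1, 8.6, 10.1, 9.4, 11.0, 14.4, 6.3),
  awake = c(11.9, 13.6, 10.45, 20.0, 21.05, 18.7, 4.3,
            5.85, 15.4, 13.95, 14.6, 13.0, 9.6, 17.7),
  vore = c(rep(c("carni", "herbi", "insecti", "omni"), each = 3), NA, NA)
)

toy_comparison <- compare_with_vore(toy)
print(toy_comparison)
```

On the report's data the call would be `compare_with_vore(data)`. The second row of the resulting table gives the F-statistic and p-value for adding “vore” to the model that already contains “awake.”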