library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
# Path is relative to the working directory; adjust it to wherever msleep.csv is saved
msleep <- read.csv("msleep.csv")
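Before modelling, it helps to see how much data is missing, because every `cor(..., use = "complete.obs")` and `t.test()` call below silently drops incomplete rows. The sketch below assumes that msleep.csv is a copy of the `msleep` data frame that ships with ggplot2 (attached by tidyverse) and reads it from there. `msleep_pkg` and `na_counts` are illustrative names, not part of the original analysis.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

# Rows = species, columns = variables
dim(msleep_pkg)

# Missing values per column; these rows are dropped by
# cor(..., use = "complete.obs") and by t.test()
na_counts <- colSums(is.na(msleep_pkg))
na_counts
```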
Response Variable: sleep_total (Total hours of sleep).
Explanatory Variable 1: sleep_rem (Total hours of REM sleep).
Explanatory Variable 2: bodywt (Body weight in kilograms).
Explanatory Variable 3: brainwt (Brain weight in kilograms).
Explanatory Variable 4: sleep_cycle (Length of the sleep cycle, in hours).
In this set, sleep_total is the response variable, and the other variables may help explain or predict the total hours of sleep.
Let’s create a scatterplot to visualize the relationship between sleep_total and sleep_cycle:
# Scatterplot of sleep_total vs. sleep_cycle
plot(msleep$sleep_cycle, msleep$sleep_total, main="Sleep Total vs. Sleep Cycle", xlab="Sleep Cycle (hrs)", ylab="Sleep Total (hrs)", pch=19, col="blue")
Conclusion: The scatterplot suggests a moderate negative relationship between ‘sleep_total’ and ‘sleep_cycle’: animals with longer sleep cycles tend to have less total sleep. Because sleep_cycle is missing for many species, the plot shows only the animals for which both values are recorded.
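A fitted least-squares line makes the direction of the trend easier to read than the raw points alone. This is a sketch that assumes msleep.csv matches `ggplot2::msleep`; `cycle_df`, `fit_cycle`, and `p_cycle` are illustrative names. The slope of a simple linear regression always has the same sign as the Pearson correlation.

```r
library(ggplot2)

# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

# Keep only species with both values recorded
cycle_df <- subset(msleep_pkg, !is.na(sleep_cycle) & !is.na(sleep_total))

# Simple linear regression; its slope shares the sign of the correlation
fit_cycle <- lm(sleep_total ~ sleep_cycle, data = cycle_df)
coef(fit_cycle)

p_cycle <- ggplot(cycle_df, aes(x = sleep_cycle, y = sleep_total)) +
  geom_point(colour = "blue") +
  # Least-squares line with a 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Sleep Total vs. Sleep Cycle",
       x = "Sleep Cycle (hrs)", y = "Sleep Total (hrs)")
print(p_cycle)
```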
cor(msleep$sleep_total, msleep$sleep_cycle, use = "complete.obs")
## [1] -0.4737127
Explanation: The correlation coefficient is about -0.47, a moderate negative linear association that matches the downward trend in the scatterplot: species with longer sleep cycles tend to sleep fewer hours in total. Because use = "complete.obs" is set, the coefficient is computed only from species with both values recorded.
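A point estimate of r says nothing about its uncertainty. As a sketch, `cor.test()` from base R's stats package returns the same Pearson estimate together with a 95% confidence interval (based on Fisher's z transform) and a p-value for the null hypothesis of zero correlation. It assumes msleep.csv matches `ggplot2::msleep`; `ct` is an illustrative name.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

# Pearson correlation with a 95% CI and a test of H0: rho = 0.
# cor.test() drops rows where either value is missing.
ct <- cor.test(msleep_pkg$sleep_cycle, msleep_pkg$sleep_total,
               method = "pearson", conf.level = 0.95)

ct$estimate  # sample correlation r
ct$conf.int  # 95% confidence interval for rho
ct$p.value   # p-value for H0: rho = 0
```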
# Confidence interval for sleep_total
sleep_total_ci <- t.test(msleep$sleep_total)$conf.int
sleep_total_ci
## [1] 9.461972 11.405497
## attr(,"conf.level")
## [1] 0.95
Conclusion: The 95% confidence interval for the mean of sleep_total is approximately 9.46 to 11.41 hours. Under the assumptions of the t-test, about 95% of intervals constructed this way from repeated samples would contain the true population mean, so we are 95% confident that the mean total sleep lies between roughly 9.5 and 11.4 hours.
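The interval reported by `t.test()` is the usual one-sample t interval: the sample mean plus or minus the 97.5th percentile of a t distribution with n - 1 degrees of freedom, multiplied by the standard error s / sqrt(n). The sketch below recomputes it by hand, assuming msleep.csv matches `ggplot2::msleep`; the variable names are illustrative.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

x <- msleep_pkg$sleep_total
x <- x[!is.na(x)]                 # t.test() drops NAs the same way

n     <- length(x)
xbar  <- mean(x)
se    <- sd(x) / sqrt(n)          # standard error of the mean
tcrit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

manual_ci <- c(xbar - tcrit * se, xbar + tcrit * se)
manual_ci

# Should agree with the interval from t.test()
all.equal(manual_ci, as.numeric(t.test(x)$conf.int))
```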
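Response Variable: conservation (Conservation status code: ‘lc’ least concern, ‘nt’ near threatened, ‘vu’ vulnerable, ‘en’ endangered, ‘cd’ conservation dependent; domesticated species are coded ‘domesticated’).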
Explanatory Variable 1: sleep_total (Total hours of sleep).
Explanatory Variable 2: sleep_rem (Total hours of REM sleep).
Explanatory Variable 3: sleep_cycle (Length of the sleep cycle, in hours).
In this set, conservation is the response variable, and the sleep characteristics serve as explanatory variables that may help us understand how conservation status relates to sleep patterns.
Let’s create a boxplot to visualize the relationship between ‘conservation’ and ‘sleep_total’:
# Boxplot of conservation vs. sleep_total
boxplot(msleep$sleep_total ~ msleep$conservation, main="Conservation vs. Sleep Total", xlab="Conservation Status", ylab="Sleep Total (hrs)", col="lightblue")
Conclusion: The boxplot shows variation in ‘sleep_total’ across conservation statuses. Species with ‘lc’ (least concern) status tend to have higher median sleep totals, while ‘cd’ (conservation dependent) species have lower sleep totals. Some categories contain outliers.
Because conservation is a categorical variable, a Pearson correlation coefficient between it and sleep_total is not meaningful, and a confidence interval for a mean does not apply to it. Categorical data call for other summaries, such as a confidence interval for the proportion of species with a given status, or a comparison of mean sleep_total across statuses.
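Since read.csv() stores conservation as character, boxplot() orders the groups alphabetically, which makes the medians hard to compare. One option, sketched below, is to reorder the factor levels by each group's median sleep_total with base R's `reorder()` and to tabulate group sizes, since some statuses contain only a few species. It assumes msleep.csv matches `ggplot2::msleep`.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

# Reorder status levels by median sleep_total (NA statuses are left out)
msleep_pkg$conservation <- reorder(factor(msleep_pkg$conservation),
                                   msleep_pkg$sleep_total, FUN = median)

# Number of species per status; small groups give unstable medians
table(msleep_pkg$conservation, useNA = "ifany")

boxplot(sleep_total ~ conservation, data = msleep_pkg,
        main = "Conservation vs. Sleep Total",
        xlab = "Conservation Status", ylab = "Sleep Total (hrs)",
        col = "lightblue")
```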
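For a categorical variable, a confidence interval can still be formed for a proportion. As a sketch, base R's `prop.test()` gives a 95% interval (a score interval with continuity correction) for the share of species classed as ‘lc’ among those with a recorded status. It assumes msleep.csv matches `ggplot2::msleep`, and the choice of ‘lc’ is only an illustration.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

status <- msleep_pkg$conservation
status <- status[!is.na(status)]      # species with a recorded status

# 95% CI for the proportion of these species classed as 'lc'
n_lc <- sum(status == "lc")
pt <- prop.test(n_lc, length(status), conf.level = 0.95)

pt$estimate  # sample proportion
pt$conf.int  # 95% confidence interval
```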
Response Variable: brainwt (Brain weight in kilograms).
Explanatory Variable 1: bodywt (Body weight in kilograms).
Explanatory Variable 2: body_to_brain_ratio (Body-to-brain weight ratio, calculated as bodywt / brainwt).
In this set, we are using brainwt as the response variable, and the body metrics provide explanatory variables to explore the relationship between body weight, brain weight, and the body-to-brain weight ratio.
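body_to_brain_ratio is not a column in the data, so it has to be derived before it can be used. A minimal sketch, assuming msleep.csv matches `ggplot2::msleep`; species with a missing brainwt get NA.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

# Derived variable: body weight divided by brain weight (both in kg)
msleep_pkg$body_to_brain_ratio <- msleep_pkg$bodywt / msleep_pkg$brainwt

summary(msleep_pkg$body_to_brain_ratio)
```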
Let’s create a scatterplot to visualize the relationship between brainwt and bodywt:
# Scatterplot of brainwt vs. bodywt
plot(msleep$bodywt, msleep$brainwt, main="Brain Weight vs. Body Weight", xlab="Body Weight (kg)", ylab="Brain Weight (kg)", pch=19, col="green")
Conclusion: The scatterplot shows a positive relationship between ‘brainwt’ and ‘bodywt’: animals with larger body weights tend to have larger brain weights. On the raw scale, most species are packed near the origin while a few very heavy animals stretch both axes, so the apparent linear trend is driven largely by those few points. There are also some outliers where animals have relatively small body weights but relatively large brain weights.
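Brain and body weights span several orders of magnitude, so the same data are commonly plotted on log scales. The sketch below uses ggplot2's `scale_x_log10()` and `scale_y_log10()`, assuming msleep.csv matches `ggplot2::msleep`; `bw` and `p_brain` are illustrative names.

```r
library(ggplot2)

# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)
bw <- subset(msleep_pkg, !is.na(brainwt) & !is.na(bodywt))

# Log10 axes spread out the many small species and keep the few
# very heavy ones from dominating the picture
p_brain <- ggplot(bw, aes(x = bodywt, y = brainwt)) +
  geom_point(colour = "darkgreen") +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Brain Weight vs. Body Weight (log10 scales)",
       x = "Body Weight (kg)", y = "Brain Weight (kg)")
print(p_brain)
```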
cor(msleep$brainwt, msleep$bodywt, use = "complete.obs")
## [1] 0.9337822
Explanation: The correlation coefficient is about 0.93, a strong positive linear association: animals with larger body weights tend to have larger brain weights. Pearson's r is sensitive to extreme values, however, and with a handful of very heavy species in the data this value may overstate how closely brain and body weight track each other across typical mammals.
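One way to check how much a few extreme species drive r is to compare it with the correlation of the log-transformed weights and with Spearman's rank correlation, which uses only the ordering of the values. A sketch assuming msleep.csv matches `ggplot2::msleep`; no particular values are claimed here.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)
bw <- subset(msleep_pkg, !is.na(brainwt) & !is.na(bodywt))

# Pearson r on the raw scale, as in the text above
r_raw <- cor(bw$brainwt, bw$bodywt)
# Pearson r after a log transform: less dominated by the heaviest species
r_log <- cor(log(bw$brainwt), log(bw$bodywt))
# Spearman rank correlation: depends only on the ordering of the values
r_spearman <- cor(bw$brainwt, bw$bodywt, method = "spearman")

c(raw = r_raw, log = r_log, spearman = r_spearman)
```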
# Confidence interval for brainwt
brainwt_ci <- t.test(msleep$brainwt)$conf.int
brainwt_ci
## [1] 0.02009613 0.54306673
## attr(,"conf.level")
## [1] 0.95
Conclusion: The 95% confidence interval for the mean brain weight is approximately 0.020 kg to 0.543 kg. The interval is wide because brainwt is strongly right-skewed: a few species with very heavy brains inflate both the mean and the standard deviation, so this t-based interval should be read as a rough guide.
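For strongly right-skewed positive data, a common alternative is to build the t interval on the log scale and back-transform it, which yields an interval for the geometric mean rather than the arithmetic mean. This is a sketch of that alternative, not the method used above, and it assumes msleep.csv matches `ggplot2::msleep`.

```r
# Assumption: msleep.csv matches the msleep data bundled with ggplot2
msleep_pkg <- as.data.frame(ggplot2::msleep)

b <- msleep_pkg$brainwt
b <- b[!is.na(b)]

# t interval on log(brainwt), then back-transformed with exp():
# this is a CI for the geometric mean brain weight (kg)
log_ci   <- t.test(log(b))$conf.int
geo_mean <- exp(mean(log(b)))
geo_ci   <- exp(as.numeric(log_ci))

geo_mean
geo_ci
```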