Sleep Data

  extra group ID
1   0.7     1  1
2  -1.6     1  2
3  -0.2     1  3
4  -1.2     1  4
5  -0.1     1  5
6   3.4     1  6
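
The rows above match the first six rows of R's built-in sleep data set. A minimal sketch of how such a preview can be produced, assuming head() was used (the variable name first_rows is illustrative and not from the slides):

```r
# Load the built-in sleep data set that ships with base R.
data(sleep)

# Show the first six rows: extra (change in hours slept),
# group (drug given), ID (patient identifier).
first_rows <- head(sleep)
print(first_rows)

# Check the structure: 20 rows, with group and ID stored as factors.
str(sleep)
```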

What is a linear regression?

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the data.

Linear Regression

  • We model extra as a function of group.
  • Model:
    extra = β0 + β1 * group + ε
    (group enters as a 0/1 indicator: 0 for drug group 1, 1 for drug group 2)
  • R code:
    • lm(extra ~ group, data = sleep)
  • Purpose:
    • Compare the mean sleep increase between the two drug groups.
    • Overlay the regression line on a scatter plot.
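
Because group is a two-level factor, the fitted intercept should equal the mean of group 1 and the slope should equal the difference between the two group means. The sketch below checks this on the sleep data; the names fit and group_means are illustrative, not taken from the slides.

```r
# Fit the model from the slide: extra sleep as a function of drug group.
data(sleep)
fit <- lm(extra ~ group, data = sleep)

# Mean extra sleep within each drug group (named "1" and "2").
group_means <- sapply(split(sleep$extra, sleep$group), mean)

# Intercept = mean of group 1; group2 coefficient = mean(2) - mean(1).
print(coef(fit))
print(group_means)
print(group_means["2"] - group_means["1"])
```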

Regression Model

Plotly

Relationship Between Drug Group and Extra Sleep

library(ggplot2)
data(sleep)

# Jittered points of extra sleep for each drug group
ggplot(sleep, aes(x = factor(group), y = extra)) +
  geom_jitter(width = 0.05, color = "lightblue", size = 2) +
  labs(title = "Sleep Gain by Drug Type",
       x = "Drug Group (1 = A, 2 = B)",
       y = "Extra Hours Slept") +
  theme_minimal()

Average Extra Sleep by Drug Group

library(ggplot2)
data(sleep)

sleep_summary <- aggregate(extra ~ group, data = sleep, mean)
ggplot(sleep_summary, aes(x = factor(group), y = extra, fill = factor(group))) +
  geom_col(width = 0.6, color = "black", alpha = 0.7) +
  labs(title = "Average Sleep Gain by Drug Group",
       x = "Drug Group", y = "Mean Extra Hours Slept") +
  scale_fill_manual(values = c("green", "gray"), name = "Group") +
  theme_minimal()

Model Summary

Call:
lm(formula = extra ~ group, data = sleep)

Residuals:
   Min     1Q Median     3Q    Max 
-2.430 -1.305 -0.580  1.455  3.170 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   0.7500     0.6004   1.249   0.2276  
group2        1.5800     0.8491   1.861   0.0792 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.899 on 18 degrees of freedom
Multiple R-squared:  0.1613,    Adjusted R-squared:  0.1147 
F-statistic: 3.463 on 1 and 18 DF,  p-value: 0.07919
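
With a single two-level predictor, the slope test in this summary should agree with an equal-variance two-sample t-test on the same data. The sketch below extracts the slope's p-value and a 95% confidence interval and compares them with t.test(); the names fit, slope_p, slope_ci and tt are illustrative, and var.equal = TRUE is chosen here to match the pooled-variance assumption of lm().

```r
data(sleep)
fit <- lm(extra ~ group, data = sleep)

# p-value for the group2 coefficient, taken from the coefficient table.
slope_p <- summary(fit)$coefficients["group2", "Pr(>|t|)"]

# 95% confidence interval for the difference in group means.
slope_ci <- confint(fit)["group2", ]

# Equal-variance two-sample t-test on the same data.
tt <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

print(c(lm = slope_p, t_test = tt$p.value))
print(slope_ci)
```

An interval that contains 0 is consistent with a p-value above 0.05.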

Hypothesis Test for the Slope

We test whether mean extra sleep differs between the two drug groups, that is, whether the slope β1 is zero:

\[ H_0:\ \beta_1=0 \qquad\text{vs}\qquad H_a:\ \beta_1\neq 0 \]
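
The test statistic for this hypothesis can be reproduced by hand from the Model Summary output: divide the group2 estimate by its standard error and compare against a t distribution with 18 degrees of freedom. The sketch below does this; the variable names are illustrative, and the 0.05 significance level is an assumption, since the slides do not state one.

```r
# Values copied from the Model Summary coefficient table.
estimate  <- 1.5800   # group2 estimate
std_error <- 0.8491   # group2 standard error
resid_df  <- 18       # residual degrees of freedom

# t statistic under H0: beta1 = 0.
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)

# Assumed significance level (not stated in the slides).
alpha <- 0.05
reject_h0 <- p_value < alpha

print(c(t = t_stat, p = p_value))
print(reject_h0)
```

At the assumed 0.05 level, H0 is not rejected, which matches the single "." significance code in the summary.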

R Code

Below is the code I used to create a scatter plot with a regression line.

library(ggplot2)
data(sleep)

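# Plot data + regression line.
# group is a factor; convert it to numeric (1, 2) so geom_smooth() fits
# a single line across both groups instead of treating each level separately.
ggplot(sleep, aes(x = as.numeric(group), y = extra)) +
  geom_point(size = 3, color = "#8C1D40") +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  scale_x_continuous(breaks = c(1, 2)) +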
  labs(title = "Linear Regression on Sleep Data",
       x = "Drug Group (1 = A, 2 = B)",
       y = "Extra Hours Slept") +
  theme_minimal()