Slide 2: What is linear regression

Simple linear regression is a statistical method for studying the relationship between two variables. It fits a linear equation to the observed data to describe and measure that relationship. The two variables are the independent variable (the predictor) and the dependent variable (the response). In this dataset we focus on hair color among the individuals observed, specifically females.

Slide 3: Introduce Equation using LaTeX

Simple linear regression uses the equation \[ y = mx + b \] You might have seen this equation often since high school, and it seems we just can’t get rid of it! Here y is the dependent variable, m is the slope, x is the independent variable, and b is the y-intercept.
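To make the roles of m and b concrete, here is a minimal sketch on toy data. The data are hypothetical and chosen so that the true slope and intercept are known in advance; the names toy_fit, m, b, and y_new are illustrative only.

```r
# Toy data (hypothetical): y is exactly 2 * x + 1, so m = 2 and b = 1.
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1

# lm() estimates the intercept b and the slope m by least squares.
toy_fit <- lm(y ~ x)

b <- unname(coef(toy_fit)["(Intercept)"])  # y-intercept
m <- unname(coef(toy_fit)["x"])            # slope

# Use the fitted equation y = m * x + b to predict y at a new x.
y_new <- m * 6 + b
print(c(slope = m, intercept = b, prediction_at_6 = y_new))
```

With real data the points will not fall exactly on a line, so the fitted m and b describe the line that minimizes the squared vertical distances to the points.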

Slide 4: Importance of linear regression

The value of simple linear regression is that it lets us observe the relationship between the two variables and draw conclusions from our data. Simple linear regression is also easier to understand and interpret than more complex models.

##Slide 5

Based on this data set:

Independent variable: Hair color

Dependent variable: Frequency (number of females observed with each hair color)

The equation for this: \[ y = 5.50x + 10.00 \]

Here’s how we perform simple linear regression using the HairEyeColor dataset:

library(tidyverse)
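library(dplyr)

# Convert the HairEyeColor contingency table to a long data frame
# with columns Hair, Eye, Sex, and Freq
HairFreqdf <- as.data.frame(HairEyeColor)

# Keep only the rows for females
FemaleData <- HairFreqdf %>% filter(Sex == "Female")

# Fit a linear model of frequency on hair color (Hair is a factor)
model <- lm(Freq ~ Hair, data = FemaleData)

summary(model)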
## 
## Call:
## lm(formula = Freq ~ Hair, data = FemaleData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.750 -11.312  -3.125   0.375  43.750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   13.000      9.988   1.302    0.217
## HairBrown     22.750     14.125   1.611    0.133
## HairRed       -3.750     14.125  -0.265    0.795
## HairBlond      7.250     14.125   0.513    0.617
## 
## Residual standard error: 19.98 on 12 degrees of freedom
## Multiple R-squared:  0.256,  Adjusted R-squared:  0.07002 
## F-statistic: 1.376 on 3 and 12 DF,  p-value: 0.2972
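
Because Hair is a categorical variable, lm() does not estimate a single slope here. Under R's default treatment coding, the intercept is the mean frequency for the baseline level (Black), and each remaining coefficient is the difference between that level's mean and the baseline mean. The sketch below checks this reading against the group means using base R only. The names female, fit, group_means, baseline_mean, and offsets are illustrative and not part of the original slides.

```r
# Rebuild the female subset with base R (no extra packages needed).
hair_df <- as.data.frame(HairEyeColor)
female  <- hair_df[hair_df$Sex == "Female", ]

# Same model formula as on the slide.
fit <- lm(Freq ~ Hair, data = female)

# Mean frequency per hair color (each mean averages over the four eye colors).
group_means <- sapply(split(female$Freq, female$Hair), mean)

# Intercept = mean of the baseline level (Black);
# other coefficients = level mean minus baseline mean.
baseline_mean <- group_means[["Black"]]
offsets       <- group_means[-1] - baseline_mean

print(group_means)
print(offsets)
print(coef(fit))
```

Read this way, the HairBrown row in the summary says that brown-haired females have a higher mean count than black-haired females, and the p-values indicate that none of the differences is statistically significant at conventional levels.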

##Slide 6 Plotly Plot

library(plotly)
p <- plot_ly(data = FemaleData, 
              x = ~Hair, 
              y = ~Freq, 
              type = 'scatter', mode = 'markers', 
              text = ~paste("Count:", Freq), 
              hoverinfo = 'text') %>%
  layout(title = "Frequency of Females by Hair Color",
         xaxis = list(title = "Hair Color"),
         yaxis = list(title = "Frequency"))

p

##Slide 7 ggplot 1

ggplot(FemaleData, aes(x = Hair, y = Freq, fill = Hair)) +
  geom_bar(stat = "identity") +
  labs(title = "Frequency of Females by Hair Color",
       x = "Hair Color",
       y = "Frequency") +
  theme_minimal()
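
Since FemaleData has four rows per hair color (one per eye color), geom_bar(stat = "identity") stacks them, so each bar's height is the total count for that hair color. Here is a minimal base-R sketch of those totals, assuming the same female subset of HairEyeColor; hair_totals is an illustrative name.

```r
# Female rows of HairEyeColor in long format.
hair_df <- as.data.frame(HairEyeColor)
female  <- hair_df[hair_df$Sex == "Female", ]

# Sum Freq over eye colors within each hair color;
# these sums are the bar heights in the stacked chart.
hair_totals <- aggregate(Freq ~ Hair, data = female, FUN = sum)
print(hair_totals)
```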

##Slide 8 ggplot 2

ggplot(FemaleData, aes(x = Hair, y = Freq)) +
  geom_point(aes(color = Hair), size = 3) +
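  # group = 1 fits one line across all hair colors (levels treated as positions 1 to 4)
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, color = "blue") +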
  labs(title = "Linear Regression of Frequency by Hair Color",
       x = "Hair Color",
       y = "Frequency") +
  theme_minimal()
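
Hair color is a factor, so a single straight line through this plot only makes sense if the four levels are treated as the numbers 1 to 4 in their stored order (Black, Brown, Red, Blond). Below is a hedged sketch of that fit in base R. The names hair_code and line_fit are illustrative, and because the ordering of hair colors is arbitrary, the resulting slope should be read with caution.

```r
# Female rows of HairEyeColor in long format.
hair_df <- as.data.frame(HairEyeColor)
female  <- hair_df[hair_df$Sex == "Female", ]

# Integer position of each hair color: Black = 1, Brown = 2, Red = 3, Blond = 4.
female$hair_code <- as.integer(female$Hair)

# Straight-line fit Freq = m * hair_code + b on those positions.
line_fit <- lm(Freq ~ hair_code, data = female)
print(coef(line_fit))
```

Reordering the factor levels would change the slope, which is why the factor-based model with one coefficient per hair color is usually the safer summary for categorical predictors.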