Simple Linear Regression

2024-09-25

Slide 1: Introduction

This presentation covers the basics of simple linear regression, using the relationship between hair color and observed frequency among females in R's built-in HairEyeColor dataset as the example.

Slide 2: What is Linear Regression

Simple linear regression is a statistical method for modeling the relationship between two variables: an independent (predictor) variable and a dependent (response) variable. It fits a straight-line equation to the observed data. In this dataset, we focus on how frequency relates to hair color among individuals, specifically females.

Slide 3: Introduce Equation using LaTeX

Simple linear regression uses the equation \[ y = mx + b \] You might have seen this equation often since high school, and it seems we just can’t get rid of it! Here y is the dependent variable, m is the slope, x is the independent variable, and b is the y-intercept.
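
As a minimal sketch of how the slope m and intercept b are estimated, the snippet below applies the least-squares formulas to a small made-up dataset and compares the result with R's lm(). The vectors x and y are hypothetical values chosen for illustration; they are not taken from the HairEyeColor data.

```r
# Hypothetical toy data (not from HairEyeColor), used only to illustrate the formula
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares estimates of the slope (m) and y-intercept (b)
m <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b <- mean(y) - m * mean(x)

# lm() fits the same line; coef() returns the intercept first, then the slope
fit <- lm(y ~ x)
coef(fit)

# Predicted value of the dependent variable at x = 6
m * 6 + b
```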

Slide 4: Importance of Linear Regression

Simple linear regression lets us observe and quantify the relationship between two variables, which helps us draw conclusions from our data. Because it involves only one predictor, it is also easier to understand and interpret than models with many predictors.

Slide 5: Overview of the Dataset

Based on this dataset:
- Independent variable: Hair color
- Dependent variable: Frequency (number of females observed with hair color)

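Because hair color is categorical, R fits one coefficient per hair color relative to the baseline (Black) rather than a single slope. Using the estimates from the model summary on Slide 6, the fitted equation is: \[ \widehat{\text{Freq}} = 13.00 + 22.75\,\text{Brown} - 3.75\,\text{Red} + 7.25\,\text{Blond} \] where each hair color term is 1 for that hair color and 0 otherwise.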
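
Hair color is a categorical variable, so R cannot plug it into the equation as a single numeric x. A minimal sketch of how R encodes it is shown below: the factor is expanded into 0/1 indicator columns, with the first level (Black) as the baseline. The object names hair_levels and design are illustrative and do not come from the original slides.

```r
# Base R only; HairEyeColor ships with R's datasets package
FemaleData <- subset(as.data.frame(HairEyeColor), Sex == "Female")

# Hair is a factor; its first level is the baseline category
hair_levels <- levels(FemaleData$Hair)
hair_levels

# Design matrix that lm(Freq ~ Hair) would use:
# an intercept plus one indicator column per non-baseline level
design <- model.matrix(~ Hair, data = FemaleData)
head(design)
```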

Slide 6: Performing Simple Linear Regression

Here’s how we perform simple linear regression using the HairEyeColor dataset. First, we load the packages:

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
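
The startup messages above are what R prints when the tidyverse is attached. The slide does not show the call itself, so the line below is an assumption about what produced that output; it requires the tidyverse package to be installed.

```r
# Assumed call (not shown on the original slide);
# attaches dplyr, ggplot2, and the other core tidyverse packages
library(tidyverse)
```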

Slide 6 (Cont.): Performing Simple Linear Regression

Code:

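# Convert the HairEyeColor contingency table to a data frame (columns: Hair, Eye, Sex, Freq)
HairFreqdf <- as.data.frame(HairEyeColor)
# Keep only the rows for females (filter() and %>% come from dplyr, loaded with the tidyverse)
FemaleData <- HairFreqdf %>% filter(Sex == "Female")

# Regress frequency on hair color; Hair is a factor, so lm() fits one coefficient per non-baseline level
model <- lm(Freq ~ Hair, data = FemaleData)
summary(model)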

## 
## Call:
## lm(formula = Freq ~ Hair, data = FemaleData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.750 -11.312  -3.125   0.375  43.750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   13.000      9.988   1.302    0.217
## HairBrown     22.750     14.125   1.611    0.133
## HairRed       -3.750     14.125  -0.265    0.795
## HairBlond      7.250     14.125   0.513    0.617
## 
## Residual standard error: 19.98 on 12 degrees of freedom
## Multiple R-squared:  0.256,  Adjusted R-squared:  0.07002 
## F-statistic: 1.376 on 3 and 12 DF,  p-value: 0.2972
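
One way to read these coefficients, sketched here with base R only: with a single categorical predictor, the intercept is the mean frequency for the baseline hair color (Black), and each other coefficient is the difference between that hair color's mean and the baseline mean. The name group_means is illustrative.

```r
FemaleData <- subset(as.data.frame(HairEyeColor), Sex == "Female")
model <- lm(Freq ~ Hair, data = FemaleData)

# Mean frequency per hair color (averaged over the four eye colors)
group_means <- tapply(FemaleData$Freq, FemaleData$Hair, mean)
group_means

# The intercept equals the Black mean; adding a coefficient recovers that group's mean
coef(model)["(Intercept)"] + coef(model)["HairBrown"]
```

The overall p-value of 0.2972 in the summary suggests that these differences in mean frequency are not statistically significant at the usual 5% level.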

Slide 7: Plotly

Slide 7 (Cont.): Plotly Plot Code

suppressMessages(library(plotly))
# Interactive scatter plot: one marker per hair/eye color combination among females
p <- plot_ly(data = FemaleData,
             x = ~Hair, y = ~Freq,
             type = 'scatter', mode = 'markers',
             # Show the count when hovering over a marker
             text = ~paste("Count:", Freq), hoverinfo = 'text') %>%
  layout(title = "Frequency of Females by Hair Color",
         xaxis = list(title = "Hair Color"),
         yaxis = list(title = "Frequency"))
p

Slide 8: ggplot1

ggplot2

## `geom_smooth()` using formula = 'y ~ x'
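
The plot and the code behind it did not survive on this slide; only the geom_smooth() message remains. The following is a sketch of one way such a plot could be produced with ggplot2, not the original code. The group = 1 aesthetic and method = "lm" are assumptions; with them, ggplot2 treats the hair colors as positions 1 to 4 and draws a single straight line, which is a rough visual aid rather than the dummy-coded model fitted with lm(Freq ~ Hair).

```r
library(ggplot2)

FemaleData <- subset(as.data.frame(HairEyeColor), Sex == "Female")

# Points: one per hair/eye color combination
# Line: straight-line fit across the hair color positions (assumed, see note above)
p <- ggplot(FemaleData, aes(x = Hair, y = Freq, group = 1)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Frequency of Females by Hair Color",
       x = "Hair Color", y = "Frequency")
p
```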