2025-10-26

Simple Linear Regression

Simple linear regression models the relationship between a single input and a single output with a straight line, so that the output can be estimated from the input.

The formula for the simple linear regression model is as follows: \[\hat{y}= \beta_0 + \beta_1X\]
- \(\beta_0\) : Intercept (constant)
- \(\beta_1\) : Slope
- \(X\) : Input
- \(\hat{y}\) : Predicted output
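As a minimal sketch of how the two coefficients in this formula can be obtained in R, the block below fits a line with base R's `lm()`. The `batting_avg` and `hits` vectors are toy values invented for illustration; they are not taken from the Lahman data.

```r
# Toy data, invented for illustration (NOT taken from Lahman).
batting_avg <- c(0.200, 0.220, 0.240, 0.250, 0.260, 0.280, 0.300, 0.320)
hits        <- c(62, 80, 95, 104, 108, 127, 140, 161)

# Fit hits = beta_0 + beta_1 * batting_avg by ordinary least squares.
fit <- lm(hits ~ batting_avg)

beta_0 <- unname(coef(fit)[1])  # intercept: value of the line at batting_avg = 0
beta_1 <- unname(coef(fit)[2])  # slope: change in hits per unit of batting_avg

print(c(beta_0 = beta_0, beta_1 = beta_1))
```

Because batting average is measured on a 0 to 1 scale, the slope is the change in hits per full unit of batting average, so dividing it by 1000 gives the change per batting-average point.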

Using Linear Regression

Linear regression is mostly used for prediction when there is a strong linear correlation between the input and the output.

Real World Problem

Let's use baseball for our real-world example. We are going to use the Lahman R library for this test.

Problem: Can we predict a player's hits given their batting average?

Slide with GGPlot 1 (scatterplot)

We can see that there is a linear correlation between hits and the batting average.
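One way to back up a visual impression like this with a number is the Pearson correlation coefficient, available in base R as `cor()`. The sketch below uses toy vectors invented for illustration, not the Lahman data shown on the slide.

```r
# Toy data, invented for illustration (NOT taken from Lahman).
batting_avg <- c(0.200, 0.220, 0.240, 0.250, 0.260, 0.280, 0.300, 0.320)
hits        <- c(62, 80, 95, 104, 108, 127, 140, 161)

# Pearson correlation: values near +1 or -1 indicate a strong linear
# relationship, values near 0 indicate a weak one.
r <- cor(batting_avg, hits)
print(r)
```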

GGplot slide 2 (geom_smooth)

The line, also known as the “best-fit” line, shows the relationship between the input (batting average) and the output (hits). We can use that line to predict the hits given the batting average.

## `geom_smooth()` using formula = 'y ~ x'

Description

The “best-fit” line is described by the formula below: \[\hat{y}= \beta_0 + \beta_1X\] When we model the actual observed values rather than the predictions, we add an error term: \[y = \beta_0 + \beta_1X + \epsilon\]
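For reference, if the line is fitted by ordinary least squares (an assumption here; it is what `lm()` does and what `geom_smooth(method = "lm")` draws), the estimated coefficients have a closed form. The symbols \(x_i\), \(\bar{x}\), \(\bar{y}\), and \(n\) are introduced for this sketch: \(x_i\) and \(y_i\) are the input and output of observation \(i\), \(\bar{x}\) and \(\bar{y}\) are their means, and \(n\) is the number of observations.

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\]

These are the values of \(\beta_0\) and \(\beta_1\) that minimize the sum of squared differences between the actual and predicted outputs.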


Simple Example

We can use this graph to predict the hits for a given batting average.

Problem: Predict hits given BA = 0.25

Answer: Reading off the regression line, we predict that a player with this batting average would have about 100 hits in the season.
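Instead of reading the value off the graph, the same prediction can be computed from a fitted model with `predict()`. This is a sketch on toy data invented for illustration, so its result will not match the figure read from the slide's plot.

```r
# Toy data, invented for illustration (NOT taken from Lahman).
batting_avg <- c(0.200, 0.220, 0.240, 0.250, 0.260, 0.280, 0.300, 0.320)
hits        <- c(62, 80, 95, 104, 108, 127, 140, 161)

fit <- lm(hits ~ batting_avg)

# The column name in newdata must match the predictor name in the formula.
new_player     <- data.frame(batting_avg = 0.25)
predicted_hits <- unname(predict(fit, newdata = new_player))
print(predicted_hits)
```

Predictions are most trustworthy inside the range of batting averages the model was fitted on; extrapolating far outside that range is riskier.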


Interactive Scatterplot (plotly)

We can see where the data tends to congregate as well as any outliers that may influence our regression line.

Latex pt 2

Error: The difference between the actual and predicted values.

Use: To assess how well the “best-fit” regression line fits the data.

Below is the formula for error: \[e_i = y_i - \hat{y}_i\]
- \(e_i\) : error for a single observation
- \(y_i\) : actual value
- \(\hat{y}_i\) : predicted value
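The error formula can be applied to every observation at once in R. The sketch below, again on toy data invented for illustration, computes \(e_i = y_i - \hat{y}_i\) by hand and then sums the squared errors as one possible single-number summary of fit.

```r
# Toy data, invented for illustration (NOT taken from Lahman).
batting_avg <- c(0.200, 0.220, 0.240, 0.250, 0.260, 0.280, 0.300, 0.320)
hits        <- c(62, 80, 95, 104, 108, 127, 140, 161)

fit <- lm(hits ~ batting_avg)

predicted <- fitted(fit)        # y-hat_i for each observation
errors    <- hits - predicted   # e_i = y_i - y-hat_i

# Sum of squared errors: smaller means the line sits closer to the points.
sse <- sum(errors^2)

print(round(errors, 2))
print(sse)
```

The hand-computed `errors` should agree with what `residuals(fit)` returns, which is a convenient check.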