February 2026


Introduction to Simple Linear Regression

Simple Linear Regression is a statistical method used to model the relationship between a single dependent variable \(Y\) and a single independent variable \(X\).

The goal is to find the straight line that best predicts the outcome from the predictor. In this example, we use R's built-in cars dataset to model stopping distance (dist, in feet) as a function of speed (speed, in mph).
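A quick look at the data before any modeling. This sketch uses only base R (str(), summary(), nrow()) to confirm the two columns and their ranges.

```r
# cars ships with base R (package "datasets"), so no install is needed
data(cars)

# Structure: observations of two numeric columns, speed and dist
str(cars)

# Minimum, quartiles, median and mean for each column
summary(cars)

# Number of observations available for the regression
n <- nrow(cars)
print(n)
```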

In this presentation, we will:

  • Define the mathematical model.
  • Visualize the data using ggplot2.
  • Explore interactive 3D visualizations with plotly.

The Mathematical Model (LaTeX)

The relationship between the variables is expressed by the following equation:

\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

Where:

  • \(Y_i\) is the dependent variable (stopping distance) for observation \(i\).
  • \(X_i\) is the independent variable (speed).
  • \(\beta_0\) is the y-intercept.
  • \(\beta_1\) is the slope coefficient.
  • \(\epsilon_i\) represents the random error term.
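To make the roles of \(\beta_0\), \(\beta_1\) and \(\epsilon_i\) concrete, here is a small simulation sketch. The true parameters (\(\beta_0 = 2\), \(\beta_1 = 0.5\)), the sample size, and the normal errors with standard deviation 1 are arbitrary choices for illustration, not values taken from the cars data. Data are generated from the model above, and lm() then recovers estimates close to the true coefficients.

```r
# Hypothetical true parameters, chosen only for this illustration
beta0 <- 2
beta1 <- 0.5
n <- 1000

set.seed(42)                        # reproducible random draws
x <- runif(n, min = 0, max = 10)    # predictor values X_i
eps <- rnorm(n, mean = 0, sd = 1)   # random error terms epsilon_i
y <- beta0 + beta1 * x + eps        # Y_i = beta0 + beta1 * X_i + epsilon_i

# Least-squares fit; the estimates should land near beta0 and beta1
sim_fit <- lm(y ~ x)
print(coef(sim_fit))
```

Increasing n or shrinking the error standard deviation pulls the estimates closer to the true values.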

Estimating Parameters (LaTeX)

To find the line of best fit, we use the method of least squares: the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are chosen to minimize the residual sum of squares (RSS):

\[RSS = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\]
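The RSS can be written directly as an R function of a candidate intercept and slope. In the sketch below, rss() is a hypothetical helper (not part of base R). It checks that the coefficients returned by lm() give a smaller RSS than nearby alternatives, and that this minimum matches deviance(), which for an lm fit returns the residual sum of squares.

```r
# Hypothetical helper: residual sum of squares for a candidate (b0, b1)
rss <- function(b0, b1, x, y) {
  sum((y - b0 - b1 * x)^2)
}

fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)

# RSS at the least-squares estimates
rss_hat <- rss(b[["(Intercept)"]], b[["speed"]], cars$speed, cars$dist)

# RSS after nudging the slope or the intercept away from the estimates
rss_slope_up <- rss(b[["(Intercept)"]], b[["speed"]] + 0.1, cars$speed, cars$dist)
rss_int_up   <- rss(b[["(Intercept)"]] + 1, b[["speed"]], cars$speed, cars$dist)

print(c(at_estimate = rss_hat,
        slope_plus_0.1 = rss_slope_up,
        intercept_plus_1 = rss_int_up))
```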

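The least-squares estimate of the slope \(\hat{\beta}_1\) is:

\[\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\]

and, because the fitted line passes through \((\bar{x}, \bar{y})\), the intercept is:

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]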
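The slope formula can be checked numerically on the cars data. This sketch computes \(\hat{\beta}_1\) from deviations about the means, takes the intercept as \(\bar{y} - \hat{\beta}_1 \bar{x}\), and compares both with the coefficients from lm().

```r
x <- cars$speed
y <- cars$dist

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the least-squares line passes through (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

# Compare with R's built-in least-squares fit
fit <- lm(dist ~ speed, data = cars)
print(c(manual_b0 = b0_hat, manual_b1 = b1_hat))
print(coef(fit))
```

Both approaches give an intercept of about -17.58 and a slope of about 3.93, so each additional mph is associated with roughly 3.9 more feet of stopping distance.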

Exploratory Data Analysis (ggplot2 - Plot 1)

Before modeling, we visualize the raw data to check for a linear trend.
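A minimal version of this first plot, assuming ggplot2 is installed: a plain scatterplot with no fitted line, plus the Pearson correlation from base R's cor() as a numeric check on the strength of the linear trend.

```r
library(ggplot2)

# Raw data only: one point per car, no model yet
p_raw <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  theme_light() +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")
print(p_raw)

# Pearson correlation: values near 1 indicate a strong positive linear trend
r <- cor(cars$speed, cars$dist)
print(r)
```

The correlation is about 0.81, which supports trying a straight-line model.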

Fitting the Model (R Code Slide)

Below is the R code used to fit the linear model and generate the regression visualization:

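# Load the plotting library
library(ggplot2)

# Fit the linear model: stopping distance (dist) regressed on speed
model <- lm(dist ~ speed, data = cars)

# Print coefficient estimates, standard errors and R-squared
summary(model)

# Scatterplot with the least-squares line and its 95% confidence band
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "#8C1D40") +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue", se = TRUE) +
  theme_light() +
  labs(title = "Linear Regression: dist on speed",
       x = "Speed (mph)", y = "Stopping distance (ft)")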

Regression Results (ggplot2 - Plot 2)

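The blue line shows the fitted values \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\), and the shaded band is the 95% confidence interval for the mean stopping distance at each speed (geom_smooth's default level of 0.95).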
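The band drawn by geom_smooth() can be reproduced from the fitted model. This sketch uses predict() with interval = "confidence" at a few example speeds (10, 15 and 20 mph are arbitrary choices), confint() for the coefficient intervals, and checks that R-squared equals the squared correlation, which holds in simple linear regression.

```r
fit <- lm(dist ~ speed, data = cars)

# Fitted mean stopping distance with a 95% confidence interval
new_speeds <- data.frame(speed = c(10, 15, 20))
ci <- predict(fit, newdata = new_speeds, interval = "confidence", level = 0.95)
print(cbind(new_speeds, ci))

# 95% confidence intervals for the intercept and slope
print(confint(fit, level = 0.95))

# Proportion of the variance in dist explained by speed
r_squared <- summary(fit)$r.squared
print(r_squared)
```

R-squared comes out at about 0.65, so speed accounts for roughly two thirds of the variation in stopping distance.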

Interactive 3D Visualization (plotly)

Using plotly, we can add a third axis and rotate the plot interactively. With a single predictor a 2D plot is sufficient, so 3D views are most useful when a model includes multiple predictors or interaction terms.
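The slide does not state which variable fills the third axis, so this sketch makes an assumption for illustration: it plots speed and dist together with each observation's residual from the fitted line on the z axis. It uses plotly's plot_ly() with type = "scatter3d"; the widget renders when p_3d is printed in an interactive session or left as the last expression in an ioslides chunk.

```r
library(plotly)

fit <- lm(dist ~ speed, data = cars)

# Assumption for illustration: residuals serve as the third (z) dimension
cars_3d <- transform(cars, resid = residuals(fit))

# Interactive 3D scatter; drag to rotate, hover to read values
p_3d <- plot_ly(
  cars_3d,
  x = ~speed, y = ~dist, z = ~resid,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
)
# Print p_3d (or leave it as a chunk's last expression) to display it
```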

Conclusion

Simple Linear Regression provides a clear framework for predicting outcomes based on a single input variable. By using R and ioslides, we can:

  • Quantify relationships between variables.
  • Generate high-quality, reproducible reports.
  • Use interactive tools like plotly for deeper data exploration.

This concludes the presentation on statistical modeling in R.