2026-02-08

Simple Linear Regression

Introduction:

Simple Linear Regression (SLR) is a statistical method that allows us to summarize and study relationships between two continuous variables:

Predictor (X): The independent variable.

Response (Y): The dependent variable.

In this presentation, we will explore the mathematical theory behind SLR, visualize the data using ggplot2, and interact with the error surface using plotly.

The Statistical Model

To perform Simple Linear Regression (SLR), we assume the relationship between X and Y can be modeled by a linear function plus some random error.

The population regression model is defined as:

\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

Where:

\(\beta_0\): The Y-intercept.

\(\beta_1\): The slope (change in \(Y\) for every 1 unit increase in \(X\)).

\(\epsilon_i\): The random error term, assumed \(\epsilon_i \sim N(0, \sigma^2)\).
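As a rough illustration of what the model says, the sketch below simulates data from it and checks that a fitted line lands near the parameters used to generate the data. The parameter values (`beta_0 = 2`, `beta_1 = 0.5`, `sigma = 1`), the sample size, and the range of `x` are all hypothetical and chosen only for this example.

```r
# Simulate data from Y_i = beta_0 + beta_1 * X_i + epsilon_i.
# All parameter values below are hypothetical, chosen for illustration.
set.seed(42)
n      <- 200
beta_0 <- 2
beta_1 <- 0.5
sigma  <- 1

x   <- runif(n, min = 0, max = 10)     # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)  # random error, N(0, sigma^2)
y   <- beta_0 + beta_1 * x + eps       # response

# Fitting a line to the simulated data should roughly recover
# the intercept and slope used above.
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

The estimates will not match `beta_0` and `beta_1` exactly, because each simulated sample carries its own random error.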

Ordinary Least Squares (OLS)

How do we find the “best” line?

We use the Ordinary Least Squares method to minimize the sum of squared residuals (\(SSR\)).

We define the residual as the difference between the observed value and the predicted value:

\[e_i = y_i - \hat{y}_i\]

The goal is to minimize: \[Q = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2\]

The solution provides our estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
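For a single predictor, setting the partial derivatives of \(Q\) to zero gives the standard closed-form solution \(\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A minimal sketch of that calculation on the built-in cars data, cross-checked against lm(), might look like this (the variable names `b0`, `b1`, `e`, and `Q_min` are ours):

```r
x <- cars$speed
y <- cars$dist

# Closed-form OLS estimates for one predictor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Residuals e_i = y_i - y_hat_i and the minimized objective Q
e     <- y - (b0 + b1 * x)
Q_min <- sum(e^2)

c(intercept = b0, slope = b1)

# Cross-check against R's built-in fit
all.equal(unname(coef(lm(dist ~ speed, data = cars))), c(b0, b1))
```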

Data Exploration: The “Cars” Dataset

We will use the built-in cars dataset, which records the speed (mph) and stopping distance (ft) for 50 cars.

ggplot2 plot

library(ggplot2)

# Scatter plot of stopping distance against speed
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "blue", size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(title = "Stopping Distance vs. Speed",
       x = "Speed (mph)",
       y = "Stopping Distance (ft)")

Second ggplot2 plot: Residual Analysis

Visualizing the Fit

Adding the regression line helps us see the trend. Here, we use geom_smooth(method = "lm") to overlay the fitted linear model on the scatter plot.
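The code for this plot is not shown on the slide, so the following is only a sketch of how such a figure could be produced. It reuses the scatter plot styling from the previous slide; the line color, the title, and the choice to show the confidence band are assumptions. The plot is built only if ggplot2 is installed.

```r
fit_plot <- NULL

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  fit_plot <- ggplot(cars, aes(x = speed, y = dist)) +
    geom_point(color = "blue", size = 3, alpha = 0.7) +
    # method = "lm" fits dist ~ speed by least squares;
    # se = TRUE draws a 95% confidence band around the line
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "red") +
    theme_minimal() +
    labs(title = "Stopping Distance vs. Speed with Fitted Line",
         x = "Speed (mph)",
         y = "Stopping Distance (ft)")

  print(fit_plot)
}
```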

R Implementation

The following code demonstrates how to calculate the linear model and view the summary statistics in R.

# Fit the model
model <- lm(dist ~ speed, data = cars)

# Display coefficients
summary(model)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -17.579095  6.7584402 -2.601058 1.231882e-02
## speed         3.932409  0.4155128  9.463990 1.489836e-12

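The output shows that each 1 mph increase in speed is associated with an estimated average increase of about 3.93 ft in stopping distance. The negative intercept (about -17.58 ft) has no physical meaning: it extrapolates to a speed of 0 mph, well below the slowest car in the data (4 mph).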
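To make the interpretation concrete, this sketch computes a prediction by hand from the coefficients and compares it with predict(). The speed of 20 mph is a hypothetical value picked for illustration; it lies inside the observed range of 4 to 25 mph.

```r
model <- lm(dist ~ speed, data = cars)
b     <- coef(model)

# Prediction by hand at a hypothetical speed of 20 mph:
# -17.58 + 3.93 * 20, which is about 61.07 ft
speed_new <- 20
by_hand   <- b[["(Intercept)"]] + b[["speed"]] * speed_new

# The same prediction via predict()
via_predict <- predict(model, newdata = data.frame(speed = speed_new))

# Moving from 20 to 21 mph changes the prediction by exactly the slope
slope_check <- diff(predict(model, newdata = data.frame(speed = c(20, 21))))

# 95% confidence interval for the slope
slope_ci <- confint(model, "speed", level = 0.95)

c(by_hand = by_hand, via_predict = unname(via_predict))
slope_ci
```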

Exploring the Error Surface (3D)

To understand how OLS works, we can look at the “Loss Function” in 3D. This plotly graph visualizes how changing the intercept (\(\beta_0\)) and slope (\(\beta_1\)) affects the sum of squared residuals, the quantity \(Q\) that OLS minimizes.
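The plotting code is not shown on the slide, so what follows is a sketch of one way to build such a surface, not necessarily the code behind the original figure. The grid ranges and resolution are arbitrary choices that bracket the OLS estimates, and the helper `sse()` is a name introduced here. The surface is drawn only if plotly is installed.

```r
x <- cars$speed
y <- cars$dist

# Sum of squared residuals for a candidate intercept b0 and slope b1
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Grid of candidate values (ranges are arbitrary, chosen to bracket the OLS fit)
b0_grid <- seq(-40, 5, length.out = 60)
b1_grid <- seq(2, 6, length.out = 60)

# z[i, j] is the loss at slope b1_grid[i] and intercept b0_grid[j];
# plotly surfaces read rows of z along the y axis and columns along the x axis
z <- outer(b1_grid, b0_grid, Vectorize(function(b1, b0) sse(b0, b1)))

# Grid point with the smallest loss; it should sit close to the OLS estimates
best      <- which(z == min(z), arr.ind = TRUE)
grid_best <- c(intercept = b0_grid[best[1, "col"]],
               slope     = b1_grid[best[1, "row"]])
grid_best

if (requireNamespace("plotly", quietly = TRUE)) {
  surface <- plotly::plot_ly(x = b0_grid, y = b1_grid, z = z, type = "surface")
  surface <- plotly::layout(surface, scene = list(
    xaxis = list(title = "Intercept"),
    yaxis = list(title = "Slope"),
    zaxis = list(title = "Sum of squared residuals")
  ))
  surface
}
```

The surface is a long, narrow valley because the intercept and slope estimates are strongly negatively correlated: a steeper line can be partly offset by a lower intercept.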

Conclusion

Simple Linear Regression is a powerful tool for predicting outcomes and understanding variable relationships.

Ordinary Least Squares (OLS) finds the line that minimizes the sum of squared residuals.

Visual tools like ggplot2 and plotly allow us to communicate these complex relationships clearly to stakeholders.

References & Resources

  • Data: R Core Team (2023). R: A language and environment for statistical computing.
  • Visualization:
    • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
    • Sievert, C. (2020). Interactive Data Visualization with R, plotly, and shiny. CRC Press.
  • Theory: Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis. Wiley.
  • Tools: Created using ioslides via RStudio.

Thank you!