Introduction

Simple linear regression is a statistical method that models the relationship between a dependent variable (Y) and a single independent variable (X) with a straight line.

Real-Life Application

A company wants to understand how advertising expenditure influences sales. As an illustration, we use simulated data covering 50 days, recording each day's advertising spend and the corresponding sales.

LaTeX Math - Regression Model

We assume a linear model:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Where:

- \(Y\) = dependent variable (Sales)
- \(X\) = independent variable (Advertising)
- \(\beta_0\), \(\beta_1\) = regression coefficients
- \(\epsilon\) = error term

Simulated Data
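
The code that generated the data is not shown in this document. The following is a minimal sketch of how such a data set could be simulated. It assumes the same generating equation that is used later for `z` (intercept 5, slope 0.8, normal noise with standard deviation 5). The seed and the uniform range chosen for `x` are assumptions, so this sketch is not expected to reproduce the exact numbers in the regression output below.

```r
# Hypothetical simulation of the advertising data.
# The seed and the range of x are assumptions, not taken from the original analysis.
set.seed(123)
n <- 50                                        # 50 days, as in the example
x <- runif(n, min = 0, max = 100)              # assumed daily advertising spend
y <- 5 + 0.8 * x + rnorm(n, mean = 0, sd = 5)  # sales = 5 + 0.8 * advertising + noise

# Column names match those used by lm() and ggplot() in the rest of the document
df <- data.frame(Advertising = x, Sales = y)
head(df)
```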

R Code: Linear Regression

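# Fit Sales as a linear function of Advertising by ordinary least squares
model <- lm(Sales ~ Advertising, data = df)
# Print coefficient estimates, standard errors, and fit statistics
summary(model)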
## 
## Call:
## lm(formula = Sales ~ Advertising, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.0560  -3.1111  -0.4097   3.3295  10.7983 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.33830    1.30305   4.097  0.00016 ***
## Advertising  0.79667    0.02246  35.478  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.676 on 48 degrees of freedom
## Multiple R-squared:  0.9633, Adjusted R-squared:  0.9625 
## F-statistic:  1259 on 1 and 48 DF,  p-value: < 2.2e-16
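
Reading the output above: the slope estimate of 0.79667 means that each additional unit of advertising is associated with about 0.80 more units of sales, and the R-squared of 0.9633 means the fitted line accounts for roughly 96% of the variance in sales. The sketch below shows one way to pull these quantities out of a fitted model and to predict sales at new spend levels. It regenerates data under assumed simulation settings (seed and range of `Advertising` are assumptions), so its numbers will differ from the output above; `new_days` is a hypothetical name introduced for illustration.

```r
# Self-contained sketch: data simulated under assumed settings,
# so results will not match the summary output shown in the document.
set.seed(123)
df <- data.frame(Advertising = runif(50, 0, 100))
df$Sales <- 5 + 0.8 * df$Advertising + rnorm(50, 0, 5)

model <- lm(Sales ~ Advertising, data = df)

coef(model)                    # intercept and slope estimates
confint(model, level = 0.95)   # 95% confidence intervals for both coefficients
summary(model)$r.squared       # proportion of variance in Sales explained

# Predicted sales with 95% prediction intervals for two hypothetical spend levels
new_days <- data.frame(Advertising = c(20, 80))
pred <- predict(model, newdata = new_days, interval = "prediction")
pred
```

A prediction interval is used here rather than a confidence interval because the question is about sales on an individual new day, not about the mean of the fitted line.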

ggplot Plot: Scatterplot with Regression Line

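library(ggplot2)

# Scatterplot of the data with the fitted least-squares line and its confidence band
ggplot(df, aes(x = Advertising, y = Sales)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue") +
  labs(title = "Sales vs. Advertising", x = "Advertising", y = "Sales")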
## `geom_smooth()` using formula = 'y ~ x'

ggplot Plot: Residuals Plot

# Store the residuals (observed minus fitted Sales) and plot them against the predictor
df$residuals <- residuals(model)
ggplot(df, aes(x = Advertising, y = residuals)) +
  geom_point(color = "red") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs. Advertising", x = "Advertising", y = "Residuals")
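
When reading a residual plot, note that two properties hold by construction for any least-squares fit with an intercept: the residuals sum to zero and are uncorrelated with the predictor. Neither is evidence that the model is adequate; what to look for instead is curvature or a spread that changes with Advertising. The sketch below checks these two properties numerically and recomputes the residual standard error by hand, using data simulated under assumed settings (the seed and range are assumptions).

```r
# Self-contained sketch on data simulated under assumed settings
set.seed(123)
df <- data.frame(Advertising = runif(50, 0, 100))
df$Sales <- 5 + 0.8 * df$Advertising + rnorm(50, 0, 5)
model <- lm(Sales ~ Advertising, data = df)
res <- residuals(model)

sum(res)                    # zero up to floating-point error
cor(res, df$Advertising)    # zero up to floating-point error

# Residual standard error: sqrt(RSS / (n - 2)), the quantity reported by summary()
rse <- sqrt(sum(res^2) / (nrow(df) - 2))
all.equal(rse, sigma(model))
```

The divisor n - 2 reflects the two estimated coefficients, which is why the summary output reports 48 degrees of freedom for 50 observations.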

LaTeX Math - Coefficient Estimation

The slope and intercept estimates are given by:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
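
The two formulas above can be evaluated directly and compared with what `lm()` returns. The following is a minimal sketch using data simulated under assumed settings (the seed and the range of `x` are assumptions); `b0`, `b1`, and `fit` are names introduced for illustration.

```r
# Self-contained sketch on data simulated under assumed settings
set.seed(123)
x <- runif(50, 0, 100)
y <- 5 + 0.8 * x + rnorm(50, 0, 5)

# Slope: sum of cross-deviations divided by the sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: chosen so the line passes through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)
c(intercept = b0, slope = b1)
coef(fit)

# The hand-computed estimates agree with lm() up to numerical tolerance
all.equal(unname(coef(fit)), c(b0, b1))
```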

Plotly 3D Plot: Simulated Data View

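library(plotly)

# z is a second noisy draw from the same linear model (intercept 5, slope 0.8, sd 5)
df$z <- 5 + 0.8 * df$Advertising + rnorm(50, 0, 5)
# Take x and y from df so the block does not rely on variables defined elsewhere
plot_ly(df, x = ~Advertising, y = ~Sales, z = ~z, type = "scatter3d", mode = "markers") %>%
  layout(title = "3D View of Advertising and Sales")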

Conclusion

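Simple linear regression is an easily interpreted technique for modeling a linear relationship between two variables: the slope states how much the response is expected to change per unit change in the predictor. It is widely used in business, science, and engineering to predict an outcome from a known factor.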