Simple Linear Regression is a statistical method that models the relationship between a dependent variable (Y) and one independent variable (X) using a straight line.
A company wants to understand how advertising expenditure influences sales. We collect data on daily advertising spend and the corresponding sales for 50 days.
We assume a linear model:
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
Where:
- \(Y\) = dependent variable (Sales)
- \(X\) = independent variable (Advertising)
- \(\beta_0\), \(\beta_1\) = regression coefficients (intercept and slope)
- \(\epsilon\) = error term
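To make the model concrete, the following sketch simulates 50 days of data from it. The intercept 5, slope 0.8, and noise standard deviation 5 mirror the values used in the 3D example in this document; the seed, the uniform range for advertising spend, and the names `n`, `x`, `y`, and `eps` are assumptions for illustration, so a model fitted to this data will not reproduce the printed output exactly.

```r
# Simulate data from Y = beta_0 + beta_1 * X + epsilon.
# All numeric settings below are illustrative assumptions.
set.seed(42)                          # assumed seed, for reproducibility only
n <- 50                               # 50 days, as in the example
x <- runif(n, min = 10, max = 100)    # assumed advertising spend range
eps <- rnorm(n, mean = 0, sd = 5)     # error term
y <- 5 + 0.8 * x + eps                # sales generated by the linear model

df <- data.frame(Advertising = x, Sales = y)
head(df)
```

A data frame built this way has the two columns, `Advertising` and `Sales`, that the model formula expects.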
model <- lm(Sales ~ Advertising, data = df)
summary(model)
## 
## Call:
## lm(formula = Sales ~ Advertising, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.0560  -3.1111  -0.4097   3.3295  10.7983 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.33830    1.30305   4.097  0.00016 ***
## Advertising  0.79667    0.02246  35.478  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.676 on 48 degrees of freedom
## Multiple R-squared:  0.9633, Adjusted R-squared:  0.9625 
## F-statistic:  1259 on 1 and 48 DF,  p-value: < 2.2e-16
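The fitted line can be read directly off the coefficient table: predicted Sales = 5.33830 + 0.79667 × Advertising, so each additional unit of advertising is associated with roughly 0.80 more units of sales, and the R-squared of 0.9633 indicates that about 96% of the variance in sales is explained by the line. As a small sketch, the reported coefficients can be plugged into a prediction function; the helper name `predict_sales` and the spend value 50 are hypothetical, chosen only for illustration.

```r
# Coefficients copied from the printed summary output
b0 <- 5.33830   # intercept
b1 <- 0.79667   # slope for Advertising

# Hypothetical helper: predicted sales for a given advertising spend
predict_sales <- function(advertising) b0 + b1 * advertising

predict_sales(50)                       # 45.1718
predict_sales(51) - predict_sales(50)   # the slope b1, up to floating-point rounding
```

With a fitted `lm` object available, `predict(model, newdata = data.frame(Advertising = 50))` would give the same kind of prediction without copying coefficients by hand.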
library(ggplot2)
ggplot(df, aes(x = Advertising, y = Sales)) +
  geom_point() +
  geom_smooth(method = "lm", col = "blue") +
  labs(title = "Sales vs. Advertising", x = "Advertising", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'
df$residuals <- residuals(model)
ggplot(df, aes(x = Advertising, y = residuals)) +
  geom_point(color = "red") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs. Advertising", y = "Residuals")
The slope and intercept estimates are given by:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
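These closed-form estimates can be checked numerically against `lm()`. The sketch below uses simulated data; the seed, the uniform range for `x`, and the variable names are assumptions for illustration rather than the dataset analysed in this document.

```r
# Simulated data; seed and ranges are illustrative assumptions
set.seed(1)
x <- runif(50, min = 10, max = 100)
y <- 5 + 0.8 * x + rnorm(50, mean = 0, sd = 5)

# Closed-form least-squares estimates from the formulas for the
# slope and intercept
x_bar <- mean(x)
y_bar <- mean(y)
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

# Compare with lm(); the two sets of estimates should agree
fit <- lm(y ~ x)
manual <- c(beta0_hat, beta1_hat)
all.equal(manual, unname(coef(fit)))   # should return TRUE
```

The slope formula is the sample covariance of `x` and `y` divided by the sample variance of `x`, so `cov(x, y) / var(x)` gives the same value as `beta1_hat`.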
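library(plotly)
# x and y must already exist in the workspace (presumably the advertising
# and sales vectors); they are not defined in this section.
z <- 5 + 0.8 * x + rnorm(50, 0, 5)
plot_ly(x = ~x, y = ~y, z = ~z, type = "scatter3d", mode = "markers") %>%
  layout(title = "3D View of Advertising and Sales")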
Simple linear regression is a powerful yet interpretable technique for modeling linear relationships. It’s widely used in business, science, and engineering to predict outcomes based on known factors.