Simple Linear Regression is a method to explore the linear relationship between a dependent variable and a single independent variable.
The equation is:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
Where:
- \(y\) is the dependent variable (the observed outcome)
- \(x\) is the independent variable (the input or predictor)
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\epsilon\) is the error term
Suppose we want to predict a person’s weight based on their height.
We’ll use a small dataset of made-up values to illustrate.
library(ggplot2)

data <- data.frame(
  height = c(150, 155, 160, 165, 170, 175, 180),
  weight = c(50, 53, 58, 61, 65, 69, 72)
)

ggplot(data, aes(x = height, y = weight)) +
  geom_point(color = "blue", size = 3) +
  labs(title = "Height vs Weight", x = "Height (cm)", y = "Weight (kg)") +
  theme_minimal()
model <- lm(weight ~ height, data = data)

ggplot(data, aes(x = height, y = weight)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Regression Line", x = "Height (cm)", y = "Weight (kg)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The least-squares estimate of the slope is:
\[ \beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
The least-squares estimate of the intercept is:
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]
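To make the two formulas concrete, here is a minimal sketch that applies them by hand to the same made-up height and weight values. The variable names (`x_bar`, `y_bar`, `beta_0`, `beta_1`) are chosen only for this illustration.

```r
# Same made-up height/weight values as in the example dataset
height <- c(150, 155, 160, 165, 170, 175, 180)
weight <- c(50, 53, 58, 61, 65, 69, 72)

x_bar <- mean(height)
y_bar <- mean(weight)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta_1 <- sum((height - x_bar) * (weight - y_bar)) / sum((height - x_bar)^2)

# Intercept: chosen so the line passes through (x_bar, y_bar)
beta_0 <- y_bar - beta_1 * x_bar

c(intercept = beta_0, slope = beta_1)
```

The slope works out to 0.75 and the intercept to about -62.607, which should agree with the estimates that `lm()` reports for this data.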
summary(model)
## 
## Call:
## lm(formula = weight ~ height, data = data)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  0.1071 -0.6429  0.6071 -0.1429  0.1071  0.3571 -0.3929 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -62.60714    2.94000  -21.30 4.23e-06 ***
## height        0.75000    0.01779   42.17 1.41e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4706 on 5 degrees of freedom
## Multiple R-squared:  0.9972, Adjusted R-squared:  0.9966 
## F-statistic:  1778 on 1 and 5 DF,  p-value: 1.415e-07
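The summary reports the coefficient estimates, their p-values, and R-squared, which together indicate how well the model fits. Here the slope of 0.75 means each additional centimeter of height is associated with about 0.75 kg more weight, and an R-squared of 0.9972 means the line accounts for nearly all of the variation in weight in this small made-up dataset.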
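The same quantities can also be pulled out of the fitted model programmatically. The sketch below refits the model so it runs on its own, then uses it to predict a weight; the 172 cm height and the name `new_person` are hypothetical values chosen for illustration.

```r
# Refit the model on the same made-up data so this block is self-contained
data <- data.frame(
  height = c(150, 155, 160, 165, 170, 175, 180),
  weight = c(50, 53, 58, 61, 65, 69, 72)
)
model <- lm(weight ~ height, data = data)

# Coefficient estimates (intercept and slope)
coef(model)

# R-squared, taken from the summary object
r_squared <- summary(model)$r.squared

# Predicted weight for a hypothetical person who is 172 cm tall
new_person <- data.frame(height = 172)
predicted <- predict(model, newdata = new_person)
predicted
```

For 172 cm the prediction is about 66.4 kg, that is, -62.607 + 0.75 × 172. Predictions are most trustworthy inside the observed height range (150 to 180 cm).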
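library(plotly)

# Simulate 100 points where z depends linearly on x and y, plus random noise
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
z <- 4 + 2*x + 3*y + rnorm(100)

# Interactive 3D scatter plot, with points colored by their z value
plot_ly(x = ~x, y = ~y, z = ~z, type = "scatter3d", mode = "markers",
        marker = list(size = 3, color = z, colorscale = 'Viridis')) %>%
  layout(title = "3D Linear Regression (Simulated)")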
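The plot shows the simulated points but does not fit a model to them. As a rough sketch, the same `lm()` function can estimate the plane that generated the data. Note that this uses two predictors, so it goes one step beyond simple linear regression; the name `fit` is an assumption made for this example.

```r
# Recreate the simulated data with the same seed
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
z <- 4 + 2*x + 3*y + rnorm(100)

# Fit a linear model with two predictors
fit <- lm(z ~ x + y)

# The estimates should land close to the true values used in the
# simulation: intercept 4, coefficient 2 for x, coefficient 3 for y
coef(fit)
```

Because the noise is random, the estimates will not equal 4, 2, and 3 exactly, but with 100 points they should be close.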