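Simple linear regression is a statistical method for modeling the relationship between two continuous variables by fitting a straight line. One variable (the predictor) is used to explain or predict the other (the response).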
The equation of a simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
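Where:

- \(y\) is the dependent (response) variable
- \(x\) is the independent (predictor) variable
- \(\beta_0\) is the intercept, the expected value of \(y\) when \(x = 0\)
- \(\beta_1\) is the slope, the expected change in \(y\) for a one-unit increase in \(x\)
- \(\epsilon\) is the error term, the random variation in \(y\) not explained by \(x\)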
Let’s say we want to predict a student’s final exam score (\(y\)) based on the number of hours studied (\(x\)).
```r
library(ggplot2)

# Toy data: hours studied and the resulting exam score for seven students
data <- data.frame(hours = c(1, 2, 3, 4, 5, 6, 7),
                   score = c(50, 55, 65, 70, 75, 78, 85))

# Scatterplot of the raw data
ggplot(data, aes(x = hours, y = score)) +
  geom_point(color = "blue", size = 3) +
  ggtitle("Hours Studied vs. Exam Score") +
  theme_minimal()
```
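Fit the model with `lm()` and draw the least-squares line over the scatterplot:

```r
# Fit score as a linear function of hours
model <- lm(score ~ hours, data = data)

# Overlay the fitted regression line; passing the formula explicitly
# avoids geom_smooth()'s "using formula = 'y ~ x'" message
ggplot(data, aes(x = hours, y = score)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  ggtitle("Regression Line") +
  theme_minimal()
```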
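The least-squares estimate of the slope is:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
and the estimate of the intercept is:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\).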
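To connect the formulas to what `lm()` reports, the sketch below evaluates them directly on the hours/score data. For this data the sum of cross-deviations is 161 and the sum of squared deviations of `hours` is 28, so the slope is \(161 / 28 = 5.75\). The variable names `beta0` and `beta1` are illustrative only.

```r
# Toy data from the example above: hours studied vs. exam score
data <- data.frame(hours = c(1, 2, 3, 4, 5, 6, 7),
                   score = c(50, 55, 65, 70, 75, 78, 85))
x <- data$hours
y <- data$score

# Slope: sum of cross-deviations over sum of squared x-deviations (161 / 28)
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the least-squares line passes through (mean(x), mean(y))
beta0 <- mean(y) - beta1 * mean(x)

c(intercept = beta0, slope = beta1)

# The hand-computed values should agree with lm() up to floating-point tolerance
all.equal(unname(coef(lm(score ~ hours, data = data))), c(beta0, beta1))
```

The hand-computed values should match the `Estimate` column of the `summary()` output for this model.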
```r
summary(model)
```

```
## 
## Call:
## lm(formula = score ~ hours, data = data)
## 
## Residuals:
##       1       2       3       4       5       6       7 
## -1.0357 -1.7857  2.4643  1.7143  0.9643 -1.7857 -0.5357 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.2857     1.5892   28.50 9.97e-07 ***
## hours         5.7500     0.3554   16.18 1.64e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.88 on 5 degrees of freedom
## Multiple R-squared:  0.9813,  Adjusted R-squared:  0.9775 
## F-statistic: 261.8 on 1 and 5 DF,  p-value: 1.643e-05
```
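The output reports each coefficient estimate with its standard error, t statistic, and p-value, along with the residual standard error, \(R^2\), and an overall F-test. Here the fitted line is \(\hat{y} = 45.29 + 5.75x\): each additional hour studied is associated with about 5.75 more points on the exam, and \(R^2 = 0.9813\) means the line accounts for about 98% of the variation in scores.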
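Since the motivating goal was to predict a student's score from hours studied, here is a minimal sketch that pulls individual statistics out of the summary object and uses `predict()` on new data. The study times of 2.5 and 5.5 hours, and the names `fit_summary`, `new_students`, and `pred`, are hypothetical and chosen only for illustration.

```r
data <- data.frame(hours = c(1, 2, 3, 4, 5, 6, 7),
                   score = c(50, 55, 65, 70, 75, 78, 85))
model <- lm(score ~ hours, data = data)

# Individual pieces of the summary() output
fit_summary <- summary(model)
coef(fit_summary)        # estimates, std. errors, t values, p-values
fit_summary$r.squared    # multiple R-squared
fit_summary$sigma        # residual standard error

# Predicted scores for two hypothetical study times inside the observed 1-7 hour range,
# with 95% prediction intervals for an individual student's score
new_students <- data.frame(hours = c(2.5, 5.5))
pred <- predict(model, newdata = new_students, interval = "prediction", level = 0.95)
pred
```

The `fit` column is just the fitted line evaluated at the new values (for 2.5 hours, \(45.29 + 5.75 \times 2.5 \approx 59.66\)). Predicting for study times far outside the observed 1 to 7 hours would be extrapolation, which the data give no support for.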
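The same idea extends to more than one predictor (multiple regression). The simulated data below are generated as \(z = 2 + 3x + 4y + \epsilon\), so here `x` and `y` are both predictors and `z` is the response; in three dimensions the points scatter around a plane rather than a line.

```r
library(plotly)

set.seed(123)  # make the simulated data reproducible
x <- rnorm(100)
y <- rnorm(100)
z <- 2 + 3*x + 4*y + rnorm(100)  # true plane plus standard normal noise

# Interactive 3D scatterplot, colored by the response
plot_ly(x = ~x, y = ~y, z = ~z, type = "scatter3d", mode = "markers",
        marker = list(size = 3, color = z, colorscale = "Viridis"))
```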
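Because the coefficients used to simulate these data are known, one way to check the fit is to estimate them with `lm()` using both predictors; the estimates should land close to 2, 3, and 4. This sketch regenerates the data with the same seed and needs only base R (the name `fit3d` is illustrative).

```r
# Regenerate the simulated data with the same seed
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
z <- 2 + 3*x + 4*y + rnorm(100)

# Multiple regression: z modeled as a linear function of both x and y
fit3d <- lm(z ~ x + y)

# Estimated intercept and slopes; compare with the true values 2, 3, 4
coef(fit3d)
```

With 100 points and unit-variance noise, the standard errors of the slopes are roughly 0.1, so estimates within a few tenths of the true values are expected.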