2025-04-13
Simple Linear Regression is a statistical method that models the relationship between two variables (x and y) by fitting a linear equation to observed data.
The general form of the regression line is:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
Where:

- \(y\) is the dependent variable
- \(x\) is the independent variable
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\varepsilon\) is the error term
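For instance, with an intercept of \(\beta_0 = 50\), a slope of \(\beta_1 = 5\), and \(x = 4\) hours of study, the expected score is \(50 + 5 \cdot 4 = 70\); the error term \(\varepsilon\) captures how individual observations scatter around that expectation. (These numbers are only an illustration.)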
Let’s examine the relationship between study hours and exam scores using simulated data.
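One way data like this could be generated is sketched below. The sample size, coefficients, noise level, seed, and the capping of scores at 100 are all assumptions chosen to produce data of the same shape as the rows shown next; the original simulation may differ.

```r
# Minimal sketch of simulating study hours and exam scores
# (coefficients, noise level, seed, and cap are assumptions).
set.seed(123)
n <- 50
study_hours <- runif(n, min = 1, max = 10)            # hours studied
exam_scores <- 50 + 5 * study_hours + rnorm(n, 0, 5)  # linear signal plus noise
exam_scores <- pmin(exam_scores, 100)                 # cap scores at 100
data <- data.frame(study_hours, exam_scores)
head(data)
```

The first few rows of the simulated data set: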
```
##   study_hours exam_scores
## 1    3.588198    59.50752
## 2    8.094746    94.66267
## 3    4.680792    74.17083
## 4    8.947157    89.04510
## 5    9.464206   100.00000
## 6    1.410008    59.18236
```
(Figure: scatter plot of study hours vs. exam scores with a fitted line from `geom_smooth()` using formula 'y ~ x'.)
The scatter plot shows a positive linear relationship between study hours and exam scores.
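A plot of this kind can be produced with a call along the following lines; the exact aesthetics and labels are assumptions, and `geom_smooth(method = "lm")` is what draws the fitted line and emits the 'y ~ x' message noted above.

```r
library(ggplot2)

# Scatter plot of exam scores against study hours with a fitted
# least-squares line (styling choices are assumptions).
ggplot(data, aes(x = study_hours, y = exam_scores)) +
  geom_point(color = "steelblue", size = 2) +
  geom_smooth(method = "lm", se = TRUE, color = "darkred") +
  labs(x = "Study hours", y = "Exam score",
       title = "Study hours vs. exam scores")
```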
With a 3D plot we can see how the relationship would look when another variable (practice) is added.
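A sketch of such a 3D view using the plotly package is shown below; the practice column is a hypothetical second predictor not present in the two-column data above, and the original figure may have been drawn with a different tool.

```r
library(plotly)

# Hypothetical second predictor: practice hours
# (an assumption, not part of the data shown above).
data$practice <- runif(nrow(data), min = 0, max = 5)

# Interactive 3D scatter of exam scores against both predictors.
plot_ly(data,
        x = ~study_hours, y = ~practice, z = ~exam_scores,
        type = "scatter3d", mode = "markers")
```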
The least squares estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) minimize the sum of squared errors:
\[ S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \]
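Setting the partial derivatives of \(S\) with respect to \(\beta_0\) and \(\beta_1\) to zero yields the familiar closed-form solutions:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]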
```
## 
## Call:
## lm(formula = exam_scores ~ study_hours, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.4639  -2.3416   0.1516   2.2989  10.9300 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  49.9300     1.4916   33.47   <2e-16 ***
## study_hours   4.9961     0.2384   20.96   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.42 on 48 degrees of freedom
## Multiple R-squared:  0.9015, Adjusted R-squared:  0.8994 
## F-statistic: 439.2 on 1 and 48 DF,  p-value: < 2.2e-16
```
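The summary above comes from a call like the one sketched below (the formula matches the `Call:` shown in the output); the second part computes the closed-form estimates by hand as a check that they agree with the reported coefficients.

```r
# Fit the simple linear regression shown in the summary output.
model <- lm(exam_scores ~ study_hours, data = data)
summary(model)

# Closed-form least squares estimates for comparison.
b1 <- with(data, sum((study_hours - mean(study_hours)) *
                     (exam_scores - mean(exam_scores))) /
                 sum((study_hours - mean(study_hours))^2))
b0 <- mean(data$exam_scores) - b1 * mean(data$study_hours)
c(intercept = b0, slope = b1)   # should agree with coef(model)
```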