Introduction

  • Simple linear regression is a statistical method that models the relationship between two continuous variables as a straight line.
  • One variable, denoted \(x\), is the independent (predictor) variable.
  • The other variable, denoted \(y\), is the dependent (response) variable, whose values we want to explain or predict from \(x\).

The Linear Regression Model

  • The simple linear regression model is: \[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \dots, n \] where:
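    • \(y_i\): Dependent (response) variable for observation \(i\).
    • \(x_i\): Independent (predictor) variable for observation \(i\).
    • \(\beta_0\): Intercept, the expected value of \(y\) when \(x = 0\).
    • \(\beta_1\): Slope, the expected change in \(y\) for a one-unit increase in \(x\).
    • \(\epsilon_i\): Random error term, usually assumed independent with mean 0 and constant variance \(\sigma^2\).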
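
To make the roles of the terms concrete, the R sketch below simulates data from this model. The sample size, the parameter values and the choice of normally distributed errors are hypothetical assumptions for illustration only; they are not estimates from this document's dataset.

```r
# Simulate n observations from y_i = beta_0 + beta_1 * x_i + epsilon_i.
# All parameter values below are hypothetical, chosen only for illustration.
set.seed(42)                              # make the random draws reproducible

n      <- 10                              # number of observations
beta_0 <- 30000                           # hypothetical intercept
beta_1 <- 4000                            # hypothetical slope
sigma  <- 8000                            # hypothetical error standard deviation

x   <- runif(n, min = 0, max = 10)        # independent variable
eps <- rnorm(n, mean = 0, sd = sigma)     # independent N(0, sigma^2) errors (assumed normal)
y   <- beta_0 + beta_1 * x + eps          # dependent variable

# beta_0 + beta_1 * x is the mean of y given x; the errors are the
# only source of scatter around that line.
print(head(data.frame(x = x, y = y)))
```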

Estimating the Parameters

  • Estimate \(\beta_0\) and \(\beta_1\) using the method of least squares.
  • The least squares estimates \(\hat\beta_0\) and \(\hat\beta_1\) are the values that minimize the sum of squared errors: \[ S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \]
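
Setting the partial derivatives of \(S\) with respect to \(\beta_0\) and \(\beta_1\) to zero gives the closed-form estimates \[ \hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x. \] The R sketch below computes them by hand and checks them against lm(). The simulated data and its parameter values are hypothetical, not the salary dataset used later.

```r
# Closed-form least squares estimates, checked against lm().
# The simulated data below are hypothetical, not the salary dataset.
set.seed(1)
x <- runif(10, min = 0, max = 10)
y <- 30000 + 4000 * x + rnorm(10, sd = 8000)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
# Intercept: the fitted line passes through (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# Sum of squared errors S(beta_0, beta_1) for any candidate line
S <- function(beta_0, beta_1) sum((y - beta_0 - beta_1 * x)^2)

fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))

# Nudging either estimate away from the minimizer increases S
print(c(S_min = S(b0, b1), S_nudged = S(b0 + 100, b1 + 10)))
```

The hand-computed values agree with coef(fit) up to floating-point error, which is a quick way to confirm the formulas.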

Example Dataset

We use a dataset of 10 observations relating years of experience (column experience) to salary (column salary).

Fitting the Model
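
In R the model is fitted with lm(). The sketch below uses a hypothetical simulated data frame, sim_data, whose column names match the lm() call in the output further down, and recomputes two quantities that summary() reports: the residual standard error \(\hat\sigma = \sqrt{\mathrm{SSE}/(n-2)}\) and \(R^2 = 1 - \mathrm{SSE}/\mathrm{SST}\). The simulated values are assumptions for illustration and will not reproduce the document's output.

```r
# Fit the model with lm() and recompute two summary() statistics by hand.
# `sim_data` is a hypothetical simulated data frame, not this document's
# dataset; only its column names match the lm() call.
set.seed(123)
n <- 10
sim_data <- data.frame(experience = runif(n, min = 1, max = 10))
sim_data$salary <- 35000 + 4000 * sim_data$experience + rnorm(n, sd = 9000)

model <- lm(salary ~ experience, data = sim_data)
s <- summary(model)

sse <- sum(residuals(model)^2)                            # residual sum of squares
sst <- sum((sim_data$salary - mean(sim_data$salary))^2)   # total sum of squares

rse <- sqrt(sse / (n - 2))   # two coefficients estimated, so n - 2 degrees of freedom
r2  <- 1 - sse / sst         # share of the variation in salary explained by the line

print(c(rse = rse, summary_sigma = s$sigma))
print(c(r2 = r2, summary_r2 = s$r.squared))
```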

3D Plotly Visualization

R Code for Model Fitting

## 
## Call:
## lm(formula = salary ~ experience, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11348  -5624  -1392   3854  16814 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    35255       6673   5.283 0.000743 ***
## experience      4180       1075   3.887 0.004628 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9768 on 8 degrees of freedom
## Multiple R-squared:  0.6538, Adjusted R-squared:  0.6106 
## F-statistic: 15.11 on 1 and 8 DF,  p-value: 0.004628
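
The numbers in this output are linked by a few standard identities. The R sketch below recomputes some of them from the printed (rounded) values, so small differences in the last digit are expected. The sample size \(n = 10\) is inferred from the 8 residual degrees of freedom plus the 2 estimated coefficients, and the prediction at 5 years of experience is an illustrative choice.

```r
# Relationships among the numbers in the summary() output above.
# Inputs are copied from that output (rounded as printed).
b0     <- 35255        # intercept estimate
b1     <- 4180         # slope estimate
se_b1  <- 1075         # standard error of the slope
t_b1   <- 3.887        # printed t value for the slope
r2     <- 0.6538       # Multiple R-squared
n      <- 10           # 8 residual df + 2 coefficients
df_res <- n - 2        # residual degrees of freedom

# t value = estimate / standard error (differs from 3.887 only by rounding)
t_hand <- b1 / se_b1

# Two-sided p-value from the t distribution with 8 degrees of freedom
p_slope <- 2 * pt(-abs(t_b1), df = df_res)

# With one predictor, the overall F statistic equals the slope's t squared,
# and can also be written in terms of R-squared
f_from_t  <- t_b1^2
f_from_r2 <- (r2 / 1) / ((1 - r2) / df_res)

# Adjusted R-squared penalizes for the number of predictors (here 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Predicted mean salary for 5 years of experience
pred_5 <- b0 + b1 * 5

print(round(c(t_hand = t_hand, p_slope = p_slope, f_from_t = f_from_t,
              f_from_r2 = f_from_r2, adj_r2 = adj_r2, pred_5 = pred_5), 4))
```

For a fitted model object, the same quantities can be obtained without rounding error from coef(), summary() and predict().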