2024-11-18

What is simple linear regression?

Simple linear regression is a statistical method that predicts the value of a dependent variable from a single independent variable by fitting a straight line through the data points, under the assumption that the relationship between the two is linear.

When is simple linear regression used?

We tend to use linear regression when the relationship between the variables already looks roughly linear, especially when we want to estimate data points that are missing. We can also use it to determine how strong the relationship between two variables is.
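Both uses can be sketched in a few lines of R. The example below is only an illustration under stated assumptions: it uses R's built-in `cars` data (stopping distance against speed) as a stand-in rather than the Salary data, and the speed of 21 is an arbitrary value picked for the prediction.

```r
# Illustration only: built-in `cars` data, not the Salary dataset
fit <- lm(dist ~ speed, data = cars)

# Use 1: estimate the response at a chosen value of the predictor
# (21 is an arbitrary value inside the observed range of speeds)
new_point <- data.frame(speed = 21)
predicted_dist <- unname(predict(fit, newdata = new_point))

# Use 2: measure the strength of the linear relationship
r <- cor(cars$speed, cars$dist)       # correlation coefficient
r_squared <- summary(fit)$r.squared   # share of variance explained by the line

print(c(prediction = predicted_dist, r = r, r_squared = r_squared))
```

In simple linear regression the R-squared value is the square of the correlation coefficient, so either number can be used to summarise how closely the points follow the line.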

Mathematical Formulation for Linear Regression

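The mathematical formula for simple linear regression is \[y = \beta_0 + \beta_1 X_1 + \varepsilon\] where \(y\) is the dependent variable, \(X_1\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is a random error term, assumed to be normally distributed with mean zero and constant variance \(\sigma^2\):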

\[\varepsilon \sim \mathcal{N}(0,\sigma^2)\]
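To make the model concrete, the following sketch simulates data from it and then recovers the coefficients by least squares. Everything here is an assumption made for illustration: the "true" values of `beta0`, `beta1` and `sigma`, the sample size, and the range of `x` are made up and are not taken from the Salary data.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" parameters, chosen only for illustration
beta0 <- 25000
beta1 <- 9000
sigma <- 5000
n <- 200

# Generate data from y = beta0 + beta1 * x + eps, with eps ~ N(0, sigma^2)
x <- runif(n, min = 1, max = 11)
eps <- rnorm(n, mean = 0, sd = sigma)
y <- beta0 + beta1 * x + eps

# Closed-form least-squares estimates of the slope and intercept
b1_hat <- cov(x, y) / var(x)
b0_hat <- mean(y) - b1_hat * mean(x)

# The same estimates obtained from lm()
fit <- lm(y ~ x)
print(rbind(closed_form = c(b0_hat, b1_hat), lm = coef(fit)))
```

The two rows of the printed matrix should agree, and both should be close to (but not exactly equal to) the chosen `beta0` and `beta1`, because the error term adds noise to every observation.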

Dataset

The first rows of the Salary dataset
YearsExperience Salary
1.2 39344
1.4 46206
1.6 37732
2.1 43526
2.3 39892

Plot 1: scatter plot of Salary against Years Experience with the fitted regression line

Code for the Previous Plot

library(plotly)

# Predictor and response
x = salary_data$YearsExperience
y = salary_data$Salary

# Fit the simple linear regression
mod = lm(y ~ x)

# Axis titles and fonts
xax <- list(
  title = "Years Experience",
  titlefont = list(family = "Modern Computer Roman")
)

yax <- list(
  title = "Salary",
  titlefont = list(family = "Modern Computer Roman")
)

# Scatter plot of the data with the fitted line overlaid
graph <- plot_ly(salary_data, x = x, y = y, type = "scatter", mode = "markers") %>%
  add_lines(x = x, y = fitted(mod)) %>%
  layout(xaxis = xax, yaxis = yax)
config(graph, displaylogo = FALSE)

Analysis of Variance

Key Values

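The table below shows the estimate, standard error, t value, and p-value for each coefficient of the fitted regression equation \[y = \beta_0 + \beta_1X_1\] The (Intercept) row corresponds to \(\beta_0\) and the x row to \(\beta_1\).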

##              Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 24848.204  2306.6537 10.77240 1.816526e-11
## x            9449.962   378.7546 24.95009 1.143068e-20
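As a rough check on how to read this table, the sketch below copies the estimates and standard errors from it, recomputes the t values as estimate divided by standard error, and uses the coefficients for a prediction. The 5 years of experience is a hypothetical input chosen for illustration, and the variable names are made up for this sketch. A table of this shape is typically what `summary(mod)$coefficients` prints, though that is an assumption about how the output above was produced.

```r
# Estimates and standard errors copied from the coefficient table
b0 <- 24848.204
se_b0 <- 2306.6537
b1 <- 9449.962
se_b1 <- 378.7546

# Each t value is the estimate divided by its standard error
t_b0 <- b0 / se_b0
t_b1 <- b1 / se_b1

# Predicted salary for a hypothetical 5 years of experience
years <- 5
predicted_salary <- b0 + b1 * years

print(c(t_intercept = t_b0, t_slope = t_b1, predicted_salary = predicted_salary))
```

The recomputed t values match the table up to rounding, and the fitted line predicts a salary of about 72,098 at 5 years of experience.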

Conclusions

In conclusion, we can see that there is a strong relationship between years of experience and salary: the estimated slope is about 9,450 per additional year of experience, and its p-value (about 1.1e-20) is far below any conventional significance level.

Thus we can fit a linear relationship that gives us an idea of what salary to expect for a given value of the independent variable.