March 25, 2025

okay so what is Simple Linear Regression?

  • Simple Linear Regression is a statistical method for finding the linear relationship between a dependent variable and a single independent variable. (With two or more independent variables it becomes multiple linear regression.)
  • Basically, it helps us understand how a change in the independent variable affects the dependent variable.

equation for linear regression

The basic linear regression equation is:

\[ Y_i = f(X_i, \beta) + e_i \]

Where:

  • \(Y_i\) is the dependent variable

  • \(X_i\) is the independent variable

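  • \(f\) is the function that relates \(X_i\) to \(Y_i\); in simple linear regression it is a straight line, \(f(X_i, \beta) = \beta_0 + \beta_1 X_i\)

  • \(\beta\) is the set of unknown parameters, here the intercept \(\beta_0\) and the slope \(\beta_1\)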

  • \(e_i\) is the error term
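To make the pieces of the equation concrete, here is a minimal sketch in R. The data is simulated purely for illustration, and the "true" values `beta0 = 2` and `beta1 = 0.5` are assumptions picked for the example, not numbers from any real dataset.

```r
# Simulated data for illustration only (hypothetical values).
set.seed(42)
n <- 100
x <- runif(n, min = 0, max = 10)   # independent variable X_i
e <- rnorm(n, mean = 0, sd = 1)    # error term e_i
beta0 <- 2                         # assumed "true" intercept
beta1 <- 0.5                       # assumed "true" slope
y <- beta0 + beta1 * x + e         # dependent variable Y_i

fit <- lm(y ~ x)                   # estimate the unknown parameters from the data
beta_hat <- coef(fit)              # named vector with "(Intercept)" and "x"
print(beta_hat)
```

The estimates in `beta_hat` should land close to the assumed values but not exactly on them, because the error term adds noise to every observation.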

some assumptions about linear regression

  1. Linearity: The relationship between X and Y is linear, which means the points need to form something very close to a straight line. NO CURVES. NO ZIGZAG
  2. Independence: Each observation should stand on its own without depending on any other point
  3. Homoscedasticity: The gap between the line and each point (the residual) should have roughly the same spread everywhere along the line; no group of points should have consistently bigger gaps
  4. Normality: When we line up all those gaps in a histogram, it looks like a smooth hill or bell shape centered on zero
  5. No multicollinearity: The independent variables should not be correlated with each other too much. This only matters once there is more than one independent variable, so it is automatically satisfied in simple linear regression
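The first four assumptions can be checked roughly from the residuals of a fitted model. The sketch below uses simulated data (hypothetical, for illustration only) and base R functions; it is one common way to eyeball the assumptions, not a complete diagnostic routine.

```r
# Simulated data for illustration only (hypothetical values).
set.seed(1)
x <- seq(1, 10, length.out = 60)
y <- 3 + 2 * x + rnorm(60, sd = 1.5)

fit <- lm(y ~ x)
res <- residuals(fit)   # the "gaps" between each point and the line

# Linearity and homoscedasticity: residuals vs fitted values should show
# no curve and a roughly constant vertical spread around zero.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points in the Q-Q plot should fall close to the reference line.
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)   # a large p-value means no evidence against normality

# Independence: for data recorded in order, the lag-1 correlation of the
# residuals should be close to zero.
lag1 <- cor(res[-1], res[-length(res)])
```

A funnel shape in the first plot would point to a homoscedasticity problem, and a curved band would point to a linearity problem.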

example dataset study hrs and exam scores

Let’s explore the relationship between study hours and exam scores:

3D visualization

R code for the above plot

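library(plotly)

# Assumes `edata` is a data frame with numeric columns `hours` and `score`
fit <- lm(score ~ hours, data = edata)  # score is the dependent variable

# Observed hours and scores, with the fitted score on the z axis
plot_ly(edata, x = ~hours, y = ~score, z = ~fitted(fit),
        type = 'scatter3d', mode = 'markers',
        marker = list(color = ~hours, colorscale = 'Viridis')) %>%
  # Semi-transparent mesh through the same fitted values
  add_trace(data = edata, x = ~hours, y = ~score, z = ~fitted(fit),
            type = 'mesh3d', opacity = 0.5, inherit = FALSE)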

math in regression

linear regression estimates its coefficients by choosing the values that minimize the sum of squared errors (SSE); this is known as ordinary least squares

\[ \text{SSE} = \sum_{i=1}^{n}(Y_i - \hat{Y_i})^2 \]

Where:

  • \(Y_i\) is the observed value

  • \(\hat{Y_i}\) is the predicted value
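To see the SSE formula in action, here is a small worked sketch in R. The five data points are made up for illustration. The slope and intercept come from the standard closed-form least-squares formulas for one predictor, and `lm()` is used only to cross-check the result.

```r
# Small made-up data set for illustration (hypothetical values).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Closed-form least-squares estimates for a single predictor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

y_hat <- b0 + b1 * x            # predicted values
sse <- sum((y - y_hat)^2)       # sum of squared errors for the fitted line

# Any other line gives a larger SSE, e.g. intercept 2 and slope 0.7
sse_other <- sum((y - (2 + 0.7 * x))^2)

# Cross-check against lm(): deviance() returns the residual sum of squares
fit <- lm(y ~ x)
sse_lm <- deviance(fit)

print(c(b0 = b0, b1 = b1, sse = sse, sse_other = sse_other))
```

For this data the fitted line has slope 0.6 and intercept 2.2 with an SSE of 2.4, while the alternative line has an SSE of 2.55, which is what "minimizing the SSE" means in practice.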

conclusion

  • Simple Linear Regression is like drawing the best-fitting straight line through the dots. It helps us understand the relationship between two variables, make predictions, and see how strong that relationship is

  • Always check that the dots roughly follow a straight line and are not curved or zigzag before trusting the fit