What is Linear Regression?

The easiest and most understandable way to explain this is to look at the old classic equation \[ Y=mx+b \] that nearly anyone reading this should be familiar with. In that equation we have a DETERMINISTIC relationship: no matter how many times you plug a value of x into the function, it will always produce the same outcome. Conversely, our Simple Linear Regression (SLR) can be written as \[ Y=f(X)+ \varepsilon \] where instead of f(X) we have \[ Y=\beta_0+\beta_1X+ \varepsilon \] We now have a STOCHASTIC relationship, simply meaning the relationship is now "random": epsilon, which represents our noise, is no longer a fixed value. What are Beta0 and Beta1? They are our intercept coefficient and our NON-intercept coefficient. It is important to note that we have only one predictor; otherwise we are doing Multiple Linear Regression, which would be written as \[ Y=\beta_0+\sum_{k=1}^p\beta_k X_k+ \varepsilon \] Our next slide will explain what it means to be an intercept coefficient and a non-intercept coefficient.
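A quick way to see the stochastic part in practice (a minimal sketch, not from the slides, using made-up values beta0 = 2 and beta1 = 0.5) is to simulate the SLR equation and then fit it with lm():

set.seed(1)
n <- 100
b0 <- 2                             # hypothetical intercept coefficient
b1 <- 0.5                           # hypothetical non-intercept (slope) coefficient
x <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = 1)   # the noise term, average zero
y <- b0 + b1 * x + eps              # stochastic: the same x can give different y
fit <- lm(y ~ x)
coef(fit)                           # estimates should land near b0 and b1

Rerunning this with a different seed changes y even though x and the betas stay the same, which is exactly what separates the stochastic relationship from the deterministic y = mx + b.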

Intercept vs non-intercept coefficient

The easiest way to see this is to set Beta1 to 0, which gives \[ Y=\beta_0+\varepsilon \] Just looking at this, the average outcome is around Beta0 and varies around it due to epsilon. Mathematically we can derive this by isolating Beta0: take the expected value of the equation \[ \mathbb{E}[Y]=\mathbb{E}[\beta_0+\varepsilon] \] and we get

\[ \mathbb{E}[Y]=\beta_0 \]

Recall that epsilon vanishes because we assume its average is zero (this will be discussed later), which means, verbatim, the average (E) of the outcome (Y) is Beta0 (when Beta1 = 0). Now, simply put, if Beta1 is included then it shifts the average outcome or, more eloquently put, "it is the change in the average outcome for a unit increase in the covariate." Mathematically we can determine this concretely with our equations

\[ \mathbb{E}[Y|X=0]=\beta_0 \] \[ \mathbb{E}[Y|X=1]=\beta_0+\beta_1 \] The first equation is just the average value of Y when X = 0, and the second is the average value of Y when X = 1; both are our equation without epsilon. Subtracting the first from the second, \[ \mathbb{E}[Y|X=1]-\mathbb{E}[Y|X=0]=(\beta_0+\beta_1)-\beta_0=\beta_1 \] we are left with Beta1 because Beta0 - Beta0 = 0, so all Beta1 does is shift the average outcome by that amount for each unit increase in the covariate. As you can hopefully see, these formulas let us derive very fundamental properties that otherwise we would just state out of "intuition." The next slide will cover some assumptions and important things to note about linear regression.

Important details about SLR

It is extremely important to understand that while doing SLR we are making assumptions (obviously, it is a mathematical model) about our data set and everything within it, such as that the average of the noise (epsilon) is zero,

\[ \mathbb{E}[\varepsilon]=0 \]

and that the variance of our epsilon, given X, is a constant sigma squared:

\[ \operatorname{Var}(\varepsilon|X)=\sigma^2 \]
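These assumptions are about the unobservable noise, so they can only be checked informally through the residuals of a fitted model. A minimal sketch, assuming the hp ~ disp fit from mtcars used later in these slides: the residuals of an OLS fit with an intercept average out to essentially zero by construction, and a residuals-versus-fitted plot should show roughly constant spread if the constant-variance assumption is reasonable.

fit <- lm(hp ~ disp, data = mtcars)
mean(resid(fit))                  # essentially zero for an OLS fit with an intercept
plot(fitted(fit), resid(fit),     # look for roughly constant spread around zero
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted (hp ~ disp)")
abline(h = 0, lty = 2)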

Additionally, on its own SLR can only establish association between variables, never causal relationships; we must bring in outside assumptions or study design to make any such determinations.

This is just scratching the surface of linear regression. One final thing to leave any reader with is the fact that you can write linear regression in matrix form

\[ \mathbf{Y}=\mathbf{X}\beta+\varepsilon \] where Y is a column vector with n rows (n being the sample size), X has n rows and p+1 columns (p is the number of non-intercept coefficients and the extra one is the column of ones for the intercept), Beta is (p+1) by 1, and similarly epsilon is n rows by 1 column. This form has some interesting properties and lets us estimate all of the regression coefficients at the same time using the ordinary least squares estimator

\[ \hat{\boldsymbol{\beta}}=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y} \]
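As a small sketch (assuming the hp ~ disp example from the slides that follow), this matrix formula can be computed directly and compared against what lm() reports:

X <- cbind(1, mtcars$disp)          # n x (p+1) design matrix: column of ones + disp
Y <- mtcars$hp                      # n x 1 outcome vector
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y   # (X'X)^(-1) X'Y
beta_hat                            # intercept and slope estimates
coef(lm(hp ~ disp, data = mtcars))  # should match beta_hat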

HP Vs Disp


Residuals versus Fitted (HP disp)

SSE Surface for Beta0 and Beta1 HP disp

This code knits when the document is not a presentation, but it will not knit as a presentation. The code works and the plot displays in R, yet no plotly graph will knit for me anymore; I have no idea what is going on.

library(plotly)   # needed for plot_ly() and the %>% pipe

# Grid of candidate (Beta0, Beta1) values for hp = Beta0 + Beta1*disp + epsilon
x <- mtcars$disp
y <- mtcars$hp
b0_grid <- seq(min(y), max(y), length.out = 40)
b1_grid <- seq(-1, 2, length.out = 40)

# Sum of squared errors for every (Beta0, Beta1) pair on the grid
SSE <- outer(
  b0_grid, b1_grid,
  Vectorize(function(b0, b1) sum((y - (b0 + b1 * x))^2))
)

# Long-format copy of the grid (not required by the surface plot itself)
df <- expand.grid(b0 = b0_grid, b1 = b1_grid)
df$SSE <- as.vector(SSE)

# Surface plot of SSE over the grid (rows of SSE correspond to b0, columns to b1)
p <- plot_ly(
  x = ~b1_grid, y = ~b0_grid, z = ~SSE,
  type = "surface",
  width = 200, height = 200   # must be numeric, not strings like "200"
) %>%
  layout(
    title = "SSE(Beta0,Beta1) surface for hp=Beta0+Beta1*Disp+epsilon",
    scene = list(
      xaxis = list(title = "B1"),
      yaxis = list(title = "B0"),
      zaxis = list(title = "SSE")
    )
  )
p
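To connect this surface back to the OLS formula from earlier (a small sketch assuming the x, y, b0_grid, b1_grid, and SSE objects created above): the grid cell with the smallest SSE should land roughly where lm() puts the estimates, subject to how coarse the grid is and the range chosen for b0_grid.

idx <- which(SSE == min(SSE), arr.ind = TRUE)   # row = b0 index, column = b1 index
c(b0 = b0_grid[idx[1]], b1 = b1_grid[idx[2]])   # grid point with the lowest SSE
coef(lm(hp ~ disp, data = mtcars))              # OLS estimates for comparison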

HP Vs Disp code

library(ggplot2)   # needed for ggplot(), geom_point(), geom_smooth(), labs()

ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +   # SLR best-fit line with confidence band
  labs(
    title = "Horsepower vs Displacement (SLR best fit)",
    x = "Displacement (cu. in.)",
    y = "Horsepower (HP)"
  )