Linear Regression

2024-06-09

Linear Regression

Given one or more input terms, we want to predict an output (response) using a linear model.
The simplest case: one input.

Longley’s Economic Regression Data, showing the relationship between the number of people employed and gross national product.

R function `lm()`

lm_econ <- lm(GNP ~ Employed, data=longley)

The lm() (linear model) function takes a formula of the form response ~ terms, plus the dataset in question.
By calling the summary() function on lm_econ, we can see useful facts about the regression model. (See next slides.)
With a single input, the fitted model has the form \(y = mx + b\), where \(m\) is the slope and \(b\) is the y-intercept.
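As a minimal sketch of the steps above, the model can be fitted, summarized, and used for a prediction as follows. The names `b`, `m`, and `new_data`, and the employment value of 65, are illustrative assumptions rather than part of the original slides.

```r
# Fit GNP as a linear function of Employed (longley ships with base R).
lm_econ <- lm(GNP ~ Employed, data = longley)

# Print the coefficient table, R-squared values, and related statistics.
summary(lm_econ)

# The fitted line has the form y = m * x + b.
b <- coef(lm_econ)[["(Intercept)"]]  # y-intercept
m <- coef(lm_econ)[["Employed"]]     # slope

# Predict GNP for a hypothetical employment figure (chosen for illustration).
new_data <- data.frame(Employed = 65)
prediction <- predict(lm_econ, newdata = new_data)

# The prediction is the same number as evaluating the line by hand.
all.equal(unname(prediction), m * 65 + b)
```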

`lm()` continued

The first column of numbers under “Coefficients” (labeled Estimate) shows the y-intercept and the slope; the slope is the weight assigned to the variable “Employed”. The R-squared values are very close to 1, indicating that the line explains most of the variation in GNP.
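The same numbers can be pulled out of the summary object directly instead of being read off the printed output. This is a small sketch; `econ_summary` is a name introduced here for illustration.

```r
# Refit the model so this snippet runs on its own.
lm_econ <- lm(GNP ~ Employed, data = longley)
econ_summary <- summary(lm_econ)

# Estimate column: the y-intercept and the slope for Employed.
coef(econ_summary)[, "Estimate"]

# Multiple and adjusted R-squared, as shown at the bottom of summary().
econ_summary$r.squared
econ_summary$adj.r.squared
```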

`lm()` continued

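We can also see the p-values in the column Pr(>|t|). In this instance, the p-value for “Employed” is much smaller than the common threshold of 0.05, so the relationship between employment and GNP is statistically significant: a slope this large would be very unlikely if there were no linear relationship.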
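A short sketch of how the p-values could be extracted and compared against the threshold programmatically. The names `p_values` and `alpha` are assumptions made for this example.

```r
# Refit the model so this snippet runs on its own.
lm_econ <- lm(GNP ~ Employed, data = longley)

# The coefficient table is a matrix; Pr(>|t|) is one of its columns.
p_values <- coef(summary(lm_econ))[, "Pr(>|t|)"]
p_values

# Compare the p-value for Employed with the conventional 0.05 threshold.
alpha <- 0.05
p_values[["Employed"]] < alpha
```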

Graphing linear regression (plotly)

We add y = fitted(lm_econ) as an argument to plotly’s add_lines() to graph our line of regression.
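One way this could look in full, assuming the plotly package is installed. The trace names and the variable `econ_plot` are illustrative choices, not taken from the original slides.

```r
library(plotly)

# Refit the model so this snippet runs on its own.
lm_econ <- lm(GNP ~ Employed, data = longley)

# Scatter plot of the observations, with the fitted values drawn as a line.
econ_plot <- plot_ly(longley, x = ~Employed, y = ~GNP) %>%
  add_markers(name = "Observed") %>%
  add_lines(y = fitted(lm_econ), name = "Fitted line")

econ_plot
```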

Graphing linear regression (ggplot2)

We can use geom_smooth(method = "lm") without the lm_econ we previously calculated, because ggplot2 fits the model itself. By default, geom_smooth() also draws a 95% confidence band around the line.
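A minimal sketch of the ggplot2 version, assuming the ggplot2 package is installed. The variable name `econ_ggplot` is introduced here for illustration, and `formula = y ~ x` is spelled out only to make the default explicit.

```r
library(ggplot2)

# geom_smooth() fits the linear model itself, so lm_econ is not needed here.
econ_ggplot <- ggplot(longley, aes(x = Employed, y = GNP)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)  # se = TRUE is the default

econ_ggplot
```

Passing `se = FALSE` to geom_smooth() hides the confidence band if only the line is wanted.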

Graphing residuals

Graphing residuals is another way of checking the validity of our model. The residuals should look fairly “random”, scattered around zero with no visible pattern. If they do not, our model might be a bad fit.
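A residual plot can be sketched with base R graphics as below. The axis labels and the name `econ_resid` are assumptions made for this example.

```r
# Refit the model so this snippet runs on its own.
lm_econ <- lm(GNP ~ Employed, data = longley)

# Residuals are the observed GNP values minus the fitted values.
econ_resid <- resid(lm_econ)

# Plot residuals against fitted values; a dashed line marks zero.
plot(fitted(lm_econ), econ_resid,
     xlab = "Fitted GNP", ylab = "Residual",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)
```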

Multiple input terms

For \(n\) input terms, the simple \(y = mx + b\) formula generalizes to \[y = w_{0} + w_{1}x_{1} + w_{2}x_{2} + \dots + w_{n}x_{n} \]
Here \(w_{0}\) takes the place of the y-intercept from the one-input formula, and all the other \(w\) terms are weights for the input variables, giving us a vector of weights, \(\mathbf{w}\).
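The same equation can be written compactly as a dot product. This is a sketch under an assumption not made in the original slides: the inputs are collected into a vector \(\mathbf{x}\) with a constant first entry of 1, so that \(w_{0}\) is treated like any other weight. \[ y = \mathbf{w}^{\top}\mathbf{x}, \qquad \mathbf{w} = (w_{0}, w_{1}, \dots, w_{n})^{\top}, \qquad \mathbf{x} = (1, x_{1}, \dots, x_{n})^{\top} \]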

Multiple input terms, continued

For the function lm(), we can add multiple inputs using +:

lm_econ_multi <- lm(GNP ~ Employed + Armed.Forces, data=longley)

The p-value for Armed.Forces is high, so once Employed is already in the model, the number of people in the armed forces is not a statistically significant predictor of GNP.
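The sketch below shows how the coefficient table for the two-input model could be inspected, and checks that the fitted values are the weighted sum from the multiple-input equation. The names `w`, `X`, and `manual_fit` are introduced here for illustration.

```r
# Fit GNP on two inputs, as on this slide.
lm_econ_multi <- lm(GNP ~ Employed + Armed.Forces, data = longley)

# Coefficient table: one row per weight, with p-values in Pr(>|t|).
coef(summary(lm_econ_multi))

# The weight vector w: intercept first, then one weight per input.
w <- coef(lm_econ_multi)

# The design matrix has a leading column of 1s for the intercept.
X <- model.matrix(lm_econ_multi)

# Fitted values are w0 + w1 * Employed + w2 * Armed.Forces for each row.
manual_fit <- as.vector(X %*% w)
all.equal(manual_fit, unname(fitted(lm_econ_multi)))
```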
