Best Fit Lines

Jared Cross

3/18/2021

What is a Best Fit Line?

Note: There are MANY different types of best-fit lines. The most common type, which we’ll discuss here, is the “ordinary least squares” best-fit line.

best fit line app

A Graph of Cubits and Feet

The best-fit line passes through the point (\(\bar{x}\), \(\bar{y}\)).

A Graph of Standard Scores of Cubits and Feet

## [1] 0.406289

The best-fit line for a z-score vs. z-score graph passes through (0,0) and has a slope equal to the correlation between x and y.
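As a quick check of this claim, here is a minimal sketch in R. The data are simulated stand-ins (the vectors `x` and `y` are hypothetical, not the cubit and foot measurements), so only the relationship between the numbers matters, not their values.

```r
# Hypothetical simulated data, NOT the measurements used in these notes
set.seed(1)
x <- rnorm(50, mean = 26, sd = 2)
y <- 20 + 0.9 * x + rnorm(50, sd = 4)

# Convert each variable to standard scores (z-scores)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Fit the least-squares line to the z-scores
fit_z <- lm(zy ~ zx)
z_intercept <- unname(coef(fit_z)[1])
z_slope <- unname(coef(fit_z)[2])
r <- cor(x, y)

# The intercept should be (numerically) zero and the slope should equal r
print(c(intercept = z_intercept, slope = z_slope, r = r))
```

The intercept will typically print as a tiny number near zero rather than exactly 0 because of floating-point rounding.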

A Little Algebra

\[y = mx + b\]

\[m = r \cdot \frac{\sigma_y}{\sigma_x}\]

and since the line passes through (\(\bar{x}\), \(\bar{y}\)), we know that \(\bar{y} = m \cdot \bar{x} + b\), which lets us write the line as \(y = m \cdot x + \bar{y} - m\bar{x}\), and thus

\[b = \bar{y} - m\bar{x}\]
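The two formulas can be checked against `lm()` directly. The sketch below uses simulated data (`foot_cm` and `cubit_cm` are hypothetical vectors invented for illustration) and compares the hand-computed slope and intercept with the fitted coefficients.

```r
# Hypothetical simulated data, NOT the measurements used in these notes
set.seed(2)
foot_cm <- rnorm(40, mean = 26, sd = 2)
cubit_cm <- 19 + 0.95 * foot_cm + rnorm(40, sd = 4)

# Slope and intercept from the formulas: m = r * sd_y / sd_x, b = ybar - m * xbar
m_formula <- cor(foot_cm, cubit_cm) * sd(cubit_cm) / sd(foot_cm)
b_formula <- mean(cubit_cm) - m_formula * mean(foot_cm)

# Slope and intercept from lm(); coef() returns (intercept, slope)
fit <- lm(cubit_cm ~ foot_cm)
print(rbind(formula = c(b_formula, m_formula),
            lm = unname(coef(fit))))
```

Both rows should agree up to floating-point rounding.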

Code in R

lm(cubit ~ foot, data=mes)
## 
## Call:
## lm(formula = cubit ~ foot, data = mes)
## 
## Coefficients:
## (Intercept)         foot  
##     19.2385       0.9472

which means:

\[\text{cubit} = 0.947 \cdot \text{foot} + 19.2\] (in centimeters)

Predictions

In the case of cubits and feet we get the equation:

\[\text{cubit} = 0.947 \cdot \text{foot} + 19.2\] (in centimeters)

If we know that someone’s foot is 28 cm long, we would guess that their cubit is \(0.947 \cdot 28 + 19.2 = 45.7\) cm long.
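The same arithmetic can be wrapped in a small helper. This is only a sketch: `predict_cubit` is a hypothetical function name, and it uses the rounded coefficients from the equation above rather than the fitted model object.

```r
# Rounded coefficients taken from the equation above (centimeters)
slope <- 0.947
intercept <- 19.2

# Hypothetical helper: predicted cubit length for a given foot length
predict_cubit <- function(foot_cm) {
  slope * foot_cm + intercept
}

print(predict_cubit(28))             # single prediction
print(predict_cubit(c(24, 26, 28)))  # vectorized over several foot lengths
```

With a fitted model `m` from `lm(cubit ~ foot, data = mes)`, `predict(m, newdata = data.frame(foot = 28))` should give essentially the same answer, using the unrounded coefficients.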

Bootstrapping!

To get the uncertainty in the slope…

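library(dplyr)  # provides sample_frac()

# Refit the line on 1,000 resamples of the rows of mes (drawn with replacement).
# Each column of the result holds one (intercept, slope) pair.
bootstrapped_equations <- replicate(1e3, {
  m <- lm(cubit ~ foot, data = sample_frac(mes, 1, replace = TRUE))
  coef(m)
})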
#slope
mean(bootstrapped_equations[2,])
## [1] 0.965405
sd(bootstrapped_equations[2,])
## [1] 0.6338283
#intercept
mean(bootstrapped_equations[1,])
## [1] 18.87761
sd(bootstrapped_equations[1,])
## [1] 16.71232
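Besides the standard deviation, the bootstrapped slopes can also be summarized with a percentile interval. The sketch below is self-contained and makes two assumptions: it uses a small simulated data frame (`toy`, hypothetical) in place of `mes`, and it resamples rows with base R's `sample()` instead of `sample_frac()`.

```r
# Hypothetical simulated data frame standing in for mes (12 rows)
set.seed(3)
toy <- data.frame(foot = rnorm(12, mean = 26, sd = 2))
toy$cubit <- 19 + 0.95 * toy$foot + rnorm(12, sd = 4)

# Refit the line on 1,000 bootstrap resamples and keep only the slope
boot_slopes <- replicate(1000, {
  rows <- sample(nrow(toy), replace = TRUE)  # resample row indices
  coef(lm(cubit ~ foot, data = toy[rows, ]))[2]
})

# Spread of the bootstrapped slopes and a 95% percentile interval
print(sd(boot_slopes, na.rm = TRUE))
ci <- quantile(boot_slopes, c(0.025, 0.975), na.rm = TRUE)
print(ci)
```

With only a dozen rows, the interval is expected to be wide, which matches the large standard deviations seen above.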

Or…

m <- lm(cubit ~ foot, data=mes)
summary(m)
## 
## Call:
## lm(formula = cubit ~ foot, data = mes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4976 -1.7864 -0.1113  0.5611 11.3688 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  19.2385    17.1206   1.124    0.287
## foot          0.9472     0.6736   1.406    0.190
## 
## Residual standard error: 4.566 on 10 degrees of freedom
## Multiple R-squared:  0.1651, Adjusted R-squared:  0.08158 
## F-statistic: 1.977 on 1 and 10 DF,  p-value: 0.19
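To see how the columns of the coefficient table relate to each other, the following sketch recomputes the t value, p-value, and a 95% confidence interval for the slope from the printed estimate and standard error. The numbers are copied from the output above, so results match only up to rounding.

```r
# Values copied from the "foot" row of the summary output above
estimate <- 0.9472
std_error <- 0.6736
resid_df <- 10  # residual degrees of freedom

# t value: the estimate measured in standard errors
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = resid_df)

# 95% confidence interval for the slope
ci_95 <- estimate + c(-1, 1) * qt(0.975, df = resid_df) * std_error

print(c(t = t_value, p = p_value))
print(ci_95)
```

The interval includes zero, which is consistent with the p-value of about 0.19, and the standard error of 0.6736 is close to the bootstrapped standard deviation of the slope.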