2025-11-09

Value of creating a best-fit line

  • Identifies relationships between variables
  • Helps predict values of one variable from another
  • Easy to interpret
  • Low computational requirements

Description of Least Squares Regression

  • Finds the line with the smallest prediction error
  • Prediction error is measured as a sum of squares, and the best line is the one with the least sum
  • The “squares” are the squared vertical distances from each data point to the prediction line
  • The line always passes through the point of means \((\bar{x}, \bar{y})\), so it runs through the middle of the data
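Here is a minimal base-R sketch of the “least sum of squares” idea, using R's built-in `trees` data. The helper `sum_sq_error()` and the comparison line (slope 5, intercept -30) are hypothetical and were picked only for illustration. Any line other than the one `lm()` finds should give a larger sum.

```r
# Sum of squared vertical distances between each point and the line y = m*x + b.
# sum_sq_error is a hypothetical helper name used only for this illustration.
sum_sq_error <- function(m, b, x, y) {
  residuals <- y - (m * x + b)  # vertical displacement of each point from the line
  sum(residuals^2)
}

x <- trees$Girth
y <- trees$Volume

# The least squares line found by R
fit <- lm(y ~ x)
m_best <- coef(fit)[["x"]]
b_best <- coef(fit)[["(Intercept)"]]

sse_best  <- sum_sq_error(m_best, b_best, x, y)
sse_other <- sum_sq_error(5, -30, x, y)  # an arbitrary nearby line for comparison

print(c(least_squares = sse_best, other_line = sse_other))
```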

Example graph with prediction line


How to compute a prediction line

  • Find the means of the independent (x) and dependent (y) variables
  • For each point, find its differences from the means, \(x - \bar{x}\) and \(y - \bar{y}\)
  • Square the x differences, and multiply each x difference by its matching y difference
  • Plug the sums into this equation: \(m = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}\)
  • Plug m into \(\bar{y} = m\bar{x} + b\) to find b
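The steps above can be sketched in base R as follows. `fit_line_by_hand()` is a hypothetical helper and not a built-in function. The last line compares its answer with R's `lm()`.

```r
# Least squares slope and intercept computed "by hand", following the steps above.
# fit_line_by_hand is a hypothetical helper name, not a base R function.
fit_line_by_hand <- function(x, y) {
  x_bar <- mean(x)                           # step 1: means
  y_bar <- mean(y)
  x_diff <- x - x_bar                        # step 2: differences from the means
  y_diff <- y - y_bar
  m <- sum(x_diff * y_diff) / sum(x_diff^2)  # steps 3-4: slope formula
  b <- y_bar - m * x_bar                     # step 5: solve y_bar = m * x_bar + b for b
  c(slope = m, intercept = b)
}

by_hand <- fit_line_by_hand(trees$Girth, trees$Volume)
print(by_hand)

# R's built-in least squares fit, for comparison
print(coef(lm(Volume ~ Girth, data = trees)))
```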

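Example chart (the first 6 of the 31 trees in R's built-in `trees` data; the differences are taken as mean minus value, which leaves the squares and products unchanged)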

  Girth Volume     xbar     ybar xDifferences yDifferences xDiffSquared  Product
1   8.3   10.3 13.24839 30.17097     4.948387     19.87097    24.486535 98.32924
2   8.6   10.3 13.24839 30.17097     4.648387     19.87097    21.607503 92.36795
3   8.8   10.2 13.24839 30.17097     4.448387     19.97097    19.788148 88.83860
4  10.5   16.4 13.24839 30.17097     2.748387     13.77097     7.553632 37.84795
5  10.7   18.8 13.24839 30.17097     2.548387     11.37097     6.494277 28.97763
6  10.8   19.7 13.24839 30.17097     2.448387     10.47097     5.994599 25.63698

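Dividing the sum of the Product column by the sum of the xDiffSquared column, over all 31 trees and not just the 6 shown, gives a slope of about 5.1.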
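One way to rebuild the chart and the slope in base R is shown below. It assumes the data are R's built-in `trees` dataset, whose first six rows match the Girth and Volume values above. The column names follow the chart, and `chart` is just an illustrative variable name.

```r
# Rebuild the chart's columns for all 31 trees.
# Differences are mean minus value, as in the chart; flipping both signs
# leaves the squares and the products unchanged.
chart <- data.frame(Girth = trees$Girth, Volume = trees$Volume)
chart$xbar <- mean(chart$Girth)
chart$ybar <- mean(chart$Volume)
chart$xDifferences <- chart$xbar - chart$Girth
chart$yDifferences <- chart$ybar - chart$Volume
chart$xDiffSquared <- chart$xDifferences^2
chart$Product <- chart$xDifferences * chart$yDifferences

print(head(chart))  # the first six rows, as in the chart

# Slope = sum of products / sum of squared x differences
slope <- sum(chart$Product) / sum(chart$xDiffSquared)
print(round(slope, 1))
```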

Code to create the model automatically from data

  • In R it is very simple: the built-in `lm()` function fits a least squares line
  • You can use code instead of calculating everything by hand
  • For example, using R's built-in `trees` data:
model <- lm(Volume ~ Girth, data = trees)  # fit Volume as a straight-line function of Girth
print(model)                               # show the intercept and slope
Call:
lm(formula = Volume ~ Girth, data = trees)

Coefficients:
(Intercept)        Girth  
    -36.943        5.066  
  • It’s that easy!
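If you want to reuse the fitted numbers instead of retyping them, you can pull them out of the model with `coef()`. This sketch also checks that the least squares line passes through the point of means.

```r
model <- lm(Volume ~ Girth, data = trees)

# coef() returns a named vector: "(Intercept)" plus one entry per predictor
b <- coef(model)[["(Intercept)"]]
m <- coef(model)[["Girth"]]
print(c(intercept = b, slope = m))

# The least squares line passes through (mean Girth, mean Volume),
# so these two printed values should match
print(m * mean(trees$Girth) + b)
print(mean(trees$Volume))
```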

How to use these numbers:

Plug any x value into \(y = 5.066x - 36.943\). For a tree with a Girth of 14.0, we would expect a Volume of about \(\text{Volume} = (5.066 \times 14) - 36.943 = 33.981\), which is close to the actual Volume of that tree in our data (34.5).
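R can also do the plug-in step for you. `predict()` evaluates the fitted line at new x values, which are passed as a data frame whose column name matches the predictor. The result differs slightly from 33.981 because `predict()` uses the unrounded coefficients.

```r
model <- lm(Volume ~ Girth, data = trees)

# predict() needs a data frame with a column named like the predictor (Girth)
new_tree <- data.frame(Girth = 14.0)
predicted <- predict(model, newdata = new_tree)
print(predicted)

# The observed Volume of the tree with Girth 14.0 in the data
print(trees$Volume[trees$Girth == 14.0])
```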

Plot showing our data and prediction line in ggplot

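library(ggplot2)
h <- ggplot(data = trees, aes(x = Girth, y = Volume)) + geom_point()  # scatter plot of the data
h + geom_smooth(method = "lm", formula = y ~ x)  # add the least squares line with a confidence band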

One More Graphical Example with plotly!

Other uses

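  • Use Hooke’s Law (\(F = kx\)) with a best-fit line to estimate a spring constant: the slope of force versus stretch is \(k\)
  • Experimentally estimate an object’s friction coefficient: the slope of friction force versus normal force is \(\mu\)
  • If the true relationship is exponential, you can fit the line to the log of the y variable instead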
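Here is a small sketch of the log trick. It uses made-up exponential data (\(y = 2e^{0.3x}\), with no noise) rather than real measurements. Because \(\log y = \log 2 + 0.3x\) is linear in \(x\), a straight line fitted to \(\log y\) has a slope equal to the growth rate and an intercept equal to the log of the starting value.

```r
# Hypothetical exponential data, y = 2 * exp(0.3 * x), with no noise
x <- 1:10
y <- 2 * exp(0.3 * x)

# Fit a straight line to log(y) versus x
log_fit <- lm(log(y) ~ x)

growth_rate <- coef(log_fit)[["x"]]                  # estimate of 0.3
start_value <- exp(coef(log_fit)[["(Intercept)"]])   # estimate of 2
print(c(growth_rate = growth_rate, start_value = start_value))
```

With real, noisy data, the same fit minimizes squared errors on the log scale rather than on the original scale.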

Summary of the steps

Now you can do this yourself!