2025-03-15

What is SLR?

    • Regression is a mathematical way of predicting a numerical value or drawing inferences based on data
    • Simple Linear Regression, or SLR, is a form of regression used when there are two variables: an input (x), called the predictor, and an output (y), called the response
    • As the name implies, SLR predicts best when there is a linear relationship between the variables
    • SLR works by finding the best slope and intercept values, \(m\) and \(b\), in the familiar equation:
    \[ y = mx + b \]
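
    As a rough sketch of what "best" can mean here: R's lm() uses least squares, which picks the \(m\) and \(b\) that minimize the sum of squared differences between the actual and predicted y values. The x and y values below are made up purely for illustration, and the variable names are arbitrary.

```r
# Made-up illustration data (not from any real study)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates for y = m*x + b
m <- cov(x, y) / var(x)        # slope
b <- mean(y) - m * mean(x)     # intercept

# lm() should arrive at the same intercept and slope
fit <- lm(y ~ x)
print(c(b = b, m = m))
print(coef(fit))
```

    The two printed pairs should agree, since lm() solves the same least-squares problem.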

Real Applications of SLR

    Some of the ways SLR can be used to provide helpful insight:

    • Predicting the future value of a house based on its historical price over the years
    • Analyzing the impact of time spent playing sports (x) on average GPA (y)
    • Determining the relationship between age and memory
    • Predicting the circumference of a tree based on age (or vice versa)
    And so many more! Let’s pick one to dive deeper into…

Orange Trees: Age vs Circumference

    This data set (R's built-in Orange data) tracks the growth of orange trees, with 35 instances, or rows. The predictor variable is Age (in days), and the response variable is Circumference (in millimeters). The Tree column identifies which of the five trees each measurement belongs to.
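
    A quick way to check this description yourself, assuming the data is R's built-in Orange data set (the one used in the plotting code later on):

```r
# Orange ships with base R (package "datasets"), so no install is needed
str(Orange)          # column names and types
head(Orange)         # first few rows
nrow(Orange)         # number of rows (instances)
table(Orange$Tree)   # how many measurements each tree has
```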

    Let’s look at a visual representation of the predictor and response variables…

Scatter Plot of Age vs Circumference: Code

    Here is the code in R that will generate a plotly scatter plot of our data:
library(plotly)  # provides plot_ly(), add_lines(), layout(), and %>%

# Fit the SLR model: circumference as a linear function of age
mod = lm(circumference ~ age, data = Orange)
x = Orange$age; y = Orange$circumference

# Axis titles and fonts
x_axis = list(title = "Age", titlefont = list(
  family = "Modern Computer Roman"))
y_axis = list(title = "Circumference", titlefont =
                list(family = "Modern Computer Roman"))

# Scatter plot of the raw data with the fitted line drawn on top
graph = plot_ly(x = ~x, y = ~y, type = "scatter", mode = "markers",
                name = "Data", marker = list(color = "#8B4513",
                size = 8)) %>%
  add_lines(x = ~x, y = ~fitted(mod), name = "Best Fit",
              line = list(color = "#FA9C1C", width = 3)) %>%
  layout(xaxis = x_axis, yaxis = y_axis)

graph  # display the plot

Scatter Plot of Age vs Circumference

    There is a fairly strong positive linear relationship between the age of a tree and its circumference.
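
    One way to put a number on "fairly strong" is the correlation coefficient. This is only a sketch, assuming the built-in Orange data; r and r_squared are illustrative names.

```r
# Pearson correlation between age and circumference (ranges from -1 to 1)
r <- cor(Orange$age, Orange$circumference)

# R-squared of the SLR model: share of the variation in circumference
# that the fitted line accounts for
mod <- lm(circumference ~ age, data = Orange)
r_squared <- summary(mod)$r.squared

print(r)
print(r_squared)   # for SLR this equals r^2
```

    A value of r close to 1 indicates a strong positive linear relationship.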

    Let’s see how else we can represent this data in a little more detail…

GGPlot for Tree Comparison
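
    The original plot is not reproduced here. A rough sketch of how a graph like it might be drawn with ggplot2, assuming one color per tree and a separate least-squares line for each tree (p is just an illustrative name):

```r
library(ggplot2)

# One color per tree; geom_smooth(method = "lm") fits a separate
# straight line within each color group
p <- ggplot(Orange, aes(x = age, y = circumference, color = Tree)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Age", y = "Circumference")

p  # display the plot
```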

Which Tree Grows the Fastest?

    Using this graph and the SLR lines, we can determine the rate at which each tree grew.

    The line with the greatest slope is Tree 4’s, meaning this tree tended to grow the quickest: \(\small y = 14.6 + 0.135x\)

    The line with the smallest slope is Tree 3’s, meaning this tree tended to grow the slowest: \(\small y = 19.2 + 0.0811x\)
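
    The per-tree equations can be checked with a short sketch like the one below, assuming each tree gets its own lm() fit on the built-in Orange data. The names per_tree and slopes are illustrative.

```r
# Fit a separate SLR model to each tree's rows and collect the coefficients
per_tree <- sapply(split(Orange, Orange$Tree), function(d) {
  coef(lm(circumference ~ age, data = d))
})

# Rows are "(Intercept)" and "age" (the slope); columns are tree labels
print(round(per_tree, 4))

# Which tree has the steepest and the shallowest line?
slopes <- per_tree["age", ]
print(names(which.max(slopes)))
print(names(which.min(slopes)))
```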

Further Applications

    This SLR analysis is helpful because it can lead to other insights; for example, a researcher could compare the water, sunlight, and temperature that Tree 4 and Tree 3 received to determine optimal and suboptimal conditions for tree growth.

    ~~~~

    Another important consideration in Simple Linear Regression is how well the model fits the data.

    Let’s look at the regression error for each tree.

Error: Scatter Plot of Residuals


    In this case, a residual is computed for each data point: it is the difference between the tree’s actual circumference and the circumference the regression equation estimates…
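
    A minimal sketch of computing and plotting residuals in base R. It assumes a single model fitted to all rows of the built-in Orange data; residuals from separate per-tree fits would be computed the same way from each tree's own model. The name res is illustrative.

```r
# Assumption: one pooled model over all rows of the Orange data
mod <- lm(circumference ~ age, data = Orange)

# Residual = actual value minus fitted (predicted) value
res <- Orange$circumference - fitted(mod)

# resid() returns the same numbers directly
stopifnot(isTRUE(all.equal(unname(res), unname(resid(mod)))))

# Residuals against age, one color per tree, with a zero reference line
plot(Orange$age, res, col = as.integer(Orange$Tree), pch = 19,
     xlab = "Age", ylab = "Residual")
abline(h = 0, lty = 2)
```

    Points above the dashed line are measurements the model underestimates; points below it are overestimates.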

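Mean Absolute Error

    One way of analyzing error is to find the Mean Absolute Error, or MAE: the average size of the residuals, ignoring their sign. The better the fit, the closer the MAE will be to zero.

    Let’s look at the MAE equation, and the MAE for each tree: \[ MAE = \scriptsize \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]
    Here \(n\) is the number of measurements, \(y_i\) is an actual circumference, and \(\hat{y}_i\) is the circumference the regression predicts.
    The values below match residuals taken from the single regression line fitted to all 35 rows (mod in the scatter plot code), so they show how closely each tree follows the overall trend.
    The regression was the best fit for Tree 5 (its measurements stay closest to the overall line)
    The regression was the worst fit for Tree 4 (the fastest-growing tree strays farthest from the overall line)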

    ##  Tree       MAE
    ##     3 21.857558
    ##     1 16.286129
    ##     5  7.718817
    ##     2 19.450425
    ##     4 25.450425
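
    A minimal sketch of how a table like this could be produced. It assumes the residuals come from one model fitted to all rows of the built-in Orange data, and under that assumption the output should line up with the values above. The names abs_res and mae_by_tree are illustrative.

```r
# Assumption: one pooled model over all rows of the Orange data
mod <- lm(circumference ~ age, data = Orange)

# Absolute residuals: |actual - predicted| for every row
abs_res <- abs(resid(mod))

# Average the absolute residuals within each tree to get its MAE
mae_by_tree <- tapply(abs_res, Orange$Tree, mean)
print(mae_by_tree)

# Trees with the smallest and largest MAE
print(names(which.min(mae_by_tree)))
print(names(which.max(mae_by_tree)))
```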

End

    That was a quick lesson in Simple Linear Regression. Go explore on your own!


    🙂