- We’ll be going over the concept of Simple Linear Regression
- What it’s used for, and how it connects to the concept of Confidence Intervals
- How it can be applied to the Trees data set
2025-03-16
Simple Linear Regression models the relationship between an independent variable (\(x\)) and a dependent variable (\(y\)) as a straight line, which describes a possible linear correlation between them.
\(y = \beta_0 + \beta_1 x + \epsilon\)
Where: \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the y-intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the term that accounts for error.
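As a minimal sketch in R, the least-squares estimates can be computed from the closed-form formulas \(\hat{\beta}_1 = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\) and then checked against base R's `lm()`. It assumes the built-in `trees` data with Girth as \(x\) and Volume as \(y\); the variable names are illustrative.

```r
data(trees)

# Assumption: Girth plays the role of x and Volume the role of y
x <- trees$Girth
y <- trees$Volume

# Closed-form least-squares estimates of the slope and intercept
beta1_hat <- cov(x, y) / var(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# The same model fitted with lm(); coef() returns (Intercept) and Girth
fit <- lm(Volume ~ Girth, data = trees)

print(c(beta0_hat = beta0_hat, beta1_hat = beta1_hat))
print(coef(fit))
```

Both `print()` calls should show the same intercept and slope. `lm()` is the usual way to fit the model; the formulas only make the estimates explicit.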
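A Confidence Interval is a range of values, computed from a sample, that is likely to contain an unknown population value such as the mean or the true regression line. It comes with a confidence level expressed as a percentage: at 95%, if we repeated the sampling many times, about 95% of the intervals built this way would contain the true value. For the same data, a higher confidence level produces a wider interval.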
To calculate a Confidence Interval, you need a critical value from the standard normal distribution, which is the distribution of Z-scores. A Z-score measures how many standard deviations a data point lies from the mean: \(Z = \frac{x - \mu}{\sigma}\)
Where: \(x\) is an individual data point, \(\mu\) is the mean of the data set, and \(\sigma\) is the standard deviation.
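A minimal sketch of the Z-score formula in R, applied to the Girth column of the built-in `trees` data. The sample mean and sample standard deviation stand in for \(\mu\) and \(\sigma\) here, which is an assumption since the population values are unknown. Base R's `scale()` performs the same standardization.

```r
data(trees)
x <- trees$Girth

# Z = (x - mu) / sigma, with sample estimates standing in for mu and sigma
z <- (x - mean(x)) / sd(x)

# scale() centers and divides by sd(); as.vector() drops its matrix shape
z_scale <- as.vector(scale(x))

# Standardized values have mean 0 and standard deviation 1
print(c(mean = mean(z), sd = sd(z)))
print(all.equal(z, z_scale))
```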
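For a confidence level \(C\), the critical value \(z^*\) is the Z-score that leaves \((1 - C)/2\) of the standard normal distribution in each tail, and a Confidence Interval for a mean is \(\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}\), where \(\bar{x}\) is the sample mean and \(n\) is the sample size. This table lists the critical Z-values for common confidence levels: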
| Critical Z-value | Confidence level |
|---|---|
| 1.282 | 80% |
| 1.645 | 90% |
| 1.960 | 95% |
| 2.326 | 98% |
| 2.576 | 99% |
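The critical values in the table can be reproduced with `qnorm()`. The sketch below also builds a Z-based interval \(\bar{x} \pm z^* \sigma / \sqrt{n}\) for the mean Volume of the `trees` data at each level. Plugging in the sample standard deviation for \(\sigma\) is an assumption; with only 31 trees, a t critical value from `qt()` would be the more conventional choice.

```r
data(trees)

conf_levels <- c(0.80, 0.90, 0.95, 0.98, 0.99)

# Critical value z*: leaves (1 - level) / 2 of the standard normal in the upper tail
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# Z-based interval for the mean Volume: mean +/- z* * sigma / sqrt(n)
v     <- trees$Volume
se    <- sd(v) / sqrt(length(v))  # assumption: sample sd used in place of sigma
lower <- mean(v) - z_star * se
upper <- mean(v) + z_star * se

print(data.frame(level = conf_levels, z_star = round(z_star, 3),
                 lower = round(lower, 2), upper = round(upper, 2)))
```

The `z_star` column agrees with the table to within rounding, and each interval is wider than the one before it as the level rises.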
Here’s a plot of the Trees data set showing the relationship between Girth and Volume. The orange line is the fitted Linear Regression of Volume on Girth. As you can see, there is a positive, linear correlation between a tree’s Girth and its Volume.
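A short sketch that puts a number on the correlation in the plot, using base R's `cor()` (Pearson's \(r\) by default) and `cor.test()`, which also reports a 95% Confidence Interval for the population correlation. For simple linear regression, \(r^2\) equals the \(R^2\) of the fitted line, which the last step checks.

```r
data(trees)

# Pearson correlation coefficient between Girth and Volume
r <- cor(trees$Girth, trees$Volume)

# cor.test() tests H0: rho = 0 and returns a confidence interval for rho
ct <- cor.test(trees$Girth, trees$Volume, conf.level = 0.95)

# R-squared of the simple linear regression of Volume on Girth
r2_lm <- summary(lm(Volume ~ Girth, data = trees))$r.squared

print(r)
print(ct$conf.int)
print(c(r_squared = r^2, lm_r_squared = r2_lm))
```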
Let’s plot the relationship between Girth and Volume of Trees using ggplot2:
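The gray band around the line is the 95% Confidence Interval for the regression line (95% is the default `level` in `geom_smooth()`): at each Girth, we are 95% confident that the true mean Volume of trees with that Girth lies inside the band. It describes the average Volume, not the Volume of any single tree.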
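A sketch of what the gray band measures. It assumes, from our reading of ggplot2's internals rather than from this section, that `geom_smooth(method = 'lm')` draws its band from the interval that `predict()` returns with `interval = "confidence"`. That interval covers the mean Volume at a given Girth; `interval = "prediction"` gives the wider range expected for a single new tree. The three girth values are arbitrary examples.

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

# Arbitrary example girths inside the observed range
new_trees <- data.frame(Girth = c(10, 14, 18))

# 95% interval for the MEAN Volume at each Girth (what the band shows)
conf_int <- predict(fit, newdata = new_trees, interval = "confidence", level = 0.95)

# 95% interval for the Volume of ONE new tree at each Girth
pred_int <- predict(fit, newdata = new_trees, interval = "prediction", level = 0.95)

print(cbind(new_trees, conf_int))
print(cbind(new_trees, pred_int))
```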
If we increase the level to 99%, the gray band widens: to be more confident that it contains the true regression line, the interval has to cover a wider range of values.
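```r
library(ggplot2)
g <- ggplot(data = trees, aes(x = Girth, y = Volume)) + geom_point()  # scatter plot of the trees data
g + geom_smooth(method = 'lm', formula = y ~ x, level = .99) + ylim(0, 80)  # regression line with a 99% band
```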
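A minimal sketch of why the band grows. For a linear model, the band at each Girth is \(\hat{y} \pm t^* \cdot \mathrm{SE}(\hat{y})\), where \(t^*\) is a critical value from a t-distribution with \(n - 2\) degrees of freedom (29 for the 31 trees). Only \(t^*\) depends on the level, so moving from 95% to 99% widens the band by the same factor at every Girth. The sketch uses `predict()` on an evenly spaced, illustrative grid of girths rather than ggplot2 itself.

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

# Illustrative grid of girths spanning the observed range
grid <- data.frame(Girth = seq(min(trees$Girth), max(trees$Girth), length.out = 50))

# Width of the confidence band (upper minus lower) at each grid point
band_width <- function(level) {
  ci <- predict(fit, newdata = grid, interval = "confidence", level = level)
  ci[, "upr"] - ci[, "lwr"]
}

w95 <- band_width(0.95)
w99 <- band_width(0.99)

# Critical t-values with n - 2 residual degrees of freedom
t95 <- qt(0.975, df = fit$df.residual)
t99 <- qt(0.995, df = fit$df.residual)

# The width ratio is constant along the line and equals t99 / t95
print(range(w99 / w95))
print(t99 / t95)
```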