- X = independent variable set by researcher
- Y = dependent variable measured by researcher
- Linear regression estimates the expected change in Y per unit change in X
2023-09-15
data(iris)
str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
\[
y = a + bX
\]
- y is the predicted average of Y at a given X
- a is the intercept
- b is the slope
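
As a minimal sketch of how a and b can be obtained in R, the block below fits the line with `lm()` on the pooled `iris` data, taking Sepal.Length as X and Sepal.Width as Y. The choice of variables and the value `x0 = 5` are assumptions made for illustration, not something the notes prescribe.

```r
# Fit y = a + bX with Sepal.Length as X and Sepal.Width as Y (all species pooled).
data(iris)
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

a <- unname(coef(fit)["(Intercept)"])   # intercept
b <- unname(coef(fit)["Sepal.Length"])  # slope: expected change in Y per unit of X

# Predicted average of Y at an illustrative X value, by hand and with predict().
x0 <- 5
y_hand <- a + b * x0
y_pred <- unname(predict(fit, newdata = data.frame(Sepal.Length = x0)))
```

Both `y_hand` and `y_pred` should agree, since `predict()` evaluates the same fitted line.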
Limiting the data to a single species makes the regression more informative, because the relationship between sepal length and sepal width differs by species.
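
One way to see the effect of restricting to one species is to compare the pooled slope with a single-species slope. The sketch below uses setosa as an arbitrary example; the object names (`fit_all`, `fit_setosa`) are illustrative.

```r
# Compare the slope from all 150 flowers with the slope from setosa only.
data(iris)
setosa <- subset(iris, Species == "setosa")

fit_all    <- lm(Sepal.Width ~ Sepal.Length, data = iris)
fit_setosa <- lm(Sepal.Width ~ Sepal.Length, data = setosa)

slope_all    <- unname(coef(fit_all)["Sepal.Length"])
slope_setosa <- unname(coef(fit_setosa)["Sepal.Length"])

# The two slopes have opposite signs: pooling mixes three different relationships.
c(pooled = slope_all, setosa = slope_setosa)
```

The pooled slope is negative while the setosa slope is positive, so a single line through all species describes none of them well.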
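\[
\beta = b \pm t_{n-2,\,0.975} \cdot \text{se}_b
\]
- b is the point estimate for the slope
- \(t_{n-2,\,0.975}\) is the 97.5th percentile of the t distribution with n - 2 degrees of freedom (from a t table), which gives a 95% confidence interval
- \(\text{se}_b\) is the standard error of the slope, calculated from the standard error of the regression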
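
The interval for the slope can be computed by hand and checked against R's built-in `confint()`. This is a sketch under the assumption that the model is Sepal.Width on Sepal.Length for setosa only; `qt()` stands in for the t table lookup.

```r
# 95% confidence interval for the slope: b +/- t(n-2, 0.975) * se_b
data(iris)
setosa <- subset(iris, Species == "setosa")
fit <- lm(Sepal.Width ~ Sepal.Length, data = setosa)

n      <- nrow(setosa)
coefs  <- coef(summary(fit))                   # estimates and standard errors
b      <- coefs["Sepal.Length", "Estimate"]    # point estimate for the slope
se_b   <- coefs["Sepal.Length", "Std. Error"]  # standard error of the slope
t_crit <- qt(0.975, df = n - 2)                # 97.5th percentile, n - 2 df

ci_hand    <- c(lower = b - t_crit * se_b, upper = b + t_crit * se_b)
ci_builtin <- confint(fit, "Sepal.Length", level = 0.95)
```

`ci_hand` and `ci_builtin` should match, because `confint()` applies the same formula to a fitted `lm` object.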
library(ggplot2)
library(ggpubr)  # provides stat_regline_equation()

fig9 <- ggplot(data = iris, aes(
    x = Sepal.Length,
    y = Sepal.Width,
    col = Species,
    shape = Species)) +
  geom_point(size = 3) +
  scale_color_manual(values = c("setosa" = "orchid4",
                                "versicolor" = "maroon",
                                "virginica" = "steelblue")) +
  theme_classic() +
  labs(
    title = "Iris Sepal Width vs. Length",
    subtitle = "Species Comparison",
    caption = "Data from 'iris'",
    x = "Sepal Length",
    y = "Sepal Width") +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +  # one least-squares line per species
  stat_regline_equation()  # prints the fitted equation for each species