library(datasets)
library(ggplot2)
data(Orange)
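Below, I’ve fitted a linear model to a data set containing the ages (the number of days that have passed since tracking began) and the trunk circumferences of orange trees. The data appeared sufficiently linear to fit a model, \(y = 17.40 + 0.11x\), where \(x\) is age in days and \(y\) is circumference in millimeters; the slope indicates that the trees grew about 1.1 mm in circumference every ten days. Based on the statistics, the model appears to be reasonably good: the p-value for the slope is very low (1.93e-14), and the coefficient of determination, \(R^2 = 0.83\), means that age explains about 83% of the variation in circumference (equivalently, a correlation coefficient of about 0.91). The intercept, by contrast, is not significantly different from zero at the 5% level (p = 0.0518). The five-number summary of the residuals is roughly symmetric, with a median very close to 0, which is consistent with normally distributed errors. However, the residual plot shows that the variance is not constant (the spread of the residuals grows with the fitted values), and the Q-Q plot is only moderately linear. Overall, this appears to be a relatively strong linear model, although the non-constant variance means its standard errors and p-values should be read with some caution.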
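The 0.83 quoted above is the multiple R-squared from the model summary, which in simple linear regression is the square of the Pearson correlation between the two variables. The sketch below is one way to check that relationship and the per-ten-day growth implied by the slope; the names `fit`, `r`, and `r_squared` are introduced here only for illustration.

```r
library(datasets)
data(Orange)

# Pearson correlation between age and circumference
r <- cor(Orange$age, Orange$circumference)

# Refit the same simple linear model and pull out its R-squared
fit <- lm(circumference ~ age, data = Orange)
r_squared <- summary(fit)$r.squared

# In simple linear regression, R-squared equals the squared correlation
print(c(r = r, r_squared = r_squared))
print(isTRUE(all.equal(r^2, r_squared)))

# Expected growth in circumference (mm) over ten days, from the slope
growth_per_10_days <- unname(10 * coef(fit)["age"])
print(growth_per_10_days)
```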
ggplot(Orange, mapping = aes(x = age, y = circumference)) +
  geom_point() +
  labs(title = "Tree Circumference vs. Age", x = "Age (days)", y = "Circumference (mm)")
orange_lm <- lm(circumference ~ age, data = Orange)
summary(orange_lm)
##
## Call:
## lm(formula = circumference ~ age, data = Orange)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -46.310 -14.946  -0.076  19.697  45.111
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.399650   8.622660   2.018   0.0518 .
## age          0.106770   0.008277  12.900 1.93e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.74 on 33 degrees of freedom
## Multiple R-squared: 0.8345, Adjusted R-squared: 0.8295
## F-statistic: 166.4 on 1 and 33 DF, p-value: 1.931e-14
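The summary reports point estimates and standard errors; confidence intervals can make the same information easier to read. A minimal sketch using `confint()` from base R's stats package follows, with the model refitted so the snippet stands alone (`fit` and `ci` are illustrative names). The intervals assume the usual linear-model conditions, including constant variance, so they are approximate here.

```r
library(datasets)
data(Orange)

fit <- lm(circumference ~ age, data = Orange)

# 95% confidence intervals for the intercept and the slope
ci <- confint(fit, level = 0.95)
print(ci)

# Does each interval contain zero?
contains_zero <- ci[, 1] < 0 & ci[, 2] > 0
print(contains_zero)
```

An intercept interval that straddles zero would agree with its p-value of 0.0518, while a slope interval that stays well above zero would agree with the very small p-value for age.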
ggplot(Orange, mapping = aes(x = age, y = circumference)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  labs(title = "Linear Model: Tree Circumference vs. Age", x = "Age (days)", y = "Circumference (mm)")
## `geom_smooth()` using formula = 'y ~ x'
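# Residuals vs. fitted values; the dashed line marks a residual of zero
plot(fitted(orange_lm), resid(orange_lm), xlab = "Fitted values", ylab = "Residuals")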
# Normal Q-Q plot of the residuals, with a reference line for judging linearity
ggplot(data = orange_lm, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line()
par(mfrow = c(2, 2))
plot(orange_lm)
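The diagnostic plots can be complemented with a couple of rough numeric checks. The sketch below uses only base R: a Shapiro-Wilk test on the residuals, and the standard deviation of the residuals at each measurement age as an informal look at constant variance. It is a quick check under these assumptions rather than a formal heteroscedasticity test, and `fit`, `res`, `sw`, and `spread_by_age` are names chosen for illustration.

```r
library(datasets)
data(Orange)

fit <- lm(circumference ~ age, data = Orange)
res <- resid(fit)

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals
sw <- shapiro.test(res)
print(sw)

# Standard deviation of the residuals at each measurement age;
# under constant variance these values should be roughly equal
spread_by_age <- tapply(res, Orange$age, sd)
print(round(spread_by_age, 1))
```

If the spread rises steadily with age, that matches the fan shape in the residual plot and suggests that a variance-stabilizing transformation or a weighted fit may be worth considering.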