library(datasets)
library(ggplot2)
data(Orange)

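Below, I’ve fitted a linear model to a data set containing the ages (the number of days that have passed since tracking began) and the trunk circumferences (in millimeters) of orange trees. The scatterplot looked linear enough to fit the model \(\hat{y}=17.40+0.11x\), where \(x\) is age and \(\hat{y}\) is the predicted circumference. The slope indicates that the trees grew about 0.11 mm per day, or roughly one millimeter in circumference every nine to ten days. Based on the statistics, the model appears to be reasonably good: the p-value for the slope is very low (\(1.93\times10^{-14}\)), and \(R^2=0.83\), so age explains about 83% of the variation in circumference. Note that 0.83 is the coefficient of determination, not the correlation coefficient; the correlation coefficient is \(r=\sqrt{0.8345}\approx 0.91\), which is close to 1 and indicates a strong positive correlation. The five-number summary of the residuals is roughly symmetric, with a median very close to 0, which is consistent with approximately normal residuals. However, the residual plot does not show constant variance (the spread grows with the fitted values), and the q-q plot is only moderately linear. Overall, this appears to be a relatively strong linear model, although the non-constant variance means the standard errors and p-values should be read with some caution.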

ggplot(Orange, mapping = aes(x = age, y = circumference)) + 
  geom_point() +
  labs(title = "Tree Circumference vs. Age", x = "Age (days)", y = "Circumference (mm)")

orange_lm <- lm(circumference ~ age, data = Orange)
summary(orange_lm)
## 
## Call:
## lm(formula = circumference ~ age, data = Orange)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.310 -14.946  -0.076  19.697  45.111 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 17.399650   8.622660   2.018   0.0518 .  
## age          0.106770   0.008277  12.900 1.93e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.74 on 33 degrees of freedom
## Multiple R-squared:  0.8345, Adjusted R-squared:  0.8295 
## F-statistic: 166.4 on 1 and 33 DF,  p-value: 1.931e-14
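
As a cross-check on the summary output, the numbers most relevant to interpreting the fit can be pulled directly from the model object. The block below is a minimal sketch that refits the same model so it runs on its own; the variable names (`slope`, `r_squared`, `r`, `days_per_mm`, `slope_ci`) are illustrative and not part of the original analysis. It also makes the distinction between \(R^2\) and the correlation coefficient \(r\) explicit, since for a one-predictor model \(R^2 = r^2\).

```r
library(datasets)
data(Orange)

# Refit the same model so this block is self-contained
orange_lm <- lm(circumference ~ age, data = Orange)

# Slope: estimated growth in circumference (mm) per day of age
slope <- unname(coef(orange_lm)["age"])

# Coefficient of determination reported by summary()
r_squared <- summary(orange_lm)$r.squared

# Pearson correlation between age and circumference
r <- cor(Orange$age, Orange$circumference)

# Days needed for one millimeter of predicted growth
days_per_mm <- 1 / slope

# 95% confidence interval for the slope
slope_ci <- confint(orange_lm, "age", level = 0.95)

print(c(slope = slope, r_squared = r_squared, r = r, days_per_mm = days_per_mm))
print(slope_ci)
```

Squaring `r` should reproduce `r_squared`, which is a quick way to confirm which of the two statistics a reported value refers to.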
ggplot(Orange, mapping = aes(x = age, y = circumference)) + 
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  labs(title = "Linear Model: Tree Circumference vs. Age", x = "Age (days)", y = "Circumference (mm)")
## `geom_smooth()` using formula = 'y ~ x'

# Residuals vs. fitted values: a fan shape suggests non-constant variance
plot(fitted(orange_lm), resid(orange_lm))
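
To put a rough number on what the residual plot shows, one option is to compare the spread of the residuals for low and high fitted values. This is only an informal sketch, not a formal test for heteroscedasticity: splitting at the median fitted value is an arbitrary choice made here for illustration, and the names `upper_half` and `sd_by_half` are hypothetical.

```r
library(datasets)
data(Orange)

# Refit the same model so this block is self-contained
orange_lm <- lm(circumference ~ age, data = Orange)

fitted_vals <- fitted(orange_lm)
resids <- resid(orange_lm)

# Flag observations whose fitted value is above the median (illustrative cut)
upper_half <- fitted_vals > median(fitted_vals)

# Standard deviation of the residuals in each group (FALSE group first)
sd_by_half <- tapply(resids, upper_half, sd)
names(sd_by_half) <- c("low fitted", "high fitted")

print(sd_by_half)
```

A noticeably larger standard deviation in the "high fitted" group would agree with the visual impression that the residual variance is not constant.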

# Normal q-q plot of the residuals, with a reference line for comparison
ggplot(data = orange_lm, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line()
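
The q-q plot can be supplemented with a numerical check of residual normality. The sketch below uses the Shapiro-Wilk test from base R (`shapiro.test`), which was not part of the original analysis and is offered only as one possible supplement; with 35 observations it has limited power, so it should be read alongside the plot rather than in place of it.

```r
library(datasets)
data(Orange)

# Refit the same model so this block is self-contained
orange_lm <- lm(circumference ~ age, data = Orange)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(resid(orange_lm))

print(sw)
```

A small p-value would be evidence against normal residuals; a large one only means the test did not detect a departure.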

# Show the four standard lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(orange_lm)