Week 1
February 26, 2019
What to expect from this course?
Analysis of variance and linear regression are closely related linear models, and both can be specified with the same function (lm) using formula notation. The formula uses the '~' symbol (tilde), which reads 'as a function of' and connects the left-hand side (LHS) with the right-hand side (RHS), i.e. it links the response variable to the explanatory variable(s).
lm(y ~ x, data = ...)
lm(y ~ x1 + x2, data = ...)
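To make the link between regression and analysis of variance concrete, here is a minimal sketch on simulated data. The data frame dat and its columns (dose, treatment, growth) are hypothetical and serve only as an illustration: the same call to lm gives a regression when the explanatory variable is numeric and an analysis of variance when it is a factor.

```r
## Simulated data; dat, dose, treatment and growth are hypothetical names
set.seed(1)
dat <- data.frame(
  dose      = rep(c(1, 2, 3, 4), times = 5),
  treatment = factor(rep(c("a", "b"), each = 10))
)
dat$growth <- 2 + 0.5 * dat$dose + rnorm(nrow(dat), sd = 0.3)

## Numeric explanatory variable: linear regression
fit_reg <- lm(growth ~ dose, data = dat)

## Factor as explanatory variable: analysis of variance
fit_aov <- lm(growth ~ treatment, data = dat)

coef(fit_reg)   ## intercept and slope
anova(fit_aov)  ## ANOVA table for the factor
```

Both objects are of class "lm", so the same extractor functions (coef, anova, summary) work on either fit.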
Interactive effects of the explanatory variables on the response, so-called interactions, are specified using the colon ':'. To save ourselves some coding, we can use the shorthand asterisk '*' notation to include all main effects and interactions.
lm(y ~ x1 + x2 + x1:x2, data = ...)
## Can be written as:
lm(y ~ x1 * x2, data = ...)
## ...even with more than two explanatory variables
lm(y ~ x1 * x2 * x3, data = ...)
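The equivalence of the two notations can be checked directly. The following sketch uses simulated data (dat, x1, x2 and y are hypothetical names, and the coefficients used to generate y are arbitrary) and compares the coefficients of both fits.

```r
## Simulated data with a true interaction between x1 and x2
set.seed(42)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + 0.5 * dat$x1 * dat$x2 + rnorm(50)

## Explicit main effects plus interaction
fit_long <- lm(y ~ x1 + x2 + x1:x2, data = dat)

## Shorthand asterisk notation
fit_short <- lm(y ~ x1 * x2, data = dat)

## Both models contain the same terms...
names(coef(fit_short))

## ...and the same estimates
all.equal(coef(fit_long), coef(fit_short))
```

The interaction term is labelled x1:x2 in the coefficient table whichever notation was used to specify it.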
## Using the built-in data set PlantGrowth
data(PlantGrowth)
str(PlantGrowth)
## 'data.frame':    30 obs. of  2 variables:
##  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
##  $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(PlantGrowth)
##      weight       group
##  Min.   :3.590   ctrl:10
##  1st Qu.:4.550   trt1:10
##  Median :5.155   trt2:10
##  Mean   :5.073
##  3rd Qu.:5.530
##  Max.   :6.310
Always inspect (str, summary) and plot your data before you get carried away with statistical modelling. What do you conclude looking at the boxplot below?
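One way to draw such a boxplot in base R is sketched here, together with the group means as a numeric companion to the plot. The axis labels and the object name group_means are illustrative choices, not taken from the course material.

```r
## Load the built-in data set
data(PlantGrowth)

## Boxplot of the response for each level of the factor
boxplot(weight ~ group, data = PlantGrowth,
        xlab = "Group", ylab = "Weight")

## Mean weight per group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_means
```

Because the design is balanced (10 observations per group), the average of the three group means equals the overall mean reported by summary.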