The reading material for this exercise is the same as for the previous one: the second part of "Introduction to R" (ch. 7-12), available from the R home page or at https://cran.r-project.org/doc/manuals/r-release/R-intro.html
The exercise will be discussed on Mon, Apr 29.
##Exercise 1 Read the help page of
?formula, ?model.frame, and ?model.matrix
#?formula
#?model.frame
#?model.matrix
##Exercise 2 Create a formula, e.g. by f <- y ~ x.
After creating it, what does this object mean or do? Does the object
f exist? Do x and y exist?
as.formula(f <- y ~ x)
## y ~ x
class(f)
## [1] "formula"
f
## y ~ x
# Evaluating x or y here raises an error, because neither object exists in the environment yet.
#x
#y
The code f <- y ~ x creates a formula object, which stores an
unevaluated expression. f exists and records y as the response
(dependent) variable and x as the predictor (independent) variable,
but creating it does not evaluate either name. Consequently,
x and y do not need to exist at this point, and they do not yet
exist in the working environment.
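A minimal sketch of how to inspect such an unevaluated formula without touching x or y. The name f_demo is hypothetical and is used only to keep the exercise's f untouched.

```r
# Creating the formula works even if x and y are undefined,
# because the names are stored, not evaluated.
f_demo <- y ~ x

# Names referenced by the formula, as a character vector.
vars <- all.vars(f_demo)

# A two-sided formula is a call with three parts:
# the `~` operator, the left-hand side, and the right-hand side.
n_parts <- length(f_demo)
lhs <- f_demo[[2]]   # the symbol y (response)
rhs <- f_demo[[3]]   # the symbol x (predictor)

print(vars)
print(class(lhs))
```

Both sides are stored as symbols of class "name", which is why no lookup of x or y happens until a function such as model.frame() evaluates them.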
##Exercise 3 Create x and y,
e.g. x <- 1:10 and y <- 10:1, and
evaluate model.frame(f) and
model.frame(f, data.frame(x=x,y=y)[1:3,]), and explain
where x and y are taken from.
x <- 1:10
y <- 10:1
model.frame(f)
## y x
## 1 10 1
## 2 9 2
## 3 8 3
## 4 7 4
## 5 6 5
## 6 5 6
## 7 4 7
## 8 3 8
## 9 2 9
## 10 1 10
model.frame(f, data.frame(x=x,y=y)[1:3,])
## y x
## 1 10 1
## 2 9 2
## 3 8 3
model.frame(f) evaluates the variables named in the formula f.
Because no data argument is supplied, x and y are looked up in the
environment of the formula, here the Global Environment, where they
were created as the vectors x <- 1:10 and y <- 10:1. For
model.frame(f, data.frame(x=x,y=y)[1:3,]), x and y are instead
taken from the columns of the data frame passed as the data argument,
which holds only the first 3 rows. The vectors x and y in the Global
Environment are left unchanged.
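The lookup order can be checked with a small sketch. The names f_demo, d, and d_x_only are hypothetical and exist only for this illustration. The last call assumes the documented behaviour that variables missing from data are taken from the environment of the formula.

```r
x <- 1:10
y <- 10:1
f_demo <- y ~ x

# No data argument: x and y come from the formula's environment.
mf_env <- model.frame(f_demo)

# data argument given: x and y come from the data frame columns.
d <- data.frame(x = x, y = y)[1:3, ]
mf_data <- model.frame(f_demo, d)

# data holds only x: x comes from the data frame,
# y falls back to the formula's environment.
d_x_only <- data.frame(x = 101:110)
mf_mixed <- model.frame(f_demo, d_x_only)

print(nrow(mf_env))
print(nrow(mf_data))
print(head(mf_mixed, 3))
```

Passing a data frame does not modify the objects x and y in the workspace; it only changes where model.frame() finds them.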
##Exercise 4 Expressions in formulas: explain the difference you see
between model.frame(y~x+1) and
model.frame(y~I(x+1))
model.frame(y~x+1)
## y x
## 1 10 1
## 2 9 2
## 3 8 3
## 4 7 4
## 5 6 5
## 6 5 6
## 7 4 7
## 8 3 8
## 9 2 9
## 10 1 10
model.frame(y~I(x+1))
## y I(x + 1)
## 1 10 2
## 2 9 3
## 3 8 4
## 4 7 5
## 5 6 6
## 6 5 7
## 7 4 8
## 8 3 9
## 9 2 10
## 10 1 11
For model.frame(y~x+1), the function returns a data frame with y as
the response (dependent) variable and x as the predictor
(independent) variable. Inside a formula, the symbol + separates the
terms to be included in the model, and the term 1 stands for the
intercept, so no new column is created and x is unchanged. In
model.frame(y~I(x+1)), the function I() protects its argument so that
+ is interpreted as the arithmetic operator. Consequently, the column
I(x + 1) holds the values of x increased by 1. This behavior comes
from the Wilkinson-Rogers (1973) notation for model formulas.
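The two meanings of + can be verified with a short sketch that uses only terms() and model.frame() from the stats package. The variable names has_int, no_int, and shifted are hypothetical.

```r
x <- 1:10
y <- 10:1

# In a formula, "+ 1" requests an intercept and "- 1" removes it.
has_int <- attr(terms(y ~ x + 1), "intercept")
no_int  <- attr(terms(y ~ x - 1), "intercept")

# The model frame for y ~ x + 1 contains only y and x.
cols_plain <- names(model.frame(y ~ x + 1))

# I() makes "+" arithmetic, so the column holds x + 1.
mf_arith   <- model.frame(y ~ I(x + 1))
cols_arith <- names(mf_arith)
shifted    <- unclass(mf_arith[[2]])   # drop the "AsIs" class

print(c(has_int, no_int))
print(cols_plain)
print(cols_arith)
```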
##Exercise 5 Factors in formulas. Define a factor,
f <- factor(c(rep("a",3),
rep("b", 3),rep("c", 4))). Explain how it will
work in a linear model, say lm(y ~ x + f)
f <- factor(c(rep("a", 3), rep("b", 3), rep("c", 4)))
f
## [1] a a a b b b c c c c
## Levels: a b c
lm(y ~ x + f)
##
## Call:
## lm(formula = y ~ x + f)
##
## Coefficients:
## (Intercept) x fb fc
## 1.100e+01 -1.000e+00 -1.157e-16 9.060e-16
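In lm(y ~ x + f), f is included in the model as a categorical
predictor, in the sense of the factorial model formulae of
Wilkinson-Rogers (1973). Note that this f replaces the formula f
created in Exercise 2. With the default treatment contrasts, the
first level a is the reference level and is absorbed into the
intercept, while the coefficients fb and fc estimate the shift of
levels b and c relative to a. Here y equals 11 - x exactly, so both
shifts are zero up to rounding error (of the order 1e-16). The levels
can be checked by printing the value of f.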
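A small sketch of how the factor enters the fit, assuming the default contrast setting (contr.treatment for unordered factors). The names fit, f2, and fit2 are hypothetical. relevel() is used only to show that the choice of reference level changes which coefficients appear.

```r
x <- 1:10
y <- 10:1
f <- factor(c(rep("a", 3), rep("b", 3), rep("c", 4)))

fit <- lm(y ~ x + f)

# Level a is the reference, so only fb and fc get coefficients.
coef_names <- names(coef(fit))

# Coding used for the factor: one indicator column per non-reference level.
coding <- contrasts(f)

# Choosing c as the reference level changes the reported coefficients.
f2 <- relevel(f, ref = "c")
fit2 <- lm(y ~ x + f2)
coef_names2 <- names(coef(fit2))

print(coef_names)
print(coding)
print(coef_names2)
```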
##Exercise 6 Model.matrix: explain the difference between
model.frame(y~x+f) and model.matrix(y~x+f)
model.frame(y~x+f)
## y x f
## 1 10 1 a
## 2 9 2 a
## 3 8 3 a
## 4 7 4 b
## 5 6 5 b
## 6 5 6 b
## 7 4 7 c
## 8 3 8 c
## 9 2 9 c
## 10 1 10 c
The function model.frame(y~x+f) returns a data.frame containing the
variables needed by the formula; no model has to be fitted first.
Without additional arguments, the result holds the response y, the
covariate x, and the factor f, which is kept as a factor with its
levels a, b, and c.
model.matrix(y~x+f)
## (Intercept) x fb fc
## 1 1 1 0 0
## 2 1 2 0 0
## 3 1 3 0 0
## 4 1 4 1 0
## 5 1 5 1 0
## 6 1 6 1 0
## 7 1 7 0 1
## 8 1 8 0 1
## 9 1 9 0 1
## 10 1 10 0 1
## attr(,"assign")
## [1] 0 1 2 2
## attr(,"contrasts")
## attr(,"contrasts")$f
## [1] "contr.treatment"
On the other hand, model.matrix(y~x+f) returns the numeric design
matrix that is used to fit the model. It has a column of ones for the
intercept and a column for x, and the factor levels a, b, c have
been converted into numeric indicators. As a general rule, when the
model contains an intercept, a factor with L levels yields L-1
indicator (dummy) variables under the default treatment contrasts;
here level a is the reference and has no column of its own. The
response y is not part of the matrix. The design matrix is what the
least squares estimates are computed from when fitting the model.
This is helpful in linear model design to identify the explanatory variables and to allocate, for example, the control group in the data analysis by coding factors as binary indicators.
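As a sketch of the link between the design matrix and the fit, the least squares estimates can be computed directly from model.matrix() and compared with lm(). The names X, beta, and X0 are hypothetical, and qr.solve() is one of several ways to solve the least squares problem, not necessarily the route lm() takes internally.

```r
x <- 1:10
y <- 10:1
f <- factor(c(rep("a", 3), rep("b", 3), rep("c", 4)))

# Design matrix: intercept, x, and L - 1 = 2 indicator columns.
X <- model.matrix(y ~ x + f)

# Least squares solution of X %*% beta = y.
beta <- qr.solve(X, as.numeric(y))

# Reference fit for comparison.
fit <- lm(y ~ x + f)

# Without an intercept, the factor keeps all L = 3 indicator columns.
X0 <- model.matrix(y ~ x + f - 1)

print(colnames(X))
print(colnames(X0))
print(all.equal(as.vector(beta), as.vector(coef(fit))))
```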