Exercises for the meeting of 29 April, 2019

The reading material for this exercise is the same as for the previous one (the second part of "Introduction to R" (ch. 7-12), available from the R home page or https://cran.r-project.org/doc/manuals/r-release/R-intro.html).

The exercise will be discussed on Mon, Apr 29.

##Exercise 1 Read the help pages for ?formula, ?model.frame, and ?model.matrix

#?formula
#?model.frame
#?model.matrix

##Exercise 2 Create a formula, e.g. by f <- y ~ x. After creating it, what does this object mean, or do? Does object f exist? Do x and y exist?

as.formula(f <- y ~ x)
## y ~ x
class(f)
## [1] "formula"
f
## y ~ x
#Calling x or y at this point generates an error because neither object exists in the environment yet.
#x
#y

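The code f <- y ~ x creates a formula, which is an unevaluated expression: R stores the symbols y and x without looking up their values. f therefore exists as an object of class "formula" that records y as the dependent (response) variable and x as the independent (explanatory) variable, together with the environment in which it was created. Since nothing is evaluated, x and y do not need to exist, and at this point they do not exist in the working environment.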
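The structure of a formula can be inspected without ever evaluating its variables. The following is a minimal sketch using only base R functions; it assumes nothing beyond the formula from the exercise.

```r
# Creating the formula does not require x or y to exist.
f <- y ~ x

class(f)        # "formula"
all.vars(f)     # "y" "x": the variable names stored in the formula

# A two-sided formula is a call to `~` with a left and a right side.
lhs <- f[[2]]   # the symbol y (response)
rhs <- f[[3]]   # the symbol x (explanatory variable)

# The environment stored with the formula is where model.frame()
# will later look for x and y if no data argument supplies them.
environment(f)
```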

##Exercise 3 Create x and y, e.g. x <- 1:10 and y <- 10:1, and evaluate model.frame(f) and model.frame(f, data.frame(x=x,y=y)[1:3,]), and explain where x and y are taken from.

x <- 1:10
y <- 10:1
model.frame(f)
##     y  x
## 1  10  1
## 2   9  2
## 3   8  3
## 4   7  4
## 5   6  5
## 6   5  6
## 7   4  7
## 8   3  8
## 9   2  9
## 10  1 10
model.frame(f, data.frame(x=x,y=y)[1:3,])
##    y x
## 1 10 1
## 2  9 2
## 3  8 3

model.frame(f) returns a data frame with the variables named in the formula f. Since no data argument is given, x and y are looked up in the environment of the formula, here the Global Environment, where they were created by x <- 1:10 and y <- 10:1. For model.frame(f, data.frame(x=x,y=y)[1:3,]), x and y are looked up first in the data argument, i.e. in the columns of the data frame restricted to its first 3 rows. These columns are copies; the original x and y in the Global Environment are not changed.
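The lookup order can be made visible with a data frame that supplies only one of the two variables. This is a small sketch; the data frame d and its values 101:110 are made up for illustration.

```r
x <- 1:10
y <- 10:1
f <- y ~ x

# d contains a column x but no column y (hypothetical example data).
d <- data.frame(x = 101:110)

# x is found in d; y is not in d, so it is taken from environment(f).
mf <- model.frame(f, data = d)
head(mf, 3)

# The objects in the workspace are untouched.
x[1:3]
```

The x column of mf comes from d, while the y column comes from the workspace, which shows that the data argument is searched before the environment of the formula.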

##Exercise 4 Expressions in formulas: explain the difference you see from model.frame(y~x+1) and model.frame(y~I(x+1))

model.frame(y~x+1)
##     y  x
## 1  10  1
## 2   9  2
## 3   8  3
## 4   7  4
## 5   6  5
## 6   5  6
## 7   4  7
## 8   3  8
## 9   2  9
## 10  1 10
model.frame(y~I(x+1))
##     y I(x + 1)
## 1  10        2
## 2   9        3
## 3   8        4
## 4   7        5
## 5   6        6
## 6   5        7
## 7   4        8
## 8   3        9
## 9   2       10
## 10  1       11

For model.frame(y~x+1), y is the dependent variable and x the independent variable. Inside a formula the symbol + does not add numbers; it separates terms and includes them in the model, and the term 1 stands for the intercept. Since the intercept is included by default, y~x+1 is the same model as y~x, and the model frame contains only y and x. In model.frame(y~I(x+1)), the function I() protects its argument from formula interpretation, so + acts as the arithmetic operator and the new variable x+1 is computed and used as the predictor. This behaviour comes from the Wilkinson-Rogers (1973) notation.
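The effect of + 1, - 1 and I() is easiest to see in the column names of the design matrix. A short sketch with the same x and y as above; the variable names cols_* are only for illustration.

```r
x <- 1:10
y <- 10:1

# "+ 1" asks for an intercept, which is already the default.
cols_default   <- colnames(model.matrix(y ~ x))      # "(Intercept)" "x"
cols_plus_one  <- colnames(model.matrix(y ~ x + 1))  # "(Intercept)" "x"

# "- 1" removes the intercept.
cols_minus_one <- colnames(model.matrix(y ~ x - 1))  # "x"

# I() makes "+" arithmetic: the predictor is the new variable x + 1.
X_protected    <- model.matrix(y ~ I(x + 1))
cols_protected <- colnames(X_protected)              # "(Intercept)" "I(x + 1)"
shifted        <- unname(X_protected[, 2])           # values 2, 3, ..., 11
```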

##Exercise 5 Factors in formulas. Define a factor, f <- factor(c(rep("a",3), rep("b", 3),rep("c", 4))). Explain how it will work in a linear model, say lm(y ~ x + f)

f <- factor(c(rep("a",3), rep("b", 3),rep("c", 4)))
f
##  [1] a a a b b b c c c c
## Levels: a b c
lm(y ~ x + f)
## 
## Call:
## lm(formula = y ~ x + f)
## 
## Coefficients:
## (Intercept)            x           fb           fc  
##   1.100e+01   -1.000e+00   -1.157e-16    9.060e-16

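In lm(y ~ x + f), f is a categorical explanatory variable (note that this assignment overwrites the formula f from Exercise 2). lm() does not use the labels a, b, c directly; it codes the factor with indicator (dummy) variables using treatment contrasts, following the model formula notation of Wilkinson-Rogers (1973). The first level a is the reference level and is absorbed into the intercept, while the coefficients fb and fc are the differences in intercept of groups b and c relative to group a, with a common slope for x. Here y = 11 - x holds exactly, so fb and fc are zero up to rounding error. The levels can be checked by printing f.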
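The coding of the factor can be inspected with contrasts(), and the reference level can be changed with relevel(). In this sketch the factor is called g instead of f, only to avoid overwriting the formula f; g2 is likewise a name chosen for illustration.

```r
x <- 1:10
y <- 10:1
g <- factor(c(rep("a", 3), rep("b", 3), rep("c", 4)))

# Treatment contrasts: one indicator column per non-reference level.
contrasts(g)
##   b c
## a 0 0
## b 1 0
## c 0 1

fit <- lm(y ~ x + g)
names(coef(fit))   # "(Intercept)" "x" "gb" "gc"

# With c as the reference level, the indicators are for a and b.
g2 <- relevel(g, ref = "c")
fit2 <- lm(y ~ x + g2)
names(coef(fit2))  # "(Intercept)" "x" "g2a" "g2b"
```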

##Exercise 6 Model.matrix: explain the difference between model.frame(y~x+f) and model.matrix(y~x+f)

model.frame(y~x+f)
##     y  x f
## 1  10  1 a
## 2   9  2 a
## 3   8  3 a
## 4   7  4 b
## 5   6  5 b
## 6   5  6 b
## 7   4  7 c
## 8   3  8 c
## 9   2  9 c
## 10  1 10 c

model.frame(y~x+f) returns a data frame with the variables needed to evaluate the formula: the response y, the numeric variable x and the factor f, each in one column. No model is fitted, and f is still stored as a factor with its levels a, b, c.

model.matrix(y~x+f)
##    (Intercept)  x fb fc
## 1            1  1  0  0
## 2            1  2  0  0
## 3            1  3  0  0
## 4            1  4  1  0
## 5            1  5  1  0
## 6            1  6  1  0
## 7            1  7  0  1
## 8            1  8  0  1
## 9            1  9  0  1
## 10           1 10  0  1
## attr(,"assign")
## [1] 0 1 2 2
## attr(,"contrasts")
## attr(,"contrasts")$f
## [1] "contr.treatment"

On the other hand, model.matrix(y~x+f) returns the design matrix of the model, i.e. the numeric matrix X that is actually used in the fit. It has no column for the response y, it has a column of ones for the intercept, and the factor levels 'a, b, c' have been converted into numeric indicators. As a general rule, with an intercept in the model and the default treatment contrasts, a factor with L levels generates L-1 indicator (dummy) variables; the first level is the reference. The design matrix is what is needed to compute the least squares estimates when fitting the model.

This is helpful in linear model design to identify the explanatory variables and to represent, for example, control and treatment groups in the data analysis by coding factors as binary indicators.
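The link between the design matrix and the least squares estimates can be checked by solving the normal equations X'X b = X'y by hand and comparing with lm(). This is only a sketch: lm() itself uses a QR decomposition rather than the normal equations, and the names X, beta_hat and fit are chosen for illustration.

```r
x <- 1:10
y <- 10:1
f <- factor(c(rep("a", 3), rep("b", 3), rep("c", 4)))

# Design matrix: intercept, x, and the indicators fb and fc.
X <- model.matrix(y ~ x + f)

# Solve the normal equations (X'X) b = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with the coefficients reported by lm().
fit <- lm(y ~ x + f)
all.equal(as.vector(beta_hat), unname(coef(fit)))
```

Up to rounding error both approaches give an intercept of 11, a slope of -1 and zero for fb and fc, in agreement with the output of lm(y ~ x + f) in Exercise 5.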