Exercise 8.5.1 to 8.5.3

(a) Create a vector of three elements (2,4,6) and name that vector vec_a. Create a second vector, vec_b, that contains (8,10,12). Add these two vectors together and name the result vec_c.

vec_a <- c(2,4,6)
vec_b <- c(8,10,12)
vec_c <- vec_a + vec_b
vec_c
## [1] 10 14 18

(b) Create a vector, named vec_d, that contains only two elements (14,20). Add this vector to vec_a. What is the result and what do you think R did (look up the recycling rule using Google)? What is the warning message that R gives you?

vec_d <- c(14,20)
vec_d + vec_a
## Warning in vec_d + vec_a: longer object length is not a multiple of shorter
## object length
## [1] 16 24 20

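R returns 16 24 20, along with the warning "longer object length is not a multiple of shorter object length". Under the recycling rule, when two vectors of different lengths are added, R reuses the elements of the shorter vector, starting again from its beginning, until it reaches the length of the longer one. Here vec_d (length 2) is stretched to (14, 20, 14), so the third element is 14 (vec_d[1]) + 6 (vec_a[3]) = 20. Because 3 is not a multiple of 2, the recycling comes out uneven, and R warns that this may not be what you intended.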
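To make the recycling explicit, here is a small sketch: rep_len() stretches vec_d to the length of vec_a the same way the recycling rule does, so adding the stretched copy reproduces the result above without any warning.

```r
vec_a <- c(2, 4, 6)
vec_d <- c(14, 20)

# Recycle vec_d by hand to the length of vec_a: 14 20 14
vec_d_long <- rep_len(vec_d, length(vec_a))

# Adding the explicitly recycled vector gives 16 24 20, with no warning
recycled_sum <- vec_d_long + vec_a

# The implicit version matches (suppressWarnings() only hides the length warning)
implicit_sum <- suppressWarnings(vec_d + vec_a)
identical(recycled_sum, implicit_sum)
```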

(c) Next add 5 to the vector vec_a. What is the result and what did R do? Why doesn’t it give you a warning message similar to what you saw in the previous problem?

vec_a + 5
## [1]  7  9 11

R added 5 to each element of vec_a: the single value 5 is recycled to (5, 5, 5). The length of vec_a (3) is an exact multiple of the length of 5 (1), so the recycling comes out even with no leftover elements (no hanging chads, if you will), and there is nothing to warn about.
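A quick check of the "exact multiple" idea, as a sketch: a length-1 vector recycles silently, and so does a length-6 vector against the length-3 vec_a, because 6 is a multiple of 3. The handler below only records whether a warning fired.

```r
vec_a <- c(2, 4, 6)

# A length-1 vector is recycled to every position of vec_a: 7 9 11
scalar_sum <- vec_a + 5
same_sum   <- vec_a + rep(5, length(vec_a))  # the recycling written out

# Length 6 is an exact multiple of length 3, so vec_a is reused twice quietly
warned <- FALSE
six_sum <- withCallingHandlers(
  c(1, 2, 3, 4, 5, 6) + vec_a,               # 3 6 9 6 9 12
  warning = function(w) {
    warned <<- TRUE                          # record that a warning fired
    invokeRestart("muffleWarning")           # and keep it from printing
  }
)
warned
```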

Exercise 8.5.4

Generate the vector of integers {1, 2,…5} in two different ways.

(a) First using the seq() function

vec_int <- seq(1, 5, 1)
vec_int
## [1] 1 2 3 4 5

(b) Using the a:b shortcut

vec_int <- 1:5
vec_int
## [1] 1 2 3 4 5
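The two vectors print identically, but they are not stored the same way: with double arguments, seq(1, 5, 1) returns doubles, while the a:b shortcut returns integers. A short sketch of how to see the difference:

```r
by_seq   <- seq(1, 5, 1)
by_colon <- 1:5

all.equal(by_seq, by_colon)   # TRUE: same values
typeof(by_seq)                # "double"
typeof(by_colon)              # "integer"
identical(by_seq, by_colon)   # FALSE: identical() also compares the type

# seq_len() is another way to get the integers 1..n
typeof(seq_len(5))            # "integer"
```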

Exercise 8.5.5

Generate the vector of even numbers {2, 4, 6,…20}

(a) Using the sequence function.

vec_even <- seq(2, 20, 2)
vec_even
##  [1]  2  4  6  8 10 12 14 16 18 20

(b) Using the a:b shortcut and some subsequent algebra.

vec_even <- (1:10) * 2
vec_even
##  [1]  2  4  6  8 10 12 14 16 18 20
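A third route, sketched here as an alternative rather than what the exercise asks for: generate 1 to 20 and keep only the elements whose remainder after division by 2 is zero, using %% and logical indexing.

```r
candidates <- 1:20
# Keep the elements divisible by 2
evens_filtered <- candidates[candidates %% 2 == 0]
evens_filtered   # 2 4 6 8 10 12 14 16 18 20

# Same values as the two versions above
all(evens_filtered == seq(2, 20, 2))
```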

Exercise 8.5.6

Generate a vector of 21 elements that are evenly placed between 0 and 1 using the seq() command and name this vector x.

x <- seq(0, 1, length.out=21)
x
##  [1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
## [16] 0.75 0.80 0.85 0.90 0.95 1.00
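length.out fixes the number of points; by fixes the spacing instead. For 21 points on [0, 1] the spacing is 1/20 = 0.05, so the sketch below builds the grid three ways. Because these are floating-point values the routes may differ in the last bits, so all.equal() (which allows a small tolerance) is the safer comparison.

```r
x_len <- seq(0, 1, length.out = 21)  # fix the number of points
x_by  <- seq(0, 1, by = 0.05)        # fix the spacing instead
x_div <- (0:20) / 20                 # integer grid scaled to [0, 1]

length(x_by)                         # 21
all.equal(x_len, x_by)
all.equal(x_len, x_div)
```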

Exercise 8.5.7

Generate the vector {2, 4, 8, 2, 4, 8, 2, 4, 8} using the rep() command.

vec <- rep(c(2, 4, 8), 3)
vec
## [1] 2 4 8 2 4 8 2 4 8
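rep(x, 3) is shorthand for rep(x, times = 3). When the target is a total length rather than a number of copies, rep_len() gives the same vector here, as this small sketch shows.

```r
vals <- c(2, 4, 8)
v_times <- rep(vals, times = 3)   # whole vector repeated 3 times
v_len   <- rep_len(vals, 9)       # recycle until the length is 9
identical(v_times, v_len)         # TRUE
```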

Exercise 8.5.8

Generate the vector {2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8} using the rep() command.

vec <- rep(c(2, 4, 8), each=4)
vec
##  [1] 2 2 2 2 4 4 4 4 8 8 8 8
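The times argument can also be a vector giving a separate count for each element, which generalizes each =. The sketch also shows that each and times combine: each is applied first, then the resulting block is repeated.

```r
vals <- c(2, 4, 8)
v_each  <- rep(vals, each = 4)
v_count <- rep(vals, times = c(4, 4, 4))   # one count per element
identical(v_each, v_count)                 # TRUE

rep(vals, times = c(2, 1, 3))              # unequal counts: 2 2 4 8 8 8
rep(vals, each = 2, times = 2)             # 2 2 4 4 8 8 2 2 4 4 8 8
```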

Exercise 8.5.9

The vector letters is built into R and contains the lowercase English alphabet.

(a) Extract the 9th element of the letters vector.

letters[9]
## [1] "i"

(b) Extract the sub-vector that contains the 9th, 11th, and 19th elements.

letters[c(9, 11, 19)]
## [1] "i" "k" "s"
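Going the other direction, from letters to their positions, is a lookup. As a sketch, match() returns the position of each requested value in the order asked, and which() with %in% returns the matching positions in alphabet order.

```r
match(c("i", "k", "s"), letters)      # 9 11 19
which(letters %in% c("i", "k", "s"))  # 9 11 19
```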

(c) Extract the sub-vector that contains everything except the last two elements.

letters[-c(25, 26)]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x"
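The answer above hard-codes positions 25 and 26. A sketch that works for a vector of any length uses head() with a negative n, or builds the two indices from length().

```r
# Drop the last two elements without hard-coding 25 and 26
all_but_two <- head(letters, -2)

# Equivalent explicit version based on length()
n <- length(letters)
all_but_two_idx <- letters[-c(n - 1, n)]
identical(all_but_two, all_but_two_idx)   # TRUE
```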

Exercise 8.5.10

In this problem, we will work with the matrix:

\[ \left[\begin{array}{ccccc} 2 & 4 & 6 & 8 & 10\\ 12 & 14 & 16 & 18 & 20\\ 22 & 24 & 26 & 28 & 30 \end{array}\right]\]

(a) Create the matrix in two ways and save the resulting matrix as M.

(i) Create the matrix using some combination of the seq() and matrix() commands.
M <- matrix(seq(2, 30, 2), nrow=3, ncol=5, byrow=TRUE)
M
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    2    4    6    8   10
## [2,]   12   14   16   18   20
## [3,]   22   24   26   28   30
(ii) Create the same matrix by some combination of multiple seq() commands and either the rbind() or cbind() command.
m1 <- seq(2, 10, 2)
m2 <- seq(12, 20, 2)
m3 <- seq(22, 30, 2)

M <- rbind(m1, m2, m3)
M
##    [,1] [,2] [,3] [,4] [,5]
## m1    2    4    6    8   10
## m2   12   14   16   18   20
## m3   22   24   26   28   30
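The rbind() version holds the same numbers as part (i), but because its arguments are named variables, rbind() labels the rows m1, m2, m3. A sketch that checks both constructions and shows that they match exactly once the labels are removed:

```r
M1 <- matrix(seq(2, 30, 2), nrow = 3, ncol = 5, byrow = TRUE)

m1 <- seq(2, 10, 2)
m2 <- seq(12, 20, 2)
m3 <- seq(22, 30, 2)
M2 <- rbind(m1, m2, m3)

identical(M1, M2)            # FALSE: M2 carries row names
identical(M1, unname(M2))    # TRUE: same values and shape without names
```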

(b) Extract the second row out of M.

M[2, ]
## [1] 12 14 16 18 20

(c) Extract the element in the third row and second column of M.

unname(M[3,2])
## [1] 24
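Since the rbind() version of M has row labels, rows can also be picked by name. The sketch below also shows drop = FALSE, which keeps the result as a 1 x 1 matrix instead of simplifying it to a single number.

```r
m1 <- seq(2, 10, 2)
m2 <- seq(12, 20, 2)
m3 <- seq(22, 30, 2)
M <- rbind(m1, m2, m3)        # rows labelled m1, m2, m3

unname(M["m3", 2])            # pick the row by its label: 24
M[3, 2, drop = FALSE]         # keeps matrix structure (1 x 1)
M[, 2]                        # the whole second column
```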

Exercise 8.5.11

Create and manipulate a data frame.

(a) Create a data.frame named my.trees that has the following columns:

Girth = {8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0}
Height = {70, 65, 63, 72, 81, 83, 66}
Volume = {10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6}

Girth Height Volume
  8.3     70   10.3
  8.6     65   10.3
  8.8     63   10.2
 10.5     72   16.4
 10.7     81   18.8
 10.8     83   19.7
 11.0     66   15.6
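The code that created my.trees isn't shown. A minimal sketch, assuming a plain data.frame() call (the original may have been built differently), that produces the table above and the object the later parts use:

```r
# Each named vector becomes one column of the data frame
my.trees <- data.frame(
  Girth  = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0),
  Height = c(70, 65, 63, 72, 81, 83, 66),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6)
)
my.trees
```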

(b) Without using dplyr functions, extract the third observation (i.e. the third row).

my.trees[3,]
##   Girth Height Volume
## 3   8.8     63   10.2

(c) Without using dplyr functions, extract the Girth column referring to it by name (don’t use whatever order you placed the columns in).

my.trees[ , "Girth"]
## [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0
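There are several equivalent ways to pull a column out by name. This sketch (which rebuilds my.trees so it runs on its own) shows that [ , "Girth"], $Girth, and [["Girth"]] all return the same vector, while drop = FALSE keeps a one-column data frame.

```r
my.trees <- data.frame(
  Girth  = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0),
  Height = c(70, 65, 63, 72, 81, 83, 66),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6)
)

g1 <- my.trees[, "Girth"]                   # a plain numeric vector
g2 <- my.trees$Girth                        # same vector via $
g3 <- my.trees[["Girth"]]                   # same vector via [[ ]]
g_df <- my.trees[, "Girth", drop = FALSE]   # still a data frame
```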

(d) Without using dplyr functions, print out a data frame of all the observations except for the fourth observation. (i.e. Remove the fourth observation/row.)

my.trees[-4, ]
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 5  10.7     81   18.8
## 6  10.8     83   19.7
## 7  11.0     66   15.6

(e) Without using dplyr functions, use the which() command to create a vector of row indices that have a girth greater than 10. Call that vector index.

index <- which(my.trees$Girth > 10)
index
## [1] 4 5 6 7
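which() turns a logical vector into the positions that are TRUE. A short sketch of the intermediate step, plus one detail that matters later: NA entries are not TRUE, so which() skips them.

```r
girth <- c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0)
is_large <- girth > 10
is_large                          # FALSE FALSE FALSE TRUE TRUE TRUE TRUE
which(is_large)                   # 4 5 6 7

# Only TRUE positions are reported; the NA is skipped
which(c(TRUE, NA, FALSE, TRUE))   # 1 4
```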

(f) Without using dplyr functions, use the index vector to create a small data set with just the large girth trees.

small.data <- my.trees[index, ]
small.data
##   Girth Height Volume
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7
## 7  11.0     66   15.6

(g) Without using dplyr functions, use the index vector to create a small data set with just the small girth trees.

smaller.data <- my.trees[-index, ]
smaller.data
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
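The which() step is optional for this kind of filter: a logical vector can index rows directly, and base R's subset() wraps the same idea (neither is a dplyr function). A sketch, rebuilding my.trees so it runs on its own, that checks all three pick rows 4 to 7:

```r
my.trees <- data.frame(
  Girth  = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0),
  Height = c(70, 65, 63, 72, 81, 83, 66),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6)
)

index <- which(my.trees$Girth > 10)
by_which   <- my.trees[index, ]
by_logical <- my.trees[my.trees$Girth > 10, ]   # logical vector directly
by_subset  <- subset(my.trees, Girth > 10)      # base R helper

rownames(by_which)
rownames(by_logical)
rownames(by_subset)
```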

Exercise 8.5.12

The following code creates a data.frame and then has two different methods for removing the rows with NA values in the column Grade. Explain the difference between the two.

df <- data.frame(name= c('Alice','Bob','Charlie','Daniel'),
                 Grade = c(6,8,NA,9))

df[ -which(  is.na(df$Grade) ), ]
##     name Grade
## 1  Alice     6
## 2    Bob     8
## 4 Daniel     9
df[  which( !is.na(df$Grade) ), ]
##     name Grade
## 1  Alice     6
## 2    Bob     8
## 4 Daniel     9

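which() returns the positions (row indices) where its logical argument is TRUE. In the first option, which(is.na(df$Grade)) returns the rows with a missing grade (here, row 3), and the minus sign drops those rows. In the second option, which(!is.na(df$Grade)) returns the rows that are not (!) missing (rows 1, 2, and 4), and those rows are selected directly. Here both give the same result, but they behave differently when there are no NA values: which() then returns an empty vector, and because -integer(0) is still empty, the first option selects no rows at all, while the second option correctly keeps every row.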
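A quick check of the no-NA case, using a hypothetical df_full in which Charlie's grade (7) is made up so that nothing is missing:

```r
# Hypothetical data frame with no missing grades
df_full <- data.frame(name  = c('Alice', 'Bob', 'Charlie', 'Daniel'),
                      Grade = c(6, 8, 7, 9))

na_rows <- which(is.na(df_full$Grade))
na_rows                                           # integer(0)

# -integer(0) is still integer(0), so this selects zero rows
nrow(df_full[-na_rows, ])                         # 0

# Selecting the non-missing rows keeps everyone, as intended
nrow(df_full[which(!is.na(df_full$Grade)), ])     # 4
```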

Exercise 8.5.13

Data frames are usually created by binding together vectors, often built with the seq and rep commands. However, we often need to create a data frame that contains all possible combinations of several variables. The function expand.grid() addresses this need.

A fun example of using this function is making several graphs of the standard normal distribution versus the t-distribution. Use the expand.grid function to create a data.frame with all combinations of x=seq(-4,4,by=.01), dist=c('Normal','t'), and df=c(2,3,4,5,10,15,20,30). Use the dplyr::mutate command with the if_else command to generate the function heights y using either dt(x,df) or dnorm(x) depending on what is in the distribution column.

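library(dplyr)    # mutate(), if_else(), %>%
library(ggplot2)  # ggplot(), geom_line(), facet_wrap()

# every combination of x, distribution, and degrees of freedom
newdata <- 
  expand.grid(x=seq(-4, 4, by=0.01), dist=c('Normal', 't'), df=c(2, 3, 4, 5, 10, 15, 20, 30)) %>%
  # density height: t density for the 't' rows, standard normal otherwise
  mutate(y=if_else(dist=='t', dt(x, df), dnorm(x)))

newdata %>% ggplot(aes(x=x, y=y, color=dist)) +
  geom_line() +
  facet_wrap(~df) +
  labs(title="T vs Normal Dist for different DF") +
  theme_minimal()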
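To see what expand.grid() does on a scale small enough to read, here is a toy sketch with two variables of two values each. Every combination appears once, and the first variable changes fastest. Note that expand.grid() turns character vectors into factors by default.

```r
combos <- expand.grid(a = 1:2, b = c('x', 'y'))
combos
# rows, in order: (1, x), (2, x), (1, y), (2, y)
nrow(combos)   # 2 * 2 = 4
```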

Exercise 8.5.14

Create and manipulate a list.

(a) Create a list named my.test with elements: x = c(4,5,6,7,8,9,10), y = c(34,35,41,40,45,47,51), slope = 2.82, p.value = 0.000131.

my.test <- list(x=c(4, 5, 6, 7, 8, 9, 10), y=c(34, 35, 41, 40, 45, 47, 51), slope=2.82, p.value=0.000131)
str(my.test)
## List of 4
##  $ x      : num [1:7] 4 5 6 7 8 9 10
##  $ y      : num [1:7] 34 35 41 40 45 47 51
##  $ slope  : num 2.82
##  $ p.value: num 0.000131

(b) Extract the second element in the list.

my.test[[2]]
## [1] 34 35 41 40 45 47 51

(c) Extract the element named p.value from the list.

my.test[['p.value']]
## [1] 0.000131
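Single and double brackets behave differently on a list. A short sketch: my.test[2] returns a smaller list that still holds y, while my.test[[2]] and my.test$y return the numeric vector itself.

```r
my.test <- list(x = c(4, 5, 6, 7, 8, 9, 10), y = c(34, 35, 41, 40, 45, 47, 51),
                slope = 2.82, p.value = 0.000131)

sub_list    <- my.test[2]     # a list of length 1, named "y"
elem        <- my.test[[2]]   # the numeric vector itself
elem_dollar <- my.test$y      # same vector, selected by name
```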

Exercise 8.5.15

The function lm() creates a linear model, which is a general class of model that includes both regression and ANOVA. We will call this on a data frame and examine the results. For this problem, there isn’t much to figure out, but rather the goal is to recognize the data structures being used in common analysis functions.

(a) There are many data sets that are included with R and its packages. One of these is the trees data set, which contains measurements on n=31 cherry trees. Load this data set into your current workspace.

Girth Height Volume
  8.3     70   10.3
  8.6     65   10.3
  8.8     63   10.2
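The loading code isn't shown. One way to do it, assuming the table above shows the first three rows:

```r
data(trees)        # copy the built-in trees data set into the workspace
head(trees, 3)     # first three rows
```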

(b) Examine the data frame using the str() command. Look at the help file for the data using the command help(trees) or ?trees.

str(trees)
## 'data.frame':    31 obs. of  3 variables:
##  $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
##  $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
##  $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

(c) Perform a regression relating the volume of lumber produced to the girth and height of the tree.

m <- lm(Volume ~ Girth + Height, data = trees)  # additive model; Volume ~ Girth * Height would also fit a Girth:Height interaction
m
## 
## Call:
## lm(formula = Volume ~ Girth + Height, data = trees)
## 
## Coefficients:
## (Intercept)        Girth       Height  
##    -57.9877       4.7082       0.3393
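On the additive versus interaction question: this sketch doesn't settle which model is preferable, but it shows one standard way to check. Volume ~ Girth * Height expands to Girth + Height + Girth:Height, and because the additive model is nested inside it, anova() gives an F-test of whether the extra term improves the fit. No results are claimed here; run it and read the p-value.

```r
m_add <- lm(Volume ~ Girth + Height, data = trees)  # additive model from (c)
m_int <- lm(Volume ~ Girth * Height, data = trees)  # adds a Girth:Height term

names(coef(m_int))   # "(Intercept)" "Girth" "Height" "Girth:Height"

# F-test comparing the nested models
anova(m_add, m_int)
```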

(d) Use the str() command to inspect m. Extract the model coefficients from this list.

str(m)
## List of 12
##  $ coefficients : Named num [1:3] -57.988 4.708 0.339
##   ..- attr(*, "names")= chr [1:3] "(Intercept)" "Girth" "Height"
##  $ residuals    : Named num [1:31] 5.462 5.746 5.383 0.526 -1.069 ...
##   ..- attr(*, "names")= chr [1:31] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:31] -167.985 87.073 10.118 -0.812 -1.489 ...
##   ..- attr(*, "names")= chr [1:31] "(Intercept)" "Girth" "Height" "" ...
##  $ rank         : int 3
##  $ fitted.values: Named num [1:31] 4.84 4.55 4.82 15.87 19.87 ...
##   ..- attr(*, "names")= chr [1:31] "1" "2" "3" "4" ...
##  $ assign       : int [1:3] 0 1 2
##  $ qr           :List of 5
##   ..$ qr   : num [1:31, 1:3] -5.57 0.18 0.18 0.18 0.18 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:31] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:3] "(Intercept)" "Girth" "Height"
##   .. ..- attr(*, "assign")= int [1:3] 0 1 2
##   ..$ qraux: num [1:3] 1.18 1.23 1.24
##   ..$ pivot: int [1:3] 1 2 3
##   ..$ tol  : num 1e-07
##   ..$ rank : int 3
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 28
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = Volume ~ Girth + Height, data = trees)
##  $ terms        :Classes 'terms', 'formula'  language Volume ~ Girth + Height
##   .. ..- attr(*, "variables")= language list(Volume, Girth, Height)
##   .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:3] "Volume" "Girth" "Height"
##   .. .. .. ..$ : chr [1:2] "Girth" "Height"
##   .. ..- attr(*, "term.labels")= chr [1:2] "Girth" "Height"
##   .. ..- attr(*, "order")= int [1:2] 1 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(Volume, Girth, Height)
##   .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:3] "Volume" "Girth" "Height"
##  $ model        :'data.frame':   31 obs. of  3 variables:
##   ..$ Volume: num [1:31] 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
##   ..$ Girth : num [1:31] 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
##   ..$ Height: num [1:31] 70 65 63 72 81 83 66 75 80 75 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language Volume ~ Girth + Height
##   .. .. ..- attr(*, "variables")= language list(Volume, Girth, Height)
##   .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:3] "Volume" "Girth" "Height"
##   .. .. .. .. ..$ : chr [1:2] "Girth" "Height"
##   .. .. ..- attr(*, "term.labels")= chr [1:2] "Girth" "Height"
##   .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(Volume, Girth, Height)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:3] "Volume" "Girth" "Height"
##  - attr(*, "class")= chr "lm"
m$coefficients
## (Intercept)       Girth      Height 
## -57.9876589   4.7081605   0.3392512
m['coefficients']
## $coefficients
## (Intercept)       Girth      Height 
## -57.9876589   4.7081605   0.3392512
m[['coefficients']]
## (Intercept)       Girth      Height 
## -57.9876589   4.7081605   0.3392512
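For extracting a single element, m$coefficients and m[['coefficients']] do the same thing: both return the named numeric vector of coefficients. m['coefficients'] is different: single brackets return a list that contains the element, which is why its output begins with $coefficients.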
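A sketch that checks this directly, and also shows two related details: coef() is the usual accessor for model coefficients, and $ accepts a unique abbreviation of a name (partial matching), while [[ ]] matches names exactly by default.

```r
m <- lm(Volume ~ Girth + Height, data = trees)

via_dollar <- m$coefficients
via_double <- m[["coefficients"]]
via_single <- m["coefficients"]     # a list of length 1, not the vector
via_coef   <- coef(m)               # accessor function

identical(via_dollar, via_double)   # TRUE
is.list(via_single)                 # TRUE

# $ allows partial names; [[ ]] matches exactly by default
identical(m$coef, via_dollar)       # TRUE
is.null(m[["coef"]])                # TRUE
```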