vec_a <- c(2,4,6)
vec_b <- c(8,10,12)
vec_c <- vec_a + vec_b
vec_c
## [1] 10 14 18
vec_d <- c(14,20)
vec_d + vec_a
## Warning in vec_d + vec_a: longer object length is not a multiple of shorter
## object length
## [1] 16 24 20
I expected R to warn me that the two vectors differ in length. More precisely, the warning appears because the length of the longer vector (3) is not a multiple of the length of the shorter one (2). R still returns a result: it recycles vec_d from its beginning to “fill in the blanks” until each value in vec_a has something added to it. In this case, it adds 14 (vec_d[1]) to 6 (vec_a[3]) to get the final 20 and is done.
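The recycling described here can be made explicit. The sketch below reuses the vectors from this problem and stretches vec_d to the length of vec_a by hand with rep(); the names vec_d_recycled and manual_sum are introduced only for illustration.

```r
vec_a <- c(2, 4, 6)
vec_d <- c(14, 20)

# Recycle vec_d by hand: repeat it from the start until it has 3 elements
vec_d_recycled <- rep(vec_d, length.out = length(vec_a))
vec_d_recycled
## [1] 14 20 14

# Adding the stretched vector gives the same values, this time without a warning
manual_sum <- vec_d_recycled + vec_a
manual_sum
## [1] 16 24 20
```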
vec_a + 5
## [1] 7 9 11
R added 5 to each element of vec_a. The number 5 is a vector of length 1, and the length of vec_a (3) is a multiple of 1, so 5 could be recycled across every value of vec_a without any hanging chads, if you will, and without a warning.
Generate the vector of integers {1, 2,…5} in two different ways.
vec_int <- seq(1, 5, 1)
vec_int
## [1] 1 2 3 4 5
vec_int <- 1:5
vec_int
## [1] 1 2 3 4 5
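A third option, offered here only as a possible alternative, is seq_len(), which builds the sequence 1, 2, ..., n directly from its length. The name vec_int3 is hypothetical.

```r
# seq_len(n) returns the integers 1..n
vec_int3 <- seq_len(5)
vec_int3
## [1] 1 2 3 4 5

# All three approaches hold the same values
all.equal(vec_int3, seq(1, 5, 1))
## [1] TRUE
all.equal(vec_int3, 1:5)
## [1] TRUE
```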
Generate the vector of even numbers {2, 4, 6,…20}
vec_even <- seq(2, 20, 2)
vec_even
## [1] 2 4 6 8 10 12 14 16 18 20
vec_even <- (1:10) * 2
vec_even
## [1] 2 4 6 8 10 12 14 16 18 20
Generate a vector of 21 elements that are evenly placed between 0 and 1 using the seq() command and name this vector x.
x <- seq(0, 1, length.out=21)
x
## [1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
## [16] 0.75 0.80 0.85 0.90 0.95 1.00
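As a quick check, 21 evenly spaced points on [0, 1] imply a step of 1/20 = 0.05, so the same vector should come out of seq() with by instead of length.out. This is a small sketch; x_by is a name made up for the comparison.

```r
x <- seq(0, 1, length.out = 21)

# Same grid, specified by step size instead of number of points
x_by <- seq(0, 1, by = 0.05)
length(x_by)
## [1] 21

# all.equal() compares with a small numerical tolerance
all.equal(x, x_by)
## [1] TRUE
```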
Generate the vector {2, 4, 8, 2, 4, 8, 2, 4, 8} using the rep() command.
vec <- rep(c(2, 4, 8), 3)
vec
## [1] 2 4 8 2 4 8 2 4 8
Generate the vector {2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8} using the rep() command.
vec <- rep(c(2, 4, 8), each=4)
vec
## [1] 2 2 2 2 4 4 4 4 8 8 8 8
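The two rep() arguments used above can also be combined or given per-element counts. The sketch below is an extra illustration, not part of the assigned problem; vec_both and vec_counts are hypothetical names.

```r
# each is applied first, then the whole result is repeated `times` times
vec_both <- rep(c(2, 4, 8), each = 2, times = 2)
vec_both
## [1] 2 2 4 4 8 8 2 2 4 4 8 8

# A vector for `times` gives a separate repeat count for each element
vec_counts <- rep(c(2, 4, 8), times = c(1, 2, 3))
vec_counts
## [1] 2 4 4 8 8 8
```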
The vector letters is built into R and contains the lowercase English alphabet.
letters[9]
## [1] "i"
letters[c(9, 11, 19)]
## [1] "i" "k" "s"
letters[-c(25, 26)]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x"
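Negative indices drop the listed positions rather than selecting them. A short sketch comparing a few equivalent ways to drop the last two letters (no_yz is a name introduced here for illustration):

```r
# Drop positions 25 and 26 ("y" and "z")
no_yz <- letters[-c(25, 26)]
length(no_yz)
## [1] 24

# head() with a negative n drops that many elements from the end
identical(no_yz, head(letters, -2))
## [1] TRUE

# Selecting the first 24 positions gives the same result
identical(no_yz, letters[1:24])
## [1] TRUE
```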
In this problem, we will work with the matrix:
\[ \left[\begin{array}{ccccc} 2 & 4 & 6 & 8 & 10\\ 12 & 14 & 16 & 18 & 20\\ 22 & 24 & 26 & 28 & 30 \end{array}\right]\]
M <- matrix(seq(2, 30, 2), nrow=3, ncol=5, byrow=TRUE)
M
## [,1] [,2] [,3] [,4] [,5]
## [1,] 2 4 6 8 10
## [2,] 12 14 16 18 20
## [3,] 22 24 26 28 30
m1 <- seq(2, 10, 2)
m2 <- seq(12, 20, 2)
m3 <- seq(22, 30, 2)
M <- rbind(m1, m2, m3)
M
## [,1] [,2] [,3] [,4] [,5]
## m1 2 4 6 8 10
## m2 12 14 16 18 20
## m3 22 24 26 28 30
M[2, ]
## [1] 12 14 16 18 20
unname(M[3,2])
## [1] 24
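Selecting a single row, as in M[2, ], returns a plain vector because R drops the matrix dimensions by default. If a 1 x 5 matrix is wanted instead, drop = FALSE keeps them. A minimal sketch using the matrix() version of M; row2_vec and row2_mat are illustrative names.

```r
M <- matrix(seq(2, 30, 2), nrow = 3, ncol = 5, byrow = TRUE)

# Default behaviour: the row comes back as a vector with no dim attribute
row2_vec <- M[2, ]
dim(row2_vec)
## NULL

# drop = FALSE keeps the result as a matrix with one row
row2_mat <- M[2, , drop = FALSE]
dim(row2_mat)
## [1] 1 5

# Element in row 3, column 2
M[3, 2]
## [1] 24
```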
Create and manipulate a data frame.
Girth = {8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0}
Height = {70, 65, 63, 72, 81, 83, 66}
Volume = {10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6}
| Girth | Height | Volume |
|---|---|---|
| 8.3 | 70 | 10.3 |
| 8.6 | 65 | 10.3 |
| 8.8 | 63 | 10.2 |
| 10.5 | 72 | 16.4 |
| 10.7 | 81 | 18.8 |
| 10.8 | 83 | 19.7 |
| 11.0 | 66 | 15.6 |
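The code that builds my.trees is not shown in this write-up. A minimal sketch, assuming the data frame was created directly from the three vectors given for this problem:

```r
# Assumed construction of my.trees from the Girth, Height, and Volume values
my.trees <- data.frame(
  Girth  = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0),
  Height = c(70, 65, 63, 72, 81, 83, 66),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6)
)

# 7 rows (trees) and 3 columns (measurements)
dim(my.trees)
## [1] 7 3
```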
my.trees[3,]
## Girth Height Volume
## 3 8.8 63 10.2
my.trees[ , "Girth"]
## [1] 8.3 8.6 8.8 10.5 10.7 10.8 11.0
my.trees[-4, ]
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 5 10.7 81 18.8
## 6 10.8 83 19.7
## 7 11.0 66 15.6
index <- which(my.trees$Girth > 10)
index
## [1] 4 5 6 7
small.data <- my.trees[index, ]
small.data
## Girth Height Volume
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
## 7 11.0 66 15.6
smaller.data <- my.trees[-index, ]
smaller.data
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
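The same split can be done without which() by indexing with the logical vector itself and negating it with ! for the complement. This is a sketch of an alternative, not a correction; is_big, large.girth, and small.girth are hypothetical names, and my.trees is rebuilt here so the block runs on its own.

```r
my.trees <- data.frame(
  Girth  = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0),
  Height = c(70, 65, 63, 72, 81, 83, 66),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6)
)

# TRUE for trees with Girth above 10, FALSE otherwise
is_big <- my.trees$Girth > 10

large.girth <- my.trees[is_big, ]   # rows where the condition holds
small.girth <- my.trees[!is_big, ]  # rows where it does not

nrow(large.girth)
## [1] 4
nrow(small.girth)
## [1] 3
```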
The following code creates a data.frame and then has two different methods for removing the rows with NA values in the column Grade. Explain the difference between the two.
df <- data.frame(name= c('Alice','Bob','Charlie','Daniel'),
Grade = c(6,8,NA,9))
df[ -which( is.na(df$Grade) ), ]
## name Grade
## 1 Alice 6
## 2 Bob 8
## 4 Daniel 9
df[ which( !is.na(df$Grade) ), ]
## name Grade
## 1 Alice 6
## 2 Bob 8
## 4 Daniel 9
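With the minus sign (-) before which() in the first option, R drops the rows whose positions which() returns; in this case, any row that is NA in the Grade column is dropped. In the second option, R keeps the rows whose positions which() returns; in this case, those are the rows where Grade is not (!) NA. The two give the same result whenever at least one NA is present. If no Grade is NA, however, which() in the first option returns an empty vector, and indexing with it selects zero rows instead of keeping them all, so the second option is the safer one.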
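The two methods can be compared on a data frame that has no missing grades at all. The sketch below uses a made-up data frame, df_complete, purely for illustration; dropped and kept are also hypothetical names.

```r
# Hypothetical data with no NA values in Grade
df_complete <- data.frame(name = c('Alice', 'Bob'), Grade = c(6, 8))

# No NA positions are found, so which() returns an empty integer vector
which(is.na(df_complete$Grade))
## integer(0)

# Negating an empty vector is still empty, so zero rows are selected
dropped <- df_complete[-which(is.na(df_complete$Grade)), ]
nrow(dropped)
## [1] 0

# Selecting the non-NA rows keeps everything, as intended
kept <- df_complete[which(!is.na(df_complete$Grade)), ]
nrow(kept)
## [1] 2
```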
Data frames are usually created by binding together vectors built with the seq and rep commands. However, we often need to create a data frame that contains all possible combinations of several variables. The function expand.grid() addresses this need.
A fun example of using this function is making several graphs of the standard normal distribution versus the t-distribution. Use the expand.grid function to create a data.frame with all combinations of x=seq(-4,4,by=.01), dist=c('Normal','t'), and df=c(2,3,4,5,10,15,20,30). Use the dplyr::mutate command with the if_else command to generate the function heights y using either dt(x,df) or dnorm(x) depending on what is in the dist column.
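library(dplyr)    # mutate(), if_else(), and the %>% pipe
library(ggplot2)  # ggplot(), geom_line(), facet_wrap()

# All combinations of x, distribution, and degrees of freedom
newdata <-
  expand.grid(x=seq(-4, 4, by=0.01), dist=c('Normal', 't'), df=c(2, 3, 4, 5, 10, 15, 20, 30)) %>%
  mutate(y=if_else(dist=='t', dt(x, df), dnorm(x)))  # density height for each row

# One panel per df, comparing the t density with the standard normal
newdata %>% ggplot(aes(x=x, y=y, color=dist)) +
  geom_line() +
  facet_wrap(~ df) +
  labs(title="T vs Normal Dist for different DF") +
  theme_minimal()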
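To see what expand.grid() produces before plotting, it can help to try it on a much smaller grid. The sketch below uses a made-up grid, small_grid, with 3 x 2 x 2 = 12 combinations; only base R is needed.

```r
# Tiny illustrative grid: 3 x-values, 2 distributions, 2 df values
small_grid <- expand.grid(x = c(-1, 0, 1), dist = c('Normal', 't'), df = c(2, 30))

# One row per combination
nrow(small_grid)
## [1] 12

# The first argument varies fastest, the last one slowest
small_grid$x[1:4]
## [1] -1  0  1 -1
as.character(small_grid$dist[1:4])
## [1] "Normal" "Normal" "Normal" "t"
```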
Create and manipulate a list.
my.test <- list(x=c(4, 5, 6, 7, 8, 9, 10), y=c(34, 35, 41, 40, 45, 47, 51), slope=2.82, p.value=0.000131)
str(my.test)
## List of 4
## $ x : num [1:7] 4 5 6 7 8 9 10
## $ y : num [1:7] 34 35 41 40 45 47 51
## $ slope : num 2.82
## $ p.value: num 0.000131
my.test[[2]]
## [1] 34 35 41 40 45 47 51
my.test[['p.value']]
## [1] 0.000131
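Double brackets extract the element itself, while single brackets return a shorter list that still wraps it. A brief sketch on the same list, rebuilt here so the block is self-contained; sub_list and slope_value are hypothetical names.

```r
my.test <- list(x=c(4, 5, 6, 7, 8, 9, 10), y=c(34, 35, 41, 40, 45, 47, 51),
                slope=2.82, p.value=0.000131)

# Single brackets: a list of length 1 that contains slope
sub_list <- my.test['slope']
class(sub_list)
## [1] "list"

# Double brackets (or $): the number itself
slope_value <- my.test[['slope']]
class(slope_value)
## [1] "numeric"
identical(slope_value, my.test$slope)
## [1] TRUE

# Indexing can be chained: third element of x
my.test[['x']][3]
## [1] 6
```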
The function lm() creates a linear model, which is a general class of model that includes both regression and ANOVA. We will call this on a data frame and examine the results. For this problem, there isn’t much to figure out; rather, the goal is to recognize the data structures being used in common analysis functions.
| Girth | Height | Volume |
|---|---|---|
| 8.3 | 70 | 10.3 |
| 8.6 | 65 | 10.3 |
| 8.8 | 63 | 10.2 |
## 'data.frame': 31 obs. of 3 variables:
## $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
## $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
## $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
m <- lm(Volume~Girth+Height, data=trees) # Open question: is Girth*Height or Girth+Height preferable?
m
##
## Call:
## lm(formula = Volume ~ Girth + Height, data = trees)
##
## Coefficients:
## (Intercept) Girth Height
## -57.9877 4.7082 0.3393
str(m)
## List of 12
## $ coefficients : Named num [1:3] -57.988 4.708 0.339
## ..- attr(*, "names")= chr [1:3] "(Intercept)" "Girth" "Height"
## $ residuals : Named num [1:31] 5.462 5.746 5.383 0.526 -1.069 ...
## ..- attr(*, "names")= chr [1:31] "1" "2" "3" "4" ...
## $ effects : Named num [1:31] -167.985 87.073 10.118 -0.812 -1.489 ...
## ..- attr(*, "names")= chr [1:31] "(Intercept)" "Girth" "Height" "" ...
## $ rank : int 3
## $ fitted.values: Named num [1:31] 4.84 4.55 4.82 15.87 19.87 ...
## ..- attr(*, "names")= chr [1:31] "1" "2" "3" "4" ...
## $ assign : int [1:3] 0 1 2
## $ qr :List of 5
## ..$ qr : num [1:31, 1:3] -5.57 0.18 0.18 0.18 0.18 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:31] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:3] "(Intercept)" "Girth" "Height"
## .. ..- attr(*, "assign")= int [1:3] 0 1 2
## ..$ qraux: num [1:3] 1.18 1.23 1.24
## ..$ pivot: int [1:3] 1 2 3
## ..$ tol : num 1e-07
## ..$ rank : int 3
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 28
## $ xlevels : Named list()
## $ call : language lm(formula = Volume ~ Girth + Height, data = trees)
## $ terms :Classes 'terms', 'formula' language Volume ~ Girth + Height
## .. ..- attr(*, "variables")= language list(Volume, Girth, Height)
## .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:3] "Volume" "Girth" "Height"
## .. .. .. ..$ : chr [1:2] "Girth" "Height"
## .. ..- attr(*, "term.labels")= chr [1:2] "Girth" "Height"
## .. ..- attr(*, "order")= int [1:2] 1 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(Volume, Girth, Height)
## .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:3] "Volume" "Girth" "Height"
## $ model :'data.frame': 31 obs. of 3 variables:
## ..$ Volume: num [1:31] 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
## ..$ Girth : num [1:31] 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
## ..$ Height: num [1:31] 70 65 63 72 81 83 66 75 80 75 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language Volume ~ Girth + Height
## .. .. ..- attr(*, "variables")= language list(Volume, Girth, Height)
## .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:3] "Volume" "Girth" "Height"
## .. .. .. .. ..$ : chr [1:2] "Girth" "Height"
## .. .. ..- attr(*, "term.labels")= chr [1:2] "Girth" "Height"
## .. .. ..- attr(*, "order")= int [1:2] 1 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(Volume, Girth, Height)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
## .. .. .. ..- attr(*, "names")= chr [1:3] "Volume" "Girth" "Height"
## - attr(*, "class")= chr "lm"
m$coefficients
## (Intercept) Girth Height
## -57.9876589 4.7081605 0.3392512
m['coefficients']
## $coefficients
## (Intercept) Girth Height
## -57.9876589 4.7081605 0.3392512
m[['coefficients']]
## (Intercept) Girth Height
## -57.9876589 4.7081605 0.3392512
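# Do m$ and m[[]] do the same thing? Yes: m$coefficients and m[['coefficients']] both return
# the named numeric vector itself, while m['coefficients'] returns a one-element list containing it.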
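The question in the comment above can be checked directly. The sketch below refits the same model on the built-in trees data and compares the three extraction styles; it also shows one place where $ and [[ ]] differ, namely that $ allows partial name matching on lists while [[ ]] requires the exact name by default.

```r
m <- lm(Volume ~ Girth + Height, data = trees)

# $ and [[ ]] return the same object: the named numeric vector
identical(m$coefficients, m[['coefficients']])
## [1] TRUE

# Single brackets return a list that wraps the vector
is.list(m['coefficients'])
## [1] TRUE
is.numeric(m[['coefficients']])
## [1] TRUE

# coef() is the extractor function for the same values
identical(coef(m), m$coefficients)
## [1] TRUE

# Difference: $ matches partial names, [[ ]] does not
identical(m$coef, m$coefficients)
## [1] TRUE
is.null(m[['coef']])
## [1] TRUE
```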