Source file ⇒ 2017-lec2.Rmd
Chap 3 Chaining Syntax
The general chaining syntax:
#Object_Name <-
#Data_Table %>%
#function_name(arguments) %>%
#function_name(arguments)
Example 1:
MyBabies <-
BabyNames %>%
head(3)
MyBabies
name | sex | count | year |
---|---|---|---|
Mary | F | 7065 | 1880 |
Anna | F | 2604 | 1880 |
Emma | F | 2003 | 1880 |
This is equivalent to:
MyBabies <-
head(BabyNames,3)
MyBabies
name | sex | count | year |
---|---|---|---|
Mary | F | 7065 | 1880 |
Anna | F | 2604 | 1880 |
Emma | F | 2003 | 1880 |
Example 2:
Princes <-
BabyNames %>%
filter(name=="Prince") %>%
group_by(year,sex) %>%
summarise(sum(count))
Princes
## Source: local data frame [194 x 3]
## Groups: year [?]
##
## year sex `sum(count)`
## <int> <chr> <int>
## 1 1880 M 16
## 2 1881 M 17
## 3 1882 M 18
## 4 1883 M 18
## 5 1884 M 17
## 6 1885 M 21
## 7 1886 M 22
## 8 1887 M 18
## 9 1888 M 19
## 10 1889 M 14
## # ... with 184 more rows
This is equivalent to the nested call:
Princes <- summarise(group_by(filter(BabyNames,name=="Prince"),year,sex),sum(count))
Princes
## Source: local data frame [194 x 3]
## Groups: year [?]
##
## year sex `sum(count)`
## <int> <chr> <int>
## 1 1880 M 16
## 2 1881 M 17
## 3 1882 M 18
## 4 1883 M 18
## 5 1884 M 17
## 6 1885 M 21
## 7 1886 M 22
## 8 1887 M 18
## 9 1888 M 19
## 10 1889 M 14
## # ... with 184 more rows
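In the output above, the summary column is labeled `sum(count)` because no name was supplied to summarise. A minimal sketch of naming that column, assuming the dplyr package is installed; the table `toy` and the column name `total` are made up for illustration and are not part of BabyNames:

```r
library(dplyr)

# Hypothetical stand-in for BabyNames (values invented for this sketch)
toy <- data.frame(
  name  = c("Prince", "Prince", "Mary", "Prince"),
  sex   = c("M", "F", "F", "M"),
  count = c(16L, 5L, 7065L, 17L),
  year  = c(1880L, 1880L, 1880L, 1881L),
  stringsAsFactors = FALSE
)

# Same chain as example 2, but the summary column gets an explicit name
toy_princes <-
  toy %>%
  filter(name == "Prince") %>%
  group_by(year, sex) %>%
  summarise(total = sum(count))

names(toy_princes)  # "year" "sex" "total"
```

A named column is easier to refer to in later steps of the chain than the backquoted `sum(count)`.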
Note: If you look at the help page for dplyr::filter, you will see that the first argument is the data table (BabyNames in this case). When you chain, leave this first argument out, because %>% already passes the data table in as the first argument. Supplying it again breaks the pipeline and you will get an error.
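The note can be checked on a small table. The sketch below assumes the dplyr package is installed; `toy` is a made-up stand-in for BabyNames, and the exact wording of the error depends on the dplyr version, so only the fact that an error occurs is tested:

```r
library(dplyr)

# Hypothetical stand-in for BabyNames (values invented for this sketch)
toy <- data.frame(
  name  = c("Prince", "Mary", "Prince"),
  count = c(16L, 7065L, 17L),
  year  = c(1880L, 1880L, 1881L),
  stringsAsFactors = FALSE
)

# The pipe supplies toy as the first argument of filter()
piped  <- toy %>% filter(name == "Prince")
nested <- filter(toy, name == "Prince")
identical(piped, nested)  # TRUE

# Repeating the data table inside the chained call is an error
attempt <- tryCatch(
  toy %>% filter(toy, name == "Prince"),
  error = function(e) "error"
)
identical(attempt, "error")  # TRUE
```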
DataCamp Intro to R, chap 5: Data Frames
You can make data frames as follows:
a <- c(10,30,15)
b <- c("Bob","John","Ben")
df <- data.frame(a,b, stringsAsFactors = FALSE) #want strings to be characters not factors
df
a | b |
---|---|
10 | Bob |
30 | John |
15 | Ben |
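The stringsAsFactors argument used above decides how the character vector is stored in the data frame. A small base R sketch of the difference (the names `as_chr` and `as_fct` are made up for illustration):

```r
b <- c("Bob", "John", "Ben")

as_chr <- data.frame(b, stringsAsFactors = FALSE)  # keep strings as characters
as_fct <- data.frame(b, stringsAsFactors = TRUE)   # convert strings to a factor

class(as_chr$b)  # "character"
class(as_fct$b)  # "factor"
```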
You can name the variables of your data frame as follows:
names(df) <- c("age","friend")
df
age | friend |
---|---|
10 | Bob |
30 | John |
15 | Ben |
You can look at elements of your data frame as follows:
df[1,2] #first case, second variable
## [1] "Bob"
df[,2] #second variable
## [1] "Bob" "John" "Ben"
df[,1] #first variable
## [1] 10 30 15
How do you select the last two cases of the friend variable?
df[2:3,2]
## [1] "John" "Ben"
df[2:3,"friend"]
## [1] "John" "Ben"
df[-1,"friend"]
## [1] "John" "Ben"
How do you select the case with “Bob” as friend?
bob_friend <- c(TRUE,FALSE,FALSE)
df[bob_friend,]
age | friend |
---|---|
10 | Bob |
#subset(my_data_frame, subset = some_condition)
subset(df, subset= friend=="Bob")
age | friend |
---|---|
10 | Bob |
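The logical vector does not have to be typed by hand; a comparison on the column produces it. A short base R sketch, rebuilding the same df so the block runs on its own (the name `is_bob` is made up for illustration):

```r
df <- data.frame(
  age    = c(10, 30, 15),
  friend = c("Bob", "John", "Ben"),
  stringsAsFactors = FALSE
)

# The comparison returns one TRUE/FALSE per case
is_bob <- df$friend == "Bob"   # TRUE FALSE FALSE
bob_row <- df[is_bob, ]        # same row as df[c(TRUE, FALSE, FALSE), ]

# The same idea works for any condition, e.g. friends older than 12
older <- df[df$age > 12, "friend"]   # "John" "Ben"
```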
DataCamp Intro to R, chap 6: Lists
You can make a list as follows:
x <- df
y <- 7
z <- c(1,2)
my_list <- list(x,y,z)
my_list
## [[1]]
## age friend
## 1 10 Bob
## 2 30 John
## 3 15 Ben
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] 1 2
You can create a named list as follows:
names(my_list) <- c("data_frame", "num", "vec")
my_list
## $data_frame
## age friend
## 1 10 Bob
## 2 30 John
## 3 15 Ben
##
## $num
## [1] 7
##
## $vec
## [1] 1 2
You can select the “data frame” element of your list as follows:
my_list[[1]]
age | friend |
---|---|
10 | Bob |
30 | John |
15 | Ben |
my_list[["data_frame"]]
age | friend |
---|---|
10 | Bob |
30 | John |
15 | Ben |
You can select the second element of “vec” as follows:
my_list[["vec"]][2]
## [1] 2
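Single brackets, double brackets, and $ behave differently on a list. A base R sketch of the distinction, using a smaller list with the same element names (the one-column data frame is a simplification made for this example):

```r
my_list <- list(
  data_frame = data.frame(age = c(10, 30, 15)),
  num = 7,
  vec = c(1, 2)
)

sub_list <- my_list["vec"]     # single brackets: a list of length 1
element  <- my_list[["vec"]]   # double brackets: the element itself

class(sub_list)   # "list"
class(element)    # "numeric"

# $ is shorthand for [[ ]] with a name
my_list$vec[2]    # 2
```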
Replicator and Sequence functions
These functions are handy for making sequences of numbers.
seq is a generalization of “:”.
seq uses the argument by to set the step size.
1:5
## [1] 1 2 3 4 5
5:1
## [1] 5 4 3 2 1
seq(0,11, by=2)
## [1] 0 2 4 6 8 10
seq(10,0, by=-2)
## [1] 10 8 6 4 2 0
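A short base R sketch of how seq relates to “:”, plus the length.out argument, which is not used in the notes above but is part of seq and fixes the number of values instead of the step size:

```r
# With only a start and an end, seq gives the same values as ":"
same_as_colon <- all(seq(1, 5) == 1:5)   # TRUE

# by sets the step; the sequence stops before it would pass the end value
evens <- seq(0, 11, by = 2)              # 0 2 4 6 8 10

# length.out sets how many equally spaced values to produce
quarters <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
```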
The replicator function rep uses the arguments times and each: times repeats the whole vector, while each repeats every element in place.
rep(c(0,1), times=5)
## [1] 0 1 0 1 0 1 0 1 0 1
rep(letters[1:5],each=2)
## [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e"
rep(1:3, each =2, times=3)
## [1] 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
rep(1:3, times=3)
## [1] 1 2 3 1 2 3 1 2 3
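When each and times are given together, each is applied first and the result is then repeated times times. A base R sketch checking this, along with a vector-valued times, which gives a separate repeat count for every element:

```r
# each first, then times: these two calls give the same vector
combined <- rep(1:3, each = 2, times = 3)
two_step <- rep(rep(1:3, each = 2), times = 3)
identical(combined, two_step)   # TRUE

# A vector for times repeats element i times[i] times
stairs <- rep(1:3, times = c(3, 2, 1))   # 1 1 1 2 2 3
```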
These are good midterm questions. Solution: the answer is b. See below.
x <- 1:4
x
## [1] 1 2 3 4
names(x) <- letters[1:4]
x
## a b c d
## 1 2 3 4
x[1:2] <- 2:1
x
## a b c d
## 2 1 3 4
x["a"] <- 100
x[x==100] <- NA
x
## a b c d
## NA 1 3 4
Solution: the answer is d.
rep(seq(0,6,by=2),each=5)
## [1] 0 0 0 0 0 2 2 2 2 2 4 4 4 4 4 6 6 6 6 6
rep(1:5, times=4)
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
rep(1:5, times=4) + rep(0:3, each=5)
## [1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8
For next time, do chapters 5 and 6 in DataCamp’s Intro to R course, on Data Frames and Lists (skip chaps 3 and 4). Make sure that you have R, RStudio, and DataComputing working on your computer. Also finish the chapter 1, 2, and 3 “In class exercises”.
Next time we will cover chapter 4 in the book DC and 4 on Factors in DataComputing.