Source file ⇒ 2017-lec2.Rmd

Today

  1. DC chap 3: R Command Patterns
  2. DataCamp Intro to R, chap 5: Data Frames
  3. DataCamp Intro to R, chap 6: Lists

Chap 3 Chaining Syntax

The general chaining syntax:

#Object_Name <- 
#Data_Table %>%
#function_name(arguments) %>%
#function_name(arguments)

example 1:

MyBabies<-
  BabyNames %>% 
  head(3)
MyBabies
name sex count year
Mary F 7065 1880
Anna F 2604 1880
Emma F 2003 1880

This is equivalent to:

MyBabies <-
  head(BabyNames,3)  
MyBabies
name sex count year
Mary F 7065 1880
Anna F 2604 1880
Emma F 2003 1880

example 2:

Princes <- 
  BabyNames %>%
  filter(name=="Prince") %>%
  group_by(year,sex) %>%
  summarise(sum(count))
Princes
## Source: local data frame [194 x 3]
## Groups: year [?]
## 
##     year   sex `sum(count)`
##    <int> <chr>        <int>
## 1   1880     M           16
## 2   1881     M           17
## 3   1882     M           18
## 4   1883     M           18
## 5   1884     M           17
## 6   1885     M           21
## 7   1886     M           22
## 8   1887     M           18
## 9   1888     M           19
## 10  1889     M           14
## # ... with 184 more rows

This is equivalent to the nested form:

Princes <- summarise(group_by(filter(BabyNames,name=="Prince"),year,sex),sum(count))
Princes
## Source: local data frame [194 x 3]
## Groups: year [?]
## 
##     year   sex `sum(count)`
##    <int> <chr>        <int>
## 1   1880     M           16
## 2   1881     M           17
## 3   1882     M           18
## 4   1883     M           18
## 5   1884     M           17
## 6   1885     M           21
## 7   1886     M           22
## 8   1887     M           18
## 9   1888     M           19
## 10  1889     M           14
## # ... with 184 more rows

Note: If you look at the help page for dplyr::filter, you will see that the first argument is the data table (BabyNames in this case). When you chain, leave this first argument out, because the pipe supplies it for you; if you also pass the table explicitly, the pipeline breaks and you will get an error.
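
The note above can be checked on a small table. This is a minimal sketch, assuming dplyr (version 1.0 or later, for the `.groups` argument) is installed; `toy_names` and its values are made up for illustration and are not the real BabyNames data.

```r
library(dplyr)  # provides %>%, filter(), group_by(), summarise()

# Hypothetical toy table standing in for BabyNames (values are invented)
toy_names <- data.frame(
  name  = c("Prince", "Prince", "Mary", "Prince"),
  sex   = c("M", "M", "F", "F"),
  count = c(16, 4, 7065, 5),
  year  = c(1880, 1880, 1880, 1881),
  stringsAsFactors = FALSE
)

# Chained form: the pipe passes the table on its left as the first argument
piped <- toy_names %>%
  filter(name == "Prince") %>%
  group_by(year, sex) %>%
  summarise(total = sum(count), .groups = "drop")

# Nested form: the table is written explicitly as the first argument
nested <- summarise(
  group_by(filter(toy_names, name == "Prince"), year, sex),
  total = sum(count), .groups = "drop"
)

# Both forms give the same table
same_result <- isTRUE(all.equal(as.data.frame(piped), as.data.frame(nested)))

# Passing the table again inside the chain makes filter() treat it as a
# condition, which fails; try() captures the error instead of stopping
double_pass_fails <- inherits(
  try(toy_names %>% filter(toy_names, name == "Prince"), silent = TRUE),
  "try-error"
)
```

Printing `same_result` and `double_pass_fails` shows whether the two forms agree and whether the doubled argument raised an error.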

In-class exercises

DC chapter 3 exercises

DataCamp Intro to R, chap 5: Data Frames

You can make data frames as follows:

a <- c(10,30,15)
b <- c("Bob","John","Ben")
df <- data.frame(a,b, stringsAsFactors = FALSE)  #want strings to be characters not factors
df
##    a    b
## 1 10  Bob
## 2 30 John
## 3 15  Ben

You can name the variables of your data frame as follows:

names(df) <- c("age","friend")
df
##   age friend
## 1  10    Bob
## 2  30   John
## 3  15    Ben
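
As a possible shortcut, the variable names can also be given when the data frame is created. The sketch below uses the same vectors as above; `df2` is a name introduced here only for illustration.

```r
a <- c(10, 30, 15)
b <- c("Bob", "John", "Ben")

# Name the variables in the call itself instead of calling names() afterwards
df2 <- data.frame(age = a, friend = b, stringsAsFactors = FALSE)

names(df2)  # the variable names
str(df2)    # compact overview: one line per variable, with its type
```

`str()` is a quick way to confirm that `friend` was stored as character rather than factor.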

You can look at elements of your data frame as follows:

df[1,2] #first case, second variable
## [1] "Bob"
 df[,2] #second variable
## [1] "Bob"  "John" "Ben"
 df[,1] #first variable
## [1] 10 30 15

How do you select the last two cases of the friend variable?

df[2:3,2]
## [1] "John" "Ben"
df[2:3,"friend"]
## [1] "John" "Ben"
df[-1,"friend"]
## [1] "John" "Ben"

How do you select the case with “Bob” as friend?

bob_friend <- c(TRUE,FALSE,FALSE)
df[bob_friend,]
##   age friend
## 1  10    Bob
# general form: subset(my_data_frame, subset = some_condition)
subset(df, subset= friend=="Bob")
##   age friend
## 1  10    Bob
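
The logical vector used above was typed by hand; it can instead be computed from the data. A minimal sketch, rebuilding the same `df` so the block runs on its own (`is_bob` and `older_friends` are names introduced here for illustration):

```r
df <- data.frame(age = c(10, 30, 15),
                 friend = c("Bob", "John", "Ben"),
                 stringsAsFactors = FALSE)

# A comparison on a column returns one TRUE/FALSE per case
is_bob <- df$friend == "Bob"

# Use the logical vector in the row position to keep only the TRUE cases
bob_row <- df[is_bob, ]

# The same idea with a numeric condition, keeping just the friend variable
older_friends <- df[df$age > 12, "friend"]
```

This gives the same row as `subset(df, subset = friend == "Bob")`, but the condition has to name the data frame explicitly (`df$friend`).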

DataCamp Intro to R, chap 6: Lists

You can make a list as follows:

x <- df
y <- 7
z <- c(1,2)
my_list <- list(x,y,z)
my_list
## [[1]]
##   age friend
## 1  10    Bob
## 2  30   John
## 3  15    Ben
## 
## [[2]]
## [1] 7
## 
## [[3]]
## [1] 1 2

You can create a named list as follows:

names(my_list) <- c("data_frame", "num", "vec")
my_list
## $data_frame
##   age friend
## 1  10    Bob
## 2  30   John
## 3  15    Ben
## 
## $num
## [1] 7
## 
## $vec
## [1] 1 2

You can select the “data frame” element of your list as follows:

my_list[[1]]
##   age friend
## 1  10    Bob
## 2  30   John
## 3  15    Ben
my_list[["data_frame"]]
##   age friend
## 1  10    Bob
## 2  30   John
## 3  15    Ben

You can select the second element of “vec” as follows:

my_list[["vec"]][2]
## [1] 2
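
One point that may help with the selections above: double brackets return the element itself, while single brackets return a shorter list. The sketch below rebuilds the same named list so it runs on its own; `people`, `as_list` and `as_vec` are names introduced here for illustration.

```r
people <- data.frame(age = c(10, 30, 15),
                     friend = c("Bob", "John", "Ben"),
                     stringsAsFactors = FALSE)
my_list <- list(data_frame = people, num = 7, vec = c(1, 2))

as_list <- my_list["vec"]    # single brackets: a list of length 1
as_vec  <- my_list[["vec"]]  # double brackets: the numeric vector itself

class(as_list)
class(as_vec)

# $ is shorthand for [[ ]] with a name
second_of_vec <- my_list$vec[2]

# Selections can be chained: pull out the data frame, then index into it
second_friend <- my_list[["data_frame"]][2, "friend"]
```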

Replicator and Sequence functions

These functions are handy for making sequences of numbers. seq is a generalization of “:”; its by argument sets the step size.

1:5
## [1] 1 2 3 4 5
5:1
## [1] 5 4 3 2 1
seq(0,11, by=2)
## [1]  0  2  4  6  8 10
seq(10,0, by=-2)
## [1] 10  8  6  4  2  0

The replicator function, rep, takes the argument times, each, or both:

rep(c(0,1), times=5)
##  [1] 0 1 0 1 0 1 0 1 0 1
rep(letters[1:5],each=2)
##  [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e"
rep(1:3, each =2, times=3)
##  [1] 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
rep(1:3, times=3)
## [1] 1 2 3 1 2 3 1 2 3
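
Two details of the calls above can be checked directly. This is a small sketch in base R; the variable names are introduced here for illustration only.

```r
# seq() never steps past its upper limit, so 11 itself is not reached
evens <- seq(0, 11, by = 2)
largest <- max(evens)

# With no by argument, seq() gives the same values as the colon operator
same_as_colon <- all(seq(1, 5) == 1:5)

# When each and times are both given, each is applied first, then times
combined <- rep(1:3, each = 2, times = 3)
two_step <- rep(rep(1:3, each = 2), times = 3)
same_order <- identical(combined, two_step)
```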

I-clicker questions

These are good midterm questions.

Solution: the answer is b. See below.

x<-1:4
x
## [1] 1 2 3 4
names(x) <- letters[1:4]
x
## a b c d 
## 1 2 3 4
x[1:2] <- 2:1
x
## a b c d 
## 2 1 3 4
x["a"] <- 100
x[x==100] <-  NA
x
##  a  b  c  d 
## NA  1  3  4

Solution: the answer is d. See below.

rep(seq(0,6,by=2),each=5)
##  [1] 0 0 0 0 0 2 2 2 2 2 4 4 4 4 4 6 6 6 6 6
rep(1:5, times=4)
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
rep(1:5, times=4) + rep(0:3, each=5)
##  [1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8

To do

For next time, do chapters 5 and 6 in DataCamp’s Intro to R course on Data Frames and Lists (skip chaps 3 and 4). Make sure that you have R, RStudio and DataComputing working on your computer. Also finish the chapter 1, 2 and 3 “In-class exercises”.

Next time we will cover chapter 4 in the DC book and chapter 4, on Factors, in DataCamp’s Intro to R.