Source file ⇒ lec11.Rmd

Today

  1. Data verb separate() (not in book)
  2. Chap 11 (wide versus narrow tables)
  3. Aesthetics versus Attributes in ggplot

1. The data verb: separate() (not in book)

You can separate one column into multiple columns using the data verb separate().

Example

iris_narrow %>% head()
Species key Value
setosa Sepal.Length 5.1
setosa Sepal.Length 4.9
setosa Sepal.Length 4.7
setosa Sepal.Length 4.6
setosa Sepal.Length 5.0
setosa Sepal.Length 5.4
iris_narrow %>% separate(key, into=c("Part", "Measure"), sep="\\.")
Species Part Measure Value
setosa Sepal Length 5.1
setosa Sepal Length 4.9
setosa Sepal Length 4.7
setosa Sepal Length 4.6
setosa Sepal Length 5.0
setosa Sepal Length 5.4
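
The sep argument of separate() is interpreted as a regular expression, which is why the dot is written as "\\." above. A minimal self-contained sketch, using a small hand-made table (toy and toy_sep are hypothetical names introduced here, not objects from the lecture):

```r
library(tidyr)

# Hypothetical toy table with the same layout as iris_narrow
toy <- data.frame(
  Species = c("setosa", "setosa"),
  key     = c("Sepal.Length", "Petal.Width"),
  Value   = c(5.1, 0.2),
  stringsAsFactors = FALSE
)

# sep is a regular expression: an unescaped "." would match any character,
# so the literal dot is escaped as "\\."
toy_sep <- separate(toy, key, into = c("Part", "Measure"), sep = "\\.")
toy_sep
```

The new columns Part and Measure take the place of key, and both are character columns.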

Task for you:

Here is a data frame with a column x we wish to split into three columns

df <- data.frame(x=c("1-2-3", "a-b-c"),  y=c(1,2))
df
x y
1-2-3 1
a-b-c 2

Do this using into=c("a","b","c") and sep="-"

df %>% separate(x,into=c("a","b","c"), sep="-")
a b c y
1 2 3 1
a b c 2

2. Wide versus Narrow data tables (chapter 11)

A data table can be presented in wide or narrow format. Each has its own advantages.

Wide format makes it easier to compute the difference between the before and after values of a test for each patient.

BP_wide
subject before after
BHO 120 160
GWB 115 135
WJC 105 145
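
In the wide table, the before/after difference is a single mutate() step. A minimal sketch, assuming BP_wide is rebuilt by hand from the values shown above (the names BP_change and change are introduced here for illustration):

```r
library(dplyr)

# BP_wide re-entered by hand from the table above
BP_wide <- data.frame(
  subject = c("BHO", "GWB", "WJC"),
  before  = c(120, 115, 105),
  after   = c(160, 135, 145),
  stringsAsFactors = FALSE
)

# before and after sit in the same row, so the difference is one subtraction
BP_change <- BP_wide %>%
  mutate(change = after - before)
BP_change
```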

Narrow format makes it easier to add further cases for a patient, for example if they are tested on different days. A narrow table is sometimes called a tidy data table.

BP_narrow
subject when sbp
BHO before 160
GWB before 115
WJC after 145
GWB after 135
WJC before 105
BHO after 160

The data verbs spread() and gather() convert between these formats.

gather() transforms BP_wide into BP_narrow

The key argument names the new variable in the narrow table that holds the gathered column names. The value argument names the new variable that holds their values.

BP_narrow1 <-  BP_wide %>%
  gather(key= when, value = sbp, before, after)
BP_narrow1
subject when sbp
BHO before 120
GWB before 115
WJC before 105
BHO after 160
GWB after 135
WJC after 145

spread() transforms BP_narrow into BP_wide

The key argument names the existing variable in the narrow table whose values become the new column names. The value argument names the variable whose values fill those columns.

BP_wide1 <-  BP_narrow %>% 
  spread(key= when, value = sbp)
BP_wide1
subject after before
BHO 160 160
GWB 135 115
WJC 145 105
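
gather() and spread() undo each other. A minimal round-trip sketch, assuming BP_wide is rebuilt by hand from the values shown earlier (BP_long and BP_back are names introduced here for illustration):

```r
library(tidyr)

# BP_wide re-entered by hand from the wide table shown earlier
BP_wide <- data.frame(
  subject = c("BHO", "GWB", "WJC"),
  before  = c(120, 115, 105),
  after   = c(160, 135, 145),
  stringsAsFactors = FALSE
)

# wide -> narrow: the column names before/after become values of `when`
BP_long <- gather(BP_wide, key = when, value = sbp, before, after)

# narrow -> wide: the values of `when` become column names again
BP_back <- spread(BP_long, key = when, value = sbp)
BP_back
```

The round trip keeps every value, but spread() orders the new columns alphabetically, so after comes before before.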

Task for you:

Is the following data set narrow or wide? Convert it to the other data table format.

Baby_narrow <- BabyNames %>% 
  filter(name == "Sue") %>%
  group_by(name,sex) %>%
  summarise(total=sum(count))
Baby_wide <- Baby_narrow %>% spread(key=sex, value= total )
Baby_wide 
name F M
Sue 144410 519
Baby_wide %>% gather(key=sex, value=value, F, M)
name sex value
Sue F 144410
Sue M 519

Note that a narrow table is tidy, as we defined on the first day of class. No values are stored in column names, as they are in the wide format (where F and M, the values of sex, appear as column names).

Example

Let’s examine the wide iris data table:

head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

Suppose you want to make the following plot:

The data table iris isn’t glyph-ready. Here is the glyph-ready table:

Species Part Measure Value
setosa Sepal Length 5.1
setosa Sepal Length 4.9
setosa Sepal Length 4.7
setosa Sepal Length 4.6
setosa Sepal Length 5.0
setosa Sepal Length 5.4

step 1: Use gather

iris_narrow <- iris %>%
  gather(key, Value, -Species)  # here -Species means all columns except Species
iris_narrow %>% head()
Species key Value
setosa Sepal.Length 5.1
setosa Sepal.Length 4.9
setosa Sepal.Length 4.7
setosa Sepal.Length 4.6
setosa Sepal.Length 5.0
setosa Sepal.Length 5.4

step 2: Use the data verb separate()

iris_narrow_sep <- iris_narrow %>% separate(key, into=c("Part", "Measure"), sep="\\.")
head(iris_narrow_sep)
Species Part Measure Value
setosa Sepal Length 5.1
setosa Sepal Length 4.9
setosa Sepal Length 4.7
setosa Sepal Length 4.6
setosa Sepal Length 5.0
setosa Sepal Length 5.4
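
The two steps can be chained and applied to all 150 flowers rather than a six-row preview. The intended plot is not reproduced in these notes, so the plot below is only an assumption about the kind of graphic the glyph-ready table supports; iris_glyph and p are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Step 1 and step 2 in one pipeline, on the full iris table
iris_glyph <- iris %>%
  gather(key, Value, -Species) %>%                            # wide -> narrow
  separate(key, into = c("Part", "Measure"), sep = "\\.")     # split "Sepal.Length"

# One possible plot (an assumption, not necessarily the lecture's plot):
# each of Species, Part and Measure is now a variable that can be mapped
p <- ggplot(iris_glyph, aes(x = Species, y = Value, fill = Part)) +
  geom_boxplot() +
  facet_wrap(~ Measure)
p
```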

3. Aesthetics versus fixed attributes

Aesthetics are properties of the graph that we map to a variable.
(example: col=sex in the BabyNames data set)
Attributes are properties of the graph that we set equal to a fixed value.
(example: col="red")

Examples

mtcars %>% ggplot(aes(x=wt,y=mpg)) + geom_point(aes(col=as.factor(cyl))) 

mtcars %>% ggplot(aes(x=wt,y=mpg)) + geom_point(col="red") 

Note: an attribute doesn’t get a legend, since it takes only a single fixed value.
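
A common slip is to put a fixed value inside aes(). The sketch below contrasts the two placements (p_attr and p_aes are names introduced here for illustration):

```r
library(ggplot2)

# Attribute: colour is set outside aes(), so every point is drawn red
# and no legend is produced
p_attr <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "red")

# Aesthetic by mistake: inside aes(), "red" is treated as data, a variable
# with the single value "red". ggplot adds a legend and picks the colour
# from its default palette instead of using the colour red.
p_aes <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = "red"))

p_attr
p_aes
```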

i-clicker questions

Q3

unloadNamespace('printr')
iris<- iris %>% mutate(Flower=1:nrow(iris))
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Flower
## 1          5.1         3.5          1.4         0.2  setosa      1
## 2          4.9         3.0          1.4         0.2  setosa      2
## 3          4.7         3.2          1.3         0.2  setosa      3
## 4          4.6         3.1          1.5         0.2  setosa      4
## 5          5.0         3.6          1.4         0.2  setosa      5
## 6          5.4         3.9          1.7         0.4  setosa      6
iris.wide <- iris %>%
  gather(key, value, -Species, -Flower) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value) 
iris.wide %>% head()
##   Species Flower  Part Length Width
## 1  setosa      1 Petal    1.4   0.2
## 2  setosa      1 Sepal    5.1   3.5
## 3  setosa      2 Petal    1.4   0.2
## 4  setosa      2 Sepal    4.9   3.0
## 5  setosa      3 Petal    1.3   0.2
## 6  setosa      3 Sepal    4.7   3.2