Today

Data verb separate() (not in book)
Chap 11 (wide versus narrow tables)
Aesthetics versus Attributes in ggplot

1. The data verb: separate() (not in book)

You can separate one column into multiple columns using the data verb separate().

Example

iris_narrow %>% head()

Species	key	Value
setosa	Sepal.Length	5.1
setosa	Sepal.Length	4.9
setosa	Sepal.Length	4.7
setosa	Sepal.Length	4.6
setosa	Sepal.Length	5.0
setosa	Sepal.Length	5.4

iris_narrow %>% separate(key, into=c("Part", "Measure"), sep="\\.")

Species	Part	Measure	Value
setosa	Sepal	Length	5.1
setosa	Sepal	Length	4.9
setosa	Sepal	Length	4.7
setosa	Sepal	Length	4.6
setosa	Sepal	Length	5.0
setosa	Sepal	Length	5.4
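
The sep argument is read as a regular expression, which is why the dot above is written as "\\." (an unescaped "." would match any character). Here is a minimal sketch on a made-up two-row table; `toy`, `toy_split` and `toy_pos` are illustrative names, not part of the lecture data.

```r
library(tidyr)

# Hypothetical toy table; the key values mimic the iris example
toy <- data.frame(key = c("Sepal.Length", "Petal.Width"),
                  Value = c(5.1, 0.2),
                  stringsAsFactors = FALSE)

# Escaped dot: split at the literal "." character
toy_split <- separate(toy, key, into = c("Part", "Measure"), sep = "\\.")
toy_split

# sep may also be a number, read as a character position to split at
toy_pos <- separate(toy, key, into = c("first3", "rest"), sep = 3)
toy_pos
```

With the numeric form, "Sepal.Length" is cut after its third character, giving "Sep" and "al.Length".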

Task for you:

Here is a data frame with a column x we wish to split into three columns

df <- data.frame(x=c("1-2-3", "a-b-c"),  y=c(1,2))
df

x	y
1-2-3	1
a-b-c	2

Do this using into=c("a","b","c") and sep="-"

df %>% separate(x,into=c("a","b","c"), sep="-")

a	b	c	y
1	2	3	1
a	b	c	2

2. Wide versus Narrow data tables (chapter 11)

A data table can be presented in wide or narrow format. Each has its own advantages.

The wide format makes it easier to compute the difference between the before and after measurements of a test for each patient.

BP_wide

subject	before	after
BHO	120	160
GWB	115	135
WJC	105	145
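
As a sketch of why the wide format suits before/after comparisons, the change for each subject is a single column subtraction. The table is rebuilt here from the values printed above, since the notes do not show how BP_wide was created; the names `BP_change` and `change` are illustrative choices.

```r
library(dplyr)

# Rebuild BP_wide from the printed values
BP_wide <- data.frame(subject = c("BHO", "GWB", "WJC"),
                      before = c(120, 115, 105),
                      after = c(160, 135, 145),
                      stringsAsFactors = FALSE)

# One row per subject, so the difference is a simple mutate()
BP_change <- BP_wide %>% mutate(change = after - before)
BP_change
```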

The narrow format makes it easier to add cases when a patient is tested on different days. A narrow-format table is sometimes called a tidy data table.

BP_narrow

subject	when	sbp
BHO	before	160
GWB	before	115
WJC	after	145
GWB	after	135
WJC	before	105
BHO	after	160

The data verbs `spread()` and `gather()` convert between these formats.

`gather()` transforms BP_wide into BP_narrow

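The key argument names the new variable in the narrow table that holds the gathered column names (here when). The value argument names the new variable that holds their values (here sbp).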

BP_narrow1 <-  BP_wide %>%
  gather(key= when, value = sbp, before, after)
BP_narrow1

subject	when	sbp
BHO	before	120
GWB	before	115
WJC	before	105
BHO	after	160
GWB	after	135
WJC	after	145

`spread()` transforms BP_narrow into BP_wide

The key argument is the existing variable in the narrow table whose values become the new column names. The value argument is the variable whose values fill those columns.

BP_wide1 <-  BP_narrow %>% 
  spread(key= when, value = sbp)
BP_wide1

subject	after	before
BHO	160	160
GWB	135	115
WJC	145	105
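
gather() and spread() undo each other. Here is a small round-trip sketch, again rebuilding BP_wide from the values printed earlier (an assumption, since the notes do not show how the table was created); `BP_long` and `BP_back` are illustrative names.

```r
library(dplyr)
library(tidyr)

BP_wide <- data.frame(subject = c("BHO", "GWB", "WJC"),
                      before = c(120, 115, 105),
                      after = c(160, 135, 145),
                      stringsAsFactors = FALSE)

# Wide to narrow: one row per subject and time point
BP_long <- BP_wide %>% gather(key = when, value = sbp, before, after)

# Narrow back to wide; the new columns come out as after, before,
# so reorder them to match the original table
BP_back <- BP_long %>%
  spread(key = when, value = sbp) %>%
  select(subject, before, after)
BP_back
```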

Task for you:

Is the following data set narrow or wide? Convert it to the other data table format.

Baby_narrow <- BabyNames %>% 
  filter(name == "Sue") %>%
  group_by(name,sex) %>%
  summarise(total=sum(count))
Baby_wide <- Baby_narrow %>% spread(key=sex, value= total )
Baby_wide

name	F	M
Sue	144410	519

Baby_wide %>% gather(key=sex, value=value, F, M)

name	sex	value
Sue	F	144410
Sue	M	519

Note that a narrow table is tidy as we defined it on the first day of class. No column name is a value of a variable, unlike the F and M columns in the wide format.

Example

Let's examine the wide iris data table:

head(iris)

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa

Suppose you want to make the following plot:

The data table iris isn't glyph-ready. Here is the glyph-ready table:

Species	Part	Measure	Value
setosa	Sepal	Length	5.1
setosa	Sepal	Length	4.9
setosa	Sepal	Length	4.7
setosa	Sepal	Length	4.6
setosa	Sepal	Length	5.0
setosa	Sepal	Length	5.4

step 1: Use gather

iris_narrow <- iris %>%
  gather(key, Value, -Species)  # -Species means all columns except Species
iris_narrow %>% head()

Species	key	Value
setosa	Sepal.Length	5.1
setosa	Sepal.Length	4.9
setosa	Sepal.Length	4.7
setosa	Sepal.Length	4.6
setosa	Sepal.Length	5.0
setosa	Sepal.Length	5.4
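
The -Species shorthand selects the same columns as listing them by name. Here is a quick check on the built-in iris data; `by_exclusion` and `by_name` are illustrative names, and datasets::iris is used so the check does not depend on any changes made to iris in the session.

```r
library(dplyr)
library(tidyr)

# Gather everything except Species
by_exclusion <- datasets::iris %>%
  gather(key, Value, -Species)

# Gather the four measurement columns by name
by_name <- datasets::iris %>%
  gather(key, Value, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)

nrow(by_exclusion)               # 150 flowers x 4 measurements = 600 rows
all.equal(by_exclusion, by_name)
```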

step 2: Use the data verb separate()

iris_narrow_sep <- iris_narrow %>% separate(key, into=c("Part", "Measure"), sep="\\.")
head(iris_narrow_sep)

Species	Part	Measure	Value
setosa	Sepal	Length	5.1
setosa	Sepal	Length	4.9
setosa	Sepal	Length	4.7
setosa	Sepal	Length	4.6
setosa	Sepal	Length	5.0
setosa	Sepal	Length	5.4

3. Aesthetics versus fixed attributes

Aesthetics are properties of the graph that we map to a variable.
(example: col=sex in the BabyNames data set)
Attributes are properties of the graph that we set equal to a fixed value.
(example: col="red")

Examples

mtcars %>% ggplot(aes(x=wt,y=mpg)) + geom_point(aes(col=as.factor(cyl)))

mtcars %>% ggplot(aes(x=wt,y=mpg)) + geom_point(col="red")

Note: an attribute doesn't get a legend, since it takes only a fixed value.
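
A common slip is to put a fixed value inside aes(). The sketch below uses layer_data() from ggplot2 to inspect the colours each plot would actually draw; `p_fixed`, `p_mapped` and `p_slip` are illustrative names.

```r
library(ggplot2)

# Attribute: every point is drawn in red, no legend
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(col = "red")

# Aesthetic: colour is mapped to the cyl variable, one colour per level
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(col = as.factor(cyl)))

# Slip: inside aes(), "red" is treated as a one-level variable,
# so the colour scale picks the colour and a legend appears
p_slip <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(col = "red"))

unique(layer_data(p_fixed)$colour)
length(unique(layer_data(p_mapped)$colour))
unique(layer_data(p_slip)$colour)
```

Only the first plot is guaranteed to draw red points. In the third, the points get whatever colour the default scale assigns to its single level.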

i-clicker questions

Q3

unloadNamespace('printr')
iris<- iris %>% mutate(Flower=1:nrow(iris))
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Flower
## 1          5.1         3.5          1.4         0.2  setosa      1
## 2          4.9         3.0          1.4         0.2  setosa      2
## 3          4.7         3.2          1.3         0.2  setosa      3
## 4          4.6         3.1          1.5         0.2  setosa      4
## 5          5.0         3.6          1.4         0.2  setosa      5
## 6          5.4         3.9          1.7         0.4  setosa      6

iris.wide <- iris %>%
  gather(key, value, -Species, -Flower) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value) 
iris.wide %>% head()

##   Species Flower  Part Length Width
## 1  setosa      1 Petal    1.4   0.2
## 2  setosa      1 Sepal    5.1   3.5
## 3  setosa      2 Petal    1.4   0.2
## 4  setosa      2 Sepal    4.9   3.0
## 5  setosa      3 Petal    1.3   0.2
## 6  setosa      3 Sepal    4.7   3.2
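
The Flower column added above gives each flower a unique identifier, which spread() needs in order to pair each Length with the right Width. Here is a sketch of what happens without it; `no_id`, `result` and `with_id` are illustrative names, and datasets::iris is used so the sketch starts from the unmodified table.

```r
library(dplyr)
library(tidyr)

# Same pipeline as above, but without a Flower identifier
no_id <- datasets::iris %>%
  gather(key, value, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.")

# Many rows share the same Species/Part/Measure combination,
# so spread() cannot place them and signals an error
result <- tryCatch(spread(no_id, Measure, value), error = function(e) e)
inherits(result, "error")

# With the identifier, each output row is uniquely determined
with_id <- datasets::iris %>%
  mutate(Flower = 1:n()) %>%
  gather(key, value, -Species, -Flower) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value)
nrow(with_id)   # 150 flowers x 2 parts = 300 rows
```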

Lec11

stat 133

February 12 2016
