Hint: problem 3c will help you figure out how to create iris.tidy, and problem 4c will help you figure out how to create iris.wide. See what you can figure out on your own to create iris.wide2!
library(tidyr)
iris$Flower <- 1:nrow(iris)
iris.wide <- iris %>%
  gather(key, value, -Flower, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value)
#delete this and insert your code to create iris.wide2
# Start from the original dataset (datasets::iris): the Flower id column added
# to iris above would otherwise be gathered as a measurement, and separate()
# would warn "Expected 2 pieces" because "Flower" contains no dot
iris.tidy <- datasets::iris %>%
  gather(key, Value, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.")
library(ggplot2)
# Option 1
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_point(aes(x = Petal.Length, y = Petal.Width), col = "red")
# Option 2
ggplot(iris.wide, aes(x = Length, y = Width, col = Part)) +
  geom_point()
Which one is preferable? You have access to iris and you created iris.wide in problem #1 so you can experiment with both of these pieces of code. State Option 1 or Option 2 as your final answer.
To see this in action,
Hint: working through the parts below will help you answer part a.
library(ggplot2)
# Think about which dataset you would use to get the plot shown in Figure 1
# Fill in the ___ to produce the plot shown in Figure 1
ggplot(iris.wide, aes(x = Length, y = Width, color = Part)) +
  geom_jitter() +
  facet_grid(. ~ Species)
The resulting iris.tidy data should look as follows:
Species Part Measure Value
1 setosa Sepal Length 5.1
2 setosa Sepal Length 4.9
3 setosa Sepal Length 4.7
4 setosa Sepal Length 4.6
5 setosa Sepal Length 5.0
6 setosa Sepal Length 5.4
...
You can have a look at the iris dataset by typing head(iris) in the Console.
If you’re not familiar with %>%, gather() and separate(), keep reading. In a nutshell, a dataset is called tidy when every row is an observation and every column is a variable. The gather() function moves information from the columns to the rows. It takes multiple columns and gathers them into a single column by adding rows. The separate() function splits one column into two or more columns according to a pattern you define. Lastly, the %>% (or “pipe”) operator passes the result of the left-hand side as the first argument of the function on the right-hand side.
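To make the three pieces concrete, here is a minimal sketch on a made-up two-row data frame. The name toy and its values are hypothetical and are not part of the assignment; the calls mirror the ones used for iris.

```r
library(tidyr)  # provides gather(), separate() and re-exports %>%

# Hypothetical toy data: two flowers, two measurement columns
toy <- data.frame(
  Species = c("setosa", "virginica"),
  Sepal.Length = c(5.1, 6.3),
  Sepal.Width = c(3.5, 3.3)
)

toy.tidy <- toy %>%
  # Stack every column except Species into a key column and a Value column
  gather(key, Value, -Species) %>%
  # Split "Sepal.Length" into Part = "Sepal" and Measure = "Length"
  separate(key, c("Part", "Measure"), "\\.")

toy.tidy
```

The two measurement columns become four rows (two flowers times two measurements), and the result has the columns Species, Part, Measure and Value.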
You’ll use two functions from the tidyr package. Make sure you have installed the tidyr package.
gather() rearranges the data frame. Columns listed with the - notation are the categorical variables: they are left out of the gathering and kept as they are, while every other column is gathered. Complete the command in the R code chunk below. Notice that only one variable is categorical in iris.
separate() splits up the new key column, which contains the former headers, at the dot (.). The new column names “Part” and “Measure” are given in a character vector. Don’t forget the quotes.
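The separator passed to separate() is interpreted as a regular expression, in which an unescaped . means "any character". That is why the dot is written as "\\." in the code. The sketch below uses a hypothetical one-column data frame called toy to show the escaped form, with a base R strsplit() call for comparison.

```r
library(tidyr)

# Hypothetical toy data holding two former column headers
toy <- data.frame(
  key = c("Sepal.Length", "Petal.Width"),
  stringsAsFactors = FALSE
)

# "\\." is the regular expression for a literal dot
split.ok <- separate(toy, key, c("Part", "Measure"), sep = "\\.")
split.ok

# Base R comparison: fixed = TRUE treats "." as a plain character
pieces <- strsplit("Sepal.Length", ".", fixed = TRUE)[[1]]
pieces
```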
# Load the tidyr package
library(tidyr)
# Fill in the ___ to produce the correct iris.tidy dataset
# Start from the original dataset (datasets::iris): if a Flower id column has
# been added to iris, it would be gathered as a measurement and separate()
# would warn "Expected 2 pieces" because "Flower" contains no dot
iris.tidy <- datasets::iris %>%
  gather(key, Value, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.")
Hint: working through the parts below will help you answer part a.
library(ggplot2)
# Think about which dataset you would use to get the plot in Figure 2
# Fill in the ___ to produce the plot in Figure 2
ggplot(iris.wide, aes(x = Length, y = Width, color = Part)) +
  geom_jitter() +
  facet_grid(. ~ Species)
The head of the iris.wide should look like this in the end:
Species Part Length Width
1 setosa Petal 1.4 0.2
2 setosa Petal 1.4 0.2
3 setosa Petal 1.3 0.2
4 setosa Petal 1.5 0.2
5 setosa Petal 1.4 0.2
6 setosa Petal 1.7 0.4
...
You can have a look at the iris dataset by typing head(iris) in the console.
Before you begin, you need to add a new column called Flower that contains a unique identifier for each row in the data frame. This is because you’ll rearrange the data frame afterwards and you need to keep track of which row, or which specific flower, each value came from. This is done for you; there is no need to add anything yourself.
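The following sketch illustrates why the identifier matters. It uses a hypothetical two-row data frame called toy: without an id, both flowers share the same Species and Part, so spread() has no way to pair each Length with its Width and stops with an error. The tryCatch() wrapper is only there so the example keeps running.

```r
library(tidyr)

# Hypothetical toy data: two setosa flowers
toy <- data.frame(
  Species = c("setosa", "setosa"),
  Petal.Length = c(1.4, 1.3),
  Petal.Width = c(0.2, 0.2)
)

long <- toy %>%
  gather(key, value, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.")

# Without an id column, rows are not uniquely identified and spread() errors
no.id <- tryCatch(
  spread(long, Measure, value),
  error = function(e) "spread() failed"
)
no.id

# With a Flower id, every output row is uniquely identified
toy$Flower <- 1:nrow(toy)
with.id <- toy %>%
  gather(key, value, -Flower, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value)
with.id
```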
gather() rearranges the data frame. Columns listed with the - notation are the categorical variables: they are left out of the gathering and kept as they are, while every other column is gathered. In this case, Species and Flower are categorical. Complete the command in the R code chunk below.
separate() splits up the new key column, which contains the former headers, at the dot (.). The new column names “Part” and “Measure” are given in a character vector.
The last step is to use spread() to distribute the new Measure column and associated value column into two columns.
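As a minimal sketch of what spread() does, consider a small long-format data frame. The name long and its values are hypothetical. Each distinct entry in Measure becomes its own column, filled from value.

```r
library(tidyr)

# Hypothetical long-format data: one row per flower and measurement
long <- data.frame(
  Flower = c(1, 1, 2, 2),
  Measure = c("Length", "Width", "Length", "Width"),
  value = c(1.4, 0.2, 1.3, 0.2),
  stringsAsFactors = FALSE
)

# Spread the Measure/value pair into one column per measure
wide <- spread(long, Measure, value)
wide
#   Flower Length Width
# 1      1    1.4   0.2
# 2      2    1.3   0.2
```

The four long rows collapse into two wide rows, one per flower.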
# Load the tidyr package
library(tidyr)
# Add column with unique ids (don't need to change)
iris$Flower <- 1:nrow(iris)
# Fill in the ___ to produce the correct iris.wide dataset
iris.wide <- iris %>%
  gather(key, value, -Flower, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value)