Messy data is common. A large portion of your time may be spent cleaning, parsing, and organizing a data set, and tidy data is often the goal. Four functions available in the tidyr package will help make this process easier:
gather()
spread()
separate()
unite()
Package tidyr is automatically loaded when you load tidyverse.
library(tidyverse)
Open the file gapminder-raw.csv to see what it contains and how it is formatted. Read gapminder-raw.csv into R and save it as an object called gapminder1. Check the first few rows to make sure your data was read in properly.
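A minimal sketch of this step, assuming the file sits in the current working directory and that read_csv() from readr (loaded with tidyverse) suits its format. The object name gapminder1 matches the name used in the tidying steps that follow.

```r
library(readr)

# Assumption: gapminder-raw.csv is in the current working directory
path <- "gapminder-raw.csv"

if (file.exists(path)) {
  gapminder1 <- read_csv(path)
  print(head(gapminder1))   # first six rows
  str(gapminder1)           # column names and types
} else {
  message("File not found; check getwd() and the file location.")
}
```

If the column types look wrong in the str() output, that is worth noting before you start tidying.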
Think about why the above data frame is not tidy.
You will now tidy gapminder1 by using a series of functions in the tidyr package.
In each step, examine the resulting data frame and attempt to write code that generates it. Carefully examine the variable names, types, and first few rows.
Function gather() takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.
Function gather() will transform a data frame from wide to long format.
You want to gather all but the first column of gapminder1.
Run each line of the code below in your console for a small example.
mini_iris <- iris[c(1, 51, 101), ]
gather(mini_iris, key = flower_att, value = measurement, -Species)
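To make the wide-to-long idea concrete, the sketch below reuses mini_iris, checks the shape of the gathered result, and then applies spread() to go back to wide format. The object names long_iris and wide_iris are chosen here for illustration only.

```r
library(tidyr)

mini_iris <- iris[c(1, 51, 101), ]

# Wide to long: every column except Species is stacked into key-value pairs
long_iris <- gather(mini_iris, key = flower_att, value = measurement, -Species)
dim(long_iris)   # 3 flowers x 4 measurements = 12 rows, 3 columns

# Long to wide: spread() turns the key column back into separate columns
wide_iris <- spread(long_iris, key = flower_att, value = measurement)
dim(wide_iris)   # 3 rows, 5 columns
```

Species is repeated once for each measurement in long_iris, which is the "duplicating all other columns as needed" behavior described above.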
Use names() to change the variable names in the data frame.
Function separate() turns a single character column into multiple columns.
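A small sketch of separate() on made-up data may help before you try it on the gapminder data. The data frame toy, its column names, and its values are all hypothetical and are not taken from gapminder-raw.csv. The sketch also shows unite(), which reverses the split, and names() for renaming.

```r
library(tidyr)

# Hypothetical toy data: one character column that packs two values together
toy <- data.frame(
  measure_year = c("pop_1950", "pop_1960", "gdp_1950"),
  amount = c(10, 12, 3),
  stringsAsFactors = FALSE
)

# Split measure_year at the underscore into two new character columns
toy_sep <- separate(toy, col = measure_year, into = c("measure", "year"), sep = "_")

# unite() does the reverse and pastes the two columns back together
toy_back <- unite(toy_sep, col = "measure_year", measure, year, sep = "_")

# names() replaces the column names of a data frame
names(toy_sep) <- c("variable", "year", "amount")
str(toy_sep)
```

Note that the new year column is still of type character after separate(), since it was cut out of a character column.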
Change year to type integer.
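One way to do this kind of conversion is as.integer(), sketched here on a hypothetical data frame rather than on the gapminder data.

```r
# Hypothetical toy data with year stored as text
toy_year <- data.frame(year = c("1950", "1960", "1970"), stringsAsFactors = FALSE)
class(toy_year$year)   # "character"

# Overwrite the column with its integer version
toy_year$year <- as.integer(toy_year$year)
class(toy_year$year)   # "integer"
```

As an alternative, separate() accepts convert = TRUE, which tries to convert the new columns to a suitable type during the split.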
Recreate plots 1 and 2. Try to create the plot without looking at the hints, and comment on any interesting trends/relationships you observe.
Use subset() to filter gapminder for United States
geom_line(size = 1.5, color = "blue")
annotate("text", x = 1863, y = 28, label = "Civil War", color = "red")
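The hints above can be combined as in the sketch below. Since the tidied gapminder is not shown here, the code runs on an invented stand-in: the data frame gapminder_toy, its column names (country, year, value), and every number in it are assumptions, not real data. Swap in gapminder and its actual column names for the exercise.

```r
library(ggplot2)

# Invented stand-in for the tidied data; placeholder numbers, not real values
gapminder_toy <- data.frame(
  country = rep(c("United States", "Brazil"), each = 4),
  year = rep(c(1850L, 1860L, 1870L, 1880L), times = 2),
  value = c(30, 25, 31, 33, 20, 21, 22, 23),
  stringsAsFactors = FALSE
)

# Keep only the rows for one country
us <- subset(gapminder_toy, country == "United States")

# Line plot with a text label placed at x = 1863, y = 28
p1 <- ggplot(us, aes(x = year, y = value)) +
  geom_line(size = 1.5, color = "blue") +   # ggplot2 >= 3.4.0 prefers linewidth
  annotate("text", x = 1863, y = 28, label = "Civil War", color = "red")
p1
```

In recent versions of ggplot2, size for lines triggers a deprecation warning; linewidth = 1.5 gives the same result without it.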
Use subset() to filter gapminder for c("China", "India", "Indonesia", "United States", "Brazil")
geom_line(size = 1.5)
theme(legend.position = "bottom")
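A sketch of how these hints might fit together, again on an invented stand-in. The data frame gapminder_toy, the extra country "France", the column names, and the placeholder numbers are all assumptions. The key point is that filtering on several countries calls for %in% rather than ==.

```r
library(ggplot2)

countries <- c("China", "India", "Indonesia", "United States", "Brazil")

# Invented stand-in: six countries by three years, with placeholder numbers
gapminder_toy <- expand.grid(
  country = c(countries, "France"),
  year = c(1900L, 1950L, 2000L),
  stringsAsFactors = FALSE
)
gapminder_toy$value <- seq_len(nrow(gapminder_toy))

# %in% keeps every row whose country appears in the vector
five <- subset(gapminder_toy, country %in% countries)

# One line per country, with the legend moved below the plot
p2 <- ggplot(five, aes(x = year, y = value, color = country)) +
  geom_line(size = 1.5) +
  theme(legend.position = "bottom")
p2
```

Using country == countries instead would compare element by element with recycling and silently drop rows, so %in% is the safer choice here.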
Use ggplot() and gapminder to create any plot of your choice. Think about the data you have and what type of plot makes sense. See http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html for inspiration.