Introduction

Messy data is common, and a large portion of your time may be spent cleaning, parsing, and organizing a data set. The goal is often tidy data, where each variable has its own column and each observation has its own row. Four functions available in the tidyr package will help make this process easier:

  • gather()

  • spread()

  • separate()

  • unite()

Package tidyr is automatically loaded when you load tidyverse.

library(tidyverse)

Open the file gapminder-raw.csv to see what it contains and how it is formatted. Read gapminder-raw.csv into R and save it as an object called gapminder1. Check the first few rows to make sure your data was read in properly.
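As a minimal sketch of the read-and-check step, the code below writes a tiny stand-in file and reads it with read.csv(). The file contents, the country names A and B, and the object name toy_raw are made up for illustration and are not the contents of gapminder-raw.csv. It assumes base R's read.csv() is used; with its default check.names = TRUE, column names that start with a digit get an X prepended, which may be where the X in Step 3 comes from.

```r
# Write a tiny stand-in CSV; the names and values are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,1800,1801",
             "A,30.1,30.4",
             "B,25.0,25.2"), tmp)

# read.csv() uses header = TRUE and check.names = TRUE by default,
# so the year columns become syntactically valid names.
toy_raw <- read.csv(tmp)

head(toy_raw)    # inspect the first rows
names(toy_raw)   # "country" "X1800" "X1801"
str(toy_raw)     # check the column types
```

The same head(), names(), and str() checks apply to the object you create from the real file.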

Think about why the above data frame is not tidy.

Tidy data

You will now tidy gapminder1 by using a series of functions in the tidyr package.

In each step, examine the resulting data frame and attempt to write code that reproduces it. Carefully examine the variable names, types, and first few rows.

Step 1: Wide to long

Result

Hints

  • Function gather() takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.

  • Function gather() will transform a data frame from wide to long format.

  • You want to gather all but the first column of gapminder1.

  • Run each line of the code below in your console for a small example.

mini_iris <- iris[c(1, 51, 101), ]

gather(mini_iris, key = flower_att, value = measurement, -Species)
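The same pattern, gathering every column except the first, can be sketched on a toy wide data frame. The data frame toy_wide and its values are invented for illustration and only mimic the shape the hints describe; this is not the lab's solution.

```r
library(tidyr)

# Toy wide data: one row per country, one column per year (values are made up).
toy_wide <- data.frame(country = c("A", "B"),
                       X1800 = c(30.1, 25.0),
                       X1801 = c(30.4, 25.2),
                       stringsAsFactors = FALSE)

# Gather all columns except country into key-value pairs.
toy_long <- gather(toy_wide, key = key, value = value, -country)
toy_long   # 4 rows: each country appears once per former column
```

In recent tidyr releases gather() is superseded by pivot_longer(); pivot_longer(toy_wide, -country, names_to = "key", values_to = "value") should give the same rows, although in a different order.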

Step 2: Variable names

Result

Hints

  • Use names() to change the variable names in the data frame.
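A small sketch of renaming with names(), on a made-up data frame. The new names used here ("year", "measurement", "life_exp") are placeholders chosen for illustration; use the names shown in the Result tab for the lab data.

```r
# Toy long data with default key/value column names (values are made up).
toy_long <- data.frame(country = c("A", "B"),
                       key = c("X1800", "X1800"),
                       value = c(30.1, 25.0),
                       stringsAsFactors = FALSE)

names(toy_long)                                   # "country" "key" "value"

# Replace all names at once by assigning a character vector of equal length.
names(toy_long) <- c("country", "year", "measurement")

# Or rename a single column by position.
names(toy_long)[3] <- "life_exp"
names(toy_long)                                   # "country" "year" "life_exp"
```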

Step 3: Fixing the years

Result

Hints

  • Function separate() turns a single character column into multiple columns.

  • To remove the X prepended to the years:
    1. separate the column year at X
    2. remove the second column, which the split leaves empty

  • Change year to type integer.
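The hints above can be sketched on a toy data frame as follows. The object names toy_long and toy_sep, the placeholder column name prefix, and all values are assumptions made for illustration. The sketch also assumes year is the second column, as it would be after gathering.

```r
library(tidyr)

# Toy long data where the years carry a leading X (values are made up).
toy_long <- data.frame(country = c("A", "B"),
                       year = c("X1800", "X1801"),
                       measurement = c(30.1, 25.2),
                       stringsAsFactors = FALSE)

# Splitting "X1800" at "X" yields an empty piece and "1800".
toy_sep <- separate(toy_long, col = year, into = c("prefix", "year"), sep = "X")

# Drop the empty piece, which is now the second column.
toy_sep <- toy_sep[, -2]

# separate() returns character columns by default, so convert explicitly.
toy_sep$year <- as.integer(toy_sep$year)
str(toy_sep)
```

Converting with as.integer() after the split keeps each step visible; separate() also has a convert argument that attempts the type conversion for you.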

Visualizations

Recreate plots 1 and 2. Try to create the plot without looking at the hints, and comment on any interesting trends/relationships you observe.

Plot 1

Plot

Hints

  • Use subset() to filter gapminder for United States

  • geom_line(size = 1.5, color = "blue")

  • annotate("text", x = 1863, y = 28, label = "Civil War", color = "red")
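One way the three hints might fit together, shown on a toy data frame. The object names toy, us, and p1, the column name measurement, and every value are invented for illustration and are not gapminder figures, so the resulting picture will not match Plot 1.

```r
library(ggplot2)

# Toy tidy data; all values are made up.
toy <- data.frame(country = rep(c("United States", "Other"), each = 3),
                  year = rep(c(1860L, 1863L, 1866L), times = 2),
                  measurement = c(36, 30, 37, 28, 28, 29),
                  stringsAsFactors = FALSE)

# Keep only the rows for one country.
us <- subset(toy, country == "United States")

# Line plot with a text label placed at data coordinates (x, y).
# Note: ggplot2 3.4.0 and later prefer linewidth over size for lines
# and emit a deprecation warning for size.
p1 <- ggplot(us, aes(x = year, y = measurement)) +
  geom_line(size = 1.5, color = "blue") +
  annotate("text", x = 1863, y = 28, label = "Civil War", color = "red")
p1
```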

Plot 2

Plot

Hints

  • Use subset() to filter gapminder for c("China", "India", "Indonesia", "United States", "Brazil")

  • geom_line(size = 1.5)

  • theme(legend.position = "bottom")
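For several countries, subset() can be combined with %in%, and mapping color to country draws one line per country. The sketch below uses a toy data frame with invented values and only two of the five countries; the names toy, keep, several, and p2 are placeholders.

```r
library(ggplot2)

# Toy tidy data; all values are made up.
toy <- data.frame(country = rep(c("China", "India", "Other"), each = 2),
                  year = rep(c(1900L, 1950L), times = 3),
                  measurement = c(32, 41, 24, 35, 30, 45),
                  stringsAsFactors = FALSE)

# %in% keeps rows whose country matches any element of the vector.
keep <- c("China", "India")
several <- subset(toy, country %in% keep)

# Mapping color to country gives one line and one legend entry per country.
p2 <- ggplot(several, aes(x = year, y = measurement, color = country)) +
  geom_line(size = 1.5) +
  theme(legend.position = "bottom")
p2
```

Using == with a vector of countries instead of %in% would recycle the vector and silently drop rows, which is why %in% is the safer choice here.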

Plot 3

Use ggplot() and gapminder to create any plot of your choice. Think about the data you have and what type of plot makes sense. See http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html for inspiration.