In this exercise you will learn to clean data using the dplyr package. To do so, you will work through the code in one of our e-texts, Data Visualization with R. The example code below is from Chapter 1.2, Cleaning data.

# Load package
library(tidyverse)

# Import data
data(starwars)
starwars

Q1 select Keep the variables name, eye_color, and films.

select(starwars, name, eye_color, films)
## # A tibble: 87 x 3
##    name               eye_color films    
##    <chr>              <chr>     <list>   
##  1 Luke Skywalker     blue      <chr [5]>
##  2 C-3PO              yellow    <chr [6]>
##  3 R2-D2              red       <chr [7]>
##  4 Darth Vader        yellow    <chr [4]>
##  5 Leia Organa        brown     <chr [5]>
##  6 Owen Lars          blue      <chr [3]>
##  7 Beru Whitesun lars blue      <chr [3]>
##  8 R5-D4              red       <chr [1]>
##  9 Biggs Darklighter  brown     <chr [1]>
## 10 Obi-Wan Kenobi     blue-gray <chr [6]>
## # … with 77 more rows
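
As a side note, select() also accepts helper functions in place of explicit column names. The sketch below is not part of the e-text example; it assumes the starwars data shipped with dplyr and uses the ends_with() helper to keep name plus every column whose name ends in "color".

```r
library(dplyr)

# Keep `name` and every column whose name ends with "color"
# (hair_color, skin_color, eye_color in the starwars data).
colors_only <- select(starwars, name, ends_with("color"))
names(colors_only)
```

Helpers such as starts_with() and contains() work the same way and save typing when many columns share a naming pattern.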

Q2 filter Select blonds.

filter(starwars, hair_color == "blond")
## # A tibble: 3 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
## 1 Luke…    172    77 blond      fair       blue            19   male   Tatooine 
## 2 Anak…    188    84 blond      fair       blue            41.9 male   Tatooine 
## 3 Fini…    170    NA blond      fair       blue            91   male   Coruscant
## # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
## #   starships <list>

Q3 filter Select female blonds.

filter(starwars, gender == "female" & hair_color == "blond")
## # A tibble: 0 x 13
## # … with 13 variables: name <chr>, height <int>, mass <dbl>, hair_color <chr>,
## #   skin_color <chr>, eye_color <chr>, birth_year <dbl>, gender <chr>,
## #   homeworld <chr>, species <chr>, films <list>, vehicles <list>,
## #   starships <list>
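
An empty tibble here simply means no row satisfies both conditions. As a minimal sketch of how filter() combines conditions, the code below shows that separating conditions with a comma is equivalent to joining them with &. It filters on eye_color rather than gender as an illustrative choice, because the coding of the gender column may differ between dplyr versions.

```r
library(dplyr)

# Two ways to require that both conditions hold for a row.
with_amp   <- filter(starwars, hair_color == "blond" & eye_color == "blue")
with_comma <- filter(starwars, hair_color == "blond", eye_color == "blue")

# Both calls select the same characters.
identical(with_amp$name, with_comma$name)
```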

Q4 mutate Convert height in centimeters to feet.

Hint: Divide the length value by 30.48.

starwars <- mutate(starwars, height = height / 30.48)
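
Note that the line above overwrites height, so running it a second time would divide the values by 30.48 again. One possible alternative, sketched here under the assumption that a new column is acceptable, keeps the original centimeters and stores the converted values in a separate column. The names sw_ft and height_ft are hypothetical and do not appear in the e-text.

```r
library(dplyr)

# Store the converted value in a new column (hypothetical name `height_ft`)
# so the original `height` in centimeters stays untouched.
sw_ft <- mutate(dplyr::starwars, height_ft = height / 30.48)
select(sw_ft, name, height, height_ft)
```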

Q5 summarize Calculate mean height in feet.

summarize(starwars, mean_ht = mean(height, na.rm=TRUE))
## # A tibble: 1 x 1
##   mean_ht
##     <dbl>
## 1    5.72
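
The na.rm=TRUE argument matters because mean() returns NA as soon as any value is missing. The short sketch below illustrates this with a made-up vector; the numbers are for illustration only.

```r
# A small illustrative vector with one missing value.
heights <- c(172, NA, 96)

mean(heights)                # NA, because one value is missing
mean(heights, na.rm = TRUE)  # 134, the mean of the two observed values
```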

Q6 group_by and summarize Calculate mean height by gender.

Hint: Use %>%, the pipe operator. Save the result under a new name, mean_height.

mean_height <- starwars %>%
  group_by(gender) %>%
  summarize(mean_ht = mean(height, na.rm=TRUE))
mean_height
## # A tibble: 5 x 2
##   gender        mean_ht
##   <chr>           <dbl>
## 1 female           5.43
## 2 hermaphrodite    5.74
## 3 male             5.88
## 4 none             6.56
## 5 <NA>             3.94

Q7 spread Convert the dataset, mean_height, to a wide dataset.

wide_data <- spread(mean_height, gender, mean_ht)
wide_data
## # A tibble: 1 x 5
##   female hermaphrodite  male  none `<NA>`
##    <dbl>         <dbl> <dbl> <dbl>  <dbl>
## 1   5.43          5.74  5.88  6.56   3.94

# Convert the wide dataset back to a long dataset
gather(wide_data,
       key = "gender",
       value = "mean_ht",
       female:`<NA>`)
## # A tibble: 5 x 2
##   gender        mean_ht
##   <chr>           <dbl>
## 1 female           5.43
## 2 hermaphrodite    5.74
## 3 male             5.88
## 4 none             6.56
## 5 <NA>             3.94
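
In tidyr 1.0.0 and later, spread() and gather() still work but are superseded by pivot_wider() and pivot_longer(). The sketch below shows the same round trip with the newer functions. To stay self-contained it rebuilds a two-row table by hand from the female and male values printed above; the object names are hypothetical.

```r
library(tidyr)
library(tibble)

# Hand-built long table using two of the means shown above.
long_in <- tibble(gender = c("female", "male"),
                  mean_ht = c(5.43, 5.88))

# Long to wide: one column per gender.
wide <- pivot_wider(long_in, names_from = gender, values_from = mean_ht)

# Wide back to long: column names return to `gender`, values to `mean_ht`.
long_out <- pivot_longer(wide, cols = everything(),
                         names_to = "gender", values_to = "mean_ht")
long_out
```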

Q8 Hide the messages and the code, but display the results of the code on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
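
The options named in the hint can be written in an individual chunk header, for example {r, message=FALSE, echo=FALSE, results='markup'}. As one possible alternative, sketched here on the assumption that the same settings should apply to every chunk, they can be set once in a setup chunk with knitr::opts_chunk$set().

```r
# Apply the same options to all later chunks in the document.
knitr::opts_chunk$set(
  message = FALSE,     # hide messages, such as those printed when loading packages
  echo    = FALSE,     # hide the source code
  results = "markup"   # still display the printed results
)
```

Options given in an individual chunk header override these document-wide defaults.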

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.