select: Keep the variables name, eye_color, and films.
filter: Select blonds.
filter: Select female blonds.
mutate: Convert height in centimeters to feet.
summarize: Calculate mean height in feet.
group_by and summarize: Calculate mean height by gender.
spread: Convert the dataset, mean_height, to a wide dataset.

In this exercise you will learn to clean data using the dplyr package. To this end, you will work through the code in one of our e-texts, Data Visualization with R. The example code below is from Chapter 1.2, Cleaning data.
# Load package
library(tidyverse)
# Import data
data(starwars)
starwars
## # A tibble: 87 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke… 172 77 blond fair blue 19 male
## 2 C-3PO 167 75 <NA> gold yellow 112 <NA>
## 3 R2-D2 96 32 <NA> white, bl… red 33 <NA>
## 4 Dart… 202 136 none white yellow 41.9 male
## 5 Leia… 150 49 brown light brown 19 female
## 6 Owen… 178 120 brown, gr… light blue 52 male
## 7 Beru… 165 75 brown light blue 47 female
## 8 R5-D4 97 32 <NA> white, red red NA <NA>
## 9 Bigg… 183 84 black light brown 24 male
## 10 Obi-… 182 77 auburn, w… fair blue-gray 57 male
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
select: Keep the variables name, eye_color, and films.
newdata <- select(starwars, name, eye_color, films)
filter: Select blonds.
newdata <- filter(starwars,
                  hair_color == "blonde")
filter: Select female blonds.
newdata <- filter(starwars,
                  hair_color == "blonde",
                  gender == "female")
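The sample output above spells the first character's hair color as "blond", while the filters compare against "blonde", so the spelling used in the comparison decides which rows match. The following is a minimal sketch on a small made-up tibble (`toy` and its values are hypothetical, not rows from starwars) showing how `%in%` can match either spelling and how comma-separated conditions combine:

```r
library(dplyr)

# Hypothetical toy data, invented for illustration (not the starwars rows)
toy <- tibble(
  name       = c("A", "B", "C", "D"),
  hair_color = c("blond", "blonde", "brown", NA),
  gender     = c("male", "female", "female", "female")
)

# An exact comparison matches only one spelling
exact <- filter(toy, hair_color == "blonde")

# %in% matches either spelling; rows with a missing hair_color are dropped
either <- filter(toy, hair_color %in% c("blond", "blonde"))

# Conditions separated by commas are combined with AND
female_blonds <- filter(either, gender == "female")
```

Running `count(starwars, hair_color)` first is one way to check which spellings actually occur before writing the filter.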
mutate: Convert height in centimeters to feet.
Hint: Divide the height in centimeters by 30.48.
starwars <- mutate(starwars,
height = height / 30.48)
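Because the code above overwrites both starwars and its height column, running it a second time would divide by 30.48 again. One possible alternative, sketched here under the assumption that a separate column is acceptable, writes the result to a new column (`height_ft` and `converted` are hypothetical names) and leaves the centimeter values intact:

```r
library(dplyr)

# Work on a fresh copy so the original data set is not modified
sw <- dplyr::starwars

# height_ft is a hypothetical column name; height stays in centimeters
converted <- mutate(sw, height_ft = height / 30.48)

# Compare the two columns for the first few characters
head(select(converted, name, height, height_ft), 3)
```

The first row, listed at 172 cm in the output above, works out to 172 / 30.48, or roughly 5.64 feet.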
summarize: Calculate mean height in feet.
newdata <- summarize(starwars,
                     mean_ht = mean(height, na.rm=TRUE))
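The `na.rm=TRUE` argument matters whenever a column contains missing values, because `mean()` otherwise returns NA. A minimal sketch with made-up heights (the `toy` tibble is hypothetical) illustrates the difference:

```r
library(dplyr)

# Hypothetical heights in feet, with one missing value
toy <- tibble(height = c(5.0, 6.0, NA))

# Without na.rm, a single NA makes the whole mean NA
with_na <- summarize(toy, mean_ht = mean(height))

# With na.rm = TRUE, the NA is dropped before averaging
without_na <- summarize(toy, mean_ht = mean(height, na.rm = TRUE))
```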
group_by and summarize: Calculate mean height by gender.
Hint: Use %>%, the pipe operator. Save the result under a new name, mean_height.
newdata <- group_by(starwars, gender)
newdata <- summarize(newdata,
mean_height = mean(height, na.rm=TRUE))
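The hint above asks for the pipe operator. One way the two-step code could be rewritten with `%>%` is sketched below. It starts from a fresh copy of the data (`sw` is a hypothetical name) and repeats the conversion to feet inside the pipeline, so it does not depend on starwars having been modified earlier:

```r
library(dplyr)

# Fresh copy of the data, with height still in centimeters
sw <- dplyr::starwars

# Each step's result is passed as the first argument of the next call
mean_height <- sw %>%
  mutate(height = height / 30.48) %>%   # centimeters to feet
  group_by(gender) %>%
  summarize(mean_height = mean(height, na.rm = TRUE))

mean_height
```

Characters with a missing gender form their own group, so the result includes an NA row alongside the named genders.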
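spread: Convert the dataset, mean_height, to a wide dataset.
library(tidyr)
# Compute mean height by gender and save it as mean_height
mean_height <- group_by(starwars, gender)
mean_height <- summarize(mean_height,
                         mean_height = mean(height, na.rm=TRUE))
# Spread gender into columns, with one mean height in each
wide_data <- spread(mean_height, key = gender, value = mean_height)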
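In `spread()`, the `key` argument names the column whose values become the new column names, and `value` names the column that fills them. A minimal sketch on made-up numbers (`long_toy` and its values are hypothetical) shows the reshaping, alongside `pivot_wider()`, which newer versions of tidyr (1.0.0 and later) provide for the same task:

```r
library(dplyr)
library(tidyr)

# Hypothetical long data, invented for illustration
long_toy <- tibble(
  gender      = c("female", "male"),
  mean_height = c(5.4, 5.9)
)

# spread(): key supplies the new column names, value fills them
wide_spread <- spread(long_toy, key = gender, value = mean_height)

# pivot_wider(): the newer equivalent of spread()
wide_pivot <- pivot_wider(long_toy,
                          names_from  = gender,
                          values_from = mean_height)
```

Both calls return a single row with one column per gender.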
Hint: Use the message, echo, and results chunk options. Refer to the R Markdown Reference Guide.