In this case study, you will work with a famous dataset, iris.
It contains 150 samples from three iris species (setosa, versicolor,
virginica), with four features measured for each sample. Use the
tidyverse (load the tidyverse library first) to answer the following
questions.
- The iris dataset is pre-installed in R. Convert it to a tibble
with the as_tibble() function and save the tibble as an object
called iris_dat.
library(tidyverse)
iris_dat <- as_tibble(iris)
glimpse(iris_dat)
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
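To see what the conversion changes, the short sketch below compares the classes of the built-in data frame and the tibble. It assumes only that the tidyverse is installed; the object name iris_dat follows the exercise.

```r
library(tidyverse)

# iris ships with base R as a plain data.frame
class(iris)

# as_tibble() leaves the data unchanged but adds the tibble classes
iris_dat <- as_tibble(iris)
class(iris_dat)

# Dimensions and column names carry over unchanged
dim(iris_dat)
names(iris_dat)
```

The tibble still inherits from data.frame, so functions written for data frames generally keep working on it.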
- Use the select() function to keep only three variables—Sepal.Length,
Petal.Length, and Species—and arrange them in this order: Species,
Sepal.Length, Petal.Length. Save this subset as an object called
iris_subset.
iris_subset <- iris_dat %>%
  select(Species, Sepal.Length, Petal.Length)
glimpse(iris_subset)
## Rows: 150
## Columns: 3
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
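select() returns columns in the order they are listed, which is why Species comes first above. As a rough illustration, the sketch below reaches the same layout in two steps with relocate(), which moves a column to the front by default. The names by_select and by_relocate are made up for this example.

```r
library(tidyverse)

iris_dat <- as_tibble(iris)

# select() keeps the listed columns, in the listed order
by_select <- iris_dat %>%
  select(Species, Sepal.Length, Petal.Length)

# Alternative: pick the columns first, then move Species to the front
by_relocate <- iris_dat %>%
  select(Sepal.Length, Petal.Length, Species) %>%
  relocate(Species)

names(by_select)
names(by_relocate)

# Check that both routes give the same tibble
all.equal(by_select, by_relocate)
```

For a three-column subset the single select() call is simpler; relocate() is mainly useful when reordering a wide table without retyping every column name.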
- Within iris_subset, keep only the rows where Sepal.Length is greater
than 6.
iris_subset <- iris_subset %>%
  filter(Sepal.Length > 6)
glimpse(iris_subset)
## Rows: 61
## Columns: 3
## $ Species <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.2, 6.1, 6.3, 6.…
## $ Petal.Length <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.5, 4.0, 4.9, 4.…
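filter() keeps the rows for which the condition is TRUE; it does not remove them. The sketch below, a minimal check under that reading of the exercise, splits the data into the kept rows and their complement and counts the kept rows per species. The names kept and dropped are illustrative only.

```r
library(tidyverse)

iris_subset <- as_tibble(iris) %>%
  select(Species, Sepal.Length, Petal.Length)

# filter() KEEPS rows where the condition is TRUE
kept <- iris_subset %>%
  filter(Sepal.Length > 6)

# Negating the condition gives the complementary set of rows
dropped <- iris_subset %>%
  filter(!(Sepal.Length > 6))

nrow(kept)
nrow(dropped)

# Rows per species among those kept
kept %>%
  count(Species)
```

Because Sepal.Length has no missing values in iris, the two pieces together account for all 150 rows.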
- Within iris_subset, compute average Petal.Length by species using
group_by() and summarize().
iris_subset %>%
  group_by(Species) %>%
  summarise(
    avg_petal_length = mean(Petal.Length)
  )
## # A tibble: 2 × 2
## Species avg_petal_length
## <fct> <dbl>
## 1 versicolor 4.58
## 2 virginica 5.68
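The summary has only two rows because no setosa sample passed the Sepal.Length filter, and group_by() drops empty factor levels by default. If a row for every species is wanted, one option is the .drop argument of group_by(), sketched below. The object name with_empty is made up for this example.

```r
library(tidyverse)

iris_subset <- as_tibble(iris) %>%
  select(Species, Sepal.Length, Petal.Length) %>%
  filter(Sepal.Length > 6)

# Species is a factor, so "setosa" is still a level even with zero rows left
levels(iris_subset$Species)

# .drop = FALSE keeps empty factor levels as groups
with_empty <- iris_subset %>%
  group_by(Species, .drop = FALSE) %>%
  summarise(
    n = n(),                                # group size, 0 for setosa
    avg_petal_length = mean(Petal.Length)   # mean of no values is NaN
  )

with_empty
```

Including n() alongside the mean makes it obvious which averages are backed by data and which group is empty.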
- Add the average Petal.Length as a new variable (column) to
iris_subset.
iris_subset <- iris_subset %>%
  group_by(Species) %>%
  mutate(
    avg_petal_length = mean(Petal.Length)
  ) %>%
  ungroup()
glimpse(iris_subset)
## Rows: 61
## Columns: 4
## $ Species <fct> versicolor, versicolor, versicolor, versicolor, versi…
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.2, 6.1, 6.3…
## $ Petal.Length <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.5, 4.0, 4.9…
## $ avg_petal_length <dbl> 4.585000, 4.585000, 4.585000, 4.585000, 4.585000, 4.5…
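A grouped mutate() computes the mean once per species and repeats it on every row of that species, so the row count stays at 61. As a cross-check, the sketch below builds the same column a second way, by summarising to one row per species and joining the result back. The names via_mutate, species_means, and via_join are illustrative only.

```r
library(tidyverse)

iris_subset <- as_tibble(iris) %>%
  select(Species, Sepal.Length, Petal.Length) %>%
  filter(Sepal.Length > 6)

# Grouped mutate(): keeps every row, adds the per-species mean
via_mutate <- iris_subset %>%
  group_by(Species) %>%
  mutate(avg_petal_length = mean(Petal.Length)) %>%
  ungroup()

# Alternative: one row per species, then join back onto the rows
species_means <- iris_subset %>%
  group_by(Species) %>%
  summarise(avg_petal_length = mean(Petal.Length))

via_join <- iris_subset %>%
  left_join(species_means, by = "Species")

# Both routes should give the same column
all.equal(via_mutate$avg_petal_length, via_join$avg_petal_length)
```

The mutate() route is shorter; the join route can be handy when the per-group summary is also needed on its own.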