Practicing Facets

Data Preparation

This is where we:

  • Prepare
    • Data
    • Packages
  • Clean Data

Below are the steps we took and how we carried them out.

The first step is to install the packages we need. (ggplot2 is part of the tidyverse, so installing it separately is optional.)

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("palmerpenguins")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
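
Reinstalling packages on every run is slow, so here is a minimal sketch that installs only what is missing. The `needed` vector and the CRAN mirror URL are our assumptions for illustration, not part of the original steps.

```r
# Packages this walkthrough relies on (ggplot2 comes with tidyverse).
needed <- c("tidyverse", "palmerpenguins")

# requireNamespace() returns FALSE for a package that is not installed.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

# Install only the missing ones; the repos value is an assumed mirror.
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

When everything is already installed, `missing_pkgs` is empty and nothing is downloaded.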

The second step is to load those packages with the ‘library()’ function.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(palmerpenguins)

The next step is to use the ‘head()’ function to get a first look at the dataset.

head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>

For convenience, we used the ‘colnames()’ function to list the column names in the dataset.

colnames(penguins)
## [1] "species"           "island"            "bill_length_mm"   
## [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
## [7] "sex"               "year"
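
Row 4 of the ‘head()’ output above contains NA values, so a cleaning step may be worth sketching here. The version below is one possible approach, assuming we only care about the two measurements plotted later; `na_counts` and `penguins_clean` are names we made up for illustration.

```r
library(tidyr)
library(palmerpenguins)

# Count missing values in each column.
na_counts <- colSums(is.na(penguins))
print(na_counts)

# Keep only rows where both plotted measurements are present.
penguins_clean <- penguins |>
  drop_na(flipper_length_mm, body_mass_g)

# Number of rows dropped by the cleaning step.
rows_dropped <- nrow(penguins) - nrow(penguins_clean)
print(rows_dropped)
```

The plots in this document use the original `penguins` data. Plotting `penguins_clean` instead should avoid the missing-value warnings that ggplot2 prints.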

Visualizations

This is where we start to visualize the data that we prepared in the Data Preparation section above. This section uses two kinds of faceting:

  • Facet Wrap
  • Facet Grid

We used ‘facet_wrap’ because we wanted a separate scatter plot for each penguin species.

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_wrap(~species)
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Here are our observations from this visual:

  1. Gentoo penguins are the largest of the three species, in both flipper length and body mass.
  2. Adelie and Chinstrap penguins are the smallest and are similar in size.
  3. The plots suggest a positive relationship: the longer the flipper, the heavier the penguin.
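
The observations above come from reading the plots. A rough numerical check could look like the sketch below; `species_summary` and `flipper_mass_cor` are hypothetical names, and the Pearson correlation is only one of several ways to measure the relationship.

```r
library(dplyr)
library(palmerpenguins)

# Mean flipper length and body mass per species, heaviest first.
species_summary <- penguins |>
  group_by(species) |>
  summarise(
    mean_flipper_mm = mean(flipper_length_mm, na.rm = TRUE),
    mean_mass_g = mean(body_mass_g, na.rm = TRUE),
    n = n()
  ) |>
  arrange(desc(mean_mass_g))

print(species_summary)

# Pearson correlation between flipper length and body mass,
# computed on rows where both values are present.
flipper_mass_cor <- cor(
  penguins$flipper_length_mm,
  penguins$body_mass_g,
  use = "complete.obs"
)
print(flipper_mass_cor)
```

A positive correlation here supports the third observation, although it says nothing about cause.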

We used ‘facet_grid’ because we wanted to view the penguin scatter plots split by two variables at once: sex and species. The diamonds bar chart that comes first below is a bonus and uses ‘facet_wrap’.

A bonus Viz👌

The visual below is only an example and uses a different data set, ‘diamonds’, which ships with ggplot2. We included it because the result looks good.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap(~cut)

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(sex~species)
## Warning: Removed 2 rows containing missing values (`geom_point()`).
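
Some penguins have no recorded sex, and ‘facet_grid(sex~species)’ typically gives those rows their own NA row of panels. As a hedged variation, not what we ran above, the sketch below filters those rows out first and labels each panel with both the variable name and its value; `penguins_sexed` and `p` are names invented for this example.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Drop rows with missing sex or missing measurements before plotting.
penguins_sexed <- penguins |>
  filter(!is.na(sex), !is.na(flipper_length_mm), !is.na(body_mass_g))

# Rows are sex, columns are species; label_both prints "sex: female" etc.
p <- ggplot(data = penguins_sexed) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(sex ~ species, labeller = label_both)

print(p)
```

Because the missing values are removed up front, this version should draw without the missing-value warning.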

Visuals are a great way to add life to your data and make it more reader friendly. Unless you are a data geek or work in the data space, you probably would not spend your precious time reading rows and columns just to get the story behind the data.

Functions like ‘facet_wrap’ and ‘facet_grid’ let the analyst get more out of the ggplot2 package and produce compelling visuals!
