This is where we install and load the required packages and take a first look at the dataset.
The first step is to install all the packages we will need.
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("palmerpenguins")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
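Calling install.packages() every time the document runs reinstalls packages that are already present, and ggplot2 is already installed as part of tidyverse. A minimal sketch of installing only what is missing follows; the helper name missing_packages() is hypothetical, and the interactive() guard is an assumption that you would rather not install packages while knitting.

```r
# Packages this analysis relies on (ggplot2 ships with tidyverse)
needed <- c("tidyverse", "palmerpenguins")

# Hypothetical helper: return the packages that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(needed)

# Assumption: only install from an interactive session, not while knitting
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Drop the interactive() check if you do want the document itself to install missing packages.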
The second step is to load those packages with the ‘library()’ function.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(palmerpenguins)
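The Conflicts message above means that, once tidyverse is attached, a bare filter() or lag() call refers to the dplyr version rather than the one in stats. One way to make the intent explicit is to write the package prefix, as in this small sketch (the variable name adelie is only illustrative):

```r
library(dplyr)
library(palmerpenguins)

# The dplyr:: prefix makes it unambiguous which filter() is being called
adelie <- dplyr::filter(penguins, species == "Adelie")

# Only Adelie rows should remain
unique(adelie$species)
```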
The next step is to use the ‘head()’ function to get a first look at the dataset.
head(penguins)
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## # ℹ 2 more variables: sex <fct>, year <int>
For convenience, the ‘colnames()’ function was used to list the column names in the dataset.
colnames(penguins)
## [1] "species" "island" "bill_length_mm"
## [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex" "year"
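Row 4 of the head() output contains NA values, so it can help to count the missing values before plotting. The sketch below uses only base R; the variable names na_counts and rows_missing_xy are illustrative. The second count should match the number of rows reported in the geom_point() warnings further down.

```r
library(palmerpenguins)

# Number of missing values in each column
na_counts <- colSums(is.na(penguins))
print(na_counts)

# Rows missing either of the two variables used in the scatter plots
rows_missing_xy <- sum(
  is.na(penguins$flipper_length_mm) | is.na(penguins$body_mass_g)
)
print(rows_missing_xy)
```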
This is where we start to visualize the data prepared in the Data Preparation section above. Two kinds of faceting are used in this section:
‘facet_wrap’ is used because we wanted a separate scatter plot for each penguin species.
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
facet_wrap(~species)
## Warning: Removed 2 rows containing missing values (`geom_point()`).
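The warning above comes from rows where flipper_length_mm or body_mass_g is missing. A possible variation, sketched here, drops those rows explicitly with tidyr's drop_na() and stacks the panels in one column with the ncol argument of facet_wrap(); the names penguins_complete and p_wrap are illustrative, and stacking the panels is just one layout choice.

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

# Remove rows that are missing either plotted variable, so nothing is dropped silently
penguins_complete <- drop_na(penguins, flipper_length_mm, body_mass_g)

# One panel per species, stacked vertically so the shared x-axis is easy to compare
p_wrap <- ggplot(data = penguins_complete) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_wrap(~species, ncol = 1)

p_wrap
```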
Here are the observations in this visual:
‘facet_grid’ is used because we wanted to split the plots by more than one criterion at once, as in the scatter plots of the penguins by sex and species.
The first visual below is only an example and uses a different data set (the diamonds data set), but we included it because it looks striking. Note that it is built with ‘facet_wrap’; the ‘facet_grid’ call is in the penguins plot that follows it.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = color, fill = cut)) +
facet_wrap(~cut)
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
facet_grid(sex~species)
## Warning: Removed 2 rows containing missing values (`geom_point()`).
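Because the sex column contains missing values, facet_grid(sex~species) also draws a row of panels for NA. If that row is not wanted, one option is to drop the incomplete rows first, as in the sketch below. It also uses the rows and cols arguments with vars(), which is an equivalent way of writing the sex~species formula; the names penguins_sexed and p_grid are illustrative.

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

# Keep only rows where sex and both plotted variables are known
penguins_sexed <- drop_na(penguins, sex, flipper_length_mm, body_mass_g)

# Rows of panels by sex, columns of panels by species
p_grid <- ggplot(data = penguins_sexed) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(rows = vars(sex), cols = vars(species))

p_grid
```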
Visuals are a great way to add life to your data and make it more reader-friendly. Unless you are a data geek or work in the data space, you are unlikely to spend your precious time reading rows and columns just to get the story behind the data.
Functions like ‘facet_wrap’ and ‘facet_grid’ let the analyst get the most out of the ggplot2 package and produce compelling visuals!