## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <fct>, year <int>
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <fct> male, female, female, NA, female, male, female, male…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).
Putting species on the y axis instead of the x axis makes the bars horizontal rather than vertical. The plot shows the same counts, just in a different orientation.
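The two orientations can be sketched side by side. This is a minimal example, assuming `penguins` comes from the palmerpenguins package (the report does not show which package was loaded); the object names are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data

# Species on the x axis: vertical bars
p_vertical <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# Species on the y axis: the same counts drawn as horizontal bars
p_horizontal <- ggplot(penguins, aes(y = species)) +
  geom_bar()

p_vertical
p_horizontal
```

Only the axis that `species` is mapped to changes; the counts computed by `geom_bar()` are the same in both plots.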
In the first bar chart, the `color` argument colors the outline of each bar, while the `fill` argument fills the bar with the specified color. `fill` is the more useful of the two, because it can later be mapped to a variable to show differences between groups in colour.
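A short sketch of the difference, assuming the palmerpenguins `penguins` data. The value `"red"` and the object names are illustrative choices, not taken from the report.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data

# `color` only changes the outline of each bar
p_outline <- ggplot(penguins, aes(x = species)) +
  geom_bar(color = "red")

# `fill` changes the interior of each bar
p_filled <- ggplot(penguins, aes(x = species)) +
  geom_bar(fill = "red")

# Mapping `fill` to a variable inside aes() colours the bars by group
p_mapped <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar()

p_outline
p_filled
p_mapped
```

The third plot is where `fill` becomes useful: each bar is split into coloured segments, one per species.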
ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  labs(title = "Body mass by Species (Boxplot)")
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
ggplot(penguins, aes(x = body_mass_g, color = species)) +
  geom_density(linewidth = 0.75) +
  labs(title = "Body mass by Species (Density)")
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).
ggplot(penguins, aes(x = body_mass_g, color = species, fill = species)) +
  geom_density(alpha = 0.5) +
  labs(title = "Body mass by Species (w/ alpha 0.5)")
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).
ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill") +
  labs(title = "Island and Species (w/ position fill)")
### Two plus numerical variables
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = island))
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species)) +
  facet_wrap(~island)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
## $ class <chr> "compact", "compact", "compact", "compact", "compact", "c…
Categorical: manufacturer, model, trans, drv, fl, class
Numerical: displ, year, cyl, cty, hwy
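The same classification can be derived from the column types shown in the `glimpse()` output above. This is a sketch that treats character columns as categorical and numeric (double or integer) columns as numerical; the variable names `categorical` and `numerical` are illustrative.

```r
library(ggplot2)  # provides the mpg data set

# Character columns are treated as categorical,
# double and integer columns as numerical.
categorical <- names(mpg)[sapply(mpg, is.character)]
numerical <- names(mpg)[sapply(mpg, is.numeric)]

categorical
# "manufacturer" "model" "trans" "drv" "fl" "class"
numerical
# "displ" "year" "cyl" "cty" "hwy"
```

Storage type is only a first approximation: `year` and `cyl` are stored as integers but take few distinct values, so they could also reasonably be treated as categorical.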
ggplot(mpg, aes(x = hwy, y = displ)) +
  geom_point(aes(colour = year)) +
  labs(title = "Highway miles by displacement and year")

ggplot(mpg, aes(x = hwy, y = displ)) +
  geom_point(aes(color = year, shape = drv)) +
  labs(title = "Highway miles by displacement, year and drive type")

The aesthetics behave differently: the x and y variables are mapped to their corresponding axes, while color changes the colour of each observation based on its numerical value. For example, a car made in 2008 is a light blue colour, and the colour gets darker the older the car is. The shape reflects the drive train of the car, so it is easier to find either category.
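Linewidth is ignored: `geom_point()` draws points and has no linewidth aesthetic, so mapping a variable to it does not change the plot.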
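One way to check this is to list the aesthetics each geom understands. The sketch below assumes ggplot2 3.4.0 or later (where `linewidth` exists as an aesthetic); the variable names are illustrative.

```r
library(ggplot2)

# geom_point() does not list linewidth among its aesthetics
point_has_linewidth <- "linewidth" %in% GeomPoint$aesthetics()
point_has_linewidth
# FALSE

# A line geom, by contrast, does use linewidth
line_has_linewidth <- "linewidth" %in% GeomLine$aesthetics()
line_has_linewidth
# TRUE
```

For points, `size` is the aesthetic that controls how large each mark is drawn.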
The cty variable is now added into the plot as a colour.
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
The colouring reveals that Gentoo penguins have the smallest bill depth but some of the longest bills. Chinstrap penguins have bills about as long as the Gentoo penguins' and longer than the Adelie penguins'. The Adelie penguins have the shortest bills, but slightly more of them have a bill depth greater than that of the Chinstrap penguins.
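The visual comparison can be cross-checked with a per-species summary. This is a sketch, assuming the palmerpenguins `penguins` data; the summary column names are made up for illustration.

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the `penguins` data

# Mean bill length and depth per species, ignoring missing measurements
bill_summary <- penguins |>
  group_by(species) |>
  summarise(
    mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
    mean_bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE)
  )

bill_summary
```

Means hide the overlap between groups that the scatterplot shows, so the table complements the plot rather than replacing it.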
ggplot(
  data = penguins,
  mapping = aes(
    x = bill_length_mm, y = bill_depth_mm,
    color = species, shape = species
  )
) +
  geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
To fix this visualisation, I removed the additional label at the end of the code, which was creating a duplicate legend. With color and shape both mapped to species and no conflicting labels, ggplot2 combines them into a single legend.
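An alternative to removing the label is to give both aesthetics the same legend title, which also keeps a single legend. This sketch assumes the palmerpenguins `penguins` data; the title `"Species"` and the object name are illustrative choices.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data

p_single_legend <- ggplot(
  data = penguins,
  mapping = aes(
    x = bill_length_mm, y = bill_depth_mm,
    color = species, shape = species
  )
) +
  geom_point() +
  # Same title for both aesthetics, so the two legends can still merge
  labs(color = "Species", shape = "Species")

p_single_legend
```

Legends merge only when the aesthetics share the same variable and the same title, so labelling just one of them splits the legend in two.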
From the first plot, you can see that Torgersen island is populated by Adelie penguins only. Dream island is populated by both Adelie and Chinstrap penguins, with Chinstrap penguins making up a slightly larger share. Biscoe island is about 75% Gentoo penguins, with a smaller population of about 25% Adelie penguins.
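The shares read off the plot can be computed directly. This is a sketch, assuming the palmerpenguins `penguins` data; `island_shares` and `share` are illustrative names.

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the `penguins` data

# Count penguins per island and species, then convert the counts
# to each species' share of its island
island_shares <- penguins |>
  count(island, species) |>
  group_by(island) |>
  mutate(share = n / sum(n)) |>
  ungroup()

island_shares
```

This is the same quantity that `geom_bar(position = "fill")` draws: within each island the shares add up to 1.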
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Saving 7 x 5 in image
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Saving 7 x 5 in image
The file is saved because the ggsave function is used to save the plot as a png. ggsave picks the file type from the file name extension, so to save the plot as a pdf the file name would need to be mpg-plot.pdf.
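A minimal sketch of saving a PDF. The plot itself, the `p_scatter` and `out_file` names, and the use of a temporary directory are assumptions made for illustration; only the file name `mpg-plot.pdf` comes from the answer above.

```r
library(ggplot2)

# A hypothetical plot to save
p_scatter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# The .pdf extension tells ggsave() to write a PDF.
# A temporary directory is used here to avoid cluttering the project.
out_file <- file.path(tempdir(), "mpg-plot.pdf")
ggsave(filename = out_file, plot = p_scatter)
```

Passing `plot =` explicitly avoids relying on the default, which is to save the most recently displayed plot.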