Prepared by: Daniel Santiago Sandoval Higuera

M Sc. Mario Gregorio Saavedra Rodríguez

Institution: Pontificia Universidad Javeriana

Import libraries

library(ggplot2)
library(ragg)
library(palmerpenguins)

Load the data and store it in the correct format: drop rows with missing values and convert the categorical columns to factors

penguins <- palmerpenguins::penguins
penguins <- transform(na.omit(penguins),
  species = factor(species),
  island = factor(island),
  sex = factor(sex),
  year = factor(year)
)
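
A quick check can confirm that the cleaning step did what was intended. The sketch below uses only base R functions; `penguins_clean` is a name introduced here for illustration, and only `year` is converted because it is the one categorical column stored as a number.

```r
library(palmerpenguins)

# Hypothetical name, used so the `penguins` object above is left untouched
penguins_clean <- transform(na.omit(palmerpenguins::penguins),
  year = factor(year)
)

# No missing values should remain after na.omit()
anyNA(penguins_clean)

# Which columns are stored as factors (categorical variables)?
sapply(penguins_clean, is.factor)

# Number of complete rows that were kept
nrow(penguins_clean)
```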

Scatterplot of bill length against bill depth. The point cloud suggests that there may be two distributions.

ggplot(data = penguins) + # data
  aes(x = bill_length_mm, y = bill_depth_mm) + # variables
  geom_point() # geometry

The same plot as before, written with a different syntax

ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

The same data as before, now coloured by penguin species, which lets us distinguish three separate distributions. This version draws a line graph, although a scatterplot would probably suit the data better.

ggplot(data = penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, colour = species) +
  geom_line()

Add the sex attribute to the previous graph, represented by the shape of each point.

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_line(aes(colour = species)) +
  geom_point(aes(shape = sex)) # add points, shaped by sex

Histogram of the distribution of flipper_length_mm, with the bar outlines coloured by species. We could hypothesize that Adelie penguins are the smallest and Gentoo the largest.

ggplot(penguins) +
  aes(x = flipper_length_mm, colour = species) +
  geom_histogram(bins = 30) + 
  theme(legend.position = "bottom")

The same as before, but with the bars filled with the colour of the corresponding species. The number of bins is set with the square-root rule: the square root of the number of data points.

ggplot(penguins) +
  aes(x = flipper_length_mm, fill = species) +
  geom_histogram(bins = ceiling(sqrt(nrow(penguins)))) + # bins must be a whole number
  theme(legend.position = "bottom")
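
To see what the square-root rule amounts to for this data set, the bin count can be computed on its own. This is a minimal sketch; rounding up with `ceiling()` is a choice made here, and `nclass.Sturges()` from base R is included only as a point of comparison that the original text does not use.

```r
library(palmerpenguins)

# Same cleaning step as above: keep complete rows only
pg <- na.omit(palmerpenguins::penguins)
n <- nrow(pg)

# Square-root rule, rounded up so the result is a whole number of bins
bins_sqrt <- ceiling(sqrt(n))

# Sturges' rule from base R (grDevices), shown only for comparison
bins_sturges <- nclass.Sturges(pg$flipper_length_mm)

c(sqrt_rule = bins_sqrt, sturges = bins_sturges)
```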

The previous histogram without the colour fill: only the bar outlines are coloured by species. The legend is again placed at the bottom.

ggplot(penguins, aes(flipper_length_mm)) +
  geom_histogram(bins = ceiling(sqrt(nrow(penguins))), aes(colour = species)) +
  theme(legend.position = "bottom")

The same data as before, shown as a frequency polygon

ggplot(penguins, aes(flipper_length_mm, colour = species)) +
  geom_freqpoly(bins = ceiling(sqrt(nrow(penguins))))

Frequency polygon of bill depth instead of flipper length, with one line per species distinguished by colour.

ggplot(penguins, aes(bill_depth_mm, colour = species)) + 
  geom_freqpoly(binwidth = 0.5)

The same graph as before, but with a wider bin width of 1

ggplot(penguins, aes(bill_depth_mm, colour = species)) + 
  geom_freqpoly(binwidth = 1)

Frequency polygon of flipper length by species, with a very small bin width.

ggplot(penguins, aes(flipper_length_mm, colour = species)) + 
  geom_freqpoly(binwidth = 0.5)

Histograms of flipper length coloured by sex, with a bin width of 0.5, split into one panel per species.

ggplot(penguins, aes(flipper_length_mm, colour = sex)) + 
  geom_histogram(binwidth = 0.5) + 
  facet_wrap(~species, ncol = 3)

Similar to the previous one, but coloured by species and split into panels by sex

ggplot(penguins, aes(flipper_length_mm, colour = species)) + 
  geom_histogram(binwidth = 0.5) + 
  facet_wrap(~sex, ncol = 2)

Frequency polygon of body mass coloured by species. Note that the Gentoo species has the largest body mass.

ggplot(penguins, aes(body_mass_g)) + 
  geom_freqpoly(aes(colour = species), binwidth = 100, na.rm = TRUE) +
  xlim(2500, 6500) + 
  theme(legend.position = "bottom")

Histogram in which each bin shows the share of each species as a proportion between 0 and 1, so every bin has a total height of 1. The x axis is limited to a fixed range and the legend is placed at the top.

ggplot(penguins, aes(body_mass_g)) + 
  geom_histogram(aes(fill = species), binwidth = 100, position = "fill",
    na.rm = TRUE) +
  xlim(2500, 6500) + 
  theme(legend.position = "top")
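
The proportions that `position = "fill"` stacks in each bin can be approximated by hand. The sketch below assumes bin edges at every 100 g between 2500 and 6500; these edges are chosen here for illustration and will not necessarily coincide with the edges ggplot2 picks.

```r
library(palmerpenguins)

pg <- na.omit(palmerpenguins::penguins)

# Assumed bin edges: 100 g wide, covering the same range as xlim(2500, 6500)
breaks <- seq(2500, 6500, by = 100)
mass_bin <- cut(pg$body_mass_g, breaks = breaks, include.lowest = TRUE)

# Count penguins per bin and species, dropping bins that are empty
counts <- table(mass_bin, pg$species)
counts <- counts[rowSums(counts) > 0, ]

# Convert counts to within-bin proportions (each row sums to 1)
props <- prop.table(counts, margin = 1)
head(round(props, 2))
```

Each row of `props` corresponds to one bar of the filled histogram, which is why every bar reaches a height of 1.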

Density plot of flipper length coloured by species. It is similar to the histogram but smoother. The Gentoo curve suggests there might be two different distributions.

ggplot(penguins) +
  aes(x = flipper_length_mm, colour = species) +
  geom_density(aes(fill = species))
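
A density curve is a kernel estimate whose area is approximately 1, which is what makes curves for groups of different sizes comparable. As a rough illustration, the base R `density()` function can be applied to one species; the variable names below are introduced here, and the default bandwidth is assumed.

```r
library(palmerpenguins)

pg <- na.omit(palmerpenguins::penguins)
gentoo <- pg$flipper_length_mm[pg$species == "Gentoo"]

# Kernel density estimate with the default bandwidth
d <- density(gentoo)

# Approximate the area under the curve with the trapezoidal rule
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area

# Flipper length at which the estimated density is highest
d$x[which.max(d$y)]
```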

Density plot of body mass coloured by species, without a legend. An x-axis limit is also set and NA values are removed.

ggplot(penguins, aes(body_mass_g, fill = species, colour = species)) + 
  geom_density(na.rm = TRUE) + 
  xlim(2500, 6500) + 
  theme(legend.position = "none")

Same as before, but with an alpha value that sets the transparency of each density and with the legend placed to the left of the graph.

ggplot(penguins, aes(body_mass_g, fill = species, colour = species)) +
  geom_density(alpha = 0.7, na.rm = TRUE) + 
  xlim(2500, 6500) + 
  theme(legend.position = "left")

Old Faithful

A contour map that can help us see how probable an eruption is at a certain waiting time.

ggplot(faithfuld, aes(eruptions, waiting)) + 
  geom_contour(aes(z = density, colour = after_stat(level)))

A density contour map similar to the previous one. This one is mainly useful for getting a sense of how the data are distributed, because it has no colours carrying information about each level.

ggplot(faithful, aes(waiting, eruptions)) +
  geom_density_2d()
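
Note that this plot uses the raw `faithful` observations, while the other contour plots use `faithfuld`, which already contains a density value for each grid point. A gridded density of that kind can be sketched from the raw data with `MASS::kde2d()`; the grid size of 50 and the name `grid` are assumptions made here, so the values will not match `faithfuld` exactly.

```r
library(MASS)

# Two-dimensional kernel density estimate on a 50 x 50 grid (grid size assumed)
kde <- MASS::kde2d(faithful$waiting, faithful$eruptions, n = 50)

# Reshape the density matrix into a long data frame, one row per grid point.
# expand.grid() varies its first argument fastest, matching as.vector(kde$z).
grid <- expand.grid(waiting = kde$x, eruptions = kde$y)
grid$density <- as.vector(kde$z)

head(grid)
```

A data frame shaped like `grid` can be passed to `geom_contour()` or `geom_raster()` with `z = density` or `fill = density`.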

The same as the previous one but filled with colours, which are determined by the density attribute

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour_filled()

We can also set the level of detail we want. In the next case we have chosen a level of detail of only 3 bins

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(bins = 3)

Level of detail of 7 bins

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(bins = 7)

Level of detail of 13 bins

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(bins = 13)

Level of detail of 19 bins

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(bins = 19)

We can also set the level of detail with a bin width, as with the histogram, so that the data determine how many bins there are

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(binwidth = 0.003)

binwidth = 0.007

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(binwidth = 0.007)

binwidth = 0.013

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(binwidth = 0.013)

binwidth = 0.019

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(binwidth = 0.019)
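
With `binwidth`, the number of contour levels follows from the range of the density values. The sketch below estimates it for the four bin widths used above; it is only an approximation, because the exact breaks that ggplot2 draws depend on where it places the first level.

```r
library(ggplot2)

# Range of the precomputed density values in faithfuld
rng <- range(faithfuld$density)

# Bin widths used in the contour plots above
widths <- c(0.003, 0.007, 0.013, 0.019)

# Approximate number of levels: range divided by bin width, rounded up
approx_levels <- ceiling(diff(rng) / widths)
data.frame(binwidth = widths, approx_levels = approx_levels)
```

A smaller bin width gives more contour lines and a more detailed map, which mirrors the effect of increasing `bins`.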

Colour each contour line according to its density level

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(aes(colour = after_stat(level)))

Change the line colour to red with the colour argument

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour(colour = "red")

We can draw a smooth heat map with geom_raster and overlay the contours at the same time with geom_contour

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_raster(aes(fill = density)) +
  geom_contour(colour = "white")

geom_raster on its own creates a coloured heat map, in which we can identify the regions of highest density

ggplot(faithfuld, aes(eruptions, waiting)) + 
  geom_raster(aes(fill = density))

Plotting the same data as a scatterplot. The first line saves a sample of the data by taking every 10th row; point size is then mapped to density, and an alpha value adds a little transparency to the circles.

small <- faithfuld[seq(1, nrow(faithfuld), by = 10), ]
ggplot(small, aes(eruptions, waiting)) + 
  geom_point(aes(size = density), alpha = 1/3) + 
  scale_size_area()
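
The subsampling step can be checked separately from the plot. This is a small sketch of the same indexing; `idx` is a name introduced here for illustration.

```r
library(ggplot2)

# Row indices 1, 11, 21, ... : every 10th row of faithfuld
idx <- seq(1, nrow(faithfuld), by = 10)
small <- faithfuld[idx, ]

# The sample keeps roughly one tenth of the rows
c(original = nrow(faithfuld), sampled = nrow(small))
```

Thinning the grid this way keeps the overall shape of the density surface while leaving enough space between points for their sizes to be readable.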

Same as above, but scale_radius() maps density to the radius of each point instead of its area

small <- faithfuld[seq(1, nrow(faithfuld), by = 10), ]
ggplot(small, aes(eruptions, waiting)) + 
  geom_point(aes(size = density), alpha = 1/3) + 
  scale_radius()

Showing scatterplot of penguins data and adding alpha (transparency)

ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(alpha = 1 / 3)

Adding labels to the plot

ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(alpha = 1 / 3) + 
  xlab("Penguin bill length (mm)") + 
  ylab("Penguin bill depth (mm)")

Removing the axis labels

ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(alpha = 1 / 3) + 
  xlab(NULL) + 
  ylab(NULL)

Using jitter so that overlapping points remain visible (jitter moves each point randomly by a small amount so that more points can be seen)

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_jitter(width = 0.75, height = 0.75)
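
As a rough base R analogue of what jittering does, each coordinate is shifted by a random amount drawn between `-width` and `+width`. The values below are made up for illustration and are not taken from the penguins data.

```r
set.seed(42)  # fixed seed so the random shifts are reproducible

# Illustrative values with exact duplicates, which would overplot
x <- c(190, 190, 190, 195, 195)
width <- 0.75

# Shift every value by uniform noise in [-width, width]
x_jittered <- x + runif(length(x), min = -width, max = width)

data.frame(original = x, jittered = x_jittered)
```

Because the noise is random, a jittered plot changes slightly on every run; `position_jitter()` accepts a `seed` argument when a reproducible figure is needed.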

Show flipper_length_mm for each species, jittering only horizontally so that the points can be told apart

ggplot(penguins, aes(x = species, y = flipper_length_mm, colour = species)) +
  geom_jitter(width = 0.3, height = 0) + 
  xlim("Adelie", "Chinstrap", "Gentoo") + 
  ylim(160, 240)

Show body mass grouped by sex and coloured by species (jitter is used so that all the data are visible)

ggplot(penguins, aes(x = sex, y = body_mass_g, colour = species)) +
  geom_jitter(width = 0.25, height = 0.50, na.rm = TRUE) + 
  ylim(NA, 6500)

Plot flipper length against body mass, coloured by species, using position_jitter so that more values are visible

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1))

Use geom_point to show the flipper_length_mm attribute of each penguin, grouped by species, with jitter so that points that would otherwise sit on top of each other are visible

ggplot(penguins, aes(x = species, y = flipper_length_mm, colour = species)) +
  geom_point(position = position_jitter(width = 0.50, height = 0)) + 
  xlim("Adelie", "Chinstrap", "Gentoo") + 
  ylim(160, 240)

Use geom_point to show the body_mass_g attribute of each penguin, grouped by sex and coloured by species (jitter is used so that points that would otherwise sit on top of each other are visible)

ggplot(penguins, aes(x = sex, y = body_mass_g, colour = species)) +
  geom_point(position = position_jitter(width = 0.25, height = 0.25)) + 
  ylim(NA, 6500)

Save the plot to a variable p so that it can be reused later

p <- ggplot(penguins, aes(flipper_length_mm, body_mass_g, colour = factor(species))) +
  geom_point()

Show the plot

print(p)

Save the plot to a file

ggsave("plot.png", p, width = 5, height = 5)
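
Since ragg is loaded at the top, the plot can also be written through its PNG device. The sketch below is one possible variant, not the original workflow: the temporary output path, the explicit `units`, and the `dpi` value are assumptions made here.

```r
library(ggplot2)
library(ragg)
library(palmerpenguins)

p <- ggplot(na.omit(palmerpenguins::penguins),
            aes(flipper_length_mm, body_mass_g, colour = species)) +
  geom_point()

# Hypothetical output location; a temporary file avoids cluttering the project
out_file <- tempfile(fileext = ".png")

# Render with the ragg PNG device at 5 x 5 inches and 300 dpi
ggsave(out_file, p, device = ragg::agg_png,
       width = 5, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```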