Topic 1B: Data Visualisation I
Welcome to the first Data Science computer lab for STM1001!
Throughout the semester, we will use the R software environment for all our work.
R is widely used for statistical computing and data visualisation, and indeed the first four computer labs of the STM1001 Data Science module focus on data visualisation in R.
In this first lab, we will explore some of the more light-hearted options available to R users, and then introduce an excellent package, plotly (Sievert 2020), which allows us to create interactive data visualisations.
By the end of this lab, you should feel comfortable loading and using packages in R, and be able to create a simple interactive histogram using plotly.
Making memes in R
Base R contains many functions, and is perfectly sufficient for a number of data analysis methods.
However, one of the great benefits of R is that anyone can create packages (bundles of code, data and functions) which can be uploaded to global repositories (such as CRAN or Bioconductor), and made available for anyone around the world to download and use in their version of R.
Often, these packages are extremely helpful. They may contain useful data sets for a specific field of research, address a shortcoming with the base R suite of functions, allow users to perform specialised analyses, and/or offer users some additional functionalities.
On the other hand, sometimes these packages are more light-hearted, such as the meme package (Yu 2021), which allows users to create simple memes within R. Let’s take a look at this package now.
Because the meme package is not installed in base R, we need to download it before we can use it.
We can use the install.packages() R function to do this, as shown in the code below.
install.packages("meme")
Open up RStudio and run this code now.
Once the meme package is downloaded and installed, we need to load it in our current R session.
Run the following code to load the meme package.
library(meme)
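Running install.packages() downloads the package each time it is called, which is unnecessary once the package is already installed. The sketch below shows one common pattern: check whether a package is available, and install it only if it is missing. The helper name install_if_missing is our own (hypothetical) choice; it is not part of R or of the meme package.

```r
# Hypothetical helper: install a package only when it is not already available.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")

# For this lab you would instead call:
# install_if_missing("meme")
# library(meme)
```

Note that library() is still needed afterwards: installing a package puts it on your computer, while library() makes it available in the current R session.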
Great, now we can start to make some simple memes!
We really only need two lines of code for this.
Firstly, we need to find an appropriate image. For this example, we will use an image of Hagrid, from the Harry Potter series. We have located this image online, and copied the url. In R, we assign this url to the object hagrid, as shown below:
hagrid <- "https://i.imgflip.com/13wb2t.jpg"
Note that the url needs to be contained within quotation marks.
Make sure to run this code before moving on to the next step.
Next, we use the meme function to add some words to this image. Try running the code below, and see what happens.
meme(hagrid, "Yer a wizard", "with coding", font = "sans")
Note: Some warnings may appear in your R Console as this code is executing. Don’t worry: it is safe to ignore these warnings.
If you would like to save the meme you have made, it is helpful to assign the output of the meme function to an object.
In the code below, we make a new meme, and assign it to the object success. Try running this code now.
success_kid <- "http://i0.kym-cdn.com/entries/icons/mobile/000/000/745/success.jpg"
success <- meme(success_kid, "Using R", "to make memes", font = "sans")
success
Hint: Notice that we need to include the final line of code, calling the object success, in order for the image to be shown.
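This behaviour is not specific to the meme package. In R, assigning a result to an object stores it silently, and the result is only displayed when the object's name is run on its own. The short base R sketch below illustrates this with numbers; the object names x and y are purely illustrative.

```r
x <- 2 + 3   # assignment only: nothing is printed
x            # running the name on its own prints the stored value

# Wrapping an assignment in parentheses assigns and prints in one step
(y <- x * 2)
```

The same idea applies to plots and images: storing them in an object lets us reuse or save them later, and calling the object displays them.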
Now we can save our meme, using the function meme_save. Take a look at the code below.
meme_save(success, file="c:/STM1001/Data Science/success_kid_R_meme.png")
Here, we are saving our success meme to the file location c:\STM1001\Data Science\, with the name success_kid_R_meme.png.
Note that although the file path on our computer includes backslashes (\), in R code these need to be changed to forward slashes (/).
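If typing paths by hand feels error-prone, base R can help. The sketch below shows two options, using the same example folder as above: building a path from its parts with file.path(), and converting a path copied from Windows by swapping the backslashes for forward slashes. The object names out_file, copied and converted are our own.

```r
# Build the path from its parts; file.path() joins them with forward slashes
out_file <- file.path("c:", "STM1001", "Data Science", "success_kid_R_meme.png")
out_file

# Convert a path copied from Windows.
# Inside an R string, each backslash must itself be typed as "\\".
copied <- "c:\\STM1001\\Data Science"
converted <- gsub("\\", "/", copied, fixed = TRUE)
converted
```

The backslash has a special meaning inside R strings (it starts an escape sequence such as "\n" for a new line), which is why a single backslash cannot be used directly in a file path.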
Now it’s time to try making your own meme.
- Find an appropriate image of your choice online (please ensure you pick content suitable for university and work).
- Copy the url.
- Assign this url to an object in R.
- Use the meme function to add words to your image.
- Save your meme using the meme_save function.
Hint: If you are not quite sure how to begin, the code template below may help.
# First, we need to find an image, and assign it to an object
# (here we use the generic object name 'image_name')
# Just replace the ...s with the url of your image
image_name <- "..."
# Next, we need to use the meme function, to add some words (just replace the ...s)
my_meme <- meme(image_name, "...", "...", font = "sans")
# Note that you need to include the `, font = "sans"` part to ensure R knows which font to use.
# Now all that's left is to save your meme - just refer to the code above.
Congratulations! You were probably not expecting to make a meme in your first data science computer lab, and this probably won’t be on the final exam, but hopefully you are starting to realise that R is very versatile.
Customizing GIFs in R
R is not limited to working with static images - we can modify and create gifs and animations.
In this section, we will use another fun package, the magick package (Ooms 2021), to customize a gif.
Run the following code to download, install and load the magick package in your current R session.
install.packages("magick")
library(magick)
Just as we obtained online images of Hagrid and Success Kid, so too can we use urls that point to gifs and animations.
For this example, we have used the url to a rotating earth gif.
We use the image_read function to read this gif into R, and assign it to the object Earth.
Earth <- image_read("https://i.giphy.com/media/mf8UbIDew7e8g/giphy.gif")
Earth
Make sure to run this code before moving on to the next step (don’t worry if it takes a few seconds). The gif should appear in the Viewer section of RStudio.
Using the magick package, we can easily make some changes to this gif.
Take a look at the code below. You will notice here that:
- We have reversed the gif, using the rev function
- We have flipped the gif, using the image_flip function, and
- We have added text to this gif using the image_annotate function
rev(Earth) %>%
  image_flip() %>%
  image_annotate(" Meanwhile, in Australia", size = 40, color = "white")
Try running this code now.
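The %>% operator (often called the "pipe") passes the result on its left to the function on its right, as that function's first argument. To make this concrete, the sketch below performs the same three steps without the pipe, storing each intermediate result in an object. The names earth_reversed, earth_flipped and earth_captioned are our own, and the code assumes the magick package is installed and the gif url is still reachable.

```r
library(magick)

# Read in the same rotating earth gif as above
Earth <- image_read("https://i.giphy.com/media/mf8UbIDew7e8g/giphy.gif")

# Step 1: reverse the order of the frames
earth_reversed <- rev(Earth)

# Step 2: flip each frame vertically
earth_flipped <- image_flip(earth_reversed)

# Step 3: add the caption to every frame
earth_captioned <- image_annotate(earth_flipped, " Meanwhile, in Australia",
                                  size = 40, color = "white")

earth_captioned
```

Both versions should give the same result. The piped version avoids naming the intermediate objects, which many people find easier to read once there are several steps.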
This is really just scratching the surface of the magick package. However, our intention for this first computer lab is to give you a taste of some of the different possibilities available in R, so for the moment, let’s move on.
Drawing a fish in R
Instead of using a pre-existing image or gif, let’s now try to create one from scratch. Specifically, let’s draw a fish.
To do this, we can use the appropriately named rfishdraw package (Ding 2021).
Let’s download and install the rfishdraw package now. In order to use this package, we will also need to download and install some additional packages, upon which the rfishdraw package depends. Such packages are known as dependencies, and it is common for more sophisticated R packages to have multiple dependencies.
Note that these dependencies are packages in their own right.
Run this code in R now.
install.packages("rfishdraw")
install.packages("patchwork")
install.packages("ggplot2")
library("rfishdraw")
library("patchwork")
library("ggplot2")
If you now run the code below, a detailed drawing of a fish should appear in a new window!
get_polylines(path = "inst/fishdraw.js",
              format = "smil",
              output = "animated.svg",
              draw_type = "random")
windows() # If you are using a Mac, replace windows() with: quartz()
fish_draw()
Suppose we would like to change the colour of our fish. We can do this by including the argument col = "..." within the function fish_draw. For example, if we would like our fish to be blue, we can write:
fish_draw(col = "blue")
Try changing this colour to a different colour, and then run the code.
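If you are unsure which colour names R recognises, base R can list them. The sketch below uses the built-in colours() function to browse the names and to check one before using it. It also shows how rgb() produces a hex colour code; we assume, without having confirmed it, that fish_draw accepts hex codes as well as colour names.

```r
# How many named colours does R know about?
length(colours())

# Peek at the first few names
head(colours())

# Check whether a particular name is valid before using it
"darkorange" %in% colours()

# Colours can also be written as hex codes; rgb() builds one
# from red, green and blue amounts between 0 and 1
rgb(0, 0, 1)   # pure blue
```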
Palmer Penguins Data Set
Now that we have had a taste of some of the more light-hearted R packages out there, let’s consider a package which contains some useful data.
The palmerpenguins R package (Horst, Hill, and Gorman 2020) contains data, collected over the course of several years, on three species of penguin living on different islands in the Palmer archipelago, off the coast of Antarctica.
For more details, you can refer to Section 2 of the Data Visualisation in R supplement.
Just like the previous packages, we will need to download and load the palmerpenguins package before we can begin working with this penguin data.
Run the code below to install and load the palmerpenguins package in R.
install.packages("palmerpenguins")
library(palmerpenguins)
We can use the summary function to obtain a quick overview of the data contained within the penguins data set.
# This code summarises the `penguins` data set from the `palmerpenguins` package.
summary(penguins)
Don’t worry too much about the values shown in the summary table - the main things to note at this stage are the different variables, namely species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex and year.
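The summary function is only one way to get a first look at a data set. The sketch below shows a few other base R functions that are often used alongside it. It assumes the palmerpenguins package has been installed as described above.

```r
library(palmerpenguins)

names(penguins)   # the variable names only
dim(penguins)     # number of rows (penguins) and columns (variables)
head(penguins)    # the first six rows of the data
str(penguins)     # the type of each variable, with a few example values
```

Running str(penguins) is a quick way to see which variables are categories (such as species and island) and which are numbers (such as body_mass_g).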
Interactive Histograms
Suppose that we would like to produce histograms showing the distribution of the penguins’ body_mass_g values (their body mass in grams). We could create a simple histogram using the base R hist function via the following code:
hist(penguins$body_mass_g, breaks = 19)

However, this histogram has some shortcomings. Firstly, it is static. We can’t interact with the image, and we can’t manipulate it in real time to display different details.
For example, perhaps we would like to see the distribution of the penguins’ body_mass_g values, but only for the penguins on a specific island. We would need to do some more coding to produce such a histogram in base R. Even then, if we would like to have similar histograms for the other two islands, this would mean further coding.
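To give a sense of what that extra coding might look like, here is one possible base R sketch, which draws a separate static histogram for each island, side by side. The object names old_par, isl and island_mass are our own, and this is only one of several ways to do it.

```r
library(palmerpenguins)

# Arrange three plots side by side, remembering the previous settings
old_par <- par(mfrow = c(1, 3))

# Draw one histogram per island; hist() drops missing values automatically
for (isl in levels(penguins$island)) {
  island_mass <- penguins$body_mass_g[penguins$island == isl]
  hist(island_mass, main = isl, xlab = "Body mass (g)")
}

# Restore the original plot layout
par(old_par)
```

This works, but the three plots are still static, and each has its own axes, which makes the islands harder to compare.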
Alternatively, we could use the plotly package to create an interactive, responsive histogram. Let’s take a look at how to do this now.
To begin, just as for the previous packages, we will need to download and load the plotly package in R, before we can use any plotly functions.
Run the code below to install and load the plotly package in R.
install.packages("plotly")
library(plotly)
To create plotly plots, we use the function plot_ly(). We won’t worry too much about the composition of this function just yet - we’ll cover this in more detail next week. For the moment, take a look at the code below, and see if you can get a general idea of what’s going on.
penguin_hist_base <- plot_ly(data = penguins,
                             x = ~body_mass_g,
                             type = "histogram")
penguin_hist_base <- penguin_hist_base %>% layout(yaxis = list(title = 'count'))
Before you move on to the next question, run this code in R.
Note: Once you have taken some time to consider the code above, if you would like more details or would like to check the accuracy of your interpretation, the commented version of the code below gives a brief explanation.
# Here, we are creating a plotly object called "penguin_hist_base"
penguin_hist_base <- plot_ly(data = penguins,     # We are using the penguins data
                             x = ~body_mass_g,    # and plotting the body_mass_g values
                             type = "histogram")  # in a histogram format
# The code below is used to modify the layout of the histogram
# to include a label for the y-axis
penguin_hist_base <- penguin_hist_base %>% layout(yaxis = list(title = 'count'))
To produce this plotly histogram, run the R code below. Your histogram should appear in the Viewer section of RStudio.
penguin_hist_base
As we noted earlier, plotly graphs, unlike base R graphs, are interactive!
Notice that if you hover over the data in the histogram in 5.3, you can see the specific details (note that the graph in this document is also interactive!).
If you left-click and drag your cursor over a section to create a box, you can also zoom in on a particular section of the plot. Just double left-click to zoom back out.
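The layout function used earlier to label the y-axis can set other parts of the plot in the same way. As a small sketch, the code below also adds an x-axis label and a plot title. The object name penguin_hist_labelled and the wording of the labels are our own choices.

```r
library(plotly)
library(palmerpenguins)

# The same histogram as before, with a title and both axes labelled
penguin_hist_labelled <- plot_ly(data = penguins,
                                 x = ~body_mass_g,
                                 type = "histogram") %>%
  layout(title = "Body mass of Palmer penguins",
         xaxis = list(title = "Body mass (g)"),
         yaxis = list(title = "count"))

penguin_hist_labelled
```

R may warn that a couple of observations were ignored; this is because a few penguins have no recorded body mass.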
Perhaps you are not impressed with plotly yet. After all, our histogram doesn’t look that different to the base R version, so what is all the fuss about?
Well, it is very easy to modify our penguin_hist_base plot_ly graph to show extra detail. For example, we can easily produce separate histograms for the penguins on each island. Take a look at the R code below, which builds upon what we used in penguin_hist_base.
penguin_hist <- plot_ly(data = penguins,
                        x = ~body_mass_g,
                        color = ~island,
                        type = "histogram", alpha = 0.6)
penguin_hist <- penguin_hist %>% layout(yaxis = list(title = 'count'),
                                        barmode = "overlay")
Before you move on to the next question, run this code in R.
Note: Once you have taken some time to consider the code above, if you would like more details or would like to check the accuracy of your interpretation, the commented version of the code below gives a brief explanation.
# Here, we are creating a plotly object called "penguin_hist"
penguin_hist <- plot_ly(data = penguins,   # We are using the penguins data
                        x = ~body_mass_g,  # and plotting the body_mass_g values
                        color = ~island, type = "histogram", alpha = 0.6)
# We are producing a histogram for this data, with bars coloured differently,
# depending on the island on which the penguin is located
# The code below is used to modify the layout of the histogram
# This includes adding a label to the y-axis
# and setting the histograms to be layered over each other
# (hence the alpha = 0.6 above to change the opacity)
penguin_hist <- penguin_hist %>% layout(yaxis = list(title = 'count'),
                                        barmode = "overlay")
To produce this new plotly histogram, run the R code below. Your histogram should appear in the Viewer section of RStudio.
penguin_hist
This is looking better than our previous histogram! Because we have told our plot_ly function to assign different colours to the different islands, we now have three histograms, rather than one with all the data clumped together.
Even better, these are all presented within the one plot, which also includes a handy legend. Hopefully you are now beginning to appreciate the increased functionality offered by plotly over base R plots.
Finally, and perhaps most importantly for this specific example, we can dynamically filter out observations to focus on data from a specific island. Simply click on one of the entries in the legend at the top right of our histogram in 5.6 to hide that island’s data (note that the axes adjust dynamically too).
Try focusing just on the Dream island penguins.
Hint: To bring the removed data back, simply click once more on the relevant line in the legend.
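Clicking on the legend hides data in the plot only; the penguins data set itself is unchanged. If you ever need the same filter in code, for example to count the penguins involved, one possible sketch is shown below. It uses the base R subset function, and the object names dream_penguins and dream_hist are our own.

```r
library(plotly)
library(palmerpenguins)

# Keep only the rows for penguins on Dream island
dream_penguins <- subset(penguins, island == "Dream")

# How many penguins were recorded on Dream island?
nrow(dream_penguins)

# An interactive histogram of body mass for these penguins only
dream_hist <- plot_ly(data = dream_penguins,
                      x = ~body_mass_g,
                      type = "histogram") %>%
  layout(yaxis = list(title = "count"))

dream_hist
```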
That’s the end of the first data science computer lab!
Hopefully you have enjoyed this first computer lab, and now have a better idea of just how versatile R can be. Don’t worry if some of the code seems difficult at the moment - this is only the first week after all! Next week, we will continue working with plotly and the palmerpenguins data set, to produce even more detailed interactive plots.
Before you finish up, if you have been writing and running your code in RStudio, make sure to save your script file somewhere safe - it might come in handy later on.
