Data Science Stream

Topic 1B: Using RStudio


Welcome to the first Data Science computer lab for STM1001!

Throughout the semester, we will use the R software environment in our computer labs and assessments. R is free, flexible, and used by millions of people for statistical computing and data visualisation.

Learning R can be challenging at first. To make our learning experience more enjoyable, we will be using RStudio rather than base R for all our R coding. RStudio is an integrated development environment (IDE) for R, and offers several helpful features and user-interface options missing from base R.

In this first Data Science computer lab we will take things slowly, and focus on practicing and reinforcing key R coding skills you began developing in the first core Computer Lab, via some light-hearted examples. A solid R foundation will ensure that in subsequent computer labs, you will be able to pick up and apply new R coding skills more easily.

By the end of this lab, you should feel comfortable using simple R commands, creating and naming new objects, installing, loading and using R packages, and saving images generated in RStudio.

🎧 Reminder: Online students

Throughout the computer lab question sheets, you will see emojis and/or collapsible sections like this one. Each emoji has a particular meaning and will sometimes be associated with additional instructions:

Prompts for you

💬 Write your answer in the chat.

Modes at different times during the lab

🏡 Main room. All together in the main room – your computer lab demonstrator will be presenting information or facilitating class discussion.

💡 Breakout rooms. The person whose birthday is closest to a random date (picked by your computer lab demonstrator) shares their screen or whiteboard. Here you will discuss a question together and bring your group’s answer back to the main room.

💻 Focus mode. You will still be in the main room, but working independently. All students will share their screens during this time so that your computer lab demonstrator (but not other students) can see them.


🏫 Reminder: Face-to-face (blended) students

Throughout the computer lab question sheets, you will see emojis and/or collapsible sections like this one. You can ignore the emojis and collapsible sections, as they contain information relevant to students who are studying online.


Checklist

🏡 Before we continue, make sure that you have done the following:

  • Installed R and RStudio on your personal device (this will be helpful for assignment work, even if you intend to complete the computer labs on university computers)
  • Completed the first core Computer Lab
  • Looked over the different books in the Introduction to R content on LMS
  • Confirmed that you are in the correct stream

If you have any questions about any of these items, please ask your computer lab demonstrator for assistance.

1 Installing new R packages

💻 R contains many in-built functions, and by itself is perfectly sufficient for a number of data analysis methods. However, one of the great benefits of R is that anyone can create packages (bundles of code, data and functions) which can be uploaded to global repositories (such as CRAN), and made available for anyone around the world to download and use in their version of R.

These packages are often extremely helpful. They may contain useful data sets for a specific field of research, address a shortcoming with the base R suite of functions, allow users to perform specialised analyses, and/or offer users some additional functionalities missing from base R.

On the other hand, sometimes user-created packages are more light-hearted, or are pet projects not necessarily intended for serious data analysis. One such example is the meme package (Yu 2021), which allows users to create simple memes using R code.

Let’s install and load the meme package in RStudio now.

1.1

💻 Because the meme package is not included with base R, we need to install it before we can use it. Recall that we can use the install.packages() R function to do this. Run the code below now:

install.packages("meme")
# Note that the package name must be surrounded by quotation marks when using the install.packages function.
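
If you rerun your script later, install.packages() will download the package again each time. One way to avoid this is sketched below. Note that install_if_missing is a hypothetical helper name (it is not part of R or the meme package), built from the base R functions requireNamespace() and install.packages().

```r
# Hypothetical helper: installs a package only when it is not already available
install_if_missing <- function(pkg) {
  already_installed <- requireNamespace(pkg, quietly = TRUE)
  if (!already_installed) {
    install.packages(pkg)
  }
  invisible(already_installed)
}

# 'stats' ships with R, so nothing is downloaded here
install_if_missing("stats")

# For this lab you would instead run:
# install_if_missing("meme")
```

With this helper, the meme package would only be downloaded when it is not already on your device.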

1.2

💻 When you install an R package, you will see some text appear in the RStudio Console window. Often, some of this text is red, which at first can be unnerving - you may think that an error has occurred. Don’t worry! While it is always a good idea to check these red text messages, they are not necessarily errors, and often can be safely ignored.

If you check the final part of the text output, which is in black, not red, you should see something like package ‘meme’ successfully unpacked and MD5 sums checked. This is reassuring - the package has installed correctly, and the red text above is just telling us how R went about installing it.

Note: If you see a message about installing Rtools, you can ignore it; we don’t need it for this lab.
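
To see why red text is not always bad news, it can help to compare the three kinds of feedback R gives: messages, warnings and errors. The sketch below uses only base R functions. In RStudio all three are typically shown in the same colour, but only an error actually stops the code.

```r
# A message: informational only, the code keeps running
message("Just letting you know: this is a message, not an error.")

# A warning: R flags something unusual, but still returns a result
converted <- as.numeric("abc")   # warns about NAs; the result is NA
is.na(converted)

# An error stops the code; try() lets us capture one safely
outcome <- try(log("abc"), silent = TRUE)
inherits(outcome, "try-error")
```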

1.3

💻 Once the meme package is downloaded and installed, we need to load it in our current RStudio session. Run the following code to load the meme package.

library(meme)
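
Installing and loading are separate steps: a package is installed once, but must be loaded in every new R session. As a rough sketch, the base R function search() shows what is currently loaded. The example uses the splines package, which ships with R, as a stand-in so that it runs without any downloads.

```r
# splines is installed with R, but is not necessarily loaded yet
library(splines)

# search() lists everything currently attached in this session
"package:splines" %in% search()
```

The same check with "package:meme" tells you whether the meme package is loaded in your session.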

2 Making memes in RStudio

💻 With the meme package installed and loaded in RStudio, we can now start to make some simple memes! We really only need two lines of code for this, as we will demonstrate with the following example.

2.1

💻 Firstly, we need to find an appropriate image. For this example, we will use an image of Hagrid, from the Harry Potter series. We have located this image online, and copied the url. In RStudio, we assign this url to the object hagrid, as shown below:

hagrid <- "https://i.imgflip.com/13wb2t.jpg"
# Note that the url needs to be contained within quotation marks

Make sure to run this code before moving on to the next step.

Note: For a refresher on objects, check the R Coding Fundamentals book in the Introduction to R content on LMS.

2.2

💻 Next, we use the meme function (which is only available because we installed and loaded the meme package) to add some words to this image. Try running the code below, and see what happens.

meme(hagrid, "Yer a wizard", "with coding", font = "sans")

Note: Some warnings may appear in the Console section of RStudio as this code is executing. Don’t worry about them; they are safe to ignore.

🎧 Online students 💬 After running the code, a meme should have appeared in RStudio. Take a snippet/screenshot of the meme and copy-paste it into the chat.

2.3 How to save RStudio images

💻 If you would like to save the meme you have made, there are several options, which we will now demonstrate via another example.

In the code below, we make a new meme, and assign it to the object success, using the assignment operator <-. Try running this code now.

success_kid <- "http://i0.kym-cdn.com/entries/icons/mobile/000/000/745/success.jpg"
success <- meme(success_kid, "Using R", "to make memes", font = "sans")
success

Hint: Notice that we need to include the final line of code, calling the object success, in order for the image to be shown.
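
The same behaviour applies to any R object, not just memes. A minimal base R sketch (the object name total is just an illustration):

```r
total <- 2 + 3     # assignment only: nothing is printed
total              # calling the object prints its value, 5

(total <- 2 + 3)   # wrapping the assignment in parentheses assigns and prints
```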

2.3.1 Option A: Base R function

💻 We can use the base R function savePlot to save an image created in RStudio. To do so, the image needs to have been produced in a separate graphics device.

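  • If you are using Windows, you can run the code windows() to open a separate graphics device in RStudio.
  • If you are using macOS, the equivalent function is quartz(). Note, however, that savePlot does not support quartz devices; on a Mac, save the image with quartz.save instead.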

Note: Separate graphics devices for each image created in RStudio can be helpful if you want to see several images at once. Technically you can also save the image via the graphics device’s menu bar.

Take a look at the code below, and then run it, to save your success kid meme to your current working directory.

windows() # or quartz(), if you are a Mac user
success # render the image in the graphics device
savePlot(filename = "successkid", type = "png") # Mac users: quartz.save("successkid.png", type = "png")
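
Because savePlot behaves differently across operating systems, a more portable approach is sketched below, assuming your R installation supports the png() device. It opens a file device, draws into it, and closes it with dev.off(). A plain base R plot is used as a stand-in for the meme so that the example runs without any packages; the file name and temporary folder are illustrative choices.

```r
# Save into a temporary folder so nothing in your working directory is overwritten
out_file <- file.path(tempdir(), "example_plot.png")

png(filename = out_file, width = 600, height = 400)  # open a PNG file device
plot(1:10, main = "Stand-in for the meme")           # draw into the file
dev.off()                                            # close the device to finish writing

file.exists(out_file)
```

To save the meme this way, you would replace the plot() line with print(success).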

2.3.2 Option B: A function from a specific package

💻 Sometimes packages will include a custom function for some operation, such as saving a file. These often offer more utility than the default alternative.

The meme package has a specific function, meme_save, for saving memes. Take a look at the code below.

meme_save(success, file = "c:/STM1001/Data Science/success_kid_R_meme.png")

Here, the function allows us to specify the save location of our file. We are saving our success meme to the example file location c:\STM1001\Data Science\, with the name success_kid_R_meme.png.

Note: Although the file path on our computer includes backslashes (\), in R code these need to be changed to forward slashes (/).
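
The reason is that R treats a single backslash inside a string as the start of an escape sequence. The sketch below, using base R only, shows three equivalent ways of writing the example path. The object names are illustrative.

```r
path_forward <- "c:/STM1001/Data Science/success_kid_R_meme.png"

# Alternatively, each backslash can be doubled inside an R string
path_escaped <- "c:\\STM1001\\Data Science\\success_kid_R_meme.png"

# Converting the escaped version gives the same path as the forward-slash version
converted_path <- gsub("\\", "/", path_escaped, fixed = TRUE)
identical(converted_path, path_forward)

# file.path() joins the pieces with forward slashes for us
built_path <- file.path("c:", "STM1001", "Data Science", "success_kid_R_meme.png")
identical(built_path, path_forward)
```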

2.3.3 Option C: Via the RStudio Plots window

💻 Instead of using R code, we can save our image manually by navigating to the Plots window in RStudio, clicking Export, and selecting either Save as Image... or Save as PDF...

Try this now, and save your success meme as a PDF.

2.4

💻 Now it’s time to make your own meme in RStudio. Follow the steps below:

  1. Find an appropriate image of your choice online (please ensure you pick content suitable for university and work).

  2. Copy the url.

  3. Assign this url to an object in RStudio.

  4. Use the meme function to add words to your image.

  5. Save your meme using either the savePlot function, the meme_save function or the manual approach.

❓ Hint: If you are not quite sure how to begin, adapt the template code below.

# First, we need to find an image, and assign it to an object 
# (here we use the generic object name 'image_name')
# Just replace the ...s with the url of your image
image_name <- "..."
# Next, we need to use the meme function, to add some words (just replace the ...s)
my_meme <- meme(image_name, "...", "...", font = "sans")
# Note that you need to include the `, font = "sans"` part to ensure R knows which font to use.
# Now all that's left is to save your meme - just refer to the code above.

🎧 Online students 💬 Once you have created your meme in RStudio, take a snippet/screenshot of it and copy-paste it into the chat.


Congratulations! You were probably not expecting to make a meme in your first data science computer lab. While this won’t be on the final exam, the R skills you are developing here are important, and hopefully you are starting to realise that R is very versatile.


🏡 Reconvene in main room to discuss results


3 Customizing GIFs in RStudio

💻 R is not limited to working with static images - we can modify and create GIFs and animations (and in future weeks we will make animated, interactive graphs using real data). In this section, we will use another fun package, the magick package (Ooms 2021), to customize a GIF.

3.1

💻 Run the following code to download, install and load the magick package in your current RStudio session.

install.packages("magick")
library(magick)

3.2

💻 Just as we obtained online images of Hagrid and Success Kid, we can also use urls of GIFs and animations. For this example, we have used the url of a GIF of a rotating Earth.

We can use the image_read function to read this GIF into RStudio. Run the code below to assign it to the object Earth.

Earth <- image_read("https://i.giphy.com/media/mf8UbIDew7e8g/giphy.gif")
Earth

Make sure to run this code before moving on to the next step (don’t worry if it takes a few seconds). The GIF should appear in the Viewer section of RStudio.
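
A GIF read by image_read is stored as a sequence of frames. The sketch below assumes the magick package is installed. It combines two of ImageMagick's built-in test images ("logo:" and "rose:") as an offline stand-in for the Earth GIF; the object names two_frames and n_frames are illustrative.

```r
if (requireNamespace("magick", quietly = TRUE)) {
  library(magick)

  # Combine two built-in images into one two-frame object
  two_frames <- c(image_read("logo:"), image_read("rose:"))

  # length() counts the frames
  n_frames <- length(two_frames)
  print(n_frames)

  # image_info() reports details such as the format, width and height of each frame
  print(image_info(two_frames))
}
```

Running length(Earth) and image_info(Earth) should report the same kind of details for the Earth GIF.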

3.3

💻 Using the magick package, we can easily make some changes to this Earth GIF.

Run the following code, and inspect the output.

rev(Earth) %>%
  image_flip() %>%
  image_annotate("        Meanwhile, in Australia", size = 40, color = "white")

You will notice here that:

  • We have reversed the GIF, using the rev function
  • We have flipped the GIF, using the image_flip function, and
  • We have added text to this GIF using the image_annotate function

🎧 Online students 💬 Once you have created your GIF, take a snippet/screenshot of it and copy-paste it into the chat.
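
The %>% symbol in the code above is a pipe: it passes the result on its left to the function on its right as the first argument, so the steps can be read from left to right. As a rough illustration that needs no packages, the sketch below uses R's native pipe |> (available from R 4.1 onwards), which behaves similarly in simple cases like this one.

```r
numbers <- c(3, 1, 2)

# Nested calls are read from the inside out
nested <- rev(sort(numbers))

# The same steps written with the native pipe are read from left to right
piped <- numbers |> sort() |> rev()

identical(nested, piped)
```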

This is really just scratching the surface of the magick package. For the moment though, let’s move on.

4 Palmer Penguins Data Set

💻 Now that we have had a taste of some of the more light-hearted R packages out there, let’s consider a package which contains some useful data.

The palmerpenguins R package (Horst, Hill, and Gorman 2020) contains data, collected over the course of several years, on three species of penguin living on different islands in the Palmer archipelago, off the coast of Antarctica. Over the course of the next few data science computer labs, we will create various interactive data visualisations using the penguins data from this package.

For more details on the penguins data set, and a taste of what’s ahead in future labs, you can refer to Section 2 of the Data Visualisation in R supplement.

For this lab, let’s use some R functions to inspect the penguins data set.

4.1

💻 Just like the previous packages, to begin we will need to install and load the palmerpenguins package.

Using what you have practiced earlier in this computer lab, install and load the palmerpenguins package in RStudio.

## 
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
## 
##     penguins, penguins_raw
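
The message above indicates that objects named penguins and penguins_raw also exist in the datasets package, and that the palmerpenguins versions take precedence once the package is loaded. If you ever want to be explicit about which version you are using, the :: operator names the package directly. The sketch below assumes the package has already been installed with install.packages("palmerpenguins"); the object name penguin_data is illustrative.

```r
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  library(palmerpenguins)

  # package::object refers to the object from that specific package
  penguin_data <- palmerpenguins::penguins
  print(class(penguin_data))
}
```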

4.2

💻 Recall from the first core Computer Lab that you can easily check the dimensions of your data using the dim, nrow and ncol functions. Use these now to assess the penguins data set.

Note: Check the code below if you would like a refresher.

# This code checks the dimensions of the penguins data set
dim(penguins)

🎧 Online students 💬 Copy your result from the RStudio console and paste it into the chat.
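
To see how dim, nrow and ncol relate to one another without giving away the penguins answer, here is a short sketch using iris, a data set that is built into R, as a stand-in.

```r
dim(iris)    # number of rows, then number of columns: 150 5
nrow(iris)   # number of rows only: 150
ncol(iris)   # number of columns only: 5
```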

4.3

💻 Use the summary function to obtain a quick overview of the penguins data set.

Don’t worry too much about the values shown in the summary table - the main things to note at this stage are the different variables.

4.4

💻 Often, when we begin working with a new data set, it is helpful to take a quick look at some of the recorded values. We can use the head function to look at the recorded values for the first 6 observations in a data set.

Try using the head function now, with the penguins data set. What do you observe?
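
The head function also accepts an optional n argument to change how many rows are shown, and tail does the same for the end of a data set. A brief sketch, again using the built-in iris data set as a stand-in for penguins:

```r
head(iris)          # first 6 rows by default
head(iris, n = 3)   # first 3 rows only
tail(iris, n = 3)   # last 3 rows
```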

🎧 Online students 💬 Copy your result from the RStudio console and paste it into the chat.

4.5

💻 When a data set has multiple columns of information, we can access the information in a specific column by writing the name of the object, adding a $ at the end, and then writing the name of the specific column we would like to inspect.

For example, we could use the following code to check the recorded bill length measurements of the penguins:

penguins$bill_length_mm

Run this code now.

❓ Note: When we have a large data set, output may appear over several lines in the Console. The numbers that appear in brackets to the left of each line of output are not observations. Rather, these denote the position number of the first observation on that line of output. For example, [17] denotes that the observation directly to its right is the 17th recorded observation in the data set, for the variable being considered.
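
A small sketch of this behaviour, using a made-up sequence of numbers (the object name values is illustrative). The same position numbers can be used inside square brackets to pull out a single value.

```r
values <- 101:140
values       # printed over several lines on most screens, each starting with a position number
values[17]   # the 17th value, which is 117
```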

4.6

💻 Try using this $ approach to check the recorded bill depths and body masses of the penguins.

Note: Notice that once you type the $, RStudio will helpfully prompt you with possible selections.
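
If you would like to see the available column names without relying on the RStudio prompt, the base R function names lists them. The sketch below uses the built-in iris data set as a stand-in for penguins, and also shows that double square brackets are an equivalent way of selecting a column.

```r
names(iris)   # lists the column names that can follow the $

sepal_lengths <- iris$Sepal.Length

# Both forms return the same column
identical(sepal_lengths, iris[["Sepal.Length"]])
```

Running names(penguins) gives the corresponding list for the penguins data set.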


🏡 Reconvene in main room to discuss results


That’s the end of the first data science computer lab!

Hopefully you have enjoyed this first computer lab, and now have a better idea of just how versatile R can be (particularly when using the helpful RStudio GUI). Don’t worry if some of the code seems difficult at the moment - this is only the first lab after all!

In the next data science computer lab we will continue working with the palmerpenguins data set, and cover how to create interactive plots using a new package.

Important Notes

  • If you have any questions about the content in this lab, or are stuck on some R code used in the lab, please ask your lab demonstrator for assistance.

  • Make sure to save your R script file somewhere safe - it may be a helpful reference source for later work.

  • Make sure that you finish off your readings of the Introduction to R content on LMS prior to the second data science computer lab.


References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Ooms, Jeroen. 2021. magick: advanced graphics and image-processing in R. https://docs.ropensci.org/magick/.
Yu, Guangchuang. 2021. meme: create memes in R. https://github.com/GuangchuangYu/meme/.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

