Data Science Stream
Topic 1B: Using RStudio
Welcome to the first Data Science computer lab for STM1001!
Throughout the semester, we will use the R software environment in our computer labs and assessments.
R is free, flexible, and used by millions of people for statistical computing and data visualisation.
Learning R can be challenging at first. To make our learning experience more enjoyable, we will be using RStudio rather than base R for all our R coding. RStudio is an integrated development environment (IDE) for R, and offers several helpful features and user-interface options missing from base R.
In this first Data Science computer lab we will take things slowly, and focus on practicing and reinforcing key R coding skills you began developing in the first core Computer Lab, via some light-hearted examples. A solid R foundation will ensure that in subsequent computer labs, you will be able to pick up and apply new R coding skills more easily.
By the end of this lab, you should feel comfortable using simple R commands, creating and naming new objects, installing, loading and using R packages, and saving images generated in RStudio.
🎧 Reminder: Online students
Throughout the computer lab question sheets, you will see emojis and/or collapsible sections like this one. Each emoji has a particular meaning and will sometimes be associated with additional instructions:
Prompts for you
💬 Write your answer in the chat.
Modes at different times during the lab
🏡 Main room. All together in the main room: your computer lab demonstrator will be presenting information or facilitating class discussion.
Breakout rooms. The person whose birthday is closest to a date picked at random by your computer lab demonstrator shares their screen or whiteboard. Here you will discuss a question together and bring your group's answer back to the main room.
💻 Focus mode. You will still be in the main room, but working independently. All students will share their screens during this time, so that your computer lab demonstrator (but not other students) can see your screen.
🏫 Reminder: Face-to-face (blended) students
Throughout the computer lab question sheets, you will see emojis and/or collapsible sections like this one. You can ignore the emojis and collapsible sections, as they contain information relevant to students who are studying online.
Checklist
🏡 Before we continue, make sure that you have done the following:
- Installed R and RStudio on your personal device (this will be helpful for assignment work, even if you intend to complete the computer labs on university computers)
- Completed the first core Computer Lab
- Looked over the different books in the Introduction to R content on LMS
- Confirmed that you are in the correct stream
If you have any questions about any of these items, please ask your computer lab demonstrator for assistance.
Installing new R packages
💻 R contains many in-built functions, and by itself is perfectly sufficient for a number of data analysis methods.
However, one of the great benefits of R is that anyone can create packages (bundles of code, data and functions) which can be uploaded to global repositories (such as CRAN), and made available for anyone around the world to download and use in their version of R.
These packages are often extremely helpful. They may contain useful data sets for a specific field of research, address a shortcoming with the base R suite of functions, allow users to perform specialised analyses, and/or offer users some additional functionalities missing from base R.
On the other hand, sometimes user-created packages are more light-hearted, or are pet projects not necessarily intended for serious data analysis. One such example is the meme package (Yu 2021), which allows users to create simple memes using R code.
Let's install and load the meme package in RStudio now.
💻 Because the meme package is not part of base R, we need to download it before we can use it.
Recall that we can use the install.packages() R function to do this. Run the code below now:
install.packages("meme")
# Note that the package name must be surrounded by quotation marks when using the install.packages function.
💻 When you install an R package, you will see some text appear in the RStudio Console window. Often, some of this text is red, which can be unnerving at first, since you may think that an error has occurred. Don't worry! While it is always a good idea to check these red text messages, they are not necessarily errors, and can often be safely ignored.
If you check the final part of the text output, which is in black rather than red, you should see something like: package ‘meme’ successfully unpacked and MD5 sums checked. This is reassuring: the package has installed correctly, and the red text above is just telling us how R went about installing it.
Note: If you see a message about installing Rtools, you can ignore it; we don't need it for the purposes of this lab.
💻 Once the meme package is downloaded and installed, we need to load it in our current RStudio session.
Run the following code to load the meme package.
library(meme)
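Running install.packages() every time a script is run downloads the package again. A common pattern, sketched below, is to install a package only when it is missing and then load it. The helper name load_or_install is our own invention for illustration; it is not part of R or of the meme package.

```r
# Hypothetical helper (not part of base R): install a package only if it is
# missing, then attach it with library().
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE, rather than stopping, if pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() that pkg holds the name as text
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers a download
load_or_install("stats")

# For this lab, the equivalent call would be: load_or_install("meme")
```

Installation is needed once per computer, whereas loading with library() is needed in every new RStudio session.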
Making memes in RStudio
💻 With the meme package installed and loaded in RStudio, we can now start to make some simple memes!
We really only need two lines of code for this, as we will demonstrate with the following example.
💻 Firstly, we need to find an appropriate image. For this example, we will use an image of Hagrid, from the Harry Potter series. We have located this image online, and copied the URL. In RStudio, we assign this URL to the object hagrid, as shown below:
hagrid <- "https://i.imgflip.com/13wb2t.jpg"
# Note that the url needs to be contained within quotation marks
Make sure to run this code before moving on to the next step.
Note: For a refresher on objects, check the R Coding Fundamentals book in the Introduction to R content on LMS.
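It can help to confirm what the hagrid object actually holds. The short sketch below uses only base R functions, and shows that the object stores the URL as text rather than the image itself; the image is only fetched later, when the meme function is called.

```r
hagrid <- "https://i.imgflip.com/13wb2t.jpg"

class(hagrid)                   # "character": the object holds text
nchar(hagrid)                   # number of characters in the stored URL
startsWith(hagrid, "https://")  # TRUE if the text begins with https://
```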
💻 Next, we use the meme function (which is only available because we installed and loaded the meme package) to add some words to this image. Try running the code below, and see what happens.
meme(hagrid, "Yer a wizard", "with coding", font = "sans")
Note: Some warnings may appear in the Console section of RStudio as this code is executing. Don't worry about them; it is safe to ignore these warnings.
🎧 Online students
💬 After running the code, a meme should have appeared in RStudio. Take a snippet/screenshot of the meme and copy-paste it into the chat.
How to save RStudio images
💻 If you would like to save the meme you have made, there are several options, which we will demonstrate now via another example.
In the code below, we make a new meme and assign it to the object success, using the assignment operator <-. Try running this code now.
success_kid <- "http://i0.kym-cdn.com/entries/icons/mobile/000/000/745/success.jpg"
success <- meme(success_kid, "Using R", "to make memes", font = "sans")
success
Hint: Notice that we need to include the final line of code, calling the object success, in order for the image to be shown.
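The same behaviour applies to every R object, not just memes: an assignment stores a value silently, and the value is only displayed when the object's name is run on its own. A minimal base R sketch, using made-up object names x and y:

```r
x <- 2 + 3    # assignment only: nothing is printed
x             # running the name on its own prints the stored value, 5

# Wrapping an assignment in parentheses assigns and prints in one step
(y <- x * 2)  # prints 10
```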
Option A: Base R function
💻 We can use the base R function savePlot to save an image created in RStudio.
To do so, the image needs to have been produced in a separate graphics device.
- If you are using a Windows OS, you can run the code windows() to open a separate graphics device in RStudio.
- If you are using a Mac OS, the equivalent function is quartz().
Note: Separate graphics devices for each image created in RStudio can be helpful if you want to see several images at once. You can also save the image via the graphics device's menu bar.
Take a look at the code below, and then run it, to save your success kid meme to your current working directory.
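windows() # opens a separate graphics device on Windows
success # render the image in the graphics device
savePlot(filename = "successkid", type = "png")
# Mac users: open the device with quartz() instead. savePlot() may not accept a
# quartz device; if it fails, try quartz.save("successkid.png", type = "png")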
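Because savePlot depends on the type of graphics device that is open, a more portable base R route is to draw directly into a file-based device. The sketch below is an alternative to the approach above, not a replacement for it: it uses a simple plot and a temporary file (both chosen purely for illustration) so that it runs on any operating system. To save a meme this way, we would expect the plot() line to be swapped for print(success), although we have not shown that here.

```r
# Write a plot straight to a PNG file, without opening a separate window
out_file <- file.path(tempdir(), "example_plot.png")  # temporary location, for illustration

png(filename = out_file, width = 600, height = 400)   # open a file-based graphics device
plot(1:10, main = "Example plot")                     # draw into that device
dev.off()                                             # close the device so the file is written

file.exists(out_file)  # TRUE once the file has been written
```

The call to dev.off() matters: until the device is closed, the file may be empty or incomplete.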
Option B: A function from a specific package
💻 Sometimes packages will include a custom function for some operation, such as the saving of a file. These often offer additional features compared with the default alternative.
The meme package has a specific function, meme_save, for saving memes. Take a look at the code below.
meme_save(success, file="c:/STM1001/Data Science/success_kid_R_meme.png")
Here, the function allows us to specify the save location of our file. We are saving our success meme to the example file location c:\STM1001\Data Science\, with the name success_kid_R_meme.png.
Note: Although the file path on our computer includes backslashes (\), in R code these need to be changed to forward slashes (/), because R treats a single backslash inside quotation marks as the start of an escape sequence.
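One way to avoid typing slashes by hand is the base R function file.path, which joins the parts of a path with forward slashes. The sketch below rebuilds the example location used above; the object names save_path and same_place are our own, for illustration.

```r
# Build the path from its parts; file.path() joins them with forward slashes
save_path <- file.path("c:", "STM1001", "Data Science", "success_kid_R_meme.png")
save_path
# "c:/STM1001/Data Science/success_kid_R_meme.png"

# If backslashes are kept, each one must be doubled inside an R string,
# because a single backslash starts an escape sequence such as \n (new line)
same_place <- "c:\\STM1001\\Data Science\\success_kid_R_meme.png"
gsub("\\", "/", same_place, fixed = TRUE) == save_path  # TRUE
```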
Option C: Via the RStudio Plots window
💻 Instead of using R code, we can save our image manually, by navigating to the Plots window in RStudio, clicking Export, and selecting either Save as Image... or Save as PDF..., as shown below:
Try this now, and save your success meme as a PDF.
💻 Now it's time to make your own meme in RStudio. Follow the steps below:
1. Find an appropriate image of your choice online (please ensure you pick content suitable for university and work).
2. Copy the URL.
3. Assign this URL to an object in RStudio.
4. Use the meme function to add words to your image.
5. Save your meme using either the savePlot function, the meme_save function or the manual approach.
Hint: If you are not quite sure how to begin, the code below gives a starting template.
# First, we need to find an image, and assign it to an object
# (here we use the generic object name 'image_name')
# Just replace the ...s with the url of your image
image_name <- "..."
# Next, we need to use the meme function, to add some words (just replace the ...s)
my_meme <- meme(image_name, "...", "...", font = "sans")
# Note that you need to include the `, font = "sans"` part to ensure R knows which font to use.
# Now all that's left is to save your meme - just refer to the code above.
🎧 Online students
💬 Once you have created your meme in RStudio, take a snippet/screenshot of it and copy-paste it into the chat.
Congratulations! You were probably not expecting to make a meme in your first data science computer lab. While this won't be on the final exam, the R skills you are developing here are important, and hopefully you are starting to realise that R is very versatile.
🏡 Reconvene in main room to discuss results
Customizing GIFs in RStudio
💻 R is not limited to working with static images - we can modify and create GIFs and animations (and in future weeks we will make animated, interactive graphs using real data).
In this section, we will use another fun package, the magick package (Ooms 2021), to customize a GIF.
💻 Run the following code to download, install and load the magick package in your current RStudio session.
install.packages("magick")
library(magick)
💻 Just as we obtained online images of hagrid and success kid, so too can we use URLs that point to GIFs and animations.
For this example, we have used the URL of a GIF of a rotating Earth.
We can use the image_read function to read this GIF into RStudio. Run the code below to assign it to the object Earth.
Earth <- image_read("https://i.giphy.com/media/mf8UbIDew7e8g/giphy.gif")
Earth
Make sure to run this code before moving on to the next step (don't worry if it takes a few seconds). The GIF should appear in the Viewer section of RStudio.
💻 Using the magick package, we can easily make some changes to this Earth GIF.
Run the following code, and inspect the output.
rev(Earth) %>%
  image_flip() %>%
  image_annotate(" Meanwhile, in Australia", size = 40, color = "white")
You will notice here that:
- We have reversed the GIF, using the rev function,
- We have flipped the GIF, using the image_flip function, and
- We have added text to this GIF, using the image_annotate function.
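The %>% symbol in the code above is a pipe: it passes the result on its left to the function on its right, so a chain of steps can be read from left to right. As far as we are aware, the magick package makes %>% available by re-exporting it from the magrittr package. The sketch below illustrates the same idea with plain numbers and the native pipe |> (available in R 4.1 or later), so that it runs without any extra packages; the object name values is our own.

```r
# Two equivalent ways of taking square roots and then adding them up
values <- c(1, 4, 9)

sum(sqrt(values))            # nested call: read from the inside out
values |> sqrt() |> sum()    # piped call: read from left to right

# In the same way, the GIF code could be written as one nested call (not run here):
# image_annotate(image_flip(rev(Earth)), " Meanwhile, in Australia",
#                size = 40, color = "white")
```

Both numeric versions return 6; the piped form simply lists the steps in the order they happen.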
🎧 Online students
💬 Once you have created your GIF, take a snippet/screenshot of it and copy-paste it into the chat.
This is really just scratching the surface of the magick package. For the moment though, let's move on.
Palmer Penguins Data Set
💻 Now that we have had a taste of some of the more light-hearted R packages out there, let's consider a package which contains some useful data.
The palmerpenguins R package (Horst, Hill, and Gorman 2020) contains data, collected over the course of several years, on three species of penguin living on different islands in the Palmer archipelago, off the coast of Antarctica. Over the course of the next few data science computer labs, we will create various interactive data visualisations using the penguins data from this package.
For more details on the penguins data set, and a taste of what's ahead in future labs, you can refer to Section 2 of the Data Visualisation in R supplement.
For this lab, let's use some R functions to inspect the penguins data set.
💻 Just like the previous packages, to begin we will need to download and load the palmerpenguins package.
Using what you have practiced earlier in this computer lab, install and load the palmerpenguins package in RStudio.
##
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
##
## penguins, penguins_raw
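The message above is not an error. It appears when a newly loaded package contains objects with the same names as objects in a package that is already attached, and it means that the newly loaded versions will be found first. To our knowledge, recent versions of R (4.5.0 and later) include their own penguins data in the built-in datasets package, with somewhat different column names, which would explain the message; on older versions of R you may not see it at all. The sketch below uses only built-in objects to show how the package::name notation picks an object from one specific package.

```r
# search() lists attached packages in the order R looks through them;
# a package attached later sits nearer the front and masks same-named objects
search()

# package::name selects an object from one specific package, whatever is masked
head(datasets::mtcars$mpg, 3)

# With palmerpenguins installed, the same notation would be (not run here):
# palmerpenguins::penguins
```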
💻 Recall from the first core Computer Lab that you can easily check the dimensions of your data using the dim, nrow and ncol functions. Use these now to assess the penguins data set.
Note: The code below provides a refresher if you would like one.
# This code checks the dimensions of the penguins data set
dim(penguins)
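The three functions are closely related, as the sketch below shows. So as not to give away the penguins results, it uses mtcars, a data set that ships with R, as a stand-in.

```r
# mtcars is built into R; it stands in for penguins here
dim(mtcars)    # rows, then columns: 32 11
nrow(mtcars)   # number of rows (observations): 32
ncol(mtcars)   # number of columns (variables): 11
```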
🎧 Online students
💬 Copy your result from the RStudio console and paste it into the chat.
💻 Use the summary function to obtain a quick overview of the penguins data set.
Don't worry too much about the values shown in the summary table - the main things to note at this stage are the different variables.
💻 Often, when we begin working with a new data set, it is helpful to take a quick look at some of the recorded values. We can use the head function to look at the recorded values for the first 6 observations in a data set.
Try using the head function now, with the penguins data set. What do you observe?
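The number of rows shown by head can be changed with its n argument, and the companion function tail shows the end of a data set instead. A short sketch, again using the built-in mtcars data as a stand-in for penguins:

```r
head(mtcars)         # first 6 rows (the default)
head(mtcars, n = 3)  # first 3 rows only
tail(mtcars, n = 3)  # last 3 rows
```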
🎧 Online students
💬 Copy your result from the RStudio console and paste it into the chat.
💻 When a data set has multiple columns of information, we can access the information in a specific column by writing the name of the object, adding a $ at the end, and then writing the name of the specific column we would like to inspect.
For example, we could use the following code to check the recorded bill length measurements of the penguins:
penguins$bill_length_mm
Run this code now.
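The $ approach works on any data frame. The sketch below uses a small made-up data frame (the name pets and its values are invented for illustration) to show that $ returns a column as a plain vector, which can then be passed to other functions.

```r
# A small made-up data frame, for illustration only
pets <- data.frame(
  name   = c("Rex", "Tom", "Polly"),
  weight = c(30.5, 4.2, 0.4)
)

pets$weight        # the weight column as a plain vector
pets[["weight"]]   # the same column, selected by name in double brackets
mean(pets$weight)  # a column extracted with $ can be passed to a function
```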
Note
When we have a large data set, output may appear over several lines in the Console. The numbers that appear in brackets to the left of each line of output are not observations. Rather, they denote the position of the first observation on that line of output. For example, a [17] would indicate that the observation directly to the right of the [17] is the 17th recorded observation in the data set, for the variable being considered.
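A quick way to see these position labels in action is to print a vector that is too long to fit on one line. In the sketch below, x is a made-up object holding the whole numbers from 101 to 130; where each line of output breaks depends on the width of your Console.

```r
x <- 101:130
x      # printed over one or more lines, each starting with a bracketed position
x[17]  # square brackets also let us pick out the 17th value, which is 117
```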
💻 Try using this $ approach to check the recorded bill depths and body masses of the penguins.
Note: Once you type the $, RStudio will helpfully prompt you with possible selections.
🏡 Reconvene in main room to discuss results
That's the end of the first data science computer lab!
Hopefully you have enjoyed this first computer lab, and now have a better idea of just how versatile R can be (particularly when using the helpful RStudio GUI). Don't worry if some of the code seems difficult at the moment - this is only the first lab after all!
In the next data science computer lab we will continue working with the palmerpenguins data set, and cover how to create interactive plots using a new package.
Important Notes
If you have any questions about the content in this lab, or are stuck on some R code used in the lab, please ask your lab demonstrator for assistance.
Make sure to save your R script file somewhere safe - it may be a helpful reference source for later work.
Make sure that you finish off your readings of the Introduction to R content on LMS prior to the second data science computer lab.
References
Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Ooms, Jeroen. 2021. Magick: Advanced Graphics and Image-Processing in R. https://docs.ropensci.org/magick/.
Yu, Guangchuang. 2021. Meme: Create Memes in R. https://github.com/GuangchuangYu/meme/.
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.
