Overview

The main goal of this lab is to help you install R and RStudio, which we will be using throughout the course both to learn the statistical concepts discussed in the course and to analyze real data and come to informed conclusions. R is the name of the programming language itself and RStudio is a convenient interface.

As the labs progress, you are encouraged to explore beyond what the labs dictate; a willingness to experiment will make you a much better programmer (scary, I know). Before we get to that stage, however, you need to install the program.

Getting Started

First things first: getting R installed. This will be more frustrating and finicky than any of us would like, but it is doable, and once done it need not be repeated (until you get a new computer). It is also worth learning how to navigate install procedures, especially for a program that has a lot of use in research and industry. In addition to the instructions here, there are two videos posted on Canvas, one for Mac users and one for PC users, that will take you through the installation.

R

Regardless of the operating system, the installation of R and RStudio starts in the same place: the web. We can find R at: https://www.r-project.org/

The link for the download appears in the second paragraph, but it unfortunately takes us to a rather daunting-looking page called the CRAN Mirrors (CRAN stands for Comprehensive R Archive Network).

Essentially, these are institutions around the world that are providing the archiving for the R program and packages. All we need to do is find the one closest to us by scrolling down the list to where it says “USA” and click on the link for one of the two locations in Pittsburgh (we’ll assume they are equally close).

Once we have clicked on the link we will be sent to yet another webpage that provides us with Download links for R. This is where the journeys will diverge for Windows and Mac users (Linux users too, but those instructions are available by request only).

Windows

If Windows is our chosen operating system, we will have clicked on the link for “Download R for Windows” and been taken to yet another webpage. We want the ‘base’ subdirectory. We can click on either the word ‘base’ or where it says ‘install R for the first time’. This will take us to yet another webpage (sensing why this can get frustrating yet?).

This really is the page that will allow us to download the .exe file to install R, which happens when we click on the link “Download R 4.1.1 for Windows”. The install file should then appear in the bottom left hand of our browser window (without the ‘(2)’). Click on the arrow next to the file name and choose “Open”.

Once we do this, we will be asked if we want to allow this app to make changes to our system - the answer is “Yes”. R will then ask us what language we want to use during the installation. The default is English, but there are over 20 languages to choose from, so we have options if English is not in fact our preferred language. Click ‘OK’ once the correct language is chosen. This will then take us to an Information page that we can read, and then click ‘Next’ when we are ready to move on to the next step of the installation.

Once we do that we will be led through a series of windows asking us about:
1. The folder where we would like R installed
2. The components we would like installed
3. Whether we want to customize start up options
4. Where the setup should place the program’s short cuts
5. What additional tasks we would like done

Click ‘Next’ to move through each window and accept the defaults. We can do differently, but why make our lives more difficult? Once we get to the 5th window R will begin installing and we will get a window telling us we are just about finished. Indeed, our only option to click now is a button labeled ‘Finish’.

Congratulations! R should now be installed on our computers. If we open the program at this stage we will get the R GUI, which can be a bit intimidating. We will actually be accessing R through a different interface called RStudio, and we will go through the instructions for that after we have also told our Mac users how to get R installed.

Mac

If Mac is our chosen operating system, then when we clicked on the link for “Download R for macOS” we will have been taken to yet another webpage. We will want to make sure we download the version of R that is appropriate for our operating system. So if we are using macOS 10.13 (High Sierra) or higher, we will want to click on the link for R-4.1.1.pkg. Otherwise, we will need to scroll down and find the version of the package that will work with our operating system (it goes back to macOS 10.6 (Snow Leopard)).

Once we have done that, our computer will start downloading the .pkg package, and eventually a little brown box will appear in the bottom right hand corner next to the trash bin. If we can’t see it there, we will need to go to the Finder and open Downloads.

When we hover our mouse over the box we will be given the option to open it in Finder, which we want to do. That will take us to Downloads, where we should be able to see a package file called R-4.1.1.pkg (although it will have a different name if we had to download a different version of R to work with our operating system).

Double clicking on the .pkg file will begin the R installation process with a question as to whether or not we want to allow downloads - the answer is yes. The installer will then begin and we will be presented with a window letting us know that the installation process is about to begin.

Once we have clicked on “Continue” we will be presented with a series of windows providing us with:

1. Information on R
2. The license agreement
   a. A pop up window asking us to agree to the terms of the license agreement - click “Agree”
3. The destination folder for R
4. Standard Installation

Click ‘Continue’ to move through each window and accept the defaults. We can do differently, but why make our lives more difficult? Once we reach the 4th window the “Continue” button will be replaced with “Install”. At the start of the installation we may be asked for our permission and password to begin the process - please provide yours. Once we do that, then congratulations! The install was successful. We can go ahead and move the installer to the trash bin when our computer prompts us to do so.

To find R on our Mac we will need to look in the Applications folder, where it should be in the alphabetical list of available applications under R.

If we open the program at this stage we will get the R GUI, which can be a bit intimidating. We will actually be accessing R through a different interface called RStudio, and we will go through the instructions for that next.

RStudio

RStudio is what is known as an Integrated Development Environment (IDE), a type of program used to consolidate the tools required to write and test code. It is an interface for R that is ultimately intended to make our lives easier when it comes to statistical programming. It can take some time for us to get to that point, but it is worth it in the end, even if the path to it can be a bit rough and painful at times. It is available in two formats - a regular desktop application or on a remote server. We are going to use the desktop version, RStudio Desktop. We can get started with the download of RStudio by going to its webpage: https://www.rstudio.com/. Tucked away in the upper right hand corner is a link that says “DOWNLOAD”. We want to click on that.

This brings us to another page where we can download the RStudio IDE and where it gives us the option to choose our version. If we scroll down we are given four options: RStudio Desktop, RStudio Desktop Pro, RStudio Server, and RStudio Workbench. We want RStudio Desktop.

This takes us to the steps needed to download RStudio, the first of which is to have downloaded R. Fortunately, we have just completed doing that. Assuming R is downloaded, the site very helpfully recommends which installer to use for our machine.

Windows

Once again, in the bottom left hand corner of the browser window we will find the “.exe” file. When we open it we are asked if we want to allow the program to make changes, to which our answer is a resounding “Yes!”. Once we do that we will be led through two windows asking us about:
1. The install location for RStudio
2. The start menu folder

Click ‘Next’ to move through the first window and accept the defaults. Again, we are pretty happy with the defaults. After the second window RStudio will begin installing, and we will get a window telling us that we will be finished as soon as we press the button labeled “Finish”.

RStudio has been successfully installed. Just a few more things until we can begin to play with our new program.

Mac

When we click on the big blue button a .dmg file will be downloaded, and should appear in the bottom right hand side of our screen next to the trash bin. Depending on how our computer is configured, we may need to give permission for the download from rstudio.com - go ahead and allow it. Once the file is downloaded, we will open it in Finder, where we should see a file called RStudio-1.4.1717.dmg in Downloads.

Double clicking on the file will open yet another window, this one specific to RStudio, where we will see an icon for the Applications folder and one for RStudio. We can then drag and drop the RStudio icon into the Applications folder and, if we like, straight onto our Dock.

If we now go to open RStudio, Mac will take some time verifying it and then ask us if we are sure we want to open it, since we downloaded it from the internet. Yes, we are sure that it is safe and that we will have to use it for the course (want to use it is a whole other thing), so we will click on “Open”.

RStudio has been successfully installed. We are almost ready to play with our new programs.

RStudio Workspace

Once we have successfully installed RStudio, it will be available for us to use anytime we like. When opened on its own, we will see three windows - the R console, the environment and the files and plots.

What we will want to do in this case is also open a file in which we can edit code. This can be done by clicking on “File” -> “New File” -> “R Script”. There are other options for the type of code we will be editing, but RStudio allows us to toggle between them once the window is open, so for now we won’t worry about which one we create at the start. Once we have opened a new file, RStudio will show us four windows.
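As a quick check that the installation worked, lines like the following could be typed into the new script (or directly into the console). This is only a sketch and nothing in it is specific to the course; the variable name `total` is made up for illustration.

```r
# Print the version of R that RStudio is using
print(R.version.string)

# Do a small calculation and store the result in a variable
# (`total` is just an illustrative name)
total <- 1 + 1
print(total)
```

To run a line from the script, place the cursor on it and press Ctrl+Enter (Cmd+Return on a Mac); the result will show up in the console.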

Now we are free to start writing our code. However, that is a very daunting process when we have never seen R code before, so we will not be starting with a blank page. Instead, a starter file for every lab will be provided, with the expectations of what we can achieve on our own growing throughout the semester. We will talk more about what this looks like in the RMarkdown section of the document.

Packages

In addition to all the functionality built into the base R program, there are additional tools we will want to access that need to be installed separately. These are located in what are known as packages. There are currently more packages out there than we probably want to think about, all written by people who want to help others do statistics, machine learning, data science and visualizations. There are a couple of key ones that we are going to use for STAT 141-C, including RMarkdown and the tidyverse, so we will go through how to install them now.

We will need RStudio open in order to install a package. We are then going to look at the window in the lower right-hand corner, which is where we will find information on plots and files, and click on the tab labeled ‘Packages’.

When we do this, we will then be brought to a User Library with an alphabetical list of all the R packages that are already installed. Since we are interested in a new package, we will need to click on the install button.

Once clicked, another window will open up asking us where we want to install from (use the default - ‘Repository (CRAN)’), the packages we want to install, and where we want them to be installed (use the default - it will be a different path name for all of us). If we start writing the name of the package (RMarkdown in this case) into the blank space provided for us, R will start to bring up an alphabetical list of the packages that match what we have typed so far. Once we see the name of the package, or we have finished typing in the name, we are just about ready to go. The only thing to make sure of before we click ‘Install’ is that we have also marked the box “install dependencies”. Dependencies are the other packages that a package requires in order to run correctly. By automatically installing the packages that the package we want depends on, we save ourselves some effort and avoid error messages telling us that packages are missing. If that box is ticked, click where it says ‘Install’.
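The same installation can also be done by typing into the console rather than clicking through the dialog. The sketch below assumes the CRAN names of the two packages, which are all lowercase (`rmarkdown` and `tidyverse`); the helper `missing_packages` is a made-up name for illustration and is not part of R.

```r
# Packages used in this course (names as they appear on CRAN)
wanted <- c("rmarkdown", "tidyverse")

# Hypothetical helper: return the names in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)

# Install only what is missing, and only when working interactively
# (for example in the RStudio console). By default install.packages()
# also installs the packages that the requested ones need in order to run.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If `to_install` comes back empty, everything on the list is already in the User Library and nothing further happens.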

One of the nice things about using RStudio is that it can recognize when we reference a package in our code that is not yet installed. When this happens a little yellow banner will appear over the code editor warning us that some packages are needed but not yet installed. Not only that, but it also gives us the option to install the packages right there. Clicking ‘install’ here is just as effective a method of installing packages as the process we just went through above, so we should feel free to make our lives easier by using this function in RStudio.

Once a package is available to us in the User Library, there is one more step we need to take to access all its functions. When we want to work with a package in R we have to load it into our workspace. Loading the packages we need will be one of the first things we do anytime we start working in R, and it can be done using the R function library. For now, we don’t need to worry about loading RMarkdown, since it will automatically be available to us in RStudio when we open the RMarkdown files (extension .rmd) that are provided as part of all the labs. So to see how loading an R package is done, we will go ahead and repeat the package install steps, but now for the package tidyverse.

Once we have the tidyverse installed, we can run the following code in the R Console (the window in the bottom left hand corner of RStudio):

library(tidyverse)

For most packages, once loaded nothing will appear to happen. We will press ‘Enter’ and the cursor will just move to the next line in the console. That is good! It means loading the package worked. Sometimes we will get blue text telling us the version a package was built in or whether a function in the package we have loaded has the same name as another package in our workspace (called masking). We don’t need to worry about these - our package has still loaded and we are good to go. When we load the tidyverse, we get a slightly more detailed response from R telling us exactly what it has done:

This is also good! The only time we should really worry at this point is when we get an error message in red text (it is “when”, not “if”. No matter how long we use R, we will always make mistakes that bring up error messages).

If we look at the image above we see that we have an error message that is telling us R can’t find the package we are interested in. This will happen for one of two reasons - the package hasn’t yet been installed, or we have misspelled the name. In this case, the problem is misspelling. There aren’t two d’s in tidyverse! Misspelling and commas in the wrong place are probably the most common mistakes in R code, no matter how good we get at it.
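To see this kind of error without it interrupting anything, the misspelled call can be wrapped in `tryCatch`, which captures the error message as text. This is a minimal sketch; `tiddyverse` is misspelled on purpose and is assumed not to exist as a package on our computer.

```r
# Try to load a deliberately misspelled package and capture the error
# message instead of stopping
msg <- tryCatch(
  {
    library(tiddyverse)  # misspelled on purpose: two d's
    "loaded without error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

The captured message names the package R could not find, which is usually the quickest clue that the problem is a typo rather than a missing installation.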

Some R error messages are helpful, some are helpful once we are more familiar with the program and some just never make any sense. Google can be a good place to copy and paste error messages we don’t understand, since the answer is probably out there. However, understanding the solution can sometimes be as frustrating as understanding the problem. Remember to ask questions of the instructor and fellow classmates as we go - the idea is to have an exercise in learning, not an exercise in frustration.

R Markdown

RMarkdown is a package which can be used to provide us with quick, reproducible reports on the work we have done in R. It allows fully executable R code to be embedded directly in documents, including outputs and figures, which saves a lot on copying and pasting, and creates a full record of what we have done in the analyses to be able to produce the report. In addition, the report can be output in multiple formats, including HTML, Word and PDF documents.

RMarkdown has been used to create all of the lab instructions, and the assignments we will submit will also need to be in RMarkdown. However, as mentioned in the previous section, starter files will be provided for all the labs so that we don’t need to learn everything all at once. The files will come with the extension ‘.rmd’ and they will automatically open in RStudio, with the code editor set to RMarkdown for us.

For this lab, what we need to do is modify some of the text in the lab starter file and then do what is called “knitting the document”. This is when we ask RMarkdown to knit together what we have written with our code to create a document we can share with others. When an RMarkdown file is open, the image of a ball of yarn with a knitting needle in it will appear slightly left of center. Clicking on this will knit the document to the default output, and there is also the option to choose the type of output we would like. In the case of our labs, the default output will always be a Word document.
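For the curious, knitting can also be requested from the console with the `render` function from the rmarkdown package, which does much the same job as the button. The sketch below is hedged: it assumes the starter file is called `Lab0 template.Rmd` (the name used at the end of this lab) and that it sits in the folder R is currently working in; the variable names are made up for illustration.

```r
# File name taken from the lab instructions; adjust if yours differs
lab_file <- "Lab0 template.Rmd"

# Only attempt to knit if the rmarkdown package is installed and the
# file can actually be found
can_knit <- requireNamespace("rmarkdown", quietly = TRUE) && file.exists(lab_file)

if (can_knit) {
  # Produce a Word document, the default output for the labs
  rmarkdown::render(lab_file, output_format = "word_document")
} else {
  message("Either rmarkdown is not installed or '", lab_file, "' was not found.")
}
```

For the labs the knit button is all we need; the code version is only here to show what is happening behind it.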

One of the things to note is that when you open a .rmd file, RStudio will automatically detect, and set, what is known as the ‘working directory’. A working directory is the file path used by R and RStudio to go looking for any files we want to upload into R (such as the data we collect), and where it will automatically save the files we create when we knit a document in RStudio. When we open a .rmd file, the working directory will always be set to the folder in which the file is saved.

A lot of the error messages we will get are going to be about not being able to open files in R because the program thinks there is no such file or directory. What this means is that R has gone looking along the file path and was unable to locate what we asked for in the defined working directory.
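When one of these messages appears, a few lines in the console can show where R is looking and what it can see there. This is a minimal sketch; `my_data.csv` is a made-up file name used only for illustration.

```r
# Show the folder R is currently treating as the working directory
print(getwd())

# List the files R can see in that folder
print(list.files())

# Check for a specific file before trying to open it
# ("my_data.csv" is a made-up name for illustration)
data_file <- "my_data.csv"
found <- file.exists(data_file)

if (found) {
  message("Found ", data_file)
} else {
  message(data_file, " is not in ", getwd())
}
```

If the file we want is missing from the list, it is most likely saved in a different folder from the .rmd file.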

We won’t worry about how to fix or re-define the working directory for now. The important thing is that we know where to go looking for the files we have created using RMarkdown.

Additional steps for Mac

Since Word documents (.docx) are Microsoft Office files, there are some extra steps we will need to take in order to create and view the RMarkdown document. One is to allow RStudio to have access to, and control of, Microsoft Word. Click “OK” when presented with this prompt.

In addition, when we go to knit a document, our computer may ask us to give RStudio access to the folder where we saved the .rmd file. We will need to click “OK” in order to continue creating our document. If we haven’t accessed Microsoft Office on our computer before, we may also need to log in to Office, either through our personal account or through Ursinus, in order to be able to view the document we have created.

If accessing Microsoft Office isn’t possible on our computer, RMarkdown can also be used to create pdfs. However, this requires the installation of yet another piece of software and some more packages, so we are going to try and skip it for the moment. If it is necessary, though, the instructor can be contacted to help get that set up.

Let’s Go!

If everything has gone smoothly (unlikely, but we remain ever hopeful), then we should now have installed R, RStudio and RMarkdown and are ready to give the first lab assignment a try. Go ahead and do the following:

1. Open the file Lab0 template.Rmd that is available on Canvas.
2. Install any new packages that will be needed to complete the lab.
3. Follow instructions in the template to create the document for the first lab submission.