Lab 2 - Using RStudio; Data Types & Structures

Author

Joseph Quinn, PhD

Introduction

Last week you did the following:

  • Installed R and RStudio
  • Downloaded a script and opened it in RStudio
  • Learned how to write your own documentation with #
  • Used the function runif() to generate random numbers from a uniform distribution
  • Learned how to query R’s Help documentation for a function by preceding it with ?
  • Stored the output from your calls to the function runif() as objects
    • For example, random.2 <- runif(n=10, min=1, max=10) stored 10 random uniform values between 1 and 10 in the object random.2. When you typed random.2 into the console multiple times, those same 10 numbers kept showing up.
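As a quick refresher, here is a minimal sketch of that last point. It assumes nothing beyond base R. The stored object keeps its values, while a fresh call to runif() draws new ones.

```r
# Store 10 random uniform values between 1 and 10 in an object
random.2 <- runif(n = 10, min = 1, max = 10)

# Printing the object twice shows the same 10 numbers both times
random.2
random.2

# Calling runif() again, without storing the result, draws new numbers
runif(n = 10, min = 1, max = 10)
```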

You are already figuring out the basics of a new language.

Today we will build on those basics with a more formal introduction to programming in R. This lab will proceed in four parts.

  1. Setting your working directory
  2. Using the interface in RStudio
  3. Working with R’s four different data types (i.e., character, logical, numeric, and factor)
  4. Storing data as “objects,” AKA “data structures” (e.g., vectors and data frames)
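As a preview of Parts 3 and 4, the sketch below uses class() to show how R labels each data type and structure. The object names and values are made up for illustration and do not come from lab2data.

```r
# The four data types named above (all values are made-up examples)
a_character <- "hello"
a_logical   <- TRUE
a_numeric   <- 3.5
a_factor    <- factor(c("low", "high", "low"))

class(a_character)   # "character"
class(a_logical)     # "logical"
class(a_numeric)     # "numeric"
class(a_factor)      # "factor"

# Two data structures: a vector and a data frame
a_vector <- c(3.5, 4.1, 2.8)
a_data_frame <- data.frame(id = c(1, 2, 3),
                           score = a_vector)

class(a_vector)      # "numeric" (a vector takes the class of its contents)
class(a_data_frame)  # "data.frame"
```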

We will do parts 1 and 2 (as well as “Part 0”) together. Parts 3 and 4 you will complete with a partner. Please answer all questions completely as instructed.

0. Find Your SOCY 392/data Folder and Save the Lab 2 Data

Before we begin to work in RStudio, locate the data folder you made last class. It should be nested within the SOCY 392 folder you also made during Lab 1. Download the file called lab2data.rda from BlackBoard and save that file in the data folder. You will explore this data set later on during the lab.

The file you just downloaded and saved to SOCY 392/data includes data about the songs of musical artists people mentioned in the extra credit “Bonus Homework Assignment” last fall. I’m waiting on a few more students to submit their bonus assignment this spring before I make this file for our class. If you submit your response to this homework assignment by this Friday at 11:59pm, your favorite musicians will be added to our class’s data set.

1. Setting your Working Directory

The first thing R always needs to know is where to find any files you want to read in, and where to save any files you want to export after cleaning up your data or doing an analysis. There are two ways to tell R this. The first way is the point-and-click approach you learned in Lab 1: click "Session > Set Working Directory > Choose Directory…", then navigate to and select the "data" folder within "SOCY 392".

The second way is to write some code telling R where the working directory is. This method is more reproducible, because if you save this line of code in your script, R will automatically set your working directory when you open that script again and run it.

Try it out:

  • Type getwd() into the console (the panel on the left side of RStudio when you open it) and hit enter. The console will now show you where your default working directory is on your computer. Here’s what the result looks like on my computer, in my RStudio console.

    The result (which will be slightly different on your computer) shows you the default location for reading and writing files. It is not what we want - which is why we need to change our working directory. Change the working directory to the "data" folder within the "SOCY 392" folder you made last lab. Finding the name of that file path is different on Macs and PCs:

    • For Macs: open a Finder window and navigate to the data folder. Right-click that folder, and a menu should appear. Press and hold the Option button on your keyboard, and the options in the menu will change. Click the option that says “Copy data as Pathname.”
    • For PCs: open a File Explorer window and navigate to the data folder. Hold Shift while you right-click the folder, and select “Copy as path.”

    Return to RStudio, and navigate to the console. You will now use the setwd() function to set your working directory via code. First, type out the function. Then, within the parentheses of the function, paste the file path you just copied, and make sure the path is surrounded by quotes. All slashes in the path need to be forward slashes (/), not backslashes (\). If you have a PC, you will need to replace all of the backslashes with forward slashes after pasting.

    Check to make sure it worked by typing getwd() into the console once you finish. Here’s what the result looks like on my computer (which might be different from yours):

  • Now that you have done this, you can tell R to go find the data you downloaded and saved in the data folder in Part 0. Try this now: type load("lab2data.rda") into the console. Then type head(lab2data).

  • Want to see who else is in the data set beyond Luke Combs so far? Type unique(lab2data$artist_name) into the console and run the line of code.
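The steps above can be sketched as one short, runnable example. Everything named here is a stand-in: data_dir is a temporary folder playing the role of your "SOCY 392/data" folder, demo and demo.rda are a made-up data set and file (not lab2data), and win_path is a hypothetical Windows path. In your own work you would paste your real path into setwd() instead.

```r
# Hypothetical stand-in for your "SOCY 392/data" folder, made in a temporary location
data_dir <- file.path(tempdir(), "SOCY 392", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

old_wd <- getwd()   # remember the default working directory
setwd(data_dir)     # the path must be a quoted string (here, one stored in an object)
getwd()             # confirm that the working directory changed

# Made-up data, saved and then loaded the same way you load lab2data.rda
demo <- data.frame(artist_name = c("Artist A", "Artist A", "Artist B"),
                   song = c("Song 1", "Song 2", "Song 3"))
save(demo, file = "demo.rda")
rm(demo)

load("demo.rda")            # found without a full path because of setwd()
head(demo)                  # first rows of the data set
unique(demo$artist_name)    # each artist listed once

# A path pasted on a PC contains backslashes; swap them for forward slashes
win_path <- "C:\\Users\\yourname\\SOCY 392\\data"   # hypothetical path
fixed_path <- gsub("\\", "/", win_path, fixed = TRUE)
fixed_path

setwd(old_wd)   # return to the original working directory
```

Saving the setwd() line at the top of a script means the working directory is set again every time the script is run.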

2. Exploring the RStudio Interface

Whenever you open a clean instance of RStudio, you will usually see three panels like those in the screenshot below.

  • The left panel is called the “Console.”

  • The top right panel includes details about your “Environment.”

  • The bottom right panel will show you “Help” (recall from Lab 1), “Plots” and “Packages” (which we’ll get to in Lab 3).

  • The Console is a big calculator. It’s also the place where all of the code you ever write in R will “execute” - though you might save all of that code in a separate “script,” like you did in Lab 1 (and will do in Lab 2, in just a few minutes).

    • Click inside of the console, type 2+3, then hit enter on your keyboard. You can multiply (*), divide (/), add (+), and subtract (-) numbers here.

    • You can also run functions here. Remember the function we used in Lab 1 to generate random numbers? Use it in the console now. Enter runif(n=10, min=0, max=1) into the console and see what happens.

    • You can even create objects here. Let’s create 10 random numbers and assign the result to an object called potato (remember, we can name objects whatever we want to, because R treats these names as labels for stored values, not as functions). Type potato <- runif(n=10, min=0, max=1) into the console and hit enter. To check and see if this worked, type potato and hit enter, or get a little fancier and type head(potato) to show the first 6 numbers of the object you created called potato (head() shows 6 values by default).

  • The Environment (upper-right panel) will show you all of the objects and data sets that are stored in R’s working memory. If you’ve been following along and opened a new RStudio session when we started this lab, you should now have two objects in your Environment tab: lab2data and potato. This panel of RStudio has other tabs that we will not use much in this class.

  • The Help, Plots, and Packages tabs all appear in the bottom-right panel. We will still only work with Help in Lab 2. We will explore the Plots and Packages tabs in Lab 3.

    • Whenever you query R for information about a function (i.e., when you put a ? in front of a function), the result will pop up in the Help tab. Try that now - type ?runif() in the console. Let’s also try it for another function you used during this lab: type ?head().
  • Now that you are familiar with these three panels of RStudio, it is time to add a 4th panel - the Script panel. Download the R script for your Lab 2 problem set (lab2script.R on BlackBoard), save it to SOCY 392/programs, and open it in RStudio. Notice that it opens above the Console panel.

    • The Console panel and the Script panel serve a similar purpose: they are both places where you enter code. But there are a few huge differences:

      • In the Console, whatever code you type gets executed right away.

      • In a Script, no code executes until you highlight the lines you’d like to run, and click the Run button in the top-right corner of the panel. You also keep a record of all of your work, which you can use again if you ever need to re-analyze your data or write a similar program.

      • Regardless of where you run the code from (the Script or the Console), all of the results will show up in the Console only.

    • We will almost exclusively use the Script panel for writing code in this class. But you must always check the console after running chunks of your script to see what happened and to check your work.
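Here is a minimal sketch of the Console steps from this section, written the way they might look in a script. It uses only base R functions. The name potato is arbitrary, and your random numbers will differ from anyone else's.

```r
# The Console as a calculator
2 + 3
10 / 4

# Run a function: 10 random draws between 0 and 1
runif(n = 10, min = 0, max = 1)

# Store a new set of draws in an object, then print it
potato <- runif(n = 10, min = 0, max = 1)
potato

head(potato)          # first 6 values (the default for head())
head(potato, n = 3)   # first 3 values
length(potato)        # number of values stored in potato

ls()                  # names of the objects in working memory (what the Environment tab lists)

# ?runif              # uncomment to open the Help page for runif()
```

Highlight these lines in the Script panel and click Run, then check the Console to see each result.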

Begin Your Problem Set (Parts 3 & 4)

You have completed Parts 1 and 2. Get going on Parts 3 and 4 with a lab partner. Read through the instructions in the lab2script.R file you opened in RStudio carefully, and answer the questions completely by writing code, and by explaining your answers when appropriate with comments (#).

When you are finished, upload your answers to BlackBoard’s submission page. Use the file naming convention lab2_partner1last_partner2last.R. Put your last names in alphabetical order.