1 Welcome to the R Community

1.1 Why R?

If you are wondering why we are learning R instead of Excel for data analysis in this class, these articles may be helpful:

R vs. Excel

“How not to Excel at economics”

How R is useful to data journalism

1.2 R vs. RStudio:

Here is a good illustration of the differences between R and RStudio (source: https://moderndive.com/1-getting-started.html):

Danielle Navarro also explains the difference this way:

The term R doesn’t really refer to a specific application on your computer. Rather, it refers to the underlying statistical language. You can use this language through lots of different applications. When you install R initially, it comes with one application that lets you do this: it’s the R.exe application on a Windows machine, and the R.app application on a Mac. But that’s not the only way to do it. There are lots of different applications that you can use that will let you interact with R. One of those is called Rstudio, and it’s the one I’m going to suggest that you use. Rstudio provides a clean, professional interface to R that I find much nicer to work with than either the Windows or Mac defaults.

1.3 Installing R and RStudio

Watch this introduction by Andy Field, which covers:

1.4 What does RStudio look like?

Take this quick tour of RStudio with Andy Field to get an idea of:

  • The four panes of RStudio and changing their configuration
  • Creating a new R Markdown document (Note: The R Markdown document is just an example here; we will not require you to learn R Markdown in this class.)
  • Getting help in R
  • Changing the RStudio theme


2 R Preliminaries

2.1 Opening RStudio

As you learned from Andy Field’s video in the previous section, the RStudio interface consists of four “panels.” If you see only three panels, select File > New File > R Script.

The two most important panels are the “editor window” and the “console window”.

  • The editor window: where you type “commands” in the R language to tell R what you want it to do (you can think about it as input).

  • The console window: where R tells you what it has done with your commands (you can think about it as output).

Now, some tricks if you need to adjust your RStudio interface:

  • Check out this very cool site for tips on adjusting panels, rearranging panels, recovering lost panels, setting font size, etc. All gifs – easy to follow!


2.2 Running Code

Unlike programs such as Excel or SPSS, R does not provide a point-and-click interface. Instead, you type in commands written in R code. “Running code” simply means telling R to perform a task by giving it these commands.

Let’s practice executing a simple command in R. Type in the editor window:

4 + 8

To execute this command (that is, to tell R to add 4 and 8), place your cursor anywhere on the line you typed. Then click the Run button in the upper-right corner of the editor window to “run” (execute) this line. Alternatively, use the keyboard shortcut Ctrl + Enter (Windows) or Cmd + Enter (Mac). If successful, the console window should show:

## [1] 12

You’ll note the [1] part in the output produced by R. For now you can think of [1] 12 as if R were saying “the answer to the 1st question you asked is 12” (not quite exactly the case but close enough for now).

Now try …

3 * 3

Success? You should get:

## [1] 9

Executing commands line by line can become tedious, so you can also run multiple commands at once: select all the lines to be executed and again use the Run button (or the keyboard shortcut Ctrl + Enter on Windows or Cmd + Enter on Mac). Now try running these two lines together in the editor window:

12 + 8
16 / 4

You should see these results in the console window:

## [1] 20
## [1] 4

Here are some basic arithmetic operators in R. You are encouraged to experiment with them as a way to get more familiar with the R interface. You can certainly use R as a calculator; for more complex calculations, make sure the operations are carried out in the right order, using parentheses where needed.

Operation        Operator   Example Input   Example Output
addition         +          2 + 2           4
subtraction      -          9 - 2           7
multiplication   *          5 * 5           25
division         /          12 / 3          4
power            ^          5 ^ 2           25
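To see what “the right order” means in practice: R follows the usual order of operations, evaluating ^ first, then * and /, then + and -, and working left to right among operators of the same level; parentheses override all of this. The numbers below are just illustrations to try in the editor window:

```r
# Order of operations: ^ first, then * and /, then + and -
2 + 3 * 4      # 14: multiplication happens first (2 + 12)
(2 + 3) * 4    # 20: parentheses force the addition first (5 * 4)
2 * 3 ^ 2      # 18: the power happens first (2 * 9)
10 / 2 * 5     # 25: same level, so left to right: (10 / 2) * 5
```

When in doubt, adding parentheses is a safe habit: they make your intended order explicit.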


2.3 Creating Objects

As an object-oriented programming language, R works with objects. An object is a “container” with the information that you assign to it. This “container,” with the label you give it, stores this information in R and can be called later to display the stored information or be used in further analyses.

To create an object, start by typing the label (name) for the object. After that, type <- (the assignment operator), followed by the content (values or other information) you’d like to store. The general form for creating an object is: label <- content

Suppose we want to store the result of 2 * 10 in an object, and let’s give it a boring name: result. Type in the editor window:

result <- 2 * 10 

and run this line. Notice that the console window does not display any specific result of this computation at this point. Behind the scenes, however, R has created an object result and given it the value of the multiplication 2 * 10. To display the content of the object, simply type result on a new line in the editor window and run it.

result  

The console window should display what is stored in the object result, which in this case is the product of 2 * 10

## [1] 20

Of course, this is only a very simple example, but you’ll see later how important and helpful objects are in R. Let’s do a couple more things with this example.

Now, type result - 1 on a new line in the editor window and run this new line. The console window should display:

## [1] 19

Let’s store this calculation in a new object called result2. Type in the editor window:

result2 <- result - 1 
result2  

and run these lines.

Notice that the objects result and result2, which you just created, are listed in the Environment panel of the RStudio interface.
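One point worth knowing early: an object stores a value, not a formula. If you later assign a new value to result, result2 keeps the value it was given when it was created. A small sketch to try, reusing the same names:

```r
result <- 2 * 10        # result is 20
result2 <- result - 1   # result2 is 19, computed from the current value of result
result <- 5             # assigning again replaces the old value of result
result                  # now 5
result2                 # still 19: it does not update automatically
```

If you want result2 to reflect the new value, you have to run the line that creates it again.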


2.3.1 Types of objects

R can store many different sorts of information; in other words, R objects can be of different types. For example, you can create the following objects in R by typing:

a <- 19.81     
b <- "Hello SLC"    
c <- TRUE    

You can check the type of data via the command class(). Type in:

class(a)
class(b)
class(c)

and you should see what type of data they are treated as in R.

## [1] "numeric"
## [1] "character"
## [1] "logical"

That is, a is numeric; b is a character string; and c is a logical value, which can only be TRUE or FALSE. There are other types of objects, as you will see later. A data frame, such as the data files you have imported into R as part of your assignment, is also a type of object.
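Quotation marks are what decide between numeric and character: the same digits inside quotes are stored as text, and R will not do arithmetic on text. The object names num and txt below are just illustrative, and the last line shows that class() reports a data frame as "data.frame":

```r
num <- 19.81                   # a number
txt <- "19.81"                 # the same digits in quotes: a character string
class(num)                     # "numeric"
class(txt)                     # "character"
num * 2                        # 39.62
# txt * 2                      # would stop with an error: non-numeric argument to binary operator
class(data.frame(id = 1:3))    # "data.frame"
```

The commented-out line is left there on purpose: remove the # to see the error for yourself.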


2.3.2 Creating Vectors

So far, we’ve been looking at objects with a single value. Let’s now look at multiple data points. Multiple data points can be combined into a single object called a vector; in short, a vector contains a series of values. One simple way to create a vector is with the function c() (short for “combine”).

Suppose we measured ice cream intake for seven individuals on a summer day: 1, 2, 2, 3, 1, 0, and 2 servings of ice cream (let’s assume a serving is 1/2 cup). Type these data in the editor window and run:

icecream <- c(1, 2, 2, 3, 1, 0, 2)
icecream

From the Console, you can see that you have created a vector object called icecream containing these values:

## [1] 1 2 2 3 1 0 2

Now suppose every serving of ice cream has the same fat content: 7 grams of fat per serving (1/2 cup). We can then calculate how much fat each of the 7 individuals consumed from ice cream:

icecream.fat <- icecream * 7
icecream.fat
## [1]  7 14 14 21  7  0 14

Suppose we have also recorded the gender of the seven individuals. We can store it in an object called gender. Type and run:

gender <- c('male', 'female', 'male', 'female', 'male', 'female', 'female')
gender
## [1] "male"   "female" "male"   "female" "male"   "female" "female"

Note that “gender” is a character vector, as its elements are categorical labels, not numerical values.
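A few things you can do with a vector once it exists: count its values, pick out one value by its position, and apply arithmetic to every value at once. The last lines show that a vector holds only one type, so mixing numbers and text turns everything into text (the name mixed is just an illustration):

```r
icecream <- c(1, 2, 2, 3, 1, 0, 2)
length(icecream)          # 7: the number of values in the vector
icecream[4]               # 3: the 4th value (R starts counting at 1)
icecream * 7              # arithmetic is applied to every element
mixed <- c(1, "female")   # a vector can hold only one type...
mixed                     # "1" "female": the number was converted to text
class(mixed)              # "character"
```

This is also why gender above is a character vector: every element is text.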


2.4 R Commands/Functions

R does almost everything through “commands” or “functions” (the two terms are used interchangeably here). These commands tell the computer what task to complete. You can think of commands as the verbs of R: they tell R what to do. The generic form of a function is function(object); that is, the name of the function followed by its input in parentheses. The c() you used above is one such command/function. Here are some other examples:

print("Hello SLC!")
## [1] "Hello SLC!"

Try more of the print function:

print(3 + 5)
## [1] 8

Or:

print(3 > 5)
## [1] FALSE

You can see that in the last example, we asked R to assess whether 3 is greater than 5, and R tells us it’s not (smart!).
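Comparisons like 3 > 5 always answer with TRUE or FALSE, which are the logical values from the section on types of objects. A few more to try; the object name bigger is just an illustration:

```r
print(3 > 5)       # FALSE
print(5 >= 5)      # TRUE: "greater than or equal to"
print(3 == 3)      # TRUE: a double equals sign asks "is equal to?"
bigger <- 10 > 2   # the TRUE/FALSE answer can be stored in an object
class(bigger)      # "logical"
```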

You can also ask R to perform a series of commands by running multiple lines together:

x <- 3 + 5
print(x)
print(x*2)
## [1] 8
## [1] 16

Now, let’s go back to the ice cream example from the previous section. To get the average number of servings across the 7 individuals, we can use the command mean(), putting the vector object icecream inside the ():

icecream<- c(1, 2, 2, 3, 1, 0, 2)
mean(icecream) #mean is the command for getting the average: in this case the average across the values in the vector
## [1] 1.571429

If we want to keep only two decimal places for the mean, we can use the command round():

round(mean(icecream), digits = 2) 
## [1] 1.57
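When one function sits inside another, R works from the inside out: it computes mean(icecream) first and then passes that result to round(). Splitting it into two steps with an intermediate object (avg is just an illustrative name) gives the same answer:

```r
icecream <- c(1, 2, 2, 3, 1, 0, 2)
avg <- mean(icecream)               # step 1: about 1.571429
round(avg, digits = 2)              # step 2: 1.57
round(mean(icecream), digits = 2)   # both steps in one line: also 1.57
```

Both styles are fine; the two-step version can be easier to read when you are starting out.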

As an exercise, try getting the mean fat consumption from ice cream (i.e., the icecream.fat vector we created in the previous section), rounded to 3 decimal places. Can you do it?

Use the function rm() to remove an object. For example, if you no longer need the object x you created above, type:

rm(x)

Or, to remove icecream, type in

rm(icecream)

The objects x and icecream should no longer be in your R environment. If you want to remove all objects in your environment all at once, use the following command (exactly as is written out here):

rm(list=ls())
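The list = ls() part works because ls() (a base R function) returns the names of all objects in your environment, and rm() then removes every name on that list. A small sketch that shows each step in the console; exists() is another base R function that reports whether an object is still there:

```r
x <- 3 + 5
icecream <- c(1, 2, 2, 3, 1, 0, 2)
ls()               # names of the objects in your environment, e.g. "icecream" "x"
rm(x)              # remove a single object
exists("x")        # FALSE: x is gone
rm(list = ls())    # remove every object that ls() lists
ls()               # character(0): the environment is now empty
```

Use rm(list = ls()) with care: removed objects can only be recovered by running the code that created them again.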

Here we highlight some functions for basic computations:

  • sqrt(), calculates the square root of a (set of) numeric value(s);
  • log(), calculates the natural logarithm of a (set of) numeric value(s);
  • round(), rounds a (set of) numeric value(s), by default to the nearest integer (whole number);
  • sum(), calculates the sum of a set of numeric values;
  • mean(), calculates the mean of a set of numeric values.

For a full list of the basic R functions (i.e., functions in base R, requiring no additional packages), please see here.
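A quick way to get a feel for the functions listed above is to try each one on small numbers. The vector servings below holds the same values as icecream, and exp() (another base R function) is used only to show that log() is the natural logarithm:

```r
servings <- c(1, 2, 2, 3, 1, 0, 2)
sqrt(16)          # 4
sqrt(c(4, 9))     # 2 3: applied to each value
log(exp(2))       # 2: log() is the natural logarithm, the inverse of exp()
round(2.718)      # 3: nearest whole number by default
sum(servings)     # 11
mean(servings)    # 1.571429, i.e., 11 / 7
```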


2.5 Working with Data Frames

For the ice cream exercise above, we created three vectors describing 7 individuals: gender, ice cream consumption, and fat consumption. If you have removed them, you can recreate them by running these commands:

icecream<- c(1, 2, 2, 3, 1, 0, 2)
icecream.fat<- icecream*7
gender<- c('male', 'female', 'male', 'female', 'male', 'female', 'female')

We can take this one step further and combine these three vectors into one dataset (i.e., Individual 1: male, 1 serving of ice cream, 7 g of fat; Individual 2: female, 2 servings of ice cream, 14 g of fat; etc.). We can create such a data frame by using data.frame(). Data frames are the data format you’ll use for most of the semester. You’ll be working with existing data, though, so you won’t have to create data frames yourself. But let’s do this exercise to better understand what a data frame consists of and how it works. Let’s call the data frame we want to create “mydata”. Type in:

mydata <- data.frame(gender, icecream, icecream.fat)

Run the above line, and you’ll notice in the “Global Environment” panel of your RStudio that mydata is listed under “Data” as an object.

You can either directly click on “mydata” in the “Global Environment” section, or type in

View(mydata) 

to see what you have created. There should be three columns, representing three variables for the 7 individuals: their gender, ice cream consumption (i.e., number of servings), and fat consumption.
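If you just want a quick look in the console instead of the View() window, a few base R functions summarize a data frame. A short sketch that rebuilds mydata from the same three vectors so it runs on its own:

```r
icecream <- c(1, 2, 2, 3, 1, 0, 2)
icecream.fat <- icecream * 7
gender <- c('male', 'female', 'male', 'female', 'male', 'female', 'female')
mydata <- data.frame(gender, icecream, icecream.fat)

nrow(mydata)      # 7: one row per individual
ncol(mydata)      # 3: one column per variable
names(mydata)     # "gender" "icecream" "icecream.fat"
head(mydata, 3)   # prints the first 3 rows in the console
```

Notice that the column names come from the names of the vectors you passed to data.frame().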

Since the information of the three vectors is stored in mydata, we can now remove the three separate vector objects from the “Global Environment” to avoid confusion.

rm(icecream) 
rm(icecream.fat)
rm(gender)

These objects have now disappeared from your Environment panel; and if you type and run icecream now, you will get an error message:

icecream
## Error: object 'icecream' not found

However, since the information is stored in mydata, you can extract this vector (or variable) from mydata by using the $ sign as follows:

mydata$icecream
## [1] 1 2 2 3 1 0 2

We can apply commands/functions to this vector/column, such as:

mean(mydata$icecream)
## [1] 1.571429

And to see how many males vs. females were in this dataset, you can type in:

table(mydata$gender)
## 
## female   male 
##      4      3

If you’d like to see the ice cream consumption for females only, you can extract the corresponding information for this “subset” by using square brackets [] (which tell R to return only the values for which the specified condition is met) and the logical operator == (which means “equal to”):

mydata$icecream[mydata$gender == "female"] # the text in quotes must match the values in the data exactly (R is case-sensitive)
## [1] 2 3 0 2

To get the average ice cream consumption for this group, put the subset above inside mean():

mean(mydata$icecream[mydata$gender == "female"])
## [1] 1.75

Other logical operators include:

  • < (less than)
  • <= (less than or equal to)
  • > (greater than)
  • >= (greater than or equal to)
  • != (not equal to)
  • == (equal to)
  • & (and) (e.g., a == 3 & b < 4 means “a is equal to 3 AND b is less than 4”)
  • | (or) (e.g., a == 3 | b < 4 means “a is equal to 3 OR b is less than 4”)

So, to do the same calculation as above, we can also specify the condition as “not equal to male”:

mean(mydata$icecream[mydata$gender != "male"])
## [1] 1.75
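The & and | operators from the list let you combine conditions. A sketch using the same mydata (rebuilt first so the block runs on its own); the last line uses the fact that sum() counts TRUE values as 1 and FALSE values as 0:

```r
# Recreate mydata so this block runs on its own
icecream <- c(1, 2, 2, 3, 1, 0, 2)
icecream.fat <- icecream * 7
gender <- c('male', 'female', 'male', 'female', 'male', 'female', 'female')
mydata <- data.frame(gender, icecream, icecream.fat)

# AND: females who ate at least 2 servings
mydata$icecream[mydata$gender == "female" & mydata$icecream >= 2]   # 2 3 2
# OR: anyone who is male, or who ate no ice cream at all
mydata$icecream[mydata$gender == "male" | mydata$icecream == 0]     # 1 2 1 0
# A comparison gives TRUE/FALSE values; sum() counts the TRUEs
sum(mydata$gender == "female")                                      # 4
```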

Pretty cool, right? We’ll be learning many more such tricks during the semester.

Maybe helpful: the differences between the logical operators “AND”, “OR”, and “NOT”:


2.6 R Packages

Many R functions (such as the ones you see above) come with the standard (base) R installation. But many others are part of specific packages. R packages are building blocks for doing different kinds of analyses in R. To use their functions, you first need to install the packages and then load them. For example, for your assignment, the function freq is part of the package descr, so to use this function you need to install and then load the descr package.

R packages expand the capacity of R by providing additional functions, contributed by a worldwide community of R users, with free access. Currently (as of Jan 2021), there are 17,041 R packages! As you get to know R better, you’ll realize that different packages often contain functions for similar analyses. When you Google how to do a certain analysis in R, for example, you may find a variety of approaches using different functions from different packages. Potential indicators of “good” packages (advice from Wolfgang Viechtbauer):

  • written by a known expert in the field
  • package has been around for some time
  • package has been updated
  • listed under one or multiple task views
  • has a ‘vignette’ or other supporting documentation
  • paper/book about package has been published
  • help files are comprehensive and free of errors
  • has been cited in papers
  • …

An analogy for R packages is that they are like the apps on your Smartphone (source):

Like downloading an app and opening it on your phone, for an R package, you need to:

  1. Install the package – like installing an app on your phone – you only need to do it once on your computer unless you would like to have an updated version of the package.
  2. “Load” the package – like opening an app on the phone when you need to use it – you need to “load” a package each time you start RStudio if you need to use an associated function.

To install a package, you can use either of two ways:

  1. From the bottom-right panel, click Packages, then Install, and type in the name of the package to be installed.
  2. Or type the command install.packages("packagename") in your R script (i.e., the editor panel). For example, to install the package descr, type:

install.packages("descr")

Note: when you install a package, put its name in quotation marks!

To load a package after the package has been installed, you can use the library(packagename) command. For example,

library(descr) 

Note: When loading a package, there are no quotation marks!
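You can see the install-versus-load distinction (the app analogy) without downloading anything. The sketch below first checks whether descr is installed, then uses splines as a stand-in: splines ships with every R installation but is not loaded at start-up. search() lists the packages currently loaded; splines is only chosen here because it is always available, not because we will use it in class:

```r
# Is descr installed on this computer? (TRUE or FALSE; nothing is downloaded)
"descr" %in% rownames(installed.packages())

# splines is installed with R itself, but not loaded by default
"package:splines" %in% search()   # FALSE in a fresh session
library(splines)                   # load it: no quotation marks needed
"package:splines" %in% search()   # TRUE: loaded for the rest of this session
```

Closing and reopening RStudio unloads the package again, which is why library() has to be run in each new session.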


3 Tips, Habits, Workflow, etc.

3.1 On labeling/naming your objects:

  1. Please remember that R is case-sensitive. So result, Result, and RESULT are all different objects in R! In general, it is good practice to use only lowercase letters. Take this advice to heart: it will help prevent unnecessary confusion and frustration.

  2. Object names cannot include spaces: my result is not a valid name, but my.result or my_result is. You can also use the digits 0-9, as long as the name does not start with a digit (result2 is valid; 2result is not).

  3. Use short, informative names. There is some tension between informative names (which tend to be long) and short ones (which are hard to make specific), so use common sense when balancing the two.
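The naming rules above can be checked directly in R. The names below are only examples; the two invalid ones are commented out because running them would stop with an error:

```r
result <- 20
Result <- 5            # a different object: R is case-sensitive
result == Result       # FALSE
my_result <- 1         # valid: underscore instead of a space
my.result <- 2         # valid: period instead of a space
result2 <- 3           # valid: digits are allowed after the first character
# my result <- 1       # invalid: a space in the name causes an error
# 2result <- 1         # invalid: a name cannot start with a digit
```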

3.2 Save your R Script

In the above examples, you typed some commands in the editor window. These different lines of code form an “R script”. You should save your R scripts regularly. To save your R script, go to File > Save As… (or File > Save). By default, the script will be saved using an .R extension (a “dot R file”).

3.3 Annotate your R script

When writing R code, you may want to include explanations for your future self or for other team members. You can do this by adding comments to your script, starting with #. R treats everything after the # sign as a note and ignores it when running the code. We strongly recommend that you include such annotations wherever you feel they are needed. Here’s an example of some comments:

# let's define an object 
x <- 3 + 5

y <- x + 2 # adding 2 to x and assigning the result to y

z <- x*y # get the product of x and y

print(z) # use the function print() to display the result of z 

mean(c(x, y, z)) #to get the average across x, y, & z, use function c() --that is, to combine 
##these three elements first, and then the function mean()

When you run this script, you will see that R runs the commands and ignores the comments marked by #. Yet when you save this script, you will have these notes to go back to in the future. The commands above should display the value of z and the mean of x, y, and z in the console:

## [1] 80
## [1] 32.66667

You can also use multiple ## or ### or ##### signs; it does not matter how many. Notice that a line can contain only a comment (e.g., the first line in the example above), or only part of a line can be a comment (e.g., the last few lines in the example above).

3.4 How to get better

3.4.1 Some tips on learning and practicing

Tips on learning to code (source)

3.4.2 Be patient: It takes a while.

source

“Whenever you’re learning a new tool, for a long time, you’re going to suck… But the good news is that is typical; that’s something that happens to everyone, and it’s only temporary.”
–Hadley Wickham


4 Notes & Credits

This tutorial is designed for COMM 3710 students at the University of Utah. It is developed by integrating and modifying materials from other tutorials and books on R, especially the following:
- R Tutorial
- ModernDive
- R Module 1
- Learning statistics with R
- Andy Field’s Adventure in Statistics: R Tutorials
