Introduction

Here you can find lab notes and resources for Econ 415. These will be updated after our in-class lab sessions. These notes are not a substitute for attending lab but serve as an additional resource.

Much of the lab content will be drawn from R for Data Science.

Useful R Resources

Getting Started with R – A collection of resources for those getting started with R

tidyr Cheatsheet – A useful cheatsheet for data cleaning and tidy data using the tidyverse functions

ggplot2 Cheatsheet – A useful cheatsheet for various ggplot geoms

Tidy Tuesday Repo – A weekly data science project in R to test your tidyverse skills!

R Setup and Workflow Basics

Before R: File Management

Effective file folder management is crucial for maintaining an organized and efficient digital workspace. Setting up organized folders will make your life significantly easier in the future!

  • Use your computer’s file management tools or RStudio’s “Files” tab.
  • Create a main folder for your project or course.
  • Inside the main folder, create a new folder for each lab, e.g., “Lab1”.
  • Save all downloaded data files and R scripts for that lab in this folder.
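
The same folder setup can also be done from R itself. The lines below are only a sketch: the folder name Econ415 is made up for illustration, and a temporary directory stands in for wherever you actually keep your coursework, so running this changes nothing permanent on your computer.

```r
# Hypothetical location: a temporary directory stands in for the place
# you keep your coursework (e.g. your Documents folder)
course_folder <- file.path(tempdir(), "Econ415")
lab_folder <- file.path(course_folder, "Lab1")

# recursive = TRUE creates Econ415 and Lab1 in one step
dir.create(lab_folder, recursive = TRUE, showWarnings = FALSE)

# Confirm the lab folder exists and list what is inside the course folder
dir.exists(lab_folder)
list.files(course_folder)
```

file.path() builds the path with the correct separator for your operating system, which is why it is used here instead of typing the slashes by hand.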

A Look Around RStudio

In the bottom left of RStudio, you will see the console. The console executes code. You can type code and run it in the console, but that code is not saved when you close RStudio. It is recommended that you do not use the console in your regular workflow.

To save your work, you should code in an R script. Open a script using the button that looks like a piece of paper with a green plus sign in the top left corner of RStudio.

R scripts open in the top left pane. You can code, comment, and run the code from your script. To run the code, either click the “Run” button or press Cmd+Enter (Mac) or Ctrl+Enter (Windows). R scripts will be saved to the folder you are currently working in.

In the top right corner, we have the workspace/environment panes.

The workspace/environment tab tells you what objects are stored in R (i.e. what is loaded or stored in memory). The History tab shows previous commands you have run.

Last, on the bottom right, we have several tabs including:

  • Files - shows the files on your computer in the directory you are working in
  • Viewer - can view data or R objects
  • Help - shows help documentation for R functions and datasets
  • Plots - can see current and previous plots generated in your R session, save, and export them to png/pdf formats.
  • Packages - list of R packages you have installed. You can also install packages directly from this tab.

Coding Basics

R is built around objects and functions. If you have never used this type of programming before, it can be a bit confusing at first. Essentially, R uses functions, which we apply to objects. More on this shortly, but if you aren’t sure what a function does or how it works, you can type ? before the function name to get the documentation. For example, ?mean will bring up the help page for the mean() function. Try typing ?mean in the console and looking at the help page.
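
Besides ?, base R has a couple of other ways to look up a function. The sketch below uses sd() as the example function; help("mean") is wrapped in interactive() so that it only opens the help page when you are working in an interactive session such as RStudio.

```r
# args() shows a function's arguments and their default values
args(sd)

# formals() returns the same information as a list you can inspect
names(formals(sd))

# help("mean") opens the same page as ?mean
if (interactive()) help("mean")
```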

Objects

An object is an assignment between a name and a value. You assign values to names using <- or =. The first assignment symbol consists of a < next to a dash - to make it look like an arrow.

x <- 5 #assign the value of 5 to a variable called x
# notice that this x is now in your global environment
x # print x
## [1] 5
y = 10
y
## [1] 10

You can also combine objects, which lets us do some basic math operations.

# create a new object called z that is equal to x*y
z <- x * y
#print z
z
## [1] 50

If you do not create an object, R will not save it to the global environment. If an object is not in the global environment and you try to reference it later, R will not know what you are referring to.
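
A short illustration of the difference, using a made-up object name (doubled_x). exists() is a base R function that reports whether a name is currently defined.

```r
x <- 5

# The result is printed but not stored, so no new object is created
x * 2

# doubled_x has not been assigned yet (in a fresh session this is FALSE)
exists("doubled_x")

# Assigning the result stores it so it can be reused later
doubled_x <- x * 2
exists("doubled_x")
doubled_x + 1
```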

Math Operations

a <- 2 + 3
a
## [1] 5
b <- 4 - 5
b
## [1] -1
c <- 4 * 2
c
## [1] 8
d <- 6 / 3
d
## [1] 2
e <- 7^2
e
## [1] 49

Vectors

You can create a vector (an ordered collection of items of the same type) in R.

# create a vector of 1 through 10
vector1 <- 1:10
vector1
##  [1]  1  2  3  4  5  6  7  8  9 10

If we want specific items, we use the c() function and separate the items with commas.

vector2 <- c(1,3,5,7,9)
vector2
## [1] 1 3 5 7 9

Mathematical operations work on vectors too!

vector2^2
## [1]  1  9 25 49 81
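
Operations between two vectors work element by element. In the sketch below, vector3 is a made-up second vector of the same length as vector2.

```r
vector2 <- c(1, 3, 5, 7, 9)
vector3 <- c(2, 4, 6, 8, 10)  # hypothetical second vector, same length

# Operations between vectors pair up elements by position
vector2 + vector3
vector2 * vector3

# A single number is applied to every element
vector2 * 10
```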

Classes

Objects in R have different classes. Check the class of a few objects we have already created:

class(x)
## [1] "numeric"
class(vector1)
## [1] "integer"

There are other classes too!

# create a string
my_string <- "Econ is cool!"
class(my_string)
## [1] "character"
# logical class
class(2>3)
## [1] "logical"

What happens if we have a vector of characters and numbers?

char_vector <- c(1:5, "banana", "apple")
char_vector
## [1] "1"      "2"      "3"      "4"      "5"      "banana" "apple"
# can't use mathematical operations on characters
# why? because the entire vector is of class character!
class(char_vector)
## [1] "character"
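
To see this in action, the sketch below tries arithmetic on the character vector and then converts the numeric-looking elements back to numbers. Two things here go beyond what has been covered so far: tryCatch() captures the error message so the script keeps running, and square brackets pick out elements by position. The name numbers_only is made up for illustration.

```r
char_vector <- c(1:5, "banana", "apple")

# Arithmetic on a character vector stops with an error;
# tryCatch() returns the error message instead of halting the script
tryCatch(char_vector * 2, error = function(e) conditionMessage(e))

# Square brackets select elements by position: here, the first five
numbers_only <- as.numeric(char_vector[1:5])
class(numbers_only)

# Once converted, math works again
numbers_only * 2
```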

Functions

Functions are operations that can transform your created object in a ton of different ways. We have actually already used two functions, c() and class(). Here are a few other useful ones:

#print the first few objects in vector1
head(vector1)
## [1] 1 2 3 4 5 6
#print the first 2 objects in vector1
head(vector1, 2)
## [1] 1 2
#print the last few objects in vector1
tail(vector1)
## [1]  5  6  7  8  9 10
#print last two objects in vector1
tail(vector1, 2)
## [1]  9 10
#find the mean of vector1
mean(vector1)
## [1] 5.5
#median 
median(vector1)
## [1] 5.5
#standard deviation
sd(vector1)
## [1] 3.02765
#summary() prints summary stats
summary(vector1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00
min(vector1)
## [1] 1
max(vector1)
## [1] 10
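
Most functions take extra arguments that change how they behave. A small sketch, using a made-up vector called scores that contains a missing value (NA):

```r
# Hypothetical data with one missing value
scores <- c(4, 8, NA, 10)

# With the default na.rm = FALSE, a missing value makes the result NA
mean(scores)

# na.rm = TRUE drops missing values before computing the mean
mean(scores, na.rm = TRUE)

# Arguments can be matched by position or by name
head(scores, 2)
head(scores, n = 2)
```

Naming the argument (n = 2) is more typing, but it makes the code easier to read when a function has many arguments.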

Code Style

Coding style is the punctuation and grammar of the coding world. Making sure that your code is formatted in a readable, standard way helps both you and others understand it. We will follow guidelines from the tidyverse style guide.

Spaces: Put spaces on either side of mathematical operators (e.g. +, -, ==, <, …), and around the assignment operator (<-). The exception to this is the ^ symbol.

# Strive for
z <- (a + b)^2 / d

# Avoid
z<-( a + b ) ^ 2/d

Don’t put spaces inside or outside parentheses for regular function calls. Always put a space after a comma.

# Strive for
mean(x, na.rm = TRUE)
## [1] 5
# Avoid
mean (x ,na.rm=TRUE)
## [1] 5

Adding extra spaces is fine if it helps with alignment. For example:

example_data_frame <- 
  data.frame(
    variable1      = c(1:10),
    variable_name2 = c(2:11),
    var_name       = c(3:12)
  )

Naming Conventions: Object names must start with a letter and can only contain letters, numbers, underscores (_), and periods (.). Names should be descriptive. Snake case (lowercase words separated by _) is the recommended naming convention.

really_long_variable_name <- 1

Commenting: You can comment your code with #. It is strongly recommended to leave comments in your code so that others, and future you, can keep track of your thought process.

# Good code is well-commented code!!

You can also create section comments that will be collapsible. This is incredibly helpful when you have a really long R script! Any comment line which includes at least four trailing dashes (-) will create a section.

# This is section 1 ----

# ---- This is section 2 ----

Packages

R is really useful because of its ability to use packages. pacman is a package for “package management”: it helps us load multiple packages at once.

# if you have not previously installed the package, include the line:
#install.packages("pacman")

# you only have to do this once. you can also install packages from the "Packages" side panel tab

After installing a package, we need to load it with library() before we can use it.

library(pacman)

Now we use the p_load() function to load other packages we want to use. We will use the tidyverse package throughout the course, so let’s load that one.

p_load(tidyverse)
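
To load several packages in one call, p_load() also accepts a character vector through its char argument. The sketch below is illustrative only: stats and utils ship with R, so nothing is downloaded, and they stand in for whichever packages a lab needs. The requireNamespace() check is an added safeguard so the code still runs if pacman has not been installed yet.

```r
# stats and utils come with R; they stand in for the packages you need
pkgs <- c("stats", "utils")

if (requireNamespace("pacman", quietly = TRUE)) {
  # One call loads every package named in the vector
  pacman::p_load(char = pkgs)
} else {
  # Base R equivalent: one library() call per package
  for (pkg in pkgs) library(pkg, character.only = TRUE)
}

# Check that both packages are now attached
paste0("package:", pkgs) %in% search()
```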