Here you can find lab notes and resources for Econ 415. These will be updated after our in-class lab sessions. These notes are not a substitute for attending lab but serve as an additional resource.
Much of the lab content will be drawn from R for Data Science.
Getting Started with R – A collection of resources for those getting started with R
TidyR Cheatsheet – A useful cheatsheet for data cleaning and tidy data using the tidyverse functions
ggplot2 Cheatsheet – A useful cheatsheet for various ggplot geoms
Tidy Tuesday Repo – A weekly data science project in R to test your tidyverse skills!
Effective file folder management is crucial for maintaining an organized and efficient digital workspace. Setting up organized folders will make your life significantly easier in the future!
In the bottom left of RStudio, you will see the console. The console executes code. You can type code and execute it in the console, but that code is not saved when you close RStudio. It is recommended that you do not use the console in your regular workflow.
To save your work, you should write your code in an R script. Open a new script using the button that looks like a piece of paper with a green plus sign in the top left corner of RStudio.
R scripts will open here. You can write, comment, and run code from your script. To run the code, either click the “Run” button or press Cmd+Enter (Mac) or Ctrl+Enter (Windows). R scripts will be saved to the folder you are currently working in.
In the top right corner, we have the workspace/environment panes.
The workspace/environment tab tells you what objects are stored in R (i.e. what is loaded or stored in memory). The History tab shows previous commands you have run.
Last, on the bottom right, we have several tabs including:
R uses object-oriented programming. If you have never used this type
of programming before, it can be a bit confusing at first. Essentially,
R uses functions, which we apply to objects. More on
this shortly, but if you aren’t sure what a function does, or how it
works, you can use ? before the function to get the documentation. Ex:
?mean will bring up the help page for the
mean() function. Try typing ?mean in the console and
looking at the help page.
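Besides ?, a couple of other built-in helpers can be handy when exploring a function. The lines below are a minimal sketch using only base R; sd() is used here simply as a second example function.

```r
# help() is the long form of ?; both open the same documentation page
help("mean")

# args() shows a function's argument names and default values
args(sd)
```

Running args(sd) shows that sd() takes an argument x and an optional na.rm argument that defaults to FALSE.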
An object is an assignment between a name and a value. You assign
values to names using <- or =. The first
assignment symbol consists of a < next to a dash
- to make it look like an arrow.
x <- 5 #assign the value of 5 to a variable called x
# notice that this x is now in your global environment
x # print x
## [1] 5
y = 10
y
## [1] 10
You can combine objects as well, which lets us do some basic math operations.
# create a new object called z that is equal to x*y
z <- x * y
#print z
z
## [1] 50
If you do not create an object, R will not save it to the global environment. If an object is not in the global environment and you try to reference it later, R will not know what you are referring to.
a <- 2 + 3
a
## [1] 5
b <- 4 - 5
b
## [1] -1
c <- 4 * 2
c
## [1] 8
d <- 6 / 3
d
## [1] 2
e <- 7^2
e
## [1] 49
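To see what it means for R to "not know" an object, here is a small sketch. The names not_saved and saved are made up for illustration, and exists() and tryCatch() are base R functions used only to inspect the result without stopping the script.

```r
# A calculation that is not assigned is printed, but not stored anywhere
2 + 3

# `not_saved` is a hypothetical name that was never assigned
exists("not_saved")   # FALSE: there is no object by that name

# Assigning the result keeps it in the environment
saved <- 2 + 3
exists("saved")       # TRUE

# Referencing a missing object is an error; tryCatch() captures the message
msg <- tryCatch(not_saved, error = function(e) conditionMessage(e))
msg
```

The captured message reports that the object was not found, which is the error you will see whenever you reference a name before creating it.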
You can create a vector (an ordered collection of items) in R.
# create a vector of 1 through 10
vector1 <- 1:10
vector1
## [1] 1 2 3 4 5 6 7 8 9 10
If we want specific items, we use the c() function and
separate the items with a comma.
vector2 <- c(1,3,5,7,9)
vector2
## [1] 1 3 5 7 9
Mathematical operations work on vectors too!
vector2^2
## [1] 1 9 25 49 81
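Operations on a vector are applied to each element in turn. The short sketch below reuses vector2 from above to show a few more cases; sum() is a base R function that has not been introduced yet.

```r
vector2 <- c(1, 3, 5, 7, 9)

vector2 + 1         # adds 1 to every element: 2 4 6 8 10
vector2 * 2         # doubles every element: 2 6 10 14 18
vector2 + vector2   # adds matching positions: 2 6 10 14 18
sum(vector2)        # collapses the vector to a single number: 25
```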
Objects in R have different classes. Check the class of a few objects we have already created:
class(x)
## [1] "numeric"
class(vector1)
## [1] "integer"
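You may wonder why x is "numeric" while vector1 is "integer". A brief sketch of the difference, using only base R:

```r
class(5)           # "numeric": plain numbers are stored as doubles
class(5L)          # "integer": the L suffix asks for an integer
class(1:10)        # "integer": the : operator builds integer sequences
is.numeric(1:10)   # TRUE: integers still count as numbers for math
```

In practice the distinction rarely matters for the calculations in these notes, since both classes work with the usual mathematical operators.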
There are other classes too!
# create a string
my_string <- "Econ is cool!"
class(my_string)
## [1] "character"
# logical class
class(2>3)
## [1] "logical"
What happens if we have a vector of characters and numbers?
char_vector <- c(1:5, "banana", "apple")
char_vector
## [1] "1" "2" "3" "4" "5" "banana" "apple"
# can't use mathematical operations on characters
# why?? because the entire vector is a character class!
class(char_vector)
## [1] "character"
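If you need the numbers back, one option is to convert the vector with as.numeric(). This is a sketch rather than a recommended workflow: num_vector is a name made up for illustration, and suppressWarnings() is used only to hide the warning R gives when some entries cannot be converted.

```r
char_vector <- c(1:5, "banana", "apple")

# as.numeric() converts what it can; entries that are not numbers become NA
num_vector <- suppressWarnings(as.numeric(char_vector))
num_vector                       # 1 2 3 4 5 NA NA
class(num_vector)                # "numeric"

# na.rm = TRUE tells mean() to skip the missing values
mean(num_vector, na.rm = TRUE)   # 3
```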
Functions are operations that can transform your created
object in a ton of different ways. We have actually
already used two functions, c() and class().
Here are a few other useful ones:
#print the first few objects in vector1
head(vector1)
## [1] 1 2 3 4 5 6
#print the first 2 objects in vector1
head(vector1, 2)
## [1] 1 2
#print the last few objects in vector1
tail(vector1)
## [1] 5 6 7 8 9 10
#print last two objects in vector1
tail(vector1, 2)
## [1] 9 10
#find the mean of vector1
mean(vector1)
## [1] 5.5
#median
median(vector1)
## [1] 5.5
#standard deviation
sd(vector1)
## [1] 3.02765
# summary() prints summary stats
summary(vector1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    3.25    5.50    5.50    7.75   10.00
min(vector1)
## [1] 1
max(vector1)
## [1] 10
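Most functions take extra arguments that change how they behave, as with the 2 in head(vector1, 2). The sketch below shows that arguments can be passed by position or by name. It also shows the na.rm argument, using a made-up vector called scores that contains a missing value.

```r
vector1 <- 1:10

head(vector1, 2)       # argument matched by position
head(vector1, n = 2)   # the same call with the argument named

# Hypothetical data with one missing value (NA)
scores <- c(4, 8, NA, 10)
mean(scores)                 # NA: the missing value propagates
mean(scores, na.rm = TRUE)   # 7.333333: the NA is dropped before averaging
```

Naming arguments is optional, but it makes calls easier to read when a function has many of them.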
Coding style is the punctuation and grammar of the coding world. Making sure that your code is formatted in a readable, standard way helps you and others understand it. We will follow guidelines from the tidyverse style guide.
Spaces: Put spaces on either side of mathematical operators (i.e. +, -, ==, <, …), and around the assignment operator (<-). The exception to this is the ^ symbol.
# Strive for
z <- (a + b)^2 / d
# Avoid
z<-( a + b ) ^ 2/d
Don’t put spaces inside or outside parentheses for regular function calls. Always put a space after a comma.
# Strive for
mean(x, na.rm = TRUE)
## [1] 5
# Avoid
mean (x ,na.rm=TRUE)
## [1] 5
Adding extra spaces is fine if it helps with alignment. For example:
example_data_frame <-
  data.frame(
    variable1      = c(1:10),
    variable_name2 = c(2:11),
    var_name       = c(3:12)
  )
Naming Conventions: Object names must start with a
letter and can only contain letters, numbers, underscores (_), and
periods (.). The names should be descriptive. Snake case is the
recommended naming convention (separating lowercase words with
_).
really_long_variable_name <- 1
Commenting: You can comment your code with
#. It is strongly recommended to leave comments in
your code so that others, and future you, can keep track of your thought
process.
# Good code is well-commented code!!
You can also create section comments that will be collapsible. This is incredibly helpful when you have a really long R script! Any comment line which includes at least four trailing dashes (-) will create a section.
# This is section 1 ----
# ---- This is section 2 ----
R is really useful because of its ability to use packages. Pacman is a package for “package management”: it helps us load multiple packages at once.
# if you have not previously installed the package, include the line:
#install.packages("pacman")
# you only have to do this once. you can also install packages from the "Packages" side panel tab
After installing a package, we need to load it with
library() before we can use it.
library(pacman)
Now we use the p_load function to load other packages we want to use.
We will use the tidyverse package throughout the course,
so let’s load that one.
p_load(tidyverse)
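For intuition, here is roughly what p_load() saves you from writing by hand: check whether each package is installed, install it if not, then load it. This is a base R sketch under stated assumptions, not pacman's actual implementation. The helper name load_packages is made up for illustration, and the example loads stats and utils only because they ship with R and need no download.

```r
# Hypothetical helper: install (if needed) and load each package in `pkgs`
load_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE when the package is not installed
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # needs an internet connection
    }
    # character.only = TRUE lets library() accept the name as a string
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# stats and utils come with R, so nothing is downloaded here
load_packages(c("stats", "utils"))
```

With pacman, the same idea is a single call that lists every package you want, separated by commas.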