Introduction to R
How R looks before tutorials vs. how you look
after learning R
Set up R
Download R from CRAN (the Comprehensive R Archive Network).
Next, you should download RStudio, an integrated development environment (IDE) for R. It works as your editor and makes using R much easier.
Once RStudio is installed, launch it and you are good to go!
The frontend
- Before I walk you through the interface, some fun first. You can change the default color scheme of your RStudio interface. Go to Tools>Global Options>Appearance. Here you can change the font type and size, and the color scheme. I am using the default. Here is what my interface looks like:
- The ordering/layout of these windows/tabs might be different on your screen. That’s ok. On my screen, please notice that the console window is in the upper-right corner. The window with the files and plots tabs is in the lower-right corner. And the environment window is on the left. You can change this layout as you wish, whatever is more convenient for you. Go to Tools>Global Options>Pane Layout. Here are my settings:
- What are these tabs then?
- The console tab is where you can type your commands and see the output. But if you’d like to keep track of your commands, you should start an R script. If you have used STATA before, it’s similar to a do-file. Go to File > New File > R Script to open a new script. You will find your R script on the Source tab on your panel. You can type and run your R commands here. On the Source tab, you will also access your data frames.
- Everything in R is an object. The environment tab stores the objects you saved/created during the session. An object can be a list of values, a data frame, or a function. For instance, as you may see below, in my environment there is data entitled “vdem15”, a function entitled “thewitcher”, and a list of values entitled “lucky”.
- The files tab displays all files in your default workspace. The plots tab displays the plots/figures that you’ll create.
You are good to go! I know R might be intimidating for
some of you. But hang in there. You’ll be able to pull off cool stuff
with it. I prepared this HTML page for you in R, for
instance. Or you can design interactive plots like this (it’s called
Rosenbrock’s valley):
Some Basics
Let’s check out the basic syntax and operators in R.
Mathematical Operations
Open your R script and type in the following commands. Then press Ctrl + Enter (Cmd + Return on a Mac). With this shortcut, you can run the command on the current line or any selected lines. You will see how the Console tab runs the code and generates the output.
### You can type your notes/headings after a hashtag.
105 + 105 #you may perform arithmetic operations
365/12
5^-2
-5.02 + -4.48
(3*3) + (2/5)
### There are some built-in functions in R.
log(100) #natural log
seq(2,6) #create a sequence of numbers from 2 to 6
seq(1,12, by=2) #create a sequence of numbers from 1 to 12 that increases by 2.
seq(0,1, length=11) #a sequence from 0 to 1 with specific length/total number of elements
5:8 #an alternative notation for integer-sequence
sum(5:8) #take a sum of numbers from 5 to 8
mean(10:20) #find mean of numbers from 10 to 20
sqrt(16) #square root of a non-negative number
#You can always look through the documentation of these functions to seek help and remember the notation.
?seq
Creating Objects
We store information in an R session as objects with assigned names. To do so, we use the “<-” assignment operator.
### Assignments
result <- sqrt(36) + sum(4:15) / 2^4
result #Write the object name and hit Enter in the console, or Ctrl+Enter in R script. It'll print the result in the Console.
## [1] 13.125
Please note that the result object has just been stored as a value in your environment. If you assign a different value to a stored object, the old value will be replaced. Be advised! You can assign numeric values, functions, or strings of characters to an object.
Trick: you can both store and print the object at
the same time by putting the assignment in parentheses.
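The code for this trick is not shown above. A minimal sketch, reusing the result assignment from the previous chunk:

```r
# Wrapping an assignment in parentheses stores the object and prints it.
(result <- sqrt(36) + sum(4:15) / 2^4)

# The line above is shorthand for these two steps:
result <- sqrt(36) + sum(4:15) / 2^4
result
```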
You can list and remove these stored objects.
### Listing and removing objects
a1 <- "number"
a2 <- 5789
ls() #lists all objects in the environment
## [1] "a1"     "a2"     "result"
## [1] "a1" "a2"
rm(a1) #removes the a1 object from the environment
rm(list=ls(pat="a")) #removes all objects whose names contain the letter a
rm(list = ls()) #removes all objects
Each object has two intrinsic attributes: its class and length. For example, an object might be logical, numeric, character, function, etc.
## [1] "character"
## [1] "numeric"
## [1] "character"
## [1] "logical"
## [1] "function"
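The calls that produced the output above are not shown. Here is a minimal sketch of how class() and length() can be checked. The objects b1 to b5 are hypothetical names chosen for illustration, not the ones used in the original chunk.

```r
b1 <- "number"            # a character string
b2 <- 5789                # a number
b3 <- "5789"              # quotes make it a character string, not a number
b4 <- TRUE                # a logical value
b5 <- function(x) x + 1   # a user-defined function

class(b1)   # "character"
class(b2)   # "numeric"
class(b3)   # "character"
class(b4)   # "logical"
class(b5)   # "function"
length(b1)  # 1: a single string is a vector of length one
```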
You may use numeric objects for subsequent mathematical operations.
Vectors and Lists
We can combine/concatenate multiple elements and objects into one object - a vector.
### Concatenate
a1 <- c("Oppenheimer","is", "a", "good", "movie")
length(a1) #the length of the vector/how many elements
## [1] 5
## [1] 4
## [1] 1 2 3 4 4 72 21
## [1] 7
a4 <- c(a2, a3) #You can combine vectors. Please note that logical elements are coerced into numeric.
print(a4)
##  [1]  1  0  0  1  1  2  3  4  4 72 21
## [1] "Oppenheimer" "is" "a" "good" "movie"
## [6] "1" "2" "3" "4" "4"
## [11] "72" "21"
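The lines that define a2 and a3 are missing above. The sketch below is consistent with the printed output, assuming a2 is a logical vector of length 4 and a3 is a numeric vector of length 7. The exact original definitions are an assumption, and a5 is a hypothetical name.

```r
a1 <- c("Oppenheimer", "is", "a", "good", "movie")
a2 <- c(TRUE, FALSE, FALSE, TRUE)    # assumed definition: four logical values
a3 <- c(1, 2, 3, 4, 4, 72, 21)       # assumed definition: seven numbers

length(a2)          # 4
length(a3)          # 7
a4 <- c(a2, a3)     # TRUE becomes 1 and FALSE becomes 0
class(a4)           # "numeric"
a5 <- c(a1, a3)     # mixing in character strings coerces everything to character
class(a5)           # "character"
```

A vector holds one type only, so R silently converts to the most general type present: logical, then numeric, then character.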
You can access specific elements of vectors through indexing.
## [1] "good"
## [1] 0 0 1 1 2 3
## [1] 0 3
## [1] "Oppenheimer" "is" "a" "movie"
## [1] "Oppenheimer" "is" "good"
## [1] "Oppenheimer"
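The indexing calls themselves are not shown above. A minimal sketch of the common forms follows. The name nums is hypothetical and holds the same values as a4, and the calls are assumptions that happen to reproduce output like the lines above.

```r
a1   <- c("Oppenheimer", "is", "a", "good", "movie")
nums <- c(1, 0, 0, 1, 1, 2, 3, 4, 4, 72, 21)   # hypothetical name, same values as a4

a1[4]            # fourth element: "good"
nums[2:7]        # elements 2 through 7: 0 0 1 1 2 3
nums[c(2, 7)]    # elements 2 and 7 only: 0 3
a1[-4]           # everything except the fourth element
a1[c(1, 2, 4)]   # elements 1, 2 and 4: "Oppenheimer" "is" "good"
a1[1]            # first element; R starts counting at 1, not 0
```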
You can use specific elements of numeric vectors for mathematical
operations. Below you see the number of shootings and firearm discharges
in Toronto last year.
| Date | Cases |
|---|---|
| January | 17 |
| February | 16 |
| March | 22 |
| April | 23 |
| May | 33 |
| June | 29 |
| July | 40 |
| August | 37 |
| September | 27 |
| October | 29 |
| November | 37 |
Table 1: Number of shootings and firearm discharges in Toronto
cases <- c(17, 16, 22, 23, 33, 29, 40, 37, 27, 29, 37, NA) #Storing number of cases as a vector of values
## 12 values with the last month missing
Vectorized arithmetic is also possible.
## [1] 40 37 37 33 29 29 27 23 22 17 16
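The code behind this step is not shown. A short sketch of vectorized arithmetic on the cases vector follows. The particular operations are chosen for illustration, and the sort() call is an assumption that would yield the sorted output above.

```r
cases <- c(17, 16, 22, 23, 33, 29, 40, 37, 27, 29, 37, NA)

cases / 2                           # every element is halved; NA stays NA
cases - mean(cases, na.rm = TRUE)   # deviation of each month from the average
cases[7] - cases[1]                 # July minus January: 23
sort(cases, decreasing = TRUE)      # largest to smallest; sort() drops NA by default
```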
Lists are objects that include elements of different types.
## [1] 3
## [1] "list"
## [[1]]
## [1] "Sep" "3" "2020"
## [1] "Sep" "3" "2020"
## [1] 3
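The list that produced this output is not shown. A minimal sketch follows, assuming a list whose first element is the character vector printed above. The name mylist and the other two elements are hypothetical.

```r
# Hypothetical list with three elements of different types
mylist <- list(c("Sep", "3", "2020"),   # a character vector
               42,                      # a number
               TRUE)                    # a logical value

length(mylist)        # 3
class(mylist)         # "list"
mylist[1]             # single brackets return a list of length one
mylist[[1]]           # double brackets return the element itself
length(mylist[[1]])   # 3, the length of the character vector
```

The difference between single and double brackets is the usual stumbling block: `[` keeps the list wrapper, `[[` extracts what is inside it.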
Functions
Please take a moment to take stock of the built-in functions we have learned so far.
Another built-in function is names which assigns names
to elements in a vector. Let’s label the shooting statistics with their
respective dates.
### Assigning names and saving as date
to.date <- c("January 1 2023", "February 1 2023", "March 1 2023", "April 1 2023", "May 1 2023", "June 1 2023", "July 1 2023", "August 1 2023", "September 1 2023", "October 1 2023", "November 1 2023", "December 1 2023")
names(cases) <- to.date
print(cases) ## it is stored as a named vector of values.
##   January 1 2023  February 1 2023     March 1 2023     April 1 2023
##               17               16               22               23
##       May 1 2023      June 1 2023      July 1 2023    August 1 2023
##               33               29               40               37
## September 1 2023   October 1 2023  November 1 2023  December 1 2023
##               27               29               37               NA
constant <- rep(2, times=length(cases)) #rep is another built-in function for R.
to <- as.data.frame(cbind(cases, to.date, constant)) ## save it as a data frame
#to is a data frame with three columns: cases, to.date, and constant. cbind means bind the columns.
rownames(to) <- NULL #don't need rownames anymore.
R has two kinds of missing values: NA and NULL. In data sets, we often encounter missing data, which we represent in R with the value NA. NULL, on the other hand, represents that the value in question simply doesn’t exist.
## Warning in mean.default(to$cases): argument is not numeric or logical:
## returning NA
## [1] NA
## [1] "character"
## [1] "character"
## [1] "numeric"
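The warning above arises because cbind() builds a matrix, and a matrix holds a single type, so combining numbers with date strings turns every column into character. The conversion step is not shown above. Here is a minimal sketch, assuming as.numeric() is what converts the column back and that you run R 4.0 or later. The toy vectors and the names to_demo and to_alt are illustrative.

```r
cases_demo <- c(17, 16, NA)
date_demo  <- c("January 1 2023", "February 1 2023", "March 1 2023")

to_demo <- as.data.frame(cbind(cases_demo, date_demo))
class(to_demo$cases_demo)                              # "character"

to_demo$cases_demo <- as.numeric(to_demo$cases_demo)   # convert back to numbers
class(to_demo$cases_demo)                              # "numeric"
mean(to_demo$cases_demo, na.rm = TRUE)                 # 16.5

# Alternative: data.frame() keeps each column's own type from the start
to_alt <- data.frame(cases = cases_demo, to.date = date_demo)
class(to_alt$cases)                                    # "numeric"
```

Building the data frame with data.frame() directly avoids the detour through a character matrix.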
to$to.date <- as.Date(to$to.date, format = "%B %d %Y")
## format option allows you to set the date format. Check ?as.Date for instructions.
class(to$to.date)
## [1] "Date"
## [1] 40
## [1] 16
You will also use the paste function quite often.
sentence <- c("toronto", "is", "the", "best", "city", "in the world")
sentence2 <- paste(sentence, collapse = " ")
length(sentence)
## [1] 6
## [1] 1
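A short sketch of how the collapse and sep arguments of paste differ. The "Quarter" labels are made up for illustration.

```r
sentence <- c("toronto", "is", "the", "best", "city", "in the world")

paste(sentence, collapse = " ")   # one string: the six elements joined by spaces
paste("Quarter", 1:4)             # vectorized: "Quarter 1" "Quarter 2" "Quarter 3" "Quarter 4"
paste("Q", 1:4, sep = "")         # sep sets the separator between arguments: "Q1" "Q2" "Q3" "Q4"
paste0("Q", 1:4)                  # paste0() is shorthand for sep = ""
```

In short, sep goes between the arguments you pass in, while collapse squeezes the resulting vector into a single string.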
You can also create your own functions to avoid typing the same command over and over again. User-defined functions make our work more efficient. Let’s say that, as part of your job, you are expected to report every quarter 1) the cumulative number of shootings and discharges for Toronto, and 2) the average number of cases for that quarter. Instead of writing the code for each quarter and running it separately, you can simply create a function to make the process more efficient!
###User-defined functions
casesq1 <- c(17, 16, 22)
casesq2 <- c(23, 33, 29)
casesq3 <- c(40, 37, 27)
police.summary <- function(x){ #creating a function titled police.summary with one input, x
  out.total <- sum(x) #cumulative cases for the quarter
  out.mean <- mean(x) #average
  out.final <- c(out.total, out.mean) #final output
  names(out.final) <- c("Cumulative Cases", "Average Number") #labeling the output, be careful with the ordering!
  return(out.final) #return() hands the output back to whoever calls the function
}
police.summary(casesq1) #Calling the function police.summary, supplying the casesq1 vector as an argument
## Cumulative Cases   Average Number
##         55.00000         18.33333
There are different ways in which you can define your arguments. For example, you can create a function with multiple arguments/inputs. Let’s say you are asked to calculate the percent change in the average number of cases from one quarter to the next.
police.change <- function(w1, w2) { #two arguments, w1 and w2, that are referred to inside the function
  out.percent <- (mean(w2) - mean(w1))*100/mean(w1)
  names(out.percent) <- "Percent Change"
  return(out.percent)
}
police.change(casesq1, casesq2)
## Percent Change
##       54.54545
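Another way to define arguments is to give them default values. The sketch below is a hypothetical variant of the function above. The name police.change2 and the digits argument are illustrative additions, not part of the original tutorial.

```r
police.change2 <- function(w1, w2, digits = 1) {  # digits has a default value of 1
  out.percent <- (mean(w2) - mean(w1)) * 100 / mean(w1)
  out.percent <- round(out.percent, digits)        # round to the requested precision
  names(out.percent) <- "Percent Change"
  return(out.percent)
}

casesq1 <- c(17, 16, 22)
casesq2 <- c(23, 33, 29)

police.change2(casesq1, casesq2)               # uses the default: 54.5
police.change2(casesq1, casesq2, digits = 3)   # overrides the default: 54.545
police.change2(w2 = casesq2, w1 = casesq1)     # named arguments can come in any order
```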
Data
We don’t really enter our data manually with vectors. More often than not, we import external files into R. Often it’s a .csv (comma-separated values, which Excel can open), .txt (text), .dta (STATA), or .RData file. RData is a collection of R objects (i.e. R output).
R will automatically display the files in your working directory. Check the Files tab to see your current working directory. Ideally, you’d want to have your project files in one designated place. Do not save to / import from your Desktop - it’ll be chaotic.
There are different ways in which you can assign/change your working directory. Manually, you can check your current directory with getwd() and change it with setwd() or Session > Set Working Directory. I will show you a better way towards the end.
Please download the datasets from Imai’s website for Chapter 1. Unzip and place these files in your working directory.
UNpop <- read.csv("UNpop.csv") #read.csv is a built-in function that imports .csv files into an R object. Don't forget to assign the data to an object, otherwise it will not be stored in your environment.
class(UNpop)
length(UNpop)
load("UNpop.RData") #for RData files, we use the load function.
- Check your Environment tab. The UNpop object is a data.frame with 2 variables and 7 observations. A data frame is an R object made up of a collection of vectors. In this case, it’s as if there are two vectors/columns (therefore the length of UNpop is 2) merged into a data frame.
Let’s work on this data a bit.
## [1] 2
## [1] 7
##       year        world.pop
##  Min.   :1950   Min.   :2525779
##  1st Qu.:1965   1st Qu.:3358588
##  Median :1980   Median :4449049
##  Mean   :1980   Mean   :4579529
##  3rd Qu.:1995   3rd Qu.:5724258
##  Max.   :2010   Max.   :6916183
Please familiarize yourself with the $ operator. It allows
us to access variables/columns from data frames and individual elements
from objects.
## [1] 1950 1960 1970 1980 1990 2000 2010
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 2525779 3358588 4449049 4579529 5724258 6916183
## [1] 1950 1960 1970 1980 1990 2000 2010
## [1] 1950 1960 1970
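The calls behind this output are not shown. Here is a minimal sketch of the $ operator and bracket indexing on a toy data frame, so that it runs without the UNpop file. The name toy and the score column are made up for illustration; only the year values come from the output above.

```r
# Toy data frame; the score column is made up for illustration
toy <- data.frame(year  = c(1950, 1960, 1970, 1980, 1990, 2000, 2010),
                  score = c(3, 5, 8, 13, 21, 34, 55))

toy$year             # the year column as a vector
toy[, "year"]        # same column, selected by name with brackets
toy[["year"]]        # same again, with double brackets
toy$year[1:3]        # first three elements of that column: 1950 1960 1970
toy[1:3, ]           # first three rows, all columns
summary(toy$score)   # summary statistics for one column
nrow(toy)            # 7 rows
ncol(toy)            # 2 columns
```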
Let’s create a new dataset for running some descriptive statistics. Assume that this is a dataset on schools in an imaginary province that lays out information on their status, funding, and number of teachers.
### Expand Grid
schools <- expand.grid(status=c("Public", "Private"),
                       funding=seq(1500, 2000, by=100),
                       teacher=c(seq(5,15,by=5),NA))
#expand.grid is a built-in function that creates a data frame with all possible combinations of given vectors.
head(schools)
##    status funding teacher
## 1  Public    1500       5
## 2 Private    1500       5
## 3  Public    1600       5
## 4 Private    1600       5
## 5  Public    1700       5
## 6 Private    1700       5
## [1] NA
mean(schools$teacher, na.rm = TRUE) #TRUE and FALSE are logical values in R. Setting na.rm = TRUE drops the missing values (NA) before taking the average.
## [1] 10
You can save/export your data to your working directory.
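The export code is not shown here. A minimal sketch with write.csv() and save() follows. The data frame schools_demo and the file names are hypothetical, and tempdir() is used only so the example does not write into your project folder.

```r
# Small example data frame (hypothetical)
schools_demo <- data.frame(status  = c("Public", "Private"),
                           funding = c(1500, 1600))

# In practice you would give a file name relative to your working directory,
# e.g. write.csv(schools_demo, "schools_demo.csv", row.names = FALSE)
csv_path   <- file.path(tempdir(), "schools_demo.csv")
rdata_path <- file.path(tempdir(), "schools_demo.RData")

write.csv(schools_demo, csv_path, row.names = FALSE)  # export as .csv
save(schools_demo, file = rdata_path)                 # export as .RData

schools_back <- read.csv(csv_path)                    # read it back to check
```

Setting row.names = FALSE keeps R from writing the row numbers as an extra first column.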
Packages
An R package is a collection of code, data, and
documentation that expands R’s functionality. You can think of
packages as the apps we install on our phones. Our phone can make a call, send a
text, etc., but with extra apps, you can shoot a TikTok video…
In order to use these packages, we must install them first. One useful package is “tidyverse”. It is a collection of packages with a shared syntax that facilitates many operations in R.
### Install Packages
install.packages("tidyverse") #Write this in the console, not in your script; you need to install a package only once on your computer.
library("tidyverse") #Once installed, you must use the library command in your R script for each R session.
## It's like each time you need to use an app, you must tap on the icon, right? Library function does just that.
## You may access specific functions loaded in each package with ::
## such as
## dplyr::
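A short sketch of the :: notation. The first two calls use packages that ship with R, so they run without installing anything. The dplyr call is guarded because it assumes dplyr is installed, and mtcars is a data set built into R, not one used in this tutorial.

```r
# stats and utils ship with R, so these calls work without installing anything
stats::median(c(3, 1, 2))     # 2
utils::head(letters, 3)       # "a" "b" "c"

# requireNamespace() checks whether a package is installed without loading it
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::filter(mtcars, mpg > 30)   # rows of mtcars with mpg above 30
}
```

Writing package::function() is handy when two loaded packages define a function with the same name, because it states which one you mean.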
Plots
ggplot(to, aes(x = to.date, y=cases)) +
  geom_line() +
  theme_bw() +
  labs(x = "Date", y = "Number of cases")
Other Syntax and Operators
An operator helps us with mathematical and logical manipulations.
- The built-in arithmetic operators are +, -, *, /, ^, etc.
- Relational and logical operators: >, <, ==, <=, >=, !=. ! introduces negation. != means not equal. &, | are the logical AND and OR operators.
- Other operators: : is the colon operator, which generates a sequence. %in% tests whether an element belongs to a vector or not.
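A quick sketch of these operators on a small vector. The vector x is hypothetical and chosen only to make the results easy to check.

```r
x <- c(2, 5, 8, 11)

x > 5                  # FALSE FALSE  TRUE  TRUE
x == 5                 # FALSE  TRUE FALSE FALSE
x != 5                 #  TRUE FALSE  TRUE  TRUE
!(x > 5)               # negation:  TRUE  TRUE FALSE FALSE
x > 3 & x < 10         # AND: FALSE  TRUE  TRUE FALSE
x < 3 | x > 10         # OR:   TRUE FALSE FALSE  TRUE
5 %in% x               # TRUE, 5 is an element of x
c(1, 2) %in% x         # FALSE  TRUE
x[x %in% c(2, 11)]     # subsetting with a logical vector: 2 11
```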
You may find some examples below.
### Operators
schools$status[!(schools$funding<1900)] #list status of schools with funding of 1900 or more.
#Let's calculate total number of teachers in public schools.
sum(schools$teacher[schools$status=="Public"], na.rm=TRUE) #access teachers in public schools. Notice the square brackets: this is subsetting with a logical condition built with "=="
sum(schools$teacher[schools$status=="Public" & schools$funding>1900], na.rm=TRUE) #total number of teachers in public schools with funding more than 1900.
Other Tips and Shortcuts
- If you cannot run your R code and keep getting errors, either you forgot a colon or comma somewhere, or you just need to “turn it off and on again”. Go to Session>Restart R. But that means you must run the whole code again.
- You can comment out blocks of code by selecting the lines and using the following shortcut: Ctrl + Shift + C. For Mac users: Cmd + Shift + C. You can undo it with the same shortcut.
- You can edit several lines at the same time by holding Alt while you drag the cursor across them.
- Get familiar with some of these RStudio shortcuts. It’ll make your life easier.
- You can create a heading in the navigation menu at the bottom of your R script tab by adding at least four #### or ---- at the end of a comment line.
- If you cannot figure something out, just google checkmarked answers on Stack Overflow or ask ChatGPT. 80% of learning how to code is just that.
A Sample R Script
##--------------------------------------------------------------##
## Tutorial #1 ##
## Introduction to R ##
## Semuhi Sinanoglu ##
## January 2024 ##
##--------------------------------------------------------------##
pacman::p_load("tidyverse")
## Get data ------------------------------------------------------
UNpop <- read.csv("UNpop.csv")
UNpop.analysis <- UNpop #keep the original in the environment and use the new one for data manipulation
## Descriptive Statistics ----------------------------------------
summary(UNpop.analysis)
Project-based Workflow
For every new research project/homework, I highly encourage you to start an R project. Each project will be self-contained and easily reproducible, especially when used with the here package. It’ll give you a file system structure in the sense that all of your files for a project will be stored in a designated folder.
- Go to File->New Project->New Directory->New Project and then create your new project in a designated folder.
- Rproj can help you access your other files stored in the working directory in the Files tab. It also allows you to switch to recent projects. Check the upper-right corner of your screen for a drop-down menu with the Rproj icon. Once you are done working on your Rproj, close it. When you open it again, you’ll realize that you start where you left off!