Setup

I assume you have already installed R and RStudio on your computer.

First, create a directory for all your R work for SIR. In the future, you can save all SIR-related R script/data in this folder

Then, open RStudio, use the command setwd to enter this directory

setwd("~/mydoc/Course/sta425/R")

By default, RStudio has a two-by-two display

  • Upper left: Source (where you can edit your R script);
  • Lower left: Console (where you execute your R code and see the result)
  • Upper right: Environment (what objects in your current R session) and History (command history)
  • Lower right: Plots/Help/etc

You can minimize/resize each panel, or change the layout. For example, I usually switch the position of Plots and Environment, and also maximize the Source windows when editing scripts.

To change layout, go to Preferences and Pane Layout.

Basic Commands

Try any math expressions at the prompt, and R will evaluate it and print the result.

Note that any line beginning with # means a comment and it will not be executed by R.

2+(3*2)
exp(0)
sin(2)
1/2
sqrt(3)

3>2  ## returns False or True

Assign a value/vector/matrix to a variable.

x = c(1,3,2,5)
x
x = c(1,6,2)
x
y = c(1,4,3)
length(x)
length(y)
?length          ## Check the help page for command "length"
x+y
ls()            ## Display what objects are in your current R session
                ## You can also just click "Environment" in one of the panel
rm(x,y)         ## remove the two objects
ls()
rm(list=ls())   ## remove all objects in the current R session
?matrix
x = matrix(c(1,2,3,4,5,6), 
           nrow = 3, ncol = 2)
x
x + c(1,2)      ## How does R compute a matrix plus a vector? 
x + c(1,2,3,4)  ## error message 

matrix(1:6,3,2,byrow=TRUE)

Many math operations on vector/matrix are executed in an element-wise way.

x
sqrt(x)
x^2

Matrix multiplication.

x
t(x)
x %*% t(x)
t(x) %*% x

Access individual rows, columns, and cells of a matrix.

A=matrix(1:16,4,4)
A
A[2,3]
A[c(1,3),c(2,4)]
A[1:3,2:4]
A[1:2,]
A[,1:2]
A[1,]
A[-c(1,3),]              ## minus sign means "remove"
A[-c(1,3),-c(1,3,4)]
dim(A)

Generate Random Samples

Generate random samples

rnorm(5)  # normal random variables with mean zero and variance 1
runif(5)  # random samples from interval (0, 1)

The following two sets of samples will be different, since they are randomly sampled.

rnorm(2)
rnorm(2)

Specify seeds if you want the random samples are reproducible.

set.seed(1303); rnorm(2)
set.seed(1303); rnorm(2)

set.seed(3); rnorm(2)
set.seed(3); rnorm(2)

Generate data and compute statistics such as sample mean, sample variance and sample standard deviation.

y = rnorm(10)
y
mean(y)
sum(y) / length(y)
var(y)

sqrt(var(y))  ## same as sd(y)
sd(y)

Graphics

x = rnorm(100)
y = rnorm(100)
plot(x, y)
plot(x, y,
     xlab = "this is the x-axis",
     ylab = "this is the y-axis",
     main = "Plot of X vs Y")

Load Data

Auto = read.table("https://www.statlearning.com/s/Auto.data")
head(Auto)               ## show the first 5 rows of the data

It seems the 1st row is not data, but the column names. We need to tell read.table to use the 1st row as column names.

Auto = read.table("https://www.statlearning.com/s/Auto.data",header=T)
head(Auto)
str(Auto)               ## Show structural info of "Auto"

Auto is a dataframe, i.e., a data matrix with col names.

dim(Auto)
Auto[1:4,]
names(Auto)

We can extract a partiular column by index or by dollar sign with name that is, Auto[,1] and Auto$mpg both refer to the 1st column of Auto.

Auto[1:4,1]
Auto$mpg[1:4]

A quick 5-number summary—min, 25% quantile, median, 75% quantile, max—of each column of Auto.

summary(Auto)

Based on the summary, it seems that cylinders, acceleration, year, and origin take only a few unique values. For such variables, frequency tables provide more info than 5-number summary.

unique(Auto$cylinders)
unique(Auto$year)
unique(Auto$origin)

table(Auto$cylinders)             ## frequency tables
table(Auto$year)
table(Auto$origin)

Additional graphical and numerical summaries for this dataset. Each plot command will overwrite the previous one; so, run the following commands one-by-one.

# Attach a data matrix, so each column can be accessed by its name
attach(Auto)

plot(cylinders, mpg)
cylinders = as.factor(cylinders)
plot(cylinders, mpg)
plot(cylinders, mpg, col="red")
plot(cylinders, mpg, col="red", varwidth = T)
plot(cylinders, mpg, col="red", varwidth = T, horizontal = T)
plot(cylinders, mpg, col="red", varwidth = T, 
     xlab = "cylinders", ylab = "MPG")

## Histograms
hist(mpg)
hist(mpg,col = 2)
hist(mpg,col = 2, breaks = 15)
LS0tCnRpdGxlOiAiTGFiIDE6IEludHJvZHVjdGlvbiB0byBSIgpkYXRlOiAiRmFsbCAyMDIyIChTSVIpIgpvdXRwdXQ6CiAgaHRtbF9ub3RlYm9vazoKICAgIHRoZW1lOiByZWFkYWJsZQogICAgdG9jOiBUUlVFCiAgICB0b2NfZmxvYXQ6IFRSVUUKLS0tCgojIyBTZXR1cCAKSSBhc3N1bWUgeW91IGhhdmUgYWxyZWFkeSBpbnN0YWxsZWQgUiBhbmQgUlN0dWRpbyBvbiB5b3VyIGNvbXB1dGVyLiAKCioqRmlyc3QqKiwgY3JlYXRlIGEgZGlyZWN0b3J5IGZvciBhbGwgeW91ciBSIHdvcmsgZm9yIFNJUi4gSW4gdGhlIGZ1dHVyZSwgeW91IGNhbiBzYXZlIGFsbCBTSVItcmVsYXRlZCBSIHNjcmlwdC9kYXRhIGluIHRoaXMgZm9sZGVyCgoqKlRoZW4qKiwgb3BlbiBSU3R1ZGlvLCB1c2UgdGhlIGNvbW1hbmQgYHNldHdkYCB0byBlbnRlciB0aGlzIGRpcmVjdG9yeQpgYGB7ciwgZXZhbCA9IEZBTFNFfQpzZXR3ZCgifi9teWRvYy9Db3Vyc2Uvc3RhNDI1L1IiKQpgYGAKCkJ5IGRlZmF1bHQsIFJTdHVkaW8gaGFzIGEgdHdvLWJ5LXR3byBkaXNwbGF5CgotIFVwcGVyIGxlZnQ6ICoqU291cmNlKiogKHdoZXJlIHlvdSBjYW4gZWRpdCB5b3VyIFIgc2NyaXB0KTsgCi0gTG93ZXIgbGVmdDogKipDb25zb2xlKiogKHdoZXJlIHlvdSBleGVjdXRlIHlvdXIgUiBjb2RlIGFuZCBzZWUgdGhlIHJlc3VsdCkKLSBVcHBlciByaWdodDogKipFbnZpcm9ubWVudCoqICh3aGF0IG9iamVjdHMgaW4geW91ciBjdXJyZW50IFIgc2Vzc2lvbikgYW5kIEhpc3RvcnkgKGNvbW1hbmQgaGlzdG9yeSkKLSBMb3dlciByaWdodDogKipQbG90cyoqL0hlbHAvZXRjIAoKWW91IGNhbiBtaW5pbWl6ZS9yZXNpemUgZWFjaCBwYW5lbCwgb3IgY2hhbmdlIHRoZSBsYXlvdXQuIEZvciBleGFtcGxlLCBJIHVzdWFsbHkgc3dpdGNoIHRoZSBwb3NpdGlvbiBvZiBQbG90cyBhbmQgRW52aXJvbm1lbnQsIGFuZCBhbHNvIG1heGltaXplIHRoZSBTb3VyY2Ugd2luZG93cyB3aGVuIGVkaXRpbmcgc2NyaXB0cy4gCgpUbyBjaGFuZ2UgbGF5b3V0LCBnbyB0byAqKlByZWZlcmVuY2VzKiogYW5kICoqUGFuZSBMYXlvdXQqKi4gCgojIyBCYXNpYyBDb21tYW5kcwoKVHJ5IGFueSBtYXRoIGV4cHJlc3Npb25zIGF0IHRoZSBwcm9tcHQsIGFuZCBSIHdpbGwgZXZhbHVhdGUgaXQgYW5kIHByaW50IHRoZSByZXN1bHQuIAoKTm90ZSB0aGF0IGFueSBsaW5lIGJlZ2lubmluZyB3aXRoIGAjYCBtZWFucyBhIGNvbW1lbnQgYW5kIGl0IHdpbGwgbm90IGJlIGV4ZWN1dGVkIGJ5IFIuIAoKYGBge3IsIGV2YWwgPSBGQUxTRX0KMisoMyoyKQpleHAoMCkKc2luKDIpCjEvMgpzcXJ0KDMpCgozPjIgICMjIHJldHVybnMgRmFsc2Ugb3IgVHJ1ZQpgYGAKCkFzc2lnbiBhIHZhbHVlL3ZlY3Rvci9tYXRyaXggdG8gYSB2YXJpYWJsZS4gCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp4ID0gYygxLDMsMiw1KQp4CnggPSBjKDEsNiwyKQp4CnkgPSBjKDEsNCwzKQpgYGAKCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQpsZW5ndGgoeCkKbGVuZ3RoKHkpCj9sZW5ndGggICAgICAgICAgIyMgQ2hlY2sgdGhlIGhlbHAgcGFnZSBmb3IgY29tbWFuZCAibGVuZ3RoIgp4K3kKbHMoKSAgICAgICAgICAgICMjIERpc3BsYXkgd2hhdCBvYmplY3RzIGFyZSBpbiB5b3VyIGN1cnJlbnQgUiBzZXNzaW9uCiAgICAgICAgICAgICAgICAjIyBZb3UgY2FuIGFsc28ganVzdCBjbGljayAiRW52aXJvbm1lbnQiIGluIG9uZSBvZiB0aGUgcGFuZWwKcm0oeCx5KSAgICAgICAgICMjIHJlbW92ZSB0aGUgdHdvIG9iamVjdHMKbHMoKQpybShsaXN0PWxzKCkpICAgIyMgcmVtb3ZlIGFsbCBvYmplY3RzIGluIHRoZSBjdXJyZW50IFIgc2Vzc2lvbgpgYGAKCmBgYHtyLCBldmFsID0gRkFMU0V9Cj9tYXRyaXgKeCA9IG1hdHJpeChjKDEsMiwzLDQsNSw2KSwgCiAgICAgICAgICAgbnJvdyA9IDMsIG5jb2wgPSAyKQp4CnggKyBjKDEsMikgICAgICAjIyBIb3cgZG9lcyBSIGNvbXB1dGUgYSBtYXRyaXggcGx1cyBhIHZlY3Rvcj8gCnggKyBjKDEsMiwzLDQpICAjIyBlcnJvciBtZXNzYWdlIAoKbWF0cml4KDE6NiwzLDIsYnlyb3c9VFJVRSkKYGBgCgoKTWFueSBtYXRoIG9wZXJhdGlvbnMgb24gdmVjdG9yL21hdHJpeCBhcmUgZXhlY3V0ZWQgaW4gYW4gZWxlbWVudC13aXNlIHdheS4KCmBgYHtyLCBldmFsID0gRkFMU0V9CngKc3FydCh4KQp4XjIKYGBgCgpNYXRyaXggbXVsdGlwbGljYXRpb24uCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp4CnQoeCkKeCAlKiUgdCh4KQp0KHgpICUqJSB4CmBgYAoKQWNjZXNzIGluZGl2aWR1YWwgcm93cywgY29sdW1ucywgYW5kIGNlbGxzIG9mIGEgbWF0cml4LgpgYGB7ciwgZXZhbCA9IEZBTFNFfQpBPW1hdHJpeCgxOjE2LDQsNCkKQQpBWzIsM10KQVtjKDEsMyksYygyLDQpXQpBWzE6MywyOjRdCkFbMToyLF0KQVssMToyXQpBWzEsXQpBWy1jKDEsMyksXSAgICAgICAgICAgICAgIyMgbWludXMgc2lnbiBtZWFucyAicmVtb3ZlIgpBWy1jKDEsMyksLWMoMSwzLDQpXQpkaW0oQSkKYGBgCgoKIyMgR2VuZXJhdGUgUmFuZG9tIFNhbXBsZXMKCkdlbmVyYXRlIHJhbmRvbSBzYW1wbGVzIApgYGB7ciwgZXZhbCA9IEZBTFNFfQpybm9ybSg1KSAgIyBub3JtYWwgcmFuZG9tIHZhcmlhYmxlcyB3aXRoIG1lYW4gemVybyBhbmQgdmFyaWFuY2UgMQpydW5pZig1KSAgIyByYW5kb20gc2FtcGxlcyBmcm9tIGludGVydmFsICgwLCAxKQpgYGAKClRoZSBmb2xsb3dpbmcgdHdvIHNldHMgb2Ygc2FtcGxlcyB3aWxsIGJlIGRpZmZlcmVudCwgc2luY2UgdGhleSBhcmUgcmFuZG9tbHkgc2FtcGxlZC4KYGBge3IsIGV2YWwgPSBGQUxTRX0Kcm5vcm0oMikKcm5vcm0oMikKYGBgCgpTcGVjaWZ5IHNlZWRzIGlmIHlvdSB3YW50IHRoZSByYW5kb20gc2FtcGxlcyBhcmUgcmVwcm9kdWNpYmxlLiAKCmBgYHtyLCBldmFsID0gRkFMU0V9CnNldC5zZWVkKDEzMDMpOyBybm9ybSgyKQpzZXQuc2VlZCgxMzAzKTsgcm5vcm0oMikKCnNldC5zZWVkKDMpOyBybm9ybSgyKQpzZXQuc2VlZCgzKTsgcm5vcm0oMikKYGBgCgpHZW5lcmF0ZSBkYXRhIGFuZCBjb21wdXRlIHN0YXRpc3RpY3Mgc3VjaCBhcyBzYW1wbGUgbWVhbiwgc2FtcGxlIHZhcmlhbmNlIGFuZCBzYW1wbGUgc3RhbmRhcmQgZGV2aWF0aW9uLiAKCmBgYHtyLCBldmFsID0gRkFMU0V9CnkgPSBybm9ybSgxMCkKeQptZWFuKHkpCnN1bSh5KSAvIGxlbmd0aCh5KQp2YXIoeSkKCnNxcnQodmFyKHkpKSAgIyMgc2FtZSBhcyBzZCh5KQpzZCh5KQpgYGAKCiMjIEdyYXBoaWNzCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp4ID0gcm5vcm0oMTAwKQp5ID0gcm5vcm0oMTAwKQpwbG90KHgsIHkpCnBsb3QoeCwgeSwKICAgICB4bGFiID0gInRoaXMgaXMgdGhlIHgtYXhpcyIsCiAgICAgeWxhYiA9ICJ0aGlzIGlzIHRoZSB5LWF4aXMiLAogICAgIG1haW4gPSAiUGxvdCBvZiBYIHZzIFkiKQpgYGAKCgojIyBMb2FkIERhdGEKCmBgYHtyLCBldmFsID0gRkFMU0V9CkF1dG8gPSByZWFkLnRhYmxlKCJodHRwczovL3d3dy5zdGF0bGVhcm5pbmcuY29tL3MvQXV0by5kYXRhIikKaGVhZChBdXRvKSAgICAgICAgICAgICAgICMjIHNob3cgdGhlIGZpcnN0IDUgcm93cyBvZiB0aGUgZGF0YQpgYGAKSXQgc2VlbXMgdGhlIDFzdCByb3cgaXMgbm90IGRhdGEsIGJ1dCB0aGUgY29sdW1uIG5hbWVzLiBXZSBuZWVkIHRvIHRlbGwgYHJlYWQudGFibGVgIHRvIHVzZSB0aGUgMXN0IHJvdyBhcyBjb2x1bW4gbmFtZXMuIAogCmBgYHtyLCBldmFsID0gRkFMU0V9CkF1dG8gPSByZWFkLnRhYmxlKCJodHRwczovL3d3dy5zdGF0bGVhcm5pbmcuY29tL3MvQXV0by5kYXRhIixoZWFkZXI9VCkKaGVhZChBdXRvKQpzdHIoQXV0bykgICAgICAgICAgICAgICAjIyBTaG93IHN0cnVjdHVyYWwgaW5mbyBvZiAiQXV0byIKYGBgCgpBdXRvIGlzIGEgKipkYXRhZnJhbWUqKiwgaS5lLiwgYSBkYXRhIG1hdHJpeCB3aXRoIGNvbCBuYW1lcy4gCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQpkaW0oQXV0bykKQXV0b1sxOjQsXQpuYW1lcyhBdXRvKQpgYGAKCldlIGNhbiBleHRyYWN0IGEgcGFydGl1bGFyIGNvbHVtbiBieSBpbmRleCBvciBieSBkb2xsYXIgc2lnbiB3aXRoIG5hbWUgdGhhdCBpcywgYEF1dG9bLDFdYCBhbmQgYEF1dG8kbXBnYCBib3RoIHJlZmVyIHRvIHRoZSAxc3QgY29sdW1uIG9mIGBBdXRvYC4KCmBgYHtyLCBldmFsID0gRkFMU0V9CkF1dG9bMTo0LDFdCkF1dG8kbXBnWzE6NF0KYGBgCgpBIHF1aWNrIDUtbnVtYmVyIHN1bW1hcnktLS1taW4sIDI1JSBxdWFudGlsZSwgbWVkaWFuLCA3NSUgcXVhbnRpbGUsIG1heC0tLW9mIGVhY2ggY29sdW1uIG9mIGBBdXRvYC4KYGBge3IsIGV2YWwgPSBGQUxTRX0Kc3VtbWFyeShBdXRvKQpgYGAKCkJhc2VkIG9uIHRoZSBzdW1tYXJ5LCBpdCBzZWVtcyB0aGF0IGBjeWxpbmRlcnNgLCBgYWNjZWxlcmF0aW9uYCwgYHllYXJgLCAgYW5kIGBvcmlnaW5gIHRha2Ugb25seSBhIGZldyB1bmlxdWUgdmFsdWVzLiBGb3Igc3VjaCB2YXJpYWJsZXMsIGZyZXF1ZW5jeSB0YWJsZXMgcHJvdmlkZSBtb3JlIGluZm8gdGhhbiA1LW51bWJlciBzdW1tYXJ5LiAKCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp1bmlxdWUoQXV0byRjeWxpbmRlcnMpCnVuaXF1ZShBdXRvJHllYXIpCnVuaXF1ZShBdXRvJG9yaWdpbikKCnRhYmxlKEF1dG8kY3lsaW5kZXJzKSAgICAgICAgICAgICAjIyBmcmVxdWVuY3kgdGFibGVzCnRhYmxlKEF1dG8keWVhcikKdGFibGUoQXV0byRvcmlnaW4pCmBgYAoKQWRkaXRpb25hbCBncmFwaGljYWwgYW5kIG51bWVyaWNhbCBzdW1tYXJpZXMgZm9yIHRoaXMgZGF0YXNldC4gRWFjaCBwbG90IGNvbW1hbmQgd2lsbCBvdmVyd3JpdGUgdGhlIHByZXZpb3VzIG9uZTsgc28sIHJ1biB0aGUgZm9sbG93aW5nIGNvbW1hbmRzIG9uZS1ieS1vbmUuIAoKCmBgYHtyLCBldmFsID0gRkFMU0V9CiMgQXR0YWNoIGEgZGF0YSBtYXRyaXgsIHNvIGVhY2ggY29sdW1uIGNhbiBiZSBhY2Nlc3NlZCBieSBpdHMgbmFtZQphdHRhY2goQXV0bykKCnBsb3QoY3lsaW5kZXJzLCBtcGcpCmN5bGluZGVycyA9IGFzLmZhY3RvcihjeWxpbmRlcnMpCnBsb3QoY3lsaW5kZXJzLCBtcGcpCnBsb3QoY3lsaW5kZXJzLCBtcGcsIGNvbD0icmVkIikKcGxvdChjeWxpbmRlcnMsIG1wZywgY29sPSJyZWQiLCB2YXJ3aWR0aCA9IFQpCnBsb3QoY3lsaW5kZXJzLCBtcGcsIGNvbD0icmVkIiwgdmFyd2lkdGggPSBULCBob3Jpem9udGFsID0gVCkKcGxvdChjeWxpbmRlcnMsIG1wZywgY29sPSJyZWQiLCB2YXJ3aWR0aCA9IFQsIAogICAgIHhsYWIgPSAiY3lsaW5kZXJzIiwgeWxhYiA9ICJNUEciKQoKIyMgSGlzdG9ncmFtcwpoaXN0KG1wZykKaGlzdChtcGcsY29sID0gMikKaGlzdChtcGcsY29sID0gMiwgYnJlYWtzID0gMTUpCmBgYAoKCgogCgo=