Setup
I assume you have already installed R and RStudio on your
computer.
First, create a directory for all your R work for
SIR. In the future, you can save all SIR-related R script/data in this
folder
Then, open RStudio, use the command
setwd to enter this directory
setwd("~/mydoc/Course/sta425/R")
By default, RStudio has a two-by-two display
- Upper left: Source (where you can edit your R
script);
- Lower left: Console (where you execute your R code
and see the result)
- Upper right: Environment (what objects in your
current R session) and History (command history)
- Lower right: Plots/Help/etc
You can minimize/resize each panel, or change the layout. For
example, I usually switch the position of Plots and Environment, and
also maximize the Source windows when editing scripts.
To change layout, go to Preferences and Pane
Layout.
Basic Commands
Try any math expressions at the prompt, and R will evaluate it and
print the result.
Note that any line beginning with # means a comment and
it will not be executed by R.
2+(3*2)
exp(0)
sin(2)
1/2
sqrt(3)
3>2 ## returns False or True
Assign a value/vector/matrix to a variable.
x = c(1,3,2,5)
x
x = c(1,6,2)
x
y = c(1,4,3)
length(x)
length(y)
?length ## Check the help page for command "length"
x+y
ls() ## Display what objects are in your current R session
## You can also just click "Environment" in one of the panel
rm(x,y) ## remove the two objects
ls()
rm(list=ls()) ## remove all objects in the current R session
?matrix
x = matrix(c(1,2,3,4,5,6),
nrow = 3, ncol = 2)
x
x + c(1,2) ## How does R compute a matrix plus a vector?
x + c(1,2,3,4) ## error message
matrix(1:6,3,2,byrow=TRUE)
Many math operations on vector/matrix are executed in an element-wise
way.
x
sqrt(x)
x^2
Matrix multiplication.
x
t(x)
x %*% t(x)
t(x) %*% x
Access individual rows, columns, and cells of a matrix.
A=matrix(1:16,4,4)
A
A[2,3]
A[c(1,3),c(2,4)]
A[1:3,2:4]
A[1:2,]
A[,1:2]
A[1,]
A[-c(1,3),] ## minus sign means "remove"
A[-c(1,3),-c(1,3,4)]
dim(A)
Generate Random Samples
Generate random samples
rnorm(5) # normal random variables with mean zero and variance 1
runif(5) # random samples from interval (0, 1)
The following two sets of samples will be different, since they are
randomly sampled.
rnorm(2)
rnorm(2)
Specify seeds if you want the random samples are reproducible.
set.seed(1303); rnorm(2)
set.seed(1303); rnorm(2)
set.seed(3); rnorm(2)
set.seed(3); rnorm(2)
Generate data and compute statistics such as sample mean, sample
variance and sample standard deviation.
y = rnorm(10)
y
mean(y)
sum(y) / length(y)
var(y)
sqrt(var(y)) ## same as sd(y)
sd(y)
Graphics
x = rnorm(100)
y = rnorm(100)
plot(x, y)
plot(x, y,
xlab = "this is the x-axis",
ylab = "this is the y-axis",
main = "Plot of X vs Y")
Load Data
Auto = read.table("https://www.statlearning.com/s/Auto.data")
head(Auto) ## show the first 5 rows of the data
It seems the 1st row is not data, but the column names. We need to
tell read.table to use the 1st row as column names.
Auto = read.table("https://www.statlearning.com/s/Auto.data",header=T)
head(Auto)
str(Auto) ## Show structural info of "Auto"
Auto is a dataframe, i.e., a data matrix with col
names.
dim(Auto)
Auto[1:4,]
names(Auto)
We can extract a partiular column by index or by dollar sign with
name that is, Auto[,1] and Auto$mpg both refer
to the 1st column of Auto.
Auto[1:4,1]
Auto$mpg[1:4]
A quick 5-number summary—min, 25% quantile, median, 75% quantile,
max—of each column of Auto.
summary(Auto)
Based on the summary, it seems that cylinders,
acceleration, year, and origin
take only a few unique values. For such variables, frequency tables
provide more info than 5-number summary.
unique(Auto$cylinders)
unique(Auto$year)
unique(Auto$origin)
table(Auto$cylinders) ## frequency tables
table(Auto$year)
table(Auto$origin)
Additional graphical and numerical summaries for this dataset. Each
plot command will overwrite the previous one; so, run the following
commands one-by-one.
# Attach a data matrix, so each column can be accessed by its name
attach(Auto)
plot(cylinders, mpg)
cylinders = as.factor(cylinders)
plot(cylinders, mpg)
plot(cylinders, mpg, col="red")
plot(cylinders, mpg, col="red", varwidth = T)
plot(cylinders, mpg, col="red", varwidth = T, horizontal = T)
plot(cylinders, mpg, col="red", varwidth = T,
xlab = "cylinders", ylab = "MPG")
## Histograms
hist(mpg)
hist(mpg,col = 2)
hist(mpg,col = 2, breaks = 15)
LS0tCnRpdGxlOiAiTGFiIDE6IEludHJvZHVjdGlvbiB0byBSIgpkYXRlOiAiRmFsbCAyMDIyIChTSVIpIgpvdXRwdXQ6CiAgaHRtbF9ub3RlYm9vazoKICAgIHRoZW1lOiByZWFkYWJsZQogICAgdG9jOiBUUlVFCiAgICB0b2NfZmxvYXQ6IFRSVUUKLS0tCgojIyBTZXR1cCAKSSBhc3N1bWUgeW91IGhhdmUgYWxyZWFkeSBpbnN0YWxsZWQgUiBhbmQgUlN0dWRpbyBvbiB5b3VyIGNvbXB1dGVyLiAKCioqRmlyc3QqKiwgY3JlYXRlIGEgZGlyZWN0b3J5IGZvciBhbGwgeW91ciBSIHdvcmsgZm9yIFNJUi4gSW4gdGhlIGZ1dHVyZSwgeW91IGNhbiBzYXZlIGFsbCBTSVItcmVsYXRlZCBSIHNjcmlwdC9kYXRhIGluIHRoaXMgZm9sZGVyCgoqKlRoZW4qKiwgb3BlbiBSU3R1ZGlvLCB1c2UgdGhlIGNvbW1hbmQgYHNldHdkYCB0byBlbnRlciB0aGlzIGRpcmVjdG9yeQpgYGB7ciwgZXZhbCA9IEZBTFNFfQpzZXR3ZCgifi9teWRvYy9Db3Vyc2Uvc3RhNDI1L1IiKQpgYGAKCkJ5IGRlZmF1bHQsIFJTdHVkaW8gaGFzIGEgdHdvLWJ5LXR3byBkaXNwbGF5CgotIFVwcGVyIGxlZnQ6ICoqU291cmNlKiogKHdoZXJlIHlvdSBjYW4gZWRpdCB5b3VyIFIgc2NyaXB0KTsgCi0gTG93ZXIgbGVmdDogKipDb25zb2xlKiogKHdoZXJlIHlvdSBleGVjdXRlIHlvdXIgUiBjb2RlIGFuZCBzZWUgdGhlIHJlc3VsdCkKLSBVcHBlciByaWdodDogKipFbnZpcm9ubWVudCoqICh3aGF0IG9iamVjdHMgaW4geW91ciBjdXJyZW50IFIgc2Vzc2lvbikgYW5kIEhpc3RvcnkgKGNvbW1hbmQgaGlzdG9yeSkKLSBMb3dlciByaWdodDogKipQbG90cyoqL0hlbHAvZXRjIAoKWW91IGNhbiBtaW5pbWl6ZS9yZXNpemUgZWFjaCBwYW5lbCwgb3IgY2hhbmdlIHRoZSBsYXlvdXQuIEZvciBleGFtcGxlLCBJIHVzdWFsbHkgc3dpdGNoIHRoZSBwb3NpdGlvbiBvZiBQbG90cyBhbmQgRW52aXJvbm1lbnQsIGFuZCBhbHNvIG1heGltaXplIHRoZSBTb3VyY2Ugd2luZG93cyB3aGVuIGVkaXRpbmcgc2NyaXB0cy4gCgpUbyBjaGFuZ2UgbGF5b3V0LCBnbyB0byAqKlByZWZlcmVuY2VzKiogYW5kICoqUGFuZSBMYXlvdXQqKi4gCgojIyBCYXNpYyBDb21tYW5kcwoKVHJ5IGFueSBtYXRoIGV4cHJlc3Npb25zIGF0IHRoZSBwcm9tcHQsIGFuZCBSIHdpbGwgZXZhbHVhdGUgaXQgYW5kIHByaW50IHRoZSByZXN1bHQuIAoKTm90ZSB0aGF0IGFueSBsaW5lIGJlZ2lubmluZyB3aXRoIGAjYCBtZWFucyBhIGNvbW1lbnQgYW5kIGl0IHdpbGwgbm90IGJlIGV4ZWN1dGVkIGJ5IFIuIAoKYGBge3IsIGV2YWwgPSBGQUxTRX0KMisoMyoyKQpleHAoMCkKc2luKDIpCjEvMgpzcXJ0KDMpCgozPjIgICMjIHJldHVybnMgRmFsc2Ugb3IgVHJ1ZQpgYGAKCkFzc2lnbiBhIHZhbHVlL3ZlY3Rvci9tYXRyaXggdG8gYSB2YXJpYWJsZS4gCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp4ID0gYygxLDMsMiw1KQp4CnggPSBjKDEsNiwyKQp4CnkgPSBjKDEsNCwzKQpgYGAKCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQpsZW5ndGgoeCkKbGVuZ3RoKHkpCj9sZW5ndGggICAgICAgICAgIyMgQ2hlY2sgdGhlIGhlbHAgcGFnZSBmb3IgY29tbWFuZCAibGVuZ3RoIgp4K3kKbHMoKSAgICAgICAgICAgICMjIERpc3BsYXkgd2hhdCBvYmplY3RzIGFyZSBpbiB5b3VyIGN1cnJlbnQgUiBzZXNzaW9uCiAgICAgICAgICAgICAgICAjIyBZb3UgY2FuIGFsc28ganVzdCBjbGljayAiRW52aXJvbm1lbnQiIGluIG9uZSBvZiB0aGUgcGFuZWwKcm0oeCx5KSAgICAgICAgICMjIHJlbW92ZSB0aGUgdHdvIG9iamVjdHMKbHMoKQpybShsaXN0PWxzKCkpICAgIyMgcmVtb3ZlIGFsbCBvYmplY3RzIGluIHRoZSBjdXJyZW50IFIgc2Vzc2lvbgpgYGAKCmBgYHtyLCBldmFsID0gRkFMU0V9Cj9tYXRyaXgKeCA9IG1hdHJpeChjKDEsMiwzLDQsNSw2KSwgCiAgICAgICAgICAgbnJvdyA9IDMsIG5jb2wgPSAyKQp4CnggKyBjKDEsMikgICAgICAjIyBIb3cgZG9lcyBSIGNvbXB1dGUgYSBtYXRyaXggcGx1cyBhIHZlY3Rvcj8gCnggKyBjKDEsMiwzLDQpICAjIyBlcnJvciBtZXNzYWdlIAoKbWF0cml4KDE6NiwzLDIsYnlyb3c9VFJVRSkKYGBgCgoKTWFueSBtYXRoIG9wZXJhdGlvbnMgb24gdmVjdG9yL21hdHJpeCBhcmUgZXhlY3V0ZWQgaW4gYW4gZWxlbWVudC13aXNlIHdheS4KCmBgYHtyLCBldmFsID0gRkFMU0V9CngKc3FydCh4KQp4XjIKYGBgCgpNYXRyaXggbXVsdGlwbGljYXRpb24uCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp4CnQoeCkKeCAlKiUgdCh4KQp0KHgpICUqJSB4CmBgYAoKQWNjZXNzIGluZGl2aWR1YWwgcm93cywgY29sdW1ucywgYW5kIGNlbGxzIG9mIGEgbWF0cml4LgpgYGB7ciwgZXZhbCA9IEZBTFNFfQpBPW1hdHJpeCgxOjE2LDQsNCkKQQpBWzIsM10KQVtjKDEsMyksYygyLDQpXQpBWzE6MywyOjRdCkFbMToyLF0KQVssMToyXQpBWzEsXQpBWy1jKDEsMyksXSAgICAgICAgICAgICAgIyMgbWludXMgc2lnbiBtZWFucyAicmVtb3ZlIgpBWy1jKDEsMyksLWMoMSwzLDQpXQpkaW0oQSkKYGBgCgoKIyMgR2VuZXJhdGUgUmFuZG9tIFNhbXBsZXMKCkdlbmVyYXRlIHJhbmRvbSBzYW1wbGVzIApgYGB7ciwgZXZhbCA9IEZBTFNFfQpybm9ybSg1KSAgIyBub3JtYWwgcmFuZG9tIHZhcmlhYmxlcyB3aXRoIG1lYW4gemVybyBhbmQgdmFyaWFuY2UgMQpydW5pZig1KSAgIyByYW5kb20gc2FtcGxlcyBmcm9tIGludGVydmFsICgwLCAxKQpgYGAKClRoZSBmb2xsb3dpbmcgdHdvIHNldHMgb2Ygc2FtcGxlcyB3aWxsIGJlIGRpZmZlcmVudCwgc2luY2UgdGhleSBhcmUgcmFuZG9tbHkgc2FtcGxlZC4KYGBge3IsIGV2YWwgPSBGQUxTRX0Kcm5vcm0oMikKcm5vcm0oMikKYGBgCgpTcGVjaWZ5IHNlZWRzIGlmIHlvdSB3YW50IHRoZSByYW5kb20gc2FtcGxlcyBhcmUgcmVwcm9kdWNpYmxlLiAKCmBgYHtyLCBldmFsID0gRkFMU0V9CnNldC5zZWVkKDEzMDMpOyBybm9ybSgyKQpzZXQuc2VlZCgxMzAzKTsgcm5vcm0oMikKCnNldC5zZWVkKDMpOyBybm9ybSgyKQpzZXQuc2VlZCgzKTsgcm5vcm0oMikKYGBgCgpHZW5lcmF0ZSBkYXRhIGFuZCBjb21wdXRlIHN0YXRpc3RpY3Mgc3VjaCBhcyBzYW1wbGUgbWVhbiwgc2FtcGxlIHZhcmlhbmNlIGFuZCBzYW1wbGUgc3RhbmRhcmQgZGV2aWF0aW9uLiAKCmBgYHtyLCBldmFsID0gRkFMU0V9CnkgPSBybm9ybSgxMCkKeQptZWFuKHkpCnN1bSh5KSAvIGxlbmd0aCh5KQp2YXIoeSkKCnNxcnQodmFyKHkpKSAgIyMgc2FtZSBhcyBzZCh5KQpzZCh5KQpgYGAKCiMjIEdyYXBoaWNzCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp4ID0gcm5vcm0oMTAwKQp5ID0gcm5vcm0oMTAwKQpwbG90KHgsIHkpCnBsb3QoeCwgeSwKICAgICB4bGFiID0gInRoaXMgaXMgdGhlIHgtYXhpcyIsCiAgICAgeWxhYiA9ICJ0aGlzIGlzIHRoZSB5LWF4aXMiLAogICAgIG1haW4gPSAiUGxvdCBvZiBYIHZzIFkiKQpgYGAKCgojIyBMb2FkIERhdGEKCmBgYHtyLCBldmFsID0gRkFMU0V9CkF1dG8gPSByZWFkLnRhYmxlKCJodHRwczovL3d3dy5zdGF0bGVhcm5pbmcuY29tL3MvQXV0by5kYXRhIikKaGVhZChBdXRvKSAgICAgICAgICAgICAgICMjIHNob3cgdGhlIGZpcnN0IDUgcm93cyBvZiB0aGUgZGF0YQpgYGAKSXQgc2VlbXMgdGhlIDFzdCByb3cgaXMgbm90IGRhdGEsIGJ1dCB0aGUgY29sdW1uIG5hbWVzLiBXZSBuZWVkIHRvIHRlbGwgYHJlYWQudGFibGVgIHRvIHVzZSB0aGUgMXN0IHJvdyBhcyBjb2x1bW4gbmFtZXMuIAogCmBgYHtyLCBldmFsID0gRkFMU0V9CkF1dG8gPSByZWFkLnRhYmxlKCJodHRwczovL3d3dy5zdGF0bGVhcm5pbmcuY29tL3MvQXV0by5kYXRhIixoZWFkZXI9VCkKaGVhZChBdXRvKQpzdHIoQXV0bykgICAgICAgICAgICAgICAjIyBTaG93IHN0cnVjdHVyYWwgaW5mbyBvZiAiQXV0byIKYGBgCgpBdXRvIGlzIGEgKipkYXRhZnJhbWUqKiwgaS5lLiwgYSBkYXRhIG1hdHJpeCB3aXRoIGNvbCBuYW1lcy4gCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQpkaW0oQXV0bykKQXV0b1sxOjQsXQpuYW1lcyhBdXRvKQpgYGAKCldlIGNhbiBleHRyYWN0IGEgcGFydGl1bGFyIGNvbHVtbiBieSBpbmRleCBvciBieSBkb2xsYXIgc2lnbiB3aXRoIG5hbWUgdGhhdCBpcywgYEF1dG9bLDFdYCBhbmQgYEF1dG8kbXBnYCBib3RoIHJlZmVyIHRvIHRoZSAxc3QgY29sdW1uIG9mIGBBdXRvYC4KCmBgYHtyLCBldmFsID0gRkFMU0V9CkF1dG9bMTo0LDFdCkF1dG8kbXBnWzE6NF0KYGBgCgpBIHF1aWNrIDUtbnVtYmVyIHN1bW1hcnktLS1taW4sIDI1JSBxdWFudGlsZSwgbWVkaWFuLCA3NSUgcXVhbnRpbGUsIG1heC0tLW9mIGVhY2ggY29sdW1uIG9mIGBBdXRvYC4KYGBge3IsIGV2YWwgPSBGQUxTRX0Kc3VtbWFyeShBdXRvKQpgYGAKCkJhc2VkIG9uIHRoZSBzdW1tYXJ5LCBpdCBzZWVtcyB0aGF0IGBjeWxpbmRlcnNgLCBgYWNjZWxlcmF0aW9uYCwgYHllYXJgLCAgYW5kIGBvcmlnaW5gIHRha2Ugb25seSBhIGZldyB1bmlxdWUgdmFsdWVzLiBGb3Igc3VjaCB2YXJpYWJsZXMsIGZyZXF1ZW5jeSB0YWJsZXMgcHJvdmlkZSBtb3JlIGluZm8gdGhhbiA1LW51bWJlciBzdW1tYXJ5LiAKCgpgYGB7ciwgZXZhbCA9IEZBTFNFfQp1bmlxdWUoQXV0byRjeWxpbmRlcnMpCnVuaXF1ZShBdXRvJHllYXIpCnVuaXF1ZShBdXRvJG9yaWdpbikKCnRhYmxlKEF1dG8kY3lsaW5kZXJzKSAgICAgICAgICAgICAjIyBmcmVxdWVuY3kgdGFibGVzCnRhYmxlKEF1dG8keWVhcikKdGFibGUoQXV0byRvcmlnaW4pCmBgYAoKQWRkaXRpb25hbCBncmFwaGljYWwgYW5kIG51bWVyaWNhbCBzdW1tYXJpZXMgZm9yIHRoaXMgZGF0YXNldC4gRWFjaCBwbG90IGNvbW1hbmQgd2lsbCBvdmVyd3JpdGUgdGhlIHByZXZpb3VzIG9uZTsgc28sIHJ1biB0aGUgZm9sbG93aW5nIGNvbW1hbmRzIG9uZS1ieS1vbmUuIAoKCmBgYHtyLCBldmFsID0gRkFMU0V9CiMgQXR0YWNoIGEgZGF0YSBtYXRyaXgsIHNvIGVhY2ggY29sdW1uIGNhbiBiZSBhY2Nlc3NlZCBieSBpdHMgbmFtZQphdHRhY2goQXV0bykKCnBsb3QoY3lsaW5kZXJzLCBtcGcpCmN5bGluZGVycyA9IGFzLmZhY3RvcihjeWxpbmRlcnMpCnBsb3QoY3lsaW5kZXJzLCBtcGcpCnBsb3QoY3lsaW5kZXJzLCBtcGcsIGNvbD0icmVkIikKcGxvdChjeWxpbmRlcnMsIG1wZywgY29sPSJyZWQiLCB2YXJ3aWR0aCA9IFQpCnBsb3QoY3lsaW5kZXJzLCBtcGcsIGNvbD0icmVkIiwgdmFyd2lkdGggPSBULCBob3Jpem9udGFsID0gVCkKcGxvdChjeWxpbmRlcnMsIG1wZywgY29sPSJyZWQiLCB2YXJ3aWR0aCA9IFQsIAogICAgIHhsYWIgPSAiY3lsaW5kZXJzIiwgeWxhYiA9ICJNUEciKQoKIyMgSGlzdG9ncmFtcwpoaXN0KG1wZykKaGlzdChtcGcsY29sID0gMikKaGlzdChtcGcsY29sID0gMiwgYnJlYWtzID0gMTUpCmBgYAoKCgogCgo=