Katia Oleinik
koleinik@bu.edu
SCV examples: http://scv.bu.edu/examples/r/
Examples of tasks replicated in SAS and R: http://sas-and-r.blogspot.com/
Many examples of statistical analysis using R: http://www.ats.ucla.edu/stat/r/
R for SAS, Stata and SPSS users: http://scv.bu.edu/examples/r/tutorials/
Login: tuta30
Password: VizTut30
ssh -X scc4.bu.edu
cp -r /scratch/r-intro-2 .
Try executing the following commands in your Console Window:
> 2+3 # addition
[1] 5
> 2^3 # power
[1] 8
> log(2) # built-in functions
[1] 0.6931472
By default R prints 7 significant digits (roughly a single-precision display), but all calculations are always done in double precision.
> options(digits=15) # change to double precision display
> exp(3)
[1] 20.0855369231877
Return to the default 7-digit output:
> options(digits=7)
> exp(3)
[1] 20.08554
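The digits option changes only how a number is printed, not the value that is stored. A minimal sketch of this, in which the names x, old, shown.long and shown.short are illustrative and not part of the original slides:

```r
x <- exp(3)

# options() returns the previous settings, so they can be restored later
old <- options(digits = 15)
shown.long <- format(x)                # formatted with 15 significant digits
options(old)                           # restore the previous setting
shown.short <- format(x, digits = 7)   # formatted with 7 significant digits

shown.long                             # "20.0855369231877"
shown.short                            # "20.08554"
identical(x, exp(3))                   # TRUE: the stored value never changed
```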
> a <- 3
> A <- 7 # R is case sensitive
>
> b = -5
Both assignment operators are equivalent here; the first one, <-, is more “traditional”.
Rules:
Let's create a few variables:
> str.var <- "character variable"
> num.var <- 21.17
> bool.var <- TRUE
> comp.var <- 1-3i
To view a variable's value, either type the variable name or use the print() function:
> num.var
[1] 21.17
> print(str.var)
[1] "character variable"
Check the mode of the variable (its type):
> mode(bool.var)
[1] "logical"
> mode(num.var)
[1] "numeric"
> mode(str.var)
[1] "character"
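mode() groups integers and doubles together as "numeric". When the distinction matters, typeof() and class() give more detail. A short sketch, where int.var is an illustrative variable that does not appear in the slides:

```r
comp.var <- 1-3i
int.var  <- 7L        # the L suffix creates an integer

mode(comp.var)        # "complex"
mode(int.var)         # "numeric"
typeof(int.var)       # "integer"
typeof(21.17)         # "double"
class(int.var)        # "integer"
```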
R has a wide variety of data types including:
Vector - an array of R objects of the same type.
To create a vector, use the function c() (concatenate):
> ( names <- c ("Alex", "Nick", "Mike") )
[1] "Alex" "Nick" "Mike"
> ( numbers <- c (21, -3, 7.25) )
[1] 21.00 -3.00 7.25
Note: If the R command is enclosed in parentheses, then after the command is executed the result is also printed to the screen.
Vectors can be defined in a number of ways:
> ( vals <- c (2, -7, 5, 3, -1 ) )
[1] 2 -7 5 3 -1
> ( vals <- 1:7 )
[1] 1 2 3 4 5 6 7
> ( vals <- seq(0, 3, by=0.5) )
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0
> ( vals <- rep("o", 7) )
[1] "o" "o" "o" "o" "o" "o" "o"
> ( vals <- numeric(9) )
[1] 0 0 0 0 0 0 0 0 0
> ( vals <- rnorm(5,2,1.5 ) )
[1] 1.242416 5.075357 2.803972 2.021579 3.121984
> ( vals <- rpois(5,4 ) )
[1] 3 1 5 1 1
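rnorm() and rpois() draw random numbers, so the values shown above will differ from run to run. One common way to make a run repeatable is to fix the seed first. A small sketch, in which the seed 42 and the names first and second are arbitrary choices:

```r
set.seed(42)                            # fix the random number generator state
first  <- rnorm(5, mean = 2, sd = 1.5)  # same call as rnorm(5, 2, 1.5)

set.seed(42)                            # same seed, so the same sequence follows
second <- rnorm(5, mean = 2, sd = 1.5)

identical(first, second)                # TRUE
```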
Vector elements can have labels:
> heights<-c(Alex=180, Bob=175, Clara=165, Don=185)
> heights
Alex Bob Clara Don
180 175 165 185
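Labels can also be used to look elements up. A brief sketch using the heights vector defined above:

```r
heights <- c(Alex=180, Bob=175, Clara=165, Don=185)

names(heights)                 # "Alex" "Bob" "Clara" "Don"
heights["Clara"]               # element with its label kept
heights[["Clara"]]             # bare value: 165
heights[c("Alex", "Don")]      # several labels at once
```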
Do not use loops to perform operations on vectors. R operates on vectors!
> a <- 1:5
> b <- seq(2,10, by=2)
>
> a+b
[1] 3 6 9 12 15
> b/a
[1] 2 2 2 2 2
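To see what the advice about loops means in practice, the sketch below computes a+b twice, once with an explicit loop and once as a single vector expression. The names loop.sum and vec.sum are illustrative:

```r
a <- 1:5
b <- seq(2, 10, by = 2)

# Loop version: works, but is longer and usually slower
loop.sum <- numeric(length(a))
for (i in seq_along(a)) {
  loop.sum[i] <- a[i] + b[i]
}

# Vectorized version: one expression, same result
vec.sum <- a + b

all(loop.sum == vec.sum)       # TRUE

# A single value is recycled across the whole vector
a * 10                         # 10 20 30 40 50
```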
You can access particular elements in the vector in the following manner:
> # define a numeric vector
> x <- c(734, 145, 958, 456, 924)
>
>
> x[2] # returns second element
[1] 145
> x[2:4] # returns 2nd through 4th
[1] 145 958 456
> x[c(1,3,5)] # returns 1st, 3rd and 5th elements
[1] 734 958 924
> x[-2] # returns all but 2nd element
[1] 734 958 456 924
> x[c(TRUE, TRUE, FALSE, FALSE, TRUE)] # returns 1st, 2nd, and 5th elements
[1] 734 145 924
> # returns only those elements that are less than 500
> x[x<500]
[1] 145 456
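Logical conditions can be combined with & (and) and | (or), and which() reports the positions of the matching elements rather than their values. A short sketch on the same vector:

```r
x <- c(734, 145, 958, 456, 924)

which(x < 500)            # positions of matching elements: 2 4
x[x > 200 & x < 900]      # both conditions hold: 734 456
x[x < 200 | x > 900]      # either condition holds: 145 958 924
```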
max(x), min(x), sum(x)
mean(x), median(x), range(x)
var(x) , cor(x,y)
sort(x), rank(x), order(x)
cumsum(x), cumprod(x), cummin(x), cummax(x)
duplicated(x), unique(x)
Example:
> # define a numeric vector
> x <- c(734, 145, 958, 456, 924)
>
> mean(x)
[1] 643.4
> sort(x)
[1] 145 456 734 924 958
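sort(), order() and rank() are easy to confuse. The sketch below applies all three to the same vector, so the difference is visible side by side:

```r
x <- c(734, 145, 958, 456, 924)

sort(x)       # values in increasing order:  145 456 734 924 958
order(x)      # positions that would sort x: 2 4 1 5 3
rank(x)       # rank of each element:        3 1 5 2 4
x[order(x)]   # same result as sort(x)
cumsum(x)     # running total: 734 879 1837 2293 3217
```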
Useful vector function:
> # define a numeric vector
> x <- c(734, 145, 958, 456, 924)
>
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
145.0 456.0 734.0 643.4 924.0 958.0
To define a missing value, use NA:
> # define a numeric vector
> x <- c(734, 145, NA, 456, NA)
>
> # Check if the element in the vector is missing
> is.na(x)
[1] FALSE FALSE TRUE FALSE TRUE
> # Which elements in the vector are missing
> which(is.na(x))
[1] 3 5
A missing value cannot be compared to anything:
> # define a numeric vector
> x <- c(734, 145, NA, 456, NA)
>
> x == NA # this does not work !
[1] NA NA NA NA NA
> # Use is.na() instead
> is.na(x)
[1] FALSE FALSE TRUE FALSE TRUE
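Missing values also propagate through most summary functions. Many of them accept na.rm = TRUE to drop the NAs first, as this short sketch shows:

```r
x <- c(734, 145, NA, 456, NA)

mean(x)                  # NA: the missing values propagate
mean(x, na.rm = TRUE)    # 445: the mean of 734, 145 and 456
sum(is.na(x))            # 2 missing values
x[!is.na(x)]             # 734 145 456
```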
A factor is a special type of vector that stores “categorical” variables.
To convert a vector into a factor, use the factor() function:
> x <- c(0, 1, 1, 1, 0, 0, 1, 0)
> x <- factor(x)
> table(x)
x
0 1
4 4
Each level of a factor can be given a label:
> x <- factor( c(0, 0, 1, 1, 0, 0, 1, 0), labels=c("Fail","Success"))
>
> table(x)
x
Fail Success
5 3
Factors are treated differently by the summary() function:
> # define a factor
> x <- factor( c(0, 0, 1, 1, 0, 0, 1, 0), labels=c("Fail","Success"))
>
> summary(x)
Fail Success
5 3
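Internally a factor stores integer codes plus a set of level labels. A brief sketch of how to inspect both parts:

```r
x <- factor(c(0, 0, 1, 1, 0, 0, 1, 0), labels = c("Fail", "Success"))

levels(x)        # "Fail" "Success"
nlevels(x)       # 2
as.integer(x)    # underlying codes: 1 1 2 2 1 1 2 1
```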
A matrix is a two-dimensional array of elements of the same type:
> matr <- matrix( c(1,2,3,4,5,6) , ncol=2)
> matr
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
Note: R fills matrices in “column-major” order, that is, column by column.
> matr <- matrix( c(1,2,3,4,5,6) , nrow=2)
> matr
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
You can also convert a vector into a matrix:
> # First define the vector
> matr <- c(1,2,3,4,5,6)
> # Then change dimensions
> dim(matr) <- c(2,3)
>
> matr
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
You can fill a matrix by row:
> matr <- matrix( c(1,2,3,4,5,6) , ncol=2, byrow=TRUE)
> matr
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
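Matrix elements are accessed with [row, column]. Leaving one index empty selects a whole row or column. A short sketch using the matrix above:

```r
matr <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2, byrow = TRUE)

dim(matr)        # 3 2 (rows, columns)
matr[2, 1]       # row 2, column 1: 3
matr[, 2]        # whole second column: 2 4 6
matr[3, ]        # whole third row: 5 6
```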
> smatr = matrix( c(1,-3, 2, 5) , ncol=2)
> smatr
[,1] [,2]
[1,] 1 2
[2,] -3 5
> # transpose matrix
> t(smatr)
[,1] [,2]
[1,] 1 -3
[2,] 2 5
> # Inverse matrix
> solve(smatr)
[,1] [,2]
[1,] 0.4545455 -0.18181818
[2,] 0.2727273 0.09090909
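One way to check an inverse is to multiply it back and compare the product with the identity matrix. The sketch below does that, and also shows solve() with a second argument, which solves a linear system directly. The right-hand side b and the names inv and v are illustrative:

```r
smatr <- matrix(c(1, -3, 2, 5), ncol = 2)
inv   <- solve(smatr)

# The product should be the identity matrix up to rounding error,
# so compare with all.equal() rather than ==
isTRUE(all.equal(smatr %*% inv, diag(2)))    # TRUE

# solve(A, b) solves A %*% v = b without forming the inverse explicitly
b <- c(5, 7)
v <- solve(smatr, b)
v                                            # 1 2
```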
> smatr
[,1] [,2]
[1,] 1 2
[2,] -3 5
> # element-wise product of matrices
> smatr*smatr
[,1] [,2]
[1,] 1 4
[2,] 9 25
> smatr
[,1] [,2]
[1,] 1 2
[2,] -3 5
> # matrix product
> smatr %*% smatr
[,1] [,2]
[1,] -5 12
[2,] -18 19
> smatr
[,1] [,2]
[1,] 1 2
[2,] -3 5
> # reciprocal of each element (not the matrix inverse)
> smatr^(-1)
[,1] [,2]
[1,] 1.0000000 0.5
[2,] -0.3333333 0.2
Some useful matrix functions:
colMeans(); rowMeans(); colSums(); rowSums()
> smatr
[,1] [,2]
[1,] 1 2
[2,] -3 5
> # sum of the elements in each row
> rowSums(smatr)
[1] 3 2
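The row and column helpers cover sums and means. For other summaries, apply() runs any function over rows (margin 1) or columns (margin 2). A small sketch:

```r
smatr <- matrix(c(1, -3, 2, 5), ncol = 2)

colMeans(smatr)             # -1.0 3.5
apply(smatr, 1, sum)        # same as rowSums(smatr): 3 2
apply(smatr, 2, max)        # column maxima: 1 5
```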
Access the help file for an R function:
> ?matrix
> help(matrix)
You can search for help using the help.search() function:
> help.search("matrix")
Or using two question marks:
> ??matrix
Get the arguments of a function:
> args(matrix)
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
NULL
Run the examples from a function's help page:
> example(matrix)
matrix> is.matrix(as.matrix(1:10))
[1] TRUE
matrix> !is.matrix(warpbreaks) # data.frame, NOT matrix!
[1] TRUE
matrix> warpbreaks[1:10,]
breaks wool tension
1 26 A L
2 30 A L
3 54 A L
4 25 A L
5 70 A L
6 52 A L
7 51 A L
8 26 A L
9 67 A L
10 18 A M
matrix> as.matrix(warpbreaks[1:10,]) # using as.matrix.data.frame(.) method
breaks wool tension
1 "26" "A" "L"
2 "30" "A" "L"
3 "54" "A" "L"
4 "25" "A" "L"
5 "70" "A" "L"
6 "52" "A" "L"
7 "51" "A" "L"
8 "26" "A" "L"
9 "67" "A" "L"
10 "18" "A" "M"
matrix> ## Example of setting row and column names
matrix> mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE,
matrix+ dimnames = list(c("row1", "row2"),
matrix+ c("C.1", "C.2", "C.3")))
matrix> mdat
C.1 C.2 C.3
row1 1 2 3
row2 11 12 13
List the variables in the current session:
> objects()
[1] "a" "A" "b" "bool.var" "comp.var" "heights"
[7] "matr" "mdat" "names" "num.var" "numbers" "smatr"
[13] "str.var" "vals" "x"
Or
> ls()
[1] "a" "A" "b" "bool.var" "comp.var" "heights"
[7] "matr" "mdat" "names" "num.var" "numbers" "smatr"
[13] "str.var" "vals" "x"
Remove the object x from memory:
> rm(x)
Remove everything from your working environment:
> rm(list=ls())
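To confirm that an object is really gone, exists() can be called before and after rm(). A minimal sketch, where tmp.var, before and after are illustrative names:

```r
tmp.var <- 1:3                  # create a throwaway object
before  <- exists("tmp.var")    # TRUE: the object is defined

rm(tmp.var)                     # remove it
after   <- exists("tmp.var")    # FALSE: the name is no longer defined
```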
R CMD BATCH Rprog.R
Rscript Rprog.R
R -q --vanilla < Rprog.R
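The three commands above run a script without opening an interactive session. The slides do not show what Rprog.R contains, so the following is only a sketch of what such a script might look like. The argument handling and the names args, n and total are assumptions made for illustration:

```r
# Contents of a hypothetical Rprog.R

# Arguments given after the script name, e.g. "Rscript Rprog.R 10"
args <- commandArgs(trailingOnly = TRUE)

# Use the first argument if it is a whole number, otherwise fall back to 5
n <- 5L
if (length(args) >= 1 && grepl("^[0-9]+$", args[1])) {
  n <- as.integer(args[1])
}

total <- sum(seq_len(n))
cat("Sum of 1 to", n, "is", total, "\n")
```

With Rscript the output is printed to the terminal, while R CMD BATCH writes it to a file named Rprog.Rout by default.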
A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). It is similar to SAS and SPSS datasets.
> names <- c("Alex", "Bob", "Cat")
> ages <- c(12,5,7)
> sex <- c("M","M","F")
> kids <- data.frame(Names=names,Ages=ages,Sex=sex)
> kids
Names Ages Sex
1 Alex 12 M
2 Bob 5 M
3 Cat 7 F
The summary() function recognizes the type of each variable:
> kids
Names Ages Sex
1 Alex 12 M
2 Bob 5 M
3 Cat 7 F
> summary(kids)
  Names       Ages       Sex
 Alex:1   Min.   : 5.0   F:1
 Bob :1   1st Qu.: 6.0   M:2
 Cat :1   Median : 7.0
          Mean   : 8.0
          3rd Qu.: 9.5
          Max.   :12.0
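Individual columns of a data frame are reached with $, and rows can be filtered with a logical condition, just like vectors. The sketch below rebuilds kids with stringsAsFactors = TRUE. That was the default before R 4.0.0 and is what the summary above reflects. In newer versions, character columns stay as character unless this argument is set:

```r
names <- c("Alex", "Bob", "Cat")
ages  <- c(12, 5, 7)
sex   <- c("M", "M", "F")

# Store the character columns as factors, as older R versions did by default
kids <- data.frame(Names = names, Ages = ages, Sex = sex,
                   stringsAsFactors = TRUE)

kids$Ages                    # one column as a vector: 12 5 7
mean(kids$Ages)              # 8
kids[kids$Ages > 6, ]        # rows for Alex and Cat
table(kids$Sex)              # counts per level: F 1, M 2
```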
Read the dataframe
Error in file(file, "rt") : cannot open the connection