Introduction to R

Katia Oleinik
koleinik@bu.edu

Helpful Links:


Login to SCC Cluster



Login: tuta30

Password: VizTut30



ssh -X scc4.bu.edu


cp -r /scratch/r-intro-2 .

R as a scientific calculator

Try executing the following commands in your Console Window:

> 2+3          # addition  
[1] 5
> 2^3          # power  
[1] 8
> log(2)       # built-in functions 
[1] 0.6931472

Double Precision

By default, R displays 7 significant digits (similar to single precision). All calculations, however, are always carried out in double precision.

> options(digits=15)   # display up to 15 significant digits
> exp(3)
[1] 20.0855369231877

Return to the default 7-digit output:

> options(digits=7) 
> exp(3)
[1] 20.08554
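
The point that the digits setting changes only the display, not the stored value, can be checked with a short sketch. The variable names below are illustrative and not part of the tutorial files.

```r
x <- exp(3)

# The digits setting only affects how a number is printed.
short <- format(x, digits = 7)    # "20.08554"
long  <- format(x, digits = 15)   # "20.0855369231877"

# The stored value is the same double in both cases, so comparisons
# should allow for rounding error instead of relying on printed digits.
exact.match <- (0.1 + 0.2 == 0.3)                  # FALSE
close.match <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```

all.equal() compares numbers up to a small tolerance, which is usually what is wanted for floating-point results.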

Variables


> a <- 3
> A <- 7  # R is case sensitive
> 
> b = -5  

At the top level both assignment operators are equivalent; the first one (<-) is the more “traditional” choice in R code.
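
A small sketch of where the two operators behave the same and where they do not. Inside a function call, = names an argument instead of creating a variable. The names v, m1 and m2 are illustrative only.

```r
# At the top level both forms create a variable.
a <- 3
b = -5

# Inside a function call they differ:
m1 <- mean(v <- c(1, 2, 3))   # assigns v, then passes it to mean()
m2 <- mean(x = c(4, 5, 6))    # only names the argument x of mean()

m1   # 2
m2   # 5
v    # 1 2 3
```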

Variables


Rules:

  • The name should contain only letters, digits, periods and underscores.
  • No other special characters ($, %, etc.) are allowed.
  • It should start with a letter or a period; a leading period must not be followed by a digit.
  • Avoid the names c, t, cat, F, T and D, as those are built-in functions or constants.
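
One way to test these rules is with make.names(), which rewrites a string into a syntactically valid name. The helper is.valid.name() below is a hypothetical convenience function, not part of base R.

```r
# A name is treated as valid here when make.names() leaves it unchanged.
is.valid.name <- function(name) name == make.names(name)

is.valid.name("num.var")   # TRUE
is.valid.name(".hidden")   # TRUE  (starts with a period)
is.valid.name("my_var2")   # TRUE
is.valid.name("2fast")     # FALSE (starts with a digit)
is.valid.name(".2x")       # FALSE (period followed by a digit)
is.valid.name("my$var")    # FALSE (contains $)

# Why T and F are risky names: they are ordinary variables, not keywords.
T <- 0
masked <- isTRUE(T)   # FALSE, because T now holds 0
rm(T)                 # remove the masking variable; T means TRUE again
```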

Variables

Let's create a few variables:

> str.var  <- "character variable"   
> num.var  <- 21.17                  
> bool.var <- TRUE                   
> comp.var <- 1-3i                   

To view a variable's value, either type the variable name or use the print() function:

> num.var  
[1] 21.17
> print(str.var)
[1] "character variable"

Variables

Check the mode of the variable (its type):

> mode(bool.var)
[1] "logical"
> mode(num.var)
[1] "numeric"
> mode(str.var)
[1] "character"
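
mode() is one of several ways to inspect a variable. The sketch below compares it with typeof() and class(); int.var is an extra illustrative variable not defined in the slides.

```r
num.var  <- 21.17
int.var  <- 21L        # the L suffix creates an integer
comp.var <- 1-3i

mode(num.var)        # "numeric"
mode(int.var)        # "numeric" (mode does not separate integer from double)
typeof(num.var)      # "double"
typeof(int.var)      # "integer"
class(comp.var)      # "complex"
is.numeric(int.var)  # TRUE
```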

Data Types

R has a wide variety of data types including:

  • Vectors (numerical, character, boolean, complex)
  • Factors (categorical variables)
  • Matrices
  • Data Frames
  • Lists

Vectors

Vector - an array of R objects of the same type.

To create a vector, use the function c() (concatenate):

> ( names <- c ("Alex", "Nick", "Mike") )
[1] "Alex" "Nick" "Mike"
> ( numbers <- c (21, -3, 7.25) )
[1] 21.00 -3.00  7.25

Note: If the R command is enclosed in parentheses, then after the command is executed the result is also printed to the screen.
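
Because all elements of a vector share one type, c() silently converts mixed input to a common type. A brief sketch, with illustrative variable names:

```r
mixed <- c(1, "a", TRUE)     # everything becomes character
mixed                        # "1" "a" "TRUE"
class(mixed)                 # "character"

flags <- c(1, TRUE, FALSE)   # logical values become numbers
flags                        # 1 1 0
class(flags)                 # "numeric"
```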

Vectors

Vectors can be defined in a number of ways:

> ( vals <- c (2, -7, 5, 3, -1 ) )   
[1]  2 -7  5  3 -1
> ( vals <- 1:7 )                  
[1] 1 2 3 4 5 6 7
> ( vals <- seq(0, 3, by=0.5) )       
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0
> ( vals <- rep("o", 7) )            
[1] "o" "o" "o" "o" "o" "o" "o"

Vectors

Vectors can be defined in a number of ways:

> ( vals <- numeric(9) )
[1] 0 0 0 0 0 0 0 0 0
> ( vals <- rnorm(5,2,1.5 ) )
[1] 1.242416 5.075357 2.803972 2.021579 3.121984
> ( vals <- rpois(5,4 )  )
[1] 3 1 5 1 1
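
rnorm() and rpois() return different values on every call, so the output above will not match on another machine. A sketch of how to make random draws repeatable with set.seed(); the seed value 42 is arbitrary.

```r
set.seed(42)                              # 42 is an arbitrary seed
first  <- rnorm(5, mean = 2, sd = 1.5)    # 5 normal draws: mean 2, sd 1.5

set.seed(42)                              # same seed, same draws
second <- rnorm(5, mean = 2, sd = 1.5)

identical(first, second)                  # TRUE

counts <- rpois(5, lambda = 4)            # 5 Poisson draws with mean 4
```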

Vectors


Vector elements can have labels:

> heights<-c(Alex=180, Bob=175, Clara=165, Don=185)
> heights
 Alex   Bob Clara   Don 
  180   175   165   185 
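
Labels can also be used to read and change elements. A short sketch using the same heights vector; the new name "Robert" is only an illustration.

```r
heights <- c(Alex=180, Bob=175, Clara=165, Don=185)

names(heights)                  # "Alex" "Bob" "Clara" "Don"
heights["Bob"]                  # element with its label
heights[["Bob"]]                # bare value: 175
heights[c("Alex", "Don")]       # several elements by label

names(heights)[2] <- "Robert"   # rename the second label
```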

Vector arithmetic

Avoid explicit loops for element-wise operations on vectors. R arithmetic is vectorized: operators act on whole vectors at once.

> a <- 1:5
> b <- seq(2,10, by=2)
> 
> a+b
[1]  3  6  9 12 15
> b/a   
[1] 2 2 2 2 2
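
For comparison, here is a sketch of the loop that the vectorized expression replaces, together with an example of how a shorter vector is recycled. The names loop.sum and vec.sum are illustrative.

```r
a <- 1:5
b <- seq(2, 10, by = 2)

# Explicit loop: works, but is verbose
loop.sum <- numeric(length(a))
for (i in seq_along(a)) {
  loop.sum[i] <- a[i] + b[i]
}

# Vectorized form: one expression, same result
vec.sum <- a + b
all(loop.sum == vec.sum)   # TRUE

# A shorter vector is recycled to match the longer one
a * 10                     # 10 20 30 40 50
1:6 + c(0, 100)            # 1 102 3 104 5 106
```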

Vector slicing (subsetting)

You can access particular elements in the vector in the following manner:

> # define a numeric vector
> x <- c(734, 145, 958, 456, 924) 
> 
> 
> x[2]     # returns second element        
[1] 145
> x[2:4]   # returns 2nd through 4th
[1] 145 958 456

Vector slicing (subsetting)

You can access particular elements in the vector in the following manner:

> # define a numeric vector
> x <- c(734, 145, 958, 456, 924) 
> 
> x[c(1,3,5)]  # returns 1st, 3rd and 5th elements
[1] 734 958 924
> x[-2]        # returns all but 2nd element
[1] 734 958 456 924

Vector slicing (subsetting)

You can access particular elements in the vector in the following manner:

> # define a numeric vector
> x <- c(734, 145, 958, 456, 924) 
> 
> x[c(TRUE, TRUE, FALSE, FALSE, TRUE)]   # returns 1st, 2nd, and 5th elements
[1] 734 145 924

Vector slicing (subsetting)

You can access particular elements in the vector in the following manner:

> # define a numeric vector
> x <- c(734, 145, 958, 456, 924) 
> 
> # returns only those elements that are less than 500
> x[x<500]     
[1] 145 456
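
The expression x<500 is itself a logical vector, which is why it can be used as an index. A sketch of a few related idioms:

```r
x <- c(734, 145, 958, 456, 924)

x < 500                  # FALSE TRUE FALSE TRUE FALSE
which(x < 500)           # positions of matching elements: 2 4
sum(x < 500)             # number of matching elements: 2

x[x > 200 & x < 900]     # both conditions hold: 734 456
x[x < 200 | x > 950]     # either condition holds: 145 958
```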

Vector Functions


max(x), min(x), sum(x)

mean(x), median(x), range(x)

var(x) , cor(x,y)

sort(x), rank(x), order(x)

cumsum(x), cumprod(x), cummin(x), cummax(x)

duplicated(x), unique(x)
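
sort(), order() and rank() are easy to confuse. A brief sketch on the vector used throughout this section:

```r
x <- c(734, 145, 958, 456, 924)

sort(x)       # sorted values:              145 456 734 924 958
order(x)      # positions that sort x:      2 4 1 5 3
rank(x)       # rank of each element of x:  3 1 5 2 4
x[order(x)]   # same result as sort(x)

cumsum(x)     # running total: 734 879 1837 2293 3217
range(x)      # smallest and largest value: 145 958
```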

Vector Functions


Example:

> # define a numeric vector
> x <- c(734, 145, 958, 456, 924) 
> 
> mean(x)
[1] 643.4
> sort(x)
[1] 145 456 734 924 958

Vector Functions


A useful vector function:

> # define a numeric vector
> x <- c(734, 145, 958, 456, 924) 
> 
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  145.0   456.0   734.0   643.4   924.0   958.0 

Missing Values


To define a missing value, use NA:

> # define a numeric vector
> x <- c(734, 145, NA, 456, NA)    
> 
> # Check if the element in the vector is missing
> is.na(x)              
[1] FALSE FALSE  TRUE FALSE  TRUE
> # Which elements in the vector are missing
> which(is.na(x))       
[1] 3 5

Missing Values


A missing value cannot be compared to anything:

> # define a numeric vector
> x <- c(734, 145, NA, 456, NA)    
> 
> x == NA   # this does not work !     
[1] NA NA NA NA NA
> # Use is.na() instead
> is.na(x)              
[1] FALSE FALSE  TRUE FALSE  TRUE
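
Missing values also propagate through most vector functions. A sketch of the usual ways to handle them, using the na.rm argument of mean() and sum():

```r
x <- c(734, 145, NA, 456, NA)

mean(x)                  # NA, because some values are missing
mean(x, na.rm = TRUE)    # 445, computed from the non-missing values
sum(is.na(x))            # number of missing values: 2
x[!is.na(x)]             # keep only the non-missing values: 734 145 456
```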

Factors


A factor is a special type of vector that stores “categorical” variables. To convert a vector into a factor, use the factor() function:

> x <- c(0, 1, 1, 1, 0, 0, 1, 0)    
> x <- factor(x)
> table(x)
x
0 1 
4 4 

Factors


Each level in the factor variable can be named

> x <- factor( c(0, 0, 1, 1, 0, 0, 1, 0), labels=c("Fail","Success"))   
> 
> table(x)
x
   Fail Success 
      5       3 

Factors


Factors are treated differently by the summary() function:

> # define a factor with named levels
> x <- factor( c(0, 0, 1, 1, 0, 0, 1, 0), labels=c("Fail","Success"))   
> 
> summary(x)
   Fail Success 
      5       3 
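
Internally a factor stores integer codes plus a set of level labels. The sketch below shows both, and a common pitfall when converting a factor of numbers back to numeric. The factor f is an illustrative example.

```r
x <- factor(c(0, 0, 1, 1, 0, 0, 1, 0), labels = c("Fail", "Success"))

levels(x)       # "Fail" "Success"
nlevels(x)      # 2
as.integer(x)   # underlying codes: 1 1 2 2 1 1 2 1

# Pitfall: as.numeric() returns the codes, not the original numbers
f <- factor(c(10, 20, 10))
as.numeric(f)                  # 1 2 1
as.numeric(as.character(f))    # 10 20 10
```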

Matrices


A matrix is a 2-dimensional array of elements of the same type:

> matr <- matrix( c(1,2,3,4,5,6) , ncol=2)
> matr
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Note: R matrices are “column-major”: the values fill the first column, then the second, and so on.

Matrices


A matrix is a 2-dimensional array of elements of the same type:

> matr <- matrix( c(1,2,3,4,5,6) , nrow=2) 
> matr
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Matrices


You can also convert a vector into a matrix:

> # First define the vector
> matr <- c(1,2,3,4,5,6) 
> # Then change dimensions
> dim(matr) <- c(2,3)      
> 
> matr
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Matrices


You can fill a matrix by row:

> matr <- matrix( c(1,2,3,4,5,6) , ncol=2, byrow=TRUE)
> matr
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
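
Matrix elements are accessed with [row, column] indices. A sketch using the matrix defined above; col.major is an extra illustrative variable.

```r
matr <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2, byrow = TRUE)

dim(matr)        # number of rows and columns: 3 2
matr[2, 1]       # single element: 3
matr[2, ]        # whole second row: 3 4
matr[, 2]        # whole second column: 2 4 6
matr[matr > 4]   # elements selected by a condition: 5 6

# Without byrow=TRUE the values fill the matrix column by column
col.major <- matrix(1:6, ncol = 2)
col.major[, 1]   # 1 2 3
```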

Matrix Operations


> smatr = matrix( c(1,-3, 2, 5) , ncol=2) 
> smatr
     [,1] [,2]
[1,]    1    2
[2,]   -3    5
> # transpose matrix
> t(smatr) 
     [,1] [,2]
[1,]    1   -3
[2,]    2    5

Matrix Operations


> smatr = matrix( c(1,-3, 2, 5) , ncol=2) 
> smatr
     [,1] [,2]
[1,]    1    2
[2,]   -3    5
> # Inverse matrix
> solve(smatr)   
          [,1]        [,2]
[1,] 0.4545455 -0.18181818
[2,] 0.2727273  0.09090909

Matrix Operations


> smatr
     [,1] [,2]
[1,]    1    2
[2,]   -3    5
> # element-wise product of two matrices
> smatr*smatr    
     [,1] [,2]
[1,]    1    4
[2,]    9   25

Matrix Operations


> smatr
     [,1] [,2]
[1,]    1    2
[2,]   -3    5
> # matrix product
> smatr %*% smatr  
     [,1] [,2]
[1,]   -5   12
[2,]  -18   19

Matrix Operations


> smatr
     [,1] [,2]
[1,]    1    2
[2,]   -3    5
> # inverse of each element of the matrix
> smatr^(-1)     
           [,1] [,2]
[1,]  1.0000000  0.5
[2,] -0.3333333  0.2

Matrix Operations

Some useful matrix functions:
colMeans(); rowMeans(); colSums(); rowSums()

> smatr
     [,1] [,2]
[1,]    1    2
[2,]   -3    5
> # sum of the elements in each row
> rowSums(smatr)    
[1] 3 2
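
A short sketch that ties these operations together: checking the inverse returned by solve(), and using solve() with a second argument to solve a linear system. The right-hand side rhs is an illustrative choice.

```r
smatr <- matrix(c(1, -3, 2, 5), ncol = 2)

# A matrix times its inverse should give the identity matrix
inv <- solve(smatr)
all.equal(smatr %*% inv, diag(2))   # TRUE (up to rounding error)

# Solve the linear system smatr %*% z = rhs
rhs <- c(5, 7)
z <- solve(smatr, rhs)              # 1 2

colMeans(smatr)                     # -1.0  3.5
rowMeans(smatr)                     #  1.5  1.0
```

Calling solve(smatr, rhs) is generally preferable to computing solve(smatr) %*% rhs, since it avoids forming the inverse explicitly.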

R help


Access the help file for an R function:

> ?matrix


> help(matrix)

R help


You can search the help system using the help.search() function:

> help.search("matrix")



Or using two question marks:

> ??matrix

R help


Get arguments of a function

> args(matrix)
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) 
NULL

R help


Examples of function usage

> example(matrix)

matrix> is.matrix(as.matrix(1:10))
[1] TRUE

matrix> !is.matrix(warpbreaks)  # data.frame, NOT matrix!
[1] TRUE

matrix> warpbreaks[1:10,]
   breaks wool tension
1      26    A       L
2      30    A       L
3      54    A       L
4      25    A       L
5      70    A       L
6      52    A       L
7      51    A       L
8      26    A       L
9      67    A       L
10     18    A       M

matrix> as.matrix(warpbreaks[1:10,])  # using as.matrix.data.frame(.) method
   breaks wool tension
1  "26"   "A"  "L"    
2  "30"   "A"  "L"    
3  "54"   "A"  "L"    
4  "25"   "A"  "L"    
5  "70"   "A"  "L"    
6  "52"   "A"  "L"    
7  "51"   "A"  "L"    
8  "26"   "A"  "L"    
9  "67"   "A"  "L"    
10 "18"   "A"  "M"    

matrix> ## Example of setting row and column names
matrix> mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE,
matrix+                dimnames = list(c("row1", "row2"),
matrix+                                c("C.1", "C.2", "C.3")))

matrix> mdat
     C.1 C.2 C.3
row1   1   2   3
row2  11  12  13

Clean current R session


Check variables in the current session

> objects()
 [1] "a"        "A"        "b"        "bool.var" "comp.var" "heights" 
 [7] "matr"     "mdat"     "names"    "num.var"  "numbers"  "smatr"   
[13] "str.var"  "vals"     "x"       

Or

> ls()
 [1] "a"        "A"        "b"        "bool.var" "comp.var" "heights" 
 [7] "matr"     "mdat"     "names"    "num.var"  "numbers"  "smatr"   
[13] "str.var"  "vals"     "x"       

Clean current R session


Remove the object x from memory

> rm(x)


Remove everything from your working environment

> rm(list=ls())
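
rm(list=ls()) removes every object. When only some objects should go, ls() accepts a pattern argument. A sketch with hypothetical object names:

```r
tmp.a   <- 1
tmp.b   <- "scratch"
keep.me <- 42

# Remove only the objects whose names start with "tmp."
rm(list = ls(pattern = "^tmp\\."))

exists("tmp.a")     # FALSE
exists("tmp.b")     # FALSE
exists("keep.me")   # TRUE
```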

Running an R script in batch mode



R CMD BATCH Rprog.R

Rscript Rprog.R

R -q --vanilla < Rprog.R
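
As an illustration, a script such as Rprog.R might look like the sketch below. The contents are hypothetical; the actual tutorial script may differ. It reads an optional command-line argument with commandArgs().

```r
# Rprog.R: hypothetical script contents, for illustration only
args <- commandArgs(trailingOnly = TRUE)

# Use the first argument as the sample size, or fall back to 10
n <- if (length(args) >= 1) as.integer(args[1]) else 10L

x <- seq_len(n)
cat("n =", n, "mean =", mean(x), "\n")
```

With Rscript, arguments placed after the script name are passed to the script, for example: Rscript Rprog.R 100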

Data Frames


A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). It is similar to SAS and SPSS datasets.

> names <- c("Alex", "Bob", "Cat")
> ages <- c(12,5,7)
> sex <- c("M","M","F")
> # In R 4.0.0 and later, strings become factors only on request
> kids <- data.frame(Names=names,Ages=ages,Sex=sex,stringsAsFactors=TRUE)
> kids
  Names Ages Sex
1  Alex   12   M
2   Bob    5   M
3   Cat    7   F
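
Columns and rows of a data frame can be accessed much like those of a matrix, and columns can also be accessed by name. A sketch using the same data:

```r
kids <- data.frame(Names = c("Alex", "Bob", "Cat"),
                   Ages  = c(12, 5, 7),
                   Sex   = c("M", "M", "F"))

kids$Ages                        # one column as a vector: 12 5 7
kids[2, ]                        # second row
kids[kids$Ages > 6, ]            # rows that satisfy a condition
older <- kids[kids$Ages > 6, "Names"]

mean(kids$Ages)                  # 8
nrow(kids)                       # 3
ncol(kids)                       # 3
str(kids)                        # compact overview of the structure
```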

Data Frames

The summary() function recognizes the type of each variable:

> kids
  Names Ages Sex
1  Alex   12   M
2   Bob    5   M
3   Cat    7   F
> summary(kids)
  Names        Ages      Sex  
 Alex:1   Min.   : 5.0   F:1  
 Bob :1   1st Qu.: 6.0   M:2  
 Cat :1   Median : 7.0        
          Mean   : 8.0        
          3rd Qu.: 9.5        
          Max.   :12.0        

Data Frames


A typical workflow with a data frame:

  1. Read the data from a file
  2. Clean the dataset if necessary, check for missing values
  3. Get summary of the data
  4. Use R graphics to explore each variable in the dataset
  5. Perform statistical analysis
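
The steps above can be sketched on a tiny made-up dataset. Everything here is an assumption for illustration: the file is a temporary CSV written by the code itself, and the column names and values are invented, not taken from the tutorial data.

```r
# Write a small illustrative CSV so the sketch is self-contained
csv.file <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Height",
             "Alex,12,150",
             "Bob,5,NA",
             "Cat,7,120",
             "Dan,9,135"), csv.file)

# 1. Read the data from a file
kids <- read.csv(csv.file)

# 2. Check for missing values and drop incomplete rows
missing.per.column <- colSums(is.na(kids))
clean <- na.omit(kids)

# 3. Get a summary of the data
summary(clean)

# 4. Explore a variable graphically (drawn to a null device here,
#    so that no plot file is created)
pdf(NULL)
hist(clean$Age)
invisible(dev.off())

# 5. Perform a simple statistical analysis
r <- cor(clean$Age, clean$Height)
```

In an interactive session the pdf(NULL) and dev.off() lines can be left out so that the histogram appears in the plot window.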

Data Frames


Read the dataframe
