Week 1: Background, getting started, and Nuts and Bolts

Background information

Some functions and commands:

getwd()
dir()
ls()
# Ctrl + L clears the console (a keyboard shortcut, not an R command)
source("mycode.R")
str(file)
args(functionName)

# file manipulation in R
dir.create("testdir")
setwd("testdir")
file.create("mytest.R")
file.exists("mytest.R")
file.info("mytest.R")
file.rename("mytest.R", "mytest2.R")
file.copy("mytest2.R", "mytest3.R")
dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE) # create nested dirs
unlink("testdir2", recursive = TRUE)

Nuts and Bolts

R was created in 1991 by Ross Ihaka and Robert Gentleman. The R mailing lists are a good place to find answers.

R has five basic (atomic) classes of objects: character, numeric, integer, complex, and logical. Inf, NA, and NaN are special values rather than classes. R objects have attributes such as class, dimensions, and length. c() and vector() are functions that create new vectors.

test <- vector("list", length = 5)

A list is a special type of vector that can contain elements of different classes; use double brackets to extract a single element. Matrices are another important type of object. The first level of a factor is the baseline level, so sometimes we need to specify the level order explicitly. Other topics: missing values, data frames, etc.
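
A minimal sketch of the object types mentioned above. The variable names are only illustrative and do not come from the course material.

```r
# A list can hold elements of different classes
x <- list(a = 1:3, b = "text", c = TRUE)
x[["a"]]        # double brackets return the element itself (an integer vector)
class(x["a"])   # single brackets return a sub-list

# A matrix is filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# By default levels are sorted alphabetically, so "no" would be the baseline;
# the levels argument sets the order explicitly
f <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
levels(f)

# Missing values: NaN counts as NA, but NA is not NaN
v <- c(1, NA, NaN)
is.na(v)
is.nan(v)
```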

Reading data

There are several functions for reading different types of data into R, including read.table and read.csv for tabular data, readLines for text files, and source (the inverse of dump) for R code files. Important arguments of read.table include file, header, sep, nrows, skip, and stringsAsFactors. For large datasets, options such as colClasses and nrows can speed up reading.
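
A small sketch of reading tabular data. The file is created in a temporary location only so that the example is self-contained; its contents are made up for illustration.

```r
# Write a small CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4.0", "3,cat,2.5"), tmp)

# Read only the first two data rows and keep strings as characters
d <- read.csv(tmp, header = TRUE, nrows = 2, stringsAsFactors = FALSE)
str(d)

# Declaring column classes up front avoids type guessing on large files
d2 <- read.table(tmp, header = TRUE, sep = ",",
                 colClasses = c("integer", "character", "numeric"))
sapply(d2, class)
unlink(tmp)
```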

Interfaces to the outside world

file opens a connection to a file, gzfile opens a connection to a gzip-compressed file, and url opens a connection to a web page.

con <- gzfile("words.gz", "r")
x <- readLines(con, 10)  # read the first 10 lines of text
close(con)               # close the connection when done
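
A self-contained sketch of the same idea, assuming no words.gz is available: it first writes a small compressed file to a temporary path (the contents are made up) and then reads it back through a connection.

```r
# Create a small compressed file in a temporary location
path <- tempfile(fileext = ".gz")
con <- gzfile(path, "w")
writeLines(c("apple", "banana", "cherry"), con)
close(con)

# Reopen it for reading and take the first two lines
con <- gzfile(path, "r")
first_two <- readLines(con, 2)
close(con)   # always close connections you opened explicitly
first_two
unlink(path)
```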

Subset

There are several ways to subset, depending on the data type: [ ], [[ ]], and $. Partial matching of names is very handy: x[["a", exact = FALSE]] or x$a, where x is a list.
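
A short sketch of the three operators on a list. The element name aardvark is just a long name chosen to make partial matching visible.

```r
x <- list(aardvark = 1:5, b = "text")
x["aardvark"]              # [ returns an object of the same class (a list)
x[["aardvark"]]            # [[ extracts a single element
x$aardvark                 # $ extracts by name
x$a                        # $ allows partial matching of names
x[["a"]]                   # [[ needs an exact match by default, so this is NULL
x[["a", exact = FALSE]]    # partial matching switched on
```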

Week 2

Control structure

The usual control structures are if/else, for, and while; next skips the rest of the current loop iteration.

if (condition) {
} else {  # at top level, else must be on the same line as the closing brace
}

for (i in 1:4) {}

count <- 1
while (count < 10) {    # braces and spacing keep things readable
    count <- count + 1  # update the counter so the loop terminates
}

for (i in 1:100){
    if (i <= 20){
        next  # skip the first 20 iterations
    }
    ## some code
}
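
A small runnable sketch of next inside a loop. It also uses break, which is not covered in the notes above but is the standard way to leave a loop early.

```r
total <- 0
for (i in 1:10) {
    if (i %% 2 == 0) next   # skip even numbers
    if (i > 7) break        # leave the loop entirely
    total <- total + i
}
total   # 1 + 3 + 5 + 7 = 16
```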

Create functions

Argument matching

R function arguments can be matched by position or by name, for example in read.table. Named arguments can also be partially matched.

columnmean <- function(y, removeNA = TRUE){
       nc <- ncol(y)
       means <- numeric(nc)
       for (i in seq_len(nc)) {
            means[i] <- mean(y[,i], na.rm = removeNA)
       }
       means  # return the vector of column means
}

# removeNA defaults to TRUE, so NAs are dropped unless the caller says otherwise
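
A sketch of the matching rules. The function describe and its arguments are hypothetical, defined here only to show positional, named, and partial matching.

```r
# Hypothetical function used only to illustrate matching
describe <- function(value, digits = 2, prefix = "x =") {
    paste(prefix, round(value, digits))
}
describe(3.14159)                       # positional, defaults used: "x = 3.14"
describe(digits = 3, value = 3.14159)   # named, any order: "x = 3.142"
describe(3.14159, dig = 1)              # partial match of 'digits': "x = 3.1"
describe(3.14159, 4, "pi is about")     # purely positional: "pi is about 3.1416"
```

Partial matching is convenient at the console, but spelling names out in full tends to be safer in scripts.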

The "..." argument indicates a variable number of arguments that are usually passed on to other functions. The following function behaves like plot but changes the default plot type to lines (type = "l").

myplot <- function(x, y, type = "l", ...){
        plot(x, y, type = type, ...)
}

args(paste)
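
In paste, "..." comes first in the argument list, so the arguments after it have to be named in full and cannot be partially matched. A quick sketch:

```r
joined  <- paste("a", "b", sep = ":")   # "a:b"
# 'se' does not partially match 'sep'; it is absorbed by ... as another string
swallow <- paste("a", "b", se = ":")    # "a b :"
joined
swallow
```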

Scoping rules

search() shows the search list of attached packages and environments. Lexical scoping means that the values of free variables are searched for in the environment in which the function was defined.

make.power <- function(n){
      pow <- function(x){
            x^n
      }
      pow
}
cube <- make.power(3)
square <- make.power(2)
cube(3)
square(3)
ls(environment(cube))
get("n", environment(cube))
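
A second sketch that contrasts lexical scoping with what dynamic scoping would give. The functions f and g are illustrative.

```r
y <- 10
f <- function(x) {
    y <- 2
    y^2 + g(x)
}
g <- function(x) {
    x * y   # y is a free variable, looked up where g was defined, not where it is called
}
f(3)   # 2^2 + 3 * 10 = 34; under dynamic scoping it would be 2^2 + 3 * 2 = 10
```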

Code standards, date and time

  1. Write your code in a plain text file.
  2. Indent your code.
  3. Write functions that each fulfill one task.

as.Date, as.POSIXct, and as.POSIXlt convert character strings to date and time objects.

x <- Sys.time() ##  in POSIXct format
p <- as.POSIXlt(x)  ## it is a list
names(unclass(p))
##  [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"   "wday"  
##  [8] "yday"   "isdst"  "zone"   "gmtoff"
p$sec
## [1] 37.38552
unclass(x) 
## [1] 1471429957
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
x <- strptime(datestring, "%B %d, %Y %H:%M")
class(x)
## [1] "POSIXlt" "POSIXt"

We can do basic arithmetic on dates and times with operators such as - and +.
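
A short sketch of date arithmetic. The dates are arbitrary, and the time zone example assumes the system's time zone database knows "America/New_York".

```r
x <- as.Date("2012-03-01")
y <- as.Date("2012-02-28")
x - y                      # 2012 is a leap year, so the difference is 2 days
as.numeric(x - y)
x + 30                     # adding a number to a Date adds days

# POSIXct arithmetic takes time zones into account
a <- as.POSIXct("2012-10-25 01:00:00", tz = "America/New_York")
b <- as.POSIXct("2012-10-25 06:00:00", tz = "GMT")
b - a                      # 1 hour, since 01:00 in New York was 05:00 GMT
```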


Week 3

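Loop functions

lapply, sapply, apply, tapply, and mapply are loop functions that apply a function over the elements of an object; split is an auxiliary function that is often used together with them.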

## lapply always returns a list; the input is coerced to a list if necessary
lapply(1:4, runif, min = 0, max = 10)
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
lapply(x, function(elt) elt[,1])  ## anonymous function that extracts the first column
## sapply returns a vector (or matrix) when the result can be simplified, otherwise a list
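
A quick sketch of how sapply simplifies its result, using a made-up list:

```r
x <- list(a = 1:4, b = c(10, 20, 30))
lapply(x, mean)                     # list with elements a = 2.5 and b = 20
sapply(x, mean)                     # simplified to a named numeric vector
sapply(x, range)                    # each result has length 2, so a 2 x 2 matrix
sapply(x, function(v) v[v > 2])     # lengths differ, so a list is returned
```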

x <- matrix(rnorm(200), 20, 10)
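apply(x, 2, mean)  ## column means (margin 2 keeps the columns)
apply(x, 1, mean)  ## row means (margin 1 keeps the rows)
## rowSums(x) is equivalent to apply(x, 1, sum), and rowMeans(x) to apply(x, 1, mean),
## but the dedicated functions are much faster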
apply(x, 1, quantile, probs = c(0.25, 0.75))
## mapply applies a function in parallel over several vectors or lists of arguments
mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
## 
## [[2]]
## [1] 2 2 2
## 
## [[3]]
## [1] 3 3
## 
## [[4]]
## [1] 4
## tapply applies a function over subsets of a vector defined by a factor
x <- c(rnorm(10), runif(10), rnorm(10,1))
f <- gl(3, 10)  # gl() generates factor levels: 3 levels, each repeated 10 times
tapply(x, f, mean)
##          1          2          3 
## -0.2768301  0.5317314  1.1608785
tapply(x, f, mean, simplify = FALSE)
## $`1`
## [1] -0.2768301
## 
## $`2`
## [1] 0.5317314
## 
## $`3`
## [1] 1.160878
str(split)
## function (x, f, drop = FALSE, ...)
lapply(split(x,f), mean)
## $`1`
## [1] -0.2768301
## 
## $`2`
## [1] 0.5317314
## 
## $`3`
## [1] 1.160878
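
split also works on data frames. A sketch using the built-in airquality dataset; the choice of columns is only for illustration.

```r
s <- split(airquality, airquality$Month)   # list of five data frames, one per month
monthly <- sapply(s, function(d) {
    colMeans(d[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)
})
monthly   # 3 x 5 matrix: one row per variable, one column per month
```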

Debugging

log(-1)
## Warning in log(-1): NaNs produced
## [1] NaN
traceback()
## No traceback available
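
traceback() has nothing to show above because log(-1) raised only a warning, not an error. A rough sketch of inspecting conditions programmatically with tryCatch, which is not part of the notes above; safe_log is a hypothetical helper.

```r
# Catch the warning from log(-1) and look at its message
result <- tryCatch(
    log(-1),
    warning = function(w) {
        message("caught warning: ", conditionMessage(w))
        NA_real_
    }
)
result

# stop() signals an error; in an interactive session, calling traceback()
# right after an uncaught error prints the call stack
safe_log <- function(x) {
    if (x < 0) stop("x must be non-negative")
    log(x)
}
msg <- tryCatch(safe_log(-1), error = function(e) conditionMessage(e))
msg
```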

Week 4

The str function

str, short for structure, compactly displays the internal structure of an R object. It is one of the most useful functions in R.

str(str)
## function (object, ...)
str(lm)
## function (formula, data, subset, weights, na.action, method = "qr", 
##     model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
##     contrasts = NULL, offset, ...)

Simulation

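For each distribution there are four functions with a common prefix: d for the density, r for random number generation, p for the cumulative distribution function, and q for the quantile function. With `lower.tail = FALSE`, the p function returns the upper tail probability, i.e. the survival function.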
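
A quick sketch of the four prefixes for the normal distribution:

```r
dnorm(0)                          # density of N(0, 1) at 0
pnorm(1.96)                       # P(X <= 1.96), about 0.975
qnorm(0.975)                      # inverse of pnorm, about 1.96
pnorm(1.96, lower.tail = FALSE)   # P(X > 1.96), the survival function
set.seed(1)
draws <- rnorm(3)                 # three random draws
draws
```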

Generating random numbers from a linear model.

set.seed(1)   ## set the seed so that the simulation is reproducible
x <- rnorm(100)
e <- rnorm(100, 0, 2)
y <- 0.5 + 2*x + e
summary(y)
plot(x,y)

flips <- sample(c(0,1), 100, replace = TRUE, prob = c(0.3, 0.7))
rbinom(1, size = 100, prob = 0.7)

Sample function

sample(1:10, 4, replace = TRUE)
## [1] 8 2 7 6

R profiler

Profiling is a systematic way to examine how much time is spent in different parts of a program. It is useful when trying to optimize your code.

system.time() returns the user time and the elapsed time. The user time is the CPU time charged for the expression; the elapsed time is the wall-clock time you actually experience.

## elapsed time > user time: most of the time is spent waiting on the network
system.time(readLines("http://www.jhsph.edu"))
##    user  system elapsed 
##   0.012   0.000   1.900
## CPU-bound computation: user time and elapsed time are nearly equal
hilbert <- function(n){
      i <- 1:n
      1 / outer(i - 1, i, "+")
}

x <- hilbert(1000)
system.time(svd(x))
##    user  system elapsed 
##   2.840   0.004   2.843
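
Longer expressions can be timed by wrapping them in curly braces. A sketch with an arbitrary workload; the actual timings depend on the machine, so none are shown.

```r
timing <- system.time({
    n <- 1000
    r <- numeric(n)
    for (i in seq_len(n)) {
        r[i] <- mean(rnorm(1000))
    }
})
timing["elapsed"]   # wall-clock seconds for the whole block
```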

swirl lessons

1. Create sequences

seq(along.with = my_seq)
seq_along(my_seq)
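
A sketch of why seq_along is usually preferred over 1:length(x) in loops. The vectors are made up.

```r
my_seq <- c(10, 20, 30)
seq_along(my_seq)          # 1 2 3
seq(along.with = my_seq)   # same result

empty <- numeric(0)
1:length(empty)            # 1 0, which would run a loop body twice
seq_along(empty)           # integer(0), so a loop is skipped
```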

2. R programming vectors

Using the paste function.

my_char <- c("my", "name", "is")
paste(my_char, collapse = " ")
paste(1:3, c("X", "Y", "Z"), sep = "") # "1X" "2Y" "3Z"
paste(LETTERS, 1:4, sep = "-")

3. Other notes

Three types of index vectors – logical, positive integer, and negative integer

A matrix is simply an atomic vector with a dimension attribute.

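& operates element-wise on logical vectors, while && works on single values: older versions of R used only the first element of each operand, and since R 4.3.0 an operand longer than one is an error.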

The xor() function stands for exclusive OR. If one argument evaluates to TRUE and one argument evaluates to FALSE, then this function will return TRUE, otherwise it will return FALSE.

The which() function takes a logical vector as an argument and returns the indices of the vector that are TRUE.
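
A short sketch of the points above, with made-up vectors:

```r
x <- c(5, 12, 8)
x[c(TRUE, FALSE, TRUE)]    # logical index: 5 8
x[c(1, 3)]                 # positive integer index: 5 8
x[-2]                      # negative integer index drops element 2: 5 8

v <- c(TRUE, FALSE, TRUE)
v & c(TRUE, TRUE, FALSE)   # element-wise: TRUE FALSE FALSE
TRUE && FALSE              # single values: FALSE
xor(TRUE, FALSE)           # TRUE
xor(TRUE, TRUE)            # FALSE
which(x > 6)               # 2 3

# A matrix is a vector with a dim attribute
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)
```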

Binary operators

"%p%" <- function(left, right){ # Remember to add arguments!
    paste(left, right)
}
'I' %p% 'love' %p% 'R!'
## [1] "I love R!"

Looking at datasets


Questions

  1. Scoping rules - optimization example.
  2. How to use the functions debug, recover, and browser.
  3. Variable-length argument example for paste "...".

    fun <- function(...){
      x <- list(...)
      for (i in seq_along(x)) {
        print(x[[i]])
      }
    }
  4. The function evaluate.

    evaluate(function(x){x+1}, 6) ## using an anonymous function
    evaluate(function(x){x[1]}, c(8, 4, 0))
  5. Rprof() and summaryRprof()
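
The calls in item 4 need a definition of evaluate, which the notes do not show. A minimal sketch, assuming evaluate simply applies its first argument to its second. The function telegram is likewise a hypothetical helper for item 3.

```r
# Assumed definition: apply a function to some data
evaluate <- function(func, dat) {
    func(dat)
}
evaluate(function(x) { x + 1 }, 6)           # 7
evaluate(function(x) { x[1] }, c(8, 4, 0))   # 8

# Hypothetical helper: forward a variable number of arguments to paste
telegram <- function(...) {
    paste("START", ..., "STOP")
}
telegram("good", "morning")                  # "START good morning STOP"
```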