Some functions and commands:
getwd()
dir()
ls()
Ctrl + L # clear the screen
source("mycode.R")
str(file)
args(functionName)
# file manipulation in R
dir.create("testdir")
setwd("testdir")
file.create("mytest.R")
file.exists("mytest.R")
file.info("mytest.R")
file.rename("mytest.R", "mytest2.R")
file.copy("mytest2.R", "mytest3.R")
dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE) # create nested dirs
unlink("testdir2", recursive = TRUE)
R was created in 1991 by Ross Ihaka and Robert Gentleman. The R mailing lists are a good place to find answers.
R has five basic (atomic) classes of objects: character, numeric, integer, complex, and logical. Inf, NA, and NaN are special values rather than classes. R objects have attributes such as class, dimensions, and length. c() and vector() are functions that create new objects.
test = vector("list", length = 5)
A list is a special type of vector that can contain elements of different classes; use double brackets to extract a single element. Matrices are another important type of object. The first level of a factor is the baseline level, so sometimes we need to specify the levels explicitly. Other topics: missing values, data frames, etc.
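As a small illustration of the points above, the sketch below builds a list and a factor. The object names (x, answers, f_default, f_custom) are made up for this example.

```r
# A list can hold elements of different classes.
x <- list(a = 1:3, b = "text", c = TRUE)
x[["a"]]        # double brackets extract the element itself (an integer vector)
class(x["a"])   # single brackets return a sub-list: "list"

# Factor levels default to sorted order, so "no" would be the baseline.
answers <- c("yes", "no", "yes", "yes")
f_default <- factor(answers)
levels(f_default)   # "no" "yes"

# Setting the levels explicitly makes "yes" the baseline level.
f_custom <- factor(answers, levels = c("yes", "no"))
levels(f_custom)    # "yes" "no"
```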
Reading data: read.table and read.csv for tabular data, readLines for text files, source (the inverse of dump) for R code files, etc. The main arguments of read.table are file, header, sep, nrows, skip, and stringsAsFactors. For reading large datasets you will need other options.
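A minimal sketch of these arguments in use. The file contents are invented for the example, and colClasses is shown as one option often suggested for larger files, not necessarily the one these notes had in mind.

```r
# Write a small example file to a temporary location (hypothetical data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4.0", "3,cat,2.5"), path)

# Read only the first two rows, declaring the column classes up front.
first_rows <- read.table(path, header = TRUE, sep = ",", nrows = 2,
                         colClasses = c("integer", "character", "numeric"))
first_rows

# Skip the header line and the first data row; supply the names by hand.
rest <- read.table(path, header = FALSE, sep = ",", skip = 2,
                   col.names = c("id", "name", "score"),
                   stringsAsFactors = FALSE)
rest
```

Declaring colClasses saves read.table from having to guess the type of each column, which matters more as files grow.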
file opens a connection to a file, gzfile opens a connection to a gzip-compressed file, and url opens a connection to a web page.
con<- gzfile("words.gz","r");
x<- readLines(con,10) # read text files
Subsetting operators: [, [[, and $. Partial matching is very handy: x[["a", exact = FALSE]] or x$a, where x is a list.
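The difference between the three operators can be sketched as follows. The list and its element names are hypothetical.

```r
x <- list(aardvark = 1:5, beta = "b")

x$a                      # $ allows partial matching: returns 1:5
x[["a"]]                 # [[ requires an exact match by default: NULL
x[["a", exact = FALSE]]  # partial matching switched on: returns 1:5
x["beta"]                # [ returns a list containing the selected element
```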
Control structures, including the next statement:
if (condition) {
} else {
}
for (i in 1:4) {}
count <- 1
while(count < 10){} # make things readable
for (i in 1:100){
if (i <=20){
next
}
## some codes
}
Arguments of a function such as read.table can be matched by position or by name, and named arguments can be partially matched.
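A short sketch of argument matching. The helper safe_mean and its argument names are hypothetical, introduced only for illustration.

```r
# Hypothetical helper used only to illustrate argument matching.
safe_mean <- function(values, remove_na = TRUE) {
  mean(values, na.rm = remove_na)
}

v <- c(1, NA, 3)
by_position <- safe_mean(v, TRUE)                       # matched by position
by_name     <- safe_mean(remove_na = TRUE, values = v)  # order is free when named
by_partial  <- safe_mean(v, rem = FALSE)                # "rem" partially matches remove_na
```

Partial matching is convenient at the console, but spelling names out in full is usually safer in scripts.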
colummean <- function(y, removeNA = TRUE){ # default value of the argument: remove NAs
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(y[,i], na.rm = removeNA)
  }
  means # return the vector of column means
}
The "..." argument indicates a variable number of arguments that are usually passed on to other functions. The following code creates a function that behaves like plot and differs only in its default type, which draws lines (type = "l") instead of points.
myplot <- function(x, y, type = "l", ...){
plot(x, y, type = type, ...)
}
args(paste)
search() shows the search list of attached packages. Lexical scoping means that the values of free variables are searched for in the environment in which the function was defined.
make.power <- function(n){
pow <- function(x){
x^n
}
pow
}
cube <- make.power(3)
square <- make.power(2)
cube(3)
square(3)
ls(environment(cube))
get("n", environment(cube))
Dates and times: as.Date, as.POSIXct, and as.POSIXlt convert a character string to a date or date-time.
x <- Sys.time() ## in POSIXct format
p <- as.POSIXlt(x) ## it is a list
names(unclass(p))
## [1] "sec" "min" "hour" "mday" "mon" "year" "wday"
## [8] "yday" "isdst" "zone" "gmtoff"
p$sec
## [1] 37.38552
unclass(x)
## [1] 1471429957
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
x <- strptime(datestring, "%B %d, %Y %H:%M")
class(x)
## [1] "POSIXlt" "POSIXt"
We can do basic operations on dates and times, such as subtraction and addition (-, +).
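For instance, a minimal sketch with made-up dates (the time zone is set to UTC here only to keep the result independent of the local machine):

```r
d1 <- as.Date("2012-01-10")
d2 <- as.Date("2011-12-09")
d1 - d2          # Time difference of 32 days
d1 + 30          # "2012-02-09"

t1 <- as.POSIXct("2012-01-10 10:40:00", tz = "UTC")
t2 <- as.POSIXct("2011-12-09 09:10:00", tz = "UTC")
t1 - t2                             # a difftime; units are chosen automatically
difftime(t1, t2, units = "hours")   # the same difference, expressed in hours
t1 > t2                             # comparisons also work: TRUE
```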
lapply, sapply, apply, tapply, mapply, and split are loop functions.
## lapply always returns a list; the input is coerced to a list
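How sapply simplifies its result depends on the shape of what each call returns. A small sketch with made-up inputs:

```r
inputs <- list(a = 1:3, b = 1:10)

lens   <- sapply(inputs, length)   # every result has length 1: a named vector
rng    <- sapply(inputs, range)    # every result has length 2: a 2 x 2 matrix
uneven <- sapply(1:3, seq_len)     # results differ in length: stays a list
```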
lapply(1:4, runif, min = 0, max = 10)
lapply(x, function(elt) elt[,1]) ## anonymous function for extracting the first column
## sapply returns a vector (or matrix) when the result can be simplified, otherwise a list
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean) ## preserve the columns
apply(x, 1, mean) ## preserve the rows
rowSums(x)  ## same result as apply(x, 1, sum), but optimized
rowMeans(x) ## same result as apply(x, 1, mean), but optimized
apply(x, 1, quantile, probs = c(0.25, 0.75))
## mapply applies a function in parallel over several argument vectors or lists
mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
##
## [[2]]
## [1] 2 2 2
##
## [[3]]
## [1] 3 3
##
## [[4]]
## [1] 4
## tapply for factors
x <- c(rnorm(10), runif(10), rnorm(10,1))
f <- gl(3, 10) # gl() generates a factor: 3 levels, each repeated 10 times
tapply(x, f, mean)
## 1 2 3
## -0.2768301 0.5317314 1.1608785
tapply(x, f, mean, simplify = FALSE)
## $`1`
## [1] -0.2768301
##
## $`2`
## [1] 0.5317314
##
## $`3`
## [1] 1.160878
str(split)
## function (x, f, drop = FALSE, ...)
lapply(split(x,f), mean)
## $`1`
## [1] -0.2768301
##
## $`2`
## [1] 0.5317314
##
## $`3`
## [1] 1.160878
log(-1)
## Warning in log(-1): NaNs produced
## [1] NaN
traceback()
## No traceback available
The str function (short for structure) compactly displays the structure of an object. It is arguably the most important function in R.
str(str)
## function (object, ...)
str(lm)
## function (formula, data, subset, weights, na.action, method = "qr",
## model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
## contrasts = NULL, offset, ...)
Distribution function prefixes: d for density, r for random number generation, p for the cumulative distribution, q for the quantile function. Setting lower.tail = FALSE in a p function gives the survival function.
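Using the normal distribution as an example, the four prefixes look like this. The values in the comments are approximate.

```r
dnorm(0)                          # density of N(0, 1) at 0, about 0.3989
pnorm(1.96)                       # P(X <= 1.96), about 0.975
pnorm(1.96, lower.tail = FALSE)   # P(X > 1.96), about 0.025
qnorm(0.975)                      # inverse of pnorm, about 1.96

set.seed(42)
draws <- rnorm(5)                 # five random draws from N(0, 1)
```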
Generating random numbers from a linear model.
set.seed(1) ## very important: makes the random numbers reproducible
x <- rnorm(100)
e <- rnorm(100, 0, 2)
y <- 0.5 + 2*x + e
summary(y)
plot(x,y)
flips <- sample(c(0,1), 100, replace = TRUE, prob = c(0.3, 0.7))
rbinom(1, size = 100, prob = 0.7)
Sample function
sample(1:10, 4, replace = T)
## [1] 8 2 7 6
Profiling is a systematic way to examine how much time is spent in different parts of a program. It is useful when trying to optimize your code.
system.time() reports the user time and the elapsed time. The user time is the time charged to the CPU(s) for the expression; the elapsed time is the time you experience, that is, wall clock time.
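A quick way to see the difference is to time something that waits instead of computing. In this sketch Sys.sleep stands in for any operation that is idle, such as waiting on a network.

```r
# Sleeping uses almost no CPU, so elapsed time exceeds user time.
timing <- system.time(Sys.sleep(0.5))
timing[["user.self"]]   # close to 0
timing[["elapsed"]]     # roughly 0.5
```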
## network connectivity
system.time(readLines("http://www.jhsph.edu"))
## user system elapsed
## 0.012 0.000 1.900
## multiple tasks
hilbert <- function(n){
i <- 1:n
1 / outer(i -1, i, "+")
}
x <- hilbert(1000)
system.time(svd(x))
## user system elapsed
## 2.840 0.004 2.843
seq(along.with = my_seq)
seq_along(my_seq)
Using the paste function.
my_char <- c("my", "name", "is")
paste(my_char, collapse = " ")
paste(1:3, c("X", "Y", "Z"), sep = "") # "1X" "2Y" "3Z"
paste(LETTERS, 1:4, sep = "-")
Three types of index vectors – logical, positive integer, and negative integer
A matrix is simply an atomic vector with a dimension attribute.
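Both statements can be checked directly. The vectors below are made up for the example.

```r
x <- c(5, 12, NA, 8, 20)

x[!is.na(x) & x > 6]   # logical index: 12 8 20
x[c(1, 4)]             # positive integers pick elements: 5 8
x[-c(1, 3)]            # negative integers drop elements: 12 8 20

# Adding a dim attribute turns an atomic vector into a matrix.
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)           # TRUE
```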
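& versus &&: & compares logical vectors element by element, while && works on a single logical value (older versions of R used only the first element of each operand; R 4.3.0 and later raise an error for longer operands).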
The xor() function stands for exclusive OR. If one argument evaluates to TRUE and one argument evaluates to FALSE, then this function will return TRUE, otherwise it will return FALSE.
The which() function takes a logical vector as an argument and returns the indices of the vector that are TRUE.
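A brief sketch of these logical tools, using made-up vectors:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b                 # element-wise: TRUE FALSE FALSE
a[1] && b[1]          # single values: TRUE
xor(TRUE, FALSE)      # exactly one is TRUE: TRUE
xor(TRUE, TRUE)       # both are TRUE: FALSE
which(c(FALSE, TRUE, FALSE, TRUE))   # indices of the TRUE values: 2 4
```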
Binary operators
"%p%" <- function(left, right){ # Remember to add arguments!
paste(left, right)
}
'I' %p% 'love' %p% 'R!'
## [1] "I love R!"
Debugging tools: debug, recover, browser
Example of a variable-length argument list using "...", as in paste.
fun <- function(...){
  x <- list(...)  # collect all arguments into a list
  for (i in seq_along(x)) {  # seq_along is safe when no arguments are given
    print(x[[i]])
  }
}
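Function evaluate (its definition is not shown in these notes; the one-line version below is an assumption).
evaluate <- function(func, dat) func(dat) ## assumed definition: apply func to dat
evaluate(function(x){x+1}, 6) ## using an anonymous function
evaluate(function(x){x[1]}, c(8, 4, 0))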
Rprof() starts the profiler and summaryRprof() summarizes its output.
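A minimal sketch of a profiling session. The busy loop is arbitrary work chosen only so that the sampling profiler has something to record, and it assumes an R build with profiling support.

```r
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)                       # start the sampling profiler
start <- proc.time()[["elapsed"]]
while (proc.time()[["elapsed"]] - start < 0.5) {
  tmp <- sort(runif(1e4))              # arbitrary busy work to collect samples
}
Rprof(NULL)                            # stop profiling

prof_summary <- summaryRprof(prof_file)
head(prof_summary$by.self)             # time spent in each function itself
prof_summary$sampling.time             # total time covered by the samples
```

by.self counts only the time spent inside each function, while by.total also includes the time spent in the functions it calls.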