Reading and Writing Data


reading

  • read.csv, read.table
    • for reading tabular data
    • return data frame in R
  • readLines
    • reading lines of text file
  • source
    • reading R code file
    • inverse of dump
  • dget
    • reading R code file
    • inverse of dput
  • load
    • reading in saved workspaces
  • unserialize
    • reading single R object in binary form

writing

  • write.table
  • writeLines
  • dump
  • dput
  • save
  • serialize

reading large set of data

initial <- read.table("test.txt", nrows=100)
classes <- sapply(initial, class)
tableAll <- read.table("test.txt", colClasses=classes)

interface with outside world

Subsetting


  1. [] :
    • always return an object of the same class as the original;
    • can be used to selected more than one element.
  2. [[]] :
    • is used to extract element of a list or data frames;
    • it can be only be used to extract a single element
    • and the class of return object will not necessary to be list or data frame.
  3. $
    • is used to extract element of a list or data frames by name
    • semantics are similar to hat of [[]]

Baisc

numeric index

create vector

> x <- c('a','b','c','c','d','a')
> x
[1] "a" "b" "c" "c" "d" "a"

subset by index

> x[1]
[1] "a"
> x[2]
[1] "b"
> x[1:4]
[1] "a" "b" "c" "c"
> x[4:1]
[1] "c" "c" "b" "a"
logical index
> x [x>'a']
[1] "b" "c" "c" "d"
> u <- x>'a'
> u
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
> x[u]
[1] "b" "c" "c" "d"

Lists

List:

> x <- list(foo=1:4, bar=0.6)
> x
$foo
[1] 1 2 3 4

$bar
[1] 0.6

[] return the same class, so x[1] still list

> x[1]
$foo
[1] 1 2 3 4

[[]] return the single element of the list

> x[[1]]
[1] 1 2 3 4
> x[['foo']]
[1] 1 2 3 4
> name <- 'foo'
> x[name] #return list
$foo
[1] 1 2 3 4
> x[[name]] #return element
[1] 1 2 3 4

$ return the elment associate with the name

> x$bar
[1] 0.6
> x$foo
[1] 1 2 3 4

other

> x <- list(foo='hello' ,bar=0.6, baz='world')
> x[c(1,3)]
$foo
[1] "hello"
$baz
[1] "world"

[] compare [[]]

> x[['bar']] # return element
[1] 0.6
> x['bar']   # return list
$bar
[1] 0.6

subsetting nested elements of a list

> x <- list(a =list(12,13,14), b=list(3.14, 2.18))
> x[[1]][[1]]
[1] 12
> x[[2]][[1]]
[1] 3.14
> x[[c(1,2)]]
[1] 13
> x[[c(2,1)]]
[1] 3.14

Matrices

  • first index: row
  • second index: column

create 2*3 matrix

> x <- matrix(1:6, nrow=2, ncol=3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

example of subset

> x[1,2]
[1] 3
> x[2,1]
[1] 2
> x[1,]
[1] 1 3 5
> x[,2]
[1] 3 4

subset of matrix still be matrix

> x[1,2, drop=FALSE]
     [,1]
[1,]    3
> x[1,, drop=FALSE]
     [,1] [,2] [,3]
[1,]    1    3    5

Partial Matching

create list

> x <- list(abdababga = 1:4)
> x
$abdababga
[1] 1 2 3 4
> x$a
[1] 1 2 3 4
> x[['a']]
NULL
> x[['a', exact=F]]
[1] 1 2 3 4

Removing Missing Values

create vector

> x <- c(1, 2, NA, 5, NA, 7)
> x
[1]  1  2 NA  5 NA  7

find NA

> bad <- is.na(x)
> bad # logical vector
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE

remove NA

> x[!bad]
[1] 1 2 5 7
method two:

create vectors

> x <- c(1, 2, NA, 5, NA, 7)
> y <- c('a', 'b', NA, 'd', NA, NA)
> x
[1]  1  2 NA  5 NA  7
> y
[1] "a" "b" NA  "d" NA  NA

detected good

> good <- complete.cases(x, y)
> good
[1]  TRUE  TRUE FALSE  TRUE FALSE FALSE

remove NA

> x[good]
[1] 1 2 5
> y[good]
[1] "a" "b" "d"

Vectorized Operations


create x and y

> x <- 1:4
> y <- 5:8
> x
[1] 1 2 3 4
> y
[1] 5 6 7 8

add

> x+y
[1]  6  8 10 12

substract

> x-y
[1] -4 -4 -4 -4

multiply

> x*y
[1]  5 12 21 32

devided

> x/y
[1] 0.2000000 0.3333333 0.4285714 0.5000000

vector * vector

> x %*% y
     [,1]
[1,]   70

logical operation

> x>y
[1] FALSE FALSE FALSE FALSE
> x ==y
[1] FALSE FALSE FALSE FALSE
> x <y
[1] TRUE TRUE TRUE TRUE
> x ==2
[1] FALSE  TRUE FALSE FALSE