Tuesday, Jan 19, 2016

Introduction to

Textbook

What is R?

  • R is a dialect of the S language
  • S was developed by John Chambers and others at Bell Labs as a statistical analysis environment
  • The project started in 1976, in Fortran
  • In 1988, it was rewritten in C, Version 3
  • In 1998, Version 4
  • In 1998, S won the Association for Computing Machinery's Software System Award
  • Currently, S-PLUS by TIBCO

S Philosophy

  • S had its roots in data analysis, not in traditional programming languages
  • Created to make data analysis easier
  • A language for interactive data analysis, more command-line oriented
  • Also suitable for writing longer programs

R

  • S was only commercially available, as S-PLUS
  • R was created in 1991 by Ross Ihaka and Robert Gentleman, Dept. Statistics at the University of Auckland
  • First announcement in 1993
  • Paper in 1996

    • Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299-314, 1996
  • 1995, GNU General Public License to make R free SW
  • 1996, public mailing lists (R-help and R-devel)
  • 1997, R Core group
  • 2000, R 1.0.0 released to the public

R Key Features

  • Syntax very similar to S
  • Semantics similar to Scheme
  • Runs on almost any computer platform and OS
  • Has been adapted to run on tablets, phones, PDAs, game consoles
  • Frequent releases
  • Sophisticated graphics capabilities
  • Good for interactive analysis and to develop new tools
  • A great community - R-help, R-devel, even StackOverflow

R is Free Software

  • The four freedoms of SW
    • Freedom to run the program, for any purpose (freedom 0)
    • The freedom to study how the program works, and adapt it to your needs (freedom 1)
    • The freedom to redistribute copies so you can help your neighbor (freedom 2)
    • The freedom to improve the program, and release your improvements to the public, so that the whole community benefits (freedom 3)

R Design

  • A main package and many add-on packages
  • Available at: Comprehensive R Archive Network (CRAN)
  • Two main parts
  • Base package
    • Required to run R
    • Most fundamental functions
    • Other packages in the base system: utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4
    • Recommended packages: boot, class, cluster, codetools, foreign, KernSmooth, lattice, mgcv, nlme, rpart, survival, MASS, spatial, nnet, Matrix

R Design

  • Two main parts (continued)
  • All other packages
    • 4000 packages in CRAN
    • Packages in the Bioconductor project
    • Packages in personal web sites
    • Packages in GitHub, BitBucket

There are also LIMITATIONS

  • Yes, nothing is perfect
  • Based on technology that is about 40 years old (S dates from 1976)
  • Objects must generally be stored in physical memory
  • Functionality is based on consumer demand and "voluntary" user contributions

Resources

Getting Started with R

Assignment

Basic Commands

  • The prompt
    • ">"
  • Comments with:
    • "#"
    • No support for multi-line comments
  • Obtain the Current Directory
    getwd()
## [1] "/home/jagonzalez/SpiderOak Hive/work/Courses/DataScience"
  • Set Working Directory
    setwd(".")

Basic Commands 2

  • Assign a value to a variable
    x <- 3
    y <- "abc"

Basic Commands 3

  • Print the value of the variable
    print(x)
## [1] 3
    print(y)
## [1] "abc"
  • Another way to print the value of a variable
    x
## [1] 3

Evaluation

    x <- 3
    x
## [1] 3
    print(x)
## [1] 3

Evaluation

  • The index of the first element on each output line is printed in brackets on the left
    x <- 1:50
    x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50
  • The ":" operator is used to create sequences

Atomic Classes of Objects

  • character
  • numeric
  • integer
  • complex
  • logical (TRUE/FALSE)
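A minimal sketch of the five atomic classes listed above. The sample values and the name `classes` are chosen only for illustration.

```r
# One sample value per atomic class (values are illustrative)
samples <- list("abc", 3.14, 3L, 2 + 1i, TRUE)

# class() reports the class of each object
classes <- sapply(samples, class)
print(classes)
```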

Types

  • Most basic type is vector
  • Create an empty vector with
    • vector()
    • A vector can only contain values of the same class
    • Lists may contain values of different classes

Numbers

  • Usually treated as numeric objects
    • As reals or double precision
  • To get an integer, append the L suffix to the number:
    • 1L
  • Inf represents infinity
    • 1 / Inf = 0
  • An undefined value is represented with:
    • NaN, Not a Number
    • 0 / 0 = NaN
    • is.na() also reports NaN as a missing value
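The points about numbers can be checked at the prompt. This is a small sketch; the variable names are arbitrary.

```r
x <- 1       # stored as numeric (double precision)
y <- 1L      # the L suffix requests an integer
class(x)
class(y)

zero <- 1 / Inf        # dividing by infinity gives 0
undefined <- 0 / 0     # an undefined result is NaN
is.nan(undefined)      # TRUE
is.na(undefined)       # TRUE: NaN is also reported as missing
```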

Attributes

  • R objects can have attributes (metadata)
    • Such as column names
  • Metadata examples
    • names, dimnames
    • dimensions (for matrices, arrays)
    • class (such as numeric, integer)
    • length
    • user defined attributes (metadata)
  • Attributes are accessed with function:
    • attributes()
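A short sketch of inspecting and setting attributes. The attribute name "source" is a made-up, user-defined example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# A user-defined attribute (the name "source" is hypothetical)
attr(m, "source") <- "example"

# attributes() returns all metadata as a list: here $dim and $source
meta <- attributes(m)
print(meta)
```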

Vectors

  • Use c() to concatenate values to create a vector:
    x <- c(1.5, 2.3, 3.7, 4.6)
    x <- c("hello", "world")
    x <- c(TRUE, TRUE, FALSE)
  • Use vector() to initialize a vector
    x <- vector("numeric", length = 10)
    x
##  [1] 0 0 0 0 0 0 0 0 0 0

Implicit Coercion

  • Mixing objects of different classes in a vector is not allowed
    • When we mix two classes of objects, coercion occurs
    • c(3.1416, "pi") ## coerced to character
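A few more cases of implicit coercion, as a sketch. R picks the most general class that can hold every value.

```r
a <- c(1.7, "a")     # numeric + character -> character
b <- c(TRUE, 2)      # logical + numeric   -> numeric (TRUE becomes 1)
d <- c("a", TRUE)    # character + logical -> character

class(a)
class(b)
class(d)
```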

Explicit Coercion

  • Explicit coercion using the as.* functions
    x <- 0:10
    class(x)
## [1] "integer"
    as.numeric(x)
##  [1]  0  1  2  3  4  5  6  7  8  9 10

Explicit Coercion

    as.logical(x)
##  [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    as.character(x)
##  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
  • When R cannot find a way to coerce a value:
    • It returns NA
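A sketch of a coercion that cannot succeed for every element. `suppressWarnings()` is used here only to keep the output quiet; without it, `as.numeric()` warns that NAs were introduced.

```r
x <- c("a", "b", "10")

# "a" and "b" have no numeric meaning, so they become NA; "10" converts
y <- suppressWarnings(as.numeric(x))
print(y)

# "yes" is not one of the strings as.logical() understands, so it becomes NA
z <- as.logical(c("TRUE", "yes"))
print(z)
```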

Matrices

  • Matrices are vectors with a dimension attribute
    a <- matrix(nrow=2, ncol=3)
    a
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA

Matrices

    dim(a)
## [1] 2 3
    attributes(a)
## $dim
## [1] 2 3

Creating Matrices

    b <- matrix(1:9, nrow=3, ncol=3)
    b
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Creating Matrices

    c <- 1:9
    c
## [1] 1 2 3 4 5 6 7 8 9
    dim(c) <- c(3,3)
    c
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Matrices with cbind and rbind

  • Creating matrices column-wise and row-wise
    x <- 1:3
    y <- 10:12
    cbind(x,y)
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
    rbind(x,y)
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12

Lists

  • May contain objects of different classes
  • Lists are very powerful when used with the apply functions
  • Created with the list() command
    list("a", TRUE, 3.1416, 5)
## [[1]]
## [1] "a"
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] 3.1416
## 
## [[4]]
## [1] 5

Lists

  • Creating a list of NULL elements
    x <- vector("list", length = 3)
    x
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL

Factors

  • We use factors to represent categorical data
  • Can be ordered or unordered
  • We use labels with factors
    • Variable with values "Male" and "Female" better than values 1 and 2
  • Factors are treated specially by modelling functions such as lm() or glm()
  • Created with factor() function
  • Factors may be automatically created as with read.table()
  • Levels can be ordered when we create the factor
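A sketch of an ordered factor, assuming the level order low < medium < high. The data values are illustrative.

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)

levels(sizes)                  # the order given in `levels`
below_high <- sizes < "high"   # comparisons respect the level order
codes <- as.integer(sizes)     # underlying integer codes
print(below_high)
print(codes)
```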

Factors

    x <- factor(c("high", "high", "low", "low", "high", "medium"))
    x
## [1] high   high   low    low    high   medium
## Levels: high low medium

Factors

    table(x)
## x
##   high    low medium 
##      3      2      1
    x <- factor(c("high", "high", "low", "low", "high", "medium"),
                levels = c("low", "medium", "high"))
    x
## [1] high   high   low    low    high   medium
## Levels: low medium high

Missing Values

  • Represented with NA or NaN
  • is.na() is used to test if an object is NA
  • is.nan() is used to test if an object is NaN
  • NA values have a class
    • Integer NA
    • Character NA
  • A NaN value is also NA, but an NA value is not necessarily NaN
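The difference between the two tests can be seen in a short sketch. The vector contents are illustrative.

```r
x <- c(1, NA, NaN, 4)

na_flags  <- is.na(x)    # TRUE for both NA and NaN
nan_flags <- is.nan(x)   # TRUE only for NaN
print(na_flags)
print(nan_flags)

# NA values carry a class
class(NA_integer_)
class(NA_character_)
```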

Missing Values

    x <- c(1, 8, 3, NA, 4, 2)
    x
## [1]  1  8  3 NA  4  2
    is.na(x)
## [1] FALSE FALSE FALSE  TRUE FALSE FALSE

Data Frames

  • We use data frames to store tabular data in R
  • Very useful for machine learning work
  • The dplyr package works with data frames to manipulate the data
  • A data frame is a kind of list in which each element is a vector
    • The length of all vectors must be the same
    • The length of each vector (column) is the number of rows
    • The elements of the list, or attributes, are the columns of the data frame
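A sketch showing that a data frame behaves as a list of equal-length vectors. The column names and values are made up for the example.

```r
df <- data.frame(id = 1:3, score = c(9.5, 7.0, 8.2))

is.list(df)                         # TRUE: a data frame is a list
n_cols <- length(df)                # number of list elements = columns
col_classes <- sapply(df, class)    # each column has its own class
col_lengths <- sapply(df, length)   # every column has nrow(df) values
print(col_classes)
print(col_lengths)
```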

Data Frames

  • In a training dataset, our data frame consists of m list elements (attributes), each of length equal to the number of rows or samples
  • The type of each element of the list corresponds to the type of the attribute.

Data Frames

  • Data frames have column names and row names
  • Created with
    • read.table()
    • read.csv()
    • data.frame()
  • Can be converted into a matrix
    • data.matrix()
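A sketch of converting a data frame with `data.matrix()`. The columns are illustrative; note that a factor column is replaced by its internal integer codes.

```r
df <- data.frame(age = c(25, 30, 28),
                 grade = factor(c("A", "B", "A")))

m <- data.matrix(df)   # every column becomes numeric
is.matrix(m)
dim(m)
print(m)
```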

Data Frames

    name <- c("Mary", "Alice", "John", "George", "Joseph")
    age <- c(25, 30, 28, 24, 27)
    grade <- c("A", "A", "C", "B", "B")
    studentGrade <- data.frame(name, age, grade)
    studentGrade
##     name age grade
## 1   Mary  25     A
## 2  Alice  30     A
## 3   John  28     C
## 4 George  24     B
## 5 Joseph  27     B

Data Frames

    nrow(studentGrade)
## [1] 5
    ncol(studentGrade)
## [1] 3

Names

  • R objects may have names
    • Self describing objects
    • Writing clear code
    sales <- c(20, 17, 19)
    names(sales)
## NULL
    names(sales) <- c("Computers", "Printers", "Hard Drives")
    sales
##   Computers    Printers Hard Drives 
##          20          17          19

Names

  • Matrices may have both column and row names
    • dimnames() # both col and row names at the same time
    • rownames() # row names separately
    • colnames() # col names separately
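A sketch of naming a matrix. The names "r1", "r2", "c1", "c2" are placeholders.

```r
m <- matrix(1:4, nrow = 2, ncol = 2)

# dimnames() takes a list: row names first, then column names
dimnames(m) <- list(c("r1", "r2"), c("c1", "c2"))

rownames(m)
colnames(m)

# Names can then be used for indexing
value <- m["r2", "c1"]
print(m)
```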

Giving Names Table

Reading Data

  • We can read data in csv format (tabular)
    • read.table()
    • read.csv()
  • To read lines from a text file
    • readLines()
  • To read R code files inside an R program (inverse of dump())
    • source()

Reading Data

  • To read R code files (inverse of dput())
    • dget()
  • To read in saved workspaces
    • load()
  • To read R binary objects
    • unserialize()
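A sketch of `readLines()` on a small text file. The file is a temporary one created just for this example, so nothing here refers to a real dataset.

```r
# Create a throwaway text file (path and contents are illustrative)
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

# readLines() returns one character element per line
lines <- readLines(path)
print(lines)

unlink(path)   # clean up the temporary file
```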

read.table() Arguments

  • file
  • header
    • logical
  • sep
    • how columns are separated
  • colClasses
    • character vector

read.table() Arguments

  • nrows
    • number of rows to read, default is the entire file
  • comment.char
    • default is "#"
  • stringsAsFactors, either TRUE or FALSE
    • Should character variables be coded as factors?
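A sketch that exercises the arguments listed above on a temporary file. The file contents, the ";" separator, and the name `people` are assumptions made for the example.

```r
# Throwaway input file: one comment line, a header, three data rows
path <- tempfile(fileext = ".txt")
writeLines(c("# demo file", "name;age", "Mary;25", "John;28", "Alice;30"), path)

people <- read.table(path,
                     header = TRUE,                 # first data line holds column names
                     sep = ";",                     # columns separated by semicolons
                     colClasses = c("character", "numeric"),
                     nrows = 2,                     # read only the first two rows
                     comment.char = "#",            # skip lines starting with "#"
                     stringsAsFactors = FALSE)      # keep strings as character
print(people)

unlink(path)
```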

read.table() Example

  • Example
    • dataset <- read.table("mydata.txt")

read.csv()

  • Identical to read.table() except that some parameters are initialized to a different value
    • sep is set to ","
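A sketch comparing the two functions on a small temporary CSV file. The file contents are illustrative; for a simple file like this one, both calls should give the same data frame.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("name,grade", "Alex,A", "Alice,B"), path)

via_csv   <- read.csv(path)                               # header = TRUE, sep = "," by default
via_table <- read.table(path, header = TRUE, sep = ",")   # same settings spelled out

same <- identical(via_csv, via_table)
print(same)

unlink(path)
```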

Remember: R loads all objects into memory, so be aware of the size of your data

dump() and dput()

  • Store data in textual format but with additional metadata
    • class of each attribute
    • levels of a factor variable
  • The format is not very space efficient
  • Text is partially readable

Using dput() and dget()

    name <- c("Alex", "Alice", "Mary")
    grade <- c("A", "B", "B")
    grades <- data.frame(name, grade)
    grades
##    name grade
## 1  Alex     A
## 2 Alice     B
## 3  Mary     B
    dput(grades)
## structure(list(name = structure(1:3, .Label = c("Alex", "Alice", 
## "Mary"), class = "factor"), grade = structure(c(1L, 2L, 2L), .Label = c("A", 
## "B"), class = "factor")), .Names = c("name", "grade"), row.names = c(NA, 
## -3L), class = "data.frame")

Using dput() and dget()

  • Saving the output of dput into a file
    dput(grades, file = "grades.R")
    new_grades <- dget("grades.R")
    new_grades
##    name grade
## 1  Alex     A
## 2 Alice     B
## 3  Mary     B

dump() and source()

    x <- "Alex"
    y <- "A"
    dump(c("x", "y"), file = "data.R")
    rm(x, y)
    source("data.R")
    str(y)
##  chr "A"
    x
## [1] "Alex"

Binary Formats

  • Data can also be stored in binary format
    • As with images
    • Numeric data, to avoid losing precision
  • Functions to convert into binary format
    • save()
      • To save individual objects into a file
    • save.image()
      • To save many objects into a file (i.e. workspace)
    • serialize()

save() and load()

    a <- data.frame(x = rnorm(100), y = runif(100))
    b <- c(1, 5.1, 2/3)
    save(a, b, file = "mydata.rda")
    rm("a", "b")
    load("mydata.rda")
    a
##                x          y
## 1    0.254091370 0.84216677
## 2    1.685355723 0.72267534
## 3    0.631521709 0.56686937
## 4   -0.689931916 0.12302968
## 5   -0.905849990 0.25373114
## 6   -1.239939643 0.43529656
## 7   -1.453510309 0.89655194
## 8   -1.486551737 0.50830777
## 9    0.302163434 0.69846341
## 10  -0.057888388 0.72385997
## 11   0.586150948 0.80584521
## 12  -0.004544574 0.54556273
## 13   1.006846439 0.28188886
## 14   2.061480504 0.85075754
## 15   0.758175948 0.54823279
## 16  -0.614024422 0.67038907
## 17  -0.542151968 0.74277669
## 18   0.758257194 0.18343342
## 19   0.517718702 0.03569640
## 20  -0.853922330 0.20787358
## 21   1.588420211 0.18039228
## 22  -1.906204414 0.08777102
## 23   1.375055711 0.33068697
## 24   0.736541349 0.20272690
## 25   0.092855055 0.28634010
## 26   2.317684480 0.25846113
## 27  -1.244456085 0.69872277
## 28   0.759373013 0.24168329
## 29   1.202575005 0.49173601
## 30  -0.811015694 0.54479527
## 31  -1.471904042 0.50937673
## 32   0.813551585 0.68599401
## 33   1.160714166 0.44138936
## 34  -1.394768920 0.38973168
## 35  -0.417419572 0.28452569
## 36   0.017063471 0.52311345
## 37  -0.445858240 0.05264593
## 38   0.825467544 0.90916983
## 39  -0.122574353 0.04260943
## 40   1.065511788 0.21696129
## 41   0.496413460 0.24757980
## 42  -1.054612481 0.03201525
## 43  -0.777780891 0.29375021
## 44   0.117452824 0.95661225
## 45   1.340426816 0.96162987
## 46   2.060540387 0.16930863
## 47   0.518913149 0.71126686
## 48   0.636198729 0.91583976
## 49  -1.373450609 0.95807571
## 50  -0.419674635 0.48744647
## 51  -0.229509086 0.27195293
## 52   1.557246651 0.68286209
## 53   1.637436946 0.80601817
## 54   1.034943938 0.40256151
## 55   1.580530921 0.43586289
## 56   1.367273792 0.33130847
## 57   1.679581049 0.07526132
## 58   1.075735015 0.02504518
## 59   1.526304176 0.94177891
## 60  -0.222324468 0.10439616
## 61   1.208890811 0.14919900
## 62  -1.372540896 0.65216228
## 63  -2.318095318 0.20067534
## 64  -0.229669692 0.05948637
## 65  -1.761672309 0.24258475
## 66   1.436401133 0.87682350
## 67   0.170324528 0.56137201
## 68  -0.854233755 0.80918467
## 69  -0.808413242 0.42584163
## 70  -1.655440173 0.12804505
## 71   0.624910836 0.51197904
## 72   0.326590185 0.35214156
## 73   0.965835266 0.61629880
## 74  -0.010644915 0.64745422
## 75   1.032710788 0.88966153
## 76   0.821263335 0.13839951
## 77   1.276315771 0.83818156
## 78   1.195252971 0.88496060
## 79  -0.433852654 0.75666248
## 80  -1.310870763 0.95594798
## 81   0.447981574 0.72950209
## 82  -0.607596677 0.80751797
## 83   0.911798498 0.37177908
## 84  -0.473920791 0.67891397
## 85   0.513886051 0.55113423
## 86   0.080864830 0.74919069
## 87   0.012497953 0.12207326
## 88   0.413608303 0.52204736
## 89  -1.323652309 0.09107455
## 90   1.789839979 0.22026572
## 91  -0.209705943 0.90524248
## 92  -0.522625753 0.87982012
## 93  -0.433262232 0.73033484
## 94  -0.032084161 0.51334311
## 95   0.709014456 0.27710272
## 96  -0.579694218 0.93869966
## 97   1.075352091 0.45721181
## 98  -0.090156882 0.99455063
## 99  -1.057881089 0.66551360
## 100  0.202982256 0.25610662

save() and load()

    save.image(file = "mydata.RData")
    rm("a", "b")
    load("mydata.RData")
    a
##                x          y
## 1    0.254091370 0.84216677
## 2    1.685355723 0.72267534
## 3    0.631521709 0.56686937
## 4   -0.689931916 0.12302968
## 5   -0.905849990 0.25373114
## 6   -1.239939643 0.43529656
## 7   -1.453510309 0.89655194
## 8   -1.486551737 0.50830777
## 9    0.302163434 0.69846341
## 10  -0.057888388 0.72385997
## 11   0.586150948 0.80584521
## 12  -0.004544574 0.54556273
## 13   1.006846439 0.28188886
## 14   2.061480504 0.85075754
## 15   0.758175948 0.54823279
## 16  -0.614024422 0.67038907
## 17  -0.542151968 0.74277669
## 18   0.758257194 0.18343342
## 19   0.517718702 0.03569640
## 20  -0.853922330 0.20787358
## 21   1.588420211 0.18039228
## 22  -1.906204414 0.08777102
## 23   1.375055711 0.33068697
## 24   0.736541349 0.20272690
## 25   0.092855055 0.28634010
## 26   2.317684480 0.25846113
## 27  -1.244456085 0.69872277
## 28   0.759373013 0.24168329
## 29   1.202575005 0.49173601
## 30  -0.811015694 0.54479527
## 31  -1.471904042 0.50937673
## 32   0.813551585 0.68599401
## 33   1.160714166 0.44138936
## 34  -1.394768920 0.38973168
## 35  -0.417419572 0.28452569
## 36   0.017063471 0.52311345
## 37  -0.445858240 0.05264593
## 38   0.825467544 0.90916983
## 39  -0.122574353 0.04260943
## 40   1.065511788 0.21696129
## 41   0.496413460 0.24757980
## 42  -1.054612481 0.03201525
## 43  -0.777780891 0.29375021
## 44   0.117452824 0.95661225
## 45   1.340426816 0.96162987
## 46   2.060540387 0.16930863
## 47   0.518913149 0.71126686
## 48   0.636198729 0.91583976
## 49  -1.373450609 0.95807571
## 50  -0.419674635 0.48744647
## 51  -0.229509086 0.27195293
## 52   1.557246651 0.68286209
## 53   1.637436946 0.80601817
## 54   1.034943938 0.40256151
## 55   1.580530921 0.43586289
## 56   1.367273792 0.33130847
## 57   1.679581049 0.07526132
## 58   1.075735015 0.02504518
## 59   1.526304176 0.94177891
## 60  -0.222324468 0.10439616
## 61   1.208890811 0.14919900
## 62  -1.372540896 0.65216228
## 63  -2.318095318 0.20067534
## 64  -0.229669692 0.05948637
## 65  -1.761672309 0.24258475
## 66   1.436401133 0.87682350
## 67   0.170324528 0.56137201
## 68  -0.854233755 0.80918467
## 69  -0.808413242 0.42584163
## 70  -1.655440173 0.12804505
## 71   0.624910836 0.51197904
## 72   0.326590185 0.35214156
## 73   0.965835266 0.61629880
## 74  -0.010644915 0.64745422
## 75   1.032710788 0.88966153
## 76   0.821263335 0.13839951
## 77   1.276315771 0.83818156
## 78   1.195252971 0.88496060
## 79  -0.433852654 0.75666248
## 80  -1.310870763 0.95594798
## 81   0.447981574 0.72950209
## 82  -0.607596677 0.80751797
## 83   0.911798498 0.37177908
## 84  -0.473920791 0.67891397
## 85   0.513886051 0.55113423
## 86   0.080864830 0.74919069
## 87   0.012497953 0.12207326
## 88   0.413608303 0.52204736
## 89  -1.323652309 0.09107455
## 90   1.789839979 0.22026572
## 91  -0.209705943 0.90524248
## 92  -0.522625753 0.87982012
## 93  -0.433262232 0.73033484
## 94  -0.032084161 0.51334311
## 95   0.709014456 0.27710272
## 96  -0.579694218 0.93869966
## 97   1.075352091 0.45721181
## 98  -0.090156882 0.99455063
## 99  -1.057881089 0.66551360
## 100  0.202982256 0.25610662

serialize()

  • We use serialize to convert individual objects into binary format
    • raw vector in hexadecimal format
    • Perfectly represent an R object in an exportable format
    • Without losing precision or any metadata
    • Good for data exchange
    • If only storing objects, better use save()

serialize()

    x <- list("A", "B", "C")
    y <- list(1, 2, 3)
    serialize(x, NULL)
##  [1] 58 0a 00 00 00 02 00 03 02 02 00 02 03 00 00 00 00 13 00 00 00 03 00
## [24] 00 00 10 00 00 00 01 00 04 00 09 00 00 00 01 41 00 00 00 10 00 00 00
## [47] 01 00 04 00 09 00 00 00 01 42 00 00 00 10 00 00 00 01 00 04 00 09 00
## [70] 00 00 01 43
    serialize(y, NULL)
##  [1] 58 0a 00 00 00 02 00 03 02 02 00 02 03 00 00 00 00 13 00 00 00 03 00
## [24] 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01
## [47] 40 00 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 40 08 00 00 00 00 00
## [70] 00
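A sketch of the round trip: `unserialize()` turns the raw vector back into the original object.

```r
x <- list("A", "B", "C")

raw_bytes <- serialize(x, NULL)      # NULL connection: return a raw vector
class(raw_bytes)                     # "raw"

restored <- unserialize(raw_bytes)   # rebuild the object from the bytes
identical(x, restored)
```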

Connections

  • file
    • opens a file connection
  • gzfile
    • opens a file connection to a file compressed with gzip
  • bzfile
    • opens a file connection to a file compressed with bzip2
  • url
    • opens a connection to a webpage
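A sketch of a compressed file connection with `gzfile()`. The temporary file and its contents are made up for the example.

```r
path <- tempfile(fileext = ".gz")

# Write two lines through a gzip-compressed connection
con <- gzfile(path, "w")
writeLines(c("a,b", "1,2"), con)
close(con)

# Read them back; decompression is handled by the connection
con <- gzfile(path, "r")
lines <- readLines(con)
close(con)
print(lines)

unlink(path)
```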

file() Arguments

  • description
    • Name of the file
  • open
    • Code of the mode used to open the file
      • "r", read only mode
      • "w", writing (initializes a new file)
      • "a", append
      • "rb", "wb", "ab", for binary modes (the distinction matters on Windows)
  • We might not need to deal with the connection if we use a function such as read.table()
    • data <- read.csv("mydata.csv")
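A sketch of the "w", "a" and "r" modes on a temporary file. The file name and contents are illustrative.

```r
path <- tempfile(fileext = ".txt")

con <- file(path, open = "w")   # "w" creates (or truncates) the file
writeLines("first", con)
close(con)

con <- file(path, open = "a")   # "a" appends to the existing contents
writeLines("second", con)
close(con)

con <- file(path, open = "r")   # "r" opens the file read-only
lines <- readLines(con)
close(con)
print(lines)

unlink(path)
```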

Reading from a Connection

    getwd()
## [1] "/home/jagonzalez/SpiderOak Hive/work/Courses/DataScience"
    con <- file("genes-leukemia.csv")
    open(con, "r")
    data <- read.csv(con)
    close(con)
    data
##    SNUM CLASS BM_PB TB_if_ALL FAB_if_AML Year Gender pct_Blasts
## 1    s1   ALL    BM    B-cell          ? 1996      M          ?
## 2    s2   ALL    BM    T-cell          ?    ?      M          ?
## 3    s3   ALL    BM    T-cell          ?    ?      M          ?
## 4    s4   ALL    BM    B-cell          ?    ?      ?          ?
## 5    s5   ALL    BM    B-cell          ?    ?      ?          ?
## 6    s6   ALL    BM    T-cell          ?    ?      M          ?
## 7    s7   ALL    BM    B-cell          ? 1983      F          ?
## 8    s8   ALL    BM    B-cell          ?    ?      F          ?
## 9    s9   ALL    BM    T-cell          ?    ?      M          ?
## 10  s10   ALL    BM    T-cell          ? 1987      M          ?
## 11  s11   ALL    BM    T-cell          ? 1985      M          ?
## 12  s12   ALL    BM    B-cell          ? 1985      F          ?
## 13  s13   ALL    BM    B-cell          ? 1988      F          ?
## 14  s14   ALL    BM    T-cell          ? 1987      M          ?
## 15  s15   ALL    BM    B-cell          ? 1989      F          ?
## 16  s16   ALL    BM    B-cell          ? 1990      M          ?
## 17  s17   ALL    BM    B-cell          ? 1990      M          ?
## 18  s18   ALL    BM    B-cell          ?    ?      F          ?
## 19  s19   ALL    BM    B-cell          ?    ?      ?          ?
## 20  s20   ALL    BM    B-cell          ?    ?      ?          ?
## 21  s21   ALL    BM    B-cell          ? 1984      M          ?
## 22  s22   ALL    BM    B-cell          ? 1988      M          ?
## 23  s23   ALL    BM    T-cell          ? 1991      M          ?
## 24  s24   ALL    BM    B-cell          ? 1981      M          ?
## 25  s25   ALL    BM    B-cell          ? 1982      M          ?
## 26  s26   ALL    BM    B-cell          ?    ?      F          ?
## 27  s27   ALL    BM    B-cell          ?    ?      F          ?
## 28  s28   AML    BM         ?         M2    ?      ?         79
## 29  s29   AML    BM         ?         M2    ?      ?         34
## 30  s30   AML    BM         ?         M5    ?      ?         93
## 31  s31   AML    BM         ?         M4    ?      ?         77
## 32  s32   AML    BM         ?         M1    ?      ?         86
## 33  s33   AML    BM         ?         M2    ?      ?         70
## 34  s34   AML    BM         ?         M2    ?      ?         77
## 35  s35   AML    BM         ?         M1    ?      ?         67
## 36  s36   AML    BM         ?         M5    ?      ?         76
## 37  s37   AML    BM         ?         M2    ?      ?         44
## 38  s38   AML    BM         ?         M1    ?      ?         80
## 39  s39   ALL    BM    B-cell          ?    ?      F          ?
## 40  s40   ALL    BM    B-cell          ? 1980      F          ?
## 41  s41   ALL    BM    B-cell          ?    ?      F          ?
## 42  s42   ALL    BM    B-cell          ?    ?      F          ?
## 43  s43   ALL    BM    B-cell          ?    ?      F          ?
## 44  s44   ALL    BM    B-cell          ? 1998      F          ?
## 45  s45   ALL    BM    B-cell          ? 1998      M          ?
## 46  s46   ALL    BM    B-cell          ? 1999      F          ?
## 47  s47   ALL    BM    B-cell          ? 1986      M          ?
## 48  s48   ALL    BM    B-cell          ? 1992      F          ?
## 49  s49   ALL    BM    B-cell          ?    ?      M          ?
## 50  s50   AML    BM         ?         M4    ?      ?         93
## 51  s51   AML    BM         ?         M2    ?      ?         57
## 52  s52   AML    PB         ?         M4    ?      ?         86
## 53  s53   AML    BM         ?         M2    ?      ?         76
## 54  s54   AML    BM         ?         M4    ?      F          ?
## 55  s55   ALL    BM    B-cell          ?    ?      F          ?
## 56  s56   ALL    BM    B-cell          ?    ?      F          ?
## 57  s57   AML    BM         ?         M2    ?      F          ?
## 58  s58   AML    BM         ?         M2    ?      ?          ?
## 59  s59   ALL    BM    B-cell          ?    ?      F          ?
## 60  s60   AML    BM         ?         M2    ?      M          ?
## 61  s61   AML    BM         ?         M1    ?      ?          ?
## 62  s62   AML    PB         ?          ?    ?      M          ?
## 63  s63   AML    PB         ?          ?    ?      F          ?
## 64  s64   AML    PB         ?          ?    ?      M          ?
## 65  s65   AML    BM         ?          ?    ?      M          ?
## 66  s66   AML    BM         ?          ?    ?      M          ?
## 67  s67   ALL    PB    T-cell          ? 1997      M          ?
## 68  s68   ALL    PB    B-cell          ? 1998      M          ?
## 69  s69   ALL    PB    B-cell          ? 1998      M          ?
## 70  s70   ALL    PB    B-cell          ? 1998      F          ?
## 71  s71   ALL    PB    B-cell          ? 1998      ?          ?
## 72  s72   ALL    PB    B-cell          ? 1998      ?          ?
##    Treatment_Response   PS  Source D49950 D63880 J03473 J05243 L13278
## 1                   ? 1.00    DFCI     75    556   2018    610    193
## 2                   ? 0.41    DFCI    129    476    650    927     31
## 3                   ? 0.87    DFCI     44    498    573   1697    198
## 4                   ? 0.91    DFCI    218   1211   2291    425     91
## 5                   ? 0.89    DFCI    110    820   2796    529    194
## 6                   ? 0.76    DFCI     33    485    405   1682     96
## 7                   ? 0.78    DFCI    115    734   1829    386    178
## 8                   ? 0.77    DFCI     32    223   2497    629    112
## 9                   ? 0.89    DFCI     20    686   1204   2760    198
## 10                  ? 0.56    DFCI     54    721    324   1722     20
## 11                  ? 0.74    DFCI     38    551    264   1587     36
## 12                  ? 0.20    DFCI     20    198    472    169     20
## 13                  ? 1.00    DFCI    103    550   1591    719    170
## 14                  ? 0.73    DFCI     72    694    999    859    188
## 15                  ? 0.98    DFCI    158   1287   2352    661    217
## 16                  ? 0.95    DFCI     28    475   2576   1034    240
## 17                  ? 0.49    DFCI    117    883   1875    447    254
## 18                  ? 0.59    DFCI     99   1281   2150    252     32
## 19                  ? 0.80    DFCI    101    256   1338   1041    200
## 20                  ? 0.90    DFCI     22    568   4443   1717    338
## 21                  ? 0.76    DFCI     48    454   1040    604    119
## 22                  ? 0.37    DFCI     74     43   1191    304     20
## 23                  ? 0.77    DFCI     43    613    579    397     85
## 24                  ? 0.92    DFCI    168    298   2741    642    191
## 25                  ? 0.43    DFCI     20    231    569    295     20
## 26                  ? 0.89    DFCI    117    306   2088   1122     20
## 27                  ? 0.82    DFCI     20    433   1702    209     79
## 28            Failure 0.44   CALGB    297    279    841    102     20
## 29            Failure 0.74   CALGB    190    277    500    237     20
## 30            Failure 0.80   CALGB    326    136    426    238     20
## 31            Failure 0.61   CALGB    197    321    871     57     60
## 32            Failure 0.47   CALGB    259    283    671    628     34
## 33            Failure 0.89   CALGB    100    151    109     20     20
## 34            Success 0.64   CALGB    355    226    803    206     20
## 35            Success 0.21   CALGB    295     20    430     20     20
## 36            Success 0.94   CALGB    283    410    603     20     22
## 37            Success 0.95   CALGB    311    185    404     87     20
## 38            Success 0.73   CALGB     70    118    355     71     20
## 39                  ? 0.78    DFCI    214    166   1001    786     67
## 40                  ? 0.68    DFCI    234     20    872    994    216
## 41                  ? 0.99    DFCI     20    867   2158    615    118
## 42                  ? 0.42    DFCI     35    376    844    233    100
## 43                  ? 0.66    DFCI     43    586    942    853     84
## 44                  ? 0.97    DFCI     97    560   2090    773     20
## 45                  ? 0.88    DFCI    142    462   1588    382     86
## 46                  ? 0.84    DFCI     80    368    561    443     69
## 47                  ? 0.81    DFCI    191    741    926    510    176
## 48                  ? 0.94    DFCI     51    579   1924    598    461
## 49                  ? 0.84    DFCI    128    391   2186    861    119
## 50            Failure 0.97   CALGB    346    164    675    135     38
## 51            Failure 1.00   CALGB    211    209    506     48     20
## 52            Success 0.61   CALGB    109    131    488    294     20
## 53            Success 0.89   CALGB    106     94    449    109     20
## 54                  ? 0.23 St-Jude    204    335   1061    352     76
## 55                  ? 0.73 St-Jude     74    449   1490    421     20
## 56                  ? 0.84 St-Jude     20    494   1593   1359     33
## 57                  ? 0.22 St-Jude     96    326    569     21     20
## 58                  ? 0.74 St-Jude    151    238    658    151     64
## 59                  ? 0.68 St-Jude    205    691    724    739    144
## 60                  ? 0.06 St-Jude    183    428    575    336     20
## 61                  ? 0.40 St-Jude    175     95    795    288     20
## 62                  ? 0.58     CCG    333    333    560    282    113
## 63                  ? 0.69     CCG    250    318    776     20     87
## 64                  ? 0.52     CCG    455    325    534    195    113
## 65                  ? 0.60     CCG    218    156    409    220     20
## 66                  ? 0.27     CCG     65    349    828    230     67
## 67                  ? 0.15    DFCI     53     72    540    519     20
## 68                  ? 0.80    DFCI    145    734   3090   1226    179
## 69                  ? 0.85    DFCI     84    545   2412    765     91
## 70                  ? 0.73    DFCI     20    440   1146    344    117
## 71                  ? 0.30    DFCI    105    113    704   1164     25
## 72                  ? 0.77    DFCI    146     78   1231    894     20
##    L47738 M21551_rna1 M55150 M62762 M81933 M91432 S50223 U12471_cds1
## 1     571         178    654    835     20    767    268         160
## 2    2893         336   1283   3072     20    814    346         134
## 3    2723         345   1286    609    124   1547    804         167
## 4     731         374    915    935    167    831    452         104
## 5     649         321    732   1665    114   1423    476         145
## 6    1858          44    691    499     56    430    256          81
## 7     280         359    853    764    203    764    208         122
## 8    1716         364   1238   1323     20    752    310         335
## 9    2388         333    822   1741    124   1777    640         131
## 10    913         271    885   1188     89    426    152         264
## 11   2147         267    680   1870     46    472    441         195
## 12    140          20    412   1030     44     76     90         104
## 13    556         268    548   1482    192    920    545         150
## 14   1483          20   1157    892     77    606    570         135
## 15   1505          64    387   1306     96   1417    571         128
## 16    904         160    707    593    120    869    235         182
## 17    643         363   1822   2376     98   2031    352         227
## 18    669         333    738    542     89    954     20         157
## 19   1137         279    907    809     82    532     84         290
## 20   3810         117    860   2475    120   3192    867          70
## 21   1631         152     92   1514    161    896    323          39
## 22    857         265    686   1977    111     70    241         126
## 23    813          20    803    672     22    321    493          74
## 24    899         392    803   1235     20    992    502         205
## 25    223         305    842    933    113    363    301         144
## 26    759         330    514   1114     52    636    101          98
## 27    899          64    561     20     20    419     20         143
## 28    449         456   1929   2702    313    323    107         238
## 29     65         259   1647   1736    243    125    150         217
## 30    172         487   2112   3553    229    158     20         213
## 31    239         554   1555   3255    132    295     31         283
## 32    683         462   1514   4249    230    368    210         264
## 33    153         672   2693   1871    282    110     20         304
## 34     20         405   1811   3236    179    263     56         293
## 35    218         327   1406   1125    214    226    168         322
## 36     20         485   1707   4647    179     87     20         337
## 37     20         415   2072   3808    330    183     20         417
## 38    201         582   1753   3594    407    111     62         271
## 39    837          20    627   1738     94    335     43          20
## 40    180         363    503    708     27    382    373         411
## 41    770          20    181   1188    173    707    167         154
## 42    273         199   1565   2274    371    553    127         649
## 43    439          69   1165    951    355    562    274          20
## 44    579          38    441   1393    314    575    216         183
## 45    444         165    309   1062    445    747    236         278
## 46    570          20    385    586    222    392    177         160
## 47    575         235    767   1828    224    638    239          75
## 48    603         302    927   1946    143   1340    776         194
## 49    763         404    894    417     20    799     20          86
## 50    578         342   1895   3427    274    225     48         175
## 51    446         339   1339   2120     20    126     20         434
## 52    455         354    801   3266     62    163     20         198
## 53    118         308   1534   3457    206    135     45         334
## 54    269         319   1811   2860    216    413     26         238
## 55    287         194    781   1170    224    339     20         107
## 56   1009          46    669   1246    296    496     56         239
## 57     21         253   1147   1662     20    196     20         192
## 58     37         646   1656   1489     20    129     20          71
## 59    310          20    261   1769    195    155    106          66
## 60    471          20    880   1933    275    412     20         176
## 61    218         361   1263   2600     71    422     20         132
## 62    393         345   1295   4346     20    384     20         155
## 63    233          93   1476   4514     20    462     20         256
## 64    309         534   3220   2803     20    273     20         143
## 65     20         308    610   2631     20    179     20         280
## 66    408         254    749    960    165    362     20         163
## 67    908         222    654   2601     70    140     31         134
## 68   1235         140    859   1618     20   1385    439         216
## 69   1284         351    623   1760    182   1012    607         120
## 70    853          30    360   1025    301    334    106         185
## 71   1213         263    310   1494    170    146     20         134
## 72    586         339    665   2122    174    235    163         115
##    U32944 U35451 U50136_rna1 U53468 U72342 U82759 X15949 X52142
## 1    3349    408        1124    141    978    393    277    107
## 2    1002    633        1062     84    324    118    104     20
## 3    2089    912        1398     98    855    667     91     20
## 4    1625    304         942     20    369    410    403    190
## 5    3502    398         928    238    653    119    416    361
## 6     316    506         842     32    449    147     46     26
## 7    1530    357        1066     20    638    735    267    191
## 8    1042    134        1668     71    361    248    314    179
## 9    2286    982         966     90    576    258    210    240
## 10    934    613        1175     62     99    409     97     96
## 11    282    745         626     54    686    306     27     69
## 12    277     20         747     63     20     20     20     20
## 13   2069    810         706    112   1354     87    588    269
## 14   1613    404         864     20    901    188    307    159
## 15   2952    547         515     62    905     99    747    499
## 16   3110     72         825    108    909     67    393    260
## 17   2591    665        1837     20   1024     20    822    214
## 18    820    726         883     20    171    102    515     20
## 19   1116     95        1309    191    493    432    339    184
## 20   5908    767         817    167   1224    359    748    326
## 21   1143    600         383    162    353    149    201    122
## 22    108    247         825     82    128    160     48     20
## 23    146    442         731     20    136    366     20    142
## 24   2457    621         899    196    740    770    598    277
## 25   1081    116        1171     58     20    267    163     20
## 26   1480     96         805     55    295     59    169    105
## 27   1283    291        1286    143    220     20    104    109
## 28    384    209        1641     20     32    822     64     20
## 29    179     29        2545     31    173   1137     35     20
## 30    651    160        3818     21    118   1050     20     20
## 31    469    223        2943     52     20   1157     23     20
## 32    328    293        2535     24    249   1083     59     20
## 33    133     73        1373     29    380   1099     33     20
## 34    427    208        2402     20     20    652     35     20
## 35    288     57        2137     20    117    262     79     20
## 36    295    214        3965     20    259    618     71     20
## 37    224    149        2539     20     20    805     20     20
## 38    230    195        2286     36     20    673     64     20
## 39    306    181        1122     20     20     67    171     20
## 40    759    269        1016     29    749     68    275     81
## 41   1657    417         428    122    479    198    198    331
## 42    727    136          64    110    217    472    113     20
## 43   1802    108         646     20    341    242    233    122
## 44   2345    552         188     20    320     54     68    378
## 45   1407     27         470    132     78     46    205    218
## 46    798    284         591     51    372    103     49     83
## 47   2582     42        1141    155    590    176    264    113
## 48   3482     48         769    106   1892    353    778    213
## 49   1146    189        1390     20     66     20    141    297
## 50    185     82        2442     20     20   1375    202     41
## 51    249    160        2335     20     34    677     20    201
## 52    882     70         541     20    222    218     27     20
## 53    177     76        3568     79    439    300     20     20
## 54   2110    333        1083     20    494    280    194    430
## 55   1145    289         861     20    282     20     82    327
## 56   1148    136         973     20    224     20    206    203
## 57    807    186         721     20     86    110    184     20
## 58    534     71        2392     20     20    340    182     20
## 59    734     94         571     20    167     20    216     55
## 60   1235    290         723     20    307    139    228    109
## 61    773     95        1023     58     20    407    112    101
## 62    919    236        1030     20    597    653    298     20
## 63   1425    312        2502     64    304    359    313     20
## 64    393    384        3190     21   1076   1116    260     20
## 65    501     20        2349    117     20    386    152     26
## 66    993    115         775     20    106    217     67     20
## 67     59    243         993    175    331     96     58     20
## 68   3869    523        1049    335   1083    386    469    352
## 69   1569    146        1496    271    659    349    379    161
## 70   1072    252         682     27     85     51     75     86
## 71    227    239         931     78    168     20     75     31
## 72    325    358        1029     97    510    157     20     62
##    X56411_rna1 X63469 X74262 X76061 X76648 X95735 Y08612 Y12670 D38073
## 1          178    460   1372    210   1361    298    517    600    994
## 2          183    151   1184    216    237    307    351    337    539
## 3          166    230   2221    250   1240    309    214    574   1441
## 4          131    314   1051    139    992    693    432    716    680
## 5          141    632   1370    270   1233    713    596    524    950
## 6          151    181   1306     67    401    247    220    270   1258
## 7          245    213   1339    193    821     52    196    463    509
## 8           77    227    655    167   1012     20    436    689    331
## 9          205    362   3593    152    929     20    513    262   1974
## 10         119     47   1058     20    330    492    114    167    664
## 11         107    225    953     71    512     20    492    174    884
## 12          20     43    209     38     90    408    119    258    314
## 13         153    495   1624    418    827    724    393    201   1991
## 14         206    442   2050    449    679    360    416    180   1062
## 15          56    522   1995    335   1426    938    651    625   1030
## 16         225    467   1625    253   1578    848    278    330    675
## 17         234    485   3139    413    660    926    623    784   1731
## 18         255    472    882    121    541    390    198    314    762
## 19          82    156   1459    211   1225    615    251    146    800
## 20         308    629   4253    166   3081    827    847    636   1158
## 21         183    256    768    266   1620     68    689    371   1185
## 22          39    104    176     20    532    122    290    180    141
## 23         113    172    862     20    303     20    199    780    767
## 24         232    494   1409    364   1463    222    569    655   1145
## 25          95    162    489     35    416    700    292    402    521
## 26         297    348    716     97   1134    433    424    190    700
## 27         291    217    500     92    477     20    517    273    456
## 28          20     28    596     75    214   2224    292    733    832
## 29          20    178    309     20    514   3348    151   1296    294
## 30          84     20    313     29    147   6218    173    647    148
## 31          20    115    275     20    159   1548    119   1085    309
## 32         127    136    311    165    165   3297    185   1467    259
## 33          47     63     20     20    307   3482    172   1513    256
## 34         121    181    279     34    110   2947    200   1807    238
## 35          33     74    336     40    209   1050     86   1038    382
## 36          20    202    194     58    358   4863    209   1051    245
## 37          81     20    130     20    254   2612    253   1064    267
## 38          94     20     51     20    550   1671    199   2538    342
## 39         427    321    176     77    637    178    333    465    335
## 40          20    225    490    199    860    105    338    639    472
## 41         100    184   1258    133    668    252    428    203    929
## 42         116    157    678     98    539     92    471    450    543
## 43          74    115    699    225    629     20    252    389    737
## 44          77    287   1236     84    800     20    346    244    898
## 45         131    228    555     60    679     20    264    466    889
## 46          80    194    384     64    457     28    283    180    674
## 47         113    380   1903     56    546    399    157    342    853
## 48         188    449   2123    437   1627    805    648    535   1587
## 49         243    172    424    122    464     20    498    396    733
## 50          20     20    148     20    176   5814    166    729    212
## 51          30     20    148     30    168   2335    112    683    108
## 52          20     65    401     35    495   2007    109    442    232
## 53         161    112    162    125    108   3870    284   4460    288
## 54          60    258    784     20    310   4403    157    458    740
## 55          54    269    723     24    431    878    177    244    506
## 56         139    221    834     54    502    162    337    244    746
## 57         147     64    330     31    222   2287    165    318    654
## 58         284     86    105     20    287   2871    173    333    327
## 59         236     30    818    127    754    373    132    101    963
## 60         296    104    919    168    769   2122     57    391    325
## 61         179     58    301     82     23   3186    307    485    225
## 62         206    217    748    139   1047   7133    187    676    379
## 63         117    544    810    139   1018   5949    233    656    646
## 64         257    130    544    485    249   4174    216   1152    609
## 65          20    105    215     20     43   2194    185    323    423
## 66          20    217   1082    108    228    543    115    410    825
## 67         156    208    581     20     79   2451    223    501    156
## 68         107    400   1700    312   1690   1931    517    648    688
## 69         152    599    808    457   1211    395    596    391    706
## 70          69    213    490     37    533     43    349     86    519
## 71         232    232    149    396    471    509    353    396    144
## 72         234    262    583    113    986    417    502    754    347
##    AF012024_s D26156_s M84371_rna1_s M31211_s U09087_s U26266_s L49229_f
## 1         257     1595          2911      601      358      289      337
## 2          46      822           575      435       82      288      131
## 3         139     1452           905      547      263      447      529
## 4         168      654          2038      472      218      424      422
## 5          94     1011          1871      661      186      364      354
## 6         197     1584           634      337      209      320      327
## 7         188      578          2364      309      144      381      354
## 8          61     1024          1409      263      167       53      232
## 9         269     1297           644      978      385      869      568
## 10         99      971           358      752      146      403       20
## 11        132     1913           500      897       89      526       20
## 12         54      329          1119      334       37       60       20
## 13         89     1593          1473      668      283      673      207
## 14        199     1011           408      265      179      308      290
## 15        147     1763          2239      898      314      929      513
## 16        182     2065          1981      666      106      698      190
## 17        358     1778           409      374      335     1060      214
## 18         20      764          1046      516      119      358       20
## 19         65     1448          3210      473       91      310       45
## 20        128     2191          4072     1487      268     1047      312
## 21        123      768          1069      688       20       20       37
## 22         20     1323          1878      143       30       20      184
## 23         98     1992           685      376      334      395       93
## 24         93     1210          1967      531      204      651      424
## 25        153      738          2537      481       81      166       77
## 26        121     2143          3495      647      109      190       20
## 27         90     1245          1768      145      139      201      170
## 28         43      893           443      300       91      151       20
## 29         50      624           225      237       58       66       20
## 30         20      388           251      100       20       20       60
## 31         45      432           595      192       61       20       56
## 32         68      706           863      339       50      192       88
## 33         20      736           678       59       20       20       20
## 34         28      302           469      153       34      282       20
## 35         50      201           548       20       48       43       38
## 36         29      667           763       88       74       64       20
## 37         98      697           466      139       52       20       93
## 38         41      721           838       75       33       20       20
## 39        111     4048          1477      475       94      306      181
## 40         20      879          2758      251       20      136      178
## 41        173     1147          1681      616      114      457       64
## 42        109     1140           721      362      135      138      123
## 43        134      605          2047      329       26       20       48
## 44         91     1749          2305      710      200      750       69
## 45         65      795          1995      467       32      348       72
## 46         99      521          1787      475       77      488      158
## 47         69      466           779      493      100      632      109
## 48        204     1355          2338      951      192      703       20
## 49        125     1816          2363      356      213       20       20
## 50         20      802           676       70       45       20       48
## 51         90      667           502       64       50       20       52
## 52         36       20           291      248       20      376       25
## 53         31     1024           530      158       20      119       39
## 54        169      640           517      575      185       56      213
## 55        102      911          3739      439       20      644       54
## 56        136     1521          2721      190      131      432       33
## 57         66      534           463       21       20      131       20
## 58        188      534           126      173       20      114       20
## 59         20      950          2213      592       33      338       40
## 60         20      371           655      137       82      201       93
## 61         54     1092           429      201       58       99       60
## 62        141      349           535       94       71      326       20
## 63        140      832           996       74       76       63       20
## 64         70      818          1157       51      119      162       20
## 65         70      440           383      198       20      367       20
## 66         65      625           512      226       49      378       20
## 67         73     1295           658      200       20      234       69
## 68        202     1131           643      580      255      755      443
## 69        227     1187          2362      573      208      476      258
## 70         77      693          1955      348       43      100       89
## 71         81     1313          1399      218      110      388       81
## 72         38     2352          1667      493       53       82       22
##    M31523 M28170 U29175
## 1    1320    397   1582
## 2     898     20    624
## 3     597    183    753
## 4    1644    363    743
## 5    1322    251    626
## 6     787     74   1157
## 7     946    280    552
## 8    1917    235    572
## 9    1440     24   1776
## 10    442     20    756
## 11    617     20   1972
## 12    474     84    219
## 13   1969     87   1322
## 14    686    155    732
## 15   2572    371   1636
## 16   1439    360   2076
## 17    989     20    917
## 18   1186    195    453
## 19   1224    285   1660
## 20   4555    470   2174
## 21   2707    514   3707
## 22    274     95    295
## 23    590     53   1377
## 24   1823    418   1456
## 25    553    391    404
## 26   1909    495   1982
## 27   1076    219    949
## 28    353     20    429
## 29    279     35    314
## 30    250     20    208
## 31    381     20    385
## 32    671     20    688
## 33    200     20    248
## 34    299     20    454
## 35    389     67    391
## 36    126     20    412
## 37    190     20    393
## 38    275     20    286
## 39   1484     67   2112
## 40    606    437   1323
## 41   1479     84   1033
## 42   1751     20    826
## 43   2148    420    937
## 44   1955    180   1691
## 45   1275     32    479
## 46    980    313    438
## 47   1052     20    613
## 48   2428    275   1475
## 49   1467    457   1353
## 50    197    110    190
## 51    110     44    382
## 52    139     20    250
## 53    388     20    367
## 54    282     20    578
## 55    821    184    751
## 56   1297    620   1127
## 57    196     20    398
## 58    268     20    151
## 59   1087    231    846
## 60    461     20    486
## 61    446     20    485
## 62    328    132    625
## 63    253     20    745
## 64    450    113    878
## 65    375     20    588
## 66    528     20    669
## 67    929     20   1661
## 68   3923     67   1088
## 69   1794    155   1187
## 70   1277    266   1002
## 71    828    142   1504
## 72   1361    119   1926

Reading Lines of a Text File

    con <- file("genes-leukemia.csv")
    x <- readLines(con, 2)
    x
## [1] "SNUM,CLASS,BM_PB,TB_if_ALL,FAB_if_AML,Year,Gender,pct_Blasts,Treatment_Response,PS,Source,D49950,D63880,J03473,J05243,L13278,L47738,M21551_rna1,M55150,M62762,M81933,M91432,S50223,U12471_cds1,U32944,U35451,U50136_rna1,U53468,U72342,U82759,X15949,X52142,X56411_rna1,X63469,X74262,X76061,X76648,X95735,Y08612,Y12670,D38073,AF012024_s,D26156_s,M84371_rna1_s,M31211_s,U09087_s,U26266_s,L49229_f,M31523,M28170,U29175"
## [2] "s1,ALL,BM,B-cell,?,1996,M,?,?,1,DFCI,75,556,2018,610,193,571,178,654,835,20,767,268,160,3349,408,1124,141,978,393,277,107,178,460,1372,210,1361,298,517,600,994,257,1595,2911,601,358,289,337,1320,397,1582"
    close(con)
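
The same pattern can be tried without genes-leukemia.csv. This is a minimal sketch: the temporary file and its three lines are made up for illustration. It assumes the connection is opened explicitly in "r" mode, in which case successive readLines() calls continue from the current position instead of starting over.

```r
# Hypothetical throwaway file standing in for a real data file
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "s1,10", "s2,20"), path)

con <- file(path, "r")            # open the connection explicitly for reading
header <- readLines(con, n = 1)   # read only the first line
rest <- readLines(con)            # read the remaining lines
close(con)                        # always release the connection

header
rest

unlink(path)                      # remove the temporary file
```

Reading the header separately is handy when a file is too large to load at once and only its column names are needed.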

Reading from a URL Connection

    con <- url("http://ccc.inaoep.mx/~jagonzalez/", "r")
    x <- readLines(con)
    head(x)
## [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
## [2] "<!--"                                                                                                                         
## [3] "Design by Free CSS Templates"                                                                                                 
## [4] "http://www.freecsstemplates.org"                                                                                              
## [5] "Released for free under a Creative Commons Attribution 2.5 License"                                                           
## [6] ""
    close(con)

Subsetting R Objects

  • There are 3 operators to extract subsets of R objects
    • [ returns an object of the same class as the original
      • Can be used to select multiple elements of an object
    • [[ extracts a single element of a list or data frame
      • The class of the returned object will not necessarily be a list or data frame
    • $ extracts an element of a list or data frame by name
      • Semantics similar to [[
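
The difference between the three operators is easiest to see by checking the class of what each one returns. The sketch below uses a small illustrative list; the variable names a, b and d are arbitrary.

```r
x <- list(temp = 70:75, hum = 0.55)   # illustrative list

a <- x["temp"]     # [ keeps the container: a list of length 1
b <- x[["temp"]]   # [[ extracts the element itself: an integer vector
d <- x$hum         # $ extracts an element by name: a numeric value

class(a)
class(b)
class(d)
```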

Subsetting a Vector

    x <- c("A", "B", "C", "C", "B", "A")
    x[1]
## [1] "A"
    x[3]
## [1] "C"
    x[1:3]
## [1] "A" "B" "C"

Subsetting a Vector

    x <- c("A", "B", "C", "C", "B", "A")
    x[c(3, 2, 1)]
## [1] "C" "B" "A"
    x[x > "A"]
## [1] "B" "C" "C" "B"

Subsetting a Vector

    x <- c("A", "B", "C", "C", "B", "A")
    u <- x > "A"
    u
## [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
    x[u]
## [1] "B" "C" "C" "B"
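
Logical indexing works the same way on numeric vectors, and conditions can be combined with & and |. A small sketch with made-up values; which() is shown only to recover the positions of the TRUE entries.

```r
temps <- c(70, 75, 68, 80, 72)    # made-up values

warm <- temps > 70 & temps < 80   # logical vector, one entry per element
temps[warm]                       # keeps only the elements where warm is TRUE
which(warm)                       # positions of the TRUE entries
```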

Subsetting a Matrix

    x <- matrix(1:9, 3, 3)
    x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
    x[2, 1]
## [1] 2
    x[1, 2]
## [1] 4
    x[1, ]
## [1] 1 4 7
    x[, 2]
## [1] 4 5 6

Dropping Matrix Dimensions

    x <- matrix(1:9, 3, 3)
    x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
    x[1, 2]
## [1] 4
    x[1, 2, drop = FALSE]
##      [,1]
## [1,]    4

Dropping Matrix Dimensions

    x <- matrix(1:9, 3, 3)
    x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
    x[1, ]
## [1] 1 4 7
    x[1, , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    4    7

Subsetting Lists

    x <- list(temp = 70:75, hum = 0.55)
    x
## $temp
## [1] 70 71 72 73 74 75
## 
## $hum
## [1] 0.55
    x[[1]]
## [1] 70 71 72 73 74 75

Subsetting Lists

    x <- list(temp = 70:75, hum = 0.55)
    x[["temp"]]
## [1] 70 71 72 73 74 75
    x$temp
## [1] 70 71 72 73 74 75

Subsetting Nested Elements of a List

    x <- list(a = list(1, 2, 3), b = c(11, 12, 13))
    ## Get the 3rd element of the 1st element
    x[[c(1, 3)]]
## [1] 3
    ## The same as above
    x[[1]][[3]]
## [1] 3
    ## Get the 1st element of the 2nd element
    x[[c(2, 1)]]
## [1] 11

Extracting Multiple Elements of a List

    x <- list(temp = 70:75, hum = 0.55, wind = "T")
    x[c(1, 3)]
## $temp
## [1] 70 71 72 73 74 75
## 
## $wind
## [1] "T"

Partial Matching

    x <- list(abcdefghijklmno = 1:10)
    x$a
##  [1]  1  2  3  4  5  6  7  8  9 10
    x[["a"]]
## NULL
    x[["a", exact = FALSE]]
##  [1]  1  2  3  4  5  6  7  8  9 10
  • Partial matching of names is possible with $ and with [[ (using exact = FALSE)

Removing NA Values

  • NA refers to missing values
    x <- c(1, 2, NA, 4, NA, 6)
    bad <- is.na(x)
    print(bad)
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
    x[!bad]
## [1] 1 2 4 6

Removing NA Values

    x <- c(1, 2, NA, 4, NA, 6)
    y <- c("a", "b", "c", NA, "e", NA)
    good <- complete.cases(x, y)
    good
## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
    x[good]
## [1] 1 2
    y[good]
## [1] "a" "b"
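
complete.cases() also accepts a data frame, in which case it flags the rows that have no missing values in any column. A minimal sketch on a made-up data frame:

```r
# Made-up data frame with missing values in both columns
df <- data.frame(x = c(1, 2, NA, 4),
                 y = c("a", NA, "c", "d"))

good <- complete.cases(df)   # TRUE for rows with no NA in any column
clean <- df[good, ]          # keep only the complete rows

good
clean
```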

Assignment

  • Install swirl
    • install.packages("swirl")
  • Load swirl
    • library(swirl)
  • Execute swirl
    • swirl()
  • Choose a tutorial and follow it!
    • Start with the "R Programming" tutorial
    • Keep working with advanced tutorials…