The first data structure we learn is the vector.

Create vectors of data for three medical patients.


subject_name <- c("Youssef Hemimy", "Jane Doe", "Steve Graves")
temperature <- c(98.1, 98.6, 101.4)
flu_status <- c(FALSE, FALSE, TRUE)

Now we can access any element of the body temperature vector.

Note that indexing in R starts at 1, not 0: the first element is temperature[1].

temperature[1]
[1] 98.1
temperature[2]
[1] 98.6
temperature[3]
[1] 101.4
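A quick sketch of what happens at the edges of 1-based indexing, using the same temperature vector (redefined so the snippet runs on its own):

```r
temperature <- c(98.1, 98.6, 101.4)

temperature[1]                    # first element: 98.1
temperature[length(temperature)]  # last element: 101.4
temperature[0]                    # numeric(0): there is no element 0
temperature[4]                    # NA: index past the end of the vector
```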

Other ways to access elements: ranges and exclusion.

Include the items in the range 2 to 3:

temperature[2:3]
[1]  98.6 101.4

exclude item 2 using the minus sign

temperature[-2]
[1]  98.1 101.4
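Ranges and exclusions also work with any vector of positions, not only a contiguous range. A small sketch:

```r
temperature <- c(98.1, 98.6, 101.4)

temperature[c(1, 3)]    # non-contiguous positions: 98.1 101.4
temperature[-(1:2)]     # exclude a range of positions: 101.4
temperature[c(-1, -3)]  # exclude several positions: 98.6
```

The parentheses in -(1:2) matter: -1:2 parses as (-1):2, which mixes negative and positive indices and raises an error.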

Factors

A factor is a data type used to represent categorical variables.

Think: labels, not numbers.

add gender factor

gender <- factor(c("Male","Female", "Male"))
gender
[1] Male   Female Male  
Levels: Female Male

Levels are the distinct categories a factor can take, that is, its allowed labels. By default, factor() sorts them alphabetically, which is why Female is listed before Male.
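To make "labels, not numbers" concrete: a factor stores integer codes that point into its levels. A sketch inspecting the gender factor with the base functions levels(), nlevels(), and as.integer():

```r
gender <- factor(c("Male", "Female", "Male"))

levels(gender)      # "Female" "Male": sorted alphabetically by default
nlevels(gender)     # 2 distinct categories
as.integer(gender)  # 2 1 2: each value is a code pointing into levels(gender)
```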

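Add a blood type factor. The levels argument sets the allowed categories and their order explicitly, including B, even though no patient here has that blood type.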

blood <- factor(c("O", "AB", "A"),
                levels = c("A", "B", "AB", "O"))
blood
[1] O  AB A 
Levels: A B AB O
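Listing the levels explicitly matters in two ways, sketched below: an allowed level can exist with zero observations (B here), and a value outside the allowed levels becomes NA.

```r
blood <- factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O"))

# B is an allowed level even though no patient has it, so it is counted as 0
table(blood)

# Without the levels argument, only the observed values are kept (sorted)
levels(factor(c("O", "AB", "A")))   # "A" "AB" "O"

# A value that is not an allowed level is stored as NA
factor(c("O", "C"), levels = c("A", "B", "AB", "O"))
```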

To create an ordered factor, add the argument ordered = TRUE:

symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)
symptoms
[1] SEVERE   MILD     MODERATE
Levels: MILD < MODERATE < SEVERE

To see exactly how the ordering works, try ordered factors with numbered labels:

sy <- factor(c("1", "2", "3"),
             levels = c("1", "2", "3"),
             ordered = TRUE)
sy
[1] 1 2 3
Levels: 1 < 2 < 3
sy2 <- factor(c("1a", "2b", "2a", "3a"),
              levels = c("1a", "2a", "2b", "3a"),
              ordered = TRUE)
sy2
[1] 1a 2b 2a 3a
Levels: 1a < 2a < 2b < 3a

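This output shows that the levels argument defines the order, from lowest to highest. In sy2, 2b appears before 2a in the data, yet 2a < 2b because that is the order given in levels.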
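Because an ordered factor knows its level order, comparisons, sorting, and max() follow that order. A short sketch with the symptoms factor (redefined so it runs on its own):

```r
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)

symptoms > "MILD"   # TRUE FALSE TRUE: compared by level position
sort(symptoms)      # MILD MODERATE SEVERE
max(symptoms)       # SEVERE
```

The same comparison on an unordered factor such as gender returns NA with a warning, since > is not meaningful for unordered categories.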

Lists

Displaying all the information for one patient with separate vectors means indexing each vector in turn:

subject_name[1]
[1] "Youssef Hemimy"
temperature[1]
[1] 98.1
flu_status[1]
[1] FALSE
gender[1]
[1] Male
Levels: Female Male
blood[1]
[1] O
Levels: A B AB O
symptoms[1]
[1] SEVERE
Levels: MILD < MODERATE < SEVERE

a vector and a list are both containers, but they obey very different rules.

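A vector is homogeneous: all elements must be the same type.

Common vector types: numeric, character, and logical. A factor is built on an integer vector that carries its levels.

A list can hold elements of any type, including other lists.

There are a few ways to access a list: single brackets [ ] return a sub-list, while double brackets [[ ]] and $ return the element itself (for temperature, a numeric vector).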
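The difference shows up as soon as types are mixed. In this sketch, c() coerces everything to character to keep the vector homogeneous, while list() keeps each element's type:

```r
mixed_vec <- c(98.1, "Jane Doe", TRUE)
class(mixed_vec)            # "character"
mixed_vec                   # "98.1" "Jane Doe" "TRUE"

mixed_list <- list(98.1, "Jane Doe", TRUE)
sapply(mixed_list, class)   # "numeric" "character" "logical"
```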

# create list for a patient
subject1 <- list(fullname = subject_name[1], 
                 temperature = temperature[1],
                 flu_status = flu_status[1],
                 gender = gender[1],
                 blood = blood[1],
                 symptoms = symptoms[1])
subject1
$fullname
[1] "Youssef Hemimy"

$temperature
[1] 98.1

$flu_status
[1] FALSE

$gender
[1] Male
Levels: Female Male

$blood
[1] O
Levels: A B AB O

$symptoms
[1] SEVERE
Levels: MILD < MODERATE < SEVERE

Get a single list element by position with single brackets.

This returns a sub-list of length 1:

subject1[2]

Get a single list element by position with double brackets.

This returns the element itself, here a numeric vector:

subject1[[2]]
[1] 98.1

get a single list value by name

subject1$temperature
[1] 98.1
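To confirm which accessor returns what, class() can be applied to each result. A sketch on a trimmed-down, two-field copy of subject1 (shortened only for illustration):

```r
# trimmed-down copy of subject1, two fields only
subject1 <- list(fullname = "Youssef Hemimy", temperature = 98.1)

class(subject1[2])            # "list": single brackets keep the list wrapper
class(subject1[[2]])          # "numeric": double brackets return the element
class(subject1$temperature)   # "numeric": $ works like [[ ]] by name

identical(subject1[[2]], subject1$temperature)              # TRUE
identical(subject1[["temperature"]], subject1$temperature)  # TRUE
```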

Get several list items by specifying a vector of names:

subject1[c("temperature", "flu_status")]
$temperature
[1] 98.1

$flu_status
[1] FALSE

we can also access a list like a vector

get values 1 to 3 (range)

subject1[1:3]
$fullname
[1] "Youssef Hemimy"

$temperature
[1] 98.1

$flu_status
[1] FALSE
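Negative indices work on lists as well, just as on vectors. A sketch that drops the first element, again on a trimmed-down copy of subject1:

```r
# trimmed-down copy of subject1, three fields only
subject1 <- list(fullname = "Youssef Hemimy",
                 temperature = 98.1,
                 flu_status = FALSE)

names(subject1[-1])    # "temperature" "flu_status"
length(subject1[-1])   # 2
```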

Data frames

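Create a data frame from the medical patient data.

Note that since R 4.0.0 we don't need the argument stringsAsFactors = FALSE, because FALSE is already the default.

In earlier versions the default was TRUE, which silently converted character columns such as subject_name into factors, stored internally as integer codes. This caused silent bugs in code that expected text.

The argument only affects character columns: gender, blood, and symptoms are already factors and stay factors either way.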
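A sketch of the pre-4.0 pitfall, recreated explicitly with stringsAsFactors = TRUE and factor(). It assumes R 4.0.0 or later for the default behaviour on the first line. The key point: numbers stored as a factor convert to their level codes, not their values.

```r
# Default since R 4.0.0: character columns stay character
df_default <- data.frame(name = c("Jane Doe", "Steve Graves"))
class(df_default$name)    # "character"

# The old default behaviour, requested explicitly
df_factor <- data.frame(name = c("Jane Doe", "Steve Graves"),
                        stringsAsFactors = TRUE)
class(df_factor$name)     # "factor"

# Why this caused silent bugs: as.numeric() on a factor returns the codes
temps <- factor(c("98.6", "101.4"))
as.numeric(temps)                 # 2 1 (levels sort as text: "101.4" < "98.6")
as.numeric(as.character(temps))   # 98.6 101.4
```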

pt_data <- data.frame(subject_name, temperature, flu_status, gender,
                      blood, symptoms, stringsAsFactors = FALSE)

display the data frame

pt_data

accessing a data frame

get a single column

pt_data$subject_name
[1] "Youssef Hemimy" "Jane Doe"       "Steve Graves"  

get several columns by specifying a vector of names

pt_data[c("temperature", "flu_status")]

This is the same as above: columns 2 and 3 are temperature and flu_status.

pt_data[2:3]

accessing by row and column

pt_data[1, 2]
[1] 98.1

Access several rows and several columns using vectors (here rows 1 and 3, columns 2 and 4, i.e. temperature and gender):

pt_data[c(1, 3), c(2, 4)]

Leave a row or column blank to extract all rows or columns

column 1, all rows

pt_data[, 1]
[1] "Youssef Hemimy" "Jane Doe"       "Steve Graves"  
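Selecting a single column with [, 1] simplifies the result to a vector. A sketch of keeping the data frame shape instead, using the drop argument or list-style single brackets, on a trimmed-down two-column pt_data:

```r
# trimmed-down copy of pt_data, two columns only
pt_data <- data.frame(subject_name = c("Youssef Hemimy", "Jane Doe", "Steve Graves"),
                      temperature = c(98.1, 98.6, 101.4))

class(pt_data[, 1])                # "character": simplified to a vector
class(pt_data[, 1, drop = FALSE])  # "data.frame": one-column data frame
class(pt_data[1])                  # "data.frame": list-style selection
class(pt_data[1, ])                # "data.frame": one row of a multi-column frame
```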

row 1, all columns

pt_data[1, ]

all rows and all columns

pt_data[ , ]

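The following are equivalent: both select rows 1 and 3 and the temperature and gender columns. The second version gets there by excluding row 2 and columns 1, 3, 5, and 6 (subject_name, flu_status, blood, symptoms).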

pt_data[c(1, 3), c("temperature", "gender")]
pt_data[-2, c(-1, -3, -5, -6)]
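The equivalence can be checked directly with identical(). A sketch that rebuilds pt_data from the same vectors so it runs on its own:

```r
subject_name <- c("Youssef Hemimy", "Jane Doe", "Steve Graves")
temperature <- c(98.1, 98.6, 101.4)
flu_status <- c(FALSE, FALSE, TRUE)
gender <- factor(c("Male", "Female", "Male"))
blood <- factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O"))
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)
pt_data <- data.frame(subject_name, temperature, flu_status, gender,
                      blood, symptoms)

# select by name vs. select by excluding everything else
by_name <- pt_data[c(1, 3), c("temperature", "gender")]
by_exclusion <- pt_data[-2, c(-1, -3, -5, -6)]
identical(by_name, by_exclusion)   # TRUE
```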

creating a Celsius temperature column

pt_data$temp_c <- (pt_data$temperature - 32) * (5 / 9)

comparing before and after

pt_data[c("temperature", "temp_c")]
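The conversion is vectorized: the formula T_C = (T_F - 32) * 5/9 is applied to every element of the column at once, with no loop. A sketch with rounded results for the three patients:

```r
temperature <- c(98.1, 98.6, 101.4)

# element-wise: subtract 32 from each value, then scale by 5/9
temp_c <- (temperature - 32) * (5 / 9)
round(temp_c, 1)   # 36.7 37.0 38.6
```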

Matrices

create a 2x2 matrix

m <- matrix(c(1, 2, 3, 4), nrow = 2)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4

Equivalent to the above: with four values, two columns implies two rows. Either way, R fills the matrix column by column.


m <- matrix(c(1, 2, 3, 4), ncol = 2)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4
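Both calls produce the same matrix because R fills matrices column by column by default. A sketch of the byrow argument, which fills row by row instead and, for these values, gives the transpose:

```r
m_col <- matrix(c(1, 2, 3, 4), nrow = 2)                # column by column
m_row <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)  # row by row
m_row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
identical(m_row, t(m_col))   # TRUE
```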

create a 2x3 matrix

m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

create a 3x2 matrix

m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Extract values from matrices using [row, column]:

m[1, 1]
[1] 1
m[3, 2]
[1] 6
# extract row 1 (all columns)
m[1, ]
[1] 1 4
# extract column 1 (all rows)
m[, 1]
[1] 1 2 3
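A single row or column comes back as a plain vector. A sketch, using the 3x2 matrix from above, of checking the shape with dim(), taking a range of rows, and keeping a single row as a matrix with drop = FALSE:

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)

dim(m)                 # 3 2: rows, then columns
m[2:3, ]               # rows 2 and 3, all columns (a 2x2 matrix)
m[1, , drop = FALSE]   # row 1 kept as a 1x2 matrix
```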