The first data structure we learn is the vector.
Create vectors of data for three medical patients.
subject_name <- c("Youssef Hemimy", "Jane Doe", "Steve Graves")
temperature <- c(98.1, 98.6, 101.4)
flu_status <- c(FALSE, FALSE, TRUE)
Now we can access any element in the body temperature vector.
Note that indexing in R starts at 1, not 0.
temperature[1]
[1] 98.1
temperature[2]
[1] 98.6
temperature[3]
[1] 101.4
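A quick check of where R's indexing starts may help here. The sketch below reuses the temperature values from above; the variable names `first`, `zero_idx`, and `past_end` are only illustrative.

```r
temperature <- c(98.1, 98.6, 101.4)

first    <- temperature[1]   # the first element lives at position 1
zero_idx <- temperature[0]   # position 0 selects nothing: an empty numeric vector
past_end <- temperature[4]   # a position past the end gives NA, not an error

print(first)
print(length(zero_idx))
print(past_end)
```

Because index 0 quietly returns an empty vector instead of failing, off-by-one habits carried over from 0-based languages can be easy to miss.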
other ways to access (range, exclude)
include items in the range 2 to 3
temperature[2:3]
[1] 98.6 101.4
exclude item 2 using the minus sign
temperature[-2]
[1] 98.1 101.4
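The range and exclusion forms above can be combined with vectors of positions, and a logical condition can also be used as the selector. This is a small sketch on the same temperature values; the threshold of 100 is an arbitrary choice for illustration.

```r
temperature <- c(98.1, 98.6, 101.4)

# Exclude several positions at once with a negative vector
only_middle <- temperature[-c(1, 3)]

# Select by a logical condition instead of by position
above_100 <- temperature[temperature > 100]

print(only_middle)
print(above_100)
```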
Factors
A factor is a data type used to represent categorical variables.
Think: labels, not numbers.
add gender factor
gender <- factor(c("Male","Female", "Male"))
gender
[1] Male Female Male
Levels: Female Male
Levels are the distinct categories, that is, the allowed labels, that a factor can take.
add blood type factor
blood <- factor(c("O", "AB", "A"),
levels = c("A", "B", "AB", "O"))
blood
[1] O AB A
Levels: A B AB O
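To see what declaring levels up front does, the following sketch inspects the blood factor from above. The second factor, `bad`, uses a made-up label "Z" purely to show what happens to a value outside the declared levels.

```r
blood <- factor(c("O", "AB", "A"),
                levels = c("A", "B", "AB", "O"))

lv     <- levels(blood)      # all allowed labels, including the unused "B"
counts <- table(blood)       # "B" is counted, with a count of 0
codes  <- as.integer(blood)  # position of each value within the levels

# "Z" is not an allowed label, so it becomes NA
bad <- factor(c("O", "Z"), levels = c("A", "B", "AB", "O"))

print(lv)
print(counts)
print(codes)
print(bad)
```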
To create an ordered factor, add the argument ordered = TRUE.
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
levels = c("MILD", "MODERATE", "SEVERE"),
ordered = TRUE)
symptoms
[1] SEVERE MILD MODERATE
Levels: MILD < MODERATE < SEVERE
I was confused about how the ordering works,
so I added numbered factors to see exactly how it behaves.
sy <- factor(c("1", "2", "3"),
levels = c("1", "2", "3"),
ordered = TRUE)
sy
[1] 1 2 3
Levels: 1 < 2 < 3
sy2 <- factor(c("1a", "2b", "2a", "3a"),
levels = c("1a","2a", "2b", "3a"),
ordered = TRUE)
sy2
[1] 1a 2b 2a 3a
Levels: 1a < 2a < 2b < 3a
This output shows that the order of the levels argument defines the ordering, from lowest to highest.
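The examples above all use levels that also happen to be in alphabetical order, so a further check may be useful. The sketch below compares values of the symptoms factor and then builds a second factor, `dose`, with hypothetical labels whose level order is deliberately not alphabetical.

```r
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)

# Comparisons on an ordered factor follow the level order
is_worse <- symptoms > "MODERATE"

# Hypothetical labels: alphabetical order would be high < low < medium
dose <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
dose_sorted <- as.character(sort(dose))

print(is_worse)
print(dose_sorted)
```

Sorting follows the order given in `levels`, not the alphabet, which is what makes ordered factors suitable for scales like severity.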
Lists
display info for a patient
subject_name[1]
[1] "Youssef Hemimy"
temperature[1]
[1] 98.1
flu_status[1]
[1] FALSE
gender[1]
[1] Male
Levels: Female Male
blood[1]
[1] O
Levels: A B AB O
symptoms[1]
[1] SEVERE
Levels: MILD < MODERATE < SEVERE
A vector and a list are both containers, but they obey very
different rules.
A vector is homogeneous: all elements must be the same type.
Common types: numeric, character, logical (a factor is stored as an integer vector with labels).
A list can hold elements of any type.
There are a few ways to access a list:
some return a sub-list, others return the element itself (for example, a numeric vector).
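The difference between the two containers shows up as soon as types are mixed. In this sketch the values are arbitrary; `mixed_vec` and `mixed_list` are illustrative names.

```r
# A vector coerces everything to one common type (here: character)
mixed_vec <- c(98.1, "Male", FALSE)

# A list keeps each element's own type
mixed_list <- list(98.1, "Male", FALSE)
list_types <- sapply(mixed_list, class)

print(mixed_vec)
print(class(mixed_vec))
print(list_types)
```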
# create list for a patient
subject1 <- list(fullname = subject_name[1],
temperature = temperature[1],
flu_status = flu_status[1],
gender = gender[1],
blood = blood[1],
symptoms = symptoms[1])
subject1
$fullname
[1] "Youssef Hemimy"
$temperature
[1] 98.1
$flu_status
[1] FALSE
$gender
[1] Male
Levels: Female Male
$blood
[1] O
Levels: A B AB O
$symptoms
[1] SEVERE
Levels: MILD < MODERATE < SEVERE
Get a single list item by position with single brackets.
This returns a sub-list.
subject1[2]
Get a single list value by position with double brackets.
This returns the value itself, here a numeric vector.
subject1[[2]]
[1] 98.1
get a single list value by name
subject1$temperature
[1] 98.1
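The three access forms can be compared side by side. The sketch below uses a smaller two-item list, `subject`, built from values in these notes, so that it runs on its own.

```r
subject <- list(fullname = "Jane Doe", temperature = 98.6)

as_sublist <- subject[2]            # single brackets: still a list
as_value   <- subject[[2]]          # double brackets: the element itself
by_name    <- subject$temperature   # same element, selected by name

print(class(as_sublist))
print(class(as_value))
print(as_value + 1)   # arithmetic works on the value, not on the sub-list
```

In practice this means `[[ ]]` or `$` is the form to use when the value is needed for a calculation.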
get several list items by specifying a vector of names
subject1[c("temperature", "flu_status")]
$temperature
[1] 98.1
$flu_status
[1] FALSE
we can also access a list like a vector
get values 1 to 3 (range)
subject1[1:3]
$fullname
[1] "Youssef Hemimy"
$temperature
[1] 98.1
$flu_status
[1] FALSE
Data frames
create a data frame from medical patient data
Note that in R 4.0.0 and later we don't need the argument
stringsAsFactors = FALSE;
it is already the default.
In earlier versions of R, the default of TRUE caused silent bugs
because character columns were converted to factors, which are stored internally as integer codes.
pt_data <- data.frame(subject_name, temperature, flu_status, gender,
blood, symptoms, stringsAsFactors = FALSE)
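To make the effect of stringsAsFactors concrete, the sketch below builds the same one-column data frame both ways. The argument is set explicitly in each call so the result should not depend on the R version; `nm`, `df_chr`, and `df_fct` are illustrative names.

```r
nm <- c("Jane Doe", "Steve Graves")

df_chr <- data.frame(nm, stringsAsFactors = FALSE)
df_fct <- data.frame(nm, stringsAsFactors = TRUE)

print(class(df_chr$nm))       # character: names stay as text
print(class(df_fct$nm))       # factor: names became categories
print(as.integer(df_fct$nm))  # the integer codes stored behind the factor
```

Code that expects text but receives these integer codes is the kind of silent bug the old default could cause.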
display the data frame
pt_data
accessing a data frame
get a single column
pt_data$subject_name
[1] "Youssef Hemimy" "Jane Doe" "Steve Graves"
get several columns by specifying a vector of names
pt_data[c("temperature", "flu_status")]
This is the same as above, extracting temperature and
flu_status, but by column position.
pt_data[2:3]
accessing by row and column
pt_data[1, 2]
[1] 98.1
accessing several rows and several columns using vectors
pt_data[c(1, 3), c(2, 4)]
Leave a row or column blank to extract all rows or columns
column 1, all rows
pt_data[, 1]
[1] "Youssef Hemimy" "Jane Doe" "Steve Graves"
row 1, all columns
pt_data[1, ]
all rows and all columns
pt_data[ , ]
the following are equivalent
pt_data[c(1, 3), c("temperature", "gender")]
pt_data[-2, c(-1, -3, -5, -6)]
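One way to confirm that the two selections really match is to compare them directly. The sketch below rebuilds the patient data frame from the vectors defined earlier so that it is self-contained; `by_name` and `by_exclusion` are illustrative names.

```r
pt_data <- data.frame(
  subject_name = c("Youssef Hemimy", "Jane Doe", "Steve Graves"),
  temperature  = c(98.1, 98.6, 101.4),
  flu_status   = c(FALSE, FALSE, TRUE),
  gender       = factor(c("Male", "Female", "Male")),
  blood        = factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O")),
  symptoms     = factor(c("SEVERE", "MILD", "MODERATE"),
                        levels = c("MILD", "MODERATE", "SEVERE"),
                        ordered = TRUE),
  stringsAsFactors = FALSE
)

# Keep rows 1 and 3, columns temperature and gender
by_name      <- pt_data[c(1, 3), c("temperature", "gender")]
# Drop row 2 and columns 1, 3, 5, 6: the same rows and columns remain
by_exclusion <- pt_data[-2, c(-1, -3, -5, -6)]

print(all.equal(by_name, by_exclusion))
```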
creating a Celsius temperature column
pt_data$temp_c <- (pt_data$temperature - 32) * (5 / 9)
comparing before and after
pt_data[c("temperature", "temp_c")]
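The conversion can be checked by hand on the three temperatures. The helper `f_to_c` below is a hypothetical name introduced only for this sketch; it applies the same formula as the column assignment above.

```r
temperature <- c(98.1, 98.6, 101.4)

# Hypothetical helper: Fahrenheit to Celsius
f_to_c <- function(temp_f) (temp_f - 32) * 5 / 9

temp_c <- f_to_c(temperature)

print(round(temp_c, 1))
print(f_to_c(212))   # boiling point of water, as a sanity check
```

The formula is vectorized, so one expression converts the whole column at once.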
Matrices
create a 2x2 matrix
m <- matrix(c(1, 2, 3, 4), nrow = 2)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4
equivalent to the above
m <- matrix(c(1, 2, 3, 4), ncol = 2)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4
create a 2x3 matrix
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
create a 3x2 matrix
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
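All of the matrices above are filled column by column. The sketch below contrasts that with row-wise filling through the `byrow` argument of `matrix()`, which these notes do not otherwise use, and then extracts single values, rows, and columns with the same `[row, column]` form used for data frames.

```r
m_col <- matrix(1:6, nrow = 2)                # filled down the columns
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled across the rows

print(m_col)
print(m_row)

one_value <- m_col[2, 3]   # row 2, column 3
first_row <- m_col[1, ]    # all columns of row 1
first_col <- m_col[, 1]    # all rows of column 1

print(one_value)
print(first_row)
print(first_col)
```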