For my next learning, I decided I need to understand the language “R” a bit better. In addition to our current book, I also found and read parts of R for Beginners. One goal was to be clear about some of the funny syntax that creeps in.

For example, what the heck is df$a?

df is that variable that names a data frame, $a selects column a

df <- data.frame(a = 1:5, second = 2:6, alpha = letters[1:5])
df

So now let’s select a column with the number 1 through 5

df$a
[1] 1 2 3 4 5

or the column named “second”

df$second
[1] 2 3 4 5 6

I like that in R you can operate on vectors. Here, I’m going to double all the numbers in the second column.

df$second <- 2 * df$second
df

You can also either access or assign any table cell with variable[row,col]. Here I am going to change the 8 in the “second” column to be zero.

df[3,2] <- 0
df

I think this should work to change the zero back to 8

df$second[3] <- 8
df

Yes, it did!

A cool thing I learned was that you can put a conditional in the brackets. For example, I could change the cells whose number is less than 10 to 0.

df$second[df$second < 10] <- 0
df

Let’s break that down. The df$second in brackets is a vector of true or false, depending on whether the value is less than 10.

df$second < 10
[1]  TRUE  TRUE  TRUE FALSE FALSE

so basically, we are first figuring out which cells as less than 10, then selecting those to set to 0.

less_than_10 <- df$second < 10
df$second[less_than_10] <- 0
df

let’s make all those zeros be twice the value in column a

df$second[df$second == 0] <- df$a[df$second == 0] * 2
df

By the way, in the San Mateo exercise I did early, I had two problems. It was a poll whose values were either the character “1” (for true) or NA. I really wanted the number 1 or 0 for these values. I’ve now figured out a much simpler way to clean that up.

First, let’s make a data set similar to the San Mateo one. It had two first columns I didn’t care about, plus 4 poll answer columns. The R function “sample” makes a vector by sampling a set. I’ll use that to fake poll answers.

answers <- c("1",NA)
sample(answers,10,replace=TRUE)
 [1] NA  "1" "1" "1" "1" "1" NA  NA  NA  "1"

so that gives me random 1’s or NAs

sm <- data.frame(a = 11:20,
                 b = letters[11:20],
                 answer1 = sample(answers,10,replace=TRUE),
                 answer2 = sample(answers,10,replace=TRUE),
                 answer3 = sample(answers,10,replace=TRUE),
                 answer4 = sample(answers,10,replace=TRUE))
sm

So first let’s convert everything from characters to numbers. as.numeric will do this, and we could do it one column at a time like this.

sm$answer3 <- as.numeric(sm$answer3)

but we could also operate on a subset of the whole table, that is the 3rd through 6th columns.

mutate_at(sm,3:6,as.numeric)

ooh, let’s try that!

sm <- mutate_at(sm,3:6,as.numeric)
sm

Very cool! now my columns with answers are numbers (of type dbl). But they have that pesky NA. Which I want to be a zero for a false answer to my poll.

Note what the following bit does; it produces a true/false table corresponding to whether a cell is an NA.

is.na(sm[3:6])
      answer1 answer2 answer3 answer4
 [1,]   FALSE    TRUE   FALSE    TRUE
 [2,]    TRUE   FALSE    TRUE    TRUE
 [3,]   FALSE   FALSE   FALSE   FALSE
 [4,]   FALSE   FALSE    TRUE   FALSE
 [5,]    TRUE   FALSE   FALSE    TRUE
 [6,]   FALSE    TRUE    TRUE   FALSE
 [7,]    TRUE    TRUE   FALSE    TRUE
 [8,]   FALSE   FALSE   FALSE    TRUE
 [9,]    TRUE   FALSE    TRUE    TRUE
[10,]   FALSE    TRUE    TRUE   FALSE

So I should be able to use that to change the NAs to 0

sm[3:6][is.na(sm[3:6])] <- 0
sm

Wow! That was super cool !! I first got a subset of the data frame with just the columns 3 through 6: sm[3:6]. Then I selected from that those cells who have an na value with a second set of brackets and assigned them to zero. I’m feeling the power.

I also found there is a built-in coalesce function which replaces NAs. And this works too. It takes two arguments, the second being the value to replace the NAs with. But mutate_at only takes a function of one argument. So I made a function in place that takes one arg, and always supplies 0 as the other arg to coalesce.

sm <- mutate_at(sm,3:6,function(x) {coalesce(x,0)})
sm

So now that I understand R better, I could have “cleaned” that San Mateo data with just one piped expression, which converts to numeric and then cleans up the NAs.

sm <- sm %>% mutate_at(3:6,as.numeric) %>% 
        mutate_at(3:6,function(x) {coalesce(x,0)})
sm
sm <- sm %>% mutate_at(3:6,as.numeric) %>% 
        mutate_at(3:6,function(x) {x[is.na(x)] <- 0;x})
sm

That’s all for today!

jeremy

LS0tCnRpdGxlOiAiSmVyZW15IFBsYXlpbmcgd2l0aCBSIHRoZSBMYW5ndWFnZSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKRm9yIG15IG5leHQgbGVhcm5pbmcsIEkgZGVjaWRlZCBJIG5lZWQgdG8gdW5kZXJzdGFuZCB0aGUgbGFuZ3VhZ2UgIlIiIGEgYml0IGJldHRlci4gSW4gYWRkaXRpb24gdG8gb3VyIGN1cnJlbnQgYm9vaywgSSBhbHNvIGZvdW5kIGFuZCByZWFkIHBhcnRzIG9mIFtSIGZvciBCZWdpbm5lcnNdKGh0dHBzOi8vY3Jhbi5yLXByb2plY3Qub3JnL2RvYy9jb250cmliL1BhcmFkaXMtcmRlYnV0c19lbi5wZGYpLiBPbmUgZ29hbCB3YXMgdG8gYmUgY2xlYXIgYWJvdXQgc29tZSBvZiB0aGUgZnVubnkgc3ludGF4IHRoYXQgY3JlZXBzIGluLgoKRm9yIGV4YW1wbGUsIHdoYXQgdGhlIGhlY2sgaXMgKipkZiRhKio/CgpkZiBpcyB0aGF0IHZhcmlhYmxlIHRoYXQgbmFtZXMgYSBkYXRhIGZyYW1lLCAkYSBzZWxlY3RzIGNvbHVtbiBhCgpgYGB7cn0KZGYgPC0gZGF0YS5mcmFtZShhID0gMTo1LCBzZWNvbmQgPSAyOjYsIGFscGhhID0gbGV0dGVyc1sxOjVdKQpkZgpgYGAKClNvIG5vdyBsZXQncyBzZWxlY3QgYSBjb2x1bW4gd2l0aCB0aGUgbnVtYmVyIDEgdGhyb3VnaCA1CgpgYGB7cn0KZGYkYQpgYGAKCm9yIHRoZSBjb2x1bW4gbmFtZWQgInNlY29uZCIKCmBgYHtyfQpkZiRzZWNvbmQKYGBgCgpJIGxpa2UgdGhhdCBpbiBSIHlvdSBjYW4gb3BlcmF0ZSBvbiB2ZWN0b3JzLiBIZXJlLCBJJ20gZ29pbmcgdG8gZG91YmxlIGFsbCB0aGUgbnVtYmVycyBpbiB0aGUgc2Vjb25kIGNvbHVtbi4KCmBgYHtyfQpkZiRzZWNvbmQgPC0gMiAqIGRmJHNlY29uZApkZgpgYGAKCllvdSBjYW4gYWxzbyBlaXRoZXIgYWNjZXNzIG9yIGFzc2lnbiBhbnkgdGFibGUgY2VsbCB3aXRoIHZhcmlhYmxlW3Jvdyxjb2xdLiBIZXJlIEkgYW0gZ29pbmcgdG8gY2hhbmdlIHRoZSA4IGluIHRoZSAic2Vjb25kIiBjb2x1bW4gdG8gYmUgemVyby4KCmBgYHtyfQpkZlszLDJdIDwtIDAKZGYKYGBgCgpJIHRoaW5rIHRoaXMgc2hvdWxkIHdvcmsgdG8gY2hhbmdlIHRoZSB6ZXJvIGJhY2sgdG8gOAoKYGBge3J9CmRmJHNlY29uZFszXSA8LSA4CmRmCmBgYApZZXMsIGl0IGRpZCEKCkEgY29vbCB0aGluZyBJIGxlYXJuZWQgd2FzIHRoYXQgeW91IGNhbiBwdXQgYSBjb25kaXRpb25hbCBpbiB0aGUgYnJhY2tldHMuIEZvciBleGFtcGxlLCBJIGNvdWxkIGNoYW5nZSB0aGUgY2VsbHMgd2hvc2UgbnVtYmVyIGlzIGxlc3MgdGhhbiAxMCB0byAwLiAKCmBgYHtyfQpkZiRzZWNvbmRbZGYkc2Vjb25kIDwgMTBdIDwtIDAKZGYKYGBgCgpMZXQncyBicmVhayB0aGF0IGRvd24uIFRoZSBkZiRzZWNvbmQgaW4gYnJhY2tldHMgaXMgYSB2ZWN0b3Igb2YgdHJ1ZSBvciBmYWxzZSwgZGVwZW5kaW5nIG9uIHdoZXRoZXIgdGhlIHZhbHVlIGlzIGxlc3MgdGhhbiAxMC4KCmBgYHtyfQpkZiRzZWNvbmQgPCAxMApgYGAKCnNvIGJhc2ljYWxseSwgd2UgYXJlIGZpcnN0IGZpZ3VyaW5nIG91dCB3aGljaCBjZWxscyBhcyBsZXNzIHRoYW4gMTAsIHRoZW4gc2VsZWN0aW5nIHRob3NlIHRvIHNldCB0byAwLgoKYGBge3J9Cmxlc3NfdGhhbl8xMCA8LSBkZiRzZWNvbmQgPCAxMApkZiRzZWNvbmRbbGVzc190aGFuXzEwXSA8LSAwCmRmCmBgYAoKbGV0J3MgbWFrZSBhbGwgdGhvc2UgemVyb3MgYmUgdHdpY2UgdGhlIHZhbHVlIGluIGNvbHVtbiBhCgpgYGB7cn0KZGYkc2Vjb25kW2RmJHNlY29uZCA9PSAwXSA8LSBkZiRhW2RmJHNlY29uZCA9PSAwXSAqIDIKZGYKYGBgCgpCeSB0aGUgd2F5LCBpbiB0aGUgW1NhbiBNYXRlbyBleGVyY2lzZV0oaHR0cDovL3JwdWJzLmNvbS9yb3NjaGVsbGUvMzYwMjU3KSBJIGRpZCBlYXJseSwgSSBoYWQgdHdvIHByb2JsZW1zLiBJdCB3YXMgYSBwb2xsIHdob3NlIHZhbHVlcyB3ZXJlIGVpdGhlciB0aGUgY2hhcmFjdGVyICIxIiAoZm9yIHRydWUpIG9yIE5BLiBJIHJlYWxseSB3YW50ZWQgdGhlIG51bWJlciAxIG9yIDAgZm9yIHRoZXNlIHZhbHVlcy4gSSd2ZSBub3cgZmlndXJlZCBvdXQgYSBtdWNoIHNpbXBsZXIgd2F5IHRvIGNsZWFuIHRoYXQgdXAuCgpGaXJzdCwgbGV0J3MgbWFrZSBhIGRhdGEgc2V0IHNpbWlsYXIgdG8gdGhlIFNhbiBNYXRlbyBvbmUuIEl0IGhhZCB0d28gZmlyc3QgY29sdW1ucyBJIGRpZG4ndCBjYXJlIGFib3V0LCBwbHVzIDQgcG9sbCBhbnN3ZXIgY29sdW1ucy4gVGhlIFIgZnVuY3Rpb24gInNhbXBsZSIgbWFrZXMgYSB2ZWN0b3IgYnkgc2FtcGxpbmcgYSBzZXQuIEknbGwgdXNlIHRoYXQgdG8gZmFrZSBwb2xsIGFuc3dlcnMuCgpgYGB7cn0KYW5zd2VycyA8LSBjKCIxIixOQSkKc2FtcGxlKGFuc3dlcnMsMTAscmVwbGFjZT1UUlVFKQpgYGAKCnNvIHRoYXQgZ2l2ZXMgbWUgcmFuZG9tIDEncyBvciBOQXMKCmBgYHtyfQpzbSA8LSBkYXRhLmZyYW1lKGEgPSAxMToyMCwKICAgICAgICAgICAgICAgICBiID0gbGV0dGVyc1sxMToyMF0sCiAgICAgICAgICAgICAgICAgYW5zd2VyMSA9IHNhbXBsZShhbnN3ZXJzLDEwLHJlcGxhY2U9VFJVRSksCiAgICAgICAgICAgICAgICAgYW5zd2VyMiA9IHNhbXBsZShhbnN3ZXJzLDEwLHJlcGxhY2U9VFJVRSksCiAgICAgICAgICAgICAgICAgYW5zd2VyMyA9IHNhbXBsZShhbnN3ZXJzLDEwLHJlcGxhY2U9VFJVRSksCiAgICAgICAgICAgICAgICAgYW5zd2VyNCA9IHNhbXBsZShhbnN3ZXJzLDEwLHJlcGxhY2U9VFJVRSkpCnNtCmBgYAoKU28gZmlyc3QgbGV0J3MgY29udmVydCBldmVyeXRoaW5nIGZyb20gY2hhcmFjdGVycyB0byBudW1iZXJzLiBhcy5udW1lcmljIHdpbGwgZG8gdGhpcywgYW5kIHdlIGNvdWxkIGRvIGl0IG9uZSBjb2x1bW4gYXQgYSB0aW1lIGxpa2UgdGhpcy4KCnNtXCRhbnN3ZXIzIDwtIGFzLm51bWVyaWMoc21cJGFuc3dlcjMpCgpidXQgd2UgY291bGQgYWxzbyBvcGVyYXRlIG9uIGEgc3Vic2V0IG9mIHRoZSB3aG9sZSB0YWJsZSwgdGhhdCBpcyB0aGUgM3JkIHRocm91Z2ggNnRoIGNvbHVtbnMuCgptdXRhdGVfYXQoc20sMzo2LGFzLm51bWVyaWMpCgpvb2gsIGxldCdzIHRyeSB0aGF0IQoKYGBge3J9CnNtIDwtIG11dGF0ZV9hdChzbSwzOjYsYXMubnVtZXJpYykKc20KYGBgCgpWZXJ5IGNvb2whIG5vdyBteSBjb2x1bW5zIHdpdGggYW5zd2VycyBhcmUgbnVtYmVycyAob2YgdHlwZSBkYmwpLiBCdXQgdGhleSBoYXZlIHRoYXQgcGVza3kgTkEuIFdoaWNoIEkgd2FudCB0byBiZSBhIHplcm8gZm9yIGEgZmFsc2UgYW5zd2VyIHRvIG15IHBvbGwuIAoKTm90ZSB3aGF0IHRoZSBmb2xsb3dpbmcgYml0IGRvZXM7IGl0IHByb2R1Y2VzIGEgdHJ1ZS9mYWxzZSB0YWJsZSBjb3JyZXNwb25kaW5nIHRvIHdoZXRoZXIgYSBjZWxsIGlzIGFuIE5BLgoKYGBge3J9CmlzLm5hKHNtWzM6Nl0pCmBgYAoKU28gSSBzaG91bGQgYmUgYWJsZSB0byB1c2UgdGhhdCB0byBjaGFuZ2UgdGhlIE5BcyB0byAwCgpgYGB7cn0Kc21bMzo2XVtpcy5uYShzbVszOjZdKV0gPC0gMApzbQpgYGAKCldvdyEgVGhhdCB3YXMgc3VwZXIgY29vbCAhISBJIGZpcnN0IGdvdCBhIHN1YnNldCBvZiB0aGUgZGF0YSBmcmFtZSB3aXRoIGp1c3QgdGhlIGNvbHVtbnMgMyB0aHJvdWdoIDY6IHNtWzM6Nl0uIFRoZW4gSSBzZWxlY3RlZCBmcm9tIHRoYXQgdGhvc2UgY2VsbHMgd2hvIGhhdmUgYW4gbmEgdmFsdWUgd2l0aCBhIHNlY29uZCBzZXQgb2YgYnJhY2tldHMgYW5kIGFzc2lnbmVkIHRoZW0gdG8gemVyby4gSSdtIGZlZWxpbmcgdGhlIHBvd2VyLgoKSSBhbHNvIGZvdW5kIHRoZXJlIGlzIGEgYnVpbHQtaW4gY29hbGVzY2UgZnVuY3Rpb24gd2hpY2ggcmVwbGFjZXMgTkFzLiBBbmQgdGhpcyB3b3JrcyB0b28uIEl0IHRha2VzIHR3byBhcmd1bWVudHMsIHRoZSBzZWNvbmQgYmVpbmcgdGhlIHZhbHVlIHRvIHJlcGxhY2UgdGhlIE5BcyB3aXRoLiBCdXQgbXV0YXRlX2F0IG9ubHkgdGFrZXMgYSBmdW5jdGlvbiBvZiBvbmUgYXJndW1lbnQuIFNvIEkgbWFkZSBhIGZ1bmN0aW9uICppbiBwbGFjZSogdGhhdCB0YWtlcyBvbmUgYXJnLCBhbmQgYWx3YXlzIHN1cHBsaWVzIDAgYXMgdGhlIG90aGVyIGFyZyB0byBjb2FsZXNjZS4KCmBgYHtyfQpzbSA8LSBtdXRhdGVfYXQoc20sMzo2LGZ1bmN0aW9uKHgpIHtjb2FsZXNjZSh4LDApfSkKc20KYGBgCgpTbyBub3cgdGhhdCBJIHVuZGVyc3RhbmQgUiBiZXR0ZXIsIEkgY291bGQgaGF2ZSAiY2xlYW5lZCIgdGhhdCBTYW4gTWF0ZW8gZGF0YSB3aXRoIGp1c3Qgb25lIHBpcGVkIGV4cHJlc3Npb24sIHdoaWNoIGNvbnZlcnRzIHRvIG51bWVyaWMgYW5kIHRoZW4gY2xlYW5zIHVwIHRoZSBOQXMuCgpgYGB7cn0Kc20gPC0gc20gJT4lIG11dGF0ZV9hdCgzOjYsYXMubnVtZXJpYykgJT4lIAogICAgICAgIG11dGF0ZV9hdCgzOjYsZnVuY3Rpb24oeCkge2NvYWxlc2NlKHgsMCl9KQpzbQpgYGAKCmBgYHtyfQpzbSA8LSBzbSAlPiUgbXV0YXRlX2F0KDM6Nixhcy5udW1lcmljKSAlPiUgCiAgICAgICAgbXV0YXRlX2F0KDM6NixmdW5jdGlvbih4KSB7eFtpcy5uYSh4KV0gPC0gMDt4fSkKc20KYGBgCgpUaGF0J3MgYWxsIGZvciB0b2RheSEKCipqZXJlbXkqCgoK