Character to Factor
How to transform a character vector into a factor?
Here is an example vector of character class -
> gender_char <- c("female","female","female","male",
+ "female","male","male","female","other")
> class(gender_char)
[1] "character"
Transform to factor using factor() -
> gender_fact <- factor(gender_char)
> class(gender_fact)
[1] "factor"
> levels(gender_fact)
[1] "female" "male" "other"
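To make the conversion above more concrete, here is a small sketch of what a factor holds internally: integer codes plus a character vector of levels. It only uses base R functions and the same example vector.

```r
gender_char <- c("female", "female", "female", "male",
                 "female", "male", "male", "female", "other")
gender_fact <- factor(gender_char)

# The distinct values, sorted, become the levels
levels(gender_fact)       # "female" "male" "other"

# Each element is stored as the position of its level
as.integer(gender_fact)   # 1 1 1 2 1 2 2 1 3

# table() counts the observations per level
table(gender_fact)        # female 5, male 3, other 1
```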
droplevels()
How to drop unused levels from a factor?
Let’s see how levels change after taking a subset of the data set -
> gender_subset <- gender_fact[1:4]
> gender_subset
[1] female female female male
Levels: female male other
Although other no longer appears in the subset, it still shows up in the levels. The droplevels() function removes such unused levels -
> gender_subset <- droplevels(gender_subset)
> gender_subset
[1] female female female male
Levels: female male
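As a possible alternative to droplevels(), unused levels can also be dropped while subsetting, or by calling factor() on the subset again. The sketch below rebuilds the same example vector so it can be run on its own.

```r
gender_fact <- factor(c("female", "female", "female", "male",
                        "female", "male", "male", "female", "other"))

# Option 1: drop = TRUE in the subsetting call discards unused levels
levels(gender_fact[1:4, drop = TRUE])   # "female" "male"

# Option 2: calling factor() again recomputes the levels from the data
gender_subset <- gender_fact[1:4]
levels(factor(gender_subset))           # "female" "male"
```

droplevels() is usually the clearer choice when working with a whole data frame, since it can be applied to all factor columns at once.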
Number to factor
How to transform a numeric vector into a factor?
> gender_num <- c(0,0,0,1,0,1,1,0)
> class(gender_num)
[1] "numeric"
> gender_num_fact <- factor(gender_num)
> class(gender_num_fact)
[1] "factor"
gl()
To create a factor with a regular pattern of levels, the gl() function can be used. Here n is the number of levels and k is the number of times each level is repeated -
> fac_seq <- gl(n = 2, k = 3)
> fac_seq
[1] 1 1 1 2 2 2
Levels: 1 2
> class(fac_seq)
[1] "factor"
We can name the levels using the labels argument -
> fac_seq <- gl(n = 2, k = 3, labels = c("Control", "Treatment"))
> fac_seq
[1] Control Control Control Treatment Treatment Treatment
Levels: Control Treatment
If we want the levels to be ordered, then the ordered argument can be used -
> gl(n = 2, k = 3, labels = c("January", "May"), ordered = TRUE)
[1] January January January May May May
Levels: January < May
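gl() also accepts a length argument, which is not shown above. The sketch below assumes we want the two levels to alternate rather than appear in blocks; the name fac_alt is only for illustration.

```r
# k = 1 repeats each level once; length = 6 recycles the pattern to 6 values
fac_alt <- gl(n = 2, k = 1, length = 6, labels = c("Control", "Treatment"))
fac_alt
# Control Treatment Control Treatment Control Treatment

# The block pattern from gl(n = 2, k = 3) matches rep() with each = 3
as.character(gl(n = 2, k = 3))   # "1" "1" "1" "2" "2" "2"
rep(1:2, each = 3)               # 1 1 1 2 2 2
```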
Changing factor levels
How to change factor levels?
> levels(gender_num_fact) <- c('Female','Male')
> gender_num_fact
[1] Female Female Female Male Female Male Male Female
Levels: Female Male
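Assigning to levels() relies on the position of the existing levels, so the new names must be given in the same order as the old ones. A sketch of a more explicit alternative is to state the mapping when the factor is created, using the levels and labels arguments of factor(). The name gender_lab_fact is only for illustration.

```r
gender_num <- c(0, 0, 0, 1, 0, 1, 1, 0)

# levels lists the values found in the data, labels gives their display names
gender_lab_fact <- factor(gender_num,
                          levels = c(0, 1),
                          labels = c("Female", "Male"))
gender_lab_fact
# Female Female Female Male Female Male Male Female
# Levels: Female Male
```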
Ordered Levels
How to make ordered levels?
By default, levels are sorted in ascending alphabetical order -
> mymonth <- c('Jan','Dec','Mar','May','Sep')
> mymonth
[1] "Jan" "Dec" "Mar" "May" "Sep"
> mymonth_fact <- factor(mymonth)
> mymonth_fact
[1] Jan Dec Mar May Sep
Levels: Dec Jan Mar May Sep
But this is not right, since the months are not in calendar order. The levels argument sets the order of the levels, and ordered = TRUE marks the factor as ordered -
> mymonth_ord_fact <- factor(mymonth, levels = month.abb, ordered = TRUE)
> # month.abb is a built-in constant vector in base R; month.name holds the full month names
> mymonth_ord_fact
[1] Jan Dec Mar May Sep
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
Let’s drop the unused levels -
> mymonth_ord_fact <- droplevels(mymonth_ord_fact)
> mymonth_ord_fact
[1] Jan Dec Mar May Sep
Levels: Jan < Mar < May < Sep < Dec
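One practical benefit of an ordered factor is that comparison and sorting follow the level order instead of the alphabet. A minimal sketch, using the full month.abb levels so that "Jun" is a valid level to compare against:

```r
mymonth <- c("Jan", "Dec", "Mar", "May", "Sep")
mymonth_ord_fact <- factor(mymonth, levels = month.abb, ordered = TRUE)

# Comparisons use the position of each level, not alphabetical order
mymonth_ord_fact < "Jun"     # TRUE FALSE TRUE TRUE FALSE

# sort() and max() also respect the level order
sort(mymonth_ord_fact)       # Jan Mar May Sep Dec
max(mymonth_ord_fact)        # Dec
```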
Factor to Numeric
In the following vector, numeric values are stored as a factor -
> raise <- factor(c(1200,1500,1500,1200,1500,1800,1500,1200))
> raise
[1] 1200 1500 1500 1200 1500 1800 1500 1200
Levels: 1200 1500 1800
But when we try to convert it directly to numeric, we get the internal integer codes 1, 2, 3 instead of the original values -
> as.numeric(raise)
[1] 1 2 2 1 2 3 2 1
To recover the original numbers, we need to convert the factor to character first, and then to numeric -
> as.numeric(as.character(raise))
[1] 1200 1500 1500 1200 1500 1800 1500 1200
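Another common idiom converts only the levels to numeric and then indexes them by the factor. This is a sketch of an alternative that gives the same result here, not a replacement for the approach above.

```r
raise <- factor(c(1200, 1500, 1500, 1200, 1500, 1800, 1500, 1200))

# Convert the 3 levels once, then look up each element by its integer code
as.numeric(levels(raise))[raise]
# 1200 1500 1500 1200 1500 1800 1500 1200
```

Because only the distinct levels are converted, this form can be a little more efficient on long vectors with few levels.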