Data Camp: Introduction to R

Using my own data and examples

Chapter 4 - Factors

Types of variables:
- Categorical: a limited number of categories (e.g. sex)
- Continuous: an infinite number of possible values (e.g. income)

Factor: a data type with limited categories

Steps to create a factor, using sex as an example:
1. Create a character vector whose elements are the category values, using the c function to combine them.

#1. Create sex_vector
sex_vector <- c("Male","Male","Female","Male","Female","Male","Female","Male","Male","Female")
sex_vector
 [1] "Male"   "Male"   "Female" "Male"   "Female" "Male"  
 [7] "Female" "Male"   "Male"   "Female"
2. Convert the vector to a factor and assign the result to a variable.
#Convert sex_vector to a factor, giving it a new name
factor_sex <- factor(sex_vector)
factor_sex
 [1] Male   Male   Female Male   Female Male   Female Male  
 [9] Male   Female
Levels: Female Male
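To see what the conversion actually stores, it can help to inspect the factor. The following is a minimal sketch reusing the sex_vector data from above; levels, nlevels, and as.integer are base R functions, and the comments show the values I would expect.

```r
# Same data as sex_vector above
sex_vector <- c("Male", "Male", "Female", "Male", "Female",
                "Male", "Female", "Male", "Male", "Female")
factor_sex <- factor(sex_vector)

# The distinct categories, sorted alphabetically by default
levels(factor_sex)      # "Female" "Male"

# Number of categories
nlevels(factor_sex)     # 2

# Each element is stored as an integer code pointing into levels()
as.integer(factor_sex)  # 2 2 1 2 1 2 1 2 2 1
```

A factor is therefore a vector of integer codes plus a lookup table of level names, which is why the printed output shows no quotes.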

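Types of categorical factors:
- Nominal: no implied order (like sex)
- Ordinal: an implied order (for example: low, medium, high)

To create an ordinal factor, pass two additional arguments to the factor function:
- ordered = TRUE (the shortened form order = TRUE also works, because R matches argument names partially)
- levels = c("1st category", "2nd category", ...), listed from lowest to highest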

# Temperature
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High
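The difference between a nominal and an ordinal factor can be checked directly. This is a small sketch using the temperature data from above; the variable names nominal_temperature and ordinal_temperature are my own, chosen only for illustration.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")

# Without extra arguments the factor is nominal, with alphabetical levels
nominal_temperature <- factor(temperature_vector)

# With ordered = TRUE and explicit levels the factor is ordinal
ordinal_temperature <- factor(temperature_vector, ordered = TRUE,
                              levels = c("Low", "Medium", "High"))

is.ordered(nominal_temperature)  # FALSE
is.ordered(ordinal_temperature)  # TRUE

levels(nominal_temperature)      # "High" "Low" "Medium"
levels(ordinal_temperature)      # "Low" "Medium" "High"
```

Note that the alphabetical default would put "High" first, which is why the levels have to be spelled out for an ordinal variable.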

If your vector contains abbreviations such as single letters, you can rename the levels, but the order in which you supply the new names matters. By default R sorts the levels alphabetically, so for the survey vector c("M", "F", "F", "M", "M") the levels are "F" "M".

To correctly map "F" to "Female" and "M" to "Male", the new levels must be set to c("Female", "Male"), in this order.

#Create survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector
[1] M F F M M
Levels: F M
#Assign levels to the factor_survey_vector
levels(factor_survey_vector) <- c("Female","Male")
factor_survey_vector
[1] Male   Female Female Male   Male  
Levels: Female Male
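Because assigning to levels() depends on getting the order right, one alternative is to state the mapping explicitly when the factor is created. The sketch below uses the levels and labels arguments of factor, which are part of base R; the name factor_survey_labeled is hypothetical.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# levels lists the values found in the data,
# labels gives the display name for each level, position by position
factor_survey_labeled <- factor(survey_vector,
                                levels = c("F", "M"),
                                labels = c("Female", "Male"))

factor_survey_labeled
levels(factor_survey_labeled)  # "Female" "Male"
```

Here the pairing of "F" with "Female" and "M" with "Male" is visible in the call itself, so it does not rely on remembering the alphabetical default.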

Summary function: summary() gives a quick overview of a variable. For a character vector it reports only the length, class, and mode; for a factor it counts the elements in each level, which is usually more useful.

# Generate summary for survey_vector
summary(survey_vector)
   Length     Class      Mode 
        5 character character 
# Generate summary for factor_survey_vector
summary(factor_survey_vector)
Female   Male 
     2      3 
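If you only need the counts, table() is another base R function that tallies values, and it works on the character vector as well as on the factor. A minimal sketch with the same survey data:

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

# Counts straight from the character vector
counts_raw <- table(survey_vector)
counts_raw

# Counts from the factor, using the renamed levels
counts_factor <- table(factor_survey_vector)
counts_factor

# summary() on a factor returns the same counts as a named integer vector
summary(factor_survey_vector)
```

Both approaches should report 2 for "F"/"Female" and 3 for "M"/"Male".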

Compare elements of a factor

Extract individual elements of the factor with square brackets, variable_name <- factor_name[index], and compare them with relational operators. Comparisons such as > are only meaningful for ordered factors.


# Create factor_speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))

# Factor value for second data analyst
da2 <- factor_speed_vector[2]

# Factor value for fifth data analyst
da5 <- factor_speed_vector[5]

# Is data analyst 2 faster than data analyst 5?
da2 > da5
[1] FALSE
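Comparisons on an ordered factor are not limited to single elements. The following sketch, built on the speed data from above, compares the whole vector against a level name and contrasts it with an unordered factor; the names fastest and nominal_speed are my own.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Element-wise comparison against a level name
factor_speed_vector > "slow"          # TRUE FALSE FALSE TRUE TRUE

# Number of analysts rated at least "medium"
sum(factor_speed_vector >= "medium")  # 3

# Highest level present in the data
fastest <- max(factor_speed_vector)
as.character(fastest)                 # "fast"

# The same comparison on an unordered factor yields NA (with a warning)
nominal_speed <- factor(speed_vector)
suppressWarnings(nominal_speed[2] > nominal_speed[5])  # NA
```

The last line shows why ordered = TRUE matters: without it R has no basis for ranking the levels.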