Types of variables:
- Categorical: a limited number of categories (e.g. sex)
- Continuous: an infinite number of possible values (e.g. income)
Factor: R's data type for categorical variables, which take one of a limited number of categories (called levels)
Steps to create a factor, using sex as an example:
1. Create a character vector whose elements are the category values, combining them with the c() function.
2. Convert the character vector to a factor with the factor() function.
#1. Create sex_vector
sex_vector <- c("Male","Male","Female","Male","Female","Male","Female","Male","Male","Female")
sex_vector
[1] "Male" "Male" "Female" "Male" "Female" "Male"
[7] "Female" "Male" "Male" "Female"
#2. Convert sex_vector to a factor, storing it under a new name
factor_sex <- factor(sex_vector)
factor_sex
[1] Male Male Female Male Female Male Female Male
[9] Male Female
Levels: Female Male
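Internally, a factor stores each element as an integer code that points into its levels. The short sketch below recreates the sex example so it runs on its own, then inspects those pieces with the base R functions levels(), nlevels(), as.integer() and table(). The values in the comments are what this particular vector should produce.

```r
# Recreate the sex example so this block is self-contained
sex_vector <- c("Male","Male","Female","Male","Female","Male","Female","Male","Male","Female")
factor_sex <- factor(sex_vector)

# The distinct categories, sorted alphabetically by default
levels(factor_sex)      # "Female" "Male"
nlevels(factor_sex)     # 2

# Each element is stored as an integer index into levels(factor_sex)
as.integer(factor_sex)  # 2 2 1 2 1 2 1 2 2 1

# Count how many elements fall in each level (Female 4, Male 6)
table(factor_sex)
```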
Types of categorical variables:
- Nominal: no implied order (e.g. sex)
- Ordinal: an implied order (e.g. low, medium, high)
To create an ordinal factor, pass factor() two extra arguments:
- ordered = TRUE
- levels = c("1st category", "2nd category", …), listing the categories from lowest to highest
# Temperature
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector
[1] High Low High Low Medium
Levels: Low < Medium < High
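Why the levels argument is needed: the sketch below (reusing the same temperature values) shows that with ordered = TRUE alone, factor() falls back to alphabetical order, which gives a meaningless High < Low < Medium ranking. With explicit levels, order-aware functions such as max() and sort() behave as expected. The variable name alphabetical is only illustrative.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")

# Without 'levels', the ordering is alphabetical: High < Low < Medium (not meaningful)
alphabetical <- factor(temperature_vector, ordered = TRUE)
levels(alphabetical)                   # "High" "Low" "Medium"

# With explicit levels, the ordering matches the meaning of the categories
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))

# Ordered factors support order-aware functions
is.ordered(factor_temperature_vector)  # TRUE
max(factor_temperature_vector)         # High
sort(factor_temperature_vector)        # Low Low Medium High High
```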
The order of the levels matters when you rename them. By default, factor() sorts the distinct values alphabetically, so for the survey vector c("M", "F", "F", "M", "M") R lists the levels as "F" then "M".
To map "F" to "Female" and "M" to "Male", the new level names must be given in that same order: c("Female", "Male").
#Create survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector
[1] M F F M M
Levels: F M
#Assign levels to the factor_survey_vector
levels(factor_survey_vector) <- c("Female","Male")
factor_survey_vector
[1] Male Female Female Male Male
Levels: Female Male
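An alternative to reassigning levels() afterwards is to pass both levels and labels to factor() in one call. levels lists the raw codes and labels gives the display name for each, matched by position, so every code is paired with its name explicitly. This is a sketch using the same survey data; the name reversed is only illustrative.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# Pair each raw code with its label by position: "F" -> "Female", "M" -> "Male"
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))
factor_survey_vector   # Male Female Female Male Male

# Reversing both vectors together still maps each code correctly;
# only the order of the levels changes
reversed <- factor(survey_vector, levels = c("M", "F"), labels = c("Male", "Female"))
levels(reversed)       # "Male" "Female"
```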
summary() function: summary() gives an overview of a variable. For a factor it counts the elements in each level, which is more informative than the summary of the underlying character vector (only its length, class, and mode).
# Generate summary for survey_vector
summary(survey_vector)
Length Class Mode
5 character character
# Generate summary for factor_survey_vector
summary(factor_survey_vector)
Female Male
2 3
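For a factor, summary() returns a named vector of counts per level, so individual counts can be looked up by level name; table() gives the same counts as a table object. A minimal sketch that rebuilds factor_survey_vector the same way as above (the name counts is illustrative):

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

# Named vector of counts per level: Female 2, Male 3
counts <- summary(factor_survey_vector)
counts

# Look up a single count by level name
counts[["Male"]]               # 3

# table() reports the same counts
table(factor_survey_vector)
```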
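Compare elements of an ordered factor
Select an element by its position with square brackets and store it in a variable: variable_name <- factor_name[position]. Because factor_speed_vector is ordered, the relational operators (<, >, <=, >=) compare elements by their level order.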
# Create factor_speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
# Factor value for second data analyst
da2 <- factor_speed_vector[2]
# Factor value for fifth data analyst
da5 <- factor_speed_vector[5]
# Is data analyst 2 faster than data analyst 5?
da2 > da5
[1] FALSE
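Comparisons are not limited to single elements. The sketch below compares the whole ordered factor against one level name, and then shows that the same > comparison on an unordered factor is not meaningful: R returns NA and emits a warning (silenced here with suppressWarnings()). The name unordered_speed is illustrative.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Which data analysts are faster than "slow"?
factor_speed_vector > "slow"   # TRUE FALSE FALSE TRUE TRUE

# Without an order, ">" has no meaning: the result is NA (with a warning)
unordered_speed <- factor(speed_vector)
suppressWarnings(unordered_speed[2] > unordered_speed[5])   # NA
```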