Introduction to Factors

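In R, a factor is a vector for categorical data: each element can take only one of a fixed set of values, called levels. Factors are R's usual representation of categorical variables, and internally each element is stored as an integer code that indexes into the levels. The factor() function creates a factor; by default its levels are the sorted unique values of the input, which is why the levels below are listed alphabetically.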

# A factor of eye colors
eye_colors <- factor(c("brown", "blue", "green", "blue", "brown", "brown"))
print(eye_colors)
## [1] brown blue  green blue  brown brown
## Levels: blue brown green
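Because a factor is stored as integer codes plus a set of levels, the two parts can be inspected separately. A minimal sketch, recreating the eye_colors factor from above so the snippet runs on its own:

```r
# Recreate the factor so this snippet is self-contained
eye_colors <- factor(c("brown", "blue", "green", "blue", "brown", "brown"))

# Each integer code indexes into levels(eye_colors)
levels(eye_colors)      # "blue" "brown" "green"
as.integer(eye_colors)  # 2 1 3 1 2 2

# Count how often each level occurs (blue: 2, brown: 3, green: 1)
table(eye_colors)
```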

Exercises

  1. Create a factor named education_levels with the following values: “High School”, “Bachelor’s”, “Master’s”, “PhD”, “Bachelor’s”.
# Your code here
education_levels <- factor(c("High School", "Bachelor's", "Master's", "PhD", "Bachelor's"))
print(education_levels)
## [1] High School Bachelor's  Master's    PhD         Bachelor's 
## Levels: Bachelor's High School Master's PhD
  2. Display the levels of the education_levels factor.
# Your code here
levels(education_levels)
## [1] "Bachelor's"  "High School" "Master's"    "PhD"
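Note that levels() returns the levels in alphabetical order, which is not the natural ranking of degrees. Two related summaries are nlevels(), which counts the levels, and table(), which counts occurrences per level. A short sketch:

```r
education_levels <- factor(c("High School", "Bachelor's", "Master's", "PhD", "Bachelor's"))

# Number of distinct levels
nlevels(education_levels)  # 4

# Occurrences of each level, in (alphabetical) level order:
# Bachelor's: 2, High School: 1, Master's: 1, PhD: 1
table(education_levels)
```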
  3. Create an ordered factor from education_levels whose levels run from “High School” up to “PhD”.
# Your code here
education_levels_ordered <- factor(education_levels,
  levels = c("High School", "Bachelor's", "Master's", "PhD"),
  ordered = TRUE
)
print(education_levels_ordered)
## [1] High School Bachelor's  Master's    PhD         Bachelor's 
## Levels: High School < Bachelor's < Master's < PhD
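With ordered = TRUE, comparison operators and functions such as max() follow the declared level order rather than alphabetical order. A minimal sketch, rebuilding the ordered factor directly from the character values:

```r
education_levels_ordered <- factor(
  c("High School", "Bachelor's", "Master's", "PhD", "Bachelor's"),
  levels = c("High School", "Bachelor's", "Master's", "PhD"),
  ordered = TRUE
)

# Comparisons use the level order: which students hold at least a Master's?
education_levels_ordered >= "Master's"  # FALSE FALSE TRUE TRUE FALSE

# max() and min() are defined for ordered factors
max(education_levels_ordered)           # PhD
```

On an unordered factor, the same `>=` comparison is not meaningful and R returns NA with a warning.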
  4. Create a factor of movie ratings with the following ratings: “Good”, “Great”, “Good”, “Excellent”, “Good”, “Bad”. The levels should be “Bad”, “Good”, “Great”, “Excellent”.
# Your code here
movie_ratings <- factor(
  c("Good", "Great", "Good", "Excellent", "Good", "Bad"),
  levels = c("Bad", "Good", "Great", "Excellent")
)
print(movie_ratings)
## [1] Good      Great     Good      Excellent Good      Bad      
## Levels: Bad Good Great Excellent
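Supplying levels explicitly has two visible effects: summaries such as table() follow that order, and any value not listed among the levels becomes NA. A sketch of both; ratings_levels is a helper variable and “Okay” a hypothetical rating, neither from the exercise:

```r
# Helper vector holding the level order (illustrative name)
ratings_levels <- c("Bad", "Good", "Great", "Excellent")
movie_ratings <- factor(
  c("Good", "Great", "Good", "Excellent", "Good", "Bad"),
  levels = ratings_levels
)

# Counts are reported in level order: Bad 1, Good 3, Great 1, Excellent 1
table(movie_ratings)

# "Okay" (hypothetical) is not a level, so it silently becomes NA
factor(c("Good", "Okay"), levels = ratings_levels)  # Good <NA>
```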
  5. Create a data frame with two columns: student (a character vector of student names) and grade (a factor with levels “A”, “B”, “C”, “D”, “F”).
# Your code here
students_df <- data.frame(
  student = c("Alif", "Rahib", "Upoma", "Bangla", "Svenska"),
  grade = factor(
    c("A", "B", "C", "D", "F"),
    levels = c("A", "B", "C", "D", "F")
  )
)
print(students_df)
##   student grade
## 1    Alif     A
## 2   Rahib     B
## 3   Upoma     C
## 4  Bangla     D
## 5 Svenska     F
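It can help to confirm the column types after building the data frame. A sketch, assuming R 4.0 or later, where data.frame() no longer converts character columns to factors by default (stringsAsFactors = FALSE):

```r
students_df <- data.frame(
  student = c("Alif", "Rahib", "Upoma", "Bangla", "Svenska"),
  grade = factor(c("A", "B", "C", "D", "F"), levels = c("A", "B", "C", "D", "F"))
)

# Assuming R >= 4.0: the names stay character, the grades are a factor
class(students_df$student)  # "character"
class(students_df$grade)    # "factor"

# summary() on a factor column gives the count per level (each grade once here)
summary(students_df$grade)
```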
  6. Subset the data frame to only include students with a grade of “A” or “B”.
# Your code here
top_students <- students_df[students_df$grade %in% c("A", "B"), ]
print(top_students)
##   student grade
## 1    Alif     A
## 2   Rahib     B
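Subsetting rows does not remove factor levels: the grade column of top_students still lists “C”, “D” and “F”, which then show up with zero counts in table() or summary(). droplevels() removes unused levels. A minimal sketch:

```r
students_df <- data.frame(
  student = c("Alif", "Rahib", "Upoma", "Bangla", "Svenska"),
  grade = factor(c("A", "B", "C", "D", "F"), levels = c("A", "B", "C", "D", "F"))
)
top_students <- students_df[students_df$grade %in% c("A", "B"), ]

# The subset keeps every original level, even those with no rows left
levels(top_students$grade)  # "A" "B" "C" "D" "F"

# droplevels() on a data frame drops unused levels from all factor columns
top_students <- droplevels(top_students)
levels(top_students$grade)  # "A" "B"
```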
