Factor Variables

Sameer Mathur

Create a Dataframe

patientID <- c(1, 2, 3, 4, 5, 6)
age <- c(25, 34, 28, 52, 34, 64)
diabetes <- c("Type1", "Type2", "Type1", "Type1", "Type2", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor", "Improved", "Excellent")
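# stringsAsFactors = TRUE converts the character columns to factors
# (this was the default before R 4.0.0; newer versions need it set explicitly)
patientdata <- data.frame(patientID, age, diabetes, status, stringsAsFactors = TRUE)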

View the data

patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
5         5  34    Type2  Improved
6         6  64    Type1 Excellent

Structure of the data

# structure of patientdata
str(patientdata)
'data.frame':   6 obs. of  4 variables:
 $ patientID: num  1 2 3 4 5 6
 $ age      : num  25 34 28 52 34 64
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1 2 1
 $ status   : Factor w/ 3 levels "Excellent","Improved",..: 3 2 1 3 2 1

Summary of the data

# summary
summary(patientdata)
   patientID         age        diabetes       status 
 Min.   :1.00   Min.   :25.0   Type1:4   Excellent:2  
 1st Qu.:2.25   1st Qu.:29.5   Type2:2   Improved :2  
 Median :3.50   Median :34.0             Poor     :2  
 Mean   :3.50   Mean   :39.5                          
 3rd Qu.:4.75   3rd Qu.:47.5                          
 Max.   :6.00   Max.   :64.0                          

mtcars

# structure of mtcars
data(mtcars)
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Car Cylinders

# car cylinders in mtcars
cyls <- mtcars$cyl
cyls
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
is.factor(cyls)
[1] FALSE
is.numeric(cyls)
[1] TRUE

Car Cylinders

Now let's create a factor variable called cyls.f based on cyls.

# convert cyls to a factor with descriptive labels
cyls
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
cyls.f <- factor(cyls, labels = c("Low", "Medium", "High"))
cyls.f
 [1] Medium Medium Low    Medium High   Medium High   Low    Low    Medium
[11] Medium High   High   High   High   High   High   Low    Low    Low   
[21] Low    High   High   High   High   Low    Low    Low    High   Medium
[31] High   Low   
Levels: Low Medium High

The first label, "Low", corresponds to cyl = 4, the second label, "Medium", to cyl = 6, and the third label, "High", to cyl = 8. This is because factor() sorts the unique values of the data to form the levels, and the labels are assigned to the levels in that order.
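
The correspondence between cylinder counts and labels can be checked directly. The following is a minimal sketch and not part of the original slides: it passes levels = c(4, 6, 8) explicitly (an addition for illustration; factor() infers the same sorted levels when the argument is omitted) and cross-tabulates the original values against the labelled factor. The name mapping is introduced here only for the example.

```r
# Sketch: make the level-to-label mapping explicit and verify it.
data(mtcars)
cyls <- mtcars$cyl

# levels is spelled out here for clarity; omitting it gives the same result
# because factor() uses the sorted unique values of cyls as the levels.
cyls.f <- factor(cyls,
                 levels = c(4, 6, 8),
                 labels = c("Low", "Medium", "High"))

# Cross-tabulate the numeric values against the labels.
# Every car should fall on the diagonal: 4 -> Low, 6 -> Medium, 8 -> High.
mapping <- table(cyls, cyls.f)
print(mapping)

# The underlying integer codes (1, 2, 3) follow the level order.
print(head(as.integer(cyls.f)))
```

Writing out levels alongside labels keeps the pairing visible in the code, which helps when the data might not contain every expected value.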