Where is my categorical variable?

Say we have some data like this, with a column “groups” that holds a categorical variable with two levels, “Group.1” and “Group.2”:

head(your.data)
##    groups    values
## 1 Group.1  2.556703
## 2 Group.1  3.464532
## 3 Group.1  4.008589
## 4 Group.1  3.877420
## 5 Group.1  5.078678
## 6 Group.1 11.518368
tail(your.data)
##     groups   values
## 25 Group.2 1.730739
## 26 Group.2 6.651701
## 27 Group.2 2.749330
## 28 Group.2 1.692362
## 29 Group.2 3.555673
## 30 Group.2 2.635218

Sometimes when R loads data it won’t automatically treat a categorical column as a categorical variable (a “factor”), and instead stores it as raw text, aka character data. When this happens and you use the summary() command, you don’t get what you’d expect from a grouping / categorical variable: instead of a count for each group, you only see the length and class of the column.

summary(your.data)
##     groups              values       
##  Length:30          Min.   :-0.8548  
##  Class :character   1st Qu.: 2.9281  
##  Mode  :character   Median : 4.3288  
##                     Mean   : 4.7984  
##                     3rd Qu.: 6.0748  
##                     Max.   :13.4843

The word “character” below “groups” is a hint as to the nature of the problem.
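If you’d rather check the column type directly than read it off the summary() output, something like the following should work. This is a minimal sketch: the three-row data frame is a made-up stand-in for your.data, built with stringsAsFactors = FALSE so that it reproduces the problem case.

```r
# Hypothetical stand-in for your.data (values are invented for illustration)
your.data <- data.frame(
  groups = c("Group.1", "Group.1", "Group.2"),
  values = c(2.5, 3.4, 1.7),
  stringsAsFactors = FALSE  # keep the text column as character data
)

class(your.data$groups)      # "character" means the column is plain text
is.factor(your.data$groups)  # FALSE
str(your.data)               # compact overview of every column's type
```

If class() reports “character” for a column you think of as a grouping variable, that column has not been converted to a factor.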

Fixing the Problem

The factor() function can be used to change a column to a categorical (aka “factor”) variable:

your.data$groups <- factor(your.data$groups)

In words, this means “Take the “groups” column of the your.data data frame and replace it with factor-ized data from that same column”.
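To confirm the conversion worked, you can inspect the column afterwards. The sketch below again uses a small invented data frame in place of your.data; only the factor() line comes from the text above.

```r
# Hypothetical stand-in for your.data (values are invented for illustration)
your.data <- data.frame(
  groups = c("Group.1", "Group.2", "Group.1"),
  values = c(2.5, 1.7, 3.4),
  stringsAsFactors = FALSE
)

# Replace the character column with a factor version of itself
your.data$groups <- factor(your.data$groups)

is.factor(your.data$groups)  # TRUE
levels(your.data$groups)     # "Group.1" "Group.2" (sorted alphabetically by default)
table(your.data$groups)      # number of rows in each group
```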

Now the summary() command makes sense, because it reports how many rows fall in each group:

summary(your.data)
##      groups       values       
##  Group.1:17   Min.   :-0.8548  
##  Group.2:13   1st Qu.: 2.9281  
##               Median : 4.3288  
##               Mean   : 4.7984  
##               3rd Qu.: 6.0748  
##               Max.   :13.4843
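As an alternative, you can ask R to convert text columns to factors while the data are being loaded. In R 4.0.0 and later, read.csv() leaves text as character data by default; setting stringsAsFactors = TRUE changes that. The sketch below assumes the data come from a CSV source, and the inline csv.text is a made-up stand-in for a real file.

```r
# Hypothetical CSV content standing in for a real file on disk
csv.text <- "groups,values
Group.1,2.5
Group.1,3.4
Group.2,1.7"

# stringsAsFactors = TRUE converts every text column to a factor on load
loaded <- read.csv(text = csv.text, stringsAsFactors = TRUE)

class(loaded$groups)  # "factor"
summary(loaded)       # groups now shows a count per level
```

Note that this converts every text column, which may not be what you want if some columns hold free text such as names or comments. In that case, converting individual columns with factor() gives you more control.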