Say we have some data like this, with a column “groups” that holds a categorical variable with two levels, “Group.1” and “Group.2”:
head(your.data)
## groups values
## 1 Group.1 2.556703
## 2 Group.1 3.464532
## 3 Group.1 4.008589
## 4 Group.1 3.877420
## 5 Group.1 5.078678
## 6 Group.1 11.518368
tail(your.data)
## groups values
## 25 Group.2 1.730739
## 26 Group.2 6.651701
## 27 Group.2 2.749330
## 28 Group.2 1.692362
## 29 Group.2 3.555673
## 30 Group.2 2.635218
Sometimes when R loads data, it doesn’t automatically treat a categorical variable as categorical and instead stores it as raw text, a.k.a. character data. When this happens and you use the summary() command, you don’t get what you’d expect from a grouping / categorical variable:
summary(your.data)
## groups values
## Length:30 Min. :-0.8548
## Class :character 1st Qu.: 2.9281
## Mode :character Median : 4.3288
## Mean : 4.7984
## 3rd Qu.: 6.0748
## Max. :13.4843
The word “character” below “groups” is a hint as to the nature of the problem.
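One way to confirm the diagnosis is to inspect the column's class directly. The sketch below uses a small made-up data frame (toy.data is hypothetical and only stands in for your.data):

```r
# Hypothetical toy data standing in for your.data
toy.data <- data.frame(
  groups = c("Group.1", "Group.1", "Group.2"),
  values = c(2.5, 3.4, 1.7),
  stringsAsFactors = FALSE  # keep "groups" as character in any R version
)

class(toy.data$groups)      # "character"
is.factor(toy.data$groups)  # FALSE
str(toy.data)               # compact overview of every column's type
```

If class() reports "character" for a column you meant as a grouping variable, that column needs to be converted.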
The factor() function can be used to change a column to a categorical (aka “factor”) variable:
your.data$groups <- factor(your.data$groups)
In words, this means “Take the ‘groups’ column of the your.data data frame and replace it with factor-ized data from that same column”.
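To check that a conversion like this took effect, you can look at the factor's levels and the number of rows in each one. This is a minimal sketch on made-up data (toy.data and group.counts are hypothetical names, not part of the data above):

```r
# Hypothetical toy data with "groups" stored as character
toy.data <- data.frame(
  groups = c("Group.1", "Group.2", "Group.1", "Group.2", "Group.1"),
  values = c(2.5, 1.7, 3.4, 6.6, 4.0),
  stringsAsFactors = FALSE
)

# Same conversion as above, applied to the toy data
toy.data$groups <- factor(toy.data$groups)

is.factor(toy.data$groups)  # TRUE
levels(toy.data$groups)     # "Group.1" "Group.2"

# Count how many rows fall in each level
group.counts <- table(toy.data$groups)
group.counts
```

By default factor() orders the levels alphabetically, which is why “Group.1” comes first.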
Now the summary() command makes sense:
summary(your.data)
## groups values
## Group.1:17 Min. :-0.8548
## Group.2:13 1st Qu.: 2.9281
## Median : 4.3288
## Mean : 4.7984
## 3rd Qu.: 6.0748
## Max. :13.4843
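If the data come from a file, the conversion can also be requested at load time. The sketch below assumes a CSV source; the inline text is made up and stands in for a real file, and loaded.data is a hypothetical name:

```r
# Made-up CSV content standing in for a real file on disk
csv.text <- "groups,values
Group.1,2.5
Group.1,3.4
Group.2,1.7"

# stringsAsFactors = TRUE asks read.csv() to turn text columns into factors
loaded.data <- read.csv(text = csv.text, stringsAsFactors = TRUE)

class(loaded.data$groups)  # "factor"
summary(loaded.data)
```

Note that stringsAsFactors = TRUE converts every text column, so calling factor() on individual columns is the safer choice when some columns should stay as plain text.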