Some algorithms, such as XGBoost, expect numeric vectors, so what do you do if your features are categorical?
Here are a couple of techniques for recoding those categorical features.
Say we have the following dataframe:
##        dates items
## 1  1/24/2014     A
## 2 10/28/2014     b
## 3 10/29/2014     c
## 4 12/12/2014     d
## 5   1/4/2015     A
## 6   1/5/2015     e
## 7   1/9/2015     f
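The post doesn't show how this dataframe was built, so here is a minimal sketch for reproducing it. It assumes both columns are plain character vectors, and the untouched copy `df2` is also an assumption, added so the manual approach further down has fresh data to work on.

```r
# Rebuild the example dataframe printed above.
# stringsAsFactors = FALSE keeps both columns as character vectors
# (an assumption; the original storage type is not shown).
df <- data.frame(
  dates = c("1/24/2014", "10/28/2014", "10/29/2014", "12/12/2014",
            "1/4/2015", "1/5/2015", "1/9/2015"),
  items = c("A", "b", "c", "d", "A", "e", "f"),
  stringsAsFactors = FALSE
)

# Untouched copy of the original data for the second technique
df2 <- df
```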
We can use the recode function in the car package to convert those categorical items to numeric:-
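library(car)
required.labels <- df["items"]
# Map each category to a number; as.factor = FALSE returns a plain numeric
# vector even when the items column is stored as a factor
recoded.labels <- recode(required.labels$items, "'A'=1; 'b'=2; 'c'=3; 'd'=4; 'e'=5; 'f'=6", as.factor = FALSE)
df$items <- recoded.labels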
Now the items column of the dataframe has been recoded into a numeric type.
##        dates items
## 1  1/24/2014     1
## 2 10/28/2014     2
## 3 10/29/2014     3
## 4 12/12/2014     4
## 5   1/4/2015     1
## 6   1/5/2015     5
## 7   1/9/2015     6
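Typing the recode specification by hand gets tedious once there are more than a few categories. As a rough sketch, the same specification string can be assembled with base R's `paste0`. The variable names `items`, `lv` and `spec` are illustrative only, and the numbering here follows order of first appearance, which is an assumption about what you want.

```r
# Category values as they appear in the items column
items <- c("A", "b", "c", "d", "A", "e", "f")

# Distinct categories in order of first appearance
lv <- unique(items)

# Build "'A'=1; 'b'=2; ..." for use as the recodes argument of car::recode
spec <- paste0("'", lv, "'=", seq_along(lv), collapse = "; ")
spec
# "'A'=1; 'b'=2; 'c'=3; 'd'=4; 'e'=5; 'f'=6"
```

The resulting string can then be passed to `recode` in place of the hand-written one.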
If for some reason you don’t want to use the car package, you can do the recoding manually like this:-
# df2 holds the same original (un-recoded) data as the dataframe above
df2
##        dates items
## 1  1/24/2014     A
## 2 10/28/2014     b
## 3 10/29/2014     c
## 4 12/12/2014     d
## 5   1/4/2015     A
## 6   1/5/2015     e
## 7   1/9/2015     f
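# Convert to a factor; the levels are the sorted unique values (A, b, c, d, e, f)
df2$items <- as.factor(df2$items)
items.1 <- df2[, "items"]
# Relabel the levels as 1..n
num.items <- length(levels(items.1))
levels(items.1) <- 1:num.items
df2$items <- items.1
# The labels now match the factor's integer codes, so as.numeric() is safe
df2$items <- as.numeric(df2$items)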
df2
##        dates items
## 1  1/24/2014     1
## 2 10/28/2014     2
## 3 10/29/2014     3
## 4 12/12/2014     4
## 5   1/4/2015     1
## 6   1/5/2015     5
## 7   1/9/2015     6
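Note: We didn’t just call as.numeric(df2$items) on the original column because of the way factors work. On a character column such as this one, as.numeric() returns NA for every value, and on a factor it returns the internal integer codes rather than the level labels. That is why the column is converted to a factor and its levels relabelled as 1 to n first. You can see more about this in the R FAQ.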
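To make the factor behaviour concrete, here is a small sketch using a made-up factor `f` whose labels look like numbers, followed by a base R one-liner that gives the same codes as the manual recoding. The vectors `f`, `items` and `codes` are illustrative and not part of the example above.

```r
# A factor whose labels look like numbers; levels are sorted as strings
f <- factor(c("10", "20", "10", "5"))
levels(f)                     # "10" "20" "5"

# as.numeric() on a factor returns the internal codes, not the labels
as.numeric(f)                 # 1 2 1 3

# Go through character (or the levels) to recover the labelled values
as.numeric(as.character(f))   # 10 20 10 5
as.numeric(levels(f))[f]      # 10 20 10 5

# For plain category-to-integer coding, match() against the unique
# values numbers each category by order of first appearance
items <- c("A", "b", "c", "d", "A", "e", "f")
codes <- match(items, unique(items))
codes                         # 1 2 3 4 1 5 6
```

Unlike the factor route, `match` numbers the categories by first appearance rather than by sorted order. The two happen to agree for this data.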