The goal of this tutorial is to split a string into words and keep only the first word.
library(purrr)
# In this tutorial we are going to use the mtcars dataset
data(mtcars)
# We are going to take the names of the rows to find the brand of the car
brandcar <- rownames(mtcars)
head(brandcar)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
# We are going to use the strsplit function to split each name at the spaces
# strsplit returns a list, because each name can contain a different number of words
head(strsplit(brandcar, split = " "))
## [[1]]
## [1] "Mazda" "RX4"
##
## [[2]]
## [1] "Mazda" "RX4" "Wag"
##
## [[3]]
## [1] "Datsun" "710"
##
## [[4]]
## [1] "Hornet" "4" "Drive"
##
## [[5]]
## [1] "Hornet" "Sportabout"
##
## [[6]]
## [1] "Valiant"
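As a quick check of why the result is a list, the number of words in each name can be counted with base R's `lengths()`. This is only a small sketch; `words` and `word_counts` are names introduced here for illustration.

```r
# Split the first six row names and count the words in each one
words <- strsplit(head(rownames(mtcars)), split = " ")
word_counts <- lengths(words)
word_counts
## [1] 2 3 2 3 2 1
```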
# Now we are going to select only the first word of each element of the list
# We will use the function map from purrr; passing 1 extracts the first element
head(map(strsplit(brandcar, split = " "), 1))
## [[1]]
## [1] "Mazda"
##
## [[2]]
## [1] "Mazda"
##
## [[3]]
## [1] "Datsun"
##
## [[4]]
## [1] "Hornet"
##
## [[5]]
## [1] "Hornet"
##
## [[6]]
## [1] "Valiant"
# Each element now holds exactly one word, so we can convert the list into a character vector
head(as.character(map(strsplit(brandcar, split = " "), 1)))
## [1] "Mazda" "Mazda" "Datsun" "Hornet" "Hornet" "Valiant"
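The extraction and the conversion to a character vector can probably be merged into one step with `map_chr()`, the purrr variant of `map()` that returns a character vector instead of a list. The sketch below assumes purrr is installed; `first_words` is a name made up for this example.

```r
library(purrr)

# map_chr() pulls element 1 out of each split name and
# returns a character vector directly (no as.character() needed)
first_words <- map_chr(strsplit(rownames(mtcars), split = " "), 1)
head(first_words)
## [1] "Mazda"   "Mazda"   "Datsun"  "Hornet"  "Hornet"  "Valiant"
```

Unlike `as.character()`, `map_chr()` should raise an error if some element is not a single string, which makes unexpected input easier to spot.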
# In short, we can store the names in a character vector and do the whole process in two lines
brandcar <- rownames(mtcars)
mtcars$Brand <- as.character(map(strsplit(brandcar, split = " "), 1))
table(mtcars$Brand)
##
##      AMC Cadillac   Camaro Chrysler   Datsun    Dodge   Duster  Ferrari
##        1        1        1        1        1        1        1        1
##     Fiat     Ford    Honda   Hornet  Lincoln    Lotus Maserati    Mazda
##        2        1        1        2        1        1        1        2
##     Merc  Pontiac  Porsche   Toyota  Valiant    Volvo
##        7        1        1        2        1        1
In this tutorial we have learnt how to split a string into different words and select only the first one.
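If purrr is not available, the same result can be obtained with base R alone. The following is a sketch of two possible alternatives, not part of the workflow above; the names `car_names`, `first_vapply` and `first_sub` are introduced here for illustration.

```r
# Base R sketch: no purrr needed
car_names <- rownames(mtcars)

# Option 1: split, then take element 1 of each piece with `[`;
# vapply() checks that every result is a single string
first_vapply <- vapply(strsplit(car_names, split = " "), `[`, character(1), 1)

# Option 2: skip the split and delete everything from the first space onward
first_sub <- sub(" .*", "", car_names)

# Both approaches give the same character vector
identical(first_vapply, first_sub)
## [1] TRUE
```

The `sub()` version avoids building the intermediate list, while the `vapply()` version stays closer to the split-then-select logic of the tutorial.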