stringr: a package used to manipulate strings
Getting started
First we need to load these packages:
- tidyverse
- stringr
- dplyr - used for subsetting data in our analysis
- rmdformats - used for styling the HTML document
We’re going to load a dataset from fivethirtyeight.com to help us show examples of stringr at work. Our data shows the number of murders in American cities in 2014 and 2015, along with the change between the two years.
We’ll take the first 10 rows of the data for simplicity’s sake.
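url <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/murder_2016/murder_2015_final.csv'
murder_raw <- read_csv(url)
# Keep the first 10 rows; the examples below refer to this dataframe as murder
murder <- head(murder_raw, 10)
## Parsed with column specification: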
## cols(
## city = col_character(),
## state = col_character(),
## `2014_murders` = col_double(),
## `2015_murders` = col_double(),
## change = col_double()
## )
Ordering Strings
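str_sort(character vector, decreasing = X)
Purpose:
Sort a character vector alphabetically. The related str_order() returns the integer positions of the sorted elements instead of the strings themselves.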
Input:
character vector - what you want to order
X - whether to sort in decreasing order: FALSE (the default) sorts alphabetically from A to Z; TRUE sorts from Z to A
Output:
An ordered character vector
Example:
We’ll order the column ‘city’ from our dataframe ‘murder’
## [1] "Baltimore" "Chicago" "Cleveland" "Houston" "Kansas City"
## [6] "Milwaukee" "Nashville" "Philadelphia" "St. Louis" "Washington"
If you want to reverse the order to Z-A, you can set decreasing = TRUE
## [1] "Washington" "St. Louis" "Philadelphia" "Nashville" "Milwaukee"
## [6] "Kansas City" "Houston" "Cleveland" "Chicago" "Baltimore"
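The two outputs above are sorted character vectors, which is what str_sort() returns. Here is a minimal sketch that reproduces them, assuming the ten city names are typed in by hand so it runs without downloading the data; with the dataframe you would pass murder$city instead.

```r
library(stringr)

# The ten city names from the outputs above, in their original row order
# (hand-typed assumption so the sketch is self-contained).
city <- c("Baltimore", "Chicago", "Houston", "Cleveland", "Washington",
          "Milwaukee", "Philadelphia", "Kansas City", "Nashville", "St. Louis")

# str_sort() returns the strings themselves, A to Z by default.
sorted_az <- str_sort(city)

# decreasing = TRUE reverses the order, Z to A.
sorted_za <- str_sort(city, decreasing = TRUE)

# str_order() returns integer positions instead; indexing with them
# gives the same result as str_sort().
by_index <- city[str_order(city)]

print(sorted_az)
print(sorted_za)
```

Use str_order() when you need to reorder a whole dataframe by one column, and str_sort() when you only need the sorted strings.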
Combining Strings
str_c(String1,String2,…Stringn)
Purpose:
The function takes strings or vectors of strings and concatenates them.
Input:
Strings or vectors of strings, separated by commas
Output:
Single string or vector of combined strings
Example:
You can combine as many strings as you want together at once
## [1] "abcdefgh"
Let’s see how we can combine two vectors of strings from our dataframe: the city and the state
## [1] "BaltimoreMaryland" "ChicagoIllinois"
## [3] "HoustonTexas" "ClevelandOhio"
## [5] "WashingtonD.C." "MilwaukeeWisconsin"
## [7] "PhiladelphiaPennsylvania" "Kansas CityMissouri"
## [9] "NashvilleTennessee" "St. LouisMissouri"
You can add a separator between the strings you’re combining using the sep argument. Let’s separate the city and state with a comma by setting sep = ",".
We’ll also add the result as a new column, named City_State, in our dataframe murder.
## [1] "Baltimore,Maryland" "Chicago,Illinois"
## [3] "Houston,Texas" "Cleveland,Ohio"
## [5] "Washington,D.C." "Milwaukee,Wisconsin"
## [7] "Philadelphia,Pennsylvania" "Kansas City,Missouri"
## [9] "Nashville,Tennessee" "St. Louis,Missouri"
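The str_c() outputs above can be reproduced with a sketch along these lines. The four short input strings and the hand-typed city and state vectors are assumptions made so the example is self-contained; the original inputs for "abcdefgh" are not shown.

```r
library(stringr)

# Combine any number of single strings (these inputs are illustrative).
letters_joined <- str_c("ab", "cd", "ef", "gh")

# City and state values as they appear in the outputs above.
city  <- c("Baltimore", "Chicago", "Houston", "Cleveland", "Washington",
           "Milwaukee", "Philadelphia", "Kansas City", "Nashville", "St. Louis")
state <- c("Maryland", "Illinois", "Texas", "Ohio", "D.C.",
           "Wisconsin", "Pennsylvania", "Missouri", "Tennessee", "Missouri")

# Vectors are combined element by element.
no_sep <- str_c(city, state)

# sep is inserted between the pieces of each combined string.
city_state <- str_c(city, state, sep = ",")

# With the dataframe, the new column could be added like this:
# murder$City_State <- str_c(murder$city, murder$state, sep = ",")

print(letters_joined)
print(city_state)
```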
Replacing Strings
str_replace_all(string, pattern, replacement)
Purpose:
This function will replace all instances of a pattern with the given replacement
Input:
String or vector of strings
Pattern - you can use regular expressions here
Replacement - the text to put in place of each match
Output:
String or vector of strings with every match replaced
Example:
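Suppose we want to replace every . in the column ‘City_State’ with *. We can easily do this with str_replace_all(). Because an unescaped . matches any character in a regular expression, the pattern has to be escaped ("\\.") or wrapped in fixed() to match a literal period.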
## [1] "Baltimore,Maryland" "Chicago,Illinois"
## [3] "Houston,Texas" "Cleveland,Ohio"
## [5] "Washington,D*C*" "Milwaukee,Wisconsin"
## [7] "Philadelphia,Pennsylvania" "Kansas City,Missouri"
## [9] "Nashville,Tennessee" "St* Louis,Missouri"
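Here is a minimal sketch of the replacement shown above, assuming a three-element sample of the City_State values. It also shows why the period needs special handling: in a regular expression, an unescaped "." matches every character.

```r
library(stringr)

# A small hand-typed sample of the City_State values (assumption for illustration).
city_state <- c("Washington,D.C.", "St. Louis,Missouri", "Chicago,Illinois")

# fixed() makes stringr treat the pattern as literal text, not a regex.
replaced <- str_replace_all(city_state, fixed("."), "*")

# Escaping the period in a regular expression gives the same result.
replaced_regex <- str_replace_all(city_state, "\\.", "*")

# An unescaped "." matches any character, so every character is replaced.
everything <- str_replace_all("D.C.", ".", "*")

print(replaced)
print(everything)
```

To store the result back in the dataframe, the same call could be assigned to murder$City_State.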
Get the Length of a String
str_length(string)
Purpose:
Find out the length of a string or a vector of strings
Input:
String or vector of strings
Output:
Integer, or a vector of integers with one length per string
Example:
Let’s find out how long each city name is
## [1] 9 7 7 9 10 9 12 11 9 9
Let’s only view the rows in the dataframe where the city has more than 9 characters in the name (spaces and punctuation count toward the length). To do this we’ll also use the filter function from the package dplyr.
## # A tibble: 3 x 6
## city state `2014_murders` `2015_murders` change City_State
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Washington D.C. 105 162 57 Washington,D*C*
## 2 Philadelp… Pennsylva… 248 280 32 Philadelphia,Penns…
## 3 Kansas Ci… Missouri 78 109 31 Kansas City,Missou…
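The length calculation and the filtered table above could be produced with something like the following sketch. It assumes a one-column tibble named cities (a hypothetical stand-in for murder) so that it runs without the downloaded data.

```r
library(stringr)
library(dplyr)

# Hypothetical stand-in for the murder dataframe, holding only the city column.
cities <- tibble(
  city = c("Baltimore", "Chicago", "Houston", "Cleveland", "Washington",
           "Milwaukee", "Philadelphia", "Kansas City", "Nashville", "St. Louis")
)

# str_length() counts characters, including spaces and punctuation.
name_length <- str_length(cities$city)

# Keep only the rows whose city name is longer than 9 characters.
long_names <- filter(cities, str_length(city) > 9)

print(name_length)
print(long_names)
```

With the full dataframe, the equivalent call would be filter(murder, str_length(city) > 9).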
Conclusion
These examples are just the beginning of what you can do with stringr. If you need to manipulate, combine or work with strings in general, stringr is a great package to do so. Here’s a great stringr cheatsheet released by RStudio (https://rstudio.com/resources/cheatsheets/).
Resources: