Assignment on RPubs
Create Rmd on Github
Extend Rmd on Github
This project extends Michael Munguia’s stringr CREATE project by demonstrating two additional functions from the stringr package.
This Rmd uses the CSV data from Michael’s project, which is already part of this repository. The file is endorsements-2020.csv and is available at: https://raw.githubusercontent.com/acatlin/SPRING2020TIDYVERSE/master/endorsements-2020.csv.
Let’s set up the initial libraries.
# Setup of the initial libraries for this demo
library(dplyr)
library(stringr)
# The repo variable is the SPRING2020TIDYVERSE repository
repo <- "https://raw.githubusercontent.com/acatlin/SPRING2020TIDYVERSE/master/"
Next, we read the CSV file from the repository into a data frame.
# Imports the csv as a dataframe
(df <- readr::read_csv(paste0(repo, "endorsements-2020.csv")))
Michael’s project demonstrated the stringr functions str_detect, str_replace, and str_split, along with the dplyr verbs mutate_at, select, and filter. This extension looks at two more stringr functions: str_replace_na and str_to_upper.
Let’s start with str_replace_na. This function replaces NA values with a replacement string (by default, the literal string “NA”). Let’s create a new data frame in which NA values in the city column are replaced with the string “Not Available”.
# Creates a new dataframe where NA is replaced with "Not Available"
(city_df <- df %>% mutate_at("city", str_replace_na, replacement = "Not Available"))
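To see what str_replace_na does on its own, here is a minimal sketch on a small hypothetical vector and tibble (the city values are invented for illustration and are not taken from endorsements-2020.csv). It also shows the same replacement written with mutate() and across(), which dplyr 1.0.0 introduced as the successor to the now-superseded mutate_at().

```r
library(dplyr)
library(stringr)

# Hypothetical sample values; not from endorsements-2020.csv
cities <- c("Albany", NA, "Boston")

# Default: the missing value becomes the literal string "NA"
str_replace_na(cities)

# Custom replacement string, as in the city_df example
str_replace_na(cities, replacement = "Not Available")

# Small hypothetical tibble with a city column containing a missing value
sample_df <- tibble(
  endorser = c("A", "B", "C"),
  city = cities
)

# Equivalent to mutate_at("city", str_replace_na, ...), written with across()
sample_df %>%
  mutate(across(city, ~ str_replace_na(.x, replacement = "Not Available")))
```

Note that after the replacement the value is an ordinary string, so is.na() no longer flags it. That is convenient for display, but it also hides the fact that the value was originally missing.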
The function str_to_upper converts a string to upper case. This is useful for normalizing text, for example before comparing or grouping values that differ only in letter case.
Let’s convert the data in the position column to upper case using this function.
# Creates a new dataframe where position is converted to upper case (English locale)
(capitalize_position_df <- df %>% mutate_at("position", str_to_upper, locale = "en"))
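The sketch below, again on hypothetical position values invented for illustration, compares str_to_upper with its sibling case functions str_to_lower and str_to_title. It also shows why the locale argument (passed as “en” above) can matter: case mapping follows language-specific rules, so the Turkish locale uppercases the letter i to a dotted capital İ.

```r
library(stringr)

# Hypothetical position values; not from endorsements-2020.csv
positions <- c("governor", "state Senator", "MAYOR")

# Upper, lower, and title case using English rules
upper <- str_to_upper(positions, locale = "en")
lower <- str_to_lower(positions, locale = "en")
title <- str_to_title(positions, locale = "en")

upper  # "GOVERNOR" "STATE SENATOR" "MAYOR"
lower  # "governor" "state senator" "mayor"
title  # "Governor" "State Senator" "Mayor"

# Locale-sensitive case mapping for the letter "i"
str_to_upper("i", locale = "en")  # "I"
str_to_upper("i", locale = "tr")  # "İ" (dotted capital I)
```

Passing the locale explicitly, as the position example does, keeps results reproducible regardless of the system locale of whoever knits the Rmd.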
stringr provides many useful functions for wrangling string data. More information is available in the stringr reference manual: https://cran.r-project.org/web/packages/stringr/stringr.pdf