Convert strings to dates

Step 1: Read the dataset

# Read the CSV with base R so that observation_date arrives as a character
# string (the default in R >= 4.0.0; earlier versions need stringsAsFactors = FALSE),
# which we can then learn to convert to a date manually.
candy <- read.csv("data/candy_production.csv")
# readr::read_csv() would instead parse the column as a date automatically:
candy2 <- readr::read_csv("data/candy_production.csv")

head(candy)
##   observation_date IPG3113N
## 1       1972-01-01  85.6945
## 2       1972-02-01  71.8200
## 3       1972-03-01  66.0229
## 4       1972-04-01  64.5645
## 5       1972-05-01  65.0100
## 6       1972-06-01  67.6467
class(candy)
## [1] "data.frame"
head(candy2)
## # A tibble: 6 x 2
##   observation_date IPG3113N
##   <date>              <dbl>
## 1 1972-01-01           85.7
## 2 1972-02-01           71.8
## 3 1972-03-01           66.0
## 4 1972-04-01           64.6
## 5 1972-05-01           65.0
## 6 1972-06-01           67.6
class(candy2)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

Step 2: Check the structure of the data

str(candy$observation_date)
##  chr [1:548] "1972-01-01" "1972-02-01" "1972-03-01" "1972-04-01" ...

Note that if we check candy2, which is the same dataset read with readr::read_csv(), the column is already a Date rather than a character vector:

str(candy2$observation_date)
##  Date[1:548], format: "1972-01-01" "1972-02-01" "1972-03-01" "1972-04-01" "1972-05-01" ...

This is why we used base R to read the data: it lets us demonstrate how to convert a character string to a date.
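The difference can be reproduced without the file. The sketch below is a minimal illustration, assuming an inline copy of the first three rows (the names csv_text, candy_demo and candy_direct are made up for this example). It also shows that base R can read the column straight into Date class through the colClasses argument, should you not need the manual step.

```r
# Inline stand-in for the first rows of data/candy_production.csv
csv_text <- "observation_date,IPG3113N
1972-01-01,85.6945
1972-02-01,71.8200
1972-03-01,66.0229"

# stringsAsFactors = FALSE is the default from R 4.0.0 onwards; setting it
# explicitly keeps the column as character on older versions too
candy_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(candy_demo$observation_date)    # "character"

# colClasses asks read.csv() to convert each column while reading
candy_direct <- read.csv(text = csv_text, colClasses = c("Date", "numeric"))
class(candy_direct$observation_date)  # "Date"
```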

The observation_date variable was read in as a character vector. There are several ways to convert it to a date; the first is the as.Date() function from base R.

Step 3: Use the as.Date() function to convert the date format.

candy$observation_date <- as.Date(candy$observation_date)

# check the structure
str(candy$observation_date)
##  Date[1:548], format: "1972-01-01" "1972-02-01" "1972-03-01" "1972-04-01" "1972-05-01" ...
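Why bother converting? Once a vector has class Date, R treats it as a point in time rather than as text. The following base R sketch (using a small made-up vector d rather than the candy data) shows a few operations that only work, or only give sensible answers, on Date objects.

```r
d <- as.Date(c("1972-01-01", "1972-02-01", "1972-03-01"))

# Differences are measured in days (1972 is a leap year, so February has 29)
diff(d)                # Time differences in days: 31 29

# Comparisons are chronological
d[2] > d[1]            # TRUE

# Adding a number shifts the date by that many days
d[1] + 30              # "1972-01-31"

# Regular sequences can step by calendar month
monthly <- seq(d[1], by = "month", length.out = 3)
all(monthly == d)      # TRUE
```

Doing the same arithmetic on the original character strings would fail, because "1972-01-01" + 30 is not defined for text.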

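Alternatively, we could do the same conversion inside mutate() from the dplyr package. Here the %<>% compound assignment pipe from magrittr passes candy to mutate() and assigns the result back to candy: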

library(dplyr)    # provides mutate() and %>%
library(magrittr) # provides %<>%
candy %<>% 
  mutate(observation_date = as.Date(observation_date))

Note that, without a format argument, as.Date() expects strings in YYYY-MM-DD (or YYYY/MM/DD) form; if your strings use a different layout, you must supply the format argument. For a complete list of formatting codes in R, type ?strftime in your console.
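It is worth seeing what goes wrong when the format argument is left out. In the sketch below (the variable names guessed and explicit are made up for this example), as.Date() is called on a DD/MM/YYYY string with no format. Depending on the string, R either stops with an error or quietly returns a date that was not intended, so the call is wrapped in tryCatch() to cover both cases.

```r
x <- "08/03/2018"   # intended as 8 March 2018 (DD/MM/YYYY)

# Without format, as.Date() tries "%Y-%m-%d" and then "%Y/%m/%d";
# tryCatch() turns a possible error into NA so the script keeps running
guessed <- tryCatch(as.Date(x), error = function(e) NA)

# With format, the day, month and year are read from the right positions
explicit <- as.Date(x, format = "%d/%m/%Y")

print(guessed)
print(explicit)                 # "2018-03-08"
identical(guessed, explicit)    # FALSE
```

Either way the guessed value is not 8 March 2018, which is why a non-standard layout should always come with an explicit format.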

Have a look at these two examples:

x <- c("08/03/2018", "23/03/2016", "30/01/2018")
y <- c("08.03.2018", "23.03.2016", "30.01.2018")

This time the string format is DD/MM/YYYY for x and DD.MM.YYYY for y; therefore, you need to specify the format argument explicitly.

Step 4: Convert the two examples above, specifying the format argument explicitly.

x_date <- as.Date(x, format = "%d/%m/%Y")
x_date %>% str()
##  Date[1:3], format: "2018-03-08" "2016-03-23" "2018-01-30"

Similarly, for y, this time calling as.Date() as part of a pipe:

y_date <- y %>% 
  as.Date(format = "%d.%m.%Y") # the literal full stops in the format match those in y
str(y_date)
##  Date[1:3], format: "2018-03-08" "2016-03-23" "2018-01-30"
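The same pattern extends to other layouts. As a rough illustration with made-up input strings (compact, us_short and mismatch are names chosen for this sketch), the codes %Y, %y, %m and %d can be combined with or without separators. When an explicit format does not match the string, as.Date() returns NA rather than raising an error, so it pays to check for missing values after converting.

```r
# No separators: four-digit year, then month, then day
compact <- as.Date("20180308", format = "%Y%m%d")

# US-style month/day with a two-digit year (%y maps 18 to 2018)
us_short <- as.Date("03/08/18", format = "%m/%d/%y")

# A format that does not match the string gives NA, not an error
mismatch <- as.Date("2018-03-08", format = "%d/%m/%Y")

compact           # "2018-03-08"
us_short          # "2018-03-08"
is.na(mismatch)   # TRUE
```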

It's also possible to change how dates are displayed, using the format() function. Note, however, that format() returns character strings, so the result is no longer a Date:

candy %<>% 
  mutate(dmy_date = format(observation_date, "%d-%m-%Y")) 

str(candy$dmy_date)
##  chr [1:548] "01-01-1972" "01-02-1972" "01-03-1972" "01-04-1972" ...
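One practical consequence of the result being character is that it no longer sorts chronologically. The short sketch below (with made-up vectors dates and labels) compares sorting the Date values with sorting their formatted strings, and then shows one way to keep the display format while still ordering by the real dates.

```r
dates  <- as.Date(c("2018-03-08", "2016-03-23", "2018-01-30"))
labels <- format(dates, "%d-%m-%Y")

# Date values sort in time order
sort(dates)            # "2016-03-23" "2018-01-30" "2018-03-08"

# Formatted strings sort as text, i.e. by day of month here
sort(labels)           # "08-03-2018" "23-03-2016" "30-01-2018"

# Order by the Date column, then show the formatted labels
labels[order(dates)]   # "23-03-2016" "30-01-2018" "08-03-2018"
```

For this reason it is usually safer to keep the Date column for calculations and to create formatted strings only at the point of output.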

When might something like this be useful? Sometimes outputs for management or clients might require dates in a different format. You can also use inline code in R Markdown to include dates in other formats in your reports:

# Note, the data relates to monthly production and the observation_date relates to 
# the month, not to a specific day. Accordingly, we're dropping the day here. 
reference_values <- candy %>% 
  summarise(first_date = min(observation_date) %>% format("%B %Y"), 
            last_date = max(observation_date) %>% format("%B %Y"), 
            min_prod = min(IPG3113N) %>% round(digits = 1), 
            max_prod = max(IPG3113N) %>% round(digits = 1)) 

max_production_date <- candy %>% 
  filter(IPG3113N == max(IPG3113N)) %>% # locating when production was at its maximum
  pull(observation_date) %>% # extracting the associated date 
  format("%B %Y")  

If we were writing a report to the candy manufacturer, we could include something along the lines of:

"The data provided covered the monthly candy production in the United States, ranging from January 1972 until August 2017. Production values are expressed as a percentage of 2012 production and ranged from 50.7% to 139.9%. The maximum production occurred in December 2005."

Note that you'll need to knit the R Markdown document for the values of the inline code to be visible.
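To preview such a sentence in the console without knitting, the same values can be pasted together with sprintf(). This is only a sketch in base R: candy_demo holds just the six rows printed by head(candy) earlier, so the numbers differ from the full-data report above, and the object names are made up for this example. Month names produced by %B follow the language of your R session.

```r
# Stand-in data: the six rows shown by head(candy)
candy_demo <- data.frame(
  observation_date = as.Date(c("1972-01-01", "1972-02-01", "1972-03-01",
                               "1972-04-01", "1972-05-01", "1972-06-01")),
  IPG3113N = c(85.6945, 71.8200, 66.0229, 64.5645, 65.0100, 67.6467)
)

# Format the dates as month and year, dropping the day
first_date <- format(min(candy_demo$observation_date), "%B %Y")
last_date  <- format(max(candy_demo$observation_date), "%B %Y")

# Round the production range to one decimal place
min_prod <- round(min(candy_demo$IPG3113N), 1)
max_prod <- round(max(candy_demo$IPG3113N), 1)

# Date of the row with the highest production
peak_row  <- which.max(candy_demo$IPG3113N)
peak_date <- format(candy_demo$observation_date[peak_row], "%B %Y")

# %% prints a literal percent sign
report <- sprintf(
  "Production ranged from %.1f%% to %.1f%% between %s and %s, peaking in %s.",
  min_prod, max_prod, first_date, last_date, peak_date
)
cat(report, "\n")
```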