Step 1: Read the dataset
candy <- read.csv("data/candy_production.csv")
# We're using base R to read the CSV because we want the date column to be read
# as a character string, so we can learn to convert it to date format manually.
# If we used readr::read_csv(), the column would be parsed as a date automatically:
candy2 <- readr::read_csv("data/candy_production.csv")
head(candy)
## observation_date IPG3113N
## 1 1972-01-01 85.6945
## 2 1972-02-01 71.8200
## 3 1972-03-01 66.0229
## 4 1972-04-01 64.5645
## 5 1972-05-01 65.0100
## 6 1972-06-01 67.6467
class(candy)
## [1] "data.frame"
head(candy2)
## # A tibble: 6 x 2
## observation_date IPG3113N
## <date> <dbl>
## 1 1972-01-01 85.7
## 2 1972-02-01 71.8
## 3 1972-03-01 66.0
## 4 1972-04-01 64.6
## 5 1972-05-01 65.0
## 6 1972-06-01 67.6
class(candy2)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
Step 2: Check the structure of the data
str(candy$observation_date)
## chr [1:548] "1972-01-01" "1972-02-01" "1972-03-01" "1972-04-01" ...
If we run the same check on candy2, which is the same dataset read with readr::read_csv(), the column is already a Date rather than a character vector:
str(candy2$observation_date)
## Date[1:548], format: "1972-01-01" "1972-02-01" "1972-03-01" "1972-04-01" "1972-05-01" ...
This is why we used the base R approach to read the data: it lets us demonstrate how to convert a character string to a date.
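As a minimal sketch of the same behaviour without the real file, the snippet below writes two rows to a temporary CSV and reads them back with read.csv(). The names tmp and demo are made up for illustration. One assumption worth flagging: before R 4.0.0, read.csv() converted strings to factors by default, so stringsAsFactors = FALSE is set explicitly here to get a character column on any version.

```r
# Hypothetical two-row file standing in for data/candy_production.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("observation_date,IPG3113N",
             "1972-01-01,85.6945",
             "1972-02-01,71.8200"), tmp)

# Base R leaves the date column as text
demo <- read.csv(tmp, stringsAsFactors = FALSE)
class_before <- class(demo$observation_date)   # "character"

# The text can then be converted manually
demo$observation_date <- as.Date(demo$observation_date)
class_after <- class(demo$observation_date)    # "Date"

unlink(tmp)  # remove the temporary file
```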
The observation_date variable was read in as a character vector. There are several ways to convert it to a date format; the first is the as.Date() function from base R.
Step 3: Use the as.Date() function to convert the date format.
candy$observation_date <- as.Date(candy$observation_date)
# check the structure
str(candy$observation_date)
## Date[1:548], format: "1972-01-01" "1972-02-01" "1972-03-01" "1972-04-01" "1972-05-01" ...
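Alternatively, we could do the conversion inside mutate(). The %<>% operator pipes candy into mutate() and assigns the result back to candy:
# requires dplyr (for mutate) and magrittr (for %<>%)
candy %<>%
  mutate(observation_date = as.Date(observation_date))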
Note that by default as.Date() expects strings in YYYY-MM-DD format (it also tries YYYY/MM/DD); if your strings are in a different format, you must supply the format argument. Dates can be written in many formats; for a complete list of formatting codes in R, type ?strftime in your console.
Have a look at these two examples:
x <- c("08/03/2018", "23/03/2016", "30/01/2018")
y <- c("08.03.2018", "23.03.2016", "30.01.2018")
This time the string format is DD/MM/YYYY for x and DD.MM.YYYY for y; therefore, you need to specify the format argument explicitly.
Step 4: Convert the two examples above, specifying the format argument explicitly.
x_date <- as.Date(x, format = "%d/%m/%Y")
x_date %>% str()
## Date[1:3], format: "2018-03-08" "2016-03-23" "2018-01-30"
Similarly for y, this time calling as.Date() as part of a pipe:
y_date <- y %>%
  as.Date(format = "%d.%m.%Y") # the full stops in the format string match the literal separators
str(y_date)
## Date[1:3], format: "2018-03-08" "2016-03-23" "2018-01-30"
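To illustrate a few more of the format codes, here is a small sketch with made-up strings (the names compact, short_year and wrong are hypothetical). It also shows what happens when the format argument does not match the string: as.Date() returns NA rather than raising an error, so it is worth checking for missing values after a conversion.

```r
# %Y%m%d: four-digit year, month and day with no separators
compact <- as.Date("20180308", format = "%Y%m%d")
compact

# %y is a two-digit year, so %d-%m-%y reads day-month-year
short_year <- as.Date("08-03-18", format = "%d-%m-%y")
short_year

# A format that does not match the string gives NA, not an error
wrong <- as.Date("2018-03-08", format = "%d/%m/%Y")
is.na(wrong)
```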
You can change how dates are displayed in outputs with the format() function. Note, however, that this converts them back to character strings:
candy %<>%
  mutate(dmy_date = format(observation_date, "%d-%m-%Y"))
str(candy$dmy_date)
## chr [1:548] "01-01-1972" "01-02-1972" "01-03-1972" "01-04-1972" ...
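One practical consequence of that conversion, sketched here with three made-up dates: once formatted, the values sort alphabetically rather than chronologically and no longer support date arithmetic. For that reason it is usually safer to keep the Date column and format only at the point of output.

```r
d <- as.Date(c("2018-03-08", "2016-03-23", "2018-01-30"))
d_chr <- format(d, "%d-%m-%Y")

sort(d)       # chronological order, earliest date first
sort(d_chr)   # alphabetical order, driven by the day digits at the start

span <- max(d) - min(d)   # a time difference in days; works on the Date vector
class(d_chr)              # "character", so the same subtraction would fail
```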
When might something like this be useful? Sometimes outputs for management or clients might require dates in a different format. You can also use inline code in R Markdown to include dates in other formats in your reports:
# Note, the data relates to monthly production and the observation_date relates to
# the month, not to a specific day. Accordingly, we're dropping the day here.
reference_values <- candy %>%
  summarise(first_date = min(observation_date) %>% format("%B %Y"),
            last_date = max(observation_date) %>% format("%B %Y"),
            min_prod = min(IPG3113N) %>% round(digits = 1),
            max_prod = max(IPG3113N) %>% round(digits = 1))
max_production_date <- candy %>%
  filter(IPG3113N == max(IPG3113N)) %>% # locating when production was at its maximum
  pull(observation_date) %>% # extracting the associated date
  format("%B %Y")
If we were writing a report to the candy manufacturer, we could include something along the lines of:
"The data provided covered the monthly candy production in the United States, ranging from January 1972 until August 2017. Production values are expressed as a percentage of 2012 production and ranged from 50.7% to 139.9%. The maximum production occurred in December 2005."
Note that you'll need to knit the R Markdown document for the values of the inline code to be rendered.
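For reference, inline code in R Markdown is written as a backtick, the letter r, an R expression and a closing backtick, placed directly in the prose. The sketch below rebuilds a shortened version of the report sentence with paste0() so it can be run in the console. The two-row data frame candy_demo is hypothetical stand-in data with made-up production values, and base R is used instead of dplyr only to keep the example self-contained.

```r
# Hypothetical stand-in for the candy data; production values are made up
candy_demo <- data.frame(
  observation_date = as.Date(c("1972-01-01", "2017-08-01")),
  IPG3113N = c(85.7, 100.0)
)

# Month and year only, since the data are monthly
first_date <- format(min(candy_demo$observation_date), "%B %Y")
last_date  <- format(max(candy_demo$observation_date), "%B %Y")

# In an R Markdown document the sentence could be written in the prose as:
#   The data ranged from `r first_date` until `r last_date`.
# Built here with paste0() so the result is visible without knitting:
sentence <- paste0("The data ranged from ", first_date,
                   " until ", last_date, ".")
sentence
```

Month names produced by %B depend on the session locale, so the exact wording may differ between machines.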