library(lubridate)
Meet your awkward dates, here I have created a character vector of five date structures to demonstrate how the function works. You might notice that they are all the same date - April 20 2019 - but represented in different ways.
awkward.dates <- c("2019-20-04", "2019-04-20", "04-20-2019", "2019-04-20 00:10:00", "2019-04-20 00:10")
Lets try and convert these to a date/time format.
as.POSIXct(awkward.dates)
## Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format
as.Date(awkward.dates)
## Error in charToDate(x): character string is not in a standard unambiguous format
You will likely face similar data structures. Even if they are the same y-m-d format, some may contain times and others might not. This will cause issues with aggregation functions, particularly when using functions that require standard (unamibguous) string conventions.
The parse_date_time function from the Lubridate 0ackage allows all date formats to be converted into a standardised format.
As a minimum, parse_date_time needs the follwing aruments:
a character or numeric vector of dates (awkward.dates in our case) passed via the x argument.
a character vector of date-time formats, passed via the orders argument
lubridate::parse_date_time(x = awkward.dates,
orders = c("dmy"))
## Warning: All formats failed to parse. No formats found.
## [1] NA NA NA NA NA
Not even close - if I had checked some of my dates, I would have seen the format dmy does not occur.
Lets try again.
lubridate::parse_date_time(x = awkward.dates,
orders = c("ydm"))
## Warning: 4 failed to parse.
## [1] "2019-04-20 UTC" NA NA NA
## [5] NA
Better - adding ydm picked up the first date - note that it has also given it a time zone (UTC). You can adjust this if required using the tz arugment.
lubridate::parse_date_time(x = awkward.dates,
orders = c("ydm", "ymd", "mdy"), tz = "Australia/Sydney")
## Warning: 2 failed to parse.
## [1] "2019-04-20 AEST" "2019-04-20 AEST" "2019-04-20 AEST" NA
## [5] NA
Whilst we have now picked up Australia Eastern Standard Time (AEST), there are no times in our output yet.
lubridate::parse_date_time(x = awkward.dates,
orders = c("ydm", "ymd", "mdy", "ymd HMS"), tz = "Australia/Sydney")
## Warning: 1 failed to parse.
## [1] "2019-04-20 00:00:00 AEST" "2019-04-20 00:00:00 AEST"
## [3] "2019-04-20 00:00:00 AEST" "2019-04-20 00:10:00 AEST"
## [5] NA
There we have it - just one last awkward date to deal with and weโre there. The last date contains a time with only hours and minutes so lets pick that up.
new <- lubridate::parse_date_time(x = awkward.dates,
orders = c("ydm", "ymd", "mdy", "ymd HMS", "ymd HM"))
as.POSIXct(new)
## [1] "2019-04-20 00:00:00 UTC" "2019-04-20 00:00:00 UTC"
## [3] "2019-04-20 00:00:00 UTC" "2019-04-20 00:10:00 UTC"
## [5] "2019-04-20 00:10:00 UTC"
as.Date(new)
## [1] "2019-04-20" "2019-04-20" "2019-04-20" "2019-04-20" "2019-04-20"
Great, we now have an unambiguous date format we can work with!