Dates come in many different forms. At first it can be challenging to
learn how to clean them up. Luckily, the tidyverse comes with a package
called lubridate that makes it easy.
This post addresses a specific question brought up by a student: how
do we turn a date formatted like 2017-03-10T00:00:00.000Z
into a date object?
If you want more resources, take a look at the Dates and Times chapter in R for Data Science, or the lubridate cheat sheet.
library(tidyverse) # provides the %>% operator and dplyr
library(lubridate) # date handling; loaded automatically since tidyverse 2.0.0, but must be loaded explicitly with older versions
2017-03-10T00:00:00.000Z is a date-time.
As its name implies, a date-time includes both the date and the time.
You can get the current date-time with the function
lubridate::now():
now()
## [1] "2022-11-08 14:14:13 EST"
Dates include only the date. You can get the current date with the
function lubridate::today():
today()
## [1] "2022-11-08"
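The difference between the two also shows up in how R stores them. As a small sketch, checking the class of each result shows that a date-time is a POSIXct object while a date is a Date object:

```r
library(lubridate)

# A date-time is stored as POSIXct; a date is stored as Date.
class(now())
## [1] "POSIXct" "POSIXt"
class(today())
## [1] "Date"
```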
Often when you load a dataset, the dates will load in as character
strings. lubridate has a family of functions that lets you
easily parse these strings into dates and date-times. Here are two
simple examples:
- ymd(): parses dates written in year-month-day format, like 2022-11-08
- ymd_hms(): parses date-times written in year-month-day-hour-minute-second format, like 2022-11-08 13:42:53 EST

First, we use ymd_hms() to parse the string into a
date-time object.
"2017-03-10T00:00:00.000Z" %>% ymd_hms()
## [1] "2017-03-10 UTC"
And then we use ymd() to transform the date-time object
into a date object.
date_object <- "2017-03-10T00:00:00.000Z" %>%
  ymd_hms() %>%
  ymd()
date_object
## [1] "2017-03-10"
date_object %>% class()
## [1] "Date"
And that’s it. Now we can turn this into a function:
ymd_hms_string_to_ymd_date <- function(ymd_hms_string) {
  ymd_hms_string %>%
    lubridate::ymd_hms() %>%
    lubridate::ymd()
}
Let’s try it out.
"2017-03-10T00:00:00.000Z" %>% ymd_hms_string_to_ymd_date()
## [1] "2017-03-10"
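Every timestamp in this post falls exactly at midnight. ymd() is written to parse strings, so handing it a date-time relies on how that object is converted to text, which may not parse cleanly when the time is not midnight. A possible variant, sketched here under the hypothetical name ymd_hms_string_to_date, uses lubridate::as_date(), which converts a date-time to a date directly:

```r
library(dplyr)     # part of the tidyverse; provides the %>% operator
library(lubridate)

# Hypothetical variant of the helper above (the name is made up for illustration).
# as_date() drops the time component of a date-time and returns a Date.
ymd_hms_string_to_date <- function(ymd_hms_string) {
  ymd_hms_string %>%
    lubridate::ymd_hms() %>%
    lubridate::as_date()
}

# A timestamp with a non-midnight time
"2017-03-10T13:42:53.000Z" %>% ymd_hms_string_to_date()
## [1] "2017-03-10"
```

The date is taken in the time zone of the parsed date-time, which is UTC here because ymd_hms() returns UTC by default.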
When you are creating datasets, it is often helpful to create data features that you can group data by during analysis. For example, perhaps you want to group by all deals done during a specific year or month.
There is another set of functions in lubridate that
helps with this.
First, let’s start with some data.
date_time_tibble <- tibble(date_time_char = c("2017-03-10T00:00:00.000Z", "2022-03-10T00:00:00.000Z", "2015-11-05T00:00:00.000Z"))
date_time_tibble
Now we check that our function cleans the dates in our dataset.
date_time_tibble %>%
  mutate(date = ymd_hms_string_to_ymd_date(date_time_char))
Next, we extract date features, such as the year, the month, and the week of the year.
date_time_tibble %>%
  mutate(date = ymd_hms_string_to_ymd_date(date_time_char),
         year = year(date),
         month = month(date, label = TRUE),
         week_of_year = week(date))
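To connect these features back to the grouping idea mentioned earlier, here is a minimal sketch that counts the rows in each month. It rebuilds the example data and parses the strings inline with as_date(ymd_hms()) so that the block runs on its own; rows_per_month is a name made up for illustration.

```r
library(dplyr)     # part of the tidyverse; provides tibble(), mutate(), count()
library(lubridate)

# Same example data as above
date_time_tibble <- tibble(
  date_time_char = c("2017-03-10T00:00:00.000Z",
                     "2022-03-10T00:00:00.000Z",
                     "2015-11-05T00:00:00.000Z")
)

# Extract the month as a labelled feature, then count the rows per month.
# Two of the three rows fall in March and one falls in November.
rows_per_month <- date_time_tibble %>%
  mutate(date = as_date(ymd_hms(date_time_char)),
         month = month(date, label = TRUE)) %>%
  count(month)

rows_per_month
```

The same pattern works with year or week_of_year in place of month, or with group_by() followed by summarise() when you need more than a count.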
There is plenty more you can do.