Extract and manipulate functions

Lubridate

Let’s explore functions that let you get and set individual components of a date or time. You can extract individual parts of a date with the accessor functions in {lubridate}. Here are some of the most commonly used ones:

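# Accessor functions and the component each one extracts;
# check.names = FALSE keeps the space in the "Accessor Function" column name
lubridate_accessor_functions <- data.frame(
  `Accessor Function` = c("year()", "month()", "mday()", "yday()", "wday()",
                          "hour()", "minute()", "second()"),
  Extracts = c("year", "month", "day of the month", "day of the year",
               "day of the week", "hour", "minute", "second"),
  check.names = FALSE
)

lubridate_accessor_functions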

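To see what each accessor in the table returns, here is a minimal sketch applied to a single date-time. The value `x` is an arbitrary example chosen for illustration, not taken from the flights data, and it is parsed with lubridate's `ymd_hms()`, which returns a UTC date-time by default.

```r
library(lubridate)

# An arbitrary example date-time (ymd_hms() parses it as UTC)
x <- ymd_hms("2013-01-01 05:15:30")

year(x)    # 2013
month(x)   # 1
mday(x)    # 1  (day of the month)
yday(x)    # 1  (day of the year)
wday(x)    # 3  (1 January 2013 was a Tuesday; by default Sunday = 1)
hour(x)    # 5
minute(x)  # 15
second(x)  # 30
```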
Let’s explore some of these functions using the flights data from the {nycflights13} package.

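Step 1: Load the packages used in this section, then look at the flights data:

library(nycflights13)  # the flights data
library(dplyr)         # select(), mutate() and the %>% pipe
library(magrittr)      # the %<>% assignment pipe
library(lubridate)     # make_datetime(), year(), month() and other accessors

head(flights)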
## # A tibble: 6 x 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     1      517            515         2      830            819
## 2  2013     1     1      533            529         4      850            830
## 3  2013     1     1      542            540         2      923            850
## 4  2013     1     1      544            545        -1     1004           1022
## 5  2013     1     1      554            600        -6      812            837
## 6  2013     1     1      554            558        -4      740            728
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

This data frame includes 19 variables. For date manipulations, you will use only the year, month, day, hour and minute columns; hour and minute record the scheduled departure time.

Step 2: Create a data frame using only the year, month, day, hour and minute columns shown above.

flights_new <- flights %>% 
  select(year, month, day, hour, minute) 

flights_new %>% head()
## # A tibble: 6 x 5
##    year month   day  hour minute
##   <int> <int> <int> <dbl>  <dbl>
## 1  2013     1     1     5     15
## 2  2013     1     1     5     29
## 3  2013     1     1     5     40
## 4  2013     1     1     5     45
## 5  2013     1     1     6      0
## 6  2013     1     1     5     58

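Step 3: To create a date-time from these separate components, use make_date() for dates and make_datetime() for date-times. Here, the %<>% operator from {magrittr} pipes flights_new into mutate() and assigns the result back to flights_new.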

flights_new %<>% 
  mutate(departure = make_datetime(year, month, day, hour, minute))

head(flights_new)
## # A tibble: 6 x 6
##    year month   day  hour minute departure          
##   <int> <int> <int> <dbl>  <dbl> <dttm>             
## 1  2013     1     1     5     15 2013-01-01 05:15:00
## 2  2013     1     1     5     29 2013-01-01 05:29:00
## 3  2013     1     1     5     40 2013-01-01 05:40:00
## 4  2013     1     1     5     45 2013-01-01 05:45:00
## 5  2013     1     1     6      0 2013-01-01 06:00:00
## 6  2013     1     1     5     58 2013-01-01 05:58:00
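Step 3 only demonstrates make_datetime(). As a minimal sketch, with component values typed in by hand for illustration, make_date() builds a Date from year, month and day, while make_datetime() also takes hour, minute and second and returns a date-time in UTC unless you pass a different `tz`.

```r
library(lubridate)

# make_date() takes year, month and day and returns a Date
d <- make_date(2013, 1, 1)
d          # "2013-01-01"
class(d)   # "Date"

# make_datetime() adds hour, minute and second and returns a POSIXct
# date-time; its tz argument defaults to "UTC"
dt <- make_datetime(2013, 1, 1, 5, 15)
dt         # "2013-01-01 05:15:00 UTC"
class(dt)  # "POSIXct" "POSIXt"
```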

Step 4: Now, to extract the year from the flights_new$departure column, you can use the following command:

flights_new$departure %>% 
  year() %>% 
  head()
## [1] 2013 2013 2013 2013 2013 2013
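The accessors can also be used as replacement functions, which covers the "set" half of getting and setting components. A minimal sketch, using an arbitrary date-time rather than the flights data; update() is a {lubridate} function that changes several components in one call and returns a modified copy.

```r
library(lubridate)

x <- make_datetime(2013, 1, 1, 5, 15)

# Replacement form of an accessor: overwrite a single component of x
year(x) <- 2014
month(x) <- 6
x   # "2014-06-01 05:15:00 UTC"

# update() returns a modified copy with several components changed at once
y <- update(x, hours = 12, minutes = 0)
y   # "2014-06-01 12:00:00 UTC"
```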

Step 5: For month() and wday(), you can set the argument label = TRUE to return the abbreviated name of the month or day of the week as an ordered factor. Set abbr = FALSE as well to return the full name:

flights_new$departure %>% 
  month(label = TRUE, abbr = TRUE) %>% 
  head()
## [1] Jan Jan Jan Jan Jan Jan
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
flights_new$departure %>% 
  month(label = TRUE, abbr = FALSE) %>% 
  head()
## [1] January January January January January January
## 12 Levels: January < February < March < April < May < June < ... < December
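Step 5 mentions wday() but only demonstrates month(). Here is a minimal sketch with the same label and abbr arguments, plus week_start, which controls which day counts as day 1 (7, meaning Sunday, by default). The date is an arbitrary example, and the labels assume an English locale.

```r
library(lubridate)

x <- ymd("2013-01-01")  # a Tuesday

wday(x)                              # 3 (the week starts on Sunday by default)
wday(x, week_start = 1)              # 2 (the week starts on Monday)
wday(x, label = TRUE)                # Tue (ordered factor)
wday(x, label = TRUE, abbr = FALSE)  # Tuesday (ordered factor)
```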

Base R

Base R works slightly differently from {lubridate}. The base R approach is covered in "Module 6.3.1: Convert strings to dates"; here we briefly look at converting a character string to a date with the as.Date() function and at changing the date format using base R.

dates <- c("2020-01-01", "2019-06-30", "2012-04-27", "2008-08-08", "2010-11-19")
dates
## [1] "2020-01-01" "2019-06-30" "2012-04-27" "2008-08-08" "2010-11-19"
class(dates)
## [1] "character"

The as.Date() function converts character strings to objects of class Date.

as.Date(dates)
## [1] "2020-01-01" "2019-06-30" "2012-04-27" "2008-08-08" "2010-11-19"
class(as.Date(dates))
## [1] "Date"
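Base R has no year() or mday() functions, but components can be pulled out of a Date either as text with format() or as numbers through as.POSIXlt(). A minimal sketch on one of the dates above; note the offsets, because POSIXlt stores years since 1900, months as 0-11 and the day of the year from 0.

```r
d <- as.Date("2019-06-30")

# As character strings, using format codes
format(d, "%Y")   # "2019"
format(d, "%m")   # "06"
format(d, "%d")   # "30"

# As numbers, via the POSIXlt list representation
lt <- as.POSIXlt(d)
lt$year + 1900    # 2019 (years are stored since 1900)
lt$mon + 1        # 6    (months are stored as 0-11)
lt$mday           # 30
lt$yday + 1       # 181  (day of the year is stored from 0)
lt$wday           # 0    (0 = Sunday)
```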

It can also handle dates written in a different order, such as year-day-month (YYYY-DD-MM), although it needs some help:

us_dates <- c("2020-01-01", "2019-30-06", "2012-27-04", "2008-08-08", "2010-19-11")
us_dates
## [1] "2020-01-01" "2019-30-06" "2012-27-04" "2008-08-08" "2010-19-11"
class(us_dates)
## [1] "character"
as.Date(us_dates) 
## [1] "2020-01-01" NA           NA           "2008-08-08" NA
# Note the missing values: by default, as.Date() assumes ISO 8601 (YYYY-MM-DD),
# so a "month" of 30, 27 or 19 is invalid. To fix this, use the format argument:

as.Date(us_dates, format = "%Y-%d-%m")
## [1] "2020-01-01" "2019-06-30" "2012-04-27" "2008-08-08" "2010-11-19"
# Note that it has handled those year-day-month dates, and the result prints in ISO 8601 format
class(as.Date(us_dates, format = "%Y-%d-%m"))
## [1] "Date"
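The same format argument handles dates written in the US month/day/year order. A minimal sketch with hand-typed example strings: %m is the month, %d the day and %Y a four-digit year.

```r
us_style <- c("06/30/2019", "11/19/2010")

# Parse month/day/year strings; the Dates print in ISO 8601 (YYYY-MM-DD)
as.Date(us_style, format = "%m/%d/%Y")
# [1] "2019-06-30" "2010-11-19"
```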

We can also convert dates to other formats with the format() function; however, this turns the date back into a character string:

dates2 <- format(as.Date(dates), format = "%d/%m/%Y")
dates2
## [1] "01/01/2020" "30/06/2019" "27/04/2012" "08/08/2008" "19/11/2010"
class(dates2)
## [1] "character"

Because as.Date() can parse other layouts when given a matching format string, we can change from "dd/mm/YYYY" to "dd.mm.YY" quite easily. The parsing format must use %Y, because dates2 has four-digit years:

dates3 <- format(as.Date(dates2, format = "%d/%m/%Y"), format = "%d.%m.%y")
dates3
## [1] "01.01.20" "30.06.19" "27.04.12" "08.08.08" "19.11.10"
class(dates3)
## [1] "character"
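When parsing, the year code must match the input: %Y reads a four-digit year, while %y reads only two digits (00-68 become 20xx and 69-99 become 19xx). A minimal sketch with one example string shows what happens when a four-digit year is parsed with %y, and that two-digit years, such as those in "dd.mm.YY" strings, can be read back with %y.

```r
x <- "30/06/2019"

as.Date(x, format = "%d/%m/%Y")  # "2019-06-30"
# %y consumes only "20"; the trailing "19" is ignored, giving the wrong year
as.Date(x, format = "%d/%m/%y")  # "2020-06-30"

# Reading a two-digit year back: "08" is interpreted as 2008
as.Date("08.08.08", format = "%d.%m.%y")  # "2008-08-08"
```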

The base R approach to dates will be addressed more fully in Module 6.3.1.