Introduction

Greetings! These are the Chapter 1 and Chapter 2 course notes for the DataCamp course “Working with Dates and Times in R”. The instructor for this course is Charlotte Wickham.

1 Chapter 1 - Introduction

1.1 Loading additional libraries

library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

1.2 Recognizing ISO 8601 Dates

Question: Which of the following is the correct way to specify the 4th of October 2004 according to ISO 8601?

Answer: 2004-10-04
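
As a small illustration outside the course exercise, base R's format() can write any Date in the ISO 8601 year-month-day layout. The input string and its day/month/year format below are assumptions chosen for this sketch.

```r
# Parse a date written day/month/year (assumed input format for this sketch)
d <- as.Date("04/10/2004", format = "%d/%m/%Y")

# Write it back out in ISO 8601 order: year, month, day
iso <- format(d, "%Y-%m-%d")
iso
```

Because the largest unit comes first, ISO 8601 strings sort chronologically even when they are compared as plain text.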

1.3 Specifying Dates

NOTE: The string “2013-04-03” is stored in a variable called x

Instructions:

  • Use str() to look at the structure of x and confirm it is just a character string

  • Convert x to a date using as.Date()

  • Use str() to look at the structure of x_date and confirm it is a Date

  • Now, use as.Date() to store the date April 10, 2014

# The date R 3.0.0 was released
x <- "2013-04-03"

# Examine structure of x
str(x)
##  chr "2013-04-03"
# Use as.Date() to interpret x as a date
x_date <- as.Date(x)

# Examine structure of x_date
str(x_date)
##  Date[1:1], format: "2013-04-03"
# Store April 10 2014 as a Date
april_10_2014 <- as.Date("2014-04-10")

1.4 Automatic Import

Instructions:

  • Use read_csv() to read in the CSV file rversions.csv as releases

  • Use str() to examine the structure of the date column

  • The anytime package has been loaded and an object called sep_10_2009 has been created. Use the anytime() function to parse sep_10_2009

# Load the readr package
library(readr)

# Use read_csv() to import rversions.csv
releases <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/rversions.csv")
## Parsed with column specification:
## cols(
##   major = col_integer(),
##   minor = col_integer(),
##   patch = col_integer(),
##   date = col_date(format = ""),
##   datetime = col_datetime(format = ""),
##   time = col_time(format = ""),
##   type = col_character()
## )
# Examine the structure of the date column
str(releases$date)
##  Date[1:105], format: "1997-12-04" "1997-12-21" "1998-01-10" "1998-03-14" "1998-05-02" ...
# Load the anytime package
library(anytime)

# Various ways of writing Sep 10 2009
sep_10_2009 <- c("September 10 2009", "2009-09-10", "10 Sep 2009", "09-10-2009")

# Use anytime() to parse sep_10_2009
anytime(sep_10_2009)
## [1] "2009-09-10 CST" "2009-09-10 CST" "2009-09-10 CST" "2009-09-10 CST"

1.5 Why use Dates?

  • Once an object is stored as a Date, you can do arithmetic on it and pass it directly to plotting packages

  • Dates act like numbers
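
The following is a minimal base R sketch of what "dates act like numbers" means in practice. The variable names are made up for illustration; the two dates are the ones used in section 1.3.

```r
d1 <- as.Date("2013-04-03")
d2 <- as.Date("2014-04-10")

# Internally a Date is the number of days since 1970-01-01
days_since_epoch <- as.numeric(as.Date("1970-01-10"))

# Subtraction gives a difftime; comparison works as it does for numbers
gap <- d2 - d1
is_later <- d2 > d1

# Adding a number moves the date forward by that many days
next_week <- d1 + 7
```

The same properties are what let plotting packages place dates on a continuous axis.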

1.6 Plotting

Instructions:

  • Make a plot of releases over time by setting the x argument of the aes() function to the date column

  • Zoom in to the period from 2010 to 2014 by specifying limits from “2010-01-01” to “2014-01-01”. (Note: These strings need to be wrapped in as.Date() to be interpreted as Date objects)

  • Adjust the axis labeling by specifying date_breaks of “10 years” and date_labels of “%Y”

library(ggplot2)

# Set the x axis to the date column
ggplot(releases, aes(x = date, y = type)) + geom_line(aes(group = 1, color = factor(major)))

# Limit the axis to between 2010-01-01 and 2014-01-01
ggplot(releases, aes(x = date, y = type)) + geom_line(aes(group = 1, color = factor(major))) + xlim(as.Date("2010-01-01"), as.Date("2014-01-01"))
## Warning: Removed 87 rows containing missing values (geom_path).

# Specify breaks every ten years and labels with "%Y"
ggplot(releases, aes(x = date, y = type)) + geom_line(aes(group = 1, color = factor(major)))  + scale_x_date(date_breaks = "10 years", date_labels = "%Y")

1.7 Arithmetic and logical operators

Instructions:

  • Find the date of the most recent release by calling max() on the date column in releases

  • Find the rows in releases that have the most recent date, by specifying the comparison date == last_release_date in filter()

  • Print last_release to see which release this was

  • Calculate how long it has been since the most recent release by subtracting last_release_date from Sys.Date()

# Find the largest date
last_release_date <- max(releases$date)

# Filter row for last release
last_release <- filter(releases, date == last_release_date)

# Print last_release
last_release
## # A tibble: 1 x 7
##   major minor patch date       datetime            time   type 
##   <int> <int> <int> <date>     <dttm>              <time> <chr>
## 1     3     4     1 2017-06-30 2017-06-30 07:04:11 07:04  patch
# How long since last release?
Sys.Date() - last_release_date
## Time difference of 451 days

1.8 What about times?

  • ISO 8601 also covers times, written in the format HH:MM:SS.

  • There are two object types for datetimes in R:
      • POSIXlt - a list with named components
      • POSIXct - the number of seconds since 1970-01-01

  • POSIXct is the type to use in a data frame.

  • as.POSIXct() turns a string into a POSIXct.

x <- as.POSIXct("1970-01-01 00:01:00")
str(x)
##  POSIXct[1:1], format: "1970-01-01 00:01:00"
  • ISO 8601 also handles time zones.
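
To make the difference between the two classes concrete, here is a short base R sketch. It sets tz = "UTC" explicitly (an assumption made for this example) so that the stored number does not depend on the local time zone.

```r
# Same datetime stored in both classes, pinned to UTC
ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
lt <- as.POSIXlt("1970-01-01 00:01:00", tz = "UTC")

# POSIXct: a single number of seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(ct)

# POSIXlt: a list whose components can be pulled out by name
mins <- lt$min
yr <- lt$year + 1900  # the year component is stored as years since 1900
```

One minute past the epoch is stored as 60 in the POSIXct version, while the POSIXlt version keeps the minute as its own component.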

1.9 Getting Datetimes into R

  • Use as.POSIXct() and an appropriate string to input the datetime corresponding to Oct 1st 2010 at 12:12:00.

  • Enter the same datetime again, but now specify the timezone as “America/Los_Angeles”.

  • Use read_csv() to read in rversions.csv again.

  • Examine the structure of the datetime column to verify read_csv() has correctly interpreted it as a datetime.

# Use as.POSIXct to enter the datetime 
as.POSIXct("2010-10-01 12:12:00")
## [1] "2010-10-01 12:12:00 CST"
# Use as.POSIXct again but set the timezone to `"America/Los_Angeles"`
as.POSIXct("2010-10-01 12:12:00", tz = "America/Los_Angeles")
## [1] "2010-10-01 12:12:00 PDT"
# Use readr to import rversions.csv
library(readr)
releases <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/rversions.csv")
## Parsed with column specification:
## cols(
##   major = col_integer(),
##   minor = col_integer(),
##   patch = col_integer(),
##   date = col_date(format = ""),
##   datetime = col_datetime(format = ""),
##   time = col_time(format = ""),
##   type = col_character()
## )
# Examine structure of datetime column
str(releases$datetime)
##  POSIXct[1:105], format: "1997-12-04 08:47:58" "1997-12-21 13:09:22" ...
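
The tz argument changes which instant a string refers to, not just how it prints. The sketch below compares the same clock reading in UTC and in “America/Los_Angeles”; the variable names are illustrative only.

```r
utc <- as.POSIXct("2010-10-01 12:12:00", tz = "UTC")
la  <- as.POSIXct("2010-10-01 12:12:00", tz = "America/Los_Angeles")

# Same clock reading, different instants: Los Angeles was 7 hours behind UTC on this date
offset <- difftime(la, utc, units = "hours")

# Formatting the Los Angeles time in UTC shows the same instant with a different clock reading
la_in_utc <- format(la, "%Y-%m-%d %H:%M:%S", tz = "UTC")
la_in_utc
```

This is why a datetime entered without tz is interpreted in the local time zone of the machine, as the first output in this section shows.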

1.10 Datetimes behave nicely too

Instructions:

  • Use read_csv() to import cran-logs_2015-04-17.csv

  • Print logs to see the information we have on each download

  • Store the R 3.2.0 release time as a POSIXct object

  • Find out when the first request for 3.2.0 was made by filtering for values in the datetime column that are greater than release_time

  • Finally, see how downloads increase by creating histograms of download times for 3.2.0 and the previous version 3.1.3.

# Import "cran-logs_2015-04-17.csv" with read_csv()
logs <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/cran-logs_2015-04-17.csv")
## Parsed with column specification:
## cols(
##   datetime = col_datetime(format = ""),
##   r_version = col_character(),
##   country = col_character()
## )
# Print logs
print(logs)
## # A tibble: 100,000 x 3
##    datetime            r_version country
##    <dttm>              <chr>     <chr>  
##  1 2015-04-16 22:40:19 3.1.3     CO     
##  2 2015-04-16 09:11:04 3.1.3     GB     
##  3 2015-04-16 17:12:37 3.1.3     DE     
##  4 2015-04-18 12:34:43 3.2.0     GB     
##  5 2015-04-16 04:49:18 3.1.3     PE     
##  6 2015-04-16 06:40:44 3.1.3     TW     
##  7 2015-04-16 00:21:36 3.1.3     US     
##  8 2015-04-16 10:27:23 3.1.3     US     
##  9 2015-04-16 01:59:43 3.1.3     SG     
## 10 2015-04-18 15:41:32 3.2.0     CA     
## # ... with 99,990 more rows
# Store the release time as a POSIXct object
release_time <- as.POSIXct("2015-04-16 07:13:33", tz = "UTC")

# When is the first download of 3.2.0?
logs %>% 
  filter(datetime > release_time,
    r_version == "3.2.0")
## # A tibble: 35,826 x 3
##    datetime            r_version country
##    <dttm>              <chr>     <chr>  
##  1 2015-04-18 12:34:43 3.2.0     GB     
##  2 2015-04-18 15:41:32 3.2.0     CA     
##  3 2015-04-18 14:58:41 3.2.0     IE     
##  4 2015-04-18 16:44:45 3.2.0     US     
##  5 2015-04-18 04:34:35 3.2.0     US     
##  6 2015-04-18 22:29:45 3.2.0     CH     
##  7 2015-04-17 16:21:06 3.2.0     US     
##  8 2015-04-18 20:34:57 3.2.0     AT     
##  9 2015-04-17 18:23:19 3.2.0     US     
## 10 2015-04-18 03:00:31 3.2.0     US     
## # ... with 35,816 more rows
# Examine histograms of downloads by version
ggplot(logs, aes(x = datetime)) +
  geom_histogram() +
  geom_vline(aes(xintercept = as.numeric(release_time)))+
  facet_wrap(~ r_version, ncol = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

1.11 Why lubridate?

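  • Designed to make working with dates and times easier

  • It is part of the tidyverse, although with the tidyverse 1.2.1 used here it still has to be loaded separately with library(lubridate)
  • Plays nicely with other tidyverse packages

  • Its functions behave consistently regardless of the underlying object type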
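
A quick sketch of the "consistent behavior" point: the same lubridate accessors can be called on a Date, a POSIXct and a POSIXlt. The example values are made up for illustration.

```r
library(lubridate)

d  <- as.Date("2013-02-23")
ct <- as.POSIXct("2013-02-23 10:30:00", tz = "UTC")
lt <- as.POSIXlt("2013-02-23 10:30:00", tz = "UTC")

# The same accessor works on all three classes
years  <- c(year(d), year(ct), year(lt))
months <- c(month(d), month(ct), month(lt))
years
months
```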

2 Chapter 2 - Parsing Dates with “Lubridate”

# Let us load the library of lubridate
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

2.1 Selecting the right parsing function

Instructions:

For each date, the ISO 8601 format is displayed as a comment after it, to help you check your work:

  • Choose the correct function to parse x

  • Choose the correct function to parse y

  • Choose the correct function to parse z

# Parse x 
x <- "2010 September 20th" # 2010-09-20
ymd(x)
## [1] "2010-09-20"
# Parse y 
y <- "02.01.2010"  # 2010-01-02
dmy(y)
## [1] "2010-01-02"
# Parse z 
z <- "Sep, 12th 2010 14:00"  # 2010-09-12T14:00
mdy_hm(z)
## [1] "2010-09-12 14:00:00 UTC"
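
The function name simply spells out the order of the components in the string. The sketch below, using made-up inputs, writes one calendar day three ways and also shows what happens when the wrong order is chosen for an ambiguous string.

```r
library(lubridate)

# The same calendar day written in three different component orders
d_ymd <- ymd("2010-01-02")
d_dmy <- dmy("02.01.2010")
d_mdy <- mdy("01/02/2010")
same_day <- d_ymd == d_dmy && d_dmy == d_mdy

# When both readings are valid, the wrong function silently gives a different date
wrong <- mdy("02.01.2010")
wrong
```

No error is raised in the last call, so checking a few parsed values against the raw strings is a worthwhile habit.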

2.2 Specifying an order with ‘parse_date_time()’

  • x is a trickier datetime. Use the clues in the instructions to parse x.

  • two_orders has two different orders, parse both by specifying the order to be c(“mdy”, “dmy”).

  • Parse short_dates with orders = c(“dOmY”, “OmY”, “Y”). What happens to the dates that don’t have months or days specified?

# Specify an order string to parse x
x <- "Monday June 1st 2010 at 4pm"
parse_date_time(x, orders = "amdyIp")
## [1] "2010-06-01 16:00:00 UTC"
# Specify order to include both "mdy" and "dmy"
two_orders <- c("October 7, 2001", "October 13, 2002", "April 13, 2003", 
  "17 April 2005", "23 April 2017")
parse_date_time(two_orders, orders = c("mdy", "dmy"))
## [1] "2001-10-07 UTC" "2002-10-13 UTC" "2003-04-13 UTC" "2005-04-17 UTC"
## [5] "2017-04-23 UTC"
# Specify order to include "dOmY", "OmY" and "Y"
short_dates <- c("11 December 1282", "May 1372", "1253")
parse_date_time(short_dates, orders = c("dOmY", "OmY", "Y"))
## [1] "1282-12-11 UTC" "1372-05-01 UTC" "1253-01-01 UTC"
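
When a string cannot be matched to any of the supplied orders, parse_date_time() returns NA for that element rather than stopping. The sketch below uses an invented vector with one impossible date; quiet = TRUE is used only to suppress the parsing warning.

```r
library(lubridate)

# "31 February" is not a real date, so the second element cannot be parsed
x <- c("11 December 2010", "31 February 2010")
parsed <- parse_date_time(x, orders = "dmY", quiet = TRUE)

# TRUE marks the elements that failed to parse
failed <- is.na(parsed)
failed
```

Counting the NA values after parsing is a simple way to check that the chosen orders cover every format in the data.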

2.3 Import daily weather data

  • Import the daily data “akl_weather_daily.csv” with read_csv()

  • Print akl_daily_raw to confirm the date column hasn’t been interpreted as a date. Can you see why?

  • Using mutate(), overwrite the column date with a parsed version of date. You need to specify the parsing function. Hint: the first date should be September 1.

  • Print akl_daily to verify the date column is now a Date.

  • Take a look at the data by plotting date on the x-axis and max_temp on the y-axis.

# Import CSV with read_csv()
akl_daily_raw <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/akl_weather_daily.csv")
## Parsed with column specification:
## cols(
##   date = col_character(),
##   max_temp = col_integer(),
##   min_temp = col_integer(),
##   mean_temp = col_integer(),
##   mean_rh = col_integer(),
##   events = col_character(),
##   cloud_cover = col_integer()
## )
# Print akl_daily_raw
akl_daily_raw
## # A tibble: 3,661 x 7
##    date      max_temp min_temp mean_temp mean_rh events cloud_cover
##    <chr>        <int>    <int>     <int>   <int> <chr>        <int>
##  1 2007-9-1        60       51        56      75 <NA>             4
##  2 2007-9-2        60       53        56      82 Rain             4
##  3 2007-9-3        57       51        54      78 <NA>             6
##  4 2007-9-4        64       50        57      80 Rain             6
##  5 2007-9-5        53       48        50      90 Rain             7
##  6 2007-9-6        57       42        50      69 <NA>             1
##  7 2007-9-7        59       41        50      77 <NA>             4
##  8 2007-9-8        59       46        52      80 <NA>             5
##  9 2007-9-9        55       50        52      88 Rain             7
## 10 2007-9-10       59       50        54      82 Rain             4
## # ... with 3,651 more rows
# Parse date 
akl_daily <- akl_daily_raw %>%
  mutate(date = ymd(date))

# Print akl_daily
akl_daily
## # A tibble: 3,661 x 7
##    date       max_temp min_temp mean_temp mean_rh events cloud_cover
##    <date>        <int>    <int>     <int>   <int> <chr>        <int>
##  1 2007-09-01       60       51        56      75 <NA>             4
##  2 2007-09-02       60       53        56      82 Rain             4
##  3 2007-09-03       57       51        54      78 <NA>             6
##  4 2007-09-04       64       50        57      80 Rain             6
##  5 2007-09-05       53       48        50      90 Rain             7
##  6 2007-09-06       57       42        50      69 <NA>             1
##  7 2007-09-07       59       41        50      77 <NA>             4
##  8 2007-09-08       59       46        52      80 <NA>             5
##  9 2007-09-09       55       50        52      88 Rain             7
## 10 2007-09-10       59       50        54      82 Rain             4
## # ... with 3,651 more rows
# Plot to check work
ggplot(akl_daily, aes(x = date, y = max_temp)) +
  geom_line()
## Warning: Removed 1 rows containing missing values (geom_path).

2.4 Import hourly weather data

  • Import the hourly data, “akl_weather_hourly_2016.csv” with read_csv(), then print akl_hourly_raw to confirm the date is spread over year, month and mday.

  • Using mutate(), create the column date using make_date().

  • We’ve pasted together the date and time columns. Create datetime by parsing the datetime_string column.

  • Take a look at the date, time and datetime columns to verify they match up.

  • Take a look at the data by plotting datetime on the x-axis and temperature on the y-axis.

# Import "akl_weather_hourly_2016.csv"
akl_hourly_raw <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/akl_weather_hourly_2016.csv")
## Parsed with column specification:
## cols(
##   year = col_integer(),
##   month = col_integer(),
##   mday = col_integer(),
##   time = col_time(format = ""),
##   temperature = col_double(),
##   weather = col_character(),
##   conditions = col_character(),
##   events = col_character(),
##   humidity = col_integer(),
##   date_utc = col_datetime(format = "")
## )
# Print akl_hourly_raw
akl_hourly_raw
## # A tibble: 17,454 x 10
##     year month  mday time  temperature weather conditions events humidity
##    <int> <int> <int> <tim>       <dbl> <chr>   <chr>      <chr>     <int>
##  1  2016     1     1 00:00        68   Clear   Clear      <NA>         68
##  2  2016     1     1 00:30        68   Clear   Clear      <NA>         68
##  3  2016     1     1 01:00        68   Clear   Clear      <NA>         73
##  4  2016     1     1 01:30        68   Clear   Clear      <NA>         68
##  5  2016     1     1 02:00        68   Clear   Clear      <NA>         68
##  6  2016     1     1 02:30        68   Clear   Clear      <NA>         68
##  7  2016     1     1 03:00        68   Clear   Clear      <NA>         68
##  8  2016     1     1 03:30        68   Cloudy  Partly Cl~ <NA>         68
##  9  2016     1     1 04:00        68   Cloudy  Scattered~ <NA>         68
## 10  2016     1     1 04:30        66.2 Cloudy  Partly Cl~ <NA>         73
## # ... with 17,444 more rows, and 1 more variable: date_utc <dttm>
# Use make_date() to combine year, month and mday 
akl_hourly  <- akl_hourly_raw  %>% 
  mutate(date = make_date(year = year, month = month, day = mday))

# Parse datetime_string 
akl_hourly <- akl_hourly  %>% 
  mutate(
    datetime_string = paste(date, time, sep = "T"),
    datetime = ymd_hms(datetime_string)
  )

# Print date, time and datetime columns of akl_hourly
akl_hourly %>% select(date, time, datetime)
## # A tibble: 17,454 x 3
##    date       time   datetime           
##    <date>     <time> <dttm>             
##  1 2016-01-01 00:00  2016-01-01 00:00:00
##  2 2016-01-01 00:30  2016-01-01 00:30:00
##  3 2016-01-01 01:00  2016-01-01 01:00:00
##  4 2016-01-01 01:30  2016-01-01 01:30:00
##  5 2016-01-01 02:00  2016-01-01 02:00:00
##  6 2016-01-01 02:30  2016-01-01 02:30:00
##  7 2016-01-01 03:00  2016-01-01 03:00:00
##  8 2016-01-01 03:30  2016-01-01 03:30:00
##  9 2016-01-01 04:00  2016-01-01 04:00:00
## 10 2016-01-01 04:30  2016-01-01 04:30:00
## # ... with 17,444 more rows
# Plot to check work
ggplot(akl_hourly, aes(x = datetime, y = temperature)) +
  geom_line()
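
As an aside, and not the approach the exercise takes, lubridate also provides make_datetime(), which builds a datetime directly from numeric components without pasting a string first. The values below are hypothetical stand-ins for one row of the hourly data.

```r
library(lubridate)

# Build a datetime from its parts (UTC by default)
from_parts <- make_datetime(year = 2016, month = 1, day = 1, hour = 4, min = 30)

# The same instant obtained by parsing an ISO 8601 string
from_string <- ymd_hms("2016-01-01T04:30:00")

from_parts == from_string
```

The paste-and-parse route used in the exercise is convenient when the time is already stored as a single column, whereas make_datetime() would need separate hour and minute values.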

2.5 Extracting parts of a datetime

  • In lubridate, extracting parts of a datetime is easier

Example:

x <- ymd("2013-02-23")

year(x)
## [1] 2013
month(x)
## [1] 2
day(x)
## [1] 23
  • It is also possible to set parts of a datetime.

Example:

print(x)
## [1] "2013-02-23"
year(x) <- 2017
x
## [1] "2017-02-23"

2.6 What can you extract?

Before we begin this exercise, we must extract certain columns first.

release_time <- releases$datetime

Instructions:

  • Examine the head() of release_time to verify this is a vector of datetimes

  • Extract the month from release_time and examine the first few with head()

  • To see which months have most releases, extract the month and then pipe to table()

  • Repeat, to see which years have the most releases

  • Do releases happen in the morning (UTC)? Find out if the hour of a release is less than 12 and summarise with mean()

  • Alternatively use am() to find out how many releases happen in the morning

# Examine the head() of release_time
head(release_time)
## [1] "1997-12-04 08:47:58 UTC" "1997-12-21 13:09:22 UTC"
## [3] "1998-01-10 00:31:55 UTC" "1998-03-14 19:25:55 UTC"
## [5] "1998-05-02 07:58:17 UTC" "1998-06-14 12:56:20 UTC"
# Examine the head() of the months of release_time
head(month(release_time))
## [1] 12 12  1  3  5  6
# Extract the month of releases 
month(release_time) %>% table()
## .
##  1  2  3  4  5  6  7  8  9 10 11 12 
##  5  6  8 18  5 16  4  7  2 15  6 13
# Extract the year of releases
year(release_time) %>% table()
## .
## 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 
##    2   10    9    6    6    5    5    4    4    4    4    6    5    4    6 
## 2012 2013 2014 2015 2016 2017 
##    4    4    4    5    5    3
# How often is the hour before 12 (noon)?
mean(hour(release_time) < 12)
## [1] 0.752381
# How often is the release in am?
mean(am(release_time))
## [1] 0.752381

2.7 Adding useful labels

Instructions:

  • First, see what wday() does without labeling, by calling it on the datetime column of releases and tabulating the result. Do you know if 1 is Sunday or Monday?

  • Repeat above, but now use labels by specifying the label argument. Better, right?

  • Now store the labelled weekdays in a new column called wday

  • Create a barchart of releases by weekday, faceted by the type of release.

# Use wday() to tabulate release by day of the week
wday(releases$datetime) %>% table()
## .
##  1  2  3  4  5  6  7 
##  3 29  9 12 18 31  3
# Add label = TRUE to make table more readable
wday(releases$datetime, label = TRUE) %>% table()
## .
## Sun Mon Tue Wed Thu Fri Sat 
##   3  29   9  12  18  31   3
# Create column wday to hold week days
releases$wday <- wday(releases$datetime, label = TRUE)

# Plot barchart of weekday by type of release
ggplot(releases, aes(wday)) +
  geom_bar() +
  facet_wrap(~ type, ncol = 1, scales = "free_y")
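
On the question of whether 1 means Sunday or Monday: the numbering depends on the week_start argument of wday(). The sketch below passes it explicitly so the result does not rely on any option set in the session; the date is the last release date from section 1.7, which fell on a Friday.

```r
library(lubridate)

d <- ymd("2017-06-30")  # a Friday

# Weeks starting on Sunday: Sunday = 1, so Friday = 6
sunday_first <- wday(d, week_start = 7)

# Weeks starting on Monday: Monday = 1, so Friday = 5
monday_first <- wday(d, week_start = 1)

c(sunday_first, monday_first)
```

Using label = TRUE, as in the exercise, sidesteps the ambiguity when the result is only meant to be read by people.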

2.8 Extraction for plotting

Before we begin this exercise, let us load in ggridges

library(ggridges)
## 
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
## 
##     scale_discrete_manual

Instructions:

  • Use mutate() to create three new columns: year, yday and month that respectively hold the same components of the date column. Don’t forget to label the months with their names.

  • Create a plot of yday on the x-axis, max_temp on the y-axis where lines are grouped by year. Each year is a line on this plot, with the x-axis running from Jan 1 to Dec 31.

  • To take an alternate look, create a ridgeline plot (formerly known as a joyplot) with max_temp on the x-axis, month on the y-axis, using geom_density_ridges() from the ggridges package.

# Add columns for year, yday and month
akl_daily <- akl_daily %>%
  mutate(
    year = year(date),
    yday = yday(date),
    month = month(date, label = TRUE))

# Plot max_temp by yday for all years
ggplot(akl_daily, aes(x = yday, y = max_temp)) +
  geom_line(aes(group = year), alpha = 0.5)
## Warning: Removed 1 rows containing missing values (geom_path).

# Examine distribution of max_temp by month
ggplot(akl_daily, aes(x = max_temp, y = month, height = ..density..)) +
  geom_density_ridges(stat = "density")
## Warning: Removed 10 rows containing non-finite values (stat_density).

2.9 Extracting for filtering and summarizing

Instructions:

  • Create new columns for the hour and month of the observation from datetime. Make sure you label the month

  • Filter to just daytime observations, where the hour is greater than or equal to 8 and less than or equal to 22.

  • Group the observations first by month, then by date, and summarise by using any() on the rainy column

  • Summarise again by summing up any_rain

# Create new columns hour, month and rainy
akl_hourly <- akl_hourly %>%
  mutate(
    hour = hour(datetime),
    month = month(datetime, label = TRUE),
    rainy = weather == "Precipitation"
  )

# Filter for hours between 8am and 10pm (inclusive)
akl_day <- akl_hourly %>% 
  filter(hour >= 8, hour <= 22)

# Summarise for each date if there is any rain
rainy_days <- akl_day %>% 
  group_by(month, date) %>%
  summarise(
    any_rain = any(rainy)
  )

# Summarise for each month, the number of days with rain
rainy_days %>% 
  summarise(
    days_rainy = sum(any_rain)
  )
## # A tibble: 12 x 2
##    month days_rainy
##    <ord>      <int>
##  1 Jan           15
##  2 Feb           13
##  3 Mar           12
##  4 Apr           15
##  5 May           21
##  6 Jun           19
##  7 Jul           22
##  8 Aug           16
##  9 Sep           25
## 10 Oct           20
## 11 Nov           19
## 12 Dec           11

2.10 Rounding Datetimes

  • Rounding a datetime returns another object of the same type

Example:

release_time <- releases$datetime
head(release_time)
## [1] "1997-12-04 08:47:58 UTC" "1997-12-21 13:09:22 UTC"
## [3] "1998-01-10 00:31:55 UTC" "1998-03-14 19:25:55 UTC"
## [5] "1998-05-02 07:58:17 UTC" "1998-06-14 12:56:20 UTC"
head(release_time) %>% hour()
## [1]  8 13  0 19  7 12

If we use floor_date() instead of hour(), the result is still a datetime:

head(release_time) %>% floor_date(unit = "hour")
## [1] "1997-12-04 08:00:00 UTC" "1997-12-21 13:00:00 UTC"
## [3] "1998-01-10 00:00:00 UTC" "1998-03-14 19:00:00 UTC"
## [5] "1998-05-02 07:00:00 UTC" "1998-06-14 12:00:00 UTC"

In lubridate, there are three functions that do the rounding:

  • round_date() - rounds to the nearest unit
head(release_time) %>% round_date(unit = "hour")
## [1] "1997-12-04 09:00:00 UTC" "1997-12-21 13:00:00 UTC"
## [3] "1998-01-10 01:00:00 UTC" "1998-03-14 19:00:00 UTC"
## [5] "1998-05-02 08:00:00 UTC" "1998-06-14 13:00:00 UTC"
  • ceiling_date() - rounds up
head(release_time) %>% ceiling_date(unit = "hour")
## [1] "1997-12-04 09:00:00 UTC" "1997-12-21 14:00:00 UTC"
## [3] "1998-01-10 01:00:00 UTC" "1998-03-14 20:00:00 UTC"
## [5] "1998-05-02 08:00:00 UTC" "1998-06-14 13:00:00 UTC"
  • floor_date() - rounds down
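
The unit argument also accepts multiples of a unit. As a minimal sketch, the three functions are applied below to one made-up datetime with a unit of "15 minutes", so their results can be compared side by side.

```r
library(lubridate)

x <- ymd_hms("2016-05-03 07:13:28")

# Down to the previous quarter hour
down <- floor_date(x, unit = "15 minutes")

# To the closest quarter hour (07:13:28 is nearer to 07:15 than to 07:00)
nearest <- round_date(x, unit = "15 minutes")

# Up to the next quarter hour
up <- ceiling_date(x, unit = "15 minutes")

c(down, nearest, up)
```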

2.11 Practice Rounding

  • Choose the right function and units to round r_3_4_1 down to the nearest day

  • Choose the right function and units to round r_3_4_1 to the nearest 5 minutes

  • Choose the right function and units to round r_3_4_1 up to the nearest week

  • Find the time elapsed on the day of release at the time of release by subtracting r_3_4_1 rounded down to the day from r_3_4_1

r_3_4_1 <- ymd_hms("2016-05-03 07:13:28 UTC")

# Round down to day
floor_date(r_3_4_1, unit = "day")
## [1] "2016-05-03 UTC"
# Round to nearest 5 minutes
round_date(r_3_4_1, unit = "5 minutes")
## [1] "2016-05-03 07:15:00 UTC"
# Round up to week
ceiling_date(r_3_4_1, unit = "week")
## [1] "2016-05-08 UTC"
# Subtract r_3_4_1 rounded down to day
r_3_4_1 - floor_date(r_3_4_1, unit = "day")
## Time difference of 7.224444 hours
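
Subtracting two datetimes lets R choose the units of the result (hours above). If specific units are needed, base R's difftime() accepts a units argument. The sketch below redoes the same subtraction with explicit units, using base R only; the variable names are illustrative.

```r
release  <- as.POSIXct("2016-05-03 07:13:28", tz = "UTC")
midnight <- as.POSIXct("2016-05-03 00:00:00", tz = "UTC")

# Ask for the difference in seconds instead of letting R choose the units
elapsed_secs <- difftime(release, midnight, units = "secs")

# Convert the same difference to a plain number of minutes
elapsed_mins <- as.numeric(elapsed_secs, units = "mins")

elapsed_secs
elapsed_mins
```

Fixing the units matters when the result feeds into further calculations, because the automatically chosen units can differ from one pair of datetimes to the next.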

2.12 Rounding with the weather data

Instructions:

  • Create a new column called day_hour that is datetime rounded down to the nearest hour

  • Use count() on day_hour to count how many observations there are in each hour

  • Extend the pipeline so that after counting, you filter for observations where n is not equal to 2

# Create day_hour, datetime rounded down to hour
akl_hourly <- akl_hourly %>%
  mutate(
    day_hour = floor_date(datetime, unit = "hour")
  )

# Count observations per hour  
akl_hourly %>% 
  count(day_hour) 
## # A tibble: 8,770 x 2
##    day_hour                n
##    <dttm>              <int>
##  1 2016-01-01 00:00:00     2
##  2 2016-01-01 01:00:00     2
##  3 2016-01-01 02:00:00     2
##  4 2016-01-01 03:00:00     2
##  5 2016-01-01 04:00:00     2
##  6 2016-01-01 05:00:00     2
##  7 2016-01-01 06:00:00     2
##  8 2016-01-01 07:00:00     2
##  9 2016-01-01 08:00:00     2
## 10 2016-01-01 09:00:00     2
## # ... with 8,760 more rows
# Find day_hours with n != 2 
akl_hourly %>% 
  count(day_hour) %>%
  filter(n != 2) %>% 
  arrange(desc(n))
## # A tibble: 92 x 2
##    day_hour                n
##    <dttm>              <int>
##  1 2016-04-03 02:00:00     4
##  2 2016-09-25 00:00:00     4
##  3 2016-06-26 09:00:00     1
##  4 2016-09-01 23:00:00     1
##  5 2016-09-02 01:00:00     1
##  6 2016-09-04 11:00:00     1
##  7 2016-09-04 16:00:00     1
##  8 2016-09-04 17:00:00     1
##  9 2016-09-05 00:00:00     1
## 10 2016-09-05 15:00:00     1
## # ... with 82 more rows