Introduction
Greetings! These are my notes for Chapters 1 and 2 of the DataCamp course “Working with Dates and Times in R”. The instructor for this course is Charlotte Wickham.
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Question: Which of the following is the correct way to specify the 4th of October 2004 according to ISO 8601?
Answer: 2004-10-04
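As a quick check of this answer, here is a minimal base R sketch (the variable names are illustrative, not from the course). as.Date() reads an ISO 8601 string without any format argument, while another layout needs an explicit format string.

```r
# ISO 8601 orders the components from largest to smallest: YYYY-MM-DD
iso <- as.Date("2004-10-04")

# A day/month/year layout is only parsed correctly with an explicit format
dmy_style <- as.Date("04/10/2004", format = "%d/%m/%Y")

# Both values represent the 4th of October 2004
iso == dmy_style

# Printing a Date always uses the ISO 8601 layout
format(iso)
```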
NOTE: The string “2013-04-03” is stored in a variable called x
Instruction:
Use str() to look at the structure of x and confirm it is just a character string
Convert x to a date using as.Date()
Use str() to look at the structure of x_date and confirm it is a Date
Now, use as.Date() to store the date April 10, 2014
# The date R 3.0.0 was released
x <- "2013-04-03"
# Examine structure of x
str(x)
## chr "2013-04-03"
# Use as.Date() to interpret x as a date
x_date <- as.Date(x)
# Examine structure of x_date
str(x_date)
## Date[1:1], format: "2013-04-03"
# Store April 10 2014 as a Date
april_10_2014 <- as.Date("2014-04-10")
Instructions:
Use read_csv() to read in the CSV file rversions.csv as releases
Use str() to examine the structure of the date column
The anytime package is loaded and an object called sep_10_2009 has been created. Use the anytime() function to parse sep_10_2009
# Load the readr package
library(readr)
# Use read_csv() to import rversions.csv
releases <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/rversions.csv")
## Parsed with column specification:
## cols(
## major = col_integer(),
## minor = col_integer(),
## patch = col_integer(),
## date = col_date(format = ""),
## datetime = col_datetime(format = ""),
## time = col_time(format = ""),
## type = col_character()
## )
# Examine the structure of the date column
str(releases$date)
## Date[1:105], format: "1997-12-04" "1997-12-21" "1998-01-10" "1998-03-14" "1998-05-02" ...
# Load the anytime package
library(anytime)
# Various ways of writing Sep 10 2009
sep_10_2009 <- c("September 10 2009", "2009-09-10", "10 Sep 2009", "09-10-2009")
# Use anytime() to parse sep_10_2009
anytime(sep_10_2009)
## [1] "2009-09-10 CST" "2009-09-10 CST" "2009-09-10 CST" "2009-09-10 CST"
Once values are stored as dates, you can do arithmetic and comparisons on them and plot them with plotting packages
Dates act like numbers
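To make this concrete, here is a minimal base R sketch. The two dates are release dates quoted elsewhere in these notes (R 3.0.0 and the most recent release in the data); the variable names are illustrative.

```r
older <- as.Date("2013-04-03")
newer <- as.Date("2017-06-30")

# Comparison works as it does for numbers
newer > older

# Subtracting two dates gives a difftime measured in days
gap <- newer - older

# Adding a number shifts a date by that many days
shifted <- older + 30

# Under the hood a Date is the number of days since 1970-01-01
as.numeric(as.Date("1970-01-10"))
```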
Instructions:
Make a plot of releases over time by setting the x argument of the aes() function to the date column
Zoom in to the period from 2010 to 2014 by specifying limits from “2010-01-01” to “2014-01-01”. (Note: These strings need to be wrapped in as.Date() to be interpreted as Date objects)
Adjust the axis labeling by specifying date_breaks of “10 years” and date_labels of “%Y”
library(ggplot2)
# Set the x axis to the date column
ggplot(releases, aes(x = date, y = type)) + geom_line(aes(group = 1, color = factor(major)))
# Limit the axis to between 2010-01-01 and 2014-01-01
ggplot(releases, aes(x = date, y = type)) + geom_line(aes(group = 1, color = factor(major))) + xlim(as.Date("2010-01-01"), as.Date("2014-01-01"))
## Warning: Removed 87 rows containing missing values (geom_path).
# Specify breaks every ten years and labels with "%Y"
ggplot(releases, aes(x = date, y = type)) + geom_line(aes(group = 1, color = factor(major))) + scale_x_date(date_breaks = "10 years", date_labels = "%Y")
Instructions:
Find the date of the most recent release by calling max() on the date column in releases
Find the rows in releases that have the most recent date, by specifying the comparison date == last_release_date in filter()
Print last_release to see which release this was
Calculate how long it has been since the most recent release by subtracting last_release_date from Sys.Date()
# Find the largest date
last_release_date <- max(releases$date)
# Filter row for last release
last_release <- filter(releases, date == last_release_date)
# Print last_release
last_release
## # A tibble: 1 x 7
## major minor patch date datetime time type
## <int> <int> <int> <date> <dttm> <time> <chr>
## 1 3 4 1 2017-06-30 2017-06-30 07:04:11 07:04 patch
# How long since last release?
Sys.Date() - last_release_date
## Time difference of 451 days
ISO 8601 also specifies a format for times: HH:MM:SS.
POSIXct stores a datetime as the number of seconds since 1970-01-01.
Because each POSIXct value is a single number, a POSIXct vector fits in a data frame column.
as.POSIXct() turns a string into a POSIXct object.
x <- as.POSIXct("1970-01-01 00:01:00")
str(x)
## POSIXct[1:1], format: "1970-01-01 00:01:00"
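The "seconds since 1970-01-01" description can be checked directly. This is a minimal base R sketch; the time zone is pinned to UTC (an assumption added here) so that the result does not depend on the local time zone, and the variable names are illustrative.

```r
# One minute after the epoch, in UTC
x_utc <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")

# The underlying number is a count of seconds since 1970-01-01 00:00:00 UTC
as.numeric(x_utc)   # 60

# Going the other way: build a datetime from a count of seconds
one_hour <- as.POSIXct(3600, origin = "1970-01-01", tz = "UTC")
format(one_hour, "%H:%M:%S")
```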
Use as.POSIXct() and an appropriate string to input the datetime corresponding to Oct 1st 2010 at 12:12:00.
Enter the same datetime again, but now specify the timezone as “America/Los_Angeles”.
Use read_csv() to read in rversions.csv again.
Examine the structure of the datetime column to verify read_csv() has correctly interpreted it as a datetime.
# Use as.POSIXct to enter the datetime
as.POSIXct("2010-10-01 12:12:00")
## [1] "2010-10-01 12:12:00 CST"
# Use as.POSIXct again but set the timezone to `"America/Los_Angeles"`
as.POSIXct("2010-10-01 12:12:00", tz = "America/Los_Angeles")
## [1] "2010-10-01 12:12:00 PDT"
# Use readr to import rversions.csv
library(readr)
releases <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/rversions.csv")
## Parsed with column specification:
## cols(
## major = col_integer(),
## minor = col_integer(),
## patch = col_integer(),
## date = col_date(format = ""),
## datetime = col_datetime(format = ""),
## time = col_time(format = ""),
## type = col_character()
## )
# Examine structure of datetime column
str(releases$datetime)
## POSIXct[1:105], format: "1997-12-04 08:47:58" "1997-12-21 13:09:22" ...
Instructions:
Use read_csv() to import cran-logs_2015-04-17.csv
Print logs to see the information we have on each download
Store the R 3.2.0 release time as a POSIXct object
Find out when the first request for 3.2.0 was made by filtering for values in datetime column that are greater than release time
Finally, see how downloads increase by creating histograms of download times for 3.2.0 and the previous version 3.1.3.
# Import "cran-logs_2015-04-17.csv" with read_csv()
logs <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/cran-logs_2015-04-17.csv")
## Parsed with column specification:
## cols(
## datetime = col_datetime(format = ""),
## r_version = col_character(),
## country = col_character()
## )
# Print logs
print(logs)
## # A tibble: 100,000 x 3
## datetime r_version country
## <dttm> <chr> <chr>
## 1 2015-04-16 22:40:19 3.1.3 CO
## 2 2015-04-16 09:11:04 3.1.3 GB
## 3 2015-04-16 17:12:37 3.1.3 DE
## 4 2015-04-18 12:34:43 3.2.0 GB
## 5 2015-04-16 04:49:18 3.1.3 PE
## 6 2015-04-16 06:40:44 3.1.3 TW
## 7 2015-04-16 00:21:36 3.1.3 US
## 8 2015-04-16 10:27:23 3.1.3 US
## 9 2015-04-16 01:59:43 3.1.3 SG
## 10 2015-04-18 15:41:32 3.2.0 CA
## # ... with 99,990 more rows
# Store the release time as a POSIXct object
release_time <- as.POSIXct("2015-04-16 07:13:33", tz = "UTC")
# When is the first download of 3.2.0?
logs %>%
  filter(datetime > release_time,
    r_version == "3.2.0")
## # A tibble: 35,826 x 3
## datetime r_version country
## <dttm> <chr> <chr>
## 1 2015-04-18 12:34:43 3.2.0 GB
## 2 2015-04-18 15:41:32 3.2.0 CA
## 3 2015-04-18 14:58:41 3.2.0 IE
## 4 2015-04-18 16:44:45 3.2.0 US
## 5 2015-04-18 04:34:35 3.2.0 US
## 6 2015-04-18 22:29:45 3.2.0 CH
## 7 2015-04-17 16:21:06 3.2.0 US
## 8 2015-04-18 20:34:57 3.2.0 AT
## 9 2015-04-17 18:23:19 3.2.0 US
## 10 2015-04-18 03:00:31 3.2.0 US
## # ... with 35,816 more rows
# Examine histograms of downloads by version
ggplot(logs, aes(x = datetime)) +
  geom_histogram() +
  geom_vline(aes(xintercept = as.numeric(release_time))) +
  facet_wrap(~ r_version, ncol = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
lubridate is designed to make working with dates and times easier
It plays nicely with other tidyverse packages
Its functions behave consistently regardless of the underlying object
# Let us load the library of lubridate
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
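To illustrate the point about consistent behavior regardless of the underlying object, here is a small sketch, assuming lubridate is installed. The same day is stored as a Date, a POSIXct and a POSIXlt, and the same accessor functions are applied to each; the variable names are made up for this example.

```r
library(lubridate)

# The same day stored as three different classes
d  <- as.Date("2013-02-23")
ct <- as.POSIXct("2013-02-23 10:30:00", tz = "UTC")
lt <- as.POSIXlt(ct)

# The accessors return the same components for every class
years  <- c(year(d), year(ct), year(lt))
months <- c(month(d), month(ct), month(lt))
years
months
```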
Instructions:
For each date, the ISO 8601 format is displayed as a comment after it, to help you check your work:
Choose the correct function to parse x
Choose the correct function to parse y
Choose the correct function to parse z
# Parse x
x <- "2010 September 20th" # 2010-09-20
ymd(x)
## [1] "2010-09-20"
# Parse y
y <- "02.01.2010" # 2010-01-02
dmy(y)
## [1] "2010-01-02"
# Parse z
z <- "Sep, 12th 2010 14:00" # 2010-09-12T14:00
mdy_hm(z)
## [1] "2010-09-12 14:00:00 UTC"
x is a trickier datetime. Use the clues in the instructions to parse x.
two_orders has two different orders, parse both by specifying the order to be c(“mdy”, “dmy”).
Parse short_dates with orders = c(“dOmY”, “OmY”, “Y”). What happens to the dates that don’t have months or days specified?
# Specify an order string to parse x
x <- "Monday June 1st 2010 at 4pm"
parse_date_time(x, orders = "amdyIp")
## [1] "2010-06-01 16:00:00 UTC"
# Specify order to include both "mdy" and "dmy"
two_orders <- c("October 7, 2001", "October 13, 2002", "April 13, 2003",
  "17 April 2005", "23 April 2017")
parse_date_time(two_orders, orders = c("mdy", "dmy"))
## [1] "2001-10-07 UTC" "2002-10-13 UTC" "2003-04-13 UTC" "2005-04-17 UTC"
## [5] "2017-04-23 UTC"
# Specify order to include "dOmY", "OmY" and "Y"
short_dates <- c("11 December 1282", "May 1372", "1253")
parse_date_time(short_dates, orders = c("dOmY", "OmY", "Y"))
## [1] "1282-12-11 UTC" "1372-05-01 UTC" "1253-01-01 UTC"
Import the daily data “akl_weather_daily.csv” with read_csv()
Print akl_daily_raw to confirm the date column hasn’t been interpreted as a date. Can you see why?
Using mutate() overwrite the column date with a parsed version of date. You need to specify the parsing function. Hint: the first date should be September 1.
Print akl_daily to verify the date column is now a Date.
Take a look at the data by plotting date on the x-axis and max_temp on the y-axis.
# Import CSV with read_csv()
akl_daily_raw <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/akl_weather_daily.csv")
## Parsed with column specification:
## cols(
## date = col_character(),
## max_temp = col_integer(),
## min_temp = col_integer(),
## mean_temp = col_integer(),
## mean_rh = col_integer(),
## events = col_character(),
## cloud_cover = col_integer()
## )
# Print akl_daily_raw
akl_daily_raw
## # A tibble: 3,661 x 7
## date max_temp min_temp mean_temp mean_rh events cloud_cover
## <chr> <int> <int> <int> <int> <chr> <int>
## 1 2007-9-1 60 51 56 75 <NA> 4
## 2 2007-9-2 60 53 56 82 Rain 4
## 3 2007-9-3 57 51 54 78 <NA> 6
## 4 2007-9-4 64 50 57 80 Rain 6
## 5 2007-9-5 53 48 50 90 Rain 7
## 6 2007-9-6 57 42 50 69 <NA> 1
## 7 2007-9-7 59 41 50 77 <NA> 4
## 8 2007-9-8 59 46 52 80 <NA> 5
## 9 2007-9-9 55 50 52 88 Rain 7
## 10 2007-9-10 59 50 54 82 Rain 4
## # ... with 3,651 more rows
# Parse date
akl_daily <- akl_daily_raw %>%
  mutate(date = ymd(date))
# Print akl_daily
akl_daily
## # A tibble: 3,661 x 7
## date max_temp min_temp mean_temp mean_rh events cloud_cover
## <date> <int> <int> <int> <int> <chr> <int>
## 1 2007-09-01 60 51 56 75 <NA> 4
## 2 2007-09-02 60 53 56 82 Rain 4
## 3 2007-09-03 57 51 54 78 <NA> 6
## 4 2007-09-04 64 50 57 80 Rain 6
## 5 2007-09-05 53 48 50 90 Rain 7
## 6 2007-09-06 57 42 50 69 <NA> 1
## 7 2007-09-07 59 41 50 77 <NA> 4
## 8 2007-09-08 59 46 52 80 <NA> 5
## 9 2007-09-09 55 50 52 88 Rain 7
## 10 2007-09-10 59 50 54 82 Rain 4
## # ... with 3,651 more rows
# Plot to check work
ggplot(akl_daily, aes(x = date, y = max_temp)) +
  geom_line()
## Warning: Removed 1 rows containing missing values (geom_path).
Import the hourly data, “akl_weather_hourly_2016.csv” with read_csv(), then print akl_hourly_raw to confirm the date is spread over year, month and mday.
Using mutate(), create the column date using make_date().
We’ve pasted together the date and time columns. Create datetime by parsing the datetime_string column.
Take a look at the date, time and datetime columns to verify they match up.
Take a look at the data by plotting datetime on the x-axis and temperature on the y-axis.
# Import "akl_weather_hourly_2016.csv"
akl_hourly_raw <- read_csv("https://assets.datacamp.com/production/course_5348/datasets/akl_weather_hourly_2016.csv")
## Parsed with column specification:
## cols(
## year = col_integer(),
## month = col_integer(),
## mday = col_integer(),
## time = col_time(format = ""),
## temperature = col_double(),
## weather = col_character(),
## conditions = col_character(),
## events = col_character(),
## humidity = col_integer(),
## date_utc = col_datetime(format = "")
## )
# Print akl_hourly_raw
akl_hourly_raw
## # A tibble: 17,454 x 10
## year month mday time temperature weather conditions events humidity
## <int> <int> <int> <tim> <dbl> <chr> <chr> <chr> <int>
## 1 2016 1 1 00:00 68 Clear Clear <NA> 68
## 2 2016 1 1 00:30 68 Clear Clear <NA> 68
## 3 2016 1 1 01:00 68 Clear Clear <NA> 73
## 4 2016 1 1 01:30 68 Clear Clear <NA> 68
## 5 2016 1 1 02:00 68 Clear Clear <NA> 68
## 6 2016 1 1 02:30 68 Clear Clear <NA> 68
## 7 2016 1 1 03:00 68 Clear Clear <NA> 68
## 8 2016 1 1 03:30 68 Cloudy Partly Cl~ <NA> 68
## 9 2016 1 1 04:00 68 Cloudy Scattered~ <NA> 68
## 10 2016 1 1 04:30 66.2 Cloudy Partly Cl~ <NA> 73
## # ... with 17,444 more rows, and 1 more variable: date_utc <dttm>
# Use make_date() to combine year, month and mday
akl_hourly <- akl_hourly_raw %>%
  mutate(date = make_date(year = year, month = month, day = mday))
# Parse datetime_string
akl_hourly <- akl_hourly %>%
  mutate(
    datetime_string = paste(date, time, sep = "T"),
    datetime = ymd_hms(datetime_string)
  )
# Print date, time and datetime columns of akl_hourly
akl_hourly %>% select(date, time, datetime)
## # A tibble: 17,454 x 3
## date time datetime
## <date> <time> <dttm>
## 1 2016-01-01 00:00 2016-01-01 00:00:00
## 2 2016-01-01 00:30 2016-01-01 00:30:00
## 3 2016-01-01 01:00 2016-01-01 01:00:00
## 4 2016-01-01 01:30 2016-01-01 01:30:00
## 5 2016-01-01 02:00 2016-01-01 02:00:00
## 6 2016-01-01 02:30 2016-01-01 02:30:00
## 7 2016-01-01 03:00 2016-01-01 03:00:00
## 8 2016-01-01 03:30 2016-01-01 03:30:00
## 9 2016-01-01 04:00 2016-01-01 04:00:00
## 10 2016-01-01 04:30 2016-01-01 04:30:00
## # ... with 17,444 more rows
# Plot to check work
ggplot(akl_hourly, aes(x = datetime, y = temperature)) +
  geom_line()
Example:
x <- ymd("2013-02-23")
year(x)
## [1] 2013
month(x)
## [1] 2
day(x)
## [1] 23
print(x)
## [1] "2013-02-23"
year(x) <- 2017
x
## [1] "2017-02-23"
Before we begin this exercise, we must extract the datetime column first.
release_time <- releases$datetime
Instructions:
Examine the head() of release_time to verify this is a vector of datetimes
Extract the month from release_time and examine the first few with head()
To see which months have most releases, extract the month and then pipe to table()
Repeat, to see which years have the most releases
Do releases happen in the morning (UTC)? Find out if the hour of a release is less than 12 and summarise with mean()
Alternatively use am() to find out how many releases happen in the morning
# Examine the head() of release_time
head(release_time)
## [1] "1997-12-04 08:47:58 UTC" "1997-12-21 13:09:22 UTC"
## [3] "1998-01-10 00:31:55 UTC" "1998-03-14 19:25:55 UTC"
## [5] "1998-05-02 07:58:17 UTC" "1998-06-14 12:56:20 UTC"
# Examine the head() of the months of release_time
head(month(release_time))
## [1] 12 12 1 3 5 6
# Extract the month of releases
month(release_time) %>% table()
## .
## 1 2 3 4 5 6 7 8 9 10 11 12
## 5 6 8 18 5 16 4 7 2 15 6 13
# Extract the year of releases
year(release_time) %>% table()
## .
## 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
## 2 10 9 6 6 5 5 4 4 4 4 6 5 4 6
## 2012 2013 2014 2015 2016 2017
## 4 4 4 5 5 3
# How often is the hour before 12 (noon)?
mean(hour(release_time) < 12)
## [1] 0.752381
# How often is the release in am?
mean(am(release_time))
## [1] 0.752381
Instructions:
First, see what wday() does without labeling, by calling it on the datetime column of releases and tabulating the result. Do you know if 1 is Sunday or Monday?
Repeat above, but now use labels by specifying the label argument. Better, right?
Now store the labelled weekdays in a new column called wday
Create a bar chart of releases by weekday, faceted by the type of release.
# Use wday() to tabulate release by day of the week
wday(releases$datetime) %>% table()
## .
## 1 2 3 4 5 6 7
## 3 29 9 12 18 31 3
# Add label = TRUE to make table more readable
wday(releases$datetime, label = TRUE) %>% table()
## .
## Sun Mon Tue Wed Thu Fri Sat
## 3 29 9 12 18 31 3
# Create column wday to hold week days
releases$wday <- wday(releases$datetime, label = TRUE)
# Plot barchart of weekday by type of release
ggplot(releases, aes(wday)) +
  geom_bar() +
  facet_wrap(~ type, ncol = 1, scales = "free_y")
Before we begin this exercise, let us load ggridges
library(ggridges)
##
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
##
## scale_discrete_manual
Instructions:
Use mutate() to create three new columns: year, yday and month that respectively hold the same components of the date column. Don’t forget to label the months with their names.
Create a plot of yday on the x-axis, max_temp on the y-axis where lines are grouped by year. Each year is a line on this plot, with the x-axis running from Jan 1 to Dec 31.
To take an alternate look, create a ridgeline plot (formerly known as a joyplot) with max_temp on the x-axis, month on the y-axis, using geom_density_ridges() from the ggridges package.
# Add columns for year, yday and month
akl_daily <- akl_daily %>%
  mutate(
    year = year(date),
    yday = yday(date),
    month = month(date, label = TRUE))
# Plot max_temp by yday for all years
ggplot(akl_daily, aes(x = yday, y = max_temp)) +
  geom_line(aes(group = year), alpha = 0.5)
## Warning: Removed 1 rows containing missing values (geom_path).
# Examine distribution of max_temp by month
ggplot(akl_daily, aes(x = max_temp, y = month, height = ..density..)) +
  geom_density_ridges(stat = "density")
## Warning: Removed 10 rows containing non-finite values (stat_density).
Instructions:
Create new columns for the hour and month of the observation from datetime. Make sure you label the month
Filter to just daytime observations, where the hour is greater than or equal to 8 and less than or equal to 22.
Group the observations first by month, then by date and summarise by using any() on the rainy column
Summarise again by summing up any_rain
# Create new columns hour, month and rainy
akl_hourly <- akl_hourly %>%
  mutate(
    hour = hour(datetime),
    month = month(datetime, label = TRUE),
    rainy = weather == "Precipitation"
  )
# Filter for hours between 8am and 10pm (inclusive)
akl_day <- akl_hourly %>%
  filter(hour >= 8, hour <= 22)
# Summarise for each date if there is any rain
rainy_days <- akl_day %>%
  group_by(month, date) %>%
  summarise(
    any_rain = any(rainy)
  )
# Summarise for each month, the number of days with rain
rainy_days %>%
  summarise(
    days_rainy = sum(any_rain)
  )
## # A tibble: 12 x 2
## month days_rainy
## <ord> <int>
## 1 Jan 15
## 2 Feb 13
## 3 Mar 12
## 4 Apr 15
## 5 May 21
## 6 Jun 19
## 7 Jul 22
## 8 Aug 16
## 9 Sep 25
## 10 Oct 20
## 11 Nov 19
## 12 Dec 11
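The second summarise() above works without a new group_by() because summarise() drops only the last grouping variable. A minimal sketch of that behavior, assuming dplyr is installed and using hypothetical toy data rather than the Auckland weather data:

```r
library(dplyr)

# Hypothetical toy data: two months with a few readings per day
toy <- tibble(
  month = c("Jan", "Jan", "Jan", "Feb", "Feb"),
  date  = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02",
                    "2016-02-01", "2016-02-01")),
  rainy = c(FALSE, TRUE, FALSE, TRUE, TRUE)
)

# First summarise: one row per (month, date); the result is still grouped by month
per_day <- toy %>%
  group_by(month, date) %>%
  summarise(any_rain = any(rainy))

# Second summarise: one row per month, counting the days with any rain
per_month <- per_day %>%
  summarise(days_rainy = sum(any_rain))

per_month
```

Summing a logical vector counts its TRUE values, which is why sum(any_rain) gives the number of rainy days.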
Example:
release_time <- releases$datetime
head(release_time)
## [1] "1997-12-04 08:47:58 UTC" "1997-12-21 13:09:22 UTC"
## [3] "1998-01-10 00:31:55 UTC" "1998-03-14 19:25:55 UTC"
## [5] "1998-05-02 07:58:17 UTC" "1998-06-14 12:56:20 UTC"
head(release_time) %>% hour()
## [1] 8 13 0 19 7 12
hour() extracts only the hour component. floor_date() instead keeps the full datetime and rounds it down to the start of the hour:
head(release_time) %>% floor_date(unit = "hour")
## [1] "1997-12-04 08:00:00 UTC" "1997-12-21 13:00:00 UTC"
## [3] "1998-01-10 00:00:00 UTC" "1998-03-14 19:00:00 UTC"
## [5] "1998-05-02 07:00:00 UTC" "1998-06-14 12:00:00 UTC"
lubridate has three rounding functions: floor_date() rounds down, round_date() rounds to the nearest unit and ceiling_date() rounds up:
head(release_time) %>% round_date(unit = "hour")
## [1] "1997-12-04 09:00:00 UTC" "1997-12-21 13:00:00 UTC"
## [3] "1998-01-10 01:00:00 UTC" "1998-03-14 19:00:00 UTC"
## [5] "1998-05-02 08:00:00 UTC" "1998-06-14 13:00:00 UTC"
head(release_time) %>% ceiling_date(unit = "hour")
## [1] "1997-12-04 09:00:00 UTC" "1997-12-21 14:00:00 UTC"
## [3] "1998-01-10 01:00:00 UTC" "1998-03-14 20:00:00 UTC"
## [5] "1998-05-02 08:00:00 UTC" "1998-06-14 13:00:00 UTC"
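The three rounding functions differ only in direction, and they accept larger units as well. A minimal sketch, assuming lubridate is installed, that takes the first release time from the output above and rounds it to the month instead of the hour (the variable names are illustrative):

```r
library(lubridate)

# First release time from the output above
x <- ymd_hms("1997-12-04 08:47:58")

down    <- floor_date(x, unit = "month")    # start of December 1997
nearest <- round_date(x, unit = "month")    # December 4 is closer to December 1 than to January 1
up      <- ceiling_date(x, unit = "month")  # start of January 1998

# The original value always lies between the floor and the ceiling
down <= x && x <= up
```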
Choose the right function and units to round r_3_4_1 down to the nearest day
Choose the right function and units to round r_3_4_1 to the nearest 5 minutes
Choose the right function and units to round r_3_4_1 up to the nearest week
Find the time elapsed on the day of release at the time of release by subtracting r_3_4_1 rounded down to the day from r_3_4_1
r_3_4_1 <- ymd_hms("2016-05-03 07:13:28 UTC")
# Round down to day
floor_date(r_3_4_1, unit = "day")
## [1] "2016-05-03 UTC"
# Round to nearest 5 minutes
round_date(r_3_4_1, unit = "5 minutes")
## [1] "2016-05-03 07:15:00 UTC"
# Round up to week
ceiling_date(r_3_4_1, unit = "week")
## [1] "2016-05-08 UTC"
# Subtract r_3_4_1 rounded down to day
r_3_4_1 - floor_date(r_3_4_1, unit = "day")
## Time difference of 7.224444 hours
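The subtraction returns a difftime, and R picks the unit (hours here) automatically. If a specific unit is needed, the value can be converted explicitly. A minimal base R sketch using the same timestamp, with midnight written out by hand instead of calling floor_date() (the variable names are illustrative):

```r
release  <- as.POSIXct("2016-05-03 07:13:28", tz = "UTC")
midnight <- as.POSIXct("2016-05-03 00:00:00", tz = "UTC")

elapsed <- release - midnight

# The unit R chose for printing
units(elapsed)

# Convert to a plain number in a unit of your choice
elapsed_secs <- as.numeric(elapsed, units = "secs")
elapsed_mins <- as.numeric(elapsed, units = "mins")
elapsed_secs
elapsed_mins
```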
Instructions:
Create a new column called day_hour that is datetime rounded down to the nearest hour
Use count() on day_hour to count how many observations there are in each hour
Extend the pipeline so that after counting, you filter for observations where n is not equal to 2
# Create day_hour, datetime rounded down to hour
akl_hourly <- akl_hourly %>%
  mutate(
    day_hour = floor_date(datetime, unit = "hour")
  )
# Count observations per hour
akl_hourly %>%
  count(day_hour)
## # A tibble: 8,770 x 2
## day_hour n
## <dttm> <int>
## 1 2016-01-01 00:00:00 2
## 2 2016-01-01 01:00:00 2
## 3 2016-01-01 02:00:00 2
## 4 2016-01-01 03:00:00 2
## 5 2016-01-01 04:00:00 2
## 6 2016-01-01 05:00:00 2
## 7 2016-01-01 06:00:00 2
## 8 2016-01-01 07:00:00 2
## 9 2016-01-01 08:00:00 2
## 10 2016-01-01 09:00:00 2
## # ... with 8,760 more rows
# Find day_hours with n != 2
akl_hourly %>%
  count(day_hour) %>%
  filter(n != 2) %>%
  arrange(desc(n))
## # A tibble: 92 x 2
## day_hour n
## <dttm> <int>
## 1 2016-04-03 02:00:00 4
## 2 2016-09-25 00:00:00 4
## 3 2016-06-26 09:00:00 1
## 4 2016-09-01 23:00:00 1
## 5 2016-09-02 01:00:00 1
## 6 2016-09-04 11:00:00 1
## 7 2016-09-04 16:00:00 1
## 8 2016-09-04 17:00:00 1
## 9 2016-09-05 00:00:00 1
## 10 2016-09-05 15:00:00 1
## # ... with 82 more rows