Dates comments

# Load packages 
if(!require(pacman)) install.packages("pacman")
pacman::p_load( tidyverse, here, lubridate, patchwork, outbreaks)

irs <- read_csv(here("data/Illovo_data.csv"))

Calculating Date Intervals section

  • I think you can shorten the “Calculating Date Intervals” section. A bit too much time is spent on the comparisons. You could shorten it by excluding difftime: either leave it out entirely or just mention it in parentheses for the student’s own exploration. This is because it is neither the method we recommend students use nor the dead-simple default method (that would be the minus operator, -).

  • When you introduce the %--% operator, we should add a side note that it is an alias for lubridate::interval(). We should also give the operator a name; I think “interval operator” can work.

  • In the comparison between lubridate’s interval operator and the minus sign, I think you should use a date range that spans leap years. This would be a more convincing demo. Something like 2000-01-01 to 2006-01-01. The lubridate interval method will return exactly 6 years between those dates, but the minus method will return a decimal whether you divide the day count by 365, 366, or 365.25.

date_1 <- as.Date("2000-01-01") 
date_2 <- as.Date("2006-01-01") 
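# Minus method: the result is in years, even though the difftime label still says "days"
(date_2 - date_1) / 365.25
## Time difference of 6.001369 days
# Interval method: parentheses added for readability (%--% already binds tighter than /)
(date_1 %--% date_2) / years(1)
## [1] 6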
  • You might also mention, in a side note, that the %--% approach is particularly handy once students start dealing with time zones and daylight saving time shifts.
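
To make the side notes above concrete, here is a minimal sketch that could go into the lesson or just inform it. It reuses date_1 and date_2 from the demo; the daylight saving example (the variable names before and after, and the choice of the America/New_York time zone) is my own assumption, not something from the current lesson.

```r
library(lubridate)

date_1 <- as.Date("2000-01-01")
date_2 <- as.Date("2006-01-01")

# %--% and interval() should build the same Interval object
same_object <- identical(date_1 %--% date_2, interval(date_1, date_2))
same_object

# No single divisor turns the day count into a whole number of years
n_days <- as.numeric(date_2 - date_1)   # 2192 days
years_by_division <- n_days / c(365, 366, 365.25)
years_by_division

# The interval operator counts calendar years instead
years_by_interval <- (date_1 %--% date_2) / years(1)
years_by_interval

# Hypothetical DST case: US clocks moved forward on 2021-03-14
before <- ymd_hms("2021-03-13 12:00:00", tz = "America/New_York")
after  <- ymd_hms("2021-03-14 12:00:00", tz = "America/New_York")

elapsed_hours <- as.numeric(difftime(after, before, units = "hours"))
calendar_days <- (before %--% after) / days(1)
elapsed_hours   # hours actually elapsed on the clock
calendar_days   # calendar days between the two noon timestamps
```

The point of the last two lines is that noon to noon across the spring shift is 23 elapsed hours but still one calendar day, which is the distinction students tend to trip over.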

Plot too simple?

AMANDA: Is the graph too simple? We can add more and make the graph look a lot nicer, but my instinct is to keep it simple in order to focus on the Dates aspect, and to refer students to the data visualisation courses for more ideas.

irs %>%
  mutate(month = month(start_date_default, label = TRUE)) %>% 
  ggplot(aes(x = month)) +
  geom_bar() +
  labs(x = "Month", y = "Count") +
  theme_minimal()

I agree that we should focus on the dates and reduce irrelevant code. In my opinion, you can remove the last two lines of code (the labs() and theme_minimal() calls) to shorten the plotting code further, but maybe you can add some color:

irs %>%
  mutate(month = month(start_date_default, label = TRUE)) %>% 
  ggplot(aes(x = month)) +
  geom_bar(fill = "orange") 

Rounding

For the rounding section, I think the use case currently identified is not so compelling. This is partly because a specific day shouldn’t be attached to each month if the values in that row pertain to the whole month (this was an insertion for pedagogical purposes, right?).

And also partly because the final plots don’t really look that different:

inc_temp <- read_csv(here("data/Illovo_ir_weather.csv"))

inc_temp_rounded <- 
  inc_temp %>% 
  mutate(date = floor_date(date, unit = "month"))

non_rounded <- ggplot(inc_temp, aes(x = date)) +
  geom_line(aes(y = ir_case, color = "ir_case")) +
  geom_line(aes(y = ir_control, color = "ir_control")) +
  labs(title = "Not rounded") 

rounded <- ggplot(inc_temp_rounded, aes(x = date)) +
  geom_line(aes(y = ir_case, color = "ir_case")) +
  geom_line(aes(y = ir_control, color = "ir_control")) +
  labs(title = "Rounded") 

rounded / non_rounded


The use case I’m familiar with, and which I think is quite common, involves aggregating then plotting.

For example, take this Ebola linelist of cases:

ebola <- outbreaks::ebola_sierraleone_2014 %>% 
  as_tibble()

ebola
## # A tibble: 11,903 × 8
##       id   age sex   status    date_of_onset date_of_sample district chiefdom   
##    <int> <dbl> <fct> <fct>     <date>        <date>         <fct>    <fct>      
##  1     1    20 F     confirmed 2014-05-18    2014-05-23     Kailahun Kissi Teng 
##  2     2    42 F     confirmed 2014-05-20    2014-05-25     Kailahun Kissi Teng 
##  3     3    45 F     confirmed 2014-05-20    2014-05-25     Kailahun Kissi Tonge
##  4     4    15 F     confirmed 2014-05-21    2014-05-26     Kailahun Kissi Teng 
##  5     5    19 F     confirmed 2014-05-21    2014-05-26     Kailahun Kissi Teng 
##  6     6    55 F     confirmed 2014-05-21    2014-05-26     Kailahun Kissi Teng 
##  7     7    50 F     confirmed 2014-05-21    2014-05-26     Kailahun Kissi Teng 
##  8     8     8 F     confirmed 2014-05-22    2014-05-27     Kailahun Kissi Teng 
##  9     9    54 F     confirmed 2014-05-22    2014-05-27     Kailahun Kissi Teng 
## 10    10    57 F     confirmed 2014-05-22    2014-05-27     Kailahun Kissi Teng 
## # ℹ 11,893 more rows

Imagine you want to plot a line graph of cases per month, over time. What would you do?

You need to count cases per year-month and plot those counts. The easiest way to generate such a count without breaking your dates is floor_date():

(monthly_incidence <- ebola %>%
  mutate(year_month = floor_date(date_of_onset, unit = "month")) %>% 
  count(year_month))
## # A tibble: 17 × 2
##    year_month     n
##    <date>     <int>
##  1 2014-05-01    57
##  2 2014-06-01   243
##  3 2014-07-01   351
##  4 2014-08-01   493
##  5 2014-09-01  1504
##  6 2014-10-01  2019
##  7 2014-11-01  2157
##  8 2014-12-01  1635
##  9 2015-01-01  1059
## 10 2015-02-01   549
## 11 2015-03-01   521
## 12 2015-04-01   231
## 13 2015-05-01   303
## 14 2015-06-01   175
## 15 2015-07-01   327
## 16 2015-08-01   206
## 17 2015-09-01    73

Then you can easily plot:

ggplot(monthly_incidence) + 
  geom_line(aes(year_month, n))

Instead of floor_date(), a person might be tempted to do some kind of year-month extraction with stringr, but this would turn the dates into character strings and make things hard to plot:

(monthly_incidence_bad <- ebola %>%
  mutate(year_month_extract = str_sub(date_of_onset, 1,7)) %>% 
  count(year_month_extract))
## # A tibble: 17 × 2
##    year_month_extract     n
##    <chr>              <int>
##  1 2014-05               57
##  2 2014-06              243
##  3 2014-07              351
##  4 2014-08              493
##  5 2014-09             1504
##  6 2014-10             2019
##  7 2014-11             2157
##  8 2014-12             1635
##  9 2015-01             1059
## 10 2015-02              549
## 11 2015-03              521
## 12 2015-04              231
## 13 2015-05              303
## 14 2015-06              175
## 15 2015-07              327
## 16 2015-08              206
## 17 2015-09               73
ggplot(monthly_incidence_bad) + 
  geom_line(aes(year_month_extract, n))
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

The x axis variable is now a character vector rather than a continuous date, so ggplot treats each month as its own group and geom_line() no longer draws a line.
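
If it helps the lesson, the difference between the two approaches can be shown on a tiny vector. This is only a sketch: the onset vector is made-up toy data (not taken from the Ebola linelist), and I use base substr() in place of str_sub() so that the snippet needs only lubridate.

```r
library(lubridate)

# Hypothetical toy onset dates, for illustration only
onset <- as.Date(c("2014-05-18", "2014-05-20", "2014-06-03"))

floored   <- floor_date(onset, unit = "month")    # stays a Date
extracted <- substr(as.character(onset), 1, 7)    # becomes a string

class(floored)
class(extracted)

# If you already have "YYYY-MM" strings, ym() parses them back to Dates
recovered <- ym(extracted)
all(recovered == floored)
```

So a student who has already gone down the string route is not stuck, but floor_date() gets them to a plottable Date column in one step.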


I'm not 100% sure how best to apply this to the inc_temp data, but there are a few options. You could aggregate the case counts by quarter, then plot:

inc_temp %>% 
    mutate(quarter = floor_date(date, unit = "quarter")) %>% 
    group_by(quarter) %>% 
    summarise(cases_in_study = sum(ir_case), 
              cases_in_control = sum(ir_control)) %>% 
    ggplot() + 
    geom_line(aes(quarter, cases_in_study), color = "seagreen") + 
    geom_line(aes(quarter, cases_in_control), color = "orange") 
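
In case the quarter unit needs a quick illustration before the aggregation, here is a minimal sketch on made-up dates (the visits vector is hypothetical and not part of inc_temp):

```r
library(lubridate)

# Hypothetical dates, one in each of three different quarters of 2015
visits <- as.Date(c("2015-01-15", "2015-05-17", "2015-12-31"))

# Each date is rounded down to the first day of its quarter
quarter_start <- floor_date(visits, unit = "quarter")
quarter_start
# "2015-01-01" "2015-04-01" "2015-10-01"
```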

Or you could use the irs dataset. Maybe you want to count the number of sprayed villages per month:

irs %>% 
   mutate(month = floor_date(start_date_default, unit = "month")) %>% 
   group_by(month) %>% 
   summarise(n_villages = n()) %>% 
   ggplot() + 
   geom_line(aes(month, n_villages))