I run into this problem often enough that I decided to create a function for it and to include it in the denodoExtractor package. That way, the function is available whenever you load the package - you don’t have to keep copying it over from one folder/script to another.
Let’s say we have some admits data from a small clinic, for January 2019.
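# packages used in this post
library(tidyverse)   # dplyr (%>%), tidyr (replace_na), ggplot2
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(DT)          # datatable()

# admits on the four days in January 2019 that had any
df <- data.frame(admit_date = c("2019-01-01",
                                "2019-01-04",
                                "2019-01-10",
                                "2019-01-23"),
                 admits = c(25, 40, 12, 50))
df %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
                full_width = FALSE)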
| admit_date | admits |
|---|---|
| 2019-01-01 | 25 |
| 2019-01-04 | 40 |
| 2019-01-10 | 12 |
| 2019-01-23 | 50 |
How do I get a dataframe with 1 row for each day in January, regardless of whether or not there was an admit on that date? This is not hard to do, but it is a little tedious, and I want to remove as much friction as possible, so that it’s easy to focus on the big-picture goals of whatever analysis I’m doing.
Why would you want this? Most obviously, to plot a time series or to calculate a rate with the correct denominator.
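For comparison, here is a minimal sketch of the manual approach that the function is meant to replace, in base R only. It is not the package's implementation; the names `all_days` and `filled` are just illustrative.

```r
# Same clinic data as above
df <- data.frame(admit_date = c("2019-01-01", "2019-01-04",
                                "2019-01-10", "2019-01-23"),
                 admits = c(25, 40, 12, 50))

# One row per calendar day in January 2019
all_days <- data.frame(admit_date = seq(as.Date("2019-01-01"),
                                        as.Date("2019-01-31"),
                                        by = "day"))

# Convert the dates so both join keys have the same type
df$admit_date <- as.Date(df$admit_date)

# Left join: keep every day, with NA where no admits were recorded
filled <- merge(all_days, df, by = "admit_date", all.x = TRUE)

nrow(filled)               # 31 rows, one per day
sum(is.na(filled$admits))  # 27 days with no recorded admits
```

None of this is difficult, but it is four separate steps to remember (and get the date types right for) every time.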
# install the package first (one time only):
# devtools::install_github("nayefahmad/denodoExtractor")
library(denodoExtractor)
# ?fill_dates
You only need to supply three arguments (the date column, a start date, and an end date):
df %>%
  fill_dates(admit_date,
             "2019-01-01",
             "2019-01-31") %>%
  # datatable() only renders the result here; it is not part of the example:
  datatable()
## Joining, by = "dates_fill"
df %>%
  fill_dates(admit_date,
             "2019-01-01",
             "2019-01-31") %>%
  # treat days with no recorded admits as zero admits
  replace_na(replace = list(admits = 0)) %>%
  ggplot(aes(x = dates_fill,
             y = admits)) +
  geom_point() +
  geom_line()
## Joining, by = "dates_fill"
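The denominator point mentioned earlier can be made concrete with a little arithmetic. This is a sketch in base R that assumes, as in the plot above, that a day with no row means zero admits; the variable names are illustrative.

```r
admits <- c(25, 40, 12, 50)   # the four days that appear in the raw data
days_in_january <- 31

# Averaging over the rows we happen to have overstates the daily rate
naive_rate <- sum(admits) / length(admits)    # 127 / 4 = 31.75

# Averaging over every calendar day uses the correct denominator
daily_rate <- sum(admits) / days_in_january   # 127 / 31, about 4.1

# Same result from a filled series with explicit zeros
filled_admits <- c(admits, rep(0, days_in_january - length(admits)))
mean(filled_admits)
```

With the dates filled in, `mean()` on the admits column gives the per-day rate directly, with no need to remember to divide by the number of days.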