This is am example how to deal with number which represent date after we import data from excel file and R can not read these number properly.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
HFM_dataset <- readxl::read_excel("HFM_dataset.xlsx")

head(HFM_dataset)
## # A tibble: 6 × 38
##      id   age   sex class homeno   moo OnsetDate fever f_onset rpharynx roral
##   <dbl> <dbl> <dbl> <dbl> <chr>  <dbl> <chr>     <dbl> <chr>      <dbl> <dbl>
## 1     1   3.2     2     2 7/1        6 42976         1 42976          1     0
## 2     2  21       1    NA 7/1        6 -             0 -              0     0
## 3     3  20       2    NA 7/1        6 42984         0 -              0     0
## 4     4  48       1    NA <NA>      NA 42955         0 -              1     0
## 5     5  42       2    NA <NA>      NA -             0 -              0     0
## 6     6  18       1    NA <NA>      NA -             0 -              0     0
## # … with 27 more variables: rpalm <dbl>, rsole <dbl>, rbuttock <dbl>,
## #   rash <dbl>, rash_onset <chr>, abdpain <dbl>, nausea <dbl>, cough <dbl>,
## #   c_onset <chr>, runnynose <dbl>, run_onset <chr>, sorethroat <dbl>,
## #   sore_onset <chr>, URI <dbl>, dyspnea <dbl>, chestdis <dbl>, headache <dbl>,
## #   seizure <dbl>, collapse <dbl>, stool <dbl>, Result1 <chr>, swab <dbl>,
## #   Result2 <chr>, treat <dbl>, status <dbl>, case <dbl>, date_onset <chr>

As result from head() show OnsetDate of first record is 42976, While the actual date in excel is 2017-08-29 and as.Date() in R return “character string is not in a standard unambiguous format”. This error is due to it is a different starting point of calculating date between Excel and R. We can use convertToDate() from openxlsx package to deal with this problem.

hfm <- HFM_dataset %>%
            mutate(OnsetDate = openxlsx::convertToDate(OnsetDate))%>% ## Deal with Excel date
            select(OnsetDate, case)%>%
            filter(case == 1)
## Warning in openxlsx::convertToDate(OnsetDate): NAs introduced by coercion
hfm$OnsetDate
##  [1] "2017-08-29" "2017-09-06" "2017-08-08" "2017-09-03" "2017-09-03"
##  [6] "2017-09-05" "2017-08-26" "2017-08-23" "2017-08-26" "2017-08-26"
## [11] "2017-08-25" "2017-09-05" "2017-08-19" "2017-08-30" "2017-08-27"
## [16] "2017-08-23" "2017-09-05" "2017-08-16" "2017-08-26" "2017-08-21"
## [21] "2017-08-27" "2017-08-27" "2017-08-21" "2017-08-16" "2017-08-23"
## [26] "2017-08-27" "2017-08-26" "2017-08-27" "2017-08-22" "2017-08-30"

Now our date is clean we can do our job this time i.e. some visualization involve with date.

ggplot( data = hfm,  aes(x = OnsetDate))+
      geom_histogram(binwidth = 1)