This is am example how to deal with number which represent date after we import data from excel file and R can not read these number properly.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
HFM_dataset <- readxl::read_excel("HFM_dataset.xlsx")
head(HFM_dataset)
## # A tibble: 6 × 38
## id age sex class homeno moo OnsetDate fever f_onset rpharynx roral
## <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
## 1 1 3.2 2 2 7/1 6 42976 1 42976 1 0
## 2 2 21 1 NA 7/1 6 - 0 - 0 0
## 3 3 20 2 NA 7/1 6 42984 0 - 0 0
## 4 4 48 1 NA <NA> NA 42955 0 - 1 0
## 5 5 42 2 NA <NA> NA - 0 - 0 0
## 6 6 18 1 NA <NA> NA - 0 - 0 0
## # … with 27 more variables: rpalm <dbl>, rsole <dbl>, rbuttock <dbl>,
## # rash <dbl>, rash_onset <chr>, abdpain <dbl>, nausea <dbl>, cough <dbl>,
## # c_onset <chr>, runnynose <dbl>, run_onset <chr>, sorethroat <dbl>,
## # sore_onset <chr>, URI <dbl>, dyspnea <dbl>, chestdis <dbl>, headache <dbl>,
## # seizure <dbl>, collapse <dbl>, stool <dbl>, Result1 <chr>, swab <dbl>,
## # Result2 <chr>, treat <dbl>, status <dbl>, case <dbl>, date_onset <chr>
As result from head() show OnsetDate of first record is
42976, While the actual date in excel is 2017-08-29 and
as.Date() in R return “character string is not in a
standard unambiguous format”. This error is due to it is a different
starting point of calculating date between Excel and R. We can use
convertToDate() from openxlsx package to deal
with this problem.
hfm <- HFM_dataset %>%
mutate(OnsetDate = openxlsx::convertToDate(OnsetDate))%>% ## Deal with Excel date
select(OnsetDate, case)%>%
filter(case == 1)
## Warning in openxlsx::convertToDate(OnsetDate): NAs introduced by coercion
hfm$OnsetDate
## [1] "2017-08-29" "2017-09-06" "2017-08-08" "2017-09-03" "2017-09-03"
## [6] "2017-09-05" "2017-08-26" "2017-08-23" "2017-08-26" "2017-08-26"
## [11] "2017-08-25" "2017-09-05" "2017-08-19" "2017-08-30" "2017-08-27"
## [16] "2017-08-23" "2017-09-05" "2017-08-16" "2017-08-26" "2017-08-21"
## [21] "2017-08-27" "2017-08-27" "2017-08-21" "2017-08-16" "2017-08-23"
## [26] "2017-08-27" "2017-08-26" "2017-08-27" "2017-08-22" "2017-08-30"
Now our date is clean we can do our job this time i.e. some visualization involve with date.
ggplot( data = hfm, aes(x = OnsetDate))+
geom_histogram(binwidth = 1)