class: center, middle, inverse, title-slide

# Clinical Registry Reporting
## using R
### Farhad Salimi
### Clinical Outcomes Data Reporting and Research Program (CORRP)
### 13 April 2022

---
class: inverse, center, middle

# Why R?

---

### Large user community

There are currently more than **two million** users of R around the world. As of 18 March 2022, there were **19,013** packages available on **CRAN**, and thousands more are hosted on other platforms such as GitHub. Most (if not all!) statistical methods are likely to have at least one freely available package.

### Integrated tool for sharing results

R includes tools that let you share and communicate your results. **R Markdown** produces reports in various formats directly from R. You can also make interactive dashboards and web apps using **Shiny**.

### Efficient Scalability

R can be scaled so it can be used across a large team or an organisation. It integrates seamlessly with popular data science technologies such as **PowerBI**, **Python**, and **Git**.

### Great Graphics

R can be used to develop high-quality static graphics as well as animations and interactive graphics.

---

### Machine Learning and Big Data

R supports numerous methods for developing predictive models and can be seamlessly integrated with large-scale data processing tools such as **Apache Spark**.

### R is free!

---
class: inverse, center, middle

# Workflow

---

<div class="figure">
<img src="r4ds_data-science.png" alt="Grolemund, G., & Wickham, H. (2017). R for Data Science. O’Reilly Media." width="100%" />
<p class="caption">Grolemund, G., & Wickham, H. (2017). R for Data Science.
O’Reilly Media.</p>
</div>

---
class: inverse, center, middle

# Import

---

## Flat files (csv, xlsx, etc)

### CSV files

```r
library(readr)
read_csv(file = "path_to_the_file")
```

### EXCEL files

```r
library(readxl)
read_excel(path = "path_to_the_file",
           sheet = "sheet_number",
           range = "typical_EXCEL_range (e.g. B3:D87)")
```

---

## Database

### REDCap

```r
library(REDCapR)
redcap_read(redcap_uri = "URI_of_the_REDCap_project",
            token = "the_user_specific_token")
```

### SQL, Oracle, MySQL, PostgreSQL, SQLite

```r
library(DBI)
library(odbc)
con <- dbConnect(odbc(), "DSN name")

library(dplyr)   # provides tbl(), %>%, group_by() and summarise()
library(dbplyr)  # translates the dplyr verbs to SQL
q1 <- tbl(con, "bank") %>%
  group_by(month_idx, year, month) %>%
  summarise(
    subscribe = sum(ifelse(term_deposit == "yes", 1, 0)),
    total = n())
show_query(q1)
```

---

## Formats Used by other Statistical Packages

```r
library(haven)

# SAS
read_sas("mtcars.sas7bdat")
write_sas(mtcars, "mtcars.sas7bdat")

# SPSS
read_sav("mtcars.sav")
write_sav(mtcars, "mtcars.sav")

# Stata
read_dta("mtcars.dta")
write_dta(mtcars, "mtcars.dta")
```

---
class: inverse, center, middle

# Communicate

---

## R Markdown

R Markdown provides a framework combining code, its results, and text. R Markdown documents are fully reproducible and support dozens of output formats (e.g. PDF, Word, PowerPoint, HTML, etc).

This is an R Markdown file, a plain text file that has the extension `.Rmd`:

.pull-left[
````
---
title: "Report"
date: 2020-01-01
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
library(knitr)
library(registryr)
```
````
]

.pull-right[
````
We have data about `r registryr::fake_data %>% nrow()` participants.
Out of which, `r registryr::fake_data %>% dplyr::filter(sex_cat == "Male") %>% nrow()` are males.
The age distribution of the participants by severity is shown below:

```{r, echo = FALSE}
mortality_summary <-
  registryr::fake_data %>%
  group_by(site_id) %>%
  summarise(n = n(), mortality_rate = sum(dead) / n)

mortality_summary %>%
  slice_max(mortality_rate, n = 3) %>%
  kable()
```
````
]

---

It contains three important types of content:

1. A **header** surrounded by `---`
2. **Chunks** of R code surrounded by ```` ``` ````
3. Text mixed with inline code ``` `r ` ```

.pull-left[
````
---
title: "Report"
date: 2020-01-01
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
library(knitr)
library(registryr)
```
````
]

.pull-right[
````
We have data about `r registryr::fake_data %>% nrow()` participants.
Out of which, `r registryr::fake_data %>% dplyr::filter(sex_cat == "Male") %>% nrow()` are males.
The age distribution of the participants by severity is shown below:

```{r, echo = FALSE}
mortality_summary <-
  registryr::fake_data %>%
  group_by(site_id) %>%
  summarise(n = n(), mortality_rate = sum(dead) / n)

mortality_summary %>%
  slice_max(mortality_rate, n = 3) %>%
  kable()
```
````
]

---

### The output is:

We have data about 40000 participants. Out of which, 19941 are males.

The age distribution of the participants by severity is shown below:

| site_id|    n| mortality_rate|
|-------:|----:|--------------:|
|      19| 2000|         0.3320|
|       8| 2000|         0.3285|
|      13| 2000|         0.3255|

<img src="registry_data_analysis_R_files/figure-html/unnamed-chunk-32-1.png" width="100%" />

---

### Parameters

* R Markdown documents can include one or more parameters.
* Parameters are useful when you want to regenerate the same report with different inputs
* This is very handy when you want to generate site reports (i.e.
same reports for different sites)

.pull-left[
````
---
title: "Report"
date: 2020-01-01
output: html_document
params:
  SiteID: A
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
library(knitr)
library(registryr)
```
````
]

.pull-right[
````
We have data about `r registryr::fake_data %>% nrow()` participants.
Out of which, `r registryr::fake_data %>% dplyr::filter(sex_cat == "Male") %>% nrow()` are males.
The age distribution of the participants by severity is shown below:

```{r, echo = FALSE}
mortality_summary <-
  registryr::fake_data %>%
  dplyr::filter(site_id == params$SiteID) %>%
  group_by(site_id) %>%
  summarise(n = n(), mortality_rate = sum(dead) / n)

mortality_summary %>%
  slice_max(mortality_rate, n = 3) %>%
  kable()
```
````
]

---

### Making several reports

```r
library(rmarkdown)  # render() comes from rmarkdown, not knitr

# General form of the call:
# render(input = "path_to_the_Rmd_file",
#        params = list_of_the_parameters_to_use,
#        output_format = "output_format (e.g. pdf_document)",
#        output_file = "the_name_of_the_output_file",
#        ...)

# Render one report for a given site
make_my_report <- function(SiteID) {
  render(input = "report.Rmd",
         params = list(SiteID = SiteID),
         output_format = "pdf_document",
         output_file = glue::glue("report_{SiteID}_{Sys.Date()}.pdf"))
}

# One row per report; column names must match the function arguments
input_df <- data.frame(SiteID = c("group_1", "group_2", "etc"))

purrr::pwalk(input_df, make_my_report)
```

---

### R Markdown formats

* HTML
* PDF
* WORD
* RTF
* POWERPOINT
* HTML Dashboards (requires Flexdashboard)
* Interactive Dashboards (requires Shiny)
* Websites (needs a little additional infrastructure)
* ETC

---

## Interactive outputs and animation

```r
DT::datatable(head(registryr::fake_data),
              fillContainer = FALSE,
              options = list(pageLength = 4))
```
---

```r
graph <- registryr::fake_data %>%
  group_by(site_id) %>%
  summarise(mortality_rate = sum(dead) / n()) %>%
  mutate(n = seq(100, 2000, length.out = 20)) %>%
  ggplot(aes(x = n, y = mortality_rate)) +
  geom_point() +
  theme_bw()

plotly::ggplotly(graph)
```
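
The plot above starts from a site-level mortality summary. As a minimal sketch of that step without the internal `registryr` package, the chunk below simulates a small data set and computes the same rate with base R. The column names `site_id` and `dead` follow the slides; the helper `simulate_registry()`, the number of sites, and the 30% event probability are hypothetical choices made only for illustration.

```r
# Hypothetical helper: simulate registry-like data
# (an assumption for illustration, not the real registryr::fake_data)
simulate_registry <- function(n_sites = 20, n_per_site = 100, seed = 1) {
  set.seed(seed)
  data.frame(
    site_id = rep(seq_len(n_sites), each = n_per_site),
    dead    = rbinom(n_sites * n_per_site, size = 1, prob = 0.3)
  )
}

sim <- simulate_registry()

# Number of patients and crude mortality rate per site
n_by_site    <- tapply(sim$dead, sim$site_id, length)
rate_by_site <- tapply(sim$dead, sim$site_id, mean)

site_summary <- data.frame(
  site_id        = as.integer(names(rate_by_site)),
  n              = as.vector(n_by_site),
  mortality_rate = as.vector(rate_by_site)
)

# Three sites with the highest crude mortality rate
head(site_summary[order(-site_summary$mortality_rate), ], 3)
```

Because `dead` is coded 0/1, the mean per site equals `sum(dead) / n`, which is the quantity used in the slides.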
---

```r
p <- registryr::fake_data %>%
  group_by(site_id) %>%
  summarise(mortality_rate = sum(dead) / n()) %>%
  mutate(year = seq(2001, 2020)) %>%
  ggplot(aes(x = year, y = mortality_rate)) +
  geom_line() +
  theme_bw()

p
```

<img src="registry_data_analysis_R_files/figure-html/unnamed-chunk-24-1.png" width="100%" />

---

```r
p <- p + gganimate::transition_reveal(year)
gganimate::animate(p)
```

<img src="registry_data_analysis_R_files/figure-html/unnamed-chunk-25-1.gif" width="100%" />

---

<img src="registry_data_analysis_R_files/figure-html/unnamed-chunk-27-1.gif" width="100%" height="60%" />

---

## R Shiny

Shiny is an R package that makes it possible to build interactive web apps straight from R.

<iframe src="https://vac-lshtm.shinyapps.io/ncov_tracker?showcase=0" width="100%" height="500px" data-external="1"></iframe>

---

<iframe src="https://monash-corrp.shinyapps.io/DCQR_NNT_dashboard?showcase=0" width="100%" height="600px" data-external="1"></iframe>

---
class: inverse, center, middle

# Reproducible Analysis

---

## Internal Packages

* An internal package wraps the most common analysis tasks in functions
* Improves an organisation’s code quality
* Promotes reproducible analysis frameworks
* Enhances knowledge management

```r
devtools::install_github('farhadsalimi/registryr')
library(registryr)
registryr::risk_adjust()
registryr::create_adnet_report()
registryr::population_pyramid()
```

---

## Git and GitHub

.pull-left[
<img src="http://www.phdcomics.com/comics/archive/phd101212s.gif" width="80%" />
]

.pull-right[
* Git is a version control system that tracks changes to your code and lets you share those changes with others
* Git can be combined with GitHub, a website that allows you to share and back up your code

<img src="https://r-bio.github.io/img/rstudio-commit.png" width="60%" />
<img src="https://flight-manual.atom.io/using-atom/images/github-stage.png" width="60%" />
]

---
class: center, middle

# Thanks!

Slides were created via the R packages:

[**xaringan**](https://github.com/yihui/xaringan)<br>
[gadenbuie/xaringanthemer](https://github.com/gadenbuie/xaringanthemer)

**No PowerPoint or Keynote was harmed during the making of these slides!!**