class: center, middle, inverse, title-slide

# Creating Reports with RMarkdown
### Dr Alyce Russell<br/>Postdoctoral Research Fellow<br/>a.russell@ecu.edu.au
### School of Medical and Health Sciences, ECU
### 19 June 2020

---

# Overview

In today's talk, I will introduce how to create reports in RMarkdown.

RMarkdown lets you document not only the analyses performed and figures generated, but also every data cleaning/processing step and exploratory analysis (the latter being where you may decide which models to fit).

<img src="./images/What-data-scientists-spend-the-most-time-doing.jpg" width="40%" style="display: block; margin: auto;" />

---

## First things first...

<img src="./images/DataCleaning.jpg" width="60%" style="display: block; margin: auto;" />

---

## Have a good filing system in place

You should set up 'working environments' for your analyses. At a minimum, I would include a data processing folder, a cleaned data folder, a folder containing any articles/scripts used as reference for your analyses, and an analyses folder with the necessary subfolders (for different studies, for example).

<br/>

<img src="./images/folders.png" width="40%" style="display: block; margin: auto;" />

---

## The folder commandments

**Use folders and subfolders -** group files within folders so information on a particular topic is located in one place

**Adhere to existing procedures -** check for established approaches in your team or department which you can adopt

**Name folders appropriately -** name folders after the areas of work to which they relate and not after individual researchers or students. This avoids confusion in shared workspaces if a member of staff leaves, and makes the file system easier to navigate for new people joining the workspace

**Be consistent -** when developing a naming scheme for your folders, it is important that once you have decided on a method, you stick to it.
If you can, try to agree on a naming scheme from the outset of your research project

**Structure folders hierarchically -** start with a limited number of folders for the broader topics, and then create more specific folders within these

**Separate data processing and final data -** it is a good idea to keep all your final, cleaned datasets in their own folder, separate from your working folders

**Backup -** ensure that your files, whether they are on your local drive or on a network drive, are backed up

**Review records -** assess materials regularly or at the end of a project to ensure files are not kept needlessly.

---

## Things to consider when preparing data

There are many things to consider when you are preparing your dataset, not only for analysis but also for storage, since (in theory) it should be easy for someone to pick up where you left off.

* Is it in the right format (e.g. did you convert numbers stored as text strings into numeric values, format dates, etc.)?
* Is it consistent and comparable?
* Are there spelling mistakes in the variables or the values?
* Do you have missing data? Have you considered imputation?
* Are you certain the data you are cleaning was collected correctly?
* Were your actions to clean the data justified or have you maybe gone too far (**beware!!**)?
* Is what you did in the data cleaning steps repeatable?
* Did you get rid of redundant blank spaces?
* Did you remove duplicates?
* Have you changed text to lower/upper/proper case consistently?
* Etc...

---

## Suitable documentation

At a minimum, you should have detailed written notes that allow your work to be reproduced completely. **This is especially important for reconciling your research at completion**.

<br/>

Optimally, you will have an automated script that runs your analyses from start to finish.
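
Several of the checks listed earlier (numbers stored as text, stray blank spaces, inconsistent case, duplicates, missing values) can be scripted so the cleaning is repeatable. Below is a minimal base R sketch; the `raw` data frame and its column names are made up purely for illustration, and your own data will need its own rules.

```r
# Hypothetical raw data, invented purely for illustration
raw <- data.frame(
  id     = c("001", "002", "002", "003"),
  sex    = c(" Female", "male ", "male ", "FEMALE"),
  weight = c("61.5", "80", "80", "n/a"),
  visit  = c("2020-01-15", "2020-02-03", "2020-02-03", "2020-03-22"),
  stringsAsFactors = FALSE
)

clean <- raw

# Remove redundant blank spaces and use one consistent case
clean$sex <- tolower(trimws(clean$sex))

# Recode the text placeholder "n/a" as missing, then convert text to numbers
clean$weight[clean$weight == "n/a"] <- NA
clean$weight <- as.numeric(clean$weight)

# Format dates
clean$visit <- as.Date(clean$visit, format = "%Y-%m-%d")

# Remove exact duplicate rows
clean <- clean[!duplicated(clean), ]

# Count the missing values in each column
colSums(is.na(clean))
```

Keeping `raw` untouched and working on a copy means every cleaning decision stays visible in the script and can be revisited later.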
<br/>

#### RMarkdown and knitr, two R packages, give you the flexibility of storing all your data cleaning, labelling, exploratory analyses and final analyses/outputs in the same document, with executable code.

---

## Keeping a record of your 'working environment'

Although using a version control protocol is preferred (e.g. you can set up GitHub to link with RStudio in a current R Project... not for today)...

<img src="http://phdcomics.com/comics/archive/phd101212s.gif" width="30%" style="display: block; margin: auto;" />

---

## Keeping a record of your 'working environment'

At a minimum, you can record the `sessionInfo()`, which takes a snapshot of the R version, platform, and all the packages and package versions within the current working environment. Cool, huh?

```r
sessionInfo()
```

```
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
##  [5] xaringan_0.16   yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6
##  [9] rmarkdown_2.1   knitr_1.28      stringr_1.4.0   xfun_0.13
## [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14
```

---

## What is this wizardry?!

A suite of packages in R, known as the `tidyverse`, contains functions for reading, processing, manipulating, calculating, plotting, and reporting your analyses. You can even 'pipe' the output of one function into the next, so your code flows across the individual `tidyverse` packages (neat, right?!).

<br/>

<img src="./images/tidyverse.png" width="80%" style="display: block; margin: auto;" />

---

## What is this wizardry?!
The rest of this presentation assumes some knowledge of R. If you are a beginner, however, there will be clickable links to useful resources along the way.

<br/>

Check out [https://tidyverse.org] for access to the **R for Data Science** eBook (an excellent first reference), and work through the RStudio workshops, including [https://rmarkdown.rstudio.com] and [https://bookdown.org/yihui/bookdown/], for more on RMarkdown functions.

<br/>

Before you begin, make sure you have R and RStudio downloaded on your computer. These are easily searchable through the <img src="https://www.google.com/logos/doodles/2015/googles-new-logo-5078286822539264.3-hp2x.gif" width="10%" /> search engine. Download the latest version for your computer's capacity.

---

## What is RMarkdown?

RMarkdown documents are **fully reproducible**. You write your code in an `RMarkdown` document using RStudio, then use `knitr` to weave the narrative text, code chunks and outputs together to produce elegantly formatted reports (in Word, PDF, HTML, and it even has formats to produce these slides!!!).

<img src="http://applied-r.com/wp-content/uploads/2019/01/rmarkdown_workflow.png" width="30%" style="display: block; margin: auto;" />

---

## Why not use normal scripting??

The simple answer is: you can. However, writing in the console or a basic script can be error-prone.

<br/>

RMarkdown files are fully executable, so once the file has been knitted together you can be sure that all of the code runs from start to finish (knitting stops with an error message that tells you where the issue is within the document).
<br/>

<img src="./images/RMarkdownFlow.png" width="80%" style="display: block; margin: auto;" />

---

## It is almost practice time, but here's a quick rundown first

```r
install.packages("tidyverse") # run this line once per computer
library(tidyverse) # run this line every time you want to use any of the tidyverse packages
# NOTE: text can be left in code chunks by using a hash symbol from the point of code you want "blanked out"
```

<br/>

### Two options are available for inputting code

* Code chunks (as shown above)
* Inline code

---

## Code chunks have many options

* `include = FALSE` - the code runs and its results can be used by other chunks, but neither the code nor its results appear in the finished file
* `echo = FALSE` - prevents the code, but not the results, from appearing in the finished file (a useful way to embed figures)
* `eval = FALSE` - prevents the code chunk from being evaluated (a useful way to show but not run code)
* `message = FALSE` - messages do not appear in the finished file
* `warning = FALSE` - warnings do not appear in the finished file
* `fig.cap = "..."` - adds a caption to graphical results

<br/>

See this [**RMarkdown Cheatsheet**](https://rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf) for further options, and the [**Definitive RMarkdown Guide**](https://bookdown.org/yihui/rmarkdown/) for the full details.

---

## Inline Code

Code results can be inserted directly into the text of a .Rmd file by enclosing the code with

```r
`r ` # you get these side flicks (backticks) from the key just under the ESC button
```

---

## Correct "SOP" for starting an RMarkdown document

* Set up your folder directory in a sensible location and include relevant subdirectories (e.g. common subdirectories are `data`, `images`, `results`, etc...)
* Open RStudio
* Click on `File` -> `New Project` -> `Existing Directory` and find your newly created RMarkdown directory
* Open a new RMarkdown document, via `File` -> `New File` -> `RMarkdown`. Name it however you like and select `HTML` for now.
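
The folder set-up step can also be done from the R console rather than by hand. The snippet below is a minimal sketch using base R's `dir.create()`; the project name `my-rmarkdown-project` is hypothetical, and a temporary folder is used only so the example is safe to run.

```r
# Hypothetical project location: a temporary folder keeps this sketch safe to run;
# replace it with the path where your project should live
project_dir <- file.path(tempdir(), "my-rmarkdown-project")

# The common subdirectories suggested for an RMarkdown project
subdirs <- c("data", "images", "results")

# recursive = TRUE also creates the project folder itself;
# showWarnings = FALSE stays quiet if a folder already exists
for (d in subdirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Check what was created
list.files(project_dir)
```

You would then point `New Project` -> `Existing Directory` at this folder in RStudio.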
<img src="images/NameCreateRMarkdown.jpg" width="30%" style="display: block; margin: auto;" />

---

## Adding images

Add images to a text chunk using `![](path/to/image.png)`, replacing the placeholder path with the path to your own image. A caption can go within the [ ].

It is best to store these images within a separate `images` subdirectory in the .Rmd file directory (i.e. where you set up the RProject).

You can also use this format in a code chunk:

```r
knitr::include_graphics('./images/RLadiesLogo.png')
```

<img src="./images/RLadiesLogo.png" width="50%" style="display: block; margin: auto;" />

--

**FUN FACT: you can insert hyperlinks into your document using `[](webaddress.com)`.**

---

## You can create simple tables

All that is required is typing the below into the normal text of the document, not into a code chunk.

```r
Table Header  | Second Header
------------- | -------------
Table Cell    | Cell 2
Cell 3        | Cell 4
```

Table Header  | Second Header
------------- | -------------
Table Cell    | Cell 2
Cell 3        | Cell 4

<br/>

> Other table options are available; check online for the full range

---

## Debugging tip before we begin

* **Line number in error message -** note that this refers to the beginning of the code chunk containing the error, so shorter code chunks make debugging easier and are also easier to see in your document.

<br/>

<img src="https://cdn2.iconfinder.com/data/icons/color-svg-vector-icons-2/512/error_warning_alert_attention-512.png" width="30%" style="display: block; margin: auto;" />

---

## References

**Below are some reference materials you may use to further your learning journey.**

<br/>

* A workshop I gave earlier in the year summarising a lot of the learning resources in R: [https://youtu.be/GWbgDYRaPr8]
* A website introducing reproducible research: [http://ropensci.github.io/reproducibility-guide/]
* Another workshop, this time extending into version control: [http://kbroman.org/steps2rr/]
* A summary of the different output types you can generate in RMarkdown: [https://mdozmorov.github.io/BIOS567.2017/presentations/01a_Markdown/01e_Presentation.pdf]
* Another workshop on reproducible research, with links to examples of the different outputs available: [http://applied-r.com/project-reporting-template/]
* Remember, many R users share their code online, which makes learning new code easier (i.e. you can look at the final product and the code side by side to figure out the options needed for your document). The [**R-Ladies Global**](https://rladies.org) GitHub repositories can be found here, and you can download other repositories as a reference during your learning journey: [https://github.com/rladies]

---

## Let's do this!!

1. Create a file directory for your RMarkdown project.
2. Open RStudio, create a new project (assigned to the existing directory you just created), and create a new RMarkdown `HTML` file.
3. Go to [https://rpubs.com/R-LadiesPerth/creating-tables-rmarkdown] and follow along.

---

class: center, middle, inverse

# Thanks!

Dr Alyce Russell

Email: a.russell@ecu.edu.au

Twitter: [**@nerdrusty**](https://www.twitter.com/nerdrusty)

<br/>

R-Ladies Perth: Follow us on our [**Meetup Page**](https://www.meetup.com/rladies-perth/) or [**Twitter @RLadiesPerth**](https://www.twitter.com/RLadiesPerth)

<br/>

These slides were created via the R packages [**xaringan**](https://github.com/yihui/xaringan), [**knitr**](http://yihui.name/knitr), and [**R Markdown**](https://rmarkdown.rstudio.com), and draw on the array of uploads on the [**RLadies Global GitHub Repository**](https://github.com/rladies).
Practice data modified from [**Data Carpentry**](https://datacarpentry.org/lessons/#ecology-workshop).