

Dee Chiluiza, PhD
Northeastern University
Introduction to data analysis using R, R Studio and R Markdown
Short manual series: Basic codes for R Markdown


Important notes before you read the document:


R Markdown

In my opinion, R Markdown is one of the best tools for preparing professional reports while working on data analysis in R. If your purpose is pure data analysis, R Script files are a better choice because they let you focus on that task. Make sure you choose the right tool for the work you are performing. I like to use R Markdown in my Analytics classes for the following reasons:

Like anything related to coding, R Markdown can be as complicated as you make it, but remember that you are learning: focus on the basics. Once you understand this document, and as you gain more experience with code, you will be able to create complex presentations. For now, again, focus on the basics.

Follow this document by creating your own R Markdown document and practicing all the codes. Remember to knit your Rmd file on a regular basis to check the outcome document.

Let’s start!


Install R Markdown

If your R Studio does not have R Markdown, perform these tasks on the console:
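The console commands themselves are not reproduced in this copy of the manual. One common way to do it, offered here as a sketch rather than the author's exact steps, is to check whether the rmarkdown package is present and install it from CRAN only when it is missing. The object name rmarkdown_available and the CRAN mirror URL are choices made for this illustration.

```r
# Check whether the rmarkdown package is already installed
rmarkdown_available = requireNamespace("rmarkdown", quietly = TRUE)

# Install it from CRAN only when it is missing
# (the mirror below is an assumption; any CRAN mirror can be used)
if (!rmarkdown_available) {
  install.packages("rmarkdown", repos = "https://cloud.r-project.org")
}
```

After the installation finishes, R Markdown should appear as an option when you create a new file in RStudio.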

Visit the following page to learn more about R Markdown: https://rmarkdown.rstudio.com/

Click the Code icon on the right to access the list of libraries and data sets used in this document.

# {r, library_data, message=FALSE, warning=FALSE}

# Libraries used in this document
library(tidyverse)
library(ggplot2)
library(dplyr)
library(gridExtra)   # For grid.arrange()
library(grid)        # For grid tables
library(DT)          # For datatables
library(knitr) 

# Data sets used in this document
data("faithful")
data("mpg")
data("iris")
data("mtcars")


The YAML Header

At the top of your R Markdown document you will have a YAML header (John, n.d.). This header contains important information about the document you are creating.

# {r, fig.align = 'center', out.width="100%", out.height="100%"}
knitr::include_graphics("Images/RMk01.JPG") 

Figure 1. A. When you create a new R Markdown document from the top-left drop-down menu, make sure you select HTML as the Default Output Format. B. You will then obtain a document with basic codes. C. Knit the document to obtain the HTML outcome file.



My personal preference is to delete all these codes and start fresh with my own basic codes (figure below).

- 1. I remove title, author and date from the YAML header and create a title for my document using HTML codes.
- 2. I add code_folding: "hide" to the html_document output option.
- 3. I add a new code for the date; this code inserts the date on which the document is knitted, taken from your computer's date information. Time can also be added.

Remember: YAML headers can contain complex commands; we are covering only the basics at this point.
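Because the figure with the recommended codes is not reproduced in this copy, here is a minimal sketch of what such a starting file could look like, assuming the three steps listed above. It is written as R code that saves a small Rmd file to a temporary folder. The file name first_report.Rmd, the report title, and the placement of the date line in the body of the document are all assumptions made for illustration, not the author's exact codes.

```r
# Lines of a minimal Rmd file (hypothetical content for illustration)
rmd_lines = c(
  "---",
  "output:",
  "  html_document:",
  "    code_folding: hide",
  "---",
  "",
  "<CENTER><B><FONT SIZE=5>My first R Markdown report</FONT></B></CENTER>",
  "",
  "Knitted on: `r format(Sys.time(), '%d %B, %Y')`"
)

# Write the file to a temporary folder so nothing in your project is overwritten
rmd_file = file.path(tempdir(), "first_report.Rmd")
writeLines(rmd_lines, rmd_file)

# The same date expression, run directly, shows what the inline code will print
knit_date = format(Sys.time(), "%d %B, %Y")
print(knit_date)
```

To add the time as well, the format string could be extended, for example to "%d %B, %Y, %H:%M".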

# {r, fig.align = 'center', out.width="100%", out.height="100%"}
knitr::include_graphics("Images/RMk02.JPG") 
Figure 2. Recommended codes to start your first R Markdown experience.




R Chunks

You know how to use basic codes for your data analysis projects, and you probably are familiar with R Script files. In R Markdown, codes are prepared inside what is called “R Chunks.”
- Create your first R Chunk by going to the drop-down “Insert” Icon on top of your Rmd file and select R.

# {r, fig.align = 'center', out.width="100%", out.height="100%"}
knitr::include_graphics("Images/RMk03.JPG") 

Figure 3. A. insert drop-down menu. B. New R chunk. 1. Allows collapsing and expanding code contents, 2. the area where the codes are entered, 3. options menu, 4. Run R chunks above, 5. run the current R chunk.


A new R chunk contains only {r}. It can be filled with many commands from a long list of options. Here you will learn just a few to get you started:


- 1. You can enter a name for the chunk after the r; it is optional. Write the name after a space. Some examples: {r libraries}, {r plot1}, {r ANOVA_codes}, etc.
- 2. Use commas to separate the commands you enter after the r or after the name.
- 3. echo=FALSE will run the codes and present the outcomes in the HTML file, but not the codes. echo=TRUE will present the codes; it is the default, so you do not need to write it.
- 4. include=FALSE will run the codes but will not present the codes or the outcomes in the knitted file.
- 5. fig.align = 'center' will align any figure to the center of the page. It can be changed to 'left' or 'right'.
- 6. out.width="100%" and out.height="100%" control the size at which the figure is displayed. They can be changed to a different percentage.
- 7. fig.width = 6 and fig.height = 6 control the size of the printed figure in inches. They can be changed to different values.
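As a sketch of how several of these options can be combined, the example below follows the convention used elsewhere in this manual: the chunk header is shown as a comment, followed by the codes that would go inside the chunk. The chunk name waiting_hist and the choice of a histogram of the faithful data set are assumptions made for illustration.

```r
# Chunk header as it would be typed in the Rmd file (shown here as a comment,
# because the header itself is not R code):
# {r waiting_hist, echo=FALSE, fig.align='center', fig.width=6, fig.height=4}

# Body of the chunk: with echo=FALSE only the histogram appears in the
# knitted file, not these codes
waiting_hist = hist(faithful$waiting,
                    main = "Waiting time between eruptions",
                    xlab = "Waiting time in minutes",
                    col = "#2BD5D5")
```

Changing echo=FALSE to include=FALSE in the header would still run the codes, but neither the codes nor the histogram would appear in the knitted file.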

Note: Remember to knit your document to be sure there are no mistakes on your codes.


Activate libraries and import data

This is the way I like to activate libraries and insert the codes that import data into my R Markdown files; you can use the same strategy to start. Some people prefer to enter each library name wherever it is used for the first time.

I like to create an R chunk right below the YAML header. In this R chunk I enter the codes for every library I use in the document and the codes for all the data sets I import.
For this R chunk, I include message=FALSE and warning=FALSE inside the {r} code to prevent any messages or warnings from being presented in my knitted HTML file. The libraries are still activated and the data sets imported. You can see the code at the very beginning of this document, before the title. Here is the code I use:

{r library_data, message=FALSE, warning=FALSE}


Working with your data



Example # 1a

data1 = c(2, 5, 3, 6, 8, 12, 9, 25, 8, 10, 16)
mean(data1)
## [1] 9.454545
sd(data1)
## [1] 6.51711
sum(data1)
## [1] 104
sqrt((9/2)+(16/5))
## [1] 2.774887
This is a rather disorganized way to present data results. Observe example 1b, the inline R codes section, example 1c, and the Tables section for better organization ideas.



Example # 1b

Observe what happens when I give names to all codes and equations inside the R chunk.
Hint: only the R chunk is presented, not the data results. Now you control how those results are presented.
I used the R chunk below to create five objects: data1, data1_mean, data1_sd, data1_sum, and equation_1.
Again: click the Code icon on the right to see the codes inside the R chunk.
Note: I like to use the equal symbol (=) instead of the arrow (<-) to assign object names.

data1 = c(2, 5, 3, 6, 8, 12, 9, 25, 8, 10, 16)
data1_mean = mean(data1)
data1_sd = sd(data1)
data1_sum = sum(data1)
equation_1 = sqrt((9/2)+(16/5))


Inline R codes

Now for the fun part. In example 1b I created five different objects; now I can use inline R codes to present their values in the main text of my reports.
I do not need to repeat the codes since the objects were already created above.
Inline R codes call those values and present them in an organized manner.
For now, let’s focus on two ways to use inline R codes.

  1. Using only one back quote. This will present the values using the same format as the main text of your report.

  2. Using two back quotes. This will highlight the values.


This is what the two inline R codes look like when presenting the data1_mean value.

Remember: the difference between the two codes is that the first one produces regular text and the second one produces highlighted text.



Example # 1c

Let’s use inline R codes to present the values we produced above.
Note: the mean of data1 contains six decimals (9.454545); let’s use round() to reduce the number of decimals.
Recommendation: use round() ONLY when you are ready to present the data, never while you are performing your calculations or creating objects.
Observe the values below:

Here is how the inline R codes look on the original Rmd file:


Notice that I did not use round() on the equation value; therefore, it was presented with too many decimals. This is not considered good data presentation. It is good practice to control the number of decimals when presenting data.
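The inline codes for Example 1c are not reproduced in this copy, so the sketch below shows what single-back-quote inline codes might look like, assuming the objects from Example 1b. The sentences stored in inline_text are hypothetical wording, and here round() is applied to the equation value as well. The last two lines illustrate the recommendation above: rounding too early changes later calculations.

```r
# Objects from Example 1b
data1 = c(2, 5, 3, 6, 8, 12, 9, 25, 8, 10, 16)
data1_mean = mean(data1)
data1_sd = sd(data1)
data1_sum = sum(data1)
equation_1 = sqrt((9/2)+(16/5))

# Text as it might be typed in the body of the Rmd file (outside any R chunk).
# Each `r ...` piece is replaced by its value when the document is knitted.
inline_text = c(
  "The mean of data1 is `r round(data1_mean, 2)`.",
  "The standard deviation is `r round(data1_sd, 2)`.",
  "The sum is `r data1_sum`.",
  "The equation result is `r round(equation_1, 2)`."
)
writeLines(inline_text)

# Values the reader would see after knitting
round(data1_mean, 2)   # 9.45
round(data1_sd, 2)     # 6.52
round(equation_1, 2)   # 2.77

# Why rounding should wait until presentation: rebuilding the sum from a
# rounded mean drifts away from the true sum of 104
round(data1_mean, 2) * length(data1)   # 103.95
data1_mean * length(data1)             # 104
```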

Presenting JPG or PNG images


Images can be inserted in your R Markdown documents. In the section “Creating an R Project,” I recommended creating at least two folders inside your R project folder: a folder named DataSets and a folder named Images.

If you produce an image using another program (for example, PowerPoint or Photoshop), or if you want to present one of your photographs, drawings, or an image you found on the Internet (remember to cite the source), save those images as JPG or PNG files inside the “Images” folder.

Once you have an image in the “Images” folder, you can present it in your Rmd report using either of the following two strategies:

Option 1. Enter the code in an R chunk and call the image from inside that same R chunk.
It is very simple: use the function include_graphics() from the knitr library, written as knitr::include_graphics(). The code to insert the image of my dog Bruno is presented below. Notice the use of fig.cap=" " to enter a caption for the figure. Also notice the use of the folder name “Images/” to locate the picture of my dog.

# R chunk code used: {r, fig.align = 'right', out.width="50%", out.height="50%", 
# fig.cap="Picture 1. Bruno image at 50%"} 

knitr::include_graphics("Images/bruno.JPG") 
Picture 1. Bruno image at 50%



Option 2. Using inline R codes. Remember that inline R codes are not used inside R chunks, they are written in the main text of your report.
Here is an inline R code and the image presented.

Bruno



Presenting Graphs


Option 1. Presenting one figure.

Simply, write the codes for your graphs inside the R chunk. My preference is to create a graph using an object name (in this case I call them graph_1 and graph_2).

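# Scatter plot of eruption time against waiting time.
# Note: plot() draws the figure immediately and returns NULL,
# so the name graph_1 works as a label rather than a stored figure.
graph_1 =  plot(faithful$eruptions ~ faithful$waiting,
                las=1,
                ylab = "Eruption time in minutes",
                xlab = "Waiting time in minutes",
                col = "#A11515")
# Add the fitted regression line
abline(lm(faithful$eruptions ~ faithful$waiting), col="blue")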



Option 2. Presenting two figures. If you use basic graphs, use the par(mfrow=c(,)) or par(mfcol=c(,)) codes. By basic graphs, I mean plot(), barplot(), hist(), etc. Compare to option 3.

# par code to present figures using 1 row and 2 columns. You can present 4 figures using (2,2)
par(mfrow=c(1,2))

# Figure 1 
graph_1 =  plot(faithful$eruptions ~ faithful$waiting,
                las=1,
                ylab = "Eruption time in minutes",
                xlab = "Waiting time in minutes",
                col = "#A11515")
abline(lm(faithful$eruptions ~ faithful$waiting), col="blue")

# Figure 2
graph_2 =  boxplot(faithful$eruptions,
                   las=1,
                   col = "#2BD5D5",
                   ylab = "Eruption time in minutes")
points(mean(faithful$eruptions), 
       pch = 18, 
       col = "#F20909", 
       lwd = 7)
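The comment in the code above mentions that four figures can be presented using (2,2). A minimal sketch of that layout follows, assuming the same faithful data set. The choice of the four graphs and the object name old_par are illustrative. Saving the previous settings and restoring them at the end is a common habit so that later figures return to one per page.

```r
# Save the current layout settings and switch to 2 rows x 2 columns
old_par = par(mfrow = c(2, 2))

# Four basic graphs fill the grid row by row
plot(faithful$eruptions ~ faithful$waiting, las = 1,
     xlab = "Waiting time in minutes", ylab = "Eruption time in minutes")
hist(faithful$eruptions, las = 1, main = "", xlab = "Eruption time in minutes")
hist(faithful$waiting, las = 1, main = "", xlab = "Waiting time in minutes")
boxplot(faithful$waiting, las = 1, ylab = "Waiting time in minutes")

# Restore the previous layout so later figures are drawn one at a time again
par(old_par)
```

Using par(mfcol = c(2, 2)) instead would fill the grid column by column.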



Option 3. Presenting two figures prepared using ggplot2. In this case, you need to use grid.arrange() from library(gridExtra).

ggd1 = iris %>%
        ggplot(aes(x=Petal.Width, 
                   y=Petal.Length)) +
        # a fixed color belongs in the geom; ggplot() itself ignores it
        geom_point(alpha=0.4, color="#66DACA") + 
        geom_smooth(method = lm)
ggd2 = iris %>%
        ggplot(aes(x=Petal.Width, 
                   y=Petal.Length,
                   color=Species)) +
        geom_point(alpha=0.4) + 
        geom_smooth(method = lm) + 
        geom_density2d()


grid.arrange(ggd1, ggd2, nrow=2, ncol=1)



Presenting Tables

Although there are many ways to present tables in R Markdown, an easy way to start (and enough for our class reports) is to use the knitr::kable() code. Observe the code below:

table_1 = head(iris)
table_2 = tibble(rbind(head(mtcars,2), tail(mtcars,2)))

Using the object names we created (table_1 and table_2 in this case), we can present the tables from inside the R chunk or using inline R codes; let’s practice with inline R codes.

Table 1 is presented using inline R code: ` r knitr::kable(table_1)`.

Table 2 is presented using inline R code: ` r knitr::kable(table_2, full_width = T)`.

Table 1.

Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

Table 2

mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
21.4 4 121 109 4.11 2.780 18.60 1 1 4 2


Enter additional spaces in a line.

Use the backslash (\) followed by a space. Use it as many times as needed:

Used 0 times: Keep the social distance.
Used 1 times: Keep the social  distance.
Used 2 times: Keep the social   distance.
Used 3 times: Keep the social    distance.
Used 6 times: Keep the social       distance.
Used 12 times: Keep the social             distance.

Enter comments that will not appear on the outcome file.

To enter a comment that you need in your Rmd file but do not want to present in your HTML report, use the following format:
<!-- Comment -->


Enter Internet Links

To enter Internet links, enter the name to display in square brackets and the link in parentheses.

For example: [Name to display](Internet Link).
Below there is a link to an R Markdown manual; notice that the Internet link itself is not displayed.

R Markdown: The Definitive Guide.


Codes to prepare your text

There are several options to format the main text of your report. In this case, you will use HTML codes (Moraes, 2020).

• Observe that all the HTML codes you will use here are surrounded by <>.
• An opening code indicates where the format begins; you need a closing code </> to indicate where it ends.

Here are the most common codes:

• To make your text bold:
Code: <B>My text is bold</B>
Result: My text is bold

• To italicize your text:
Code: <I>My text is italicized</I>
Result: My text is italicized

• To underline your text:
Code: <U>My text is underlined</U>
Result: My text is underlined

• To center your text, this code can also be used to center images, tables, etc.:
Code: <CENTER>My text is at the center</CENTER>
Result:

My text is at the center

• To change the size of your text:
Code: <FONT SIZE = 1>My text is size = 1</FONT>

My text is size = 1

My text is size = 2

My text is size = 4


Codes to add colors to your text

• To change the color of your text, you have several options; we will review two for now.

Option 1
- One option is to use the name of the color, in quotation marks: "blue", "red", "pink", etc.
<FONT COLOR = "blue">My text is blue</FONT>
My text is blue
My text is red
My text is pink

Option 2
- Another option is to use the RGB palette codes, sometimes referred to as the hexadecimal RGB color specification.
Check the following website to find the codes:


Taken from: https://www.rapidtables.com/web/color/RGB_Color.html

e.g., <FONT COLOR = "#55B4E4">My text has a new color</FONT> : My text has a new color

Combine color and size
Observe the following example:

Code: <FONT SIZE=3 COLOR = "#A11515">My text has a new color and is size 3</FONT>
Result: My text has a new color and is size 3

Combine FONT, B, and other codes
Several codes can be combined to provide your text with different formats.
It is important to remember to close all commands you use. Observe the following examples:

Code: <B> <FONT SIZE=2 COLOR = "#119A1F">My text is size 2, bold and green</FONT> </B>
Result: My text is size 2, bold and green


How to remove the border from the white output boxes

To complete this task, enter the following code at the beginning of your document, or where you prefer to start hiding codes.

Note: this code goes in the main text of the document, not inside R chunks, and it does not appear in the HTML document.

2+2*12
## [1] 26
This is a JPG image of the code:


Conclusions and final remarks

R Markdown is a very useful tool for preparing codes and, at the same time, preparing documents, presentations, and interactive reports (Holtz, 2018). It has been available since 2012, when it was introduced as part of the knitr package (Xie, 2015; Xie, 2021).
As a new student in data analysis, you should try to incorporate R Markdown as one of your professional tools. Its basic structure is easy to understand, and very soon you will be able to start producing beautiful professional documents.
Preparing your reports in R and then passing figures, tables, data results, etc. to word processing programs is a time-consuming task. Using R Markdown, you can do it all at the same time.
R Markdown allows you to produce not only HTML but also Word and PDF documents. As mentioned at the beginning, this manual was created to help my students in the Introduction to Analytics and Probability Theory classes. You are learning both analytics concepts and programming in R. In that context, this manual presents just some basic concepts and codes to get you started producing basic but beautiful reports in HTML format. Like any other coding tool, R Markdown and the HTML language can be as complex as you can imagine. Start by learning these concepts and codes; later on, with continuous use, you will become familiar with other codes and coding strategies, adding many more presentation possibilities to your professional reports.


References and additional recommended reading materials:


Disclaimer: This short manual series is a work in progress. Unless otherwise clearly stated, this material is considered to be a draft version.


Dee Chiluiza, PhD
10 January, 2022
Boston, Massachusetts, USA
