
About R-Markdown

The programming language R is, fundamentally, a language designed for statisticians and data scientists to analyze data. One of the main benefits of using a programming language for data analysis is that, by saving all the instructions given to the computer in code files, we can make our analysis replicable and easy to share with others. This is one of the biggest advantages of doing data analysis in R.

Despite these advantages, analyzing data with code has a small drawback: code is not a particularly friendly interface for humans. Code files are helpful because they give us a way to communicate complex operations to the computer. Even so, code is not the best place to read explanations of each step or to comment on the results and findings of an analysis.

Often, when analyzing data, practitioners perform a couple of steps and then decide on the next step based on the outputs obtained so far. Statisticians often choose how to proceed based on what the data looks like:

  1. after a couple preliminary visualizations,
  2. after the data has been cleaned (removing missing observations, eliminating redundant variables),
  3. after doing some data processing (creating auxiliary new variables, applying data transformations),
  4. after generating descriptive statistics and doing data visualizations (histograms, boxplots, scatterplots, etc.),
  5. after some analysis plans failed to work.

Data analysis is inherently incremental. In each step, we perform some operations or visualizations on our data. Then, we make choices on how to move forward based on the newly gathered information.
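The incremental steps listed above can be sketched in a few lines of R. This is only an illustration, not part of the course material: it assumes the built-in airquality dataset, and the names clean and TempC are made up for the example.

```r
# Step 1: a quick first look at the raw data (airquality ships with base R).
data(airquality)
print(summary(airquality$Ozone))

# Step 2: clean the data by dropping rows with missing observations.
clean <- na.omit(airquality)

# Step 3: process the data by creating an auxiliary variable
# (temperature converted from Fahrenheit to Celsius).
clean$TempC <- (clean$Temp - 32) * 5 / 9

# Step 4: descriptive statistics and a visualization to guide the next choice.
print(mean(clean$TempC))
hist(clean$Ozone, main = "Ozone after cleaning", xlab = "Ozone (ppb)")
```

After each step, the printed output or the plot is what informs the decision about what to do next.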

To give data scientists a better interface for their analysis, and to support step-by-step work, R allows you to create notebooks that combine text with code. These are known as R-Markdown documents. Notebook documents of this type are saved with the .Rmd extension, as in document.Rmd.

To clarify further, R allows programming in two types of files:

  * “file.R” (source code): can only contain code and comments.
  * “file.Rmd” (R-Markdown notebook): mostly text, with “chunks” of code that can be run. Useful for step-by-step work.
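To make the difference concrete, here is a rough sketch that writes one small file of each type to a temporary folder. The file names, the title, and the contents are all hypothetical. The chunk delimiter (three backticks) is built with strrep() only to keep the example easy to copy.

```r
# Three backticks open and close a chunk in an .Rmd file.
fence <- strrep("`", 3)

# Lines of a minimal, hypothetical R-Markdown notebook:
# a header, a title, some formatted text, and one chunk of code.
rmd_lines <- c(
  "---",
  "title: \"My first notebook\"",
  "output: html_document",
  "---",
  "",
  "## A main title",
  "",
  "Plain text with **bold** and *italic* words.",
  "",
  paste0(fence, "{r}"),
  "x <- 5",
  "print(x + 5)",
  fence
)
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# An .R script, by contrast, holds only code and comments.
r_path <- file.path(tempdir(), "example.R")
writeLines(c("# Add five to x", "x <- 5", "print(x + 5)"), r_path)

# If the rmarkdown package is installed, the notebook could be knitted with:
# rmarkdown::render(rmd_path)
```

Opening both files in RStudio shows the contrast: the .R file is code from top to bottom, while the .Rmd file is mostly text with the code fenced off in a chunk.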

In this document, we provide the basics of R-Markdown for writing, coding, and knitting documents. We will teach the basics needed for our course. For a more comprehensive treatment of R-Markdown, check out other resources such as R Markdown: The Definitive Guide.

All classes this semester will be taught through R-Markdown documents, so it is important that you become familiar with this type of document.

How R-Markdown works

As you can see, an R-Markdown document is like a text document in which you can write. While the format of this document might seem minimal, you can change the style of the text, such as font, size, titles, and orientation, with simple commands. You will not see all of this formatting while editing the document, but you can see a formatted version by knitting your R-Markdown file. Some examples of formatting are:

  1. Bold text can be added with double asterisks **like this**.
  2. Italic text can be added in two ways: _like this_ and *like this*.
  3. We can add subscripts with the tilde ~ sign: H~3~PO~4~ turns into H₃PO₄.
  4. Hyperlinks can be added to text with the [text](link) syntax. For example, you can find an R-Markdown tutorial here.
  5. You can add code-type text in your writing using backticks, `like this`.

In addition to writing text, you can create titles and subtitles for your text:

  1. A main title is written with two hashtags: ##
  2. A subtitle is written with three hashtags: ###
  3. A sub-subtitle uses four hashtags: ####

Discussion Assignment 1

Due: 01/10/25

by Paul Plecnik

In addition to writing formatted text, you can also write code. This is done by creating a “chunk” of code.

## Printing Hello world!
char <- "Hello world!"
print(char)
## [1] "Hello world!"
## Operating on numerical variables.
x <- 5
print(x + 5)
## [1] 10
print(x+5)
## [1] 10

As you can see, you can run these lines of code by copying and pasting them into the R console in the bottom-left pane. You can also run a chunk by clicking the green “play” button in its upper-right corner.


Note: To run multiple lines of code in RStudio, select the lines and press CTRL + ENTER.

To run an entire R code file, or a chunk of code, press CTRL + SHIFT + ENTER.


Try using CTRL + ENTER and CTRL + SHIFT + ENTER in the following chunk.

## I ran this line with only my keyboard.
print("I ran this line with only my keyboard.")
## [1] "I ran this line with only my keyboard."
## I ran this chunk with my keyboard.
print("I ran this chunk with my keyboard.")
## [1] "I ran this chunk with my keyboard."

In addition to running code, you can create plots, which are automatically included in your output documents. This helps a lot, since you can create a plot in a chunk and then discuss its findings in the text below.

## Exponential graph is created here.

x <- seq(-5,5, length.out = 100) 
y <- 2.71^x

plot(x,y, main = "Exponential Graph", type = "l", col= "red")
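When a plot needs the natural exponential, R's built-in exp() function evaluates e^x directly, which avoids typing a rounded base by hand. The sketch below compares the two on the same grid of x values. The names y_exact, y_rounded, and gap are made up for this illustration.

```r
x <- seq(-5, 5, length.out = 100)
y_exact <- exp(x)      # e^x computed by R
y_rounded <- 2.71^x    # e rounded to two decimals

# Largest vertical gap between the two curves on this grid.
gap <- max(abs(y_exact - y_rounded))
print(gap)

# Draw both curves: solid red for exp(x), dashed blue for 2.71^x.
plot(x, y_exact, type = "l", col = "red", main = "exp(x) versus 2.71^x")
lines(x, y_rounded, col = "blue", lty = 2)
```

The two curves nearly overlap for small x and drift apart as x grows, because the rounding error in the base is raised to the power x.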

Without further instruction, chunks show both their code and its output. When writing formal reports, this may not be desirable. To improve the appearance of your document, you can:

  1. prevent the code from showing with echo=FALSE;
  2. modify the plot dimensions with fig.height and fig.width (measured in inches);
  3. add a caption describing the figure with fig.cap;
  4. change the alignment of the plot with fig.align.
## Parabola is created here using sequence function. 
## Also used the Plot function to create a graph.
## Color is red and main is parabola

x <- seq(-5,5, length.out = 100) 
y <- x^2
plot(x = x, y = y, type = "l", col = "red", main = "Parabola")

Figure 2: Plot of a parabola.
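The knitted output does not show chunk headers, so here is a sketch of how the options listed above could be attached to the parabola chunk. In the .Rmd source, the header shown in the comment sits right after the three opening backticks of the chunk. The label parabola and the 6 x 4 inch size are assumptions, not values taken from the original file.

```r
# Hypothetical chunk header, written on a single line in the .Rmd source:
#
#   {r parabola, echo=FALSE, fig.width=6, fig.height=4, fig.align="center", fig.cap="Figure 2: Plot of a parabola."}
#
# Body of the chunk. With echo=FALSE, the knitted document shows
# the figure and its caption but not these lines of code.
x <- seq(-5, 5, length.out = 100)
y <- x^2
plot(x = x, y = y, type = "l", col = "red", main = "Parabola")
```

Options are separated by commas, and text values such as the caption go inside quotes.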

Exercise 1: Knitting your first R-Markdown document

A) Below this, create a chunk of code. In the chunk, create a plot of the exponential function y = e^x for x between -5 and 5. It must have the following features:

  1. The plot should be 4 inches wide and 5 inches high.
  2. Your code should be hidden.
  3. The plot should be of type line (i.e. type = "l"), and the line should be of color "red".
  4. The title of the plot should be Exponential Function.
  5. The caption of the plot should be “Figure 3: Visualization of the natural exponential function from -5 to 5”.

B) Knit your file to both PDF and HTML. If you have knitting issues, ask for help. To knit to PDF, you need to have the R package tinytex installed on your computer. If knitting is not working and your code has no errors, do the following:

  1. Go to the bottom-right pane and click on the Packages tab.
  2. Use the search function to check whether the tinytex package is on the list. If it is not, it is not yet installed.
  3. Go to the Console in the bottom-left pane and run the line of code install.packages("tinytex").
  4. Try knitting a couple of documents to verify that the installation worked.

install.packages("tinytex")
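The check in steps 1 and 2 can also be done from code. This is a minimal sketch in base R, where is_installed is a made-up helper name. It only reports whether the package is present and does not install anything.

```r
# Returns TRUE when the package can be loaded, FALSE otherwise.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("tinytex")) {
  message("tinytex is already installed.")
} else {
  message("tinytex is missing; run install.packages(\"tinytex\") in the Console.")
}

# Knitting to PDF also needs a LaTeX distribution. Once the package is
# installed, running tinytex::install_tinytex() by hand downloads TinyTeX.
# It is left as a comment here because it is a one-time, manual step.
```

If knitting to PDF still fails after the package is installed, a missing LaTeX distribution is a likely cause to check next.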