R Markdown

Intro to R

Every programming language has similar elements. We need to input data, identify variables, perform calculations, create functions, control program flow, and output data and graphics. The R language is attractive in that it has it has a full online development environment -RStudio Cloud- and built in functions for regression analysis, solving equations, and producing graphics. R is increasingly used for those involved in data analysis and statistics, particularly social sciences, bio-statistics, and medicine. References on R tend to emphasize different features of R more suitable for social sciences, data mining, ecology, and medicine. A good reference for scientists using numerical methods is available1, and an Analytical Chemistry textbook2 that provides examples in R is available. This manual will present the elements of R gradually, as needed, mostly through chemical data examples. This document itself is created with RMarkdown, which integrates a document creation with R code.

R in RStudio Cloud

We will write our programs on a online platform (an IDE) called Rstudio Cloud (RStudio Cloud is being renamed Posit Cloud), which makes it platform independent - we access the program the same way on a PC, Mac, or Chromebook. The Rstudio Cloud environment is divided into 4 main sections, with the most important elements listed below.

  • Top left: Script - where you write R code
  • Bottom left: Console - show output
  • Top right: Environment - we will mostly ignore this
  • Bottom right: show Plots, help, packages

To start writing a program in Rstudio Cloud:

create a Space (like a file folder that can contain related programs) , select New project, and from the menu:
File ——> Newfile —–> Rscript

& start typing!

To run a program, highlight the code and select “run”.

You can create multiple Projcts and R scripts in a single Space. Spaces are displayed on the left and you can create multiple spaces. In the tools icon (upper right) spaces can be shared.

When you run a script program, results and error messages will appear on the console, and plots appear on the plot area.

In this document, the shaded area is R code, and the output appears in a white background following thr shaded R sceipt. The theory is described in the text, and the details of R code are described in the code comment (text preceded by #)

Vectors & Numerics

In R we use “<-” to set a variable to a value. R uses “vector” calculation, which avoids use of loops in more traditional programming. For instance, below we set x equal to a series of values (a vector), and then calculate a series of values of y.

#  A comment line 

x <- 5.0  
#  Set x equal to 5.0

y <- x^2
# y is equal to x squared. 

x <- c(1.0, 2.0, 3.0)
# x is equal to a numbered list of values - a “vector” 

x <- seq(1,2,0.2)      
# create an incremented sequence

xx <-matrix(c(2,4,6,8),2,2)
# a matrix row 1 = 2 6 row 2 = 4 8
#  length(x) returns the length of the vector x.
#  x returns all the values of x

length(x)    
## [1] 6
x           
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
#  length(x) returns the length of the vector x.
#  x returns all the values of x
xx
##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8
#  exponents and logs

10^(-2) 
## [1] 0.01
# power of 10

exp(1)
## [1] 2.718282
#  power of 2.718282 .....

log(10)  
## [1] 2.302585
# base e

log10(10)   # base 10
## [1] 1

Once we have a series of x and y, of equal length, we can easily create a graph.

x   <-  seq(0,10,0.5) 
# a sequence from 1 to 10, increments 0f 0.5

y  <-   x^2 
#  Note that y is calculated for every x.
#  This is called vectorized.

plot(x,y)  

# Create a plot. We can add a lot of formatting!

Below is an example of basic formatting commands.

plot(x,y,type = "b",main = "A Formatted Graph",
col = "darkblue", xlab = "X Label", ylab = "Y Label")

type=“p”: for points (by default) type=“l”: for lines type=“b”: for both; points are connected by a line

Functions

We can define a function and evaluate it later in the R script.

Afunc <- function(x)  {x^2 + log(x)}

y <- c(1,2,3)

Afunc(y)
## [1]  1.000000  4.693147 10.098612
# note the vectorized evaluation 

Reading and Writing Data Files

With larger data sets, as in a titration, we prefer to be able to read a data file, such as output from a spreadsheet. A common filetype is csv (comma separated values) files. Usually, the first row contains the headers, and them remaining rows the values. If you upload a csv file into your working directory, they can be read into Dataframes. Dataframes are columns of values that do not have to be of the same type (Names, dates, income, etc), and they are very important in many R applications. However, here’ we are focusing on vectors, and we can convert a dataframe with two or more numeric columns (such as volume and pH) into respective vectors. The data file “mydat.csv” looks like this:

vol, pH
1.5 , 3.2
3.0, 6.6
7.0, 14.0

# Mydata <- read.csv("mydat.csv")
# a file with headers vol and pH 

#  Mydata 
#  it's a dataframe

#  volume <- Mydata$vol  
# extract column vol as vector

#   pH <- Mydata$pH       
# extract column pH as vector 

#  volume 

#  pH

#  and to write a vector file

# write(volume,file="newfile")
#  Exponential fit, US population growth 1790 t0 1850

year <- c(0,10,20,30,40,50,60)  
length(year)
## [1] 7
pop <-  c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)
length(pop)
## [1] 7
#   A = Ai * exp(-k*t)
tryfit <- nls(pop ~ alpha * exp(k*year) ,  
              start = c(alpha = 2.0, k = 0.05))

plot(year,pop,main = "US Population Growth, 1790-1850", xlab = "Starting 1790", ylab = "Population / Million")
lines(year,predict(tryfit),col="red")

summary(tryfit)
## 
## Formula: pop ~ alpha * exp(k * year)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## alpha 3.9744064  0.0407277   97.58 2.14e-09 ***
## k     0.0293421  0.0002023  145.02 2.96e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09817 on 5 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 5.046e-08
#   us population logistic  

year <- seq(0,150, by = 10)    
year
##  [1]   0  10  20  30  40  50  60  70  80  90 100 110 120 130 140 150
pop <-  c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
  50.156, 62.948, 75.996, 91.97,  105.711, 122.775, 131.66)


length(year)
## [1] 16
length(pop)
## [1] 16
pop
##  [1]   3.929   5.308   7.240   9.638  12.866  17.069  23.192  31.443  38.558
## [10]  50.156  62.948  75.996  91.970 105.711 122.775 131.660
Pi <- 3.929     #  Initial Population

tryfit <- nls((pop ~   (K*Pi)/(Pi + (K - Pi)*exp(-r*year))), 
                start = list( K = 155,  r = 0.03))

summary(tryfit)
## 
## Formula: pop ~ (K * Pi)/(Pi + (K - Pi) * exp(-r * year))
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## K 1.882e+02  2.967e+00   63.44   <2e-16 ***
## r 3.158e-02  1.615e-04  195.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9336 on 14 degrees of freedom
## 
## Number of iterations to convergence: 3 
## Achieved convergence tolerance: 1.947e-06
length(predict(tryfit))
## [1] 16
x <-   predict(tryfit)
x 
##  [1]   3.929000   5.346405   7.255018   9.808444  13.195035  17.635352
##  [7]  23.370397  30.635572  39.616558  50.388586  62.850930  76.680934
## [13]  91.337289 106.129855 120.342752 133.367268
r <- 0.029
xpo <- 3.97 *  exp(r* year)
xpo
##  [1]   3.970000   5.305617   7.090573   9.476036  12.664035  16.924565
##  [7]  22.618453  30.227923  40.397427  53.988232  72.151357  96.425057
## [13] 128.865097 172.218857 230.158015 307.589498
plot(year,pop,main = "US Population Growth",xlab = "Year, 1790 to 1940", ylab = "Population / Millions")
# plot(year,pop)
  lines(year,xpo,col = "blue")
  lines(year,predict(tryfit), col = "red")

  plot(year,residuals(tryfit))

  residuals(tryfit)
##  [1]  0.00000000 -0.03840518 -0.01501772 -0.17044402 -0.32903454 -0.56635183
##  [7] -0.17839749  0.80742797 -1.05855752 -0.23258554  0.09706985 -0.68493355
## [13]  0.63271051 -0.41885454  2.43224833 -1.70726803
## attr(,"label")
## [1] "Residuals"
  #  Bacteria Growth, Exponential
  
  
  minutes <- seq(0,64, by = 16)  
  length(minutes)
## [1] 5
  pop <-  c(0.022,0.036,0.060,0.101,0.169)
  length(pop)
## [1] 5
  tryfit <- nls(pop ~ alpha * exp(k*minutes) ,  
                  start = c(alpha = 0.05, k = 0.05))
  
  plot(minutes,pop,main = "Bacterial Density", xlab = "Minutes", ylab = "Relative Population Density")
  lines(minutes,predict(tryfit),col="red")

  summary(tryfit)
## 
## Formula: pop ~ alpha * exp(k * minutes)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## alpha 0.0215382  0.0001423   151.3 6.36e-07 ***
## k     0.0321848  0.0001155   278.6 1.02e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0003288 on 3 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 1.927e-08
  #    Logistic 
  
  minutes <- seq(0,160, by = 16)    
  minutes
##  [1]   0  16  32  48  64  80  96 112 128 144 160
  pop <-  c(0.022,0.036,0.060,0.101,0.169,0.266,0.360,0.510,0.704,0.827,0.928)
  
  
  length(minutes)
## [1] 11
  length(pop)
## [1] 11
  pop
##  [1] 0.022 0.036 0.060 0.101 0.169 0.266 0.360 0.510 0.704 0.827 0.928
  Pi <- 0.022     #  Initial Population
  
  tryfit <- nls((pop ~   (K*Pi)/(Pi + (K - Pi)*exp(-r*minutes))), 
                  start = list( K = 1.5,  r = 0.03))
  
  summary(tryfit)
## 
## Formula: pop ~ (K * Pi)/(Pi + (K - Pi) * exp(-r * minutes))
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## K 1.1653827  0.0275806   42.25 1.16e-11 ***
## r 0.0335128  0.0003406   98.39 5.87e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01226 on 9 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 3.757e-06
  length(predict(tryfit))
## [1] 11
  x <-   predict(tryfit)
  x 
##  [1] 0.02200000 0.03711208 0.06204156 0.10220017 0.16447787 0.25558274
##  [7] 0.37808856 0.52540371 0.68050449 0.82254397 0.93694239
  r <- 0.032
  xpo <- 0.0215 *  exp(r* minutes)
  xpo
##  [1] 0.02150000 0.03587544 0.05986266 0.09988834 0.16667619 0.27812007
##  [7] 0.46407814 0.77437243 1.29213728 2.15609272 3.59771045
  plot(minutes,pop,main = "Bacterial Growth",xlab = "Minutes", ylab = "Relative Population Density")
  # plot(year,pop)
  lines(minutes,xpo,col = "blue")
  lines(minutes,predict(tryfit), col = "red")

  #  ********************************************************
 
  
  #  Simulate population density

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.


  1. O Jones, R Maillardet, A Robinson. Introduction to Scientific Programming and Simulation Using R, CRC Press (2009)↩︎

  2. D. Harvey. Analytical Chemistry↩︎