Every programming language has similar elements. We need to input data, identify variables, perform calculations, create functions, control program flow, and output data and graphics. The R language is attractive in that it has it has a full online development environment -RStudio Cloud- and built in functions for regression analysis, solving equations, and producing graphics. R is increasingly used for those involved in data analysis and statistics, particularly social sciences, bio-statistics, and medicine. References on R tend to emphasize different features of R more suitable for social sciences, data mining, ecology, and medicine. A good reference for scientists using numerical methods is available1, and an Analytical Chemistry textbook2 that provides examples in R is available. This manual will present the elements of R gradually, as needed, mostly through chemical data examples. This document itself is created with RMarkdown, which integrates a document creation with R code.
We will write our programs on a online platform (an IDE) called Rstudio Cloud (RStudio Cloud is being renamed Posit Cloud), which makes it platform independent - we access the program the same way on a PC, Mac, or Chromebook. The Rstudio Cloud environment is divided into 4 main sections, with the most important elements listed below.
To start writing a program in Rstudio Cloud:
create a Space (like a file folder that can contain
related programs) , select New project, and from the
menu:
File ——> Newfile —–> Rscript
& start typing!
To run a program, highlight the code and select “run”.
You can create multiple Projcts and R scripts in a single Space. Spaces are displayed on the left and you can create multiple spaces. In the tools icon (upper right) spaces can be shared.
When you run a script program, results and error messages will appear on the console, and plots appear on the plot area.
In this document, the shaded area is R code, and the output appears in a white background following thr shaded R sceipt. The theory is described in the text, and the details of R code are described in the code comment (text preceded by #)
In R we use “<-” to set a variable to a value. R uses “vector” calculation, which avoids use of loops in more traditional programming. For instance, below we set x equal to a series of values (a vector), and then calculate a series of values of y.
# A comment line
x <- 5.0
# Set x equal to 5.0
y <- x^2
# y is equal to x squared.
x <- c(1.0, 2.0, 3.0)
# x is equal to a numbered list of values - a “vector”
x <- seq(1,2,0.2)
# create an incremented sequence
xx <-matrix(c(2,4,6,8),2,2)
# a matrix row 1 = 2 6 row 2 = 4 8
# length(x) returns the length of the vector x.
# x returns all the values of x
length(x)
## [1] 6
x
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
# length(x) returns the length of the vector x.
# x returns all the values of x
xx
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
# exponents and logs
10^(-2)
## [1] 0.01
# power of 10
exp(1)
## [1] 2.718282
# power of 2.718282 .....
log(10)
## [1] 2.302585
# base e
log10(10) # base 10
## [1] 1
Once we have a series of x and y, of equal length, we can easily create a graph.
x <- seq(0,10,0.5)
# a sequence from 1 to 10, increments 0f 0.5
y <- x^2
# Note that y is calculated for every x.
# This is called vectorized.
plot(x,y)
# Create a plot. We can add a lot of formatting!
Below is an example of basic formatting commands.
plot(x,y,type = "b",main = "A Formatted Graph",
col = "darkblue", xlab = "X Label", ylab = "Y Label")
type=“p”: for points (by default) type=“l”: for lines type=“b”: for both; points are connected by a line
We can define a function and evaluate it later in the R script.
Afunc <- function(x) {x^2 + log(x)}
y <- c(1,2,3)
Afunc(y)
## [1] 1.000000 4.693147 10.098612
# note the vectorized evaluation
With larger data sets, as in a titration, we prefer to be able to read a data file, such as output from a spreadsheet. A common filetype is csv (comma separated values) files. Usually, the first row contains the headers, and them remaining rows the values. If you upload a csv file into your working directory, they can be read into Dataframes. Dataframes are columns of values that do not have to be of the same type (Names, dates, income, etc), and they are very important in many R applications. However, here’ we are focusing on vectors, and we can convert a dataframe with two or more numeric columns (such as volume and pH) into respective vectors. The data file “mydat.csv” looks like this:
vol, pH
1.5 , 3.2
3.0, 6.6
7.0, 14.0
# Mydata <- read.csv("mydat.csv")
# a file with headers vol and pH
# Mydata
# it's a dataframe
# volume <- Mydata$vol
# extract column vol as vector
# pH <- Mydata$pH
# extract column pH as vector
# volume
# pH
# and to write a vector file
# write(volume,file="newfile")
# Exponential fit, US population growth 1790 t0 1850
year <- c(0,10,20,30,40,50,60)
length(year)
## [1] 7
pop <- c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)
length(pop)
## [1] 7
# A = Ai * exp(-k*t)
tryfit <- nls(pop ~ alpha * exp(k*year) ,
start = c(alpha = 2.0, k = 0.05))
plot(year,pop,main = "US Population Growth, 1790-1850", xlab = "Starting 1790", ylab = "Population / Million")
lines(year,predict(tryfit),col="red")
summary(tryfit)
##
## Formula: pop ~ alpha * exp(k * year)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 3.9744064 0.0407277 97.58 2.14e-09 ***
## k 0.0293421 0.0002023 145.02 2.96e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09817 on 5 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 5.046e-08
# us population logistic
year <- seq(0,150, by = 10)
year
## [1] 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150
pop <- c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
50.156, 62.948, 75.996, 91.97, 105.711, 122.775, 131.66)
length(year)
## [1] 16
length(pop)
## [1] 16
pop
## [1] 3.929 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558
## [10] 50.156 62.948 75.996 91.970 105.711 122.775 131.660
Pi <- 3.929 # Initial Population
tryfit <- nls((pop ~ (K*Pi)/(Pi + (K - Pi)*exp(-r*year))),
start = list( K = 155, r = 0.03))
summary(tryfit)
##
## Formula: pop ~ (K * Pi)/(Pi + (K - Pi) * exp(-r * year))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## K 1.882e+02 2.967e+00 63.44 <2e-16 ***
## r 3.158e-02 1.615e-04 195.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9336 on 14 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 1.947e-06
length(predict(tryfit))
## [1] 16
x <- predict(tryfit)
x
## [1] 3.929000 5.346405 7.255018 9.808444 13.195035 17.635352
## [7] 23.370397 30.635572 39.616558 50.388586 62.850930 76.680934
## [13] 91.337289 106.129855 120.342752 133.367268
r <- 0.029
xpo <- 3.97 * exp(r* year)
xpo
## [1] 3.970000 5.305617 7.090573 9.476036 12.664035 16.924565
## [7] 22.618453 30.227923 40.397427 53.988232 72.151357 96.425057
## [13] 128.865097 172.218857 230.158015 307.589498
plot(year,pop,main = "US Population Growth",xlab = "Year, 1790 to 1940", ylab = "Population / Millions")
# plot(year,pop)
lines(year,xpo,col = "blue")
lines(year,predict(tryfit), col = "red")
plot(year,residuals(tryfit))
residuals(tryfit)
## [1] 0.00000000 -0.03840518 -0.01501772 -0.17044402 -0.32903454 -0.56635183
## [7] -0.17839749 0.80742797 -1.05855752 -0.23258554 0.09706985 -0.68493355
## [13] 0.63271051 -0.41885454 2.43224833 -1.70726803
## attr(,"label")
## [1] "Residuals"
# Bacteria Growth, Exponential
minutes <- seq(0,64, by = 16)
length(minutes)
## [1] 5
pop <- c(0.022,0.036,0.060,0.101,0.169)
length(pop)
## [1] 5
tryfit <- nls(pop ~ alpha * exp(k*minutes) ,
start = c(alpha = 0.05, k = 0.05))
plot(minutes,pop,main = "Bacterial Density", xlab = "Minutes", ylab = "Relative Population Density")
lines(minutes,predict(tryfit),col="red")
summary(tryfit)
##
## Formula: pop ~ alpha * exp(k * minutes)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 0.0215382 0.0001423 151.3 6.36e-07 ***
## k 0.0321848 0.0001155 278.6 1.02e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0003288 on 3 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 1.927e-08
# Logistic
minutes <- seq(0,160, by = 16)
minutes
## [1] 0 16 32 48 64 80 96 112 128 144 160
pop <- c(0.022,0.036,0.060,0.101,0.169,0.266,0.360,0.510,0.704,0.827,0.928)
length(minutes)
## [1] 11
length(pop)
## [1] 11
pop
## [1] 0.022 0.036 0.060 0.101 0.169 0.266 0.360 0.510 0.704 0.827 0.928
Pi <- 0.022 # Initial Population
tryfit <- nls((pop ~ (K*Pi)/(Pi + (K - Pi)*exp(-r*minutes))),
start = list( K = 1.5, r = 0.03))
summary(tryfit)
##
## Formula: pop ~ (K * Pi)/(Pi + (K - Pi) * exp(-r * minutes))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## K 1.1653827 0.0275806 42.25 1.16e-11 ***
## r 0.0335128 0.0003406 98.39 5.87e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01226 on 9 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.757e-06
length(predict(tryfit))
## [1] 11
x <- predict(tryfit)
x
## [1] 0.02200000 0.03711208 0.06204156 0.10220017 0.16447787 0.25558274
## [7] 0.37808856 0.52540371 0.68050449 0.82254397 0.93694239
r <- 0.032
xpo <- 0.0215 * exp(r* minutes)
xpo
## [1] 0.02150000 0.03587544 0.05986266 0.09988834 0.16667619 0.27812007
## [7] 0.46407814 0.77437243 1.29213728 2.15609272 3.59771045
plot(minutes,pop,main = "Bacterial Growth",xlab = "Minutes", ylab = "Relative Population Density")
# plot(year,pop)
lines(minutes,xpo,col = "blue")
lines(minutes,predict(tryfit), col = "red")
# ********************************************************
# Simulate population density
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
O Jones, R Maillardet, A Robinson. Introduction to Scientific Programming and Simulation Using R, CRC Press (2009)↩︎