Every programming language has similar elements. We need to input data, identify variables, perform calculations, create functions, control program flow, and output data and graphics. The R language is attractive in that it has it has a full online development environment -RStudio Cloud- and built in functions for regression analysis, solving equations, and producing graphics. R is increasingly used for those involved in data analysis and statistics, particularly social sciences, bio-statistics, and medicine. It is currently available in a free version. References on R tend to emphasize different features of R more suitable for social sciences, data mining, ecology, and medicine. A good reference for scientists using numerical methods is available1, as well as an Analytical Chemistry textbook2. This manual will present the elements of R gradually, as needed, mostly through chemical data examples. This document itself is created with RMarkdown, which integrates a document creation with R code.
We will write our programs on a online platform (an IDE) called Rstudio Cloud, which makes it platform independent - we access the program the same way on a PC, Mac, or Chromebook. The Rstudio Cloud environment is divided into 4 main sections, with the most important elements listed below.
A package is set of programs that R can call, such as a statistical package. There are base packages that come with R, and there are many that have to be added.
To start writing a program in Rstudio Cloud:
from a Workspace (like a file folder that can contain related
programs) , select New project, and fromthe menu:
File ——> Newfile —–> Rscript
& start typing!
To run a program, highlight the code and select “run”,
When you run a script program, results and error messages will appear on the console, and plots appear on the plot area.
The great escape. Sometimes a program doesn’t run & only commands show up on the console. Put the cursor on the console and press the esc key to get out of this.
In this document, the shaded area is R code, and the white background appears as output from a previous R command.
# A comment line
x <- 5.0 # Set x equal to 5.0
y <- x^2 # y is equal to x squared.
x <- c(1.0, 2.0, 3.0) # x is equal to a numbered list of values - a “vector”
x <- seq(1,2,0.2) # create an incremented sequence
xx <-matrix(c(2,4,6,8),2,2) # a matrix row 1 = 2 6 row 2 = 4 8
# length(x) returns the length of the vector x.
# print(x) # print(x) or simply x returns all the values of x
length(x)
## [1] 6
print(x)
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
x
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
xx
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
Here is an example of a simple R program that plots \(y = x^2\)
x <- seq(0,10,0.5) # a sequence from 1 to 10, increments 0f 0.5
y <- x^2 # Note that y is calculated for every x. This is called vectorized.
plot(x,y) # Create a plot. We can add a lot of formatting!
We will want to custom format our plots: Here is an example!
plot(x,y,type = "b",main = "A Formatted Graph",col = "darkblue", xlab = "X Label", ylab = "Y Label")
grid (NULL,NULL, lty = 1, col = "lightgreen") # to add a grid
Later, you may want to learn to use ggplot, which is very popular but which uses something called dataframes. For beginners, I suggest using the simpler “plot” command.
mydat <- data.frame(x,y) # creating a data frame
library(ggplot2)
ggplot(mydat, aes(x,y)) + geom_point() + xlab("New Label")
Every measurement has random error: Error that is inherent in the nature of the measurement.
Significant Figures: The number of significant figures is all the certain figures plus one uncertain figure. The degree of uncertainty in the last digit is ultimately determined by a statistical analysis.
The Mean (or average) of a set of n measurements x is defined:
\[\large\overline{x} = \frac{\sum_{i=1}^{n}x_i}{n}\] and the Standard Deviation is:
\[\large s = \large\sqrt\frac{\sum(x_i - \overline{x})^2}{n-1} \]
The mean and standard deviation can be related to the Guassian distribution - that gives the probability of observing a particular value of x. For a finite number of measurements, the Gaussian distribution can be approximated as:
\[\large y =\frac{1}{s\sqrt(2\pi)}e^\frac{-(x-\overline{x})^2}{2s^2}\]
According to one reference, the average (mean) male weight is 199.8 pounds with a standard deviation of 29 pounds. Substituting this into the Gaussian distribution formula
s <- 29
xmean <- 199.8
xval <- seq(72,282,5)
xval
## [1] 72 77 82 87 92 97 102 107 112 117 122 127 132 137 142 147 152 157 162
## [20] 167 172 177 182 187 192 197 202 207 212 217 222 227 232 237 242 247 252 257
## [39] 262 267 272 277 282
# here we calculate a gaussian distribution
yp <- (1/(s*sqrt(2*pi)))*exp(1)^((-(xval-xmean)^2)/(2*s^2))
yp
## [1] 8.343582e-07 1.757433e-06 3.593311e-06 7.131828e-06 1.374032e-05
## [6] 2.569702e-05 4.665074e-05 8.220994e-05 1.406307e-04 2.335209e-04
## [11] 3.764101e-04 5.889614e-04 8.945452e-04 1.318887e-03 1.887570e-03
## [16] 2.622336e-03 3.536418e-03 4.629443e-03 5.882796e-03 7.256528e-03
## [21] 8.688880e-03 1.009924e-02 1.139472e-02 1.247982e-02 1.326793e-02
## [26] 1.369266e-02 1.371710e-02 1.333911e-02 1.259161e-02 1.153787e-02
## [31] 1.026267e-02 8.861035e-03 7.426748e-03 6.042308e-03 4.771962e-03
## [36] 3.658314e-03 2.722419e-03 1.966613e-03 1.379027e-03 9.386779e-04
## [41] 6.202265e-04 3.978084e-04 2.476781e-04
plot(xval,yp)
# R has a command for Gaussian distribution, dnorm.
dp <- dnorm(xval,199.8,29) # we can also get probability distribution using the command dnorm distribution
plot(xval,dp,type="l",col="red",xlab="Weight: women/blue men/red", ylab="Probability")
fdp <- dnorm(xval,170.8,29) # Average Weight, women
lines(xval,fdp,col="blue")
Beer’s law for UV-Vis Spectroscopy is: \[\large Abs = \epsilon_\lambda b c \]
where A = absorbance \(\epsilon_\lambda\) = absorption coefficient at a particular wavelength b = pathlength (typically cm) c = concentration (usually in molarity or \(\mu\)grams)
There is a linear relationship between the abosorbance and the concentration, given a fixed pathlength and wavelength. In any experimental situation, the inherent noise in the data will mean that we need to include a constant (I = intercept).
\[\large Abs = \epsilon_\lambda b c + I\] In order to use Beer’s Law to analyze for a particular component in solution (an ~analyte), we construct a calibration curve (in this case, a line!) using experimentally determined data. The object is to determine the absorption coeficcient, \(\epsilon\). We prepare solutions of varying concentrations of the analyte, and measure the absorbance. According to Beer’s Law, a plot of Absorbance versus concentration should result in a straight line with a slope equal to \(\epsilon\).
Lets take an example. The data comes from the widely use Bradford assay for protein3. In the Bradford assay, a dye (Coomassie brilliant blue G-250) changes from red to blue when attched to proteins, and this color absorbtion at 595 nm) is used to measure protein concentration. Here is sample data for a calibration4 for the BSA (Bovine Serum Albumin) standard.
Conc <- c(2.0, 5.0, 10.0, 15.0, 18.0) # [BSA], in micrograms / ml
Abs <- c(0.115,0.266,0.413,0.701,0.811) # Absorbance
Xdata <- data.frame(Conc,Abs)
library(flextable)
ft <- flextable(Xdata)
ft <- theme_vanilla(ft)
set_flextable_defaults(fonts_ignore=TRUE)
ft
Conc | Abs |
2 | 0.115 |
5 | 0.266 |
10 | 0.413 |
15 | 0.701 |
18 | 0.811 |
We use the “lm” command to do linear regression - to find the best fit to the data according to the least squares criterion: the slope (\(\epsilon\)) and intercept are adjusted to minimize the sum of the squares of the deviations of the experimental data from the fitted line .
BL <- lm(Abs ~ Conc) # linear regression, lm(y(x)~x)
summary(BL) # summarize results of linear regression
##
## Call:
## lm(formula = Abs ~ Conc)
##
## Residuals:
## 1 2 3 4 5
## 0.0018 0.0223 -0.0482 0.0223 0.0018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.026200 0.029054 0.902 0.433649
## Conc 0.043500 0.002495 17.435 0.000411 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03329 on 3 degrees of freedom
## Multiple R-squared: 0.9902, Adjusted R-squared: 0.987
## F-statistic: 304 on 1 and 3 DF, p-value: 0.0004113
coef(BL)
## (Intercept) Conc
## 0.0262 0.0435
plot(Conc, Abs, main = "Beer's Law ", xlab = "", ylab = "Absorbance", xlim = c(0.0,20.0), ylim = c(0.0, 1.0), sub = "Figure 1. Beer's Law Plot for BSA")
title(xlab="Concentration, ug/ml", line=2, cex.lab=1.0)
abline(a = 0.0262, b = 0.0435, col = "blue") # add a line with intercept = 0.262 and slope = 0.0435, taken from regression.
fitted <- coef(BL)
int <- fitted[1]
slope <- fitted[2]
slope
## Conc
## 0.0435
Coomassie Blue. Associates with Proteins through non-covalent interactions
From summary(BL) we learn the best fit values for the slope( \(\epsilon\)) and the intercept, the standard deviations of these fitted values, the p values, and the r value - which should be close to 1.
The final step in an analysis is to calculate the concentration of an unknown based on the absorbance measured. Rearranging Beer’s Law, we have, for example, for an absorbance = 0.350
\[\large c = \frac{Abs-I}{\epsilon}\]
The unknown concentration is 7.45 micrgrams/ml.
In a regression analysis, it can be helpful to plot the residuals (the deviations) of the experimental data from the theoretical best fit line (or curve). The residuals should appear randomly distributed above and below the best fit line.
We can simulate an example, based on the Bradfoed assay. The simulated data exaggerates the magnitude of typical deviations to help visualize.
Scon <- seq(0.5,20,0.5)
length(Scon)
## [1] 40
rnoise <- rnorm(40,0,0.1)
length(rnoise)
## [1] 40
Sabs <- 0.0435 * Scon + 0.026 + rnoise
plot(Scon,Sabs)
SBL <- lm(Sabs ~ Scon)
summary(SBL)
##
## Call:
## lm(formula = Sabs ~ Scon)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32399 -0.07004 -0.00103 0.07351 0.31633
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.063239 0.037217 1.699 0.0975 .
## Scon 0.040869 0.003164 12.918 1.77e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1155 on 38 degrees of freedom
## Multiple R-squared: 0.8145, Adjusted R-squared: 0.8096
## F-statistic: 166.9 on 1 and 38 DF, p-value: 1.775e-15
abline(a = 0.052, b = 0.042, col = "blue")
plot(Scon,residuals(SBL))
abline(a = 0.0, b = 0.0, col = "blue")
plot(BL$residuals, pch = 16, col = “red”)
Find real world data that you expect to follow a linear relationship. Data points should be > 10.
Enter the x y data as vector in R. xval <- c(1,3,5)
Use lm command to perform linear regression.
Summarize the results.
Plot the data and the best fit line.
Analysis & Interpretation. p value for intercept and slope.
Make a residual plot. Is it consistent with linear relationship?
In solving chemical equilibrium, we often make approximation to simplify the mathematical solution. For instance, the approximation of negligible dissociation of weak acids. Here, we will explore the use of R to solve equilibrium without approximations. We find the equation to solve through aplication of a systematic analysis of equilibrium, which takes into account the equilibrium, charge balance, and mass balance expressions.
We can show four equations:
HF dissociation
\(HF \rightleftarrows H^+ + F^-\)
\(K_a = \large \frac{[H^+][F^-]}{[HF]} = 6.8 \times 10^{-4}\)
Water Dissociation
\(H_2O \rightleftarrows H^+ + OH^-\)
\(K_w = [H^+][OH^-]] = 10^{-14}\)
Mass Balance
\([HF]_i = [HF] + [F^-]\)
Charge Balance
\([H^+] = [F^-]+ [OH^-]\)
Successively substituting the mass balance, charge balance, and \(K_w\) into the dissociation expression:
eliminate \([HF]\)
\(K_a = \large \frac{[H^+][F^-]}{[HF]_i-[[F^-]]}\)
eliminate \([F^-]\)
\(K_a = \large \frac{[H^+]([H^+]-[OH^-])}{[HF]_i-[H^+]+[OH^-]}\)
eliminate \([OH^-]\)
\(K_a = \large \frac{[H^+]([H^+]-\frac{K_w}{[H^+]})}{[HF]_i-[H^+]+\frac{K_w}{[H^+]}}\)
Finally, simplifying a bit:
\(K_a\times([HF]_i[H^+]-[H^+]^2+K_w)-([H^+]^3-K_w[H^+] = 0\)
We can solve this for \([H^+]\) using the uniroot command.
Ka <- 6.8 * 10^-4
Kw <- 10^(-14)
CHF <- 0.0010
Ka
## [1] 0.00068
Kw
## [1] 1e-14
HFFunc <- function(H) {
Ka * (CHF*H - H^2 + Kw) -(H^3-Kw*H)
}
Hroot <- uniroot(HFFunc,c(10^-6,10^-2),tol = 10^-18)
hC <- Hroot$root
hC
## [1] 0.0005519641
solpH <- -log(hC,base = 10)
solpH
## [1] 3.258089
# compare with approximate method
hapx <- sqrt(0.001 * Ka)
hapx
## [1] 0.0008246211
pHapx <- -log(hapx,base=10)
pHapx
## [1] 3.083746
We can also make a graphical illustration of root finding. We can explore the value of HFFunc for values of H+ ariund the root.
phseq <- seq(3.2,3.4,0.01)
phseq
## [1] 3.20 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29 3.30 3.31 3.32 3.33 3.34
## [16] 3.35 3.36 3.37 3.38 3.39 3.40
hval <- 10^-phseq
hval
## [1] 0.0006309573 0.0006165950 0.0006025596 0.0005888437 0.0005754399
## [6] 0.0005623413 0.0005495409 0.0005370318 0.0005248075 0.0005128614
## [11] 0.0005011872 0.0004897788 0.0004786301 0.0004677351 0.0004570882
## [16] 0.0004466836 0.0004365158 0.0004265795 0.0004168694 0.0004073803
## [21] 0.0003981072
val <- HFFunc(hval)
val
## [1] -9.285051e-11 -7.366706e-11 -5.592871e-11 -3.954115e-11 -2.441606e-11
## [6] -1.047071e-11 2.372399e-12 1.418583e-11 2.503756e-11 3.499124e-11
## [11] 4.410651e-11 5.243921e-11 6.004165e-11 6.696282e-11 7.324858e-11
## [16] 7.894192e-11 8.408307e-11 8.870971e-11 9.285714e-11 9.655839e-11
## [21] 9.984442e-11
plot(phseq,val)
# numerical soution of titration weak acid / strong base
# derivation of acid base titratin curve in Harris
# used sapply to run unirooot for different values
# generate full titration curve
cA = 0.100 # concentration acid Molar
cB = 0.100 # base Molar
VA = 25.00 # volume of acid ml
Ka = 1.00 * 10^(-3)
Kw = 1.00 * 10^(-14)
Kw
## [1] 1e-14
VB <- 0
froot <- function(VB,H)
{ VB - (Ka/(H + Ka) - (H - Kw/H)/cA) /
(1.00 + (H - Kw/H)/cB) * cA*VA/cB }
numb <- sapply(seq(0,30,1), function(VB) uniroot(froot, c(10^-12,10^-1),tol = 10^-16, VB=VB)$root)
VB <- seq(0,30,1)
numb
## [1] 9.512492e-03 7.485454e-03 5.938106e-03 4.767242e-03 3.878322e-03
## [6] 3.195567e-03 2.662759e-03 2.239722e-03 1.898090e-03 1.617788e-03
## [11] 1.384459e-03 1.187692e-03 1.019821e-03 8.751112e-04 7.492070e-04
## [16] 6.387531e-04 5.411300e-04 4.542669e-04 3.765081e-04 3.065155e-04
## [21] 2.431973e-04 1.856545e-04 1.331406e-04 8.503107e-05 4.079969e-05
## [26] 1.400290e-08 5.100016e-12 2.600001e-12 1.766666e-12 1.350022e-12
## [31] 1.099991e-12
length(numb)
## [1] 31
ph <- -log10(numb)
ph
## [1] 2.021706 2.125782 2.226352 2.321733 2.411356 2.495452 2.574668
## [8] 2.649806 2.721683 2.791078 2.858720 2.925296 2.991476 3.057937
## [15] 3.125398 3.194667 3.266698 3.342689 3.424226 3.513548 3.614041
## [22] 3.731295 3.875690 4.070422 4.389343 7.853782 11.292428 11.585027
## [29] 11.752846 11.869659 11.958611
VB
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## [26] 25 26 27 28 29 30
length(VB)
## [1] 31
length(ph)
## [1] 31
plot(VB,ph,xlab = "volume of base/ml", main = "Titration")
Kh <- 0.0034
Khyd <- 0.0026
Ka1 <- 1.67 * 10^-4
Ka2 <- 5.6 * 10^-11
Kw <- 1.00 * 10^-14
ca <- 1.41 * 10^-5 # concentration of co2(aq)
Kh
## [1] 0.0034
Khyd
## [1] 0.0026
Ka1
## [1] 0.000167
Ka2
## [1] 5.6e-11
Kw
## [1] 1e-14
ca
## [1] 1.41e-05
froot3 <- function(H)
{H^3 - Kw*H - Ka1 * Khyd * ca * H - 2.0 * Ka1 * Ka2 * Khyd * ca}
X <- uniroot(froot3, c(10^(-6),10^(-4)), tol = 10^(-18))$root
X
## [1] 2.476388e-06
pH <- -log10(X)
pH
## [1] 5.606181
# graph of froot3 in area around solution
ph <- seq(5.5,5.7,0.02)
ph
## [1] 5.50 5.52 5.54 5.56 5.58 5.60 5.62 5.64 5.66 5.68 5.70
h <- 10^-ph
h
## [1] 3.162278e-06 3.019952e-06 2.884032e-06 2.754229e-06 2.630268e-06
## [6] 2.511886e-06 2.398833e-06 2.290868e-06 2.187762e-06 2.089296e-06
## [11] 1.995262e-06
val <- froot3(h)
val
## [1] 1.223031e-17 9.022593e-18 6.302128e-18 4.002739e-18 2.066941e-18
## [6] 4.448060e-19 -9.070142e-19 -2.026146e-18 -2.945236e-18 -3.692601e-18
## [11] -4.292791e-18
plot(ph,val)
# the pH of standing water is 5.6
## volume as function pH
# Generating Titration Curve
# pH f vol
cA = 0.100 # concentration acid Molar
cB = 0.100 # base Molar
VA = 25.00 # volume of acid ml
Ka = 1.38 * 10^(-4)
Kw = 1.00 * 10^(-14)
pH <- seq(3,12,0.5)
pH
## [1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
## [16] 10.5 11.0 11.5 12.0
# try using sequence
length(pH)
## [1] 19
H <- 10^-pH
VB <- (Ka/(H + Ka) - (H - Kw/H)/cA) /
(1.00 + (H - Kw/H)/cB) * cA*VA/cB
VB
## [1] 2.754094 7.492558 14.456342 20.324917 23.305981 24.438395 24.819651
## [8] 24.942701 24.981897 24.994415 24.998684 25.001007 25.004819 25.015759
## [15] 25.050032 25.158610 25.505049 26.632771 30.555555
pH
## [1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
## [16] 10.5 11.0 11.5 12.0
plot(VB,pH,main = "Weak Acid Titration",
xlab = "Volume Base / ml", xlim = c(0,35),ylim = c(2,13) )
cA = 0.100 # concentration acid Molar
cB = 0.100 # base Molar
VA = 25.00 # volume of acid ml
Ka = 1.00 * 10^-4
Kw = 1.00 * 10^-14
VB = 12.500 # midpoint test
froot <- function(H) {
VB - (Ka/(H + Ka) - (H - Kw/H)/cA) /
(1.00 + (H - Kw/H)/cB) * cA*VA/cB }
# VB - (1.00 + (H*cB - Kw/H)/cB) * (Ka/(H + Ka) - (H - Kw/H)/cA) * cA*VA/cB
X <- uniroot(froot, c(10^(-12),10^(-3)), tol = 10^(-14))
X
## $root
## [1] 9.940534e-05
##
## $f.root
## [1] 3.215206e-12
##
## $iter
## [1] 10
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 5.000042e-15
# numerical soution of titration weak acid / strong base
# derivation of acid base titratin curve in Harris
# used sapply to run unirooot for different values
# generate full titration curve
cA = 0.100 # concentration acid Molar
cB = 0.100 # base Molar
VA = 25.00 # volume of acid ml
Ka = 1.00 * 10^(-3)
Kw = 1.00 * 10^(-14)
Kw
## [1] 1e-14
VB <- 0
froot <- function(VB,H)
{ VB - (Ka/(H + Ka) - (H - Kw/H)/cA) /
(1.00 + (H - Kw/H)/cB) * cA*VA/cB }
numb <- sapply(seq(0,30,1), function(VB) uniroot(froot, c(10^-12,10^-1),tol = 10^-16, VB=VB)$root)
# sapply can use sequential functions
VB <- seq(0,30,1)
numb
## [1] 9.512492e-03 7.485454e-03 5.938106e-03 4.767242e-03 3.878322e-03
## [6] 3.195567e-03 2.662759e-03 2.239722e-03 1.898090e-03 1.617788e-03
## [11] 1.384459e-03 1.187692e-03 1.019821e-03 8.751112e-04 7.492070e-04
## [16] 6.387531e-04 5.411300e-04 4.542669e-04 3.765081e-04 3.065155e-04
## [21] 2.431973e-04 1.856545e-04 1.331406e-04 8.503107e-05 4.079969e-05
## [26] 1.400290e-08 5.100016e-12 2.600001e-12 1.766666e-12 1.350022e-12
## [31] 1.099991e-12
length(numb)
## [1] 31
ph <- -log10(numb)
ph
## [1] 2.021706 2.125782 2.226352 2.321733 2.411356 2.495452 2.574668
## [8] 2.649806 2.721683 2.791078 2.858720 2.925296 2.991476 3.057937
## [15] 3.125398 3.194667 3.266698 3.342689 3.424226 3.513548 3.614041
## [22] 3.731295 3.875690 4.070422 4.389343 7.853782 11.292428 11.585027
## [29] 11.752846 11.869659 11.958611
VB
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## [26] 25 26 27 28 29 30
length(VB)
## [1] 31
length(ph)
## [1] 31
plot(VB,ph,xlab = "volume of base/ml", main = "Titration")
Binding Curves (or Binding Assays) are an approach to determining the equilibrium constant for the interaction of two molecules, commonly used in biochemistry5. Typically, the a small molecule (commonly called the “ligand”) attaches to a larger molecule such as a protein. If we denote the protein as P and the ligand as L we can express the dissociation as:
\(PL \rightleftharpoons P + L\)
and where the total protein is the bound plus unbound.
\(P_{tot} = PL + P\)
With the dissociation constant:
\(K_d = \frac{[P][L]}{[PL]}\)
The concentration of the ligand (in excess) is varied and the fraction bound \(\frac{[PL]}{P_{tot}}\) is plotted versus the L6.
Substituting \(P_{tot} = P + PL\) into the \(K_d\) expression:
we get
\(K_d = \frac{([P_{tot}]-[PL])[L]}{PL]} = \frac{([P_{tot}][L]-[PL]{L]}}{[PL]}\)
\(K_d[PL] + [PL][L] = [P_{tot}][L]\)
\([PL](K_d +[L]) = [P_{tot}][L]\)
and we get expression for fraction bound as a function of \(K_d\) and L.
\(\frac{[PL]}{P_{tot}} = \frac{[L]}{K_d + [L]}\)
note as L increases the fraction bound goes from 0 to 1.
The key to binding assay is to find an experimental method to determine the concentration of PL, the bound form.
Although it is called a "binding assay" the equilibrium is discussed in terms of the dissociation, with $K_d$
knitr::include_graphics('250px-Myoglobin.png')
Myoglobin Binding
# Nonlinear Regression / Binding Curve
# simulate noisy data for myoglobin KD = 0.26 [L] in kPa
# data generated with random noise added
library(nls2)
## Loading required package: proto
KD <- 0.26 # initializing variables
L <- 0.0
y <- 0.0
L <- seq(.1,1.5,0.1) # set of 15 ligand concentrations
rnd <- rnorm(15,0,0.04) # set of 15 random values
y <- L/(KD+L) + rnd # binding curve fuction with noise
plot(L,y, main = "Simulated binding data", xlab = "pO2 Kpa",
ylab = "Fractional Binding")
tryfit <- nls2(y ~ L/(KD+L),
start = c(KD = 5))
summary(tryfit)
##
## Formula: y ~ L/(KD + L)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## KD 0.24041 0.01382 17.39 7.07e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04139 on 14 degrees of freedom
##
## Number of iterations to convergence: 7
## Achieved convergence tolerance: 7.26e-06
plot(L,y, main = "Myoglobin Binding Curve", xlab = "pO2 Kpa", ylab = "Fractional Binding")
lines(L,predict(tryfit), col = "blue") # add a line fitted to points
# Exponential fit, US population growth 1790 t0 1850
year <- c(0,10,20,30,40,50,60)
length(year)
## [1] 7
pop <- c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)
length(pop)
## [1] 7
# A = Ai * exp(-k*t)
tryfit <- nls2(pop ~ alpha * exp(k*year) ,
start = c(alpha = 2.0, k = 0.05))
plot(year,pop,main = "US Population Growth, 1790-1850", xlab = "Starting 1790", ylab = "Population / Million")
lines(year,predict(tryfit),col="red")
summary(tryfit)
##
## Formula: pop ~ alpha * exp(k * year)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 3.9744064 0.0407277 97.58 2.14e-09 ***
## k 0.0293421 0.0002023 145.02 2.96e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09817 on 5 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 5.046e-08
# test of sublex (nelder mead variation)
# use monte carlo to get error estimate
# compare with nls2
re <- 0
# generate exponential data
xv <- seq(0,9,1)
alpha <- 7.5
k <- 0.2
y <- alpha*exp(k*xv)
y
## [1] 7.500000 9.160521 11.188685 13.665891 16.691557 20.387114 24.900877
## [8] 30.414000 37.147743 45.372356
plot(xv,y)
length(xv)
## [1] 10
rn <- rnorm(10,0,2)
y <- y + rn # exp with noise
plot(xv,y)
yfit <- 0
library(nls2)
yfit <- nls2(y~alpha*exp(k*xv),start=c(alpha=5,k=0.5))
yfit # nls or nls2
## Nonlinear regression model
## model: y ~ alpha * exp(k * xv)
## data: <environment>
## alpha k
## 7.3450 0.2025
## residual sum-of-squares: 31.85
##
## Number of iterations to convergence: 7
## Achieved convergence tolerance: 3.089e-06
# nelder mead function to be minimized
library(subplex)
# exponential fit using subplex
# this looks similar to uniroot can do sapply?
# y is previously defined, data set
myfun <- function (x) {
x1 <- x[1]
x2 <- x[2]
sum((y - x1*exp(x2*xv))^2)
}
re <- subplex(par=c(1,6),fn=myfun)
re
## $par
## [1] 7.3450388 0.2024967
##
## $value
## [1] 31.84612
##
## $counts
## [1] 492
##
## $convergence
## [1] 0
##
## $message
## [1] "success! tolerance satisfied"
##
## $hessian
## NULL
# this method described in phd thesis
# can we iterate this / sapply to do monte carlo
# or solve equations
Introduction to Scientific Programming and Simulation Using R, CRC Press (2014)↩︎
Bradford, M.M. (1976), Rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding, Anal. Biochem., 72 (1–2): 248–254↩︎
See Pollard, Thomas (2017). A Guide to Simple and Informative Binding Assays. Molecular Biology of the Cell 21(23)↩︎
While the mathematics of equilibrium is similar to a simple acid, note the difference in a titration curve, where the -log(H) is measured↩︎