Every programming language has similar elements. We need to input data, identify variables, perform calculations, create functions, control program flow, and output data and graphics. The R language is attractive in that it has it has a full online development environment -RStudio Cloud- and built in functions for regression analysis, solving equations, and producing graphics. R is increasingly used for those involved in data analysis and statistics, particularly social sciences, bio-statistics, and medicine. References on R tend to emphasize different features of R more suitable for social sciences, data mining, ecology, and medicine. A good reference for scientists using numerical methods is available1, and an Analytical Chemistry textbook2 that provides examples in R is available. This manual will present the elements of R gradually, as needed, mostly through chemical data examples. This document itself is created with RMarkdown, which integrates a document creation with R code.
We will write our programs on a online platform (an IDE) called Rstudio Cloud (RStudio Cloud is being renamed Posit Cloud), which makes it platform independent - we access the program the same way on a PC, Mac, or Chromebook. The Rstudio Cloud environment is divided into 4 main sections, with the most important elements listed below.
To start writing a program in Rstudio Cloud:
create a Space (like a file folder that can contain
related programs) , select New project, and from the
menu:
File ——> Newfile —–> Rscript
& start typing!
To run a program, highlight the code and select “run”.
You can create multiple Projcts and R scripts in a single Space. Spaces are displayed on the left and you can create multiple spaces. In the tools icon (upper right) spaces can be shared.
When you run a script program, results and error messages will appear on the console, and plots appear on the plot area.
In this document, the shaded area is R code, and the output appears in a white background following thr shaded R sceipt. The theory is described in the text, and the details of R code are described in the code comment (text preceded by #)
In R we use “<-” to set a variable to a value. R uses “vector” calculation, which avoids use of loops in more traditional programming. For instance, below we set x equal to a series of values (a vector), and then calculate a series of values of y.
# A comment line
x <- 5.0
# Set x equal to 5.0
y <- x^2
# y is equal to x squared.
x <- c(1.0, 2.0, 3.0)
# x is equal to a numbered list of values - a “vector”
x <- seq(1,2,0.2)
# create an incremented sequence
xx <-matrix(c(2,4,6,8),2,2)
# a matrix row 1 = 2 6 row 2 = 4 8
# length(x) returns the length of the vector x.
# x returns all the values of x
length(x)
## [1] 6
x
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
# length(x) returns the length of the vector x.
# x returns all the values of x
xx
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
# exponents and logs
10^(-2)
## [1] 0.01
# power of 10
exp(1)
## [1] 2.718282
# power of 2.718282 .....
log(10)
## [1] 2.302585
# base e
log10(10) # base 10
## [1] 1
Once we have a series of x and y, of equal length, we can easily create a graph.
x <- seq(0,10,0.5)
# a sequence from 1 to 10, increments 0f 0.5
y <- x^2
# Note that y is calculated for every x.
# This is called vectorized.
plot(x,y)
# Create a plot. We can add a lot of formatting!
Below is an example of basic formatting commands.
plot(x,y,type = "b",main = "A Formatted Graph",
col = "darkblue", xlab = "X Label", ylab = "Y Label")
type=“p”: for points (by default) type=“l”: for lines type=“b”: for both; points are connected by a line
We can define a function and evaluate it later in the R script.
Afunc <- function(x) {x^2 + log(x)}
y <- c(1,2,3)
Afunc(y)
## [1] 1.000000 4.693147 10.098612
# note the vectorized evaluation
With larger data sets, as in a titration, we prefer to be able to read a data file, such as output from a spreadsheet. A common filetype is csv (comma separated values) files. Usually, the first row contains the headers, and them remaining rows the values. If you upload a csv file into your working directory, they can be read into Dataframes. Dataframes are columns of values that do not have to be of the same type (Names, dates, income, etc), and they are very important in many R applications. However, here’ we are focusing on vectors, and we can convert a dataframe with two or more numeric columns (such as volume and pH) into respective vectors. The data file “mydat.csv” looks like this:
vol, pH
1.5 , 3.2
3.0, 6.6
7.0, 14.0
# Mydata <- read.csv("mydat.csv")
# a file with headers vol and pH
# Mydata
# it's a dataframe
# volume <- Mydata$vol
# extract column vol as vector
# pH <- Mydata$pH
# extract column pH as vector
# volume
# pH
# and to write a vector file
# write(volume,file="newfile")
Every measurement has random error: Error that is inherent in the nature of the measurement.For instance, while the true mass of an object might be 1.0000 g, the actual measured mass will deviate from that value in a random manner, and the measured mass cold well be 1.02 grams, or 0.98 grams. These deviations follow a Gaussian distribution. Random error is also referred to as noise.
We can simulate the results of a number of measurements with the use of the R command rnorm. Below we simulate 10 measured values assuming a true value 0f 1. The rnorm command is useful in simulating relistic data. Using set.seed will reproduce the same set of random numbers.
noisy1 <- rnorm(5,0,0.2)
noisy1
## [1] -0.3895936 -0.1783962 -0.1596842 0.1169115 -0.4421683
noisey2 <- 1 + rnorm(5,0,0.2)
noisey2
## [1] 0.8969841 0.7237467 1.0489504 1.1670184 1.1020308
set.seed(1)
# if set.seed(1) then gives same sequence
noisey3 <- 1.0 + rnorm(5,0,0.2)
noisey3
## [1] 0.8747092 1.0367287 0.8328743 1.3190562 1.0659016
rnorm(n,mean=0,sd=1) Generates Gaussian Random Values
input: n is the number of values, mean, statndard deviation. Default values are 0 for mean and standad deviation of 1.
Comments: Most used here to simulate realitic data by adding noise to and idealized model. If we want a reproducible simulation we can use set.seed.
Significant Figures: The number of significant figures is all the certain figures plus one uncertain figure. The degree of uncertainty in the last digit is ultimately determined by a statistical analysis.
The Mean (or average) of a set of n measurements x is defined:
\[\large\overline{x} = \frac{\sum_{i=1}^{n}x_i}{n}\] The mean and standard deviation can be related to the Guassian distribution - that gives the probability of observing a particular value of x. For a finite number of measurements, the Gaussian distribution can be approximated as:
\[\large y =\frac{1}{s\sqrt(2\pi)}e^\frac{-(x-\overline{x})^2}{2s^2}\] Data that we can collect from individuals or society can roughly follow a Gaussian distribution. In a study3, the average male weight is 76.7 kg with a standard deviation of 12.1 kilograms, while women had an average weight of 61.5 with a standard deviation of 11.1 kg. Substituting this into the Gaussian distribution formula
s <- 12.1
# standard deviation
meanw <- 76.7
# mean weight
xval <- seq(20,100,2)
xval
## [1] 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
## [20] 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94
## [39] 96 98 100
# here we calculate a gaussian distribution
yp <- (1/(s*sqrt(2*pi)))*exp(1)^((-(xval-meanw)^2)/(2*s^2))
yp
## [1] 5.623116e-07 1.203433e-06 2.506117e-06 5.078268e-06 1.001301e-05
## [6] 1.921095e-05 3.586475e-05 6.515106e-05 1.151622e-04 1.980767e-04
## [11] 3.315062e-04 5.398645e-04 8.554856e-04 1.319093e-03 1.979124e-03
## [16] 2.889386e-03 4.104619e-03 5.673812e-03 7.631536e-03 9.988120e-03
## [21] 1.272010e-02 1.576275e-02 1.900677e-02 2.230075e-02 2.546041e-02
## [26] 2.828436e-02 3.057469e-02 3.215974e-02 3.291531e-02 3.278070e-02
## [31] 3.176678e-02 2.995457e-02 2.748450e-02 2.453847e-02 2.131778e-02
## [36] 1.802068e-02 1.482297e-02 1.186409e-02 9.239917e-03 7.002235e-03
## [41] 5.163452e-03
plot(xval,yp,xlab="Weight in kg",ylab="Probability")
# R has a command for Gaussian distribution, dnorm.
dp <- dnorm(xval,meanw,s)
plot(xval,dp,xlab="Weight in kg",ylab="Probability",ylim=c(0,.04),
main="Women (x) and Men (o) Weight Distributions")
fdp <- dnorm(xval,61.5,11.1)
# women weight distribution
points(xval,fdp,pch=4)
# adding points to previous plot
Beer’s law for UV-Vis Spectroscopy is: \[\large Abs = \epsilon_\lambda b c \]
where A = absorbance \(\large\epsilon_\lambda\) = absorption coefficient at a particular wavelength, b = pathlength (typically cm), and c = concentration (usually in molarity or \(\large\mu\)grams)
There is a linear relationship between the abosorbance and the concentration, given a fixed pathlength and wavelength. In any experimental situation, the inherent noise in the data will mean that we need to include a constant (I = intercept).
\[\large Abs = \epsilon_\lambda b c + I\] In order to use Beer’s Law to analyze for a particular component in solution (an ~analyte), we construct a calibration curve using experimentally determined data. The object is to determine the absorption coeficcient, \(\epsilon\). We prepare solutions of varying concentrations of the analyte, and measure the absorbance. According to Beer’s Law, a plot of Absorbance versus concentration should result in a straight line with a slope equal to \(\epsilon\).
Lets take an example. The data comes from the widely use Bradford assay for protein4. In the Bradford assay, a dye (Coomassie brilliant blue G-250) changes from red to blue when attched to proteins, and this color absorbtion at 595 nm) is used to measure protein concentration. Here is sample data for a calibration5 for the BSA (Bovine Serum Albumin) standard.
Conc <- c(2.0, 5.0, 10.0, 15.0, 18.0)
# in micrograms/ml
Abs <- c(0.115,0.266,0.413,0.701,0.811)
# Absorbance
Xdata <- data.frame(Conc,Abs)
# x and y to a data frame
knitr::kable(Xdata[,],
col.names = c('Conc(microg/ml)','Abs'),
caption = "Bradford assay data") # makes a table
| Conc(microg/ml) | Abs |
|---|---|
| 2 | 0.115 |
| 5 | 0.266 |
| 10 | 0.413 |
| 15 | 0.701 |
| 18 | 0.811 |
We use the “lm” command to do linear regression - to find the best fitto the data according to the least squares criterion: the slope (\(\epsilon\)) and intercept are adjusted to minimize the sum of the squares of the deviations of the experimental data from the fitted line.The concept of regression is used throughout this book and is central to analyzing data in chemistry. Models in Chemistry have adjustable parameters - for Beer’s Law they are the absorptivity coefficient (the slope) and the intercept. Regression methods find the “best fit” line: the choice of slope and intercept which generates a line which will have the smallest sum of the squares of the deviations from the experimental data.
BL <- lm(Abs ~ Conc)
# linear regression
summary(BL)
##
## Call:
## lm(formula = Abs ~ Conc)
##
## Residuals:
## 1 2 3 4 5
## 0.0018 0.0223 -0.0482 0.0223 0.0018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.026200 0.029054 0.902 0.433649
## Conc 0.043500 0.002495 17.435 0.000411 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03329 on 3 degrees of freedom
## Multiple R-squared: 0.9902, Adjusted R-squared: 0.987
## F-statistic: 304 on 1 and 3 DF, p-value: 0.0004113
# summarize results of linear regression
cf <- coef(BL)
int <- cf[1]
sl <- cf[2]
int
## (Intercept)
## 0.0262
sl
## Conc
## 0.0435
# vector: intercept and slope
plot(Conc, Abs, main = "Beer's Law ", xlab = "",
ylab = "Absorbance", xlim = c(0.0,20.0),
ylim = c(0.0, 1.0),
sub = "Figure 1. Beer's Law Plot for BSA")
title(xlab="Concentration, ug/ml", line=2, cex.lab=1.0)
library(plotrix)
abline(BL,col = "red")
ablineclip(a =0.0, b = 0.03,x1=1,x2=18)
ablineclip
## function (a = NULL, b = NULL, h = NULL, v = NULL, reg = NULL,
## coef = NULL, untf = FALSE, x1 = NULL, x2 = NULL, y1 = NULL,
## y2 = NULL, ...)
## {
## if (!is.null(c(x1, x2, y1, y2))) {
## oldclip <- par("usr")
## if (is.null(x1))
## x1 <- oldclip[1]
## if (is.null(x2))
## x2 <- oldclip[2]
## if (is.null(y1))
## y1 <- oldclip[3]
## if (is.null(y2))
## y2 <- oldclip[4]
## clip(x1, x2, y1, y2)
## abline(h = oldclip[4] + 1)
## clip(x1, x2, y1, y2)
## }
## abline(a = a, b = b, h = h, v = v, reg = reg, coef = coef,
## untf = untf, ...)
## if (!is.null(c(x1, x2, y1, y2)))
## do.call("clip", as.list(oldclip))
## }
## <bytecode: 0x5b629eca3eb0>
## <environment: namespace:plotrix>
# adding lines that show other slopes
From summary(BL) we learn the best fit values for the slope( \(\epsilon\)) and the intercept, the standard deviations of these fitted values, the p values, and the \(r^2\) value: how well a linear model explains the data.
lm(ydata,xdata) Linear regression on a set of paied x and y values.
input: xdata and ydata are the names of x and y vectors previosly defined.
Comments: Output of lm are the intercept and slope of the best fit line, the standard error of these values, and r squared, a measure of how well a linear model explains the data, and the p value indicates the the probability pf thr “null” hypothesis: there is no linear relationship between x and y.
The final step in an analysis is to calculate the concentration of an unknown based on the absorbance measured. Rearranging Beer’s Law, we have, for example, for an absorbance = 0.350
\[\large c = \frac{Abs-I}{\epsilon}\]
val <- (0.350 - cf[1] )/cf[2]
val
## (Intercept)
## 7.443678
# the unknown concentratiom
In a regression analysis, it can be helpful to plot the residuals (the deviations) of the experimental data from the theoretical best fit line (or curve). The residuals should appear randomly distributed above and below the best fit line.
For the Bradford determination, we have:
res <- residuals(BL)
plot(Conc,res,ylab="residuals",ylim=c(0.05,-.05),
main="Residual Plot Bradford Method")
abline(a = 0.0, b = 0.0, col = "blue")
RStudio Cloud contains a powerful document creation tool Rmarkdown, suitable for scientific reports, thesis, publication quality research papers. Some nice features
Creating Simple Report with RMarkdown
In rstudio cloud we can choose file -> new file -> R markdown. A new file (.rmd) will be created which already includes the first part of an rmd file the “YAML” header, which declares the title, and html and pdf outputs.
---
title: "Untitled"
output:
pdf_document: default
html_document: default
date: "2022-10-02"
---
The default created Rmarkdown file also includes examples of simple formatting and embedding r code.
Now you can input and format text, equations, images, and references, and R code. You can view your formatted document: Select “knit” to produce an html or pdf document. For detailed reference to rmarkdown see (xie guide) or a cheatsheet.
Headers and subheaders are created with #, ##, ###.
Header Levels
# Header 1
## Header 2
### Header 3
Below are examples of Rmarkdown code and the output they produce.
**Boldface**
Boldface
[I'm an inline-style link](https://www.google.com)
```r
x <- seq(1,20,1)
y <- x^2
plot(x,y)
```
<img src="R-for-Science_files/figure-html/unnamed-chunk-12-1.png" width="672" />
x <- seq(1,20,1)
y <- x^2
plot(x,y)
For scientific work the ability to insert latex equations is very useful. To insert an inline equation bracket the latex code with dollar signs.
Examples of latex equations
$y^2$
$k_{cat}$
$a \cdot b$
$\frac {a \cdot b}{c}$
$\alpha \cdot \beta$
$$\alpha \cdot \beta$$
which result in:
\(y^2\)
\(k_{cat}\)
\(a \cdot b\)
\(\frac {a \cdot b}{c}\)
\(\alpha \cdot \beta\)
\[\alpha \cdot \beta\] (next line, centered)
After knitting to html, a simple way to share is to publish the document in an rpub account, which provides an attractive html based document.In Rstudiocloud, after you knit to html, choose “publish” and choose rpub and create an account.
# Exponential fit, US population growth 1790 t0 1850
year <- c(0,10,20,30,40,50,60)
length(year)
## [1] 7
pop <- c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)
length(pop)
## [1] 7
# A = Ai * exp(-k*t)
tryfit <- nls(pop ~ alpha * exp(k*year) ,
start = c(alpha = 2.0, k = 0.05))
plot(year,pop,main = "US Population Growth, 1790-1850", xlab = "Starting 1790", ylab = "Population / Million")
lines(year,predict(tryfit),col="red")
summary(tryfit)
##
## Formula: pop ~ alpha * exp(k * year)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 3.9744064 0.0407277 97.58 2.14e-09 ***
## k 0.0293421 0.0002023 145.02 2.96e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09817 on 5 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 5.672e-08
# us population logistic
year <- seq(0,150, by = 10)
year
## [1] 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150
pop <- c( 3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
50.156, 62.948, 75.996, 91.97, 105.711, 122.775, 131.66)
length(year)
## [1] 16
length(pop)
## [1] 16
pop
## [1] 3.929 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558
## [10] 50.156 62.948 75.996 91.970 105.711 122.775 131.660
Pi <- 3.929 # Initial Population
tryfit <- nls((pop ~ (K*Pi)/(Pi + (K - Pi)*exp(-r*year))),
start = list( K = 155, r = 0.03))
summary(tryfit)
##
## Formula: pop ~ (K * Pi)/(Pi + (K - Pi) * exp(-r * year))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## K 1.882e+02 2.967e+00 63.44 <2e-16 ***
## r 3.158e-02 1.615e-04 195.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9336 on 14 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 1.954e-06
length(predict(tryfit))
## [1] 16
x <- predict(tryfit)
x
## [1] 3.929000 5.346405 7.255018 9.808444 13.195035 17.635352
## [7] 23.370397 30.635572 39.616557 50.388586 62.850930 76.680934
## [13] 91.337289 106.129855 120.342752 133.367268
r <- 0.029
xpo <- 3.97 * exp(r* year)
xpo
## [1] 3.970000 5.305617 7.090573 9.476036 12.664035 16.924565
## [7] 22.618453 30.227923 40.397427 53.988232 72.151357 96.425057
## [13] 128.865097 172.218857 230.158015 307.589498
plot(year,pop,main = "US Population Growth",xlab = "Year, 1790 to 1940", ylab = "Population / Millions")
# plot(year,pop)
lines(year,xpo,col = "blue")
lines(year,predict(tryfit), col = "red")
plot(year,residuals(tryfit))
residuals(tryfit)
## [1] 0.00000000 -0.03840518 -0.01501772 -0.17044402 -0.32903453 -0.56635183
## [7] -0.17839748 0.80742798 -1.05855750 -0.23258552 0.09706987 -0.68493353
## [13] 0.63271053 -0.41885454 2.43224831 -1.70726806
## attr(,"label")
## [1] "Residuals"
# Bacteria Growth, Exponential
minutes <- seq(0,64, by = 16)
length(minutes)
## [1] 5
pop <- c(0.022,0.036,0.060,0.101,0.169)
length(pop)
## [1] 5
tryfit <- nls(pop ~ alpha * exp(k*minutes) ,
start = c(alpha = 0.05, k = 0.05))
plot(minutes,pop,main = "Bacterial Density", xlab = "Minutes", ylab = "Relative Population Density")
lines(minutes,predict(tryfit),col="red")
summary(tryfit)
##
## Formula: pop ~ alpha * exp(k * minutes)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 0.0215382 0.0001423 151.3 6.36e-07 ***
## k 0.0321848 0.0001155 278.6 1.02e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0003288 on 3 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 2.66e-08
# Logistic
minutes <- seq(0,160, by = 16)
minutes
## [1] 0 16 32 48 64 80 96 112 128 144 160
pop <- c(0.022,0.036,0.060,0.101,0.169,0.266,0.360,0.510,0.704,0.827,0.928)
length(minutes)
## [1] 11
length(pop)
## [1] 11
pop
## [1] 0.022 0.036 0.060 0.101 0.169 0.266 0.360 0.510 0.704 0.827 0.928
Pi <- 0.022 # Initial Population
tryfit <- nls((pop ~ (K*Pi)/(Pi + (K - Pi)*exp(-r*minutes))),
start = list( K = 1.5, r = 0.03))
summary(tryfit)
##
## Formula: pop ~ (K * Pi)/(Pi + (K - Pi) * exp(-r * minutes))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## K 1.1653827 0.0275806 42.25 1.16e-11 ***
## r 0.0335128 0.0003406 98.39 5.87e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01226 on 9 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.757e-06
length(predict(tryfit))
## [1] 11
x <- predict(tryfit)
x
## [1] 0.02200000 0.03711208 0.06204156 0.10220017 0.16447787 0.25558274
## [7] 0.37808856 0.52540371 0.68050449 0.82254397 0.93694239
r <- 0.032
xpo <- 0.0215 * exp(r* minutes)
xpo
## [1] 0.02150000 0.03587544 0.05986266 0.09988834 0.16667619 0.27812007
## [7] 0.46407814 0.77437243 1.29213728 2.15609272 3.59771045
plot(minutes,pop,main = "Bacterial Growth",xlab = "Minutes", ylab = "Relative Population Density")
# plot(year,pop)
lines(minutes,xpo,col = "blue")
lines(minutes,predict(tryfit), col = "red")
# ********************************************************
# Simulate population density
Sound waves share useful similarities to electromagnetic radiation. The are both represented as travelling sine waves. Sound waves are a periodic disturbance in physical medium (air) whereas em radiation is a periodic change in an electric field.
After a pulse excitation, the nmr signal decay contains that reflect multiple relaxations at different frequencies.
To explore the nature of wave motion and the Fourier Transform we can use audio files and R packages that create, read, and manipuate audio files (.wav).
packages:
fftw
TuneR
Below we used R to generate representations of sound. The graphs are of sound waves of different frequency, as might be detected by a microphone: the plots are amplitude of sound versus time.
library(tuneR,fftw)
Wobj1 <- 0.75*sine(880)
# here is unit = freq not degree in radians
tdir <- tempdir()
tfile <- file.path(tdir, "myWave.wav")
writeWave(Wobj1, filename = tfile)
sdat1 <- tuneR::readWave(tfile)
Wobj2 <- 1.0*sine(4400)
tdir <- tempdir()
tfile2 <- file.path(tdir, "myWave2.wav")
writeWave(Wobj2, filename = tfile2)
sdat2 <- tuneR::readWave(tfile2)
Wobj3 <- 0.5*sine(2000)
tdir <- tempdir()
tfile3 <- file.path(tdir, "myWave3.wav")
writeWave(Wobj3, filename = tfile3)
sdat3 <- tuneR::readWave(tfile3)
N <- 44100
x <- seq(1,44100,1)
w1 <- sdat1@left
w2 <- sdat2@left
w3 <- sdat3@left
wt <- w1 + w2 + w3
times <- x/44100
# same as freq because time = 1 sec
length(times)
## [1] 44100
par(mfrow=c(2,2))
# following plots 2x2
plot(times,w1,type="l",xlab="Time/sec",ylab="Amplitude",xlim=c(0,0.01))
plot(times,w2,type="l",xlab="Time/sec",ylab="Amplitude",xlim=c(0,0.01))
plot(times,w3,type="l",xlab="Time/sec",ylab="Amplitude",xlim=c(0,0.01))
plot(times,wt,type="l",xlab="Time/sec",ylab="Amplitude",xlim=c(0,0.01))
Consider the first graph. Imagine you are the microphone, and you measure amplitude by raising and lowering your hand.
According to the graph estimate how long does it take to go up and down once?
How many times will it go up and down in one second? This is the frequency.
Sound amplitude is related to compression of air. Sketch what compression of air might look like as a function of distance.
What is oscillating up and down in the case of EM radiation?
Now we will consider how the Fourier Transform can translate a time domain representation to a frequency representation. For study of sound, and in spectroscopy, frequency representations are more useful. We use the R fftw package command FFT to do this. Now we have a graph of intensity versus frequency. We call this a transformation from the time domain to the frequency domain. For the individual time domain graphs you could easily sketch a frequency domain representation. But what about the fourth graph? Somehow the FFT is able to unscramble the waves and do the conversion.
ft1 <- abs(fftw::FFT(w1))
ft2 <- abs(fftw::FFT(w2))
ft3 <- abs(fftw::FFT(w3))
ft4 <- abs(fftw::FFT(wt))
par(mfrow=c(2,2))
plot(x,ft1/N,type="l",xlim=c(0,5000))
plot(x,ft2/N,type="l",xlim=c(0,5000))
plot(x,ft3/N,type="l",xlim=c(0,5000))
plot(x,ft4/N,type="l",xlim=c(0,5000))
As we have seen, the FT transforms time domain data to frequency domain. This turns out to be widely important, and is key part of modern NMR (Earning Richard Ernst the Noble Prize), other spectrocopies (FTIR), signal processing, image processing, acoustics, and biology (bioacoustics).
Even though we may not encounter the Fourier Transform in our math courses, we can try to get a conceptual understanding with the math we know. The key is in the idea of orthogonality.
To try to understand what the FFT is doing, we can start with someting familiar, a vector represented on x y coordinates.
Draw a vector v(.75,.50) on an x y grid. Now we know that if we drop a vertical line from the end of the vector, we will find the x component. How d we find the y component? Illustrate both.
In each case, more formally, what you just did was to multiply the vector v(0.75, 0.50) by x and y unit vectors to find the scalar product.
vec1 = v(1,0) vec2 = v(0,1) vec3 <- v(2,3)
Vector multiplication
v(x1,y1) * v(x2,y2) = v(x1x2,y1y2)
Ok, what does this have to do sound waves and the fourier transform. We see that a complex waveform is composed of many individual waves. Now it turns out that the individual waves are orthogonal to each other, and we can use that fact to “pick out” the individual waves using the fourier transform.
Since the math is more complicated, let’s use R to explore.
If we multiply two functions, and find the area under the curve of the resulting function, if it is zero then those are orthogonal.
We can sometimes see if the area under the curve is zero because the positive and negative area are symmetrical.
using R:
graph the product for the following:
sin(x)sin(x) sine(x)sin(2x) sin(x)sin(5x) sin(x)sin(2x)sin(5x)sin(x)
can use estimate the area under the curve of these graph? finite or zero
The fourier transform is simply doing this over and over again, with many frequencies. when it “hits” a part of the curve that matches the frequncy (that is non-orthogonal) it has a non zero result and picks that frequency out.
Here is example of actual soud wave & fft using R.
# **********************************************
# reading a music sample wav file
mwave = "CantinaBand3.wav"
# loading the wav file
data <- tuneR::readWave(mwave)
# sampw = "sample.wav"
# sdata <- tuneR::readWave(sampw)
summary(data)
##
## Wave Object
## Number of Samples: 66150
## Duration (seconds): 3
## Samplingrate (Hertz): 22050
## Channels (Mono/Stereo): Mono
## PCM (integer format): TRUE
## Bit (8/16/24/32/64): 16
##
## Summary statistics for channel(s):
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8587.000 -568.000 1.000 -2.556 588.000 9002.000
# sampling rate 44,100
ysig <- data@left
N <- length(ysig)
x <- seq(1,N,1)
T <- 6.85
freq <- x/T
length(freq)
## [1] 66150
time <- T/x
ysig <- ysig / 2^(data@bit -1)
plot(time,ysig,xlim=c(0,0.002))
length(time)
## [1] 66150
par(mar=c(1,1,1,1))
# figure margin error fix
ffty <- fftw::FFT(ysig) # discrete fourier transform
amp <- abs(ffty)
# DFT is a sequence of complex sinusoids
plot(freq,amp/N,type="l",xlim = c(0,2000))
Since FT spectrocopies are faster than continous wave spectroscopies, more spectra can be collected, and we can take advantage of signal averaging. Use R to generate noisy sine waves and use signal averaging to see reduction in S/N ratio.
rnorm can be used to add noise to a signal. Here we simply demonstrate the impact of signal averaging by lookind at the standard deviation as a function of number of signals.
after 1 after 10 after 100
how does the average noise change with n (=number of signals)
rn1 <- rnorm(10,1,0.5)
arn1 <-mean(rn1)
arn1
## [1] 1.043315
rn2 <- rnorm(50,1,0.5)
arn2 <-mean(rn2)
arn2
## [1] 1.072812
rn3 <- rnorm(1000,1,0.5)
arn3 <-mean(rn3)
arn3
## [1] 0.9902493
O Jones, R Maillardet, A Robinson. Introduction to Scientific Programming and Simulation Using R, CRC Press (2009)↩︎
[Distribution of body weight and height. W. Millar, Journal of Epidemiology and Community Health, 1986, 40, 319-323]↩︎
Bradford, M.M. (1976), Rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding, Anal. Biochem., 72 (1–2): 248–254↩︎
Brady and Macnaughtan, Anal Biochem. 2015 Dec 15; 491: 43–51.[Evaluation of Colorimetric Assays for Analyzing Reductively Methylated Proteins: Biases and Mechanistic Insights](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4631703/↩︎