Bio23: Analyzing sodium bioreporter data

Using this page

Below, you’ll see text instructions and code snippets. You should be able to directly copy and paste code into the R terminal and execute it. In some places, you’ll see code snippets where you need to enter your information, then copy and paste. For example, imagine that we wanted you to assign a number to a variable called sample. Then we would give you a prompt like this:

sample <-  # Enter your number here.

Say you wanted to assign the number 9 to the variable sample. You would do so like this:

sample <- 9

If you prefer, you can also download a .Rmd file with the same information from Sakai.

Import your data and get set up

Download the text data file from the location specified by the instructor. Save the file to the same directory you are working with R from. To load it into R, use the read.csv command as shown below. Note that you need to replace “Name Of Your Data File” with the actual file name, also in quotes.

specData <- read.csv("Name Of Your Data File", header = FALSE, sep = ",")
saltConcentrations <- c(0:5/2)

How many columns of salt standards are there on the plate? (This is probably the same as the number of groups in the class.)

numSaltStandards <-  #Enter the number here

What column are your salt standards in?

myCol <-  #Enter number here

The code below will format the data table and set up R to analyze your team’s values.

stackedSpecData <- cbind(stack(specData[1:6,1:numSaltStandards]), rep(saltConcentrations, times = numSaltStandards)) #This line creates a new data frame with the spectrophotemetry data and salt concentrations of the standards.
colnames(stackedSpecData) <- c("values", "colFromPlate", "saltConc") #This line gives the new data frame sensable column names.
myValues <- stackedSpecData$values[stackedSpecData$colFromPlate == paste("V",myCol, sep="")] #This creates a variable with the spectrophotemtry data (blueness) of only your group's standards.

The code below calculates the mean and standard errors of the sodium standards.

specMeans <- tapply(X = stackedSpecData$values, INDEX = stackedSpecData$saltConc, FUN = mean, na.rm=TRUE) #The function tapply 
specSD <- tapply(X = stackedSpecData$values, INDEX = stackedSpecData$saltConc, FUN = sd, na.rm=TRUE)
lengthNoNa <- function(v) length(v[!is.na(v)])
specNs <- tapply(X = stackedSpecData$values, INDEX = stackedSpecData$saltConc, FUN = lengthNoNa)
specSE <- specSD/sqrt(specNs)

Plot the class data and your team’s data

The code below will plot the mean of the class data in black dots, and your team’s data in gray dots. The error bars represent the standard error of the mean of the class data.

plot(specMeans ~ saltConcentrations, pch = 16, cex=2, xlab="Salt Concentration", ylab="Normalized blueness", ylim=c(.9,1.4)) #You can adjust the y-axis by changing the values in ylim
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans + specSE, length = .04, angle = 90)
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans - specSE, length = .04, angle = 90)
  points(saltConcentrations, myValues, pch=16, col="gray")

Fit a line to the relationship bewtween salt concentration and blueness

Fit a linear regression to the whole data set and see the slope and intercept of the line and R^2 value. As you are learning this week in Math35, regression calculates the line that best fits the data. Below, you’ll see the intercept and the slope of that line. You’ll also see the R^2 value, which describes how well a line describes the data points. R^2 values range from 1 (the data fit perfectly on a line) to 0 (the data are a cloud of points with no linear trend.)

salt.lm <- lm(values ~ saltConc, data=stackedSpecData)

summary(salt.lm)$coefficients[1] #The intercept
summary(salt.lm)$coefficients[2] #The slope
summary(salt.lm)$r.squared # The R-squared value

The code below plots the regression line on top of the data.

plot(specMeans ~ saltConcentrations, pch = 16, cex=2, xlab="Salt Concentration", ylab="Normalized blueness", ylim=c(.9,1.4)) #You can adjust the y-axis by changing the values in ylim
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans + specSE, length = .04, angle = 90)
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans - specSE, length = .04, angle = 90)
  points(saltConcentrations, myValues, pch=16, col="gray") 
abline(salt.lm)

If you’ve learned about regression diagnostics in Math 35, you are welcome to examine them, although this is not required.

Fitting a line to only your team’s data

You may be interested in seeing how regression performs on just your team’s data. To do this, run the code below:

mySalt.lm <- lm(myValues ~ saltConcentrations)

summary(mySalt.lm)$coefficients[1] #The intercept
summary(mySalt.lm)$coefficients[2] #The slope
summary(mySalt.lm)$r.squared # The R-squared value

Plot the regression line with just your team’s data set.

plot(specMeans ~ saltConcentrations, pch = 16, cex=2, xlab="Salt Concentration", ylab="Normalized blueness", ylim=c(.9,1.4)) #You can adjust the y-axis by changing the values in ylim
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans + specSE, length = .04, angle = 90)
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans - specSE, length = .04, angle = 90)
  points(saltConcentrations, myValues, pch=16, col="gray") 
abline(mySalt.lm)

Fiting the line with only a subset of the data

If you want to limit the regression to only part of the data set, you can use the subset argument to the function lm. For example, to only use points with a salt concentration of 1% and higher, use lm(values~saltConc, data= stackedSpecData, subset = (saltConc > 0.5))

You can string together multiple logical arguments to subset using &. For example, lm(values~saltConc, data= stackedSpecData, subset = (saltConc > 0) & (saltConc < 2.5))

You can also use the not equals operator != to exclude a point. For example, lm(values~saltConc, data= stackedSpecData, subset = (saltConc != 0)

Here is an example. (Note that it isn’t necessarily a good choice!)

salt2.lm <- lm(values~saltConc, data= stackedSpecData, subset = (saltConc > 0.5))
plot(specMeans ~ saltConcentrations, pch = 16, cex=2, xlab="Salt Concentration", ylab="Normalized blueness", ylim=c(.9,1.4)) #You can adjust the y-axis by changing the values in ylim
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans + specSE, length = .04, angle = 90)
  arrows(x0 = saltConcentrations, y0 = specMeans, x1 = saltConcentrations, y1 = specMeans - specSE, length = .04, angle = 90)
  points(saltConcentrations, myValues, pch=16, col="gray") 
abline(salt2.lm)

summary(salt2.lm)$coefficients[1] #The intercept
summary(salt2.lm)$coefficients[2] #The slope
summary(salt2.lm)$r.squared # The R-squared value

Deciding on a regression to use

Now that you’ve explored some of these possible ways to analyze the data, you will need to decide which model to use. Here are some questions to consider to help you decide. (Recording answers to these questions in your lab notebook will help you when you write up next week.)

In general, do you think your group’s data, or the combined class data are better estimates of the true relationship between salt and blueness?
Do your data seem like they deviate in some systematic way from the class data? How does the answer to this question influence your choice of data to use for the standard curve?

Estimating the salt concentration of the samples you are studying.

Once you have a fit you are happy with, use the slope and the intercept of your regression, together with the spectrophotometry data (blueness) to predict the sodium concentration of your data.

You can access the blueness of your sample with the command below. Enter the row number before the comma, and the column number after the comma. (Note that both need to be numbers, even though the plate itself may have had letters for rows. In this case, A = 1, B = 2, etc…) You can also look at the original file if you prefer.

specData[ , ]

Once you’ve got your estimate of the blueness of the sample, and the slope and intercept of the regression line, you can rearrange the equation for the line and calculate the sodium concentration of your sample. (If you had multiple samples, you’ll need to make multiple calculations.)

Some questions to consider:

Did you dilute sample? If you so, you’ll want calculate the sodium concentration of your original sample
Did your blueness value fall beyond the range of the standard curve? If so, are there caveats to your estimate?