SUMMARY

The goal of this analysis is to evaluate how well the AFM data fit the ID3 ensemble (which is based on NMR and SAXS data).

List of tasks:

  1.) Compare CBP measurements
  2.) Compare AFM data to ensemble end-to-end distances

Data input

The original data file is Angela movie data 2016-06-03.xlsx, from which columns A to J were extracted and saved as tab-separated text in AFM.data.

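The data are read in and missing values (marked as “-” in the original file) are removed.

# Data input: AFM.data is tab-separated; "-" and empty cells are treated as missing
AFM_data <- read.csv("AFM.data", sep="\t", header=TRUE,
                     na.strings=c("NA", "-", ""))
# Pool all measurements into one vector and remove missing values
AFM_combined <- unlist(AFM_data, use.names=FALSE)
AFM_combined <- AFM_combined[!is.na(AFM_combined)]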
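The handling of the “-” markers can be illustrated on a small made-up excerpt. This is only a sketch: the column names and values below are hypothetical and are not taken from AFM.data. It shows that declaring “-” in `na.strings` keeps the columns numeric, so that `is.na()` can then find the missing entries.

```r
# Hypothetical two-column excerpt in a tab-separated layout (not real AFM data)
example_text <- "X1\tX2\n12.5\t8.1\n-\t9.4\n14.0\t-\n"

# Declaring "-" as a missing-value marker keeps both columns numeric
example_data <- read.csv(text = example_text, sep = "\t", header = TRUE,
                         na.strings = c("NA", "-"))

# Pool all columns into one vector and drop the missing entries
example_combined <- unlist(example_data, use.names = FALSE)
example_combined <- example_combined[!is.na(example_combined)]
print(example_combined)
```

Without the `na.strings` argument, a column containing “-” would be read as text rather than numbers, and `is.na()` would not flag those cells.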

Tasks

1.) Compare CBP measurements

Helper functions for performing the comparisons.

# Helper functions
plot_AFM_density <- function(data){
  lines(density(data))
}

plot_template <- function(){
  plot(0, type="n",
       xlim=c(0, max(AFM_data, na.rm=TRUE)),
       ylim=c(0, 0.2),
       xlab="Distance (nm)",
       ylab="Density (frequency)",
       main="Distance distribution in CBP measurements")
}

plot_combined_AFM_density <- function(data){
  lines(density(data), col="orange", lwd=3)
}

Plotting all the measurements

# Density plotting all measurements
plot_template()
invisible(lapply(names(AFM_data), function(x) {
  current_col = AFM_data[, x]
  current_col_noNa = current_col[!is.na(current_col)]
  plot_AFM_density(current_col_noNa)
}))
plot_combined_AFM_density(AFM_combined)
# Legend line widths match the plotted lines (thin individual, thick combined)
legend("topright",
       c("Individual measurements", "All measurements combined"),
       col=c("black", "orange"), lwd=c(1, 3))

Outlier measurements:

  • X0935m3
  • X0935m4
  • X0918m11
  • X0919m1
  • X0915m1

Note: In terms of statistical testing (t-test for means), every measurement differs significantly (p-value << 0.05) from the combined dataset. Statistically, these sets should therefore not be pooled; for the sake of this exercise, the similar datasets were combined nonetheless.
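A comparison of this kind could be run roughly as follows. This is a minimal sketch on simulated data, not the code used for the note above: `sim_data`, its column names and its values are hypothetical stand-ins for AFM_data, and the use of the Welch two-sample t-test (the default of `t.test()`) is an assumption.

```r
# Hypothetical stand-in for AFM_data: three columns of simulated distances (nm)
set.seed(1)
sim_data <- data.frame(
  m1 = rnorm(200, mean = 10, sd = 2),
  m2 = rnorm(200, mean = 12, sd = 2),
  m3 = rnorm(200, mean = 15, sd = 3)
)

# Pool all simulated measurements into one vector
sim_combined <- unlist(sim_data, use.names = FALSE)

# Welch two-sample t-test of each column against the pooled values
p_values <- sapply(sim_data, function(col) {
  t.test(col[!is.na(col)], sim_combined)$p.value
})
print(p_values)
```

Note that each column is itself part of the pooled vector, so the two samples in each test are not independent; the p-values are best read as a rough indication only.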

Plotting only the measurements that are similar

# Subsetting the similar datasets
AFM_data_similars <- AFM_data[, c("X0935m1", 
                                 "X0918m4", 
                                 "X0918m8", 
                                 "X0918m9", 
                                 "X0922m1")]
AFM_combined_similars <- unlist(AFM_data_similars[!is.na(AFM_data_similars)])
# Density plotting similar measurements
plot_template()
invisible(lapply(names(AFM_data_similars), function(x) {
  current_col <- AFM_data_similars[, x]
  current_col_noNa <- current_col[!is.na(current_col)]
  plot_AFM_density(current_col_noNa)
}))
plot_combined_AFM_density(AFM_combined_similars)
# Legend line widths match the plotted lines (thin individual, thick combined)
legend("topright",
       c("Individual (similar) measurements", "All (similar) measurements combined"),
       col=c("black", "orange"), lwd=c(1, 3))

2.) Compare AFM data to ensemble end-to-end distances

End-to-end distances were measured in PyMOL on the conformers in the ID3 ensemble. This ensemble was based on NMR and SAXS data in solution.

# End-to-end distances measured on the conformers of ID3 ensemble
id3_e2e <- read.csv("e2e.data", sep="\t", header=TRUE)
id3_e2e
##                     Conformer   e2e freq
## 1     132-131a_406_sccomp.pdb 19.99 0.20
## 2   1049-1065a_406_sccomp.pdb  9.84 0.10
## 3   6052-6129a_406_sccomp.pdb 18.87 0.08
## 4   8225-8322a_406_sccomp.pdb  9.09 0.02
## 5 13785-13945a_406_sccomp.pdb 26.15 0.44
## 6 16528-16720a_406_sccomp.pdb 16.41 0.12
## 7 16662-16855a_406_sccomp.pdb 15.86 0.04
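# Replicate each distance in proportion to its population (freq sums to 1);
# round() guards against floating-point error in freq * 100
weighted_id3_e2e <- rep(id3_e2e$e2e, times=round(id3_e2e$freq * 100))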

plot(density(AFM_combined_similars), 
     main="Distribution of distances",
     xlab="Distances (nm)",
     ylab="Density (frequency)")
# Mark the individual conformer distances on the x-axis
points(id3_e2e$e2e, rep(0, nrow(id3_e2e)), pch=21, bg="blue")
lines(density(weighted_id3_e2e), col="blue")
legend("topright", c("AFM data", "ID3 ensemble"), col=c("black", "blue"), lwd=3)
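As an alternative to replicating each distance 100 × freq times, the populations can be passed to `density()` directly through its `weights` argument. The sketch below copies the values from the id3_e2e table printed above; reusing the bandwidth of the replicated sample is an assumption made here so that the curve stays comparable to the one plotted above.

```r
# Values copied from the id3_e2e table printed above
e2e  <- c(19.99, 9.84, 18.87, 9.09, 26.15, 16.41, 15.86)
freq <- c(0.20, 0.10, 0.08, 0.02, 0.44, 0.12, 0.04)

# Population-weighted mean end-to-end distance of the ensemble
e2e_mean <- weighted.mean(e2e, freq)
print(e2e_mean)

# Bandwidth taken from the replicated sample (assumption: keeps the smoothing
# comparable to the replication approach)
replicated <- rep(e2e, times = round(freq * 100))
bw <- bw.nrd0(replicated)

# Kernel density estimate using the populations as weights (must sum to 1)
d_weighted <- density(e2e, weights = freq / sum(freq), bw = bw)
```

The resulting object can be drawn with `lines(d_weighted, col="blue")` in place of the density of the replicated vector.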

Outcome:

The end-to-end distances of the ID3 conformers in the ensemble based on NMR and SAXS data (measured in solution) differ markedly from the distances in the AFM data (measured on a surface).

Question to y’all

Can AFM distances be used to filter a pool of conformers that were generated using NMR data that was measured under significantly different conditions?