The goal of this analysis is to evaluate how well the AFM data fit the ID3 ensemble (based on NMR and SAXS data).
List of tasks:
The original data file is Angela movie data 2016-06-03.xlsx, from which columns A to J were extracted and saved as tab-delimited text in AFM.data.
Reading in the data and removing missing values (marked as “-” in the original file).
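# Data input (tab-delimited); treat "-" placeholders and empty cells as missing
AFM_data <- read.csv("AFM.data", sep = "\t", header = TRUE,
                     na.strings = c("NA", "-", ""))
# Remove missing values, pooling all measurements into one vector
AFM_combined <- unlist(AFM_data[!is.na(AFM_data)])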
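If the "-" placeholders are carried over into AFM.data, `read.csv()` reads every affected column as text rather than numbers, because by default only `NA` (and blank fields in numeric columns) count as missing. A minimal sketch with a hypothetical three-row input shows how `na.strings` handles this; the values and column names are made up for illustration.

```r
# Toy tab-delimited input (hypothetical values) with "-" marking missing cells
txt <- "0918m4\t0918m8\n12.1\t-\n15.3\t14.2\n-\t13.8"

# Without na.strings, the "-" cells force both columns to be read as text
raw <- read.csv(text = txt, sep = "\t", header = TRUE)

# With na.strings, "-" becomes NA and the columns stay numeric
clean <- read.csv(text = txt, sep = "\t", header = TRUE,
                  na.strings = c("NA", "-", ""))

# Pool all non-missing values into a single numeric vector
pooled <- unlist(clean[!is.na(clean)])
```

Note that `read.csv()` prefixes column names that start with a digit with `X` (through its `check.names` argument), which is why the measurement columns are referred to as `X0918m4` and so on.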
Helper functions for plotting the comparisons.
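# Helper functions
# Add the density curve of one measurement to the current plot
plot_AFM_density <- function(data) {
  lines(density(data))
}
# Open an empty plot with the x axis sized to the AFM data
plot_template <- function() {
  plot(0, type = "n",
       xlim = c(0, max(AFM_data, na.rm = TRUE)),
       ylim = c(0, 0.2),
       xlab = "Distance (nm)",
       ylab = "Density (frequency)",
       main = "Distance distribution in CBP measurements")
}
# Add the density curve of the pooled data as a thick orange line
plot_combined_AFM_density <- function(data) {
  lines(density(data), col = "orange", lwd = 3)
}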
Plotting all the measurements
# Density plots of all individual measurements
plot_template()
invisible(lapply(names(AFM_data), function(x) {
  current_col <- AFM_data[, x]
  current_col_noNa <- current_col[!is.na(current_col)]
  plot_AFM_density(current_col_noNa)
}))
# Overlay the density of the pooled measurements
plot_combined_AFM_density(AFM_combined)
legend("topright", c("Individual measurements", "All measurements combined"),
       col = c("black", "orange"), lwd = c(1, 3))
Note: In terms of statistical testing (t-test for means), the mean of every individual measurement differs significantly (p-value << 0.05) from that of the combined dataset. Statistically, therefore, these sets should not be combined; for the sake of this exercise, however, the similar datasets were combined nonetheless.
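The per-measurement comparison described in the note can be sketched as follows. This is an assumption about how the tests were run (R's default Welch two-sample `t.test()` of each column against the pooled data); the helper name `t_test_vs_combined` and the toy data frame are hypothetical.

```r
# Hypothetical helper: Welch t-test of each column against the pooled data.
# Returns one p-value per column, named after the column.
t_test_vs_combined <- function(df) {
  combined <- unlist(df[!is.na(df)])
  sapply(names(df), function(col) {
    x <- df[[col]]
    t.test(x[!is.na(x)], combined)$p.value
  })
}

# Hypothetical example: three synthetic "measurements" with different means
set.seed(1)
toy <- data.frame(a = rnorm(50, mean = 10),
                  b = rnorm(50, mean = 20),
                  c = rnorm(50, mean = 30))
p_values <- t_test_vs_combined(toy)
p_values < 0.05
```

Because each column is also part of the pooled sample, the two samples overlap; testing each column against the pooled remaining columns avoids that. A t-test compares means only; `ks.test()` compares whole distributions if that is the question.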
Plotting only the measurements that are similar
# Subset the similar datasets
AFM_data_similars <- AFM_data[, c("X0935m1",
                                  "X0918m4",
                                  "X0918m8",
                                  "X0918m9",
                                  "X0922m1")]
AFM_combined_similars <- unlist(AFM_data_similars[!is.na(AFM_data_similars)])
# Density plots of the similar measurements
plot_template()
invisible(lapply(names(AFM_data_similars), function(x) {
  current_col <- AFM_data_similars[, x]
  current_col_noNa <- current_col[!is.na(current_col)]
  plot_AFM_density(current_col_noNa)
}))
plot_combined_AFM_density(AFM_combined_similars)
legend("topright", c("Individual (similar) measurements", "All (similar) measurements combined"),
       col = c("black", "orange"), lwd = c(1, 3))
End-to-end distances were measured in PyMOL on the conformers of the ID3 ensemble, which was derived from NMR and SAXS data collected in solution.
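An end-to-end distance is the Euclidean distance between the two chain termini, i.e. the square root of the summed squared differences of their x, y and z coordinates. A minimal sketch, assuming each terminus is represented by a single atom (which atoms were picked in PyMOL is not stated here); the helper name and the coordinates are hypothetical. PDB coordinates are in ångströms, so a distance taken from them is divided by 10 to give nanometres.

```r
# Hypothetical helper: Euclidean distance between two atoms given as c(x, y, z)
e2e_distance <- function(first_atom, last_atom) {
  sqrt(sum((last_atom - first_atom)^2))
}

# Hypothetical terminal-atom coordinates (in angstroms, as in a PDB file)
n_term <- c(0, 0, 0)
c_term <- c(3, 4, 12)

d_angstrom <- e2e_distance(n_term, c_term)  # 13
d_nm <- d_angstrom / 10                     # 1.3
```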
# End-to-end distances measured on the conformers of the ID3 ensemble
id3_e2e <- read.csv("e2e.data", sep = "\t", header = TRUE)
id3_e2e
## Conformer e2e freq
## 1 132-131a_406_sccomp.pdb 19.99 0.20
## 2 1049-1065a_406_sccomp.pdb 9.84 0.10
## 3 6052-6129a_406_sccomp.pdb 18.87 0.08
## 4 8225-8322a_406_sccomp.pdb 9.09 0.02
## 5 13785-13945a_406_sccomp.pdb 26.15 0.44
## 6 16528-16720a_406_sccomp.pdb 16.41 0.12
## 7 16662-16855a_406_sccomp.pdb 15.86 0.04
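# Expand each conformer by its population (freq * 100 copies); round() avoids
# rep() truncating products such as 0.29 * 100 = 28.999999999999996
weighted_id3_e2e <- rep(id3_e2e$e2e, round(id3_e2e$freq * 100))
# Densities of the AFM data and of the weighted ensemble
d_afm <- density(AFM_combined_similars)
d_id3 <- density(weighted_id3_e2e)
# Size the axes so that both curves and all conformer markers are visible
plot(d_afm,
     xlim = range(d_afm$x, d_id3$x),
     ylim = c(0, max(d_afm$y, d_id3$y)),
     main = "Distribution of distances",
     xlab = "Distances (nm)",
     ylab = "Density (frequency)")
# Mark each conformer's end-to-end distance on the x axis
points(id3_e2e$e2e, rep(0, nrow(id3_e2e)), pch = 21, bg = "blue")
lines(d_id3, col = "blue")
legend("topright", c("AFM data", "ID3 ensemble"), col = c("black", "blue"), lwd = 3)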
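Replicating each conformer `freq * 100` times is one way to weight the kernel density estimate of the ensemble. A minimal sketch of an equivalent route, using the seven tabulated values above: `density()` accepts a `weights` argument, and with the same bandwidth the two estimates coincide. The bandwidth is passed explicitly because R's automatic bandwidth selectors do not take the weights into account.

```r
# End-to-end distances and populations of the seven ID3 conformers (table above)
e2e  <- c(19.99, 9.84, 18.87, 9.09, 26.15, 16.41, 15.86)
freq <- c(0.20, 0.10, 0.08, 0.02, 0.44, 0.12, 0.04)

# Route 1: replicate each conformer by its population (rounded to whole copies)
replicated <- rep(e2e, round(freq * 100))
bw <- bw.nrd0(replicated)  # bandwidth chosen from the replicated sample
d_rep <- density(replicated, bw = bw)

# Route 2: weight the seven conformers directly; weights must sum to 1
d_w <- density(e2e, weights = freq / sum(freq), bw = bw)

# Both estimates share the same grid and agree to numerical precision
max(abs(d_rep$y - d_w$y))
```

The weighted form avoids choosing a replication factor, and it keeps populations that are not multiples of 0.01 exact.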
The end-to-end distances of the ID3 conformers in the NMR/SAXS-based ensemble (measured in solution) differ markedly from the AFM distances (measured on a surface).
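To attach a number to this difference, one option is a one-sample t-test of the AFM distances against the population-weighted mean end-to-end distance of the ensemble. A minimal sketch, assuming the tabulated distances are in the same units as the AFM data; `afm_vs_ensemble` is a hypothetical helper, and it tests the mean only, not the shape of the distribution.

```r
# End-to-end distances and populations of the seven ID3 conformers (table above)
e2e  <- c(19.99, 9.84, 18.87, 9.09, 26.15, 16.41, 15.86)
freq <- c(0.20, 0.10, 0.08, 0.02, 0.44, 0.12, 0.04)

# Population-weighted mean end-to-end distance of the ensemble
ens_mean <- sum(freq / sum(freq) * e2e)  # 20.783

# Hypothetical helper: one-sample t-test of AFM distances against a reference mean
afm_vs_ensemble <- function(afm, mu) {
  t.test(afm[!is.na(afm)], mu = mu)
}

# With the pooled AFM data from above:
# afm_vs_ensemble(AFM_combined_similars, ens_mean)
```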
Can AFM distances be used to filter a pool of conformers that were generated using NMR data that was measured under significantly different conditions?