hi pat! good news! it looks very promising and i think we will use it for the manuscript. it is working better than the usual arbitrary cutoffs as well as our own.
here’s our goal
Develop a procedure for identifying poorly performing tags; specific goal being to distinguish gene roster of strain presence in three YKO pools:
auxotrophic YKO (hom)
prototrophic collection made by SGA/back-crossing (caudy)
prototrophic collection made by CEN plasmid complementation of the auxotrophies
the big benefit of using EM is that we can call presence/absence with pvalues for different collections
this is uber important as this is a major point of this manuscript
got rid of a lot of crappy tags i was whining about
took me some time to understand another stupid package
the good news is is that you don’t really need the package (per usual), at least in our scenario because you can get pretty much the same results as just doing some simple stuff using dnorm and pnorm – everything converges really quick, and i had a hard time reading the EM package, so that was satisgfying (for me)
erica this sight might me as useful for you as it was for me for walking through the algorithm
http://rstudio-pubs-static.s3.amazonaws.com/1001_3177e85f5e4840be840c84452780db52.html
pat, my main issue as always is where to draw the cutoff?
for now i just enforced the rule that all 3 control experiments had to have a posterior > 0.5
i am sure there are more robust ways of calling/refining it. maybe even on a per tag basis rather than a per experiment basis? (eventually). also wasn’t sure if the batch correction was kosher…and not sure how much it mattered
we did lose a few precious genes (i hope not too many) and probably need to tweak it, but overall it worked well
steps
get SC data for all pools, xsc is the data matrix, psc describes experiments
assign tag type to each, plot ctrl experiments to compare distributions
batch correct and normalize first – not sure if this is necessary because it doesn’t really change much –
plot before and after batch correction
density
dendograms
## 'dendrogram' with 2 branches and 30 members total, at height 127.2112
## 'dendrogram' with 2 branches and 30 members total, at height 77.30638
## 'dendrogram' with 2 branches and 30 members total, at height 127.2112
## 'dendrogram' with 2 branches and 30 members total, at height 77.30638
compute batch corrected tag presence using our standard approach for later comparison
reformat long-2-wide
estimate of the unassigned means and sd – pretty equal for all pools
## pool value
## 1 caudy 5.440389
## 2 hom 5.286499
## 3 oliver 5.409946
## pool value
## 1 caudy 0.7095629
## 2 hom 0.4205895
## 3 oliver 0.7331928
Plot histogram of assigned and unassigned for all experiments. This will generate a tall lattice of plots, so only save to pdf, but do not show in notebook.
It is pretty clear than around June 2 there is a leftward shift in the assigned tags and then a shift back to the right around July 15.
ggnote – i batch corrected this round –
I think we can fit a single gaussian to the unassigned tag distribution. Then fit a mixture of two Gaussians with one of the two fixed at the unassigned tag distribution and the other unconstrained for the assigned tags. Then we can use the posterior distribution over the cluster assignment variables to give an odds-ratio of the tag assignement. Then we can draw a threshold on the odds-ratio.
ggnote: drew a threshold based on 0.05
Fit a 2 component Gaussian mixture model to each experiment’s NONESS assigned tags. Store the posterior probability of the tag assignment to the “good” or “not unassigned” distribution" for each experiment.
Noting the difference in number of iterations needed to converge each time this is run – us this something to worry about?
caudy = 174 hom = 83 oliver = 308
reformat dataframe wide to long for plotting
grab the fits from the models based on unassigned tags instead: erica we can justify using the lower bounds (mean - 3*sd) from the ctrls for our bg threshold cutoff values – they are higher than before
hom = 7.05 caudy = 7.46 oliver = 7.47 *also captured a bad caudy array = caudy_lys_15Jul15.1 that i forgot about
note lower variance and lower means for the hom pool in general
## [1] stats for unass/assigned hom,caudy and oliver respectively =
## bkg_mean ass_mean bkg_sd ass_sd bkg_lambda ass_lambda
## 6.1684288 9.6465535 0.9093650 0.8695046 0.2422054 0.7577946
## bkg_mean ass_mean bkg_sd ass_sd bkg_lambda ass_lambda
## 6.5739015 9.8560063 1.0589987 0.7629437 0.3936433 0.6063567
## bkg_mean ass_mean bkg_sd ass_sd bkg_lambda ass_lambda
## 6.4708480 9.8328984 1.0714609 0.7760142 0.3350390 0.6649610
## [1] hom,caudy and oliver respectively: +/- 3 stdev from mean assigned tags =
## ass_mean ass_mean
## 7.03804 12.25507
## ass_mean ass_mean
## 7.567175 12.144837
## ass_mean ass_mean
## 7.504856 12.160941
check out differences in fitted gaussian parameters betweenn pools
plot the assigned tag distribution with the posterior probability of it being a “good” tag. (only ctrl exps shown)
from here split out the pools and draw the Venn for tags w/ pvalue < 0.05
from barplots its pretty clear that the significance call is all or none there are lots of ways to call presence or absence given we have a pval across many experiments, but for now enforcing that pvalue must be > 0.05 for all 3 control chips for each pool
need to check consistency between replicates of absence/presence call
## [1] min value of ctrl medians = 6.99685579592234
## [1] 4452 30
## [1] min value of ctrl medians = 7.51122346724224
## [1] 3961 34
## [1] min value of ctrl medians = 7.14956099241794
## [1] 4383 39
## [1] 3961 34
## [1] 3949 34
## [1] 4452 30
## [1] 4438 30
## [1] 4383 39
## [1] 4297 39
i hate venns but here goes: make a list of strains in each pool tags absent in all strains were removed top venn is up tag, bottom is down tag – they agree really well which is good
compare this to our standard call of gene presence:
## Using gene as id variables
## Using gene as id variables
## Loading required package: rJava
##
## mixcau mixhom mixoliv
## 0 840 351 492
## 1 3949 4438 4297
##
## standcau standhom standoliv
## 0 436 304 239
## 1 4447 4579 4644
##
## mixcau mixhom mixoliv
## 0 0.17 0.07 0.10
## 1 0.79 0.89 0.86
##
## standcau standhom standoliv
## 0 0.09 0.06 0.05
## 1 0.91 0.94 0.95
compare EM to standard for fitness profiles
## [1] "ACB1" "ADE3" "ADE8" "ADK1" "AIM26"
## [6] "ALD5" "AMA1" "ARO1" "ARP5" "ARV1"
## [11] "ASP1" "ATP12" "ATP15" "BEM2" "BST1"
## [16] "BTS1" "BUD32" "CAX4" "CCR4" "CHK1"
## [21] "CHS1" "CKB1" "COA3" "COR1" "COX20"
## [26] "CPS1" "CTR1" "CYB2" "DAL82" "DEG1"
## [31] "DIA4" "DID4" "EIS1" "ERG6" "EXO5"
## [36] "FMP52" "FUR4" "FYV10" "FYV6" "HAP2"
## [41] "HBN1" "HFI1" "HPC2" "HSL1" "HSL7"
## [46] "IDH1" "IES1" "INM2" "IPK1" "IRC15"
## [51] "KAP122" "KAR3" "KRE28" "KRE6" "MDS3"
## [56] "MET13" "MHO1" "MNN10" "MON2" "MOT2"
## [61] "MSS1" "MSS18" "NGG1" "NOT5" "NUP84"
## [66] "OPI3" "PET112" "PHO85" "PIN2" "PKH2"
## [71] "PPZ1" "PRO2" "PRS1" "PSR1" "RBK1"
## [76] "RLM1" "RPB4" "RPL1B" "RPS16B" "RPS23B"
## [81] "RRG8" "RSC2" "RSM26" "RTF1" "RTG3"
## [86] "SAF1" "SAM3" "SFP1" "SGO1" "SIR3"
## [91] "SNF12" "SNF2" "SOM1" "SPC1" "SPT7"
## [96] "SRC1" "SSA2" "SSB2" "SUV3" "SYM1"
## [101] "TAF14" "THO2" "TIM8" "TKL2" "UBP15"
## [106] "UIP4" "VAM6" "VBA5" "VMA1" "VMA16"
## [111] "VMA21" "VMA22" "VMA6" "VMA7" "VPH2"
## [116] "VPS15" "VPS29" "VPS52" "YBR096W" "YBR221W-A"
## [121] "YDR524C-B" "YER076C" "YGL088W" "YGL218W" "YGR035W-A"
## [126] "YGR219W" "YHR086W-A" "YKL018C-A" "YKL118W" "YKL177W"
## [131] "YLR361C-A" "YML007C-A" "YML009W-B" "YML079W" "YMR141C"
## [136] "YMR230W-A" "YMR294W-A" "YNL184C" "YNR073C" "YOL118C"
## [141] "YOR331C"
## [1] "AAH1" "ADE1" "ADE3" "ADE4" "ADE5,7"
## [6] "ADE6" "ADK1" "AEP2" "AGP2" "AIM44"
## [11] "AMA1" "ANP1" "APL6" "APQ12" "ARF1"
## [16] "ARG80" "ARL3" "ARO4" "ARP8" "ARV1"
## [21] "ARX1" "ASC1" "ATG4" "ATP15" "ATP25"
## [26] "AVT5" "BAR1" "BEM4" "BNA7" "BOR1"
## [31] "BTS1" "BUD20" "BUD23" "CAX4" "CBS2"
## [36] "CHA1" "CHC1" "CKA1" "CKB1" "CLA4"
## [41] "CNM67" "COQ8" "CPR1" "CRP1" "CSE2"
## [46] "CTF4" "CTF8" "CTK3" "CTR9" "DAL81"
## [51] "DAL82" "DBP3" "DCI1" "DEF1" "DEG1"
## [56] "DFG16" "DON1" "DRS2" "DUG3" "DUN1"
## [61] "ECM31" "EDE1" "EFT2" "EIS1" "ELM1"
## [66] "ELO2" "ELO3" "ELP2" "ELP3" "EMC6"
## [71] "EMP24" "END3" "ENT3" "ERG24" "ERG3"
## [76] "ERJ5" "ERP2" "EST3" "ETR1" "EXO5"
## [81] "FMP52" "FMT1" "FPR1" "FRA1" "FUN30"
## [86] "FYV10" "FYV6" "GCN1" "GCN4" "GLE2"
## [91] "GLO3" "GRX8" "HAP3" "HCR1" "HFI1"
## [96] "HHY1" "HIR1" "HMO1" "HOC1" "HOM3"
## [101] "HSM3" "HXT17" "IDH2" "IMD4" "INM2"
## [106] "IOC4" "IRC18" "IST3" "IZH1" "JJJ3"
## [111] "KAP122" "KRE6" "KTI11" "KTR1" "KTR6"
## [116] "LAC1" "LGE1" "LHS1" "LOA1" "LOC1"
## [121] "LSM1" "LSM6" "LTV1" "LYS20" "MAP1"
## [126] "MDJ1" "MDL2" "MDM10" "MET10" "MET22"
## [131] "MET7" "MFM1" "MGR2" "MGR3" "MGT1"
## [136] "MIP6" "MIX17" "MNN10" "MNN2" "MPO1"
## [141] "MRPL20" "MRPL28" "MRPL4" "MRPS5" "MRT4"
## [146] "MRX11" "MSA1" "MSB4" "MSR1" "MSS1"
## [151] "MTC1" "MVB12" "NCR1" "NDE1" "NDI1"
## [156] "NGG1" "NHX1" "NKP2" "NQM1" "NSR1"
## [161] "NUC1" "NUP60" "OM14" "OPI8" "OSW7"
## [166] "OTU2" "PDH1" "PEP12" "PEX12" "PHO89"
## [171] "PIM1" "PMP3" "PMR1" "PMT1" "POP2"
## [176] "PPH22" "PPN1" "PPT2" "PPZ1" "PRM2"
## [181] "PRM9" "PRO2" "PRS3" "PRS5" "PRY2"
## [186] "PSR1" "PUF4" "PXA2" "RAD6" "RAD9"
## [191] "RAM1" "RCR1" "RCR2" "REF2" "REI1"
## [196] "RGC1" "RHB1" "RHO4" "RIM4" "RPL21A"
## [201] "RPL22A" "RPL2B" "RPL36B" "RPL37A" "RPL40A"
## [206] "RPL42B" "RPL7B" "RPS0A" "RPS10A" "RPS11A"
## [211] "RPS11B" "RPS17A" "RPS19B" "RPS21A" "RPS27B"
## [216] "RPS29B" "RPS7A" "RPS9B" "RRG8" "RSA1"
## [221] "RSM26" "RTF1" "RTT109" "SAP1" "SAP4"
## [226] "SAS2" "SDH1" "SEM1" "SEO1" "SIN4"
## [231] "SIR4" "SIT4" "SKG1" "SLS1" "SMF2"
## [236] "SMK1" "SNF4" "SNG1" "SPC72" "SPF1"
## [241] "SPL2" "SPO23" "SRB5" "SSE1" "SSF1"
## [246] "SSH1" "SUS1" "SUV3" "SWC5" "SWI5"
## [251] "SWI6" "SWR1" "TAT2" "TGS1" "THI2"
## [256] "TIF1" "TKL2" "TOD6" "TOF2" "TOM1"
## [261] "TOP1" "TRM9" "TSA2" "TSC3" "TSR3"
## [266] "UBP15" "UBP3" "UBP5" "UFO1" "UGO1"
## [271] "UME6" "UMP1" "URC2" "URM1" "UTP30"
## [276] "UTR4" "VBA1" "VID27" "VMA1" "VMA10"
## [281] "VMA11" "VMA16" "VMA2" "VMA21" "VMA5"
## [286] "VMA6" "VMA7" "VMA9" "VPS15" "VPS28"
## [291] "VPS3" "VPS52" "VRP1" "XRS2" "YAL066W"
## [296] "YAP1" "YBL059W" "YBL071C" "YBR096W" "YBR178W"
## [301] "YBR225W" "YCR102C" "YDJ1" "YDL183C" "YDR222W"
## [306] "YDR401W" "YDR433W" "YDR514C" "YER046W-A" "YFL012W"
## [311] "YFL013W-A" "YFR018C" "YGK3" "YGL108C" "YGL188C-A"
## [316] "YGR015C" "YGR064W" "YGR107W" "YGR117C" "YGR160W"
## [321] "YHC3" "YHK8" "YIL025C" "YJL077W-B" "YJR018W"
## [326] "YKE2" "YKL136W" "YKR051W" "YLR269C" "YMR031W-A"
## [331] "YMR141C" "YMR242W-A" "YNL140C" "YNL171C" "YOL079W"
## [336] "YOL118C" "YOR072W" "YOR139C" "YOR186W" "YOR318C"
## [341] "YOR331C" "YOR366W" "YPR084W" "YPR123C" "YRM1"
## [346] "YRR1" "ZRT2"
## [1] "1-Oct" "AAH1" "ACO2" "ADE4" "ADE8"
## [6] "ADK1" "ADO1" "AEP2" "AFT1" "AIM10"
## [11] "AIM22" "AIM44" "ANP1" "APL6" "APQ12"
## [16] "ARC1" "ARF1" "ARG2" "ARG5,6" "ARG7"
## [21] "ARL1" "ARL3" "ARO4" "ARP8" "ARV1"
## [26] "ASF1" "ASK10" "ATG11" "ATG14" "ATG23"
## [31] "ATP1" "ATP17" "ATP3" "ATP7" "AVL9"
## [36] "AVT5" "BBC1" "BCS1" "BEM2" "BFR1"
## [41] "BNA2" "BNA5" "BOR1" "BTS1" "BUB3"
## [46] "BUD14" "BUD16" "BUD19" "BUD20" "BUD21"
## [51] "BUD28" "BUD30" "BUD32" "CBP2" "CCE1"
## [56] "CCR4" "CCS1" "CDA2" "CHA1" "CHC1"
## [61] "CHO2" "CIR2" "CKB1" "CKB2" "CKI1"
## [66] "CLA4" "CNE1" "CNM67" "COG1" "COQ9"
## [71] "COR1" "CPA1" "CPA1_uorf" "CPA2" "CRP1"
## [76] "CTF8" "CTK2" "CTK3" "CTR1" "CTR9"
## [81] "CUS2" "CYB2" "CYT1" "DAL82" "DCI1"
## [86] "DDC1" "DIA2" "DIA4" "DIG1" "DOA4"
## [91] "DOC1" "DRS2" "DSS1" "DUS1" "EAF3"
## [96] "ECM22" "ECM27" "ECM31" "ECM33" "EFG1"
## [101] "EIS1" "ELM1" "ELO2" "ELO3" "ELP2"
## [106] "ELP3" "EMC6" "ENT3" "EOS1" "ERJ5"
## [111] "ERP2" "EST1" "EST2" "ETR1" "FAB1"
## [116] "FAR11" "FAR7" "FEN2" "FMP48" "FMP52"
## [121] "FMT1" "FRA1" "FUN14" "FUN30" "FYV5"
## [126] "FYV6" "FYV7" "FZO1" "GAS1" "GCN1"
## [131] "GCN5" "GCV3" "GEP3" "GLE2" "GLO3"
## [136] "GLR1" "GLY1" "GPM2" "GRX8" "GSF2"
## [141] "GTF1" "HBN1" "HEF3" "HER2" "HIS2"
## [146] "HIS5" "HIS7" "HIT1" "HMO1" "HMT1"
## [151] "HOC1" "HOM2" "HOM6" "HOP2" "HPM1"
## [156] "HRQ1" "HSL1" "HSL7" "HSM3" "HTD2"
## [161] "HUL4" "HUR1" "ICE2" "IES1" "IKI3"
## [166] "IMD4" "INM2" "INO4" "IOC4" "IRC18"
## [171] "ISA2" "JJJ3" "JNM1" "KCS1" "KEX2"
## [176] "KGD2" "KRE1" "KTI11" "KTR1" "KTR6"
## [181] "LAC1" "LAS21" "LAT1" "LCL2" "LDB16"
## [186] "LHS1" "LIP2" "LMO1" "LPD1" "LRG1"
## [191] "LSC2" "LSM1" "LSM6" "LTV1" "LYP1"
## [196] "LYS5" "LYS9" "MAF1" "MCM16" "MCT1"
## [201] "MEF2" "MET18" "MET3" "MET31" "MET8"
## [206] "MFA1" "MFM1" "MGR2" "MGR3" "MGT1"
## [211] "MIP1" "MIP6" "MMM1" "MMS22" "MNN2"
## [216] "MPC54" "MRP1" "MRPL1" "MRPL11" "MRPL13"
## [221] "MRPL15" "MRPL27" "MRPL37" "MRPL51" "MRPL9"
## [226] "MRPS9" "MRX14" "MSB4" "MSM1" "MSS1"
## [231] "MST1" "MTG2" "MUM2" "MVB12" "NCS6"
## [236] "NDI1" "NEW1" "NGG1" "NPL6" "NUC1"
## [241] "NUP133" "NUP170" "NUP60" "NUP84" "OM14"
## [246] "OPI11" "OPI3" "OSW7" "PAC10" "PBP4"
## [251] "PDE2" "PDH1" "PET309" "PET54" "PEX12"
## [256] "PEX13" "PEX21" "PEX28" "PFF1" "PHO89"
## [261] "PKR1" "PLM2" "PMS1" "PMT1" "POP2"
## [266] "POR1" "PPN1" "PPT2" "PRM9" "PRP18"
## [271] "PRY2" "PSD1" "PSR1" "PUS6" "PXA2"
## [276] "RAD24" "RAD4" "RAD9" "RAI1" "RCF2"
## [281] "RCN1" "RCR1" "RCR2" "REI1" "REV7"
## [286] "RFU1" "RGC1" "RHO5" "RMD6" "RNR3"
## [291] "RNR4" "ROM1" "ROM2" "RPB4" "RPB9"
## [296] "RPL13A" "RPL16B" "RPL19A" "RPL21A" "RPL22A"
## [301] "RPL24A" "RPL27A" "RPL2B" "RPL31A" "RPL36B"
## [306] "RPL40A" "RPL42B" "RPL43A" "RPL7B" "RPO41"
## [311] "RPS0A" "RPS17A" "RPS18B" "RPS23A" "RPS23B"
## [316] "RPS24A" "RPS24B" "RPS29B" "RPS7A" "RPS8A"
## [321] "RPS9B" "RRG9" "RRP40" "RSA1" "RSC2"
## [326] "RSM19" "RSM22" "RSM24" "RSM26" "RSM27"
## [331] "RTF1" "RTG2" "RTS1" "RTT109" "SAC1"
## [336] "SAC3" "SAF1" "SAM37" "SAP1" "SAP4"
## [341] "SBA1" "SDH1" "SDH4" "SEO1" "SER2"
## [346] "SHS1" "SIC1" "SIN3" "SIN4" "SLM5"
## [351] "SLM6" "SLT2" "SMF2" "SNF1" "SNF2"
## [356] "SNF3" "SNF4" "SNF7" "SNG1" "SOD2"
## [361] "SOH1" "SPF1" "SPL2" "SPO23" "SPO77"
## [366] "SPT10" "SPT2" "SPT3" "SRB2" "SRN2"
## [371] "SSA2" "SSD1" "SSE1" "SST2" "STB5"
## [376] "STE5" "SUS1" "SVF1" "SWF1" "SWH1"
## [381] "SWI5" "SYG1" "TAT2" "TCO89" "THR1"
## [386] "THR4" "TIF1" "TKL2" "TOF2" "TPN1"
## [391] "TRM9" "TRP1" "TRX3" "TSA2" "TSC3"
## [396] "TSR3" "UAF30" "UBC4" "UBP3" "UBP5"
## [401] "UME6" "URC2" "URM1" "UTP30" "UTR1"
## [406] "VAM3" "VBA1" "VBA2" "VBA5" "VID22"
## [411] "VID27" "VMA11" "VMA21" "VMA22" "VMS1"
## [416] "VPH2" "VPS1" "VPS28" "VPS29" "VPS52"
## [421] "VPS53" "VPS69" "VPS70" "VPS71" "VPS73"
## [426] "XRS2" "YAK1" "YAL056C-A" "YAL066W" "YAP1"
## [431] "YBL065W" "YBL071C" "YBR096W" "YBR178W" "YCL001W-B"
## [436] "YCL002C" "YCL007C" "YCL021W-A" "YCR101C" "YCR102C"
## [441] "YDJ1" "YDL041W" "YDL172C" "YDL211C" "YDR048C"
## [446] "YDR114C" "YDR222W" "YDR442W" "YDR514C" "YEL045C"
## [451] "YER010C" "YET3" "YFL012W" "YFL013W-A" "YFR045W"
## [456] "YGK3" "YGL041C" "YGL042C" "YGL072C" "YGL088W"
## [461] "YGL108C" "YGL188C-A" "YGL218W" "YGR015C" "YGR064W"
## [466] "YGR117C" "YGR160W" "YHK8" "YHR050W-A" "YIL025C"
## [471] "YIL029C" "YIL163C" "YJL055W" "YJL077W-B" "YJR107W"
## [476] "YKL136W" "YKR051W" "YLR297W" "YLR358C" "YML012C-A"
## [481] "YML079W" "YML094C-A" "YMR194C-A" "YNG2" "YNL120C"
## [486] "YNL140C" "YNL203C" "YNR068C" "YOL118C" "YOR199W"
## [491] "YOR283W" "YOR376W" "YPL260W" "YPR014C" "YPR084W"
## [496] "YPT32" "ZRT2" "ZUO1"