same but rerun only with conditions that will be in manuscript

hi pat! good news! it looks very promising and i think we will use it for the manuscript. it is working better than the usual arbitrary cutoffs as well as our own.

here’s our goal

Purpose

Develop a procedure for identifying poorly performing tags. The specific goal is to determine the roster of genes (strains) present in each of three YKO pools:

  1. auxotrophic YKO (hom)

  2. prototrophic collection made by SGA/back-crossing (caudy)

  3. prototrophic collection made by CEN plasmid complementation of the auxotrophies (oliver)

the big benefit of using EM is that we can call presence/absence with pvalues for different collections

this is uber important as this is a major point of this manuscript

got rid of a lot of crappy tags i was whining about

took me some time to understand another stupid package

the good news is that you don’t really need the package (per usual), at least in our scenario, because you can get pretty much the same results by doing some simple stuff with dnorm and pnorm. everything converges really quickly, and i had a hard time reading the EM package, so that was satisfying (for me)

erica, this site might be as useful for you as it was for me for walking through the algorithm

http://rstudio-pubs-static.s3.amazonaws.com/1001_3177e85f5e4840be840c84452780db52.html

pat, my main issue as always is where to draw the cutoff?

for now i just enforced the rule that all 3 control experiments had to have a p-value > 0.5

i am sure there are more robust ways of calling/refining it. maybe even on a per tag basis rather than a per experiment basis? (eventually). also wasn’t sure if the batch correction was kosher…and not sure how much it mattered

we did lose a few precious genes (i hope not too many) and probably need to tweak it, but overall it worked well

steps

get SC data for all pools, xsc is the data matrix, psc describes experiments

assign tag type to each, plot ctrl experiments to compare distributions

removed unassigned in this run and used essentials instead

batch correct and normalize first (not sure if this is necessary because it doesn’t really change much)

plot before and after batch correction

  1. density

  2. dendrograms

reformat long-2-wide

## [1] "unass included"

estimate of the unassigned means and sd – pretty equal for all pools

##     pool    value
## 1  caudy 5.405844
## 2    hom 5.217357
## 3 oliver 5.315958
##     pool     value
## 1  caudy 0.6958178
## 2    hom 0.4251037
## 3 oliver 0.7314023
## Found 3 batches
## Adjusting for 12 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## Found 3 batches
## Adjusting for 9 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## Found 3 batches
## Adjusting for 12 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## [1] "unass removed"
## 
##  TRUE 
## 10216

Plot histogram of assigned and unassigned for all experiments. This will generate a tall lattice of plots, so only save to pdf, but do not show in notebook.

It is pretty clear that around June 2 there is a leftward shift in the assigned tags and then a shift back to the right around July 15.

ggnote – i batch corrected this round –

I think we can fit a single Gaussian to the unassigned tag distribution. Then fit a mixture of two Gaussians, with one of the two fixed at the unassigned tag distribution and the other unconstrained for the assigned tags. Then we can use the posterior distribution over the cluster assignment variables to give an odds ratio for the tag assignment. Then we can draw a threshold on the odds ratio.
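A minimal sketch of that idea, written in standard-library Python rather than with the R mixture package used for the real run. `NormalDist.pdf` plays the role of `dnorm`. The function name, starting values, tolerance and the simulated tag intensities are all assumptions for illustration, not the notebook's actual code or data.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_fixed_background_mixture(x, bkg_mean, bkg_sd, tol=1e-8, max_iter=1000):
    """EM for a two-Gaussian mixture with the background component held fixed.

    Returns (ass_mean, ass_sd, ass_lambda, posterior, n_iter), where
    posterior[i] is P(tag i comes from the assigned component | x[i]).
    """
    bkg = NormalDist(bkg_mean, bkg_sd)
    # Crude start (an assumption): free component sits on the upper half of the data.
    upper = sorted(x)[len(x) // 2:]
    mu, sd, lam = fmean(upper), max(pstdev(upper), 1e-3), 0.5
    prev_ll = -math.inf
    post, n_iter = [], 0
    for n_iter in range(1, max_iter + 1):
        ass = NormalDist(mu, sd)
        # E-step: weighted densities and posterior membership of each tag.
        dens_b = [(1.0 - lam) * bkg.pdf(v) for v in x]
        dens_a = [lam * ass.pdf(v) for v in x]
        post = [a / (a + b) for a, b in zip(dens_a, dens_b)]
        ll = sum(math.log(a + b) for a, b in zip(dens_a, dens_b))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: update only the assigned component and the mixing weight.
        w = sum(post)
        mu = sum(p * v for p, v in zip(post, x)) / w
        sd = max(math.sqrt(sum(p * (v - mu) ** 2 for p, v in zip(post, x)) / w), 1e-6)
        lam = w / len(x)
    return mu, sd, lam, post, n_iter


# Simulated log intensities (made up): 300 background-like, 700 assigned-like tags.
random.seed(1)
tags = [random.gauss(6.1, 0.9) for _ in range(300)]
tags += [random.gauss(9.6, 0.85) for _ in range(700)]

ass_mean, ass_sd, ass_lambda, posterior, n_iter = fit_fixed_background_mixture(
    tags, bkg_mean=6.1, bkg_sd=0.9
)
# Posterior odds of "assigned" vs "background" for each tag (guarded against 1 - p == 0).
odds = [p / max(1.0 - p, 1e-12) for p in posterior]
print(round(ass_mean, 2), round(ass_sd, 2), round(ass_lambda, 2), n_iter)
```

A threshold can then be drawn either on `posterior` or on `odds`; which one to use, and where to put it, is the open question in these notes.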

ggnote: drew a threshold based on 0.05

Fit a 2-component Gaussian mixture model to each experiment’s NONESS assigned tags. Store the posterior probability of the tag assignment to the “good” or “not unassigned” distribution for each experiment.

Noting the difference in the number of iterations needed to converge each time this is run. Is this something to worry about?

caudy = 174, 282 (no batch correction); hom = 83, 131; oliver = 308, 242

reformat dataframe wide to long for plotting

grab the fits from the models based on unassigned tags instead: erica, we can justify using the lower bounds (mean - 3*sd) from the ctrls for our bg threshold cutoff values; they are higher than before

hom = 7.05, caudy = 7.46, oliver = 7.47

also captured a bad caudy array (caudy_lys_15Jul15.1) that i forgot about
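The lower-bound rule (mean - 3*sd) is simple arithmetic. The sketch below applies it to the assigned-tag means and sds printed in the stats table further down, which as far as I can tell come from the fits over all experiments rather than the ctrls only, so the results are close to, but not the same as, the ctrl-based cutoffs quoted above. The function name and the `k` parameter are made up for illustration.

```python
def lower_bound(mean, sd, k=3.0):
    """Background cutoff as mean - k * sd of a fitted Gaussian."""
    return mean - k * sd


# (ass_mean, ass_sd) copied from the printed fits, in the order hom, caudy, oliver.
fits = {
    "hom": (9.5738050, 0.8667852),
    "caudy": (9.8201733, 0.7369216),
    "oliver": (9.7263986, 0.7657322),
}
cutoffs = {pool: lower_bound(m, s) for pool, (m, s) in fits.items()}
for pool, value in cutoffs.items():
    print(pool, round(value, 2))
```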

note lower variance and lower means for the hom pool in general

## [1] stats for unass/assigned hom,caudy and oliver respectively =
##   bkg_mean   ass_mean     bkg_sd     ass_sd bkg_lambda ass_lambda 
##  6.1019831  9.5738050  0.9242400  0.8667852  0.2446167  0.7553833
##   bkg_mean   ass_mean     bkg_sd     ass_sd bkg_lambda ass_lambda 
##  6.5650372  9.8201733  1.0970971  0.7369216  0.4035549  0.5964451
##   bkg_mean   ass_mean     bkg_sd     ass_sd bkg_lambda ass_lambda 
##  6.3821403  9.7263986  1.0899718  0.7657322  0.3400108  0.6599892
## [1] hom,caudy and oliver respectively:  +/- 3 stdev from mean assigned tags =
##  bkg_mean  ass_mean 
##  6.101983 12.174161
##  bkg_mean  ass_mean 
##  6.565037 12.030938
## bkg_mean ass_mean 
##  6.38214 12.02360

check out differences in fitted gaussian parameters between pools

plot the assigned tag distribution with the posterior probability of it being a “good” tag. (only ctrl exps shown)

from here split out the pools and draw the Venn for tags w/ pvalue < 0.05

## [1] min value of ctrl medians =  7.46767155284592            
## [1] 4470   30
## [1] min value of ctrl medians =  7.59040103547903            
## [1] 3981   34
## [1] min value of ctrl medians =  7.81958360782921            
## [1] 4380   39
## Found 3 batches
## Adjusting for 12 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## Found 3 batches
## Adjusting for 9 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## Found 3 batches
## Adjusting for 12 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data

from the barplots it’s pretty clear that the significance call is all or none. there are lots of ways to call presence or absence given we have a pval across many experiments, but for now enforcing that the pvalue must be > 0.05 for all 3 control chips for each pool
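A sketch of the "all 3 control chips must pass" rule, assuming the per-tag pvalues for one pool are held as a list per tag. The tag names and values are hypothetical, and the threshold is left as a parameter since where to draw it is still open.

```python
def call_presence(ctrl_pvals, threshold=0.05):
    """A tag is called present only if every control chip exceeds the threshold."""
    return {
        tag: all(p > threshold for p in pvals)
        for tag, pvals in ctrl_pvals.items()
    }


# Hypothetical pvalues for three control chips of one pool.
ctrl_pvals = {
    "tagA": [0.99, 0.97, 0.98],  # passes on all chips
    "tagB": [0.99, 0.01, 0.95],  # fails on one chip
    "tagC": [0.00, 0.00, 0.02],  # fails on all chips
}
calls = call_presence(ctrl_pvals)
print(calls)
```

Swapping `all` for a majority vote would be one of the "lots of ways" mentioned above; it only matters for tags like `tagB`, which should be rare if the call really is all or none.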

need to check consistency between replicates of absence/presence call
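One way this check could look: call each replicate separately and report how many tags get the same call everywhere. This is only a sketch; the function name, threshold default and example values are assumptions.

```python
def replicate_concordance(pvals_by_tag, threshold=0.05):
    """Return (fraction of tags with identical calls in all replicates, discordant tags)."""
    discordant = []
    for tag, pvals in pvals_by_tag.items():
        per_replicate = {p > threshold for p in pvals}
        if len(per_replicate) > 1:  # both True and False were seen
            discordant.append(tag)
    fraction = 1.0 - len(discordant) / len(pvals_by_tag)
    return fraction, sorted(discordant)


# Hypothetical pvalues across three replicates.
example = {
    "tagA": [0.99, 0.98, 0.97],
    "tagB": [0.90, 0.02, 0.95],
    "tagC": [0.01, 0.00, 0.03],
    "tagD": [0.80, 0.85, 0.90],
}
fraction, discordant = replicate_concordance(example)
print(fraction, discordant)
```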

compute batch corrected tag presence using our standard approach for later comparison

## [1] "rejected by EM, caudy hom oliver"
## [1] 963
## [1] 368
## [1] 710
## [1] "rejected by std"
## [1] 342
## [1] 159
## [1] 313

compare distributions of rejected strains by mixEM and std

orange = mix, blue = standard

check fitness profiles; mixEM is on the right, standard on the left

mixEM seems to work well. did it get rid of too much?

i hate venns but here goes: make a list of strains in each pool. tags absent in all strains were removed

pie charts! think it’s easier to visualize

compare this to our standard call of gene presence:

top venn is genes by mixEM, bottom is by standard

## [1] 110
## [1] 215
## Using gene as id variables
## Using gene as id variables

## Loading required package: rJava

compare EM to standard for fitness profiles

compare on same scales

looks like mixEM is working pretty well, check out the crap genes we get rid of:

other than a few ADE genes, which we know are slow, it looks pretty good. order is hom genes, oliver genes and caudy genes

## [1] "hom reject EM"
##   [1] "AAH1"      "ACB1"      "ADE5,7"    "ADE8"      "ADK1"     
##   [6] "AIM26"     "AMA1"      "APL6"      "ARG8"      "ARO1"     
##  [11] "ARO7"      "ARP5"      "ARV1"      "ASP1"      "ATP12"    
##  [16] "ATP15"     "BCS1"      "BDS1"      "BTS1"      "BUB3"     
##  [21] "BUD32"     "CAX4"      "CCR4"      "CHK1"      "CNM67"    
##  [26] "COA2"      "COQ6"      "COR1"      "COX10"     "COX20"    
##  [31] "CPS1"      "CTF4"      "CYB2"      "DAL82"     "DEG1"     
##  [36] "DIA4"      "DID4"      "DSC2"      "ECM22"     "ECM31"    
##  [41] "EIS1"      "ERG6"      "EXO5"      "FEN2"      "FLX1"     
##  [46] "FMP52"     "FYV6"      "HAP2"      "HBN1"      "HSL1"     
##  [51] "ICE2"      "IDH1"      "IMD4"      "IMG2"      "IOC4"     
##  [56] "IPK1"      "KAP122"    "KAR3"      "KRE28"     "KRE6"     
##  [61] "LSO1"      "LTV1"      "MDS3"      "MET13"     "MHO1"     
##  [66] "MNN10"     "MOT2"      "MRH4"      "MSS1"      "NOT5"     
##  [71] "NUP84"     "OPI9"      "PDR5"      "PET112"    "PHO85"    
##  [76] "PIN2"      "POG1"      "PRO1"      "PRO2"      "PRS2"     
##  [81] "PSR1"      "QCR9"      "RBK1"      "RLM1"      "RNR4"     
##  [86] "RPB4"      "RPL1B"     "RPL31A"    "RPS16B"    "RPS23B"   
##  [91] "RPS25B"    "RPS6A"     "RRF1"      "RRG8"      "RRT14"    
##  [96] "RSC2"      "RSM24"     "RSM26"     "RTF1"      "SAF1"     
## [101] "SAM3"      "SAN1"      "SIR3"      "SKO1"      "SNF12"    
## [106] "SNF2"      "SOM1"      "SPC1"      "SPT7"      "SRC1"     
## [111] "SSA2"      "SSB2"      "STE11"     "STE3"      "SUS1"     
## [116] "SUV3"      "SYM1"      "TAF14"     "THO2"      "THP1"     
## [121] "TIM8"      "TKL2"      "TRK1"      "UBP15"     "UGO1"     
## [126] "UIP4"      "VAM6"      "VBA5"      "VMA16"     "VMA21"    
## [131] "VMA22"     "VMA7"      "VPS15"     "VPS16"     "VPS33"    
## [136] "VPS53"     "YAL016C-B" "YBR096W"   "YBR221W-A" "YCL021W-A"
## [141] "YDL183C"   "YDL206W"   "YDR524C-B" "YER076C"   "YGL088W"  
## [146] "YGL185C"   "YGL218W"   "YGR035W-A" "YGR121W-A" "YGR161W-C"
## [151] "YGR174W-A" "YGR204C-A" "YGR219W"   "YHL015W-A" "YHR022C-A"
## [156] "YHR086W-A" "YIR018C-A" "YJL136W-A" "YKL018C-A" "YKL106C-A"
## [161] "YLL006W-A" "YLR361C-A" "YLR419W"   "YML007C-A" "YML009W-B"
## [166] "YML079W"   "YMR013W-A" "YMR141C"   "YMR230W-A" "YNG2"     
## [171] "YNL028W"   "YNL097C-B" "YNL184C"   "YOL097W-A" "YOL118C"  
## [176] "YOL159C-A" "YOL164W-A" "YOR161C-C" "YOR316C-A" "YOR331C"  
## [181] "YPL152W-A" "YPR099C"   "YPR159C-A" "YTA12"     "ZAP1"
##   [1] "AAH1"         "ADE1"         "ADE3"         "ADE4"        
##   [5] "ADE5,7"       "ADE6"         "ADK1"         "AGP2"        
##   [9] "AIM44"        "ALD5"         "ALG9"         "AMA1"        
##  [13] "ANP1"         "APL6"         "APQ12"        "ARC18"       
##  [17] "ARG80"        "ARL3"         "ARP8"         "ARV1"        
##  [21] "ASC1"         "ASF1"         "ASK10"        "ASP1"        
##  [25] "ATG11"        "ATG4"         "ATP12"        "ATP15"       
##  [29] "AVT5"         "BAR1"         "BEM4"         "BNA7"        
##  [33] "BOR1"         "BTS1"         "BUD14"        "BUD20"       
##  [37] "BUD23"        "CAX4"         "CBS2"         "CCE1"        
##  [41] "CHA1"         "CHC1"         "CKA1"         "CKB1"        
##  [45] "CLA4"         "CNE1"         "CNM67"        "COQ8"        
##  [49] "CRP1"         "CST6"         "CTF8"         "CTK3"        
##  [53] "CTR9"         "CYC3"         "DAK1"         "DAL82"       
##  [57] "DCI1"         "DEF1"         "DEG1"         "DFG16"       
##  [61] "DHH1"         "DON1"         "DRS2"         "DUG3"        
##  [65] "DUN1"         "ECM22"        "ECM31"        "EDE1"        
##  [69] "EIS1"         "ELO2"         "ELO3"         "ELP2"        
##  [73] "EMC6"         "END3"         "ENT3"         "ERG24"       
##  [77] "ERG3"         "ERJ5"         "ERP2"         "EST3"        
##  [81] "ETR1"         "EXO5"         "FMP52"        "FMT1"        
##  [85] "FPR1"         "FRA1"         "FUN14"        "FUN30"       
##  [89] "FYV10"        "FYV5"         "FYV6"         "GCN1"        
##  [93] "GCN4"         "GLE2"         "GLO3"         "GRX8"        
##  [97] "HAP3"         "HCR1"         "HFI1"         "HFM1"        
## [101] "HIR1"         "HMO1"         "HMT1"         "HOM3"        
## [105] "HSM3"         "HXT17"        "IDH2"         "IMD4"        
## [109] "INM2"         "IOC4"         "IRC18"        "IST3"        
## [113] "IZH1"         "JJJ3"         "KAP122"       "KTI11"       
## [117] "KTR1"         "KTR6"         "LAC1"         "LGE1"        
## [121] "LHS1"         "LOA1"         "LOC1"         "LSM1"        
## [125] "LSM6"         "LTV1"         "LYS20"        "MAP1"        
## [129] "MDJ1"         "MDL2"         "MDM10"        "MED1"        
## [133] "MET10"        "MET13"        "MET22"        "MET7"        
## [137] "MFM1"         "MGR2"         "MGR3"         "MGT1"        
## [141] "MIP6"         "MIX17"        "MMM1"         "MMS22"       
## [145] "MNN10"        "MNN2"         "MPO1"         "MRPL1"       
## [149] "MRPL20"       "MRPL4"        "MRPS5"        "MRT4"        
## [153] "MSA1"         "MSB4"         "MSD1"         "MSR1"        
## [157] "MSS1"         "MVB12"        "NCR1"         "NDE1"        
## [161] "NDI1"         "NGG1"         "NHX1"         "NPL3"        
## [165] "NSR1"         "NUC1"         "NUP60"        "OM14"        
## [169] "OPI8"         "OSW7"         "OTU2"         "PDH1"        
## [173] "PEP12"        "PEX12"        "PEX13"        "PEX21"       
## [177] "PEX28"        "PHO89"        "PIM1"         "PMP3"        
## [181] "PMR1"         "PMT1"         "POP2"         "POS5"        
## [185] "PPN1"         "PPT2"         "PPZ1"         "PRM2"        
## [189] "PRM9"         "PRO2"         "PRS3"         "PRY2"        
## [193] "PSR1"         "PUF4"         "PUF6"         "PXA2"        
## [197] "RAD6"         "RAD9"         "RAM1"         "RCN1"        
## [201] "RCO1"         "RCR1"         "RCR2"         "REI1"        
## [205] "REV7"         "RGC1"         "RHB1"         "RIM4"        
## [209] "ROM1"         "RPB4"         "RPL12B"       "RPL21A"      
## [213] "RPL22A"       "RPL27B"       "RPL2B"        "RPL36B"      
## [217] "RPL37A"       "RPL40A"       "RPL42B"       "RPL6A"       
## [221] "RPL7B"        "RPS10A"       "RPS11A"       "RPS17A"      
## [225] "RPS18A"       "RPS19B"       "RPS21A"       "RPS29B"      
## [229] "RPS7A"        "RPS9B"        "RRG8"         "RSA1"        
## [233] "RSM26"        "RTT109"       "SAP1"         "SAP4"        
## [237] "SAS2"         "SDH1"         "SEM1"         "SIC1"        
## [241] "SIN4"         "SIR4"         "SIT4"         "SKG1"        
## [245] "SLS1"         "SMF2"         "SMK1"         "SNF6"        
## [249] "SNG1"         "SNT309"       "SPC72"        "SPF1"        
## [253] "SPL2"         "SPO23"        "SPT10"        "SPT4"        
## [257] "SRB5"         "SSA2"         "SSE1"         "SSF1"        
## [261] "SSH1"         "SSQ1"         "SSZ1"         "STR2_paralog"
## [265] "SUS1"         "SUV3"         "SWC5"         "SWI5"        
## [269] "SWI6"         "SWR1"         "TAT2"         "TGS1"        
## [273] "THI2"         "TIF1"         "TKL2"         "TMA10"       
## [277] "TOD6"         "TOF2"         "TOM1"         "TOP1"        
## [281] "TRM9"         "TSA2"         "TSC3"         "TSR3"        
## [285] "TVP15"        "UBP15"        "UBP3"         "UBP5"        
## [289] "UFO1"         "UME6"         "UMP1"         "URC2"        
## [293] "URM1"         "UTP30"        "UTR4"         "VBA1"        
## [297] "VID27"        "VMA1"         "VMA10"        "VMA11"       
## [301] "VMA16"        "VMA2"         "VMA21"        "VMA5"        
## [305] "VMA7"         "VMA9"         "VPS15"        "VPS28"       
## [309] "VPS3"         "VPS52"        "XRS2"         "YAL066W"     
## [313] "YBL059W"      "YBL071C"      "YBR096W"      "YBR178W"     
## [317] "YBR225W"      "YCL001W-B"    "YCR101C"      "YCR102C"     
## [321] "YDJ1"         "YDL068W"      "YDL129W"      "YDL183C"     
## [325] "YDR199W"      "YDR222W"      "YDR401W"      "YDR433W"     
## [329] "YDR514C"      "YEL045C"      "YER010C"      "YER046W-A"   
## [333] "YFL012W"      "YFL013W-A"    "YFR018C"      "YGK3"        
## [337] "YGL072C"      "YGL108C"      "YGL188C-A"    "YGR015C"     
## [341] "YGR064W"      "YGR117C"      "YGR160W"      "YHC3"        
## [345] "YHK8"         "YHR050W-A"    "YIL025C"      "YJL077W-B"   
## [349] "YJR018W"      "YJR154W"      "YKL136W"      "YKR051W"     
## [353] "YLR269C"      "YML009W-B"    "YMR031W-A"    "YMR242W-A"   
## [357] "YNG2"         "YNL140C"      "YNL171C"      "YOL079W"     
## [361] "YOL118C"      "YOR072W"      "YOR139C"      "YOR199W"     
## [365] "YOR318C"      "YOR331C"      "YPL260W"      "YPR084W"     
## [369] "YPR123C"      "YRR1"         "ZRT2"
## [1] "oliv reject EM"
##   [1] "1-Oct"     "AAH1"      "AAT2"      "ACO2"      "ADE2"     
##   [6] "ADE8"      "ADH3"      "ADK1"      "ADO1"      "AEP1"     
##  [11] "AEP2"      "AFT1"      "AIM10"     "AIM22"     "AIM44"    
##  [16] "ALO1"      "ANP1"      "APL6"      "ARG1"      "ARG2"     
##  [21] "ARG4"      "ARG5,6"    "ARG7"      "ARL3"      "ARO4"     
##  [26] "ARO7"      "ARP8"      "ARV1"      "ARX1"      "ASC1"     
##  [31] "ASF1"      "ASK10"     "ATG11"     "ATG14"     "ATG20"    
##  [36] "ATP1"      "ATP17"     "ATP22"     "ATP3"      "ATP5"     
##  [41] "AVL9"      "AVT5"      "BAS1"      "BBC1"      "BCS1"     
##  [46] "BFR1"      "BNA1"      "BOR1"      "BRE1"      "BTS1"     
##  [51] "BUB3"      "BUD14"     "BUD16"     "BUD19"     "BUD20"    
##  [56] "BUD21"     "BUD27"     "CBC2"      "CBP2"      "CCE1"     
##  [61] "CCM1"      "CCR4"      "CCS1"      "CDA1"      "CDA2"     
##  [66] "CDC10"     "CHA1"      "CHD1"      "CHM7"      "CHO2"     
##  [71] "CIR2"      "CKB1"      "CKB2"      "CKI1"      "CLA4"     
##  [76] "CLC1"      "CNE1"      "CNM67"     "COA3"      "COG1"     
##  [81] "COQ9"      "COR1"      "COX10"     "COX11"     "COX18"    
##  [86] "COX19"     "COX7"      "CPA1_uorf" "CPA2"      "CRP1"     
##  [91] "CTF8"      "CTK2"      "CTK3"      "CTR1"      "CTR9"     
##  [96] "CUS2"      "CYT1"      "DAL82"     "DCI1"      "DDC1"     
## [101] "DFG16"     "DIA2"      "DIA4"      "DID4"      "DOA4"     
## [106] "DOC1"      "DRS2"      "DSS1"      "DST1"      "DUG3"     
## [111] "DYN3"      "ECM27"     "ECM31"     "ECM33"     "ECM38"    
## [116] "EFG1"      "EFT2"      "EIS1"      "ELM1"      "ELO2"     
## [121] "ELO3"      "ELP2"      "ELP3"      "EMC6"      "END3"     
## [126] "ENT3"      "ERJ5"      "ERP2"      "EST1"      "EST2"     
## [131] "EST3"      "ETR1"      "EXO1"      "FAB1"      "FAR11"    
## [136] "FAR7"      "FEN2"      "FMP48"     "FMP52"     "FMT1"     
## [141] "FRA1"      "FUN30"     "FYV5"      "FZO1"      "GAL80"    
## [146] "GAS1"      "GCN1"      "GCN5"      "GCV3"      "GEA2"     
## [151] "GEP3"      "GGC1"      "GLE2"      "GLO3"      "GLR1"     
## [156] "GLY1"      "GPM2"      "GRX8"      "GSF2"      "GTF1"     
## [161] "GTR2"      "HAC1"      "HBN1"      "HEF3"      "HER2"     
## [166] "HFM1"      "HIR1"      "HIS1"      "HIS2"      "HIS4"     
## [171] "HIS5"      "HIS7"      "HIT1"      "HMI1"      "HMO1"     
## [176] "HMT1"      "HOC1"      "HOM2"      "HOM3"      "HOM6"     
## [181] "HOP2"      "HPM1"      "HPR1"      "HRQ1"      "HSL1"     
## [186] "HSM3"      "HTD2"      "HUL4"      "HUR1"      "HXT17"    
## [191] "ICE2"      "ICS2"      "IES1"      "IKI1"      "IMP2'"    
## [196] "INM2"      "INO4"      "IOC4"      "IRC18"     "ISA2"     
## [201] "IST3"      "JIP4"      "JJJ3"      "JNM1"      "KCS1"     
## [206] "KGD1"      "KGD2"      "KIP1"      "KTI11"     "KTR1"     
## [211] "KTR6"      "LAT1"      "LCL2"      "LEA1"      "LHS1"     
## [216] "LIP2"      "LMO1"      "LPD1"      "LRG1"      "LSC2"     
## [221] "LSM1"      "LSM6"      "LSO1"      "LTV1"      "LYP1"     
## [226] "LYS12"     "LYS5"      "LYS9"      "MAF1"      "MAM33"    
## [231] "MCM21"     "MCT1"      "MEF1"      "MEF2"      "MET1"     
## [236] "MET18"     "MET31"     "MET8"      "MFA1"      "MFM1"     
## [241] "MGM101"    "MGR2"      "MGR3"      "MGT1"      "MID1"     
## [246] "MIP1"      "MIP6"      "MMM1"      "MMS22"     "MMS4"     
## [251] "MNN2"      "MRH4"      "MRP1"      "MRP20"     "MRPL1"    
## [256] "MRPL11"    "MRPL13"    "MRPL15"    "MRPL16"    "MRPL17"   
## [261] "MRPL22"    "MRPL27"    "MRPL28"    "MRPL35"    "MRPL37"   
## [266] "MRPL51"    "MRPL9"     "MRPS16"    "MRPS17"    "MRPS28"   
## [271] "MRPS35"    "MRPS9"     "MRX14"     "MSB4"      "MSF1"     
## [276] "MSH1"      "MSM1"      "MSR1"      "MSS1"      "MSS116"   
## [281] "MST1"      "MSW1"      "MTG2"      "MUM2"      "MVB12"    
## [286] "NAT3"      "NCS6"      "NDI1"      "NEW1"      "NGG1"     
## [291] "NOT3"      "NUC1"      "NUP133"    "NUP60"     "NUP84"    
## [296] "OM14"      "OPI3"      "OSW7"      "OXA1"      "PAN5"     
## [301] "PAP2"      "PBP4"      "PBY1"      "PDE2"      "PDH1"     
## [306] "PET111"    "PET123"    "PET130"    "PET309"    "PET54"    
## [311] "PEX12"     "PEX13"     "PEX19"     "PEX21"     "PEX28"    
## [316] "PFD1"      "PFF1"      "PHO89"     "PKR1"      "PLM2"     
## [321] "PMR1"      "PMS1"      "PMT1"      "POP2"      "POR1"     
## [326] "PPA2"      "PPT2"      "PRM9"      "PRP18"     "PRS3"     
## [331] "PSD1"      "PSR1"      "PXA2"      "QCR8"      "QDR2"     
## [336] "RAD9"      "RAI1"      "RCF2"      "RCN1"      "RCR1"     
## [341] "RCR2"      "REI1"      "REV7"      "REX2"      "RFU1"     
## [346] "RGC1"      "RHB1"      "RHO5"      "RIM1"      "RMD6"     
## [351] "RNR4"      "ROM1"      "ROM2"      "RPB4"      "RPB9"     
## [356] "RPL11B"    "RPL13A"    "RPL16A"    "RPL16B"    "RPL19A"   
## [361] "RPL21A"    "RPL22A"    "RPL24A"    "RPL27A"    "RPL2B"    
## [366] "RPL31A"    "RPL34B"    "RPL36B"    "RPL42B"    "RPL43A"   
## [371] "RPL7B"     "RPO41"     "RPP2B"     "RPS0A"     "RPS10A"   
## [376] "RPS11A"    "RPS17A"    "RPS21B"    "RPS23B"    "RPS24B"   
## [381] "RPS29B"    "RPS7A"     "RPS8A"     "RPS9B"     "RRT8"     
## [386] "RSA1"      "RSM19"     "RSM22"     "RSM26"     "RSM27"    
## [391] "RTF1"      "RTS1"      "RTT101"    "RTT109"    "SAC1"     
## [396] "SAM37"     "SAP1"      "SAP4"      "SCO1"      "SCS22"    
## [401] "SDH1"      "SDH4"      "SDS3"      "SEA4"      "SEO1"     
## [406] "SER2"      "SHS1"      "SIC1"      "SIR3"      "SLM5"     
## [411] "SLM6"      "SLT2"      "SLX9"      "SMF2"      "SMK1"     
## [416] "SNA3"      "SNF1"      "SNF3"      "SNF4"      "SNF7"     
## [421] "SNG1"      "SOH1"      "SPL2"      "SPO23"     "SPO77"    
## [426] "SPT3"      "SPT7"      "SRB2"      "SRB5"      "SRN2"     
## [431] "SRO7"      "SSA2"      "SSE1"      "SSH1"      "SST2"     
## [436] "STB5"      "STE11"     "STE5"      "SUS1"      "SVF1"     
## [441] "SWC5"      "SWF1"      "SWH1"      "SWI5"      "SYG1"     
## [446] "TAT2"      "TCO89"     "TEF4"      "TFB5"      "THI2"     
## [451] "THP3"      "THR1"      "THR4"      "TIF1"      "TKL2"     
## [456] "TOF2"      "TPN1"      "TRM9"      "TRP1"      "TRP2"     
## [461] "TRX3"      "TSA2"      "TSC3"      "TSR3"      "UBC4"     
## [466] "UBP3"      "UME6"      "URA1"      "URA7"      "URC2"     
## [471] "URM1"      "UTR1"      "VAM3"      "VBA1"      "VBA2"     
## [476] "VID22"     "VID27"     "VMA11"     "VMA2"      "VMA21"    
## [481] "VMA9"      "VPH2"      "VPS1"      "VPS24"     "VPS28"    
## [486] "VPS29"     "VPS41"     "VPS52"     "VPS53"     "VPS69"    
## [491] "VPS70"     "VPS73"     "WHI3"      "XRS2"      "YAK1"     
## [496] "YAL066W"   "YAP1"      "YBL065W"   "YBL071C"   "YBR096W"  
## [501] "YBR178W"   "YBR225W"   "YCL001W-B" "YCL002C"   "YCL007C"  
## [506] "YCL021W-A" "YCR101C"   "YCR102C"   "YDJ1"      "YDL041W"  
## [511] "YDL118W"   "YDL172C"   "YDL211C"   "YDR048C"   "YDR090C"  
## [516] "YDR114C"   "YDR222W"   "YDR442W"   "YDR455C"   "YDR514C"  
## [521] "YEF1"      "YEL045C"   "YER010C"   "YET3"      "YFL012W"  
## [526] "YFL013W-A" "YFR045W"   "YGK3"      "YGL041C"   "YGL042C"  
## [531] "YGL088W"   "YGL108C"   "YGL188C-A" "YGL218W"   "YGR015C"  
## [536] "YGR064W"   "YGR117C"   "YGR160W"   "YHK8"      "YHR050W-A"
## [541] "YIL025C"   "YJL022W"   "YJL043W"   "YJL055W"   "YJL077W-B"
## [546] "YJR011C"   "YJR107W"   "YJR154W"   "YKL033W-A" "YKL136W"  
## [551] "YKR051W"   "YLR202C"   "YLR297W"   "YLR358C"   "YLR407W"  
## [556] "YML012C-A" "YML079W"   "YML094C-A" "YMR294W-A" "YNL146W"  
## [561] "YNL203C"   "YNL226W"   "YNR068C"   "YOL118C"   "YOR139C"  
## [566] "YOR199W"   "YOR283W"   "YOR376W"   "YPL062W"   "YPL260W"  
## [571] "YPR014C"   "YPR084W"   "YPT32"     "YTP1"      "ZRT2"     
## [576] "ZUO1"
## [1] "caudy reject EM"
## [1] "hom reject std"
## character(0)
## [1] "oliv reject std"
## character(0)
## [1] "caudy reject std"
## character(0)

mix prefix = presence determined by fitting two gaussians

stand prefix = presence determined by standard protocol of tag presences

##    
##     mixcau mixhom mixoliv
##   0    814    329     486
##   1   3970   4455    4298
##    
##     standcau standhom standoliv
##   0      343      249       220
##   1     4546     4640      4669
##    
##     mixcau mixhom mixoliv
##   0   0.17   0.07    0.10
##   1   0.83   0.93    0.90
##    
##     standcau standhom standoliv
##   0     0.07     0.05      0.04
##   1     0.93     0.95      0.96
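The proportion tables are just the count tables divided by their column totals. A quick sketch that reproduces them from the counts printed above (the dictionary layout is mine, the numbers are from the output):

```python
# (absent, present) counts per column, copied from the tables above.
counts = {
    "mixcau": (814, 3970),
    "mixhom": (329, 4455),
    "mixoliv": (486, 4298),
    "standcau": (343, 4546),
    "standhom": (249, 4640),
    "standoliv": (220, 4669),
}

proportions = {}
for name, (absent, present) in counts.items():
    total = absent + present
    proportions[name] = (round(absent / total, 2), round(present / total, 2))

for name, (p_absent, p_present) in proportions.items():
    print(f"{name:10s} absent={p_absent:.2f} present={p_present:.2f}")
```

Note that the mix columns total 4784 tags and the stand columns total 4889, so the two sets of proportions are not computed over the same denominator.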

plot median vs mad of tags rejected ONLY by mixEM and not by our standard bkrnd thresholds:
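For reference, a sketch of the two per-tag summaries being plotted. The default scaling constant 1.4826 mirrors the default of R's `mad()`; the function name and the example intensities are made up.

```python
from statistics import median


def median_and_mad(values, constant=1.4826):
    """Median and scaled median absolute deviation of one tag across experiments."""
    center = median(values)
    mad = constant * median(abs(v - center) for v in values)
    return center, mad


# Hypothetical intensities for one tag; the outlier barely moves either summary.
center, mad = median_and_mad([1.0, 2.0, 3.0, 4.0, 100.0])
print(center, mad)
```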

EM looks better than standard in general – it did remove some ADE and other genes that we want/need, most notably for oliver, but other than that it looks convincing. we should talk about what our exact criteria should be for calling strain presence