SC using 0.5 posterior for all 3 ctrls for each pool

better than my last cutoff, but seems like there should be a better way than my adhoc for just the ctrls…

mean?

hi pat! good news! it looks very promising and i think we will use it for the manuscript. it is working better than the usual arbitrary cutoffs as well as our own.

here’s our goal

Purpose

Develop a procedure for identifying poorly performing tags; specific goal being to distinguish gene roster of strain presence in three YKO pools:

  1. auxotrophic YKO (hom)

  2. prototrophic collection made by SGA/back-crossing (caudy)

  3. prototrophic collection made by CEN plasmid complementation of the auxotrophies

the big benefit of using EM is that we can call presence/absence with pvalues for different collections

this is uber important as this is a major point of this manuscript

got rid of a lot of crappy tags i was whining about

took me some time to understand another stupid package

the good news is is that you don’t really need the package (per usual), at least in our scenario because you can get pretty much the same results as just doing some simple stuff using dnorm and pnorm – everything converges really quick, and i had a hard time reading the EM package, so that was satisgfying (for me)

erica this sight might me as useful for you as it was for me for walking through the algorithm

http://rstudio-pubs-static.s3.amazonaws.com/1001_3177e85f5e4840be840c84452780db52.html

pat, my main issue as always is where to draw the cutoff?

for now i just enforced the rule that all 3 control experiments had to have a posterior > 0.5

i am sure there are more robust ways of calling/refining it. maybe even on a per tag basis rather than a per experiment basis? (eventually). also wasn’t sure if the batch correction was kosher…and not sure how much it mattered

we did lose a few precious genes (i hope not too many) and probably need to tweak it, but overall it worked well

steps

get SC data for all pools, xsc is the data matrix, psc describes experiments

assign tag type to each, plot ctrl experiments to compare distributions

removed unassigned in this run and used essentials instead

batch correct and normalize first – not sure if this is necessary because it doesn’t really change much –

plot before and after batch correction

  1. density

  2. dendograms

## 'dendrogram' with 2 branches and 30 members total, at height 127.2112

## 'dendrogram' with 2 branches and 30 members total, at height 77.30638

## 'dendrogram' with 2 branches and 30 members total, at height 127.2112

## 'dendrogram' with 2 branches and 30 members total, at height 77.30638

compute batch corrected tag presence using our standard approach for later comparison

reformat long-2-wide

estimate of the unassigned means and sd – pretty equal for all pools

##     pool    value
## 1  caudy 5.440389
## 2    hom 5.286499
## 3 oliver 5.409946
##     pool     value
## 1  caudy 0.7095629
## 2    hom 0.4205895
## 3 oliver 0.7331928

Plot histogram of assigned and unassigned for all experiments. This will generate a tall lattice of plots, so only save to pdf, but do not show in notebook.

It is pretty clear than around June 2 there is a leftward shift in the assigned tags and then a shift back to the right around July 15.

ggnote – i batch corrected this round –

I think we can fit a single gaussian to the unassigned tag distribution. Then fit a mixture of two Gaussians with one of the two fixed at the unassigned tag distribution and the other unconstrained for the assigned tags. Then we can use the posterior distribution over the cluster assignment variables to give an odds-ratio of the tag assignement. Then we can draw a threshold on the odds-ratio.

ggnote: drew a threshold based on 0.05

Fit a 2 component Gaussian mixture model to each experiment’s NONESS assigned tags. Store the posterior probability of the tag assignment to the “good” or “not unassigned” distribution" for each experiment.

Noting the difference in number of iterations needed to converge each time this is run – us this something to worry about?

caudy = 174 hom = 83 oliver = 308

reformat dataframe wide to long for plotting

grab the fits from the models based on unassigned tags instead: erica we can justify using the lower bounds (mean - 3*sd) from the ctrls for our bg threshold cutoff values – they are higher than before

hom = 7.05 caudy = 7.46 oliver = 7.47 *also captured a bad caudy array = caudy_lys_15Jul15.1 that i forgot about

note lower variance and lower means for the hom pool in general

## [1] stats for unass/assigned hom,caudy and oliver respectively =
##   bkg_mean   ass_mean     bkg_sd     ass_sd bkg_lambda ass_lambda 
##  6.1684288  9.6465535  0.9093650  0.8695046  0.2422054  0.7577946
##   bkg_mean   ass_mean     bkg_sd     ass_sd bkg_lambda ass_lambda 
##  6.5739015  9.8560063  1.0589987  0.7629437  0.3936433  0.6063567
##   bkg_mean   ass_mean     bkg_sd     ass_sd bkg_lambda ass_lambda 
##  6.4708480  9.8328984  1.0714609  0.7760142  0.3350390  0.6649610
## [1] hom,caudy and oliver respectively:  +/- 3 stdev from mean assigned tags =
## ass_mean ass_mean 
##  7.03804 12.25507
##  ass_mean  ass_mean 
##  7.567175 12.144837
##  ass_mean  ass_mean 
##  7.504856 12.160941

check out differences in fitted gaussian parameters betweenn pools

plot the assigned tag distribution with the posterior probability of it being a “good” tag. (only ctrl exps shown)

from here split out the pools and draw the Venn for tags w/ pvalue < 0.05

from barplots its pretty clear that the significance call is all or none there are lots of ways to call presence or absence given we have a pval across many experiments, but for now enforcing that pvalue must be > 0.05 for all 3 control chips for each pool

need to check consistency between replicates of absence/presence call

## [1] min value of ctrl medians =  6.99685579592234            
## [1] 4452   30
## [1] min value of ctrl medians =  7.51122346724224            
## [1] 3961   34
## [1] min value of ctrl medians =  7.14956099241794            
## [1] 4383   39
## [1] 3961   34
## [1] 3949   34
## [1] 4452   30
## [1] 4438   30
## [1] 4383   39
## [1] 4297   39

i hate venns but here goes: make a list of strains in each pool tags absent in all strains were removed top venn is up tag, bottom is down tag – they agree really well which is good

compare this to our standard call of gene presence:

top venn is genes by mixEM, bottom is by standard

## Using gene as id variables
## Using gene as id variables

plot ~proportional venns of missing strains

## Loading required package: rJava

plot ~proportional venns of missing strains after removing those common to all

mix prefix = presence determined by fitting two gaussians

stand prefix = presence determined by standard protocol of tag presences

##    
##     mixcau mixhom mixoliv
##   0    840    351     492
##   1   3949   4438    4297
##    
##     standcau standhom standoliv
##   0      436      304       239
##   1     4447     4579      4644
##    
##     mixcau mixhom mixoliv
##   0   0.17   0.07    0.10
##   1   0.79   0.89    0.86
##    
##     standcau standhom standoliv
##   0     0.09     0.06      0.05
##   1     0.91     0.94      0.95

check fitness profiles; mixEM is on the right, standard on the left

mixEM seems to works well – did it get rid of too much?

compare EM to standard for fitness profiles

looks like mixEM is working pretty good, check out the crap genes we get rid of:

other than a few ADE genes which we know are slow, it looks pretty good: order is hom genes, oliver genes and caudy genes

##   [1] "ACB1"      "ADE3"      "ADE8"      "ADK1"      "AIM26"    
##   [6] "ALD5"      "AMA1"      "ARO1"      "ARP5"      "ARV1"     
##  [11] "ASP1"      "ATP12"     "ATP15"     "BEM2"      "BST1"     
##  [16] "BTS1"      "BUD32"     "CAX4"      "CCR4"      "CHK1"     
##  [21] "CHS1"      "CKB1"      "COA3"      "COR1"      "COX20"    
##  [26] "CPS1"      "CTR1"      "CYB2"      "DAL82"     "DEG1"     
##  [31] "DIA4"      "DID4"      "EIS1"      "ERG6"      "EXO5"     
##  [36] "FMP52"     "FUR4"      "FYV10"     "FYV6"      "HAP2"     
##  [41] "HBN1"      "HFI1"      "HPC2"      "HSL1"      "HSL7"     
##  [46] "IDH1"      "IES1"      "INM2"      "IPK1"      "IRC15"    
##  [51] "KAP122"    "KAR3"      "KRE28"     "KRE6"      "MDS3"     
##  [56] "MET13"     "MHO1"      "MNN10"     "MON2"      "MOT2"     
##  [61] "MSS1"      "MSS18"     "NGG1"      "NOT5"      "NUP84"    
##  [66] "OPI3"      "PET112"    "PHO85"     "PIN2"      "PKH2"     
##  [71] "PPZ1"      "PRO2"      "PRS1"      "PSR1"      "RBK1"     
##  [76] "RLM1"      "RPB4"      "RPL1B"     "RPS16B"    "RPS23B"   
##  [81] "RRG8"      "RSC2"      "RSM26"     "RTF1"      "RTG3"     
##  [86] "SAF1"      "SAM3"      "SFP1"      "SGO1"      "SIR3"     
##  [91] "SNF12"     "SNF2"      "SOM1"      "SPC1"      "SPT7"     
##  [96] "SRC1"      "SSA2"      "SSB2"      "SUV3"      "SYM1"     
## [101] "TAF14"     "THO2"      "TIM8"      "TKL2"      "UBP15"    
## [106] "UIP4"      "VAM6"      "VBA5"      "VMA1"      "VMA16"    
## [111] "VMA21"     "VMA22"     "VMA6"      "VMA7"      "VPH2"     
## [116] "VPS15"     "VPS29"     "VPS52"     "YBR096W"   "YBR221W-A"
## [121] "YDR524C-B" "YER076C"   "YGL088W"   "YGL218W"   "YGR035W-A"
## [126] "YGR219W"   "YHR086W-A" "YKL018C-A" "YKL118W"   "YKL177W"  
## [131] "YLR361C-A" "YML007C-A" "YML009W-B" "YML079W"   "YMR141C"  
## [136] "YMR230W-A" "YMR294W-A" "YNL184C"   "YNR073C"   "YOL118C"  
## [141] "YOR331C"
##   [1] "AAH1"      "ADE1"      "ADE3"      "ADE4"      "ADE5,7"   
##   [6] "ADE6"      "ADK1"      "AEP2"      "AGP2"      "AIM44"    
##  [11] "AMA1"      "ANP1"      "APL6"      "APQ12"     "ARF1"     
##  [16] "ARG80"     "ARL3"      "ARO4"      "ARP8"      "ARV1"     
##  [21] "ARX1"      "ASC1"      "ATG4"      "ATP15"     "ATP25"    
##  [26] "AVT5"      "BAR1"      "BEM4"      "BNA7"      "BOR1"     
##  [31] "BTS1"      "BUD20"     "BUD23"     "CAX4"      "CBS2"     
##  [36] "CHA1"      "CHC1"      "CKA1"      "CKB1"      "CLA4"     
##  [41] "CNM67"     "COQ8"      "CPR1"      "CRP1"      "CSE2"     
##  [46] "CTF4"      "CTF8"      "CTK3"      "CTR9"      "DAL81"    
##  [51] "DAL82"     "DBP3"      "DCI1"      "DEF1"      "DEG1"     
##  [56] "DFG16"     "DON1"      "DRS2"      "DUG3"      "DUN1"     
##  [61] "ECM31"     "EDE1"      "EFT2"      "EIS1"      "ELM1"     
##  [66] "ELO2"      "ELO3"      "ELP2"      "ELP3"      "EMC6"     
##  [71] "EMP24"     "END3"      "ENT3"      "ERG24"     "ERG3"     
##  [76] "ERJ5"      "ERP2"      "EST3"      "ETR1"      "EXO5"     
##  [81] "FMP52"     "FMT1"      "FPR1"      "FRA1"      "FUN30"    
##  [86] "FYV10"     "FYV6"      "GCN1"      "GCN4"      "GLE2"     
##  [91] "GLO3"      "GRX8"      "HAP3"      "HCR1"      "HFI1"     
##  [96] "HHY1"      "HIR1"      "HMO1"      "HOC1"      "HOM3"     
## [101] "HSM3"      "HXT17"     "IDH2"      "IMD4"      "INM2"     
## [106] "IOC4"      "IRC18"     "IST3"      "IZH1"      "JJJ3"     
## [111] "KAP122"    "KRE6"      "KTI11"     "KTR1"      "KTR6"     
## [116] "LAC1"      "LGE1"      "LHS1"      "LOA1"      "LOC1"     
## [121] "LSM1"      "LSM6"      "LTV1"      "LYS20"     "MAP1"     
## [126] "MDJ1"      "MDL2"      "MDM10"     "MET10"     "MET22"    
## [131] "MET7"      "MFM1"      "MGR2"      "MGR3"      "MGT1"     
## [136] "MIP6"      "MIX17"     "MNN10"     "MNN2"      "MPO1"     
## [141] "MRPL20"    "MRPL28"    "MRPL4"     "MRPS5"     "MRT4"     
## [146] "MRX11"     "MSA1"      "MSB4"      "MSR1"      "MSS1"     
## [151] "MTC1"      "MVB12"     "NCR1"      "NDE1"      "NDI1"     
## [156] "NGG1"      "NHX1"      "NKP2"      "NQM1"      "NSR1"     
## [161] "NUC1"      "NUP60"     "OM14"      "OPI8"      "OSW7"     
## [166] "OTU2"      "PDH1"      "PEP12"     "PEX12"     "PHO89"    
## [171] "PIM1"      "PMP3"      "PMR1"      "PMT1"      "POP2"     
## [176] "PPH22"     "PPN1"      "PPT2"      "PPZ1"      "PRM2"     
## [181] "PRM9"      "PRO2"      "PRS3"      "PRS5"      "PRY2"     
## [186] "PSR1"      "PUF4"      "PXA2"      "RAD6"      "RAD9"     
## [191] "RAM1"      "RCR1"      "RCR2"      "REF2"      "REI1"     
## [196] "RGC1"      "RHB1"      "RHO4"      "RIM4"      "RPL21A"   
## [201] "RPL22A"    "RPL2B"     "RPL36B"    "RPL37A"    "RPL40A"   
## [206] "RPL42B"    "RPL7B"     "RPS0A"     "RPS10A"    "RPS11A"   
## [211] "RPS11B"    "RPS17A"    "RPS19B"    "RPS21A"    "RPS27B"   
## [216] "RPS29B"    "RPS7A"     "RPS9B"     "RRG8"      "RSA1"     
## [221] "RSM26"     "RTF1"      "RTT109"    "SAP1"      "SAP4"     
## [226] "SAS2"      "SDH1"      "SEM1"      "SEO1"      "SIN4"     
## [231] "SIR4"      "SIT4"      "SKG1"      "SLS1"      "SMF2"     
## [236] "SMK1"      "SNF4"      "SNG1"      "SPC72"     "SPF1"     
## [241] "SPL2"      "SPO23"     "SRB5"      "SSE1"      "SSF1"     
## [246] "SSH1"      "SUS1"      "SUV3"      "SWC5"      "SWI5"     
## [251] "SWI6"      "SWR1"      "TAT2"      "TGS1"      "THI2"     
## [256] "TIF1"      "TKL2"      "TOD6"      "TOF2"      "TOM1"     
## [261] "TOP1"      "TRM9"      "TSA2"      "TSC3"      "TSR3"     
## [266] "UBP15"     "UBP3"      "UBP5"      "UFO1"      "UGO1"     
## [271] "UME6"      "UMP1"      "URC2"      "URM1"      "UTP30"    
## [276] "UTR4"      "VBA1"      "VID27"     "VMA1"      "VMA10"    
## [281] "VMA11"     "VMA16"     "VMA2"      "VMA21"     "VMA5"     
## [286] "VMA6"      "VMA7"      "VMA9"      "VPS15"     "VPS28"    
## [291] "VPS3"      "VPS52"     "VRP1"      "XRS2"      "YAL066W"  
## [296] "YAP1"      "YBL059W"   "YBL071C"   "YBR096W"   "YBR178W"  
## [301] "YBR225W"   "YCR102C"   "YDJ1"      "YDL183C"   "YDR222W"  
## [306] "YDR401W"   "YDR433W"   "YDR514C"   "YER046W-A" "YFL012W"  
## [311] "YFL013W-A" "YFR018C"   "YGK3"      "YGL108C"   "YGL188C-A"
## [316] "YGR015C"   "YGR064W"   "YGR107W"   "YGR117C"   "YGR160W"  
## [321] "YHC3"      "YHK8"      "YIL025C"   "YJL077W-B" "YJR018W"  
## [326] "YKE2"      "YKL136W"   "YKR051W"   "YLR269C"   "YMR031W-A"
## [331] "YMR141C"   "YMR242W-A" "YNL140C"   "YNL171C"   "YOL079W"  
## [336] "YOL118C"   "YOR072W"   "YOR139C"   "YOR186W"   "YOR318C"  
## [341] "YOR331C"   "YOR366W"   "YPR084W"   "YPR123C"   "YRM1"     
## [346] "YRR1"      "ZRT2"
##   [1] "1-Oct"     "AAH1"      "ACO2"      "ADE4"      "ADE8"     
##   [6] "ADK1"      "ADO1"      "AEP2"      "AFT1"      "AIM10"    
##  [11] "AIM22"     "AIM44"     "ANP1"      "APL6"      "APQ12"    
##  [16] "ARC1"      "ARF1"      "ARG2"      "ARG5,6"    "ARG7"     
##  [21] "ARL1"      "ARL3"      "ARO4"      "ARP8"      "ARV1"     
##  [26] "ASF1"      "ASK10"     "ATG11"     "ATG14"     "ATG23"    
##  [31] "ATP1"      "ATP17"     "ATP3"      "ATP7"      "AVL9"     
##  [36] "AVT5"      "BBC1"      "BCS1"      "BEM2"      "BFR1"     
##  [41] "BNA2"      "BNA5"      "BOR1"      "BTS1"      "BUB3"     
##  [46] "BUD14"     "BUD16"     "BUD19"     "BUD20"     "BUD21"    
##  [51] "BUD28"     "BUD30"     "BUD32"     "CBP2"      "CCE1"     
##  [56] "CCR4"      "CCS1"      "CDA2"      "CHA1"      "CHC1"     
##  [61] "CHO2"      "CIR2"      "CKB1"      "CKB2"      "CKI1"     
##  [66] "CLA4"      "CNE1"      "CNM67"     "COG1"      "COQ9"     
##  [71] "COR1"      "CPA1"      "CPA1_uorf" "CPA2"      "CRP1"     
##  [76] "CTF8"      "CTK2"      "CTK3"      "CTR1"      "CTR9"     
##  [81] "CUS2"      "CYB2"      "CYT1"      "DAL82"     "DCI1"     
##  [86] "DDC1"      "DIA2"      "DIA4"      "DIG1"      "DOA4"     
##  [91] "DOC1"      "DRS2"      "DSS1"      "DUS1"      "EAF3"     
##  [96] "ECM22"     "ECM27"     "ECM31"     "ECM33"     "EFG1"     
## [101] "EIS1"      "ELM1"      "ELO2"      "ELO3"      "ELP2"     
## [106] "ELP3"      "EMC6"      "ENT3"      "EOS1"      "ERJ5"     
## [111] "ERP2"      "EST1"      "EST2"      "ETR1"      "FAB1"     
## [116] "FAR11"     "FAR7"      "FEN2"      "FMP48"     "FMP52"    
## [121] "FMT1"      "FRA1"      "FUN14"     "FUN30"     "FYV5"     
## [126] "FYV6"      "FYV7"      "FZO1"      "GAS1"      "GCN1"     
## [131] "GCN5"      "GCV3"      "GEP3"      "GLE2"      "GLO3"     
## [136] "GLR1"      "GLY1"      "GPM2"      "GRX8"      "GSF2"     
## [141] "GTF1"      "HBN1"      "HEF3"      "HER2"      "HIS2"     
## [146] "HIS5"      "HIS7"      "HIT1"      "HMO1"      "HMT1"     
## [151] "HOC1"      "HOM2"      "HOM6"      "HOP2"      "HPM1"     
## [156] "HRQ1"      "HSL1"      "HSL7"      "HSM3"      "HTD2"     
## [161] "HUL4"      "HUR1"      "ICE2"      "IES1"      "IKI3"     
## [166] "IMD4"      "INM2"      "INO4"      "IOC4"      "IRC18"    
## [171] "ISA2"      "JJJ3"      "JNM1"      "KCS1"      "KEX2"     
## [176] "KGD2"      "KRE1"      "KTI11"     "KTR1"      "KTR6"     
## [181] "LAC1"      "LAS21"     "LAT1"      "LCL2"      "LDB16"    
## [186] "LHS1"      "LIP2"      "LMO1"      "LPD1"      "LRG1"     
## [191] "LSC2"      "LSM1"      "LSM6"      "LTV1"      "LYP1"     
## [196] "LYS5"      "LYS9"      "MAF1"      "MCM16"     "MCT1"     
## [201] "MEF2"      "MET18"     "MET3"      "MET31"     "MET8"     
## [206] "MFA1"      "MFM1"      "MGR2"      "MGR3"      "MGT1"     
## [211] "MIP1"      "MIP6"      "MMM1"      "MMS22"     "MNN2"     
## [216] "MPC54"     "MRP1"      "MRPL1"     "MRPL11"    "MRPL13"   
## [221] "MRPL15"    "MRPL27"    "MRPL37"    "MRPL51"    "MRPL9"    
## [226] "MRPS9"     "MRX14"     "MSB4"      "MSM1"      "MSS1"     
## [231] "MST1"      "MTG2"      "MUM2"      "MVB12"     "NCS6"     
## [236] "NDI1"      "NEW1"      "NGG1"      "NPL6"      "NUC1"     
## [241] "NUP133"    "NUP170"    "NUP60"     "NUP84"     "OM14"     
## [246] "OPI11"     "OPI3"      "OSW7"      "PAC10"     "PBP4"     
## [251] "PDE2"      "PDH1"      "PET309"    "PET54"     "PEX12"    
## [256] "PEX13"     "PEX21"     "PEX28"     "PFF1"      "PHO89"    
## [261] "PKR1"      "PLM2"      "PMS1"      "PMT1"      "POP2"     
## [266] "POR1"      "PPN1"      "PPT2"      "PRM9"      "PRP18"    
## [271] "PRY2"      "PSD1"      "PSR1"      "PUS6"      "PXA2"     
## [276] "RAD24"     "RAD4"      "RAD9"      "RAI1"      "RCF2"     
## [281] "RCN1"      "RCR1"      "RCR2"      "REI1"      "REV7"     
## [286] "RFU1"      "RGC1"      "RHO5"      "RMD6"      "RNR3"     
## [291] "RNR4"      "ROM1"      "ROM2"      "RPB4"      "RPB9"     
## [296] "RPL13A"    "RPL16B"    "RPL19A"    "RPL21A"    "RPL22A"   
## [301] "RPL24A"    "RPL27A"    "RPL2B"     "RPL31A"    "RPL36B"   
## [306] "RPL40A"    "RPL42B"    "RPL43A"    "RPL7B"     "RPO41"    
## [311] "RPS0A"     "RPS17A"    "RPS18B"    "RPS23A"    "RPS23B"   
## [316] "RPS24A"    "RPS24B"    "RPS29B"    "RPS7A"     "RPS8A"    
## [321] "RPS9B"     "RRG9"      "RRP40"     "RSA1"      "RSC2"     
## [326] "RSM19"     "RSM22"     "RSM24"     "RSM26"     "RSM27"    
## [331] "RTF1"      "RTG2"      "RTS1"      "RTT109"    "SAC1"     
## [336] "SAC3"      "SAF1"      "SAM37"     "SAP1"      "SAP4"     
## [341] "SBA1"      "SDH1"      "SDH4"      "SEO1"      "SER2"     
## [346] "SHS1"      "SIC1"      "SIN3"      "SIN4"      "SLM5"     
## [351] "SLM6"      "SLT2"      "SMF2"      "SNF1"      "SNF2"     
## [356] "SNF3"      "SNF4"      "SNF7"      "SNG1"      "SOD2"     
## [361] "SOH1"      "SPF1"      "SPL2"      "SPO23"     "SPO77"    
## [366] "SPT10"     "SPT2"      "SPT3"      "SRB2"      "SRN2"     
## [371] "SSA2"      "SSD1"      "SSE1"      "SST2"      "STB5"     
## [376] "STE5"      "SUS1"      "SVF1"      "SWF1"      "SWH1"     
## [381] "SWI5"      "SYG1"      "TAT2"      "TCO89"     "THR1"     
## [386] "THR4"      "TIF1"      "TKL2"      "TOF2"      "TPN1"     
## [391] "TRM9"      "TRP1"      "TRX3"      "TSA2"      "TSC3"     
## [396] "TSR3"      "UAF30"     "UBC4"      "UBP3"      "UBP5"     
## [401] "UME6"      "URC2"      "URM1"      "UTP30"     "UTR1"     
## [406] "VAM3"      "VBA1"      "VBA2"      "VBA5"      "VID22"    
## [411] "VID27"     "VMA11"     "VMA21"     "VMA22"     "VMS1"     
## [416] "VPH2"      "VPS1"      "VPS28"     "VPS29"     "VPS52"    
## [421] "VPS53"     "VPS69"     "VPS70"     "VPS71"     "VPS73"    
## [426] "XRS2"      "YAK1"      "YAL056C-A" "YAL066W"   "YAP1"     
## [431] "YBL065W"   "YBL071C"   "YBR096W"   "YBR178W"   "YCL001W-B"
## [436] "YCL002C"   "YCL007C"   "YCL021W-A" "YCR101C"   "YCR102C"  
## [441] "YDJ1"      "YDL041W"   "YDL172C"   "YDL211C"   "YDR048C"  
## [446] "YDR114C"   "YDR222W"   "YDR442W"   "YDR514C"   "YEL045C"  
## [451] "YER010C"   "YET3"      "YFL012W"   "YFL013W-A" "YFR045W"  
## [456] "YGK3"      "YGL041C"   "YGL042C"   "YGL072C"   "YGL088W"  
## [461] "YGL108C"   "YGL188C-A" "YGL218W"   "YGR015C"   "YGR064W"  
## [466] "YGR117C"   "YGR160W"   "YHK8"      "YHR050W-A" "YIL025C"  
## [471] "YIL029C"   "YIL163C"   "YJL055W"   "YJL077W-B" "YJR107W"  
## [476] "YKL136W"   "YKR051W"   "YLR297W"   "YLR358C"   "YML012C-A"
## [481] "YML079W"   "YML094C-A" "YMR194C-A" "YNG2"      "YNL120C"  
## [486] "YNL140C"   "YNL203C"   "YNR068C"   "YOL118C"   "YOR199W"  
## [491] "YOR283W"   "YOR376W"   "YPL260W"   "YPR014C"   "YPR084W"  
## [496] "YPT32"     "ZRT2"      "ZUO1"

plot median vs mad of tags rejected ONLY by mixEM and not by our standard bkrnd thresholds:

EM looks better than standard in general – it did remove some ADE and other genes that we want/need, most notably for oliver, but other than that it looks convincing. we should talk about what our exact criteria should be for calling strain presence