Main database binning documentation

Kevin Keenan, 2014

Lough Neagh brown trout project.

Introduction

This notebook outlines the binning process used for the microsatellite loci employed in my Ph.D. thesis. The data set comprises over 6000 individuals, and binning manually in GeneMapper v4.0 became increasingly complex as the sample size grew. More samples produced greater variance in fragment size, making it difficult to define the start and end of allele bins by hand.

To overcome this limitation, I employed a customised version of Alberto’s (2009) R package, MsatAllele, to visualise the cumulative distribution of fragments and define binning parameters as objectively as possible. The custom version of MsatAllele can be found at https://github.com/kkeenan02/MsatAllele. This package contains a number of new functions, mainly designed to speed up computations, in part by using C++ code integrated through the Rcpp package. Specifically, the database reader function in the custom version of MsatAllele is around 500 times faster than the original.

Another benefit of the custom MsatAllele package is a binning routine that allows complex binning criteria to be defined, resulting in a more flexible and accurate binning process. This is especially important when binning allele fragments for loci with complex repeat patterns: in principle, users can specify different bin limits for any region within a given locus’ size range.

This script documents the binning process for the LNBT project.

Load the necessary packages

library("MsatAllele")
library("ggplot2")
library("dplyr")  # provides %>%, filter() and mutate() used below

Read the baseline database

DB_orig <- fastReadFrag("Main_DB.txt", as.character(Sys.Date()), "all")
saveRDS(DB_orig, "Main_DB.rds")
DB_orig <- readRDS("Main_DB.rds")
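
The read, save and reload steps above can be wrapped so that the text file is parsed only when no cached copy exists. The sketch below is a minimal illustration, not part of MsatAllele: `readCached` and its `reader` argument are hypothetical names, and the toy file stands in for the real database.

```r
# Hypothetical helper: parse the text database only when no cached copy exists.
# `reader` is any function that takes a file path and returns an R object.
readCached <- function(txt_path, rds_path, reader) {
  if (file.exists(rds_path)) {
    return(readRDS(rds_path))
  }
  db <- reader(txt_path)
  saveRDS(db, rds_path)
  db
}

# Toy demonstration with a temporary tab-delimited file.
txt <- tempfile(fileext = ".txt")
rds <- tempfile(fileext = ".rds")
write.table(data.frame(Marker = "Ssa85", Fragment = c(110.38, 112.34)),
            file = txt, sep = "\t", row.names = FALSE, quote = FALSE)
reader <- function(path) read.delim(path, header = TRUE)
first  <- readCached(txt, rds, reader)  # parses the text file and writes the cache
second <- readCached(txt, rds, reader)  # served from the .rds cache
```

In this notebook the `reader` would presumably be a small wrapper around `fastReadFrag()`. Note that the cache goes stale if the text file is edited, so the `.rds` file would need to be deleted after any manual correction.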

Ssa85

Calculate bin statistics for Ssa85

dat <- BinStats(DB_orig, "Ssa85")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
100	1	100.1	100.1	0	NA	100.1	100.1
107	78	106	106.73	0.73	0.176	106.51	106.57
110	2096	109.98	110.67	0.69	0.14	110.38	110.39
112	2108	111.93	112.6	0.67	0.141	112.34	112.36
114	3471	113.81	114.57	0.76	0.126	114.28	114.28
116	2306	115.74	116.45	0.71	0.131	116.18	116.2
118	51	117.71	118.32	0.61	0.143	118.1	118.1
120	2	119.72	119.91	0.19	0.134	119.81	119.81
133	1	133.18	133.18	0	NA	133.18	133.18
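
For readers who want to reproduce the columns of this table without the package, the following base R sketch computes the same summary statistics per bin. It is only an illustration under assumed inputs: the toy data, the `Bin` labels and the `binSummary` helper are hypothetical and do not reflect how `BinStats()` is implemented.

```r
# Toy fragment sizes (bp) with a hypothetical bin label for each fragment.
frags <- data.frame(
  Bin      = c("110", "110", "110", "112", "112", "133"),
  Fragment = c(110.30, 110.40, 110.50, 112.30, 112.40, 133.18)
)

# Summarise one bin; sd() returns NA for a single observation,
# which is why singleton bins show NA in the Sd column.
binSummary <- function(x) {
  c(N = length(x), Min = min(x), Max = max(x), Range = max(x) - min(x),
    Sd = sd(x), Mean = mean(x), Median = median(x))
}

# One row per bin, one column per statistic.
stats <- t(sapply(split(frags$Fragment, frags$Bin), binSummary))
print(round(stats, 3))
```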

Generate cumulative plot for Ssa85

res <- allCum(DB_orig, "Ssa85", limit = 0.8)
print(res$plt)

All bins look good. Alleles will be generated without further checks! Only low frequency alleles will be checked. They are noted below.
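
The `limit` argument is not described in this notebook. Assuming it is the largest gap (in bp) tolerated between neighbouring fragments of the same bin, the idea behind the cumulative plot can be sketched in base R as below. `gapBins` is a hypothetical helper and the fragment sizes are toy values; this is not the algorithm used by `allCum()`.

```r
# Assumption: a new bin starts wherever the gap between neighbouring
# sorted fragments exceeds `limit` (in bp).
gapBins <- function(fragments, limit) {
  x <- sort(fragments)
  bin_id <- cumsum(c(TRUE, diff(x) > limit))
  data.frame(Fragment = x, Bin = bin_id)
}

frags <- c(110.39, 110.21, 112.36, 112.30, 114.28, 110.45, 114.10)
binned <- gapBins(frags, limit = 0.8)
print(binned)
```

Plotting `Fragment` against its rank would show the flat steps separated by jumps that the cumulative plots rely on.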

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa85", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
LND14139	10068	100.1	all
SXM13111	9147	119.7	all
RBW102804	7028	119.9	all
RBW040504	222	133.2	all

LND14139 = Fragment 100.1 was an artifact peak
SXM13111 = appears to be legit. Size standard is normal.
RBW040504 = Legit
RBW102804 = Legit.
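
The low-frequency check can be illustrated without the package. The sketch below assumes each fragment has already been assigned to a bin and, as a simplification, counts fragments rather than individuals; the `lowFreq` helper, the sample names and the bin labels are hypothetical.

```r
# Toy data: each row is one fragment already assigned to a (hypothetical) bin.
frags <- data.frame(
  Sample   = c("S01", "S02", "S03", "S04", "S05", "S06"),
  Bin      = c(110, 110, 110, 120, 120, 133),
  Fragment = c(110.38, 110.41, 110.35, 119.72, 119.91, 133.18)
)

# Keep rows whose bin holds fewer than `n` fragments.
lowFreq <- function(df, n) {
  counts <- table(df$Bin)
  rare   <- names(counts)[counts < n]
  df[as.character(df$Bin) %in% rare, , drop = FALSE]
}

rare_rows <- lowFreq(frags, 3)
print(rare_rows)
```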

DONE!

One102a

Calculate bin statistics for One102a

dat <- BinStats(DB_orig, "One102a")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
163	1	163.06	163.06	0	NA	163.06	163.06
167	3736	166.82	167.59	0.77	0.152	167.22	167.23
170	5055	169.9	170.71	0.81	0.151	170.32	170.34

Generate cumulative plot for One102a

res <- allCum(DB_orig, "One102a")
print(res$plt)

Bins are good. There appears to be a single sample with the ‘163’ allele. This sample will be checked in GeneMapper for validity, and amended accordingly.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "One102a", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
CLG040702	13637	163.1	all

CLG040702 = Legit.

DONE!

One102b

Calculate bin statistics for One102b

dat <- BinStats(DB_orig, "One102b")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
178	2	178.22	178.26	0.04	0.028	178.24	178.24
191	81	190.38	190.82	0.44	0.086	190.64	190.64
195	63	194.44	195.05	0.61	0.162	194.79	194.82
199	225	198.53	199.12	0.59	0.157	198.91	198.97
203	521	202.58	203.23	0.65	0.147	202.96	202.99
207	1260	206.63	207.32	0.69	0.157	207.04	207.06
211	1486	210.65	211.4	0.75	0.161	211.1	211.12
215	1882	214.18	215.46	1.28	0.159	215.19	215.22
219	1693	218.92	220	1.08	0.147	219.32	219.34
223	1168	222.99	223.68	0.69	0.159	223.38	223.4
227	754	227.03	227.72	0.69	0.144	227.44	227.47
232	202	231.1	231.74	0.64	0.151	231.51	231.54
236	113	235.2	235.8	0.6	0.154	235.55	235.56
240	81	239.23	239.82	0.59	0.136	239.59	239.63
244	212	243.38	244.03	0.65	0.133	243.76	243.79
248	466	247.56	248.21	0.65	0.129	247.91	247.92
252	240	251.64	252.26	0.62	0.123	252.01	252.01
256	557	255.67	256.29	0.62	0.13	256.02	256.04
260	161	259.71	260.27	0.56	0.119	260.03	260
264	41	263.82	264.27	0.45	0.097	264.06	264.08
268	11	267.85	268.29	0.44	0.141	268.05	268.01

Generate cumulative plot for One102b

res <- allCum(DB_orig, "One102b")
print(res$plt)

All good. Just need to inspect low frequency alleles.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "One102b", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
KLW100206	25032	178.2	all
CLM050204	20730	178.3	all

CLM050204 = Legit.
KLW100206 = Legit.

DONE!

Ssa406UoS

Calculate bin statistics for Ssa406UoS

dat <- BinStats(DB_orig, "Ssa406UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
414	1	414.18	414.18	0	NA	414.18	414.18
422	108	420.86	421.77	0.91	0.17	421.51	421.52
425	106	424.85	425.74	0.89	0.144	425.44	425.46
429	101	428.58	429.64	1.06	0.192	429.21	429.23
433	7	432.86	433.46	0.6	0.192	433.28	433.32
435	16	434.82	435.22	0.4	0.115	435.03	435.05
437	2	436.92	436.96	0.04	0.028	436.94	436.94
439	21	438.42	439.03	0.61	0.167	438.79	438.8
443	32	442.37	443.02	0.65	0.158	442.8	442.82
445	53	444.51	445.23	0.72	0.162	444.92	444.92
447	429	445.95	446.98	1.03	0.152	446.68	446.7
448	1	447.77	447.77	0	NA	447.77	447.77
449	78	448.04	449.08	1.04	0.168	448.73	448.76
450	1	449.78	449.78	0	NA	449.78	449.78
451	530	450.07	450.92	0.85	0.171	450.58	450.59
453	45	451.88	452.95	1.07	0.213	452.61	452.62
455	1165	453.83	454.89	1.06	0.157	454.56	454.57
456	1	456.39	456.39	0	NA	456.39	456.39
458	1	457.86	457.86	0	NA	457.86	457.86
459	782	457.92	458.86	0.94	0.151	458.51	458.53
461	17	460.35	460.81	0.46	0.131	460.62	460.64
462	1302	461.64	462.74	1.1	0.163	462.4	462.42
464	56	463.89	464.68	0.79	0.161	464.42	464.44
466	671	465.6	466.65	1.05	0.15	466.29	466.31
468	6	467.98	468.44	0.46	0.206	468.24	468.25
470	314	469.21	470.48	1.27	0.182	470.17	470.2
472	12	471.97	472.43	0.46	0.124	472.26	472.25
474	76	473.88	474.4	0.52	0.132	474.18	474.18
476	10	475.93	476.22	0.29	0.099	476.06	476
478	49	477.66	478.2	0.54	0.128	478.03	478.04
480	1	479.8	479.8	0	NA	479.8	479.8
482	92	481.47	482.24	0.77	0.18	481.88	481.9
484	7	483.84	484.2	0.36	0.115	483.99	484.01
486	143	485.34	486.16	0.82	0.163	485.81	485.83
488	8	487.62	487.97	0.35	0.132	487.79	487.75
490	164	489.26	490.11	0.85	0.181	489.74	489.76
492	12	491.34	491.92	0.58	0.165	491.71	491.74
494	36	493.46	494	0.54	0.147	493.75	493.77
496	5	495.47	495.76	0.29	0.115	495.64	495.66
498	46	497.09	497.95	0.86	0.192	497.61	497.64
500	24	499.09	499.91	0.82	0.194	499.6	499.62
502	255	500.97	501.91	0.94	0.17	501.54	501.55
504	14	503.37	503.85	0.48	0.149	503.6	503.62
505	418	504.66	505.8	1.14	0.178	505.44	505.46
508	1	507.61	507.61	0	NA	507.61	507.61
509	466	508.55	509.73	1.18	0.172	509.36	509.39
511	9	510.85	511.6	0.75	0.214	511.32	511.3
513	360	512.57	513.8	1.23	0.185	513.28	513.32
515	9	515.11	515.52	0.41	0.143	515.28	515.26
517	153	516.64	517.64	1	0.2	517.29	517.3
519	8	519	519.51	0.51	0.192	519.23	519.24
521	240	520.6	521.65	1.05	0.195	521.24	521.26
525	242	524.59	525.47	0.88	0.172	525.12	525.14
527	1	527.26	527.26	0	NA	527.26	527.26
529	74	528.71	529.27	0.56	0.144	529.05	529.08
531	3	530.8	531.01	0.21	0.112	530.93	530.97
533	82	532.14	533.11	0.97	0.189	532.84	532.87
535	2	534.6	534.84	0.24	0.17	534.72	534.72
537	104	536.17	537.04	0.87	0.161	536.66	536.65
539	3	538.37	539.05	0.68	0.342	538.73	538.78
541	83	540	540.96	0.96	0.206	540.53	540.57
544	39	543.85	544.78	0.93	0.168	544.43	544.43
550	1	550.27	550.27	0	NA	550.27	550.27
554	1	554.11	554.11	0	NA	554.11	554.11
556	4	556.12	556.3	0.18	0.085	556.24	556.27

Generate cumulative plot for Ssa406UoS

res <- allCum(DB_orig, "Ssa406UoS")
print(res$plt)

Identify issues

Problem sample with peak around 449.6.

  res <- allCum(DB_orig, "Ssa406UoS", limit = 0.9, ymin = 445, ymax = 455)
  print(res$plt)

Identify the sample:

  DB_orig %>%
    filter(Marker == "Ssa406UoS") %>%
    filter(Fragment >= 449.6 & Fragment <= 450)

     Marker   Sample Fragment       Date Plate
1 Ssa406UoS RMN13048   449.78 2015-03-07   all

The sample is from plates 59+, so 0.35 has been added to the fragment.
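
The marker and size-window filter used to identify a sample recurs throughout this notebook. A small base R helper, sketched here with a hypothetical name (`fragWindow`) and toy sample names, would do the same job without the pipe:

```r
# Hypothetical helper: rows for one marker whose fragment size lies in [lo, hi].
fragWindow <- function(db, marker, lo, hi) {
  keep <- db$Marker == marker & db$Fragment >= lo & db$Fragment <= hi
  db[which(keep), , drop = FALSE]  # which() drops rows where the test is NA
}

# Toy database using the Marker/Sample/Fragment column names of the real one.
db <- data.frame(
  Marker   = c("Ssa406UoS", "Ssa406UoS", "Ssa85"),
  Sample   = c("A1", "A2", "A3"),
  Fragment = c(449.78, 450.58, 110.38)
)
hit <- fragWindow(db, "Ssa406UoS", 449.6, 450)
print(hit)
```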

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa406UoS", 3, limit = 0.9)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
CLG040205	33969	436.9	all
CLG040210	33979	437	all
IOM120253	36978	456.4	all
RMN13042	39044	479.8	all
SJH100146	36710	507.6	all
BLD100506	35844	527.3	all
BLD102302	36034	531	all
GVY100304	37311	531	all
LSN100202	37493	534.6	all
LSN10T1N205	38312	534.8	all
KGR100507	37764	538.8	all
GGR100504	37866	538.4	all
BLD100602	35856	550.3	all
BLD101203	35912	554.1	all

BLD100506 = Peak 527.26 legit.
BLD100602 = Peak 550.27 legit.
BLD101203 = Peak 554.11 legit.
BLD102302 = Peak 530.97 legit.
BLD102303 = Weak sample, but 530.8 peak is legit.
CLG040205 = Peak 436.92 legit.
CLG040210 = Peak 436.96 legit.
CLG041309 = Peak 539.05 artifact. Amended in “Main_DB_new.txt”.
GGR100504 = Peak 538.37 legit.
GVY100304 = Peak 531.01 legit.
IOM120253 = Peak 456.39 legit.
KGR100507 = Peak 538.78 legit.
LSN100202 = Peak 534.60 legit.
LSN10T1N205 = Peak 534.84 legit.
PRB100301 = Peak 414.18 artifact. Amended in “Main_DB_new.txt”.
RMN13042 = Peak 479.8 legit.
RMN13046 = Peak 447.77 legit.
RMN13048 = Peak 449.78 legit.
RMN13054 = Peak 457.86 legit.
SJH100146 = Peak 507.61 legit.

DONE!

CA048302

Calculate bin statistics for CA048302

dat <- BinStats(DB_orig, "CA048302")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
172	9	171.72	171.97	0.25	0.079	171.86	171.85
176	8	175.54	175.8	0.26	0.089	175.71	175.75
178	2	177.54	177.54	0	0	177.54	177.54
180	2661	179.3	179.65	0.35	0.061	179.5	179.53
181	96	181.21	181.56	0.35	0.055	181.42	181.43
183	314	183.17	183.5	0.33	0.059	183.37	183.37
185	3336	185.09	185.48	0.39	0.059	185.3	185.3
187	1840	187.01	187.41	0.4	0.057	187.23	187.24
189	520	188.89	189.31	0.42	0.057	189.17	189.18
191	1	191.08	191.08	0	NA	191.08	191.08
193	707	192.85	193.18	0.33	0.058	193.05	193.05
197	45	196.73	197.03	0.3	0.067	196.91	196.92

Generate cumulative plot for CA048302

res <- allCum(DB_orig, "CA048302")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA048302", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
UBN140040	47073	177.5	all
SXM13058	47757	177.5	all
LTB040204	43591	191.1	all

LTB040204 = Peak 191.08 legit.
SXM13058 = Peak 177.54 legit.
UBN140040 = Peak 177.54 legit.

DONE!

Ssa419UoS

Calculate bin statistics for Ssa419UoS

dat <- BinStats(DB_orig, "Ssa419UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
280	1813	279.42	280	0.58	0.093	279.78	279.82
299	3	299.24	299.32	0.08	0.04	299.28	299.29
365	245	364.62	365.31	0.69	0.12	365.07	365.09
369	3989	368.44	369.3	0.86	0.113	368.99	369.01
397	1	396.94	396.94	0	NA	396.94	396.94
450	23	449.88	450.35	0.47	0.113	450.19	450.2
454	11	454.06	454.44	0.38	0.111	454.27	454.28
532	4	531.47	531.69	0.22	0.09	531.58	531.59
536	12	536.17	536.75	0.58	0.161	536.45	536.47
539	3239	538.29	539.62	1.33	0.175	539.23	539.24

Generate cumulative plot for Ssa419UoS

res <- allCum(DB_orig, "Ssa419UoS")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa419UoS", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
SXM101104	54637	299.2	all
RBW053501	49559	299.3	all
RBW102506	55359	299.3	all
RBW040504	48976	396.9	all

RBW040504 = Peak 396.94 legit.
RBW053501 = Peak 299.29 legit.
RBW102506 = Peak 299.32 legit.
SXM101104 = Peak 299.24 legit.

DONE!

Ssa416UoS

Calculate bin statistics for Ssa416UoS

dat <- BinStats(DB_orig, "Ssa416UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
113	63	112.59	113.34	0.75	0.166	112.97	113.01
122	1136	121.81	123.19	1.38	0.177	122.31	122.32
131	503	130.5	131.79	1.29	0.189	131	131.01
140	5515	139.48	140.47	0.99	0.198	140.12	140.1

Generate cumulative plot for Ssa416UoS

res <- allCum(DB_orig, "Ssa416UoS", limit = 0.4)
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa416UoS", 3, limit = 0.4)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
RBW043901	58524	123.2	all
RBW043901	58525	131.8	all

RBW043901 = Both alleles were the result of an incorrect size standard peak. Corrected in Main_DB_new.txt

res <- allCum(DB_new, "Ssa416UoS")
print(res$plt)

DONE!

Sssp2201

Calculate bin statistics for Sssp2201

dat <- BinStats(DB_orig, "Sssp2201")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
181	1	180.57	180.57	0	NA	180.57	180.57
182	10	181.29	181.81	0.52	0.133	181.6	181.61
186	9	185.29	185.88	0.59	0.241	185.62	185.66
194	58	193.34	193.94	0.6	0.158	193.69	193.69
198	102	197.39	198.04	0.65	0.17	197.78	197.81
202	359	201.42	202.06	0.64	0.14	201.79	201.81
206	295	205.34	206.07	0.73	0.156	205.82	205.85
209	1	208.54	208.54	0	NA	208.54	208.54
210	390	209.41	210.1	0.69	0.154	209.81	209.83
214	148	213.41	214.09	0.68	0.173	213.82	213.91
218	151	217.54	218.12	0.58	0.129	217.87	217.9
222	90	221.07	222.16	1.09	0.232	221.87	221.91
225	3	224.68	224.76	0.08	0.04	224.72	224.71
226	310	224.92	226.19	1.27	0.182	225.92	225.96
229	54	228.62	229.08	0.46	0.115	228.94	228.95
230	158	229.09	230.21	1.12	0.248	229.81	229.88
233	511	232.59	233.78	1.19	0.209	233.01	232.97
234	169	233.79	234.21	0.42	0.09	233.93	233.91
237	1035	236.53	237.86	1.33	0.199	236.97	236.96
238	106	237.87	238.14	0.27	0.079	237.97	237.97
241	691	240.5	241.84	1.34	0.224	241.01	241.01
242	152	241.85	242.19	0.34	0.069	241.97	241.97
245	686	244.61	246.07	1.46	0.224	245.15	245.15
246	22	246.09	246.32	0.23	0.07	246.18	246.16
249	627	248.62	250.09	1.47	0.21	249.16	249.16
250	10	250.18	250.35	0.17	0.084	250.26	250.22
252	3	251.95	251.99	0.04	0.021	251.97	251.98
253	961	252.24	254.19	1.95	0.2	253.17	253.2
254	7	254.23	254.37	0.14	0.054	254.32	254.35
257	241	256.11	257.42	1.31	0.214	257.14	257.17
261	201	260.17	261.33	1.16	0.15	261.1	261.11
265	124	264.73	265.35	0.62	0.142	265.1	265.12
269	40	268.79	269.79	1	0.203	269.12	269.09
270	8	270.04	270.26	0.22	0.068	270.17	270.18
273	187	272.74	274.13	1.39	0.138	273.16	273.17
277	145	276.64	277.42	0.78	0.177	277.12	277.15
279	1	278.6	278.6	0	NA	278.6	278.6
281	114	280.76	281.39	0.63	0.121	281.19	281.19
282	1	282.46	282.46	0	NA	282.46	282.46
285	197	284.72	286.1	1.38	0.156	285.14	285.15
286	17	286.13	286.57	0.44	0.143	286.41	286.46
289	323	288.65	289.37	0.72	0.138	289.11	289.12
290	61	290.06	290.48	0.42	0.08	290.36	290.38
293	168	292.66	293.93	1.27	0.149	293.06	293.08
294	24	293.99	294.6	0.61	0.183	294.4	294.47
297	123	296.58	297.32	0.74	0.164	297.07	297.09
298	42	297.93	298.58	0.65	0.177	298.3	298.35
301	61	300.64	301.31	0.67	0.194	300.98	301.03
302	13	302.01	302.51	0.5	0.182	302.22	302.26
305	17	304.75	305.19	0.44	0.114	304.98	304.99
306	89	305.88	306.52	0.64	0.126	306.31	306.33
309	22	308.89	309.18	0.29	0.085	309.01	308.99
310	87	309.74	310.47	0.73	0.166	310.25	310.29
313	2	313	313.01	0.01	0.007	313	313
314	14	313.74	314.66	0.92	0.268	314.16	314.23
317	27	316.63	317.45	0.82	0.285	316.91	316.77
318	29	317.67	318.54	0.87	0.286	318.14	318.03
319	21	318.64	319.52	0.88	0.186	318.89	318.91
320	11	319.82	320.6	0.78	0.208	320	319.91
321	5	320.82	321.07	0.25	0.096	320.98	321.01
322	34	321.16	322.6	1.44	0.358	322.06	322.01
324	8	323.83	324.14	0.31	0.099	323.95	323.96
326	67	324.8	326.13	1.33	0.265	325.8	325.87
328	18	327.55	328.17	0.62	0.175	327.92	327.97
329	25	328.25	329.06	0.81	0.167	328.9	328.96
330	90	329.09	330.15	1.06	0.242	329.82	329.89
331	13	330.82	331.48	0.66	0.194	331.06	331.03
333	42	331.95	333.13	1.18	0.214	332.86	332.93
334	65	333.16	334.18	1.02	0.19	333.87	333.91
335	39	334.76	335.92	1.16	0.312	335.31	335.22
336	11	335.97	336.08	0.11	0.035	336.01	336.02
337	61	336.6	337.23	0.63	0.169	336.95	336.98
338	111	337.55	338.51	0.96	0.132	338	338.02
339	27	339.04	340	0.96	0.322	339.42	339.33
341	32	340.47	341.09	0.62	0.16	340.84	340.88
342	136	341.13	342.71	1.58	0.283	341.89	341.94
343	19	342.93	343.63	0.7	0.181	343.16	343.11
344	6	343.85	343.95	0.1	0.042	343.91	343.91
345	141	344.06	345.64	1.58	0.239	345	344.98
346	99	345.72	346.7	0.98	0.198	346.04	346
347	69	346.74	347.92	1.18	0.258	347.14	347.1
348	1	348.02	348.02	0	NA	348.02	348.02
349	75	348.11	349.24	1.13	0.305	348.88	348.97
350	95	349.26	350.72	1.46	0.225	349.9	349.92
351	2	351.11	351.44	0.33	0.233	351.27	351.27
353	6	352.55	352.79	0.24	0.084	352.68	352.68
354	87	353.01	354.78	1.77	0.327	353.97	354.01
355	21	354.94	355.42	0.48	0.145	355.22	355.23
357	1	356.82	356.82	0	NA	356.82	356.82
358	119	357.11	358.85	1.74	0.25	358	358.01
359	14	359.08	359.49	0.41	0.138	359.4	359.47
361	16	360.47	361.08	0.61	0.22	360.83	360.81
362	103	361.14	362.78	1.64	0.351	361.91	361.94
363	2	363	363.31	0.31	0.219	363.15	363.15
365	3	364.79	364.97	0.18	0.095	364.9	364.93
366	9	365.36	366.01	0.65	0.207	365.87	365.96
367	6	366.69	367.4	0.71	0.33	367	366.93
370	6	369.63	369.95	0.32	0.121	369.87	369.9
371	9	370.82	371.41	0.59	0.219	371.23	371.31
374	4	373.59	374	0.41	0.192	373.85	373.9
375	1	375.36	375.36	0	NA	375.36	375.36
378	17	377.51	378.22	0.71	0.222	377.97	378.01
382	5	381.78	382.36	0.58	0.234	382.01	381.93
383	12	382.41	383.23	0.82	0.295	383	383.13
386	58	385.15	386.92	1.77	0.332	386.1	386.15
390	6	390.19	390.45	0.26	0.099	390.37	390.4
398	2	398.12	398.35	0.23	0.163	398.24	398.24

Generate cumulative plot for Sssp2201

res <- allCum(DB_orig, "Sssp2201")
print(res$plt)

Identify problems

There is a possibly problematic sample ~ 180.5bp:

  res <- allCum(DB_orig, "Sssp2201", ymin = 170, ymax = 200)
  print(res$plt)

Identify the sample:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 180 & Fragment <= 181)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 BLD102301   180.57 2015-03-07   all

BLD102301 = Peak is an artifact. Edited in “Main_DB_new.txt”.

Fragments begin to exhibit 1bp jumps from 220bp onwards. A binning limit of 0.3 seems to allow differentiation of these fragments:
```
  res <- allCum(DB_new, "Sssp2201", limit = 0.3, ymin = 215, 
                ymax = 230)
  print(res$plt)
```
The fragments between 236 - 251bp require a binning limit of 0.2 to allow the algorithm to differentiate alleles.
```
  res <- allCum(DB_new, "Sssp2201", limit = 0.2, ymin = 232,
                ymax = 251)
  print(res$plt)
```

There is a problematic group of fragments between 251.7 - 254.8:

  res <- allCum(DB_new, "Sssp2201", limit = 0.2, ymin = 245,
                ymax = 260)
  print(res$plt)

First, identify the lower samples between 251 and 252.2, followed by a few from the larger group between 252.21 and 252.6:

  # Smaller fragments
  DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 251 & Fragment <= 252.2)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 SXM13048   251.98 2015-03-07   all
2 Sssp2201 SXM13086   251.99 2015-03-07   all
3 Sssp2201 SXM13131   251.95 2015-03-07   all

  # Larger group of fragments
  lrg <- DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 252.21 & Fragment <= 252.6)
  # Return 3 samples
  lrg[1:3,]

    Marker    Sample Fragment       Date Plate
1 Sssp2201 KEL050808   252.32 2015-03-07   all
2 Sssp2201 CLG040219   252.27 2015-03-07   all
3 Sssp2201 RMN103104   252.32 2015-03-07   all

All fragments are legit. The break between these fragments is down to a consistent downward shift of 0.4bp in adult river samples and samples from the lake. These samples were some of the last to be screened and this minor difference is potentially due to a technical change, since the array on the ABI was changed for these plates. The current binning pattern will be retained.
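
One way to put a number on such a shift is to compare the median fragment size of an allele between early and late plates. The sketch below uses toy values and a hypothetical `Late` flag, since plate information is not stored in the database itself; it illustrates the comparison and is not the procedure used to arrive at the figure quoted above.

```r
# Toy data: fragments from one allele, split by whether the plate was run
# before or after the array change (values are illustrative only).
frags <- data.frame(
  Fragment = c(252.30, 252.32, 252.27, 252.35, 251.95, 251.98, 251.99),
  Late     = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Difference in medians between late and early plates; a negative value
# indicates a downward shift in the late plates.
shift <- median(frags$Fragment[frags$Late]) - median(frags$Fragment[!frags$Late])
print(round(shift, 2))
```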

Another group of these technically shifted fragments appears at ~ 269.8bp:

  res <- allCum(DB_new, "Sssp2201", limit = 0.2, ymin = 263,
                ymax = 273)
  print(res$plt)

Identify the smaller fragments, and check if they were screened within Plate 59+. If so the current binning pattern is appropriate.

  DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 269.5 & Fragment <= 269.9)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 SXM13043   269.79 2015-03-07   all
2 Sssp2201 SXM13064   269.74 2015-03-07   all
3 Sssp2201 SXM13106   269.69 2015-03-07   all

All three samples are from either Plate 59 or Plate 60.

A binning limit of 0.2 is too low for a bin ~ 294bp. Setting the bin limit to 0.45 between 280 and 312bp overcomes this issue:
```
  res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 290,
                ymax = 312)
  print(res$plt)
```

There is a sole fragment at the top of the bin @ 314bp:

  res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 308,
                ymax = 315)
  print(res$plt)

Identify the fragment, as well as the two small fragments @ 313bp:

  # Small samples first
  DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 312.7 & Fragment <= 313.2)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 BLD100602   313.01 2015-03-07   all
2 Sssp2201 BLD100802   313.00 2015-03-07   all

  # Single large fragment
  DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 314.5 & Fragment <= 314.7)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 CLG051401   314.66 2015-03-07   all

All fragments are legit.

There are a number of problematic fragments between 316 and 323 bp:

  res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 316,
                ymax = 323)
  print(res$plt)

Identify the fragments between 316.8 and 317.5:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 316.8 & Fragment <= 317.5)

     Marker    Sample Fragment       Date Plate
1  Sssp2201 RBW041803   316.81 2015-03-07   all
2  Sssp2201 RBW041804   316.81 2015-03-07   all
3  Sssp2201 RBW041806   316.81 2015-03-07   all
4  Sssp2201 RBW051805   316.81 2015-03-07   all
5  Sssp2201 CLM051601   316.81 2015-03-07   all
6  Sssp2201 KLW050110   316.81 2015-03-07   all
7  Sssp2201 BLD101401   317.09 2015-03-07   all
8  Sssp2201  SXM13030   317.40 2015-03-07   all
9  Sssp2201  SXM13061   317.40 2015-03-07   all
10 Sssp2201  SXM13115   317.35 2015-03-07   all
11 Sssp2201  SXM13185   317.44 2015-03-07   all
12 Sssp2201  LND14039   317.45 2015-03-07   all
13 Sssp2201  LND14071   317.45 2015-03-07   all

All samples between 317.3 and 317.5 are from plates > 59 and exhibit the downward shift mentioned above. These fragments will have 0.35 added to them so that they fall into the bin at ~ 318 bp.

  # Read the database
  DB <- read.delim("Main_DB_new.txt", header = TRUE)
  # Subset Sssp2201 fragments
  DB_loc <- DB %>% filter(Marker == "Sssp2201")
  # replace values
  DB_loc <- DB_loc %>%
    mutate(Size.1 = ifelse(Size.1 >= 317.3 & Size.1 <= 317.5, 
                           Size.1 + 0.35, Size.1))
  DB_loc <- DB_loc %>%
    mutate(Size.2 = ifelse(Size.2 >= 317.3 & Size.2 <= 317.5, 
                           Size.2 + 0.35, Size.2))
  # Add replaced values to main DB
  DB[DB$Marker == "Sssp2201",] <- DB_loc
  # Write the database (old db is backed up as "Main_DB_new.txt.bkp1")
  write.table(DB, file = "Main_DB_new.txt", append = FALSE, sep = "\t", 
              na = "", row.names = FALSE, col.names = TRUE, quote = FALSE)
  DB_new <- fastReadFrag("Main_DB_new.txt", as.character(Sys.Date()), "all")
  saveRDS(DB_new, "Main_DB_new.rds")
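
Because the same window correction is applied to both allele columns, and again for other size ranges later on, it could be factored into a helper. The sketch below is one possible form in base R; `shiftWindow` is a hypothetical name and the toy table only mimics the `Size.1`/`Size.2` layout.

```r
# Hypothetical helper: add `offset` to every value of the given size columns
# that falls inside [lo, hi] for one marker. NA sizes are left untouched.
shiftWindow <- function(db, marker, cols, lo, hi, offset) {
  rows <- db$Marker == marker
  for (col in cols) {
    x <- db[[col]]
    hit <- rows & !is.na(x) & x >= lo & x <= hi
    db[[col]][hit] <- x[hit] + offset
  }
  db
}

# Toy genotype table with the Size.1/Size.2 layout.
db <- data.frame(
  Marker = c("Sssp2201", "Sssp2201", "Ssa85"),
  Size.1 = c(317.40, 316.81, 317.40),
  Size.2 = c(318.52, 317.45, NA)
)
fixed <- shiftWindow(db, "Sssp2201", c("Size.1", "Size.2"), 317.3, 317.5, 0.35)
print(fixed)
```

A window-based correction also moves any unshifted fragment that happens to lie in the window, so it is only safe once every sample in the range has been checked against its plate.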

There is a problem bin between 318.2 - 319.1 bp:

  res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 317.5, 
                ymax = 319.5)
  print(res$plt)

Identify 5 samples from the group of fragments between 318.3 - 318.6:

  temp <- DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 318.3 & Fragment <= 318.6)
  temp[sample(1:nrow(temp), 5, replace = FALSE),]

     Marker    Sample Fragment       Date Plate
1  Sssp2201 RBW043603   318.52 2015-01-29   all
11 Sssp2201  SXM13114   318.47 2015-01-29   all
8  Sssp2201  SXM13087   318.42 2015-01-29   all
7  Sssp2201 DGR100303   318.54 2015-01-29   all
13 Sssp2201  SXM13168   318.47 2015-01-29   all
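
Two properties of `sample(1:nrow(temp), 5)` are worth noting: it stops with an error when fewer than five rows match, and it draws different rows on every run. A slightly more defensive version is sketched below, with a hypothetical helper name (`sampleRows`) and toy rows.

```r
# Hypothetical helper: return up to `n` randomly chosen rows of a data frame.
sampleRows <- function(df, n) {
  idx <- sample(seq_len(nrow(df)), min(n, nrow(df)))
  df[idx, , drop = FALSE]
}

set.seed(1)  # fix the seed so the same rows are drawn on each run
toy <- data.frame(Sample = paste0("S", 1:3),
                  Fragment = c(318.42, 318.47, 318.52))
picked <- sampleRows(toy, 5)  # only three rows exist, so all three are returned
print(picked)
```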

Identify 5 samples from the group between 318.61 - 319.5:

  temp <- DB_new %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 318.61 & Fragment <= 319.5)
  temp[sample(1:nrow(temp), 5, replace = FALSE),]

     Marker    Sample Fragment       Date Plate
7  Sssp2201 SXM100705   318.73 2015-01-29   all
9  Sssp2201 SXM100925   318.91 2015-01-29   all
20 Sssp2201 GRB100207   318.92 2015-01-29   all
18 Sssp2201 FML100402   319.01 2015-01-29   all
2  Sssp2201 CLG102406   318.78 2015-01-29   all

Upon closer inspection of samples in this region, the differences are not clear enough to warrant splitting the bin. A bin limit of 0.55 between 312.5 and 319.5 allows these samples to be binned into the same allele.

  res <- allCum(DB_new, "Sssp2201", ymin = 310, ymax = 320,
                limit = list(c(220, 0.8), c(293, 0.3),
                             c(312, 0.45), c(319.5, 0.55),
                             c(450, 0.45)))
  print(res$plt)
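
The `limit` list appears to pair a fragment size with the bin limit to use up to that size. Under that assumption (it is inferred from the text, not confirmed from the package source), the lookup can be sketched in base R with `findInterval()`. The `limitFor` helper is hypothetical and the breakpoints follow the values given in the text.

```r
# Assumption: each pair is c(upper_bound_bp, limit), applied to fragments
# at or below that bound and above the previous one.
limits <- list(c(220, 0.8), c(293, 0.3), c(312.5, 0.45),
               c(319.5, 0.55), c(450, 0.45))

limitFor <- function(size, limits) {
  upper <- vapply(limits, function(p) p[1], numeric(1))
  lim   <- vapply(limits, function(p) p[2], numeric(1))
  # left.open = TRUE makes each interval (previous bound, bound]
  idx <- findInterval(size, upper, left.open = TRUE) + 1
  lim[pmin(idx, length(lim))]  # sizes beyond the last bound reuse the last limit
}

print(limitFor(c(200, 250, 300, 318, 400), limits))
```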

There is a single fragment at the lower end of the bin @ 320bp:

  res <- allCum(DB_orig, "Sssp2201", ymin = 318, ymax = 321,
                limit = list(c(220, 0.8), c(293, 0.3),
                             c(312.5, 0.45), c(319.5, 0.55),
                             c(450, 0.45)))
  print(res$plt)

Identify the fragment:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 319.3 & Fragment <= 319.6)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 RMN13127   319.52 2015-03-07   all

This fragment belongs to plate 62, one of the plates with the downward shift. The fragment will have 0.35 manually added onto it.

There is another group of problem fragments between 320.2 - 322.8bp:

  res <- allCum(DB_new, "Sssp2201", ymin = 319.2, ymax = 325,
                limit = list(c(220, 0.8), c(293, 0.3),
                             c(312.5, 0.45), c(325, 0.55),
                             c(450, 0.45)))
  print(res$plt)

Identify the points between 320.5 - 320.9:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 320.5 & Fragment <= 320.9)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 CLG041203   320.82 2015-03-07   all
2 Sssp2201  LND14042   320.60 2015-03-07   all

Sample LND14042 is from plate 67. 0.35 will be added to the fragment manually.

Identify and check the fragments between 321.3 - 321.6:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 321.3 & Fragment <= 321.6)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 RMN13093   321.43 2015-03-07   all
2 Sssp2201 RMN13147   321.49 2015-03-07   all

Fragments are from plates 62 and 63. 0.35 will be added manually.

There is a problem bin ~ 328bp:

  res <- allCum(DB_orig, "Sssp2201", ymin = 327, ymax = 328.5,
                limit = 0.55)
  print(res$plt)

Identify the lower four points:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 327 & Fragment <= 327.75)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 LTB050204   327.71 2015-03-07   all
2 Sssp2201 LTB050209   327.55 2015-03-07   all
3 Sssp2201 LTB050210   327.61 2015-03-07   all
4 Sssp2201 RSB050101   327.68 2015-03-07   all

All four samples are from Plate 25. This plate was rerun on 14/04/14 due to a failed size standard in the original run. Plate 25 samples show the same downward shift as plates 59+. I need to check the dates, but these plates all seem to have been run after the ABI capillary array was replaced. 0.35 will be manually added to these samples.

There seems to be another group of samples that exhibit a small downward shift @ 328.7bp.

  res <- allCum(DB_new, "Sssp2201", ymin = 328, ymax = 330, limit = 0.55)
  print(res$plt)

Identify the three small fragments:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 328.6 & Fragment <= 328.8)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 SXM13110   328.69 2015-03-07   all
2 Sssp2201 SXM13151   328.73 2015-03-07   all
3 Sssp2201 SXM13174   328.72 2015-03-07   all

All three samples are from plate 60. 0.35 will be manually added to the fragments.

There is a group of apparently large fragments ~ 329.4bp and a group of apparently small fragments ~ 329.6bp. Identify four from each.

  # lower group
  temp <-  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 329.3 & Fragment <= 329.4)
  temp[sample(1:nrow(temp), 4, replace = FALSE),]

    Marker    Sample Fragment       Date Plate
2 Sssp2201 RBW100108   329.35 2015-03-07   all
1 Sssp2201 RBW051701   329.37 2015-03-07   all
5 Sssp2201 BRD040820   329.33 2015-03-07   all
3 Sssp2201 RBW100221   329.31 2015-03-07   all

  # upper group
  temp <-  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 329.5 & Fragment <= 329.7)
  temp[sample(1:nrow(temp), 4, replace = FALSE),]

    Marker    Sample Fragment       Date Plate
5 Sssp2201  RMN13131   329.57 2015-03-07   all
3 Sssp2201 UBN140017   329.52 2015-03-07   all
4 Sssp2201 UBN140076   329.61 2015-03-07   all
1 Sssp2201 LGY050122   329.67 2015-03-07   all

All samples from the group of fragments between 329.5 - 329.7 are from either plate 25 or plate 59+. 0.35 will be added to each fragment.

The binning limit between 327 - 330.6? should be set to 0.4 to allow accurate binning of fragments within this range:
```
  res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 327, ymax = 330.5)
  print(res$plt)
```

There are some problem samples between 330.6 - 333:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 330.6, ymax = 333)
  print(res$plt)

Identify the two major outliers:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 331.9 & Fragment <= 332.4)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 KEL051709   331.95 2015-03-07   all
2 Sssp2201 CGN100201   332.24 2015-03-07   all

KEL051709 = Fragment is legit.
CGN100201 = Fragment is legit.
By keeping the bin limit to 0.4, these two fragments can be binned into the same allele.

There are some downward shift samples @ 337.5bp:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 336, ymax = 339)
  print(res$plt)

Identify the points:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 337.5 & Fragment <= 337.7)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 SXM13008   337.60 2015-03-07   all
2 Sssp2201 RMN13115   337.55 2015-03-07   all

Both samples come from plates 59+. 0.35 will be added to the fragments.

There is a large fragment ~ 338.5:

Identify the point:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 338.4 & Fragment <= 338.6)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 PRB040105   338.51 2015-03-07   all

Peak is legit, but will be binned into the allele below it.

There is a problem fragment @ 339.6bp:

Identify the fragment:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 339.55 & Fragment <= 339.7)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 RMN13107   339.63 2015-03-07   all

This sample is from plates 59+. 0.35 will be added to it.

There is a group of samples with the downward shift around 340.5bp.

Identify the samples:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 340.4 & Fragment <= 340.65)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 SXM13158   340.57 2015-03-07   all
2 Sssp2201 RMN13090   340.58 2015-03-07   all
3 Sssp2201 RMN13125   340.47 2015-03-07   all
4 Sssp2201 RMN13143   340.58 2015-03-07   all
5 Sssp2201 RMN13213   340.59 2015-03-07   all
6 Sssp2201 LND14085   340.57 2015-03-07   all

All samples are from plates 59+. 0.35 will be added to each.

There is an outlier sample at the top of the bin @

  res <- allCum(DB_new, "Sssp2201", limit = 0.4, ymin = 339, ymax = 342)
  print(res$plt)

Identify the point:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 341.22 & Fragment <= 341.38)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 UBN140018   341.29 2015-03-07   all

Sample is very weak. Genotype deleted.

Check if any of the samples at the bottom end of the bin ~ 341.7 are from downward shift plates:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 341.4 & Fragment <= 341.64)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 UBN140025   341.44 2015-03-07   all
2 Sssp2201 UBN140037   341.46 2015-03-07   all
3 Sssp2201 UBN140131   341.43 2015-03-07   all
4 Sssp2201 UBN140040   341.56 2015-03-07   all
5 Sssp2201  SXM13103   341.53 2015-03-07   all
6 Sssp2201  SXM13139   341.57 2015-03-07   all
7 Sssp2201  BLD13021   341.55 2015-03-07   all
8 Sssp2201  BLD13034   341.46 2015-03-07   all
9 Sssp2201  LND14107   341.58 2015-03-07   all

All samples are from plates 59+. 0.35 will be added to fragments manually.

There is a small fragment at the bottom of the bin ~ 343bp:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 342, ymax = 345)
  print(res$plt)

Identify the point:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 342.6 & Fragment <= 342.8)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 RMN13069   342.71 2015-03-07   all

Sample is from plate 59+. 0.35 added.

There are also small fragments at the lower end of the next bin up.

Identify the fragments:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 343.5 & Fragment <= 343.7)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 SXM13022   343.63 2015-03-07   all
2 Sssp2201 LND14017   343.56 2015-03-07   all

Both samples are from plates 59+. 0.35 added.

There is a group of downward shift fragments ~ 344.6bp. All samples are from plates 59+. Fragments are being binned appropriately, so no manipulations will be made.

There is a group of downward shift fragments at the bottom of the bin ~ 346bp.

  res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 344, ymax = 348)
  print(res$plt)

Identify the points:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 345.4 & Fragment <= 345.69)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 RSB050105   345.50 2015-03-07   all
2 Sssp2201  SXM13081   345.53 2015-03-07   all
3 Sssp2201  SXM13157   345.64 2015-03-07   all
4 Sssp2201  RMN13221   345.62 2015-03-07   all
5 Sssp2201  BLD13042   345.51 2015-03-07   all
6 Sssp2201  BLD13046   345.49 2015-03-07   all
7 Sssp2201  BLD13085   345.47 2015-03-07   all
8 Sssp2201  MOY13018   345.59 2015-03-07   all

All samples are from plate 25 or plates 59+. 0.35 will be added to fragments.

There is also a larger fragment associated with the bin ~ 346.

Identify the fragment:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 346.3 & Fragment <= 346.4)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 KEL051504   346.37 2015-03-07   all

Peak is legit. Will be binned with fragments below it.

There is a group of downward shift fragments at the bin ~ 347. All samples are from plate 25 or plates 59+. Because fragments are being binned correctly, no manipulations are required.
The bin limit should be dropped from 0.4 to 0.35 for fragments above 347.5.

There are three sole fragments between 350.5 - 351.8:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 350, ymax = 355)
  print(res$plt)

Identify the points:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 350.5 & Fragment <= 351.5)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 KEL040914   351.11 2015-03-07   all
2 Sssp2201 LSN101405   351.44 2015-03-07   all
3 Sssp2201  RMN13068   350.72 2015-03-07   all

All fragments are legit. RMN13068 will have 0.35 added to it and all fragments will be binned into the same allele.

There are some downward shift fragments in the next two bins, but they do not affect binning, so no manipulations are required.

There is a group of fragments ~ 354.3 that are causing binning problems:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 352, ymax = 357)
  print(res$plt)

Identify the fragments:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 354.3 & Fragment <= 354.9)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 RBW052603   354.60 2015-03-07   all
2 Sssp2201 RBW054005   354.51 2015-03-07   all
3 Sssp2201 ALK100301   354.52 2015-03-07   all
4 Sssp2201 BBN100401   354.42 2015-03-07   all
5 Sssp2201 SHK100503   354.48 2015-03-07   all
6 Sssp2201  BLD13075   354.60 2015-03-07   all
7 Sssp2201  LND14028   354.78 2015-03-07   all

All peaks are legit. Two samples are from plates 59+. 0.35 is added. This allows the algorithm to bin fragments appropriately.

Check the smaller fragments at the bottom of the bin ~ 358bp:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 356, ymax = 360)
  print(res$plt)

Identify the small point ~ 356.8:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 356.4 & Fragment <= 356.95)

    Marker   Sample Fragment       Date Plate
1 Sssp2201 MOY13044   356.82 2015-03-07   all

The sample is from plate 59+. 0.35 added.

Identify the small fragments between 357.4 - 357.7 bp:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 357.4 & Fragment <= 357.7)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 UBN140053   357.61 2015-03-07   all
2 Sssp2201 UBN140096   357.64 2015-03-07   all
3 Sssp2201  SXM13005   357.53 2015-03-07   all
4 Sssp2201  RMN13235   357.46 2015-03-07   all

All samples are from plates 59+. 0.35 added. Allows alleles to be differentiated.

There are also two small fragments ~ 358.6

Identify the samples:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 358.5 & Fragment <= 358.65)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 KEL051208   358.59 2015-03-07   all
2 Sssp2201 ALG100605   358.52 2015-03-07   all

Both fragments are legit.

Identify the next group of four fragments ~ 358.8:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 358.66 & Fragment <= 358.9)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 SJH100127   358.83 2015-03-07   all
2 Sssp2201 SJH100139   358.85 2015-03-07   all
3 Sssp2201 GVY100303   358.77 2015-03-07   all
4 Sssp2201 GVY101005   358.78 2015-03-07   all

All fragments are legit.

Identify the fragments between 359 - 359.2:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 359 & Fragment <= 359.2)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 CMV100103   359.08 2015-03-07   all
2 Sssp2201  LND14104   359.11 2015-03-07   all

CMV100103 = Peak is legit.
LND14104 = Sample is from plates 59+. 0.35 added.

Check the next two fragments between 359.3 - 359.43:

  DB_orig %>%
     filter(Marker == "Sssp2201") %>%
     filter(Fragment >= 359.3 & Fragment <= 359.43)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 IOM120284   359.33 2015-03-07   all
2 Sssp2201 ALG100102   359.40 2015-03-07   all

IOM120284 = Peak is 359.4. Edited.
ALG100102 = Peak is 359.5. Edited.

Conservatively, all samples between 358.4 - 359.6 will be binned into the same allele.

Binning between 360.5 - 362.3 is incorrect:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 360, 
                ymax = 364)
  print(res$plt)

Identify any samples from Plate 25 or Plates 59+ and add 0.35 to them.

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 360.4 & Fragment <= 361.8)

     Marker      Sample Fragment       Date Plate
1  Sssp2201   RBW051910   361.15 2015-03-07   all
2  Sssp2201   CLM050401   361.14 2015-03-07   all
3  Sssp2201   CLG051416   361.16 2015-03-07   all
4  Sssp2201   RSB050108   361.51 2015-03-07   all
5  Sssp2201   BRD052123   361.78 2015-03-07   all
6  Sssp2201   KEL040107   361.79 2015-03-07   all
7  Sssp2201   CLG041503   361.07 2015-03-07   all
8  Sssp2201   CLG041601   361.08 2015-03-07   all
9  Sssp2201   SKW040213   361.17 2015-03-07   all
10 Sssp2201   SKW040220   361.08 2015-03-07   all
11 Sssp2201   SKW040225   361.16 2015-03-07   all
12 Sssp2201   CLG102008   361.07 2015-03-07   all
13 Sssp2201   BLD100508   361.08 2015-03-07   all
14 Sssp2201   SXM102504   361.28 2015-03-07   all
15 Sssp2201   LSN100203   361.30 2015-03-07   all
16 Sssp2201   LSN100302   361.36 2015-03-07   all
17 Sssp2201   LSN100705   361.38 2015-03-07   all
18 Sssp2201   DGR100503   361.41 2015-03-07   all
19 Sssp2201   KGR100505   361.39 2015-03-07   all
20 Sssp2201   GGR100604   361.30 2015-03-07   all
21 Sssp2201 LSN10T2W203   361.25 2015-03-07   all
22 Sssp2201   CWH130017   361.79 2015-03-07   all
23 Sssp2201    SXM13047   360.76 2015-03-07   all
24 Sssp2201    SXM13063   361.77 2015-03-07   all
25 Sssp2201    SXM13182   361.69 2015-03-07   all
26 Sssp2201    RMN13079   361.52 2015-03-07   all
27 Sssp2201    RMN13084   361.64 2015-03-07   all
28 Sssp2201    RMN13106   361.62 2015-03-07   all
29 Sssp2201    RMN13119   360.77 2015-03-07   all
30 Sssp2201    RMN13138   361.50 2015-03-07   all
31 Sssp2201    RMN13140   361.60 2015-03-07   all
32 Sssp2201    BLD13007   360.84 2015-03-07   all
33 Sssp2201    BLD13008   360.57 2015-03-07   all
34 Sssp2201    BLD13021   360.68 2015-03-07   all
35 Sssp2201    BLD13037   360.66 2015-03-07   all
36 Sssp2201    BLD13040   361.53 2015-03-07   all
37 Sssp2201    BLD13049   360.48 2015-03-07   all
38 Sssp2201    BLD13065   360.93 2015-03-07   all
39 Sssp2201    BLD13077   360.47 2015-03-07   all
40 Sssp2201    RMN13242   361.50 2015-03-07   all
41 Sssp2201    LND14038   360.78 2015-03-07   all
42 Sssp2201    LND14131   361.00 2015-03-07   all

RSB050108 = Plate 25, 0.35 added.
LSN10T2W203 = Plate 56, 0.35 added.
CWH130017 = Plate 58, 0.35 added.
SXM13047 = Plate 59+. 0.35 added.
SXM13063 = Plate 59+. 0.35 added.
SXM13182 = Plate 59+. 0.35 added.
RMN13079 = Plate 59+. 0.35 added.
RMN13084 = Plate 59+. 0.35 added.
RMN13106 = Plate 59+. 0.35 added.
RMN13119 = Plate 59+. 0.35 added.
RMN13138 = Plate 59+. 0.35 added.
RMN13140 = Plate 59+. 0.35 added.
BLD13007 = Plate 59+. 0.35 added.
BLD13008 = Plate 59+. 0.35 added.
BLD13021 = Plate 59+. 0.35 added.
BLD13037 = Plate 59+. 0.35 added.
BLD13040 = Plate 59+. 0.35 added.
BLD13049 = Plate 59+. 0.35 added.
BLD13065 = Plate 59+. 0.35 added.
BLD13077 = Plate 59+. 0.35 added.
RMN13242 = Plate 59+. 0.35 added.
LND14038 = Plate 59+. 0.35 added.
LND14131 = Plate 59+. 0.35 added.
These edits allow fragments to be binned more appropriately.
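
If a sample-to-plate table were available, the per-sample plate checks above could be done in one step. The sketch below assumes a hypothetical `plates` lookup (the database itself stores `all` in its Plate column). The plate numbers for the three real samples are those noted above, while the `TOY` rows and their plates are made up. The set of shifted plates is stated explicitly in `is_shifted` and is an assumption of this sketch.

```r
# Fragments for one marker; TOY* rows are made up for illustration
frags <- data.frame(
  Sample   = c("RSB050108", "LSN10T2W203", "CWH130017", "TOY000001", "TOY000002"),
  Fragment = c(361.51, 361.25, 361.79, 361.15, 360.76),
  stringsAsFactors = FALSE
)

# Hypothetical sample-to-plate lookup (not present in the database)
plates <- data.frame(
  Sample = c("RSB050108", "LSN10T2W203", "CWH130017", "TOY000001", "TOY000002"),
  Plate  = c(25, 56, 58, 12, 60),
  stringsAsFactors = FALSE
)

# Plates treated as shifted in this sketch: 25, 56, 58 and everything from 59 up
is_shifted <- function(plate) plate %in% c(25, 56, 58) | plate >= 59

# Attach the plate to each fragment, then add 0.35 only where the plate is shifted
frags$Plate <- plates$Plate[match(frags$Sample, plates$Sample)]
frags$Corrected <- frags$Fragment + ifelse(is_shifted(frags$Plate), 0.35, 0)
print(frags)
```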

There are two large fragments between 362.9 - 363.4:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 362, 
                ymax = 365)
  print(res$plt)

Identify the points:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 362.9 & Fragment <= 363.9)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 SXM101102   363.31 2015-03-07   all
2 Sssp2201  SXM13120   363.00 2015-03-07   all

SXM13120 = Is from plates 59+. 0.35 added.
To allow the algorithm to split the above two samples from those below, a small value of 0.2 will be added manually.

There are problem samples ~ 365.3 and 365.7:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 362, 
                ymax = 368)
  print(res$plt)

Identify the points:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 365.3 & Fragment <= 365.75)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 ART050108   365.73 2015-03-07   all
2 Sssp2201 KGR100502   365.36 2015-03-07   all

ART050108 = Peak is 365.9. Edited.
KGR100502 = Peak is legit.

There is a binning problem ~ 382 - 383.5:

  res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 381, 
                ymax = 390)
  print(res$plt)

Identify samples below 382.9:

  DB_orig %>%
    filter(Marker == "Sssp2201") %>%
    filter(Fragment >= 381.6 & Fragment <= 382.9)

    Marker    Sample Fragment       Date Plate
1 Sssp2201 SKW050310   382.80 2015-03-07   all
2 Sssp2201 BRD052210   382.13 2015-03-07   all
3 Sssp2201 SXM101809   382.36 2015-03-07   all
4 Sssp2201 BCF100109   382.41 2015-03-07   all
5 Sssp2201 BCR100101   382.41 2015-03-07   all
6 Sssp2201  SXM13024   381.93 2015-03-07   all
7 Sssp2201  LND14036   381.78 2015-03-07   all
8 Sssp2201  LND14100   381.86 2015-03-07   all

SKW050310 = Plate 25. 0.35 added.
SXM13024 = Plates 59+. 0.35 added.
LND14036 = Plates 59+. 0.35 added.
LND14100 = Plates 59+. 0.35 added.

The final binning rules for this locus are:
- Below 220: 0.8
- Below 293: 0.3
- Below 312.5: 0.5
- Below 325: 0.55
- Below 347.5: 0.4
- Below 450: 0.35
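
The rules above amount to a piecewise lookup from fragment size to bin limit. A minimal sketch of that lookup is shown here. `limit_for` is a hypothetical helper, and how MsatAllele itself treats sizes that fall exactly on a boundary is not assumed; in this sketch a size equal to a boundary takes the next rule up.

```r
# Upper boundaries and limits copied from the rules above
rules <- data.frame(
  below = c(220, 293, 312.5, 325, 347.5, 450),
  limit = c(0.8, 0.3, 0.5, 0.55, 0.4, 0.35)
)

# Hypothetical lookup: the bin limit that applies to a given fragment size.
# Sizes at or above the last boundary return NA.
limit_for <- function(size, rules) {
  idx <- findInterval(size, rules$below) + 1
  rules$limit[idx]
}

limits <- limit_for(c(200, 221.1, 300, 331.9, 351.1), rules)
print(limits)
```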

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Sssp2201", 3, 
                  limit = list(c(220, 0.8), c(293, 0.3),
                               c(312.5, 0.5), c(347.5, 0.4),
                               c(450, 0.35)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
SXM100949	72374	221.1	all
BYN100109	73897	221.1	all
BYN100101	73887	221.1	all
RBW102906	73311	260.2	all
RBW054102	66358	260.3	all
SXM101003	72409	274.1	all
SHK100603	73847	278.6	all
CLT102508	74217	282.5	all
BLD100802	71826	313	all
BLD100602	71811	313	all
BRD052107	69164	324.8	all
CMV100103	71466	324.8	all
HUNST1303	75388	324.9	all
KEL051709	68700	331.9	all
CGN100201	73319	332.2	all
RMN13068	75796	351.1	all
KEL040914	69592	351.1	all
LSN101405	73627	351.4	all
SXM101102	72415	363.5	all
SXM13120	75626	363.6	all
SJH100156	72738	375.4	all
SXM13108	75604	387.3	all
BLD101301	71936	398.1	all
BLD102801	72024	398.4	all

RBW040506 = Locus failed. GT deleted.
SXM100949 = Peak is legit.
BYN100109 = Peak is legit.
BYN100101 = Peak is legit.
RBW102906 = Peak is legit.
RBW054102 = Peak is legit.
SXM101003 = Peak is legit.
SHK100603 = Peak is legit.
CLT102508 = Peak is legit.
BLD100802 = Peak is legit.
BLD100602 = Peak is legit.
BRD052107 = Peak is legit.
CMV100103 = Peak is legit.
HUNST1303 = Peak is legit.
KEL051709 = Peak is legit.
CGN100201 = Peak is legit.
RMN13068 = Peak is legit.
KEL040914 = Peak is legit.
LSN101405 = Peak is legit.
SXM101102 = Peak is legit.
SXM13120 = Peak is legit.
SJH100156 = Peak is legit.
SXM13108 = Peak is from plate 59+. 0.35 added. (fixed)
BLD101301 = Peak is legit.
BLD102801 = Peak is legit.
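
For readers without the custom package, the idea behind the low-frequency check can be sketched in base R. This is not the `getLowFreq` implementation. It assumes a simple gap-based rule (a new bin starts wherever consecutive sorted fragments differ by more than the limit) and a strict "fewer than n" comparison, both of which are assumptions. `gap_bins` and `low_freq` are hypothetical names, and the `TOY` rows are made up.

```r
# Hypothetical gap-based binning: sort the sizes and start a new bin
# whenever the step to the next size exceeds the limit.
gap_bins <- function(x, limit) {
  o <- order(x)
  bin_sorted <- cumsum(c(TRUE, diff(x[o]) > limit))
  bins <- integer(length(x))
  bins[o] <- bin_sorted
  bins
}

# Hypothetical helper: rows whose bin holds fewer than n distinct samples
low_freq <- function(db, limit, n = 3) {
  db$Bin <- gap_bins(db$Fragment, limit)
  counts <- tapply(db$Sample, db$Bin, function(s) length(unique(s)))
  rare_bins <- as.integer(names(counts)[counts < n])
  db[db$Bin %in% rare_bins, , drop = FALSE]
}

toy <- data.frame(
  Sample   = c("TOY000001", "TOY000002", "TOY000003", "TOY000004",
               "KEL051709", "CGN100201"),
  Fragment = c(330.00, 330.10, 330.15, 330.20, 331.95, 332.24),
  stringsAsFactors = FALSE
)

rare <- low_freq(toy, limit = 0.4, n = 3)
print(rare)
```

With a limit of 0.4 the two real fragments (331.95 and 332.24) fall into one bin of two individuals, so both are returned for manual checking.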

DONE!

CA048828

Calculate bin statistics for CA048828

dat <- BinStats(DB_orig, "CA048828", limit = 0.45)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
254	198	254.19	254.63	0.44	0.088	254.45	254.46
256	170	256.09	256.57	0.48	0.111	256.34	256.36
258	109	258.05	258.52	0.47	0.102	258.33	258.34
260	167	259.9	260.36	0.46	0.115	260.16	260.18
262	897	261.78	262.34	0.56	0.118	262.11	262.13
264	1477	263.65	264.27	0.62	0.099	264.03	264.04
266	937	265.58	266.33	0.75	0.106	265.95	265.97
268	1604	267.52	268.11	0.59	0.098	267.89	267.9
270	484	269.45	270.05	0.6	0.105	269.83	269.86
271	1	270.83	270.83	0	NA	270.83	270.83
272	420	271.41	271.99	0.58	0.11	271.77	271.79
274	520	273.35	273.94	0.59	0.118	273.7	273.72
275	84	274.45	274.98	0.53	0.111	274.75	274.76
276	255	275.35	275.91	0.56	0.127	275.65	275.66
278	423	277.2	277.85	0.65	0.124	277.59	277.6
279	243	278.29	278.86	0.57	0.115	278.64	278.65
280	343	279.15	279.74	0.59	0.108	279.55	279.55
281	514	280.28	281.69	1.41	0.443	281.07	281.26
283	106	282.36	283.59	1.23	0.346	283.2	283.34
285	340	284.96	285.52	0.56	0.109	285.32	285.33
286	17	286.36	286.5	0.14	0.04	286.42	286.42
287	504	286.54	287.46	0.92	0.11	287.24	287.25
289	244	288.82	289.32	0.5	0.101	289.15	289.17
290	18	290.1	290.32	0.22	0.052	290.22	290.22
291	93	290.83	291.29	0.46	0.106	291.1	291.11
293	517	292.66	293.21	0.55	0.088	293	293.02
295	215	294.63	295.15	0.52	0.1	294.93	294.92
297	110	296.54	297.02	0.48	0.104	296.84	296.85
299	93	298.52	298.96	0.44	0.095	298.78	298.77
301	111	300.37	300.88	0.51	0.118	300.67	300.7
303	16	302.47	302.72	0.25	0.074	302.61	302.62
304	1	304.14	304.14	0	NA	304.14	304.14
305	17	304.2	304.65	0.45	0.158	304.46	304.55
306	5	306.29	306.43	0.14	0.062	306.37	306.38
308	22	308.04	308.49	0.45	0.093	308.34	308.35
310	13	310.17	310.33	0.16	0.047	310.24	310.24
312	3	312.05	312.19	0.14	0.078	312.1	312.06
314	4	313.9	314.1	0.2	0.097	314.04	314.09
316	3	316.09	316.15	0.06	0.03	316.12	316.12
318	21	317.87	318.22	0.35	0.092	318.08	318.12
332	6	331.71	331.86	0.15	0.052	331.78	331.78

Generate cumulative plot for CA048828

res <- allCum(DB_orig, "CA048828", limit = 0.45)
print(res$plt)
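
As a rough picture of what a cumulative fragment plot shows, the sketch below sorts some made-up fragment sizes and plots size against rank in base graphics. It is not the `allCum` implementation, and the simulated values are purely illustrative.

```r
# Made-up fragment sizes forming two alleles about 2 bp apart
set.seed(1)
frags <- c(rnorm(50, mean = 262.1, sd = 0.1),
           rnorm(50, mean = 264.0, sd = 0.1))

# Sort the sizes and pair each with its rank; each plateau is one allele
cum <- data.frame(rank = seq_along(frags), size = sort(frags))
plot(cum$rank, cum$size,
     xlab = "Fragment rank", ylab = "Fragment size (bp)")
```

Plateaus correspond to groups of similar fragment sizes, and the vertical jumps between them are where bin boundaries would be expected.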

Identify problems

There is a single odd sample ~ 271bp:

  res <- allCum(DB_orig, "CA048828", limit = 0.45, ymin = 270,
                ymax = 275)
  print(res$plt)

Identify the point:

  DB_orig %>%
    filter(Marker == "CA048828") %>%
    filter(Fragment >= 270.6 & Fragment <= 271)

    Marker    Sample Fragment       Date Plate
1 CA048828 BLD101103   270.83 2015-03-07   all

BLD101103 = Peak is legit.

A bin limit of 0.45 does not accurately bin fragments between 280bp and 284bp:
```
  res <- allCum(DB_orig, "CA048828", limit = 0.45, ymin = 275,
                ymax = 285)
  print(res$plt)
```
- The algorithm is unable to differentiate the 1bp shift of these fragments. In each case, the variance of each group of fragments will be reduced around the mean of the group to allow them to be accurately binned.
- By reducing the variance of each bin and adding a small quantity to both of the large bins, the algorithm is able to differentiate these alleles. The code below (hidden) writes these manipulations to “Main_DB_new.txt”.
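
The hidden code is not reproduced here. One way such a manipulation could work is sketched below on made-up sizes: each group is pulled towards its own mean by a fixed factor and the upper group gets a small offset. `shrink`, the factor of 0.5 and the offset of 0.1 are all assumptions for illustration.

```r
# Made-up sizes for two alleles 1 bp apart whose tails nearly touch
lower <- c(280.30, 280.55, 280.80, 281.05)
upper <- c(281.30, 281.50, 281.70, 281.90)

# Hypothetical helper: pull values towards their group mean, then add an offset
shrink <- function(x, factor = 0.5, offset = 0) {
  mean(x) + factor * (x - mean(x)) + offset
}

# Distance between the top of the lower group and the bottom of the upper group
gap_before <- min(upper) - max(lower)
gap_after  <- min(shrink(upper, offset = 0.1)) - max(shrink(lower))
print(c(before = gap_before, after = gap_after))
```

In this toy case the gap between the groups grows from below the 0.45 limit to above it, which is the condition a gap-based binning rule would need to separate the two alleles.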

Test that the changes are correct within the raw data

DB_new <- fastReadFrag("Main_DB_new.txt", as.character(Sys.Date()), "all")
res <- allCum(DB_new, "CA048828", limit = 0.35, ymin = 280, ymax = 285)
print(res$plt)

A binning limit of 0.35 is sufficient between 240 - 276.5bp. A limit of 0.45 is needed between 277 - 286bp. A limit of 0.35 is needed between 286.1-303bp. A limit of 0.5 is required from 303.1 - 340bp.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "CA048828", 3, limit = list(c(276.5, 0.35),
                                                      c(286, 0.45),
                                                      c(303, 0.35),
                                                      c(340, 0.5)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
BLD101103	83130	270.8	all
FDB040516	77910	312.1	all
FDB040514	77906	312.1	all
BLD100901	83106	312.2	all
CAW100110	83154	316.1	all
CLG040112	80946	316.1	all
SKW040203	81557	316.1	all

BLD101103 = Peak is legit.
FDB040516 = Peak is legit.
FDB040514 = Peak is legit.
BLD100901 = Peak is legit.
CAW100110 = Peak is legit.
CLG040112 = Peak is legit.
SKW040203 = Peak is legit.

DONE!

Cocl-lav-4

Calculate bin statistics for Cocl-lav-4

dat <- BinStats(DB_orig, "Cocl-lav-4")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
148	4	148.09	148.15	0.06	0.029	148.11	148.09
152	2066	151.92	152.53	0.61	0.118	152.29	152.3
154	1233	154.06	154.65	0.59	0.108	154.43	154.45
157	3679	156.21	156.81	0.6	0.112	156.56	156.58
159	1691	158.35	158.91	0.56	0.112	158.7	158.71
161	564	160.51	161.02	0.51	0.117	160.82	160.84
163	712	162.59	163.15	0.56	0.112	162.93	162.95
165	38	164.73	165.26	0.53	0.164	165.04	165.12
167	28	166.89	167.3	0.41	0.128	167.08	167.12
169	2	169.05	169.33	0.28	0.198	169.19	169.19
171	1	171.32	171.32	0	NA	171.32	171.32

Generate cumulative plot for Cocl-lav-4

res <- allCum(DB_orig, "Cocl-lav-4")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Cocl-lav-4", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
RMN13032	97180	169.1	all
RBW043701	88535	169.3	all
BRD041104	96518	171.3	all

BRD041104 = 171.32 peak legit.
RBW043701 = 169.3 peak legit.
RMN13032 = 169 peak legit.

DONE!

OneU9ASC

Calculate bin statistics for OneU9ASC

dat <- BinStats(DB_orig, "OneU9ASC")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
186	170	185.83	186.43	0.6	0.171	186.19	186.25
193	16	192.18	192.78	0.6	0.179	192.61	192.69
195	3	194.69	194.85	0.16	0.081	194.76	194.75
197	42	196.49	197.07	0.58	0.17	196.88	196.94
199	2484	198.54	199.37	0.83	0.146	198.94	198.97
201	2045	200.67	201.28	0.61	0.142	201.04	201.09
203	3377	202.75	203.45	0.7	0.148	203.14	203.18
205	963	204.87	205.51	0.64	0.138	205.25	205.28
207	43	207.04	207.59	0.55	0.134	207.34	207.36
209	40	209.06	209.62	0.56	0.156	209.4	209.39
212	151	211.19	211.83	0.64	0.161	211.54	211.59
214	813	213.22	213.91	0.69	0.15	213.64	213.64
216	2	215.81	215.81	0	0	215.81	215.81

Generate cumulative plot for OneU9ASC

res <- allCum(DB_orig, "OneU9ASC")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "OneU9ASC", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
KEL050512	100493	194.7	all
RBW040504	98217	194.8	all
KEL050819	100719	194.8	all
RFY041609	98960	215.8	all
FDB040110	99028	215.8	all

KEL050512 = 194.69 peak legit.
RBW040504 = 194.75 peak legit.
KEL050819 = 194.85 peak legit.
FDB040110 = Weak sample, but peak is legit.
RFY041609 = Weak sample, but peak is legit.

DONE!

SsaD157

Calculate bin statistics for SsaD157

dat <- BinStats(DB_orig, "SsaD157")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
239	106	238.73	239.3	0.57	0.15	239.12	239.18
243	50	242.88	243.44	0.56	0.158	243.13	243.13
248	1	247.6	247.6	0	NA	247.6	247.6
256	8	255.21	255.62	0.41	0.147	255.48	255.55
257	1	257.16	257.16	0	NA	257.16	257.16
260	78	259.23	259.75	0.52	0.149	259.51	259.55
262	5	260.9	261.56	0.66	0.278	261.38	261.51
264	322	263.19	263.81	0.62	0.133	263.54	263.55
266	30	265.21	265.78	0.57	0.181	265.56	265.63
268	531	267.27	267.86	0.59	0.136	267.6	267.62
270	360	269.25	269.88	0.63	0.169	269.57	269.58
272	588	271.32	271.95	0.63	0.149	271.69	271.7
274	552	273.32	273.96	0.64	0.175	273.69	273.74
276	474	275.39	276.04	0.65	0.157	275.77	275.8
278	737	277.39	278.05	0.66	0.167	277.79	277.83
280	573	279.19	280.09	0.9	0.139	279.91	279.91
282	372	281.47	282.14	0.67	0.195	281.85	281.92
284	800	283.55	284.21	0.66	0.157	283.92	283.94
286	144	285.53	286.41	0.88	0.175	285.88	285.89
288	933	287.57	288.26	0.69	0.15	287.96	287.98
290	111	289.58	290.2	0.62	0.145	289.98	290.02
292	1075	291.63	292.29	0.66	0.168	291.99	292
294	43	293.63	294.24	0.61	0.189	294.01	294.07
296	827	295.65	296.32	0.67	0.169	296.03	296.05
298	66	297.66	298.27	0.61	0.161	298.02	298.08
300	325	299.71	300.34	0.63	0.145	300.06	300.09
302	22	301.49	302.28	0.79	0.224	301.97	301.9
304	143	303.68	304.36	0.68	0.191	304.07	304.09
306	96	305.7	306.33	0.63	0.138	306.02	306.03
308	76	307.76	308.39	0.63	0.211	308.11	308.13
310	338	309.72	310.39	0.67	0.176	310.08	310.1
312	35	311.84	312.41	0.57	0.169	312.13	312.16
314	217	313.8	314.45	0.65	0.171	314.16	314.18
316	62	316	316.58	0.58	0.173	316.35	316.35
318	252	317.99	318.68	0.69	0.165	318.37	318.4
321	15	320.19	320.71	0.52	0.19	320.54	320.62
323	138	322.2	322.82	0.62	0.153	322.53	322.54
327	207	326.32	326.9	0.58	0.143	326.64	326.66
331	27	330.4	330.96	0.56	0.171	330.65	330.69
335	38	334.47	334.99	0.52	0.139	334.81	334.86
339	53	338.56	339.05	0.49	0.132	338.82	338.85
343	45	342.56	343.09	0.53	0.16	342.87	342.91
347	4	346.8	347.03	0.23	0.098	346.94	346.96
351	6	350.74	351.24	0.5	0.192	351.07	351.12
355	3	354.83	355.29	0.46	0.246	355.11	355.21
359	8	358.92	359.28	0.36	0.121	358.98	358.94
363	1	363.01	363.01	0	NA	363.01	363.01

Generate cumulative plot for SsaD157

res <- allCum(DB_orig, "SsaD157")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "SsaD157", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
BYN100202	116260	247.6	all
SXM13161	117924	257.2	all
UBN140028	116896	354.8	all
RBW040504	108372	355.2	all
BBN100507	115925	355.3	all
UBN140007	116857	363	all

BYN100202 = legit.
SXM13161 = Odd sample, genotype deleted in “Main_DB_new.txt”.
UBN140028 = legit.
RBW040504 = legit.
BBN100507 = legit.
UBN140007 = legit.

DONE!

Sssp2216

Calculate bin statistics for Sssp2216

dat <- BinStats(DB_orig, "Sssp2216")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
135	284	134.29	134.84	0.55	0.105	134.59	134.59
139	144	138.21	138.72	0.51	0.104	138.51	138.52
143	3584	142.14	142.87	0.73	0.102	142.5	142.51
147	2410	146.16	146.76	0.6	0.098	146.52	146.53
151	569	150.26	150.77	0.51	0.101	150.56	150.57
155	2058	154.25	155.52	1.27	0.159	154.61	154.6
159	1245	158.32	159.62	1.3	0.367	158.85	158.68
163	317	162.26	163.6	1.34	0.191	162.69	162.66
167	13	166.44	167.34	0.9	0.224	166.64	166.59

Generate cumulative plot for Sssp2216

res <- allCum(DB_orig, "Sssp2216")
print(res$plt)

Extract samples with a 1bp shift in the upper limits of the marker

The top four bins of this marker appear to have some samples that are 1bp larger than the assigned bins in the plot above. This can be seen more clearly in the zoomed plot below.

res <- allCum(DB_orig, "Sssp2216",ymin = 154, ymax = 168)
print(res$plt)

To confirm this pattern, a subset of samples from the third largest allele (shown below) will be checked. If the pattern is confirmed, the larger fragments for each of the four bins will have a suitable value added to allow the binning algorithm to differentiate the alleles.

res <- allCum(DB_orig, "Sssp2216",ymin = 158, ymax = 160)
print(res$plt)

Extract 5 samples from each potential bin

frags <- DB_orig$Fragment[DB_orig$Marker == "Sssp2216"]
# extract all large frags
lrg <- DB_orig[DB_orig$Marker == "Sssp2216",][(frags >= 159 & frags <= 160),]
sml <- DB_orig[DB_orig$Marker == "Sssp2216",][(frags >= 158.3 & frags <= 158.9),]
# find heterozygotes of each fragment
hets <- sml[sml$Sample %in% lrg$Sample,]
# return 5 random hets
hets[sample(1:nrow(hets), 5, replace = FALSE),]

         Marker    Sample Fragment       Date Plate
125875 Sssp2216 SXM102507   158.79 2015-01-29   all
121346 Sssp2216 CLG051510   158.67 2015-01-29   all
129225 Sssp2216  BLD13058   158.33 2015-01-29   all
125120 Sssp2216 BLD101602   158.81 2015-01-29   all
124606 Sssp2216 CLM100502   158.56 2015-01-29   all

The screen cap below demonstrates that these bins are likely to be two separate alleles, separated by 1bp. All of the larger fragments from each of the four largest bins of this locus will have 0.5 added to allow the algorithm to differentiate alleles.

[Screen capture: split_bins]

# read the latest database
DB_new <- fastReadFrag("Main_DB_new.txt", as.character(Sys.Date()), "all")
# make the corrections
mrkr <- which(DB_new$Marker == "Sssp2216")
frags <- DB_new[mrkr, "Fragment"]
frags <- sapply(frags, function(x){
  if(is.na(x)){
    return(x)
  } else if(x >= 155 & x <= 155.6){
    return(x + 0.5)
  } else if(x >= 159 & x <= 160){
    return(x + 0.5)
  } else if(x >= 163 & x <= 164){
    return(x + 0.5)
  } else if(x >= 167 & x <= 168){
    return(x + 0.5)
  } else {
    return(x)
  }
})
# replace the original values
DB_new[mrkr, "Fragment"] <- frags
# replot the third last bin to check the changes
print(allCum(DB_new, "Sssp2216", ymin = 154, ymax = 156)$plt)

# check all amended alleles
print(allCum(DB_new, "Sssp2216", ymin = 154, ymax = 170)$plt)

# write the new database
saveRDS(DB_new, "Main_DB_new.rds")
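
The same window-based correction can be written more compactly without an element-wise `if`/`else` chain. The sketch below uses made-up fragment sizes and the four windows from the code above; it is an alternative formulation for illustration, not the code that was run.

```r
# Made-up fragment sizes, including an NA and values outside every window
x <- c(154.60, 155.30, 158.70, 159.40, 163.20, 167.10, 166.60, NA)

# Windows taken from the correction above: lower and upper bound per bin
lo <- c(155, 159, 163, 167)
hi <- c(155.6, 160, 164, 168)

# TRUE where a value falls inside any window; NA values count as outside
in_window <- rowSums(outer(x, lo, ">=") & outer(x, hi, "<="), na.rm = TRUE) > 0

# Add 0.5 only to the values inside a window
x_new <- x
x_new[in_window] <- x_new[in_window] + 0.5
print(x_new)
```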

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Sssp2216", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
LND14152	129639	168.3	all

LND14152 = legit.
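
For readers unfamiliar with the low-frequency check, the idea can be sketched as follows. This is an illustration on invented toy data and makes no claim about how `getLowFreq` is implemented; rounding to the nearest base pair is used here only as a stand-in for the real binning step.

```r
# Toy data (invented): one reading per sample for a single locus
toy <- data.frame(
  Sample = c("A1", "A2", "A3", "A4", "A5", "A6"),
  Fragment = c(154.6, 154.7, 154.8, 158.7, 158.6, 168.3),
  stringsAsFactors = FALSE
)

# Stand-in for binning: round each fragment to the nearest whole bp
toy$Bin <- round(toy$Fragment)

# Count readings per bin and keep the bins seen fewer than three times
counts <- table(toy$Bin)
rare_bins <- as.numeric(names(counts)[counts < 3])

# Samples whose reading falls in a rare bin are the ones to inspect by eye
low_freq <- toy[toy$Bin %in% rare_bins, ]
print(low_freq)
```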

DONE!

Str2QUB

Calculate bin statistics for Str2QUB

dat <- BinStats(DB_orig, "Str2QUB", limit = 0.4)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
200	7	200.09	200.36	0.27	0.09	200.27	200.27
210	5	210.05	210.29	0.24	0.095	210.19	210.23
212	56	211.98	212.33	0.35	0.087	212.15	212.14
214	285	213.66	214.39	0.73	0.111	213.98	214
216	48	215.71	216.35	0.64	0.11	216.06	216.06
220	875	219.56	220.37	0.81	0.124	219.99	220
222	13	221.71	221.97	0.26	0.085	221.82	221.83
223	1	222.94	222.94	0	NA	222.94	222.94
224	38	223.8	224.19	0.39	0.085	223.98	223.99
226	1791	225.39	226.22	0.83	0.12	225.8	225.8
227	1	227.14	227.14	0	NA	227.14	227.14
228	275	227.59	228.55	0.96	0.112	227.89	227.89
232	534	231.37	232.15	0.78	0.107	231.81	231.82
236	377	235.43	236.11	0.68	0.112	235.76	235.76
238	22	237.33	237.77	0.44	0.116	237.58	237.56
240	357	239.32	240	0.68	0.106	239.67	239.65
242	1	241.71	241.71	0	NA	241.71	241.71
244	193	243.43	243.98	0.55	0.09	243.7	243.71
248	133	247.4	247.9	0.5	0.107	247.7	247.71
252	78	251.4	251.89	0.49	0.106	251.69	251.71
256	51	255.4	255.87	0.47	0.109	255.6	255.61
257	2	257.43	257.47	0.04	0.028	257.45	257.45
259	8	259.32	259.52	0.2	0.088	259.42	259.42
261	87	261.15	261.68	0.53	0.105	261.41	261.42
263	86	262.93	263.62	0.69	0.138	263.34	263.37
265	100	265.06	265.65	0.59	0.093	265.37	265.37
267	130	266.98	267.59	0.61	0.108	267.28	267.3
269	34	269	269.67	0.67	0.132	269.29	269.3
271	2513	270.53	271.62	1.09	0.144	271.09	271.1
273	15	273.07	273.32	0.25	0.083	273.21	273.21
275	94	274.72	275.46	0.74	0.142	275.17	275.17
279	1351	278.35	279.45	1.1	0.159	278.98	279
281	29	280.86	281.19	0.33	0.11	281.03	281.05
283	143	282.51	283.27	0.76	0.144	283.02	283.06
285	77	284.46	285.34	0.88	0.156	284.88	284.9
287	218	286.27	287.22	0.95	0.157	286.82	286.83
291	74	290.34	291.12	0.78	0.159	290.71	290.7
293	6	292.37	292.8	0.43	0.156	292.65	292.65
295	40	294.35	295.06	0.71	0.172	294.61	294.58
296	6	296.08	296.26	0.18	0.077	296.2	296.24
297	95	296.31	296.77	0.46	0.108	296.53	296.52
298	9	298.11	298.32	0.21	0.079	298.24	298.25
299	50	298.33	298.8	0.47	0.104	298.54	298.54
301	15	300.43	300.82	0.39	0.105	300.64	300.62
302	15	302.07	302.6	0.53	0.144	302.31	302.3
304	135	303.77	304.63	0.86	0.197	304.18	304.14
308	5	308.37	308.45	0.08	0.031	308.4	308.39
310	227	309.53	310.4	0.87	0.157	310.04	310.05
312	17	311.68	312.51	0.83	0.233	312.2	312.25
314	349	313.4	314.29	0.89	0.158	313.89	313.91
316	17	315.81	316.49	0.68	0.168	316.26	316.33
320	45	319.82	320.56	0.74	0.196	320.2	320.19
322	44	321.52	322.29	0.77	0.177	321.92	321.92
324	8	324.37	324.57	0.2	0.069	324.44	324.41
326	2	326.01	326.02	0.01	0.007	326.01	326.01
328	48	328.07	328.68	0.61	0.13	328.38	328.37
332	7	332.28	332.41	0.13	0.06	332.35	332.37
336	14	335.99	336.55	0.56	0.17	336.26	336.25
340	3	339.91	340.19	0.28	0.143	340.03	340
346	10	345.49	345.84	0.35	0.099	345.64	345.62
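
The columns in the table above are per-bin summaries of fragment size. As a rough illustration of what each column represents, the sketch below computes the same summaries for invented toy data in base R. It is not the `BinStats` implementation, and the rounded size is only an assumed stand-in for the bin label.

```r
# Toy fragment sizes (invented), with the rounded size standing in for the bin
frags <- c(200.1, 200.3, 200.2, 210.1, 210.3, 212.0)
bins <- round(frags)

# Per-bin summary; Sd is NA for a bin holding a single fragment
summarise_bin <- function(x) {
  c(N = length(x), Min = min(x), Max = max(x), Range = max(x) - min(x),
    Sd = sd(x), Mean = mean(x), Median = median(x))
}

# One row per bin, one column per statistic
stats <- t(sapply(split(frags, bins), summarise_bin))
print(round(stats, 3))
```

Bins with N = 1 have a range of 0 and an undefined standard deviation, which is why singleton bins show `NA` in the Sd column.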

Generate cumulative plot for Str2QUB

res <- allCum(DB_orig, "Str2QUB", limit = 0.4)
print(res$plt)
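
The idea behind the cumulative plot can be sketched in base R: fragments are sorted by size and plotted against their rank, so each allele appears as a plateau and the jumps between plateaus suggest bin boundaries. The example below uses invented toy data and is only an approximation of what `allCum` draws; the 0.4 gap threshold is chosen to mirror the `limit` used for this locus and is an assumption about how that parameter is interpreted.

```r
# Toy fragment sizes (invented): three alleles, deliberately unsorted
frags <- rev(c(214 + seq(-0.2, 0.2, length.out = 5),
               220 + seq(-0.2, 0.2, length.out = 5),
               226 + seq(-0.2, 0.2, length.out = 5)))

# Cumulative view: rank on the x axis, fragment size on the y axis
sorted <- sort(frags)
plot(seq_along(sorted), sorted,
     xlab = "Cumulative count", ylab = "Fragment size (bp)")

# Jumps larger than the threshold between neighbouring sizes mark candidate
# bin boundaries; `breaks` holds the rank of the last fragment in each bin
gaps <- diff(sorted)
breaks <- which(gaps > 0.4)
print(breaks)
```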

Finding problem samples (check bins, 30bp at a time)

There is a sole peak @ ~ 228.55bp as can be seen below:

  res <- allCum(DB_orig, "Str2QUB", ymin = 227.5, ymax = 228.9, limit = 0.4)
  print(res$plt)

Identify the point:

  DB_orig[DB_orig$Marker == "Str2QUB",][DB_orig[DB_orig$Marker == "Str2QUB" ,"Fragment"] >= 228.4 & DB_orig[DB_orig$Marker == "Str2QUB","Fragment"] <= 228.8,]

        Marker    Sample Fragment       Date Plate
130932 Str2QUB BGW050301   228.55 2015-03-07   all

BGW050301 = Size standard was off. Peak actually occurred at 227.9bp. Corrected in “Main_DB_new.txt”.

There is a sole peak @ ~ 241.75bp as can be seen below:

  res <- allCum(DB_orig, "Str2QUB", ymin = 239, ymax = 244, limit = 0.4)
  print(res$plt)

Identify the point:

  DB_orig[DB_orig$Marker == "Str2QUB",][DB_orig[DB_orig$Marker == "Str2QUB" ,"Fragment"] >= 241 & DB_orig[DB_orig$Marker == "Str2QUB","Fragment"] <= 242,]

        Marker    Sample Fragment       Date Plate
138806 Str2QUB UBN140050   241.71 2015-03-07   all

UBN140050 = Peak is legit.

Some bins > 295bp are not consistent (see below):

  res <- allCum(DB_orig, "Str2QUB", ymin = 295, ymax = 300, limit = 0.4)
  print(res$plt)

Test a larger binning limit:

  res <- allCum(DB_orig, "Str2QUB", ymin = 295, ymax = 300,
                limit = list(c(295.5, 0.4), c(299.5, 0.7), c(350, 0.4)))
  print(res$plt)

Binning limits between 295.5 and 299.5 should be set to 0.7.
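
To make the piecewise limit specification concrete, the sketch below looks up which limit would apply to a given fragment size. It assumes that each pair in the list is `c(upper_bound, limit)`, which is consistent with the note above but is not confirmed against the MsatAllele source; `limit_for` is a hypothetical helper written for illustration only.

```r
# Hypothetical helper: return the bin limit that applies to each fragment
# size, assuming each element of `limits` is c(upper_bound, limit) and the
# upper bounds are in increasing order.
limit_for <- function(frag, limits) {
  upper <- vapply(limits, function(p) p[1], numeric(1))
  lim <- vapply(limits, function(p) p[2], numeric(1))
  # Index of the first region whose upper bound is >= frag
  idx <- findInterval(frag, upper, left.open = TRUE) + 1L
  lim[idx]  # NA for sizes above the last upper bound
}

# The specification used for Str2QUB above
limits <- list(c(295.5, 0.4), c(299.5, 0.7), c(350, 0.4))

applied <- limit_for(c(290, 296.5, 298.3, 304.2), limits)
print(applied)
```

Under this reading, only fragments between 295.5 and 299.5 bp are binned with the wider 0.7 limit, and everything else keeps the 0.4 limit.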

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Str2QUB", 3, 
                  limit = list(c(295.5, 0.4), c(299.5, 0.7), c(350, 0.4)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
RBW040504	129922	222.9	all
PRB040329	135105	227.1	all
UBN140050	138807	241.7	all
BKB040119	138387	257.4	all
SHK100501	137899	257.5	all
LND14123	140862	326	all
KLM100703	137592	326	all
CLG050910	131941	339.9	all
BRD050807	131187	340	all
CLG040707	134170	340.2	all

RBW040504 = Peak is legit.
UBN140050 = Peak is legit.
BKB040119 = Peak is legit.
SHK100501 = Peak is legit.
DGB050220 = Peak is legit.
LND14123 = Peak is legit.
KLM100703 = Peak is legit.
CLG050910 = Peak is legit.
BRD050807 = Peak is legit.
CLG040707 = Peak is 340.3, manually edited in “Main_DB_new.txt”.

DONE!

Str3QUB

Calculate bin statistics for Str3QUB

dat <- BinStats(DB_orig, "Str3QUB")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
126	2	125.53	125.56	0.03	0.021	125.55	125.55
129	1866	128.81	129.61	0.8	0.188	129.27	129.29
133	3	132.79	133.37	0.58	0.319	133.16	133.31
134	1	134.1	134.1	0	NA	134.1	134.1
138	1	138.14	138.14	0	NA	138.14	138.14
141	653	140.48	141.23	0.75	0.173	140.94	140.96
145	4	144.75	145.14	0.39	0.204	144.95	144.97
153	7	152.71	153.12	0.41	0.142	152.94	152.95
157	2131	156.42	157.2	0.78	0.164	156.9	156.93
169	4894	168.3	169.03	0.73	0.15	168.75	168.77
173	2	172.69	172.7	0.01	0.007	172.69	172.69
181	29	180.2	180.74	0.54	0.148	180.56	180.57
189	1	188.71	188.71	0	NA	188.71	188.71

Generate cumulative plot for Str3QUB

res <- allCum(DB_orig, "Str3QUB")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Str3QUB", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
KEL050512	143237	125.5	all
RBW040504	141145	125.6	all
BLD13072	150155	132.8	all
GGR100909	148066	133.3	all
GGR100905	148060	133.4	all
KEL050730	143409	134.1	all
BRD050112	142101	138.1	all
BRD050421	142155	172.7	all
BRD050415	142146	172.7	all
CWT100104	148320	188.7	all

KEL050512 = Peak is 125.4, manually edited in “Main_DB_new.txt”.
RBW040504 = Peak is 125.5, manually edited in “Main_DB_new.txt”.
BLD13072 = Peak is legit.
GGR100909 = Peak is legit.
GGR100905 = Peak is legit.
KEL050730 = Peak was an artifact. GT changed to 156.9/156.9.
BRD050112 = Peak was an artifact. GT changed to 156.9/156.9.
BRD050421 = Peak is legit.
BRD050415 = Peak is legit.
CWT100104 = Odd sample. GT deleted for Str3QUB.

DONE!

Ssa420UoS

Calculate bin statistics for Ssa420UoS

dat <- BinStats(DB_orig, "Ssa420UoS", limit = 1.0)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
181	1	180.57	180.57	0	NA	180.57	180.57
184	5	184.13	184.56	0.43	0.172	184.43	184.47
189	5	188.36	188.74	0.38	0.155	188.63	188.7
193	257	192.5	193.03	0.53	0.124	192.79	192.8
197	364	196.67	197.18	0.51	0.116	196.96	196.98
201	838	200.86	201.37	0.51	0.124	201.15	201.18
205	58	205.03	205.5	0.47	0.146	205.29	205.32
209	294	209.12	209.67	0.55	0.129	209.45	209.48
214	698	213.3	213.83	0.53	0.128	213.6	213.63
218	1282	217.47	218.03	0.56	0.123	217.78	217.79
222	541	221.67	222.16	0.49	0.118	221.96	221.99
226	485	225.8	226.6	0.8	0.125	226.09	226.11
228	15	228.14	228.37	0.23	0.065	228.24	228.22
230	370	229.9	230.44	0.54	0.118	230.2	230.21
234	539	234.03	234.53	0.5	0.112	234.32	234.32
238	651	238.18	238.67	0.49	0.109	238.46	238.47
243	691	242.38	242.87	0.49	0.113	242.68	242.7
247	852	246.61	247.28	0.67	0.12	246.9	246.92
251	979	250.83	251.31	0.48	0.116	251.09	251.09
255	354	254.92	255.41	0.49	0.119	255.19	255.2
259	247	259.02	259.5	0.48	0.126	259.28	259.3
263	153	263.14	263.57	0.43	0.099	263.34	263.33
267	200	267.21	267.72	0.51	0.107	267.47	267.48
272	213	271.4	271.88	0.48	0.108	271.64	271.64
276	426	275.5	276.28	0.78	0.127	275.79	275.8
280	272	279.7	280.18	0.48	0.126	279.93	279.91
284	247	283.79	285.21	1.42	0.147	284.05	284.03
288	110	287.9	289.37	1.47	0.171	288.19	288.17
292	100	291.97	292.48	0.51	0.162	292.28	292.35
296	36	296.11	296.58	0.47	0.138	296.35	296.33
300	44	300.19	300.67	0.48	0.144	300.45	300.44
305	102	303.85	304.77	0.92	0.153	304.51	304.5
309	39	308.37	308.87	0.5	0.132	308.71	308.77
313	114	312.33	313.05	0.72	0.133	312.71	312.74
317	15	316.86	317.09	0.23	0.056	316.97	316.95
321	4	320.99	321.4	0.41	0.178	321.24	321.29
361	1	361.22	361.22	0	NA	361.22	361.22

Generate cumulative plot for Ssa420UoS

res <- allCum(DB_orig, "Ssa420UoS", limit = 1.0)
print(res$plt)

Problem samples

There appear to be two odd fragments between 284 and 290 bp:

  res <- allCum(DB_orig, "Ssa420UoS", ymin = 284, ymax = 290)
  print(res$plt)

Identify the two points:

  frg <- DB_orig[DB_orig$Marker == "Ssa420UoS", "Fragment"]
  DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=284.9&frg<=285.5, ]

          Marker    Sample Fragment       Date Plate
158758 Ssa420UoS LSN100801   285.21 2015-03-07   all

  DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=289&frg<=289.8, ]

          Marker    Sample Fragment       Date Plate
157123 Ssa420UoS BLD101302   289.37 2015-03-07   all

LSN100801 = Peak is legit.
BLD101302 = Peak is legit.
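
The range queries in this section repeat the same subsetting pattern. A small helper can make them shorter and also avoids the all-`NA` rows that logical subsetting returns when a fragment value is missing. This is a sketch on an invented toy database; `frag_between` is a hypothetical name and is not part of MsatAllele.

```r
# Hypothetical helper: rows of `db` for one marker whose Fragment lies in
# the closed interval [lo, hi]. which() drops comparisons that evaluate to NA.
frag_between <- function(db, marker, lo, hi) {
  keep <- which(db$Marker == marker & db$Fragment >= lo & db$Fragment <= hi)
  db[keep, ]
}

# Toy database (invented rows) using the same column names as DB_orig
toy_db <- data.frame(
  Marker = c("Ssa420UoS", "Ssa420UoS", "One104", "Ssa420UoS"),
  Sample = c("S1", "S2", "S3", "S4"),
  Fragment = c(284.05, 285.21, 285.10, NA),
  stringsAsFactors = FALSE
)

hit <- frag_between(toy_db, "Ssa420UoS", 284.9, 285.5)
print(hit)
```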

Another sample @ ~ 304bp may be an error.

  res <- allCum(DB_orig, "Ssa420UoS", ymin = 300, ymax = 306)
  print(res$plt)

Identify the point:

  DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=303.6&frg<=304.1, ]

          Marker    Sample Fragment       Date Plate
159747 Ssa420UoS CWH130014   303.85 2015-03-07   all

This fragment occurs very close to a One104 fragment (image below). On closer inspection, the Ssa420UoS fragment is actually 304.4. Manually edited in “Main_DB_new.txt”.

[Image: Ssa420UoS]

There is a single fragment @ ~ 361bp:

  res <- allCum(DB_orig, "Ssa420UoS", ymin = 320, ymax = 370)
  print(res$plt)

Identify the point:

  DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=350&frg<=370, ]

          Marker    Sample Fragment       Date Plate
150924 Ssa420UoS RBW041418   361.22 2015-03-07   all

Unclear whether this fragment is legitimate due to overlap with One104. The genotype for this individual at One104 and Ssa420UoS will be manually deleted in “Main_DB_new.txt”.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa420UoS", 3, limit = 0.8)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
LSN100801	158752	285.2	all
BLD101302	157117	289.4	all

BLD101302 = Confirmed above.
BRD050911 = Peak is actually from Str3QUB. Both loci checked and genotypes edited manually in “Main_DB_new.txt”.
LSN100801 = Confirmed above.

DONE!

One104

Experimentally use ‘dplyr’ package for manipulation (Just for fun)

library(dplyr)

Calculate bin statistics for One104

dat <- BinStats(DB_orig, "One104")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
296	741	295.33	296.15	0.82	0.167	295.86	295.89
302	432	301.38	302.17	0.79	0.196	301.86	301.9
304	358	303.48	304.56	1.08	0.174	303.94	303.99
306	5	305.88	306.63	0.75	0.31	306.1	306
308	1141	307.48	308.7	1.22	0.178	307.98	308.01
310	14	309.7	310.31	0.61	0.151	310.18	310.23
312	853	311.6	312.69	1.09	0.177	312.02	312.03
314	692	313.71	314.45	0.74	0.194	314.1	314.18
316	210	315.77	316.45	0.68	0.156	316.15	316.15
318	1076	317.9	318.64	0.74	0.164	318.32	318.36
320	10	320	320.61	0.61	0.179	320.37	320.41
322	2763	322	322.77	0.77	0.154	322.46	322.47
325	24	324.15	324.73	0.58	0.117	324.6	324.63
327	242	326.14	326.82	0.68	0.174	326.57	326.62
328	26	328.15	328.72	0.57	0.171	328.47	328.49
331	41	330.3	330.8	0.5	0.151	330.52	330.51
333	95	332.19	332.86	0.67	0.173	332.57	332.6
335	937	334.25	334.96	0.71	0.156	334.68	334.71
337	158	336.2	336.92	0.72	0.148	336.65	336.68
339	26	338.46	338.88	0.42	0.112	338.73	338.74
341	52	340.29	340.89	0.6	0.226	340.63	340.77
345	111	344.36	345.07	0.71	0.132	344.83	344.83
347	2	346.55	346.55	0	0	346.55	346.55
349	179	348.51	349.12	0.61	0.122	348.92	348.94
353	72	352.5	353.17	0.67	0.179	352.94	353.01
357	171	356.64	357.27	0.63	0.137	357.03	357.04
359	1	359.28	359.28	0	NA	359.28	359.28
361	87	360.66	361.34	0.68	0.165	361.07	361.07
365	2	365.2	365.24	0.04	0.028	365.22	365.22
369	1	368.79	368.79	0	NA	368.79	368.79
384	1	383.5	383.5	0	NA	383.5	383.5

Generate cumulative plot for One104

res <- allCum(DB_orig, "One104")
print(res$plt)

Look for problem fragments

There are some fragments at the top of the bin around 304bp that may be a problem:

  res <- allCum(DB_orig, "One104", ymin = 303, ymax = 306)
  print(res$plt)

Extract the samples to check them.

  DB_orig %>%
    filter(Marker == "One104") %>%
    filter(Fragment >= 304.25 & Fragment <= 304.6)

  Marker    Sample Fragment       Date Plate
1 One104 BRD050422   304.46 2015-03-07   all
2 One104 CLG051408   304.56 2015-03-07   all
3 One104 SXM100106   304.28 2015-03-07   all

BRD050422 = Fragment is an artifact. Deleted.
CLG051408 = Fragment belongs to Ssa420UoS. Fixed.
SXM100106 = Peak is legit.

There are some fragments between 305 and 307bp that may be a problem:

  res <- allCum(DB_orig, "One104", ymin = 305, ymax = 310)
  print(res$plt)

Extract the problem samples

  DB_orig %>%
    filter(Marker == "One104") %>%
    filter(Fragment >= 305 & Fragment <= 306.8)

  Marker    Sample Fragment       Date Plate
1 One104 CLM050920   306.00 2015-03-07   all
2 One104 CLM051211   306.12 2015-03-07   all
3 One104 CLM040204   305.89 2015-03-07   all
4 One104 CLM040305   305.88 2015-03-07   all
5 One104  RMN13239   306.63 2015-03-07   all

CLM050920 = Peak is 306.1. Fixed.
CLM051211 = Peak is 306.1. Fixed.
CLM040204 = Peak is 306.0. Fixed.
CLM040305 = Peak is 305.9. Fixed.
RMN13239 = Peak is 307.7. Fixed.

There are some fragments between 308.45 and 309bp that may be a problem:

  res <- allCum(DB_orig, "One104", ymin = 308, ymax = 309)
  print(res$plt)

Extract the problem samples

  DB_orig %>%
    filter(Marker == "One104") %>%
    filter(Fragment >= 308.45 & Fragment <= 309)

  Marker    Sample Fragment       Date Plate
1 One104 OOW040402   308.65 2015-03-07   all
2 One104 FDB040215   308.70 2015-03-07   all

OOW040402 = Peak is 308.2. Fixed.
FDB040215 = Peak is artifact. Deleted.

There is a potentially problematic fragment ~ 309.7

  res <- allCum(DB_orig, "One104", ymin = 308, ymax = 311)
  print(res$plt)

Extract the problem samples

  DB_orig %>%
    filter(Marker == "One104") %>%
    filter(Fragment >= 309.5 & Fragment <= 309.8)

  Marker   Sample Fragment       Date Plate
1 One104 MOY13008    309.7 2015-03-07   all

MOY13008 = Peak is 310.1. Fixed

There is a potentially problematic fragment ~ 312.7

  res <- allCum(DB_orig, "One104", ymin = 311, ymax = 314)
  print(res$plt)

Extract the problem samples

  DB_orig %>%
    filter(Marker == "One104") %>%
    filter(Fragment >= 312.6 & Fragment <= 312.8)

  Marker    Sample Fragment       Date Plate
1 One104 RBW043503   312.69 2015-03-07   all

RBW043503 = Peak is an artifact. Corrected.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "One104", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
UBN140022	170653	346.6	all
UBN140037	170682	346.6	all
CLG052105	164477	365.2	all
GGR100507	169915	365.2	all
SXM13082	171449	369.1	all

UBN140022 = Peak is legit.
UBN140037 = Peak is legit.
RBW104006 = Weak sample. GT deleted.
CLG052105 = Peak is legit.
GGR100507 = Peak is legit.
SXM13082 = Peak is 369.1. Fixed.
KEL050512 = Genotype is not clear. Deleted.

DONE!

Ssa197

Calculate bin statistics for Ssa197

dat <- BinStats(DB_orig, "Ssa197")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
122	2	121.63	121.81	0.18	0.127	121.72	121.72
130	910	129.79	130.3	0.51	0.114	130.08	130.11
134	4823	133.77	134.42	0.65	0.109	134.12	134.14
138	1810	137.81	138.43	0.62	0.1	138.16	138.17
142	1246	141.93	142.46	0.53	0.104	142.25	142.28
146	195	146.13	146.61	0.48	0.091	146.41	146.43
150	233	150.12	150.66	0.54	0.092	150.36	150.37
155	348	154.22	154.81	0.59	0.11	154.54	154.54
159	179	158.41	158.97	0.56	0.133	158.67	158.67
163	66	162.54	163.12	0.58	0.159	162.87	162.93
167	21	166.61	166.97	0.36	0.11	166.86	166.91
171	49	170.74	171.23	0.49	0.117	170.99	170.96

Generate cumulative plot for Ssa197

res <- allCum(DB_orig, "Ssa197")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa197", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
MOY13012	182226	121.6	all
RMN040503	176122	121.8	all

MOY13012 = Peak is legit.
RMN040503 = Peak is legit.

Oki-10

Calculate bin statistics for Oki-10

dat <- BinStats(DB_orig, "Oki-10")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
214	24	213.71	214	0.29	0.063	213.89	213.91
218	74	217.87	218.34	0.47	0.087	218.08	218.09
222	78	222.06	222.33	0.27	0.059	222.22	222.23
226	288	226.16	226.48	0.32	0.064	226.34	226.34
230	804	230.22	230.62	0.4	0.068	230.45	230.45
235	730	234.32	234.73	0.41	0.071	234.55	234.56
239	341	238.48	238.85	0.37	0.069	238.67	238.67
243	319	242.69	243.61	0.92	0.076	242.87	242.87
247	322	246.87	247.21	0.34	0.065	247.08	247.09
249	1	248.88	248.88	0	NA	248.88	248.88
251	448	251.04	251.53	0.49	0.065	251.26	251.27
255	207	255.15	255.46	0.31	0.066	255.34	255.34
259	321	259.22	259.56	0.34	0.065	259.39	259.38
264	266	263.34	263.65	0.31	0.067	263.5	263.5
268	205	267.44	267.73	0.29	0.067	267.61	267.62
272	365	271.53	272.11	0.58	0.08	271.77	271.77
276	563	275.69	276.2	0.51	0.075	275.91	275.92
280	356	279.9	280.18	0.28	0.079	280.05	280.09
284	299	283.95	284.29	0.34	0.081	284.16	284.17
288	378	288.06	288.47	0.41	0.081	288.26	288.28
292	1039	292.14	292.58	0.44	0.081	292.36	292.36
296	680	296.2	296.73	0.53	0.086	296.44	296.45
301	711	300.28	300.89	0.61	0.084	300.51	300.53
305	617	304.33	304.79	0.46	0.087	304.58	304.59
309	574	308.44	308.89	0.45	0.086	308.68	308.69
313	471	312.54	312.99	0.45	0.093	312.78	312.81
317	232	316.79	317.22	0.43	0.096	317.02	317.04
321	492	321.06	321.44	0.38	0.09	321.26	321.27
325	344	325.2	325.64	0.44	0.096	325.42	325.43
329	40	329.33	329.66	0.33	0.089	329.48	329.46
334	57	333.53	333.81	0.28	0.073	333.71	333.72
338	29	337.57	337.95	0.38	0.098	337.79	337.79
342	11	341.75	342.02	0.27	0.107	341.86	341.81
346	2	345.86	346.09	0.23	0.163	345.98	345.98
353	9	352.92	353.28	0.36	0.113	353.13	353.16

Generate cumulative plot for Oki-10

res <- allCum(DB_orig, "Oki-10")
print(res$plt)

Look for problem fragments

There is a potentially problematic fragment ~ 243.6bp

  res <- allCum(DB_orig, "Oki-10", ymin = 242, ymax = 247)
  print(res$plt)

Identify the point:

  DB_orig %>%
    filter(Marker == "Oki-10") %>%
    filter(Fragment >= 243.5 & Fragment <= 243.8)

  Marker    Sample Fragment       Date Plate
1 Oki-10 BGW050301   243.61 2015-03-07   all

BGW050301 = Peak is 242.9. Fixed.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Oki-10", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
SXM13179	193118	345.9	all
CLD100913	189346	346.2	all

RBW040504 = Unclear genotype, deleted.
SXM13179 = Peak is legit.
CLD100913 = Peak is 346.2. Fixed.

DONE!

BG935488

Calculate bin statistics for BG935488

dat <- BinStats(DB_orig, "BG935488")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
105	74	104.81	105.55	0.74	0.19	105.21	105.22
114	3180	113.81	114.88	1.07	0.204	114.43	114.46
123	512	122.2	123.22	1.02	0.203	122.7	122.75
127	638	126.3	127.08	0.78	0.169	126.76	126.77
131	2391	130.27	131.3	1.03	0.189	130.85	130.87
135	1436	134.39	135.35	0.96	0.19	134.94	134.97
139	1469	138.61	139.57	0.96	0.184	139.06	139.07
143	891	142.76	143.54	0.78	0.175	143.24	143.25
148	198	146.97	147.74	0.77	0.185	147.48	147.53
152	54	151.26	151.89	0.63	0.167	151.65	151.67
156	46	155.48	156.15	0.67	0.147	155.87	155.87

Generate cumulative plot for BG935488

res <- allCum(DB_orig, "BG935488")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "BG935488", 3)

No valid samples

if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

DONE!

SsaD71

Calculate bin statistics for SsaD71

dat <- BinStats(DB_orig, "SsaD71", limit = 1.0)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
177	1	176.66	176.66	0	NA	176.66	176.66
184	1950	183.67	184.85	1.18	0.203	184.47	184.49
189	2112	187.69	189	1.31	0.211	188.59	188.61
193	803	192.03	193.05	1.02	0.21	192.71	192.74
197	929	196.06	197.27	1.21	0.221	196.87	196.89
201	102	200.2	201.26	1.06	0.249	200.89	200.92
205	398	204.22	205.31	1.09	0.203	205	205.03
209	613	208.28	209.4	1.12	0.228	209.04	209.09
213	1243	212.33	213.54	1.21	0.216	213.12	213.16
217	586	216.6	217.6	1	0.192	217.27	217.28
221	386	220.67	221.7	1.03	0.195	221.36	221.36
225	154	224.82	225.7	0.88	0.186	225.4	225.4
230	472	228.76	229.83	1.07	0.208	229.48	229.51
234	360	232.96	233.87	0.91	0.189	233.53	233.54
238	732	237.06	237.92	0.86	0.181	237.59	237.61
242	215	241.01	242.03	1.02	0.225	241.69	241.75
246	314	245.21	246.2	0.99	0.184	245.86	245.87
250	45	249.3	250.27	0.97	0.305	249.8	249.9
254	5	253.71	254.31	0.6	0.226	254.09	254.15
262	6	261.74	262.43	0.69	0.259	262.12	262.09
266	4	265.8	266.41	0.61	0.292	266.15	266.19
270	11	269.85	270.53	0.68	0.232	270.22	270.1

Generate cumulative plot for SsaD71

res <- allCum(DB_orig, "SsaD71", limit = 1.0)
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "SsaD71", 3, limit = 1.0)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
KEL040515	209317	176.7	all

DONE!

Sasa-TAP2A

Calculate bin statistics for Sasa-TAP2A

dat <- BinStats(DB_orig, "Sasa-TAP2A",  limit = 1.15)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
285	22	284.44	285.06	0.62	0.229	284.79	284.86
287	1085	286.23	287.15	0.92	0.184	286.84	286.86
301	1	300.97	300.97	0	NA	300.97	300.97
306	3	305.49	305.77	0.28	0.14	305.62	305.61
310	624	309.02	310.02	1	0.226	309.58	309.61
316	1768	315.18	316.33	1.15	0.234	315.93	315.97
318	24	317.57	318.32	0.75	0.224	318.08	318.15
321	1	320.09	320.09	0	NA	320.09	320.09
322	953	321.18	322.87	1.69	0.286	322.09	322.09
324	87	323.66	324.52	0.86	0.227	324.17	324.18
326	3530	325.58	326.97	1.39	0.246	326.41	326.45
329	463	327.85	329.07	1.22	0.237	328.58	328.63
330	982	329.7	331.13	1.43	0.291	330.48	330.48
332	2	332.32	332.32	0	0	332.32	332.32
337	1310	335.95	336.94	0.99	0.214	336.52	336.53

Generate cumulative plot for Sasa-TAP2A

res <- allCum(DB_orig, "Sasa-TAP2A", limit = 1.15)
print(res$plt)

There are some odd samples ~ 318bp

  res <- allCum(DB_orig, "Sasa-TAP2A", limit = 1.15, ymin = 315, 
                ymax = 325)
  print(res$plt)

Extract the problem samples

  # Small fragments
  DB_orig %>%
    filter(Marker == "Sasa-TAP2A") %>%
    filter(Fragment >= 317 & Fragment <= 317.7)

      Marker    Sample Fragment       Date Plate
1 Sasa-TAP2A UBN140071   317.57 2015-03-07   all
2 Sasa-TAP2A UBN140089   317.62 2015-03-07   all
3 Sasa-TAP2A UBN140091   317.62 2015-03-07   all
4 Sasa-TAP2A UBN140113   317.62 2015-03-07   all

  # Large fragments
  DB_orig %>%
    filter(Marker == "Sasa-TAP2A") %>%
    filter(Fragment >= 318 & Fragment <= 318.7)

       Marker    Sample Fragment       Date Plate
1  Sasa-TAP2A RBW040508   318.15 2015-03-07   all
2  Sasa-TAP2A RBW040610   318.18 2015-03-07   all
3  Sasa-TAP2A RBW040614   318.12 2015-03-07   all
4  Sasa-TAP2A RBW040703   318.12 2015-03-07   all
5  Sasa-TAP2A RBW041411   318.15 2015-03-07   all
6  Sasa-TAP2A RBW041507   318.09 2015-03-07   all
7  Sasa-TAP2A RBW042007   318.18 2015-03-07   all
8  Sasa-TAP2A RBW042118   318.09 2015-03-07   all
9  Sasa-TAP2A RBW050108   318.25 2015-03-07   all
10 Sasa-TAP2A RBW051403   318.18 2015-03-07   all
11 Sasa-TAP2A RBW051609   318.18 2015-03-07   all
12 Sasa-TAP2A RBW051704   318.06 2015-03-07   all
13 Sasa-TAP2A RBW052610   318.15 2015-03-07   all
14 Sasa-TAP2A RBW052705   318.12 2015-03-07   all
15 Sasa-TAP2A RBW053007   318.15 2015-03-07   all
16 Sasa-TAP2A RBW054306   318.15 2015-03-07   all
17 Sasa-TAP2A RFY050207   318.15 2015-03-07   all
18 Sasa-TAP2A SXM102501   318.32 2015-03-07   all
19 Sasa-TAP2A SXM102603   318.30 2015-03-07   all
20 Sasa-TAP2A BCC100123   318.30 2015-03-07   all

Take all small fragment samples and four large fragment samples to check if they are actually different alleles.

[Image: sasatap2a1]

There is a clear wobble between the samples compared. The two top samples in the image above correspond to the small fragments, while the bottom two correspond to the larger fragments. Given that the separation between the fragments is so small (average = 0.6), a more conservative approach should be taken: the fragments will remain binned as a single allele.
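
As a cross-check on the size of the separation, the difference between group means can be computed directly from the fragment sizes listed in the two extracts above. This is a simple arithmetic sketch using those printed values; it is not necessarily how the quoted average was obtained.

```r
# Fragment sizes copied from the "Small fragments" extract above
small <- c(317.57, 317.62, 317.62, 317.62)

# Fragment sizes copied from the "Large fragments" extract above
large <- c(318.15, 318.18, 318.12, 318.12, 318.15, 318.09, 318.18, 318.09,
           318.25, 318.18, 318.18, 318.06, 318.15, 318.12, 318.15, 318.15,
           318.15, 318.32, 318.30, 318.30)

# Difference between the two group means, in bp
sep <- mean(large) - mean(small)
print(round(sep, 2))
```

The difference comes out at roughly 0.56 bp, in line with the average of about 0.6 quoted above and well short of a full base pair.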

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Sasa-TAP2A", 3, limit = 1.15)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
DGB040232	221219	305.5	all
RMN050506	218705	305.6	all
SJH100122	223475	305.8	all
BRD052212	220233	332.3	all
BRD052216	220239	332.3	all

CLG041101 = Bad size. GT deleted.
DGB040232 = Peak is legit.
RMN050506 = Peak is legit.
SJH100122 = Peak is legit.
BRD051612 = Very weak peak. GT deleted.
BRD052212 = Peak is legit.
BRD052216 = Peak is legit.

DONE!

CA053293

Calculate bin statistics for CA053293

dat <- BinStats(DB_orig, "CA053293")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
140	1	140.19	140.19	0	NA	140.19	140.19
144	1	143.82	143.82	0	NA	143.82	143.82
148	183	147.65	148.12	0.47	0.093	147.88	147.9
152	1483	150.79	152.06	1.27	0.112	151.77	151.78
154	693	153.47	153.9	0.43	0.084	153.71	153.71
155	30	154.66	154.74	0.08	0.025	154.71	154.7
156	3520	154.75	156.67	1.92	0.112	155.75	155.76
157	7	156.68	156.71	0.03	0.013	156.69	156.7
158	2778	157.2	158.57	1.37	0.097	157.63	157.63
159	40	158.62	158.71	0.09	0.022	158.67	158.68
160	1282	158.74	160.47	1.73	0.187	159.53	159.54
161	311	160.55	161.68	1.13	0.404	161.21	161.43
162	25	162.37	162.57	0.2	0.053	162.49	162.49
163	44	162.48	163.5	1.02	0.242	163.31	163.38
164	21	164.33	164.47	0.14	0.047	164.41	164.41
166	1	166.31	166.31	0	NA	166.31	166.31

Generate cumulative plot for CA053293

res <- allCum(DB_orig, "CA053293")
print(res$plt)

Identify problem bins.

There is an odd fragment ~ 158.3:

  res <- allCum(DB_orig, "CA053293", limit = 0.4, ymin = 157, 
                ymax = 160)
  print(res$plt)

Identify the sample:

  DB_orig %>%
    filter(Marker == "CA053293") %>%
    filter(Fragment >= 158.2 & Fragment <= 158.45)

    Marker    Sample Fragment       Date Plate
1 CA053293 UBN140064   158.34 2015-03-07   all

UBN140064 = Peak is legit.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA053293", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
RBW044101	227982	140.2	all
UBN140057	235904	143.8	all
BRD050916	228855	166.3	all

RBW044101 = Peak is legit.
UBN140057 = Peak is legit.
BRD050916 = Peak is legit.

Ssa422UoS

Calculate bin statistics for Ssa422UoS

dat <- BinStats(DB_orig, "Ssa422UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
132	1	131.7	131.7	0	NA	131.7	131.7
157	1	156.93	156.93	0	NA	156.93	156.93
159	3	158.55	158.61	0.06	0.03	158.58	158.58
160	1210	159.09	160.74	1.65	0.306	160.38	160.47
161	17	161.1	161.41	0.31	0.094	161.31	161.34
162	48	161.47	161.75	0.28	0.076	161.63	161.65
163	3232	161.99	162.78	0.79	0.145	162.5	162.52
165	112	164.16	164.76	0.6	0.142	164.53	164.55
167	48	166.38	166.75	0.37	0.1	166.58	166.59
168	1	167.56	167.56	0	NA	167.56	167.56
169	3899	168.17	168.92	0.75	0.143	168.6	168.62
171	7	170.56	170.81	0.25	0.086	170.75	170.77
201	121	200.78	201.33	0.55	0.126	201.07	201.09
207	378	206.82	207.41	0.59	0.138	207.14	207.16
211	14	210.85	211.23	0.38	0.098	211.04	211.06
213	16	212.7	213.2	0.5	0.179	213.01	213.07
215	6	214.9	215.14	0.24	0.089	215.08	215.11
217	2	217.19	217.23	0.04	0.028	217.21	217.21
219	30	218.92	219.37	0.45	0.129	219.25	219.28
221	17	221	221.41	0.41	0.13	221.23	221.28
223	407	222.94	223.49	0.55	0.112	223.26	223.28
225	846	224.95	225.51	0.56	0.112	225.28	225.28
227	1	227.38	227.38	0	NA	227.38	227.38
229	8	229.08	229.47	0.39	0.138	229.37	229.44
243	1	243.44	243.44	0	NA	243.44	243.44
246	7	245.48	245.76	0.28	0.099	245.66	245.7

Generate cumulative plot for Ssa422UoS

res <- allCum(DB_orig, "Ssa422UoS")
print(res$plt)

Identify problems

Because of a 1bp shift at this locus, 0.35 is added to the fragment value of every sample run after 14/04/14. This markedly improves binning accuracy, as the before and after plots show.
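
A correction of this kind could be applied programmatically along the following lines. This is a sketch under assumptions: `RunDate` is a hypothetical column holding the date each sample was run (the Date column printed in this notebook appears to be the import date), and `shift_after` is an illustrative helper, not a MsatAllele function.

```r
# Hypothetical readings; RunDate is an assumed column, not part of the real database.
db <- data.frame(
  Marker   = c("Ssa422UoS", "Ssa422UoS", "Ssa422UoS", "CA060208"),
  Sample   = c("S1", "S2", "S3", "S4"),
  Fragment = c(162.50, 162.15, 168.25, 161.33),
  RunDate  = as.Date(c("2014-03-01", "2014-05-02", "2014-06-10", "2014-05-02")),
  stringsAsFactors = FALSE
)

# Add a fixed offset to one marker's fragments for samples run after a cutoff date.
shift_after <- function(db, marker, cutoff, offset) {
  hit <- db$Marker == marker & db$RunDate > cutoff
  db$Fragment[hit] <- db$Fragment[hit] + offset
  db
}

db_new <- shift_after(db, "Ssa422UoS", as.Date("2014-04-14"), 0.35)
print(db_new)
```

Only the two Ssa422UoS readings dated after the cutoff change; readings for other markers are left alone even when they were run on the same day.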

Before

  res <- allCum(DB_orig, "Ssa422UoS", limit = 0.55, ymin = 158, 
                ymax = 164)
  print(res$plt)

After

  res <- allCum(DB_new, "Ssa422UoS", limit = 0.55, ymin = 158, 
                ymax = 164)
  print(res$plt)

There is a potentially problematic sample ~ 168bp:

  res <- allCum(DB_new, "Ssa422UoS", limit = 0.55, ymin = 160, 
                ymax = 170)
  print(res$plt)

Identify the point:

  DB_new %>%
    filter(Marker == "Ssa422UoS") %>%
    filter(Fragment >= 167.6 & Fragment <= 168)

     Marker    Sample Fragment       Date Plate
1 Ssa422UoS UBN140064   167.91 2015-03-07   all

Fragment is legit.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa422UoS", 3, limit = 0.55)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
KEL040916	241736	156.9	all
BLD101205	243834	158.6	all
BLD100506	243715	158.6	all
BLD100509	243721	158.6	all
UBN140064	246334	167.9	all
RBW053002	238622	217.2	all
FDB040219	238914	217.2	all
RCK100102	245168	227.5	all
RMN13234	247885	243.8	all

CLM050109 = Peak is an artifact. Fixed.
KEL040916 = Peak is legit.
BLD101205 = Peak is legit.
BLD100506 = Peak is legit.
BLD100509 = Peak is legit.
UBN140064 = Peak is legit.
RBW053002 = Peak is legit.
FDB040219 = Peak is legit.
RCK100102 = Peak is 227.5. Edited.
RMN13234 = Peak is legit.
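
The low-frequency extraction step can be illustrated with a small base R sketch. It is not the getLowFreq implementation: `readings` and `low_freq` are hypothetical names. The tables in this notebook list bins that hold exactly three readings (for example the three BLD samples at 158.6 above), so the sketch keeps bins with at most `max_n` readings; whether getLowFreq counts this way internally is an assumption.

```r
# Toy readings with bin labels already assigned (hypothetical data).
readings <- data.frame(
  Sample  = paste0("A", 1:8),
  Bin     = c(157, 159, 159, 159, 160, 160, 160, 160),
  Reading = c(156.93, 158.55, 158.58, 158.61, 160.30, 160.40, 160.45, 160.50),
  stringsAsFactors = FALSE
)

# Keep every reading that falls in a bin with at most max_n readings.
low_freq <- function(df, max_n = 3) {
  n_per_bin <- table(df$Bin)
  rare <- names(n_per_bin)[n_per_bin <= max_n]
  out <- df[as.character(df$Bin) %in% rare, ]
  out[order(out$Reading), ]
}

rare_rows <- low_freq(readings, max_n = 3)
print(rare_rows)
```

Each sample returned this way still needs its peak checked by eye, as in the list above.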

DONE!

CA060208

Calculate bin statistics for CA060208

dat <- BinStats(DB_orig, "CA060208")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
156	646	155.15	155.73	0.58	0.12	155.49	155.5
158	4	157.26	157.58	0.32	0.155	157.49	157.56
160	1	160	160	0	NA	160	160
161	4361	160.82	161.6	0.78	0.118	161.33	161.33
165	3681	164.74	165.51	0.77	0.116	165.17	165.18
167	484	166.71	167.4	0.69	0.115	167.1	167.11
171	724	170.63	171.15	0.52	0.123	170.93	170.95
175	3	174.64	174.91	0.27	0.156	174.82	174.91

Generate cumulative plot for CA060208

res <- allCum(DB_orig, "CA060208")
print(res$plt)

Identify potential problem samples

There is an odd bin ~ 157bp:

  res <- allCum(DB_orig, "CA060208", ymin = 155, ymax = 160)
  print(res$plt)

Identify the points:

  DB_orig %>%
    filter(Marker == "CA060208") %>%
    filter(Fragment >= 157 & Fragment <= 158)

    Marker    Sample Fragment       Date Plate
1 CA060208 SXM100205   157.57 2015-03-07   all
2 CA060208 SXM100206   157.56 2015-03-07   all
3 CA060208 RBW104002   157.58 2015-03-07   all
4 CA060208  LND14048   157.26 2015-03-07   all

All peaks are legit.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA060208", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
UBN140064	256347	160	all
LND14083	258060	174.6	all
DGH100501	254127	174.9	all
BCC100128	254689	174.9	all

UBN140064 = Peak is legit.
LND14083 = Peak is legit.
BCC100128 = Peak is legit.
DGH100501 = Peak is legit.

DONE!

MHC-I-UTR

Calculate bin statistics for MHC-I-UTR

dat <- BinStats(DB_orig, "MHC-I-UTR")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
240	791	239.8	240.82	1.02	0.216	240.14	240.09
242	1	242.13	242.13	0	NA	242.13	242.13
245	436	244.62	245.62	1	0.13	245	245.01
247	318	246.41	246.99	0.58	0.128	246.71	246.71
249	105	248.19	248.95	0.76	0.18	248.55	248.55
251	460	250.29	251.45	1.16	0.296	251.08	251.2
252	2953	251.46	253.09	1.63	0.259	252.49	252.41
254	1914	254	254.9	0.9	0.178	254.49	254.49
256	293	255.76	256.27	0.51	0.125	256.07	256.09
258	148	257.88	258.69	0.81	0.16	258.25	258.28
262	27	261.8	262.23	0.43	0.134	262.05	262.1
264	700	263.62	264.29	0.67	0.123	264.03	264.05
274	159	273.42	274.03	0.61	0.12	273.79	273.79
278	2	277.8	277.83	0.03	0.021	277.81	277.81
282	18	281.42	281.76	0.34	0.084	281.6	281.62
428	972	426.85	428.87	2.02	0.231	428.03	428.04
430	389	429.32	430.35	1.03	0.228	429.95	429.97
432	207	431.34	432.32	0.98	0.222	431.91	431.92

Generate cumulative plot for MHC-I-UTR

res <- allCum(DB_orig, "MHC-I-UTR")
print(res$plt)

Identify problems

There is a sole point associated with a bin ~ 245bp:

  res <- allCum(DB_orig, "MHC-I-UTR", ymin = 244, ymax = 248)
  print(res$plt)

Identify the point:

  DB_orig %>%
    filter(Marker == "MHC-I-UTR") %>%
    filter(Fragment >= 245.5 & Fragment <= 246)

     Marker    Sample Fragment       Date Plate
1 MHC-I-UTR CLG051018   245.62 2015-03-07   all

CLG051018 = Peak is 246.6, fixed in “Main_DB_new.txt”.

There is a problem bin ~ 252bp:

  res <- allCum(DB_orig, "MHC-I-UTR", ymin = 250, ymax = 253.5)
  print(res$plt)

Identify the points between 251.6 and 251.9:

  DB_orig %>%
    filter(Marker == "MHC-I-UTR") %>%
    filter(Fragment >= 251.6 & Fragment <= 251.9)

     Marker    Sample Fragment       Date Plate
1 MHC-I-UTR RBW040413   251.72 2015-03-07   all
2 MHC-I-UTR BRD040722   251.83 2015-03-07   all
3 MHC-I-UTR BRD040908   251.71 2015-03-07   all
4 MHC-I-UTR KLW040505   251.64 2015-03-07   all
5 MHC-I-UTR SKW040122   251.70 2015-03-07   all

RBW040413 = Odd peak. Deleted.
BRD040722 = Peak is 250.3. Fixed.
BRD040908 = Peak is 252.6. Fixed.
KLW040505 = Peak is 252.6. Fixed.
SKW040122 = Peak is 252.7. Fixed.
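
Corrections like these could also be kept in a small table and applied in one step, which leaves a record of every change. The sketch below is an alternative to editing the database file by hand and is not how the corrections in this notebook were made. The helper `apply_corrections`, the column names `OldFragment` and `NewFragment`, the second allele for BRD040722 and the sample OTHER0001 are all hypothetical.

```r
# Readings flagged above, plus two hypothetical rows that must stay untouched.
db <- data.frame(
  Marker   = "MHC-I-UTR",
  Sample   = c("RBW040413", "BRD040722", "BRD040722", "BRD040908",
               "KLW040505", "SKW040122", "OTHER0001"),
  Fragment = c(251.72, 251.83, 264.05, 251.71, 251.64, 251.70, 252.41),
  stringsAsFactors = FALSE
)

# One row per correction; NewFragment = NA means the reading is deleted.
corrections <- data.frame(
  Marker      = "MHC-I-UTR",
  Sample      = c("RBW040413", "BRD040722", "BRD040908", "KLW040505", "SKW040122"),
  OldFragment = c(251.72, 251.83, 251.71, 251.64, 251.70),
  NewFragment = c(NA, 250.3, 252.6, 252.6, 252.7),
  stringsAsFactors = FALSE
)

apply_corrections <- function(db, corrections) {
  # Match on marker, sample and the old reading so other alleles are not altered.
  key_db  <- paste(db$Marker, db$Sample, sprintf("%.2f", db$Fragment))
  key_fix <- paste(corrections$Marker, corrections$Sample,
                   sprintf("%.2f", corrections$OldFragment))
  idx     <- match(key_db, key_fix)            # NA where no correction exists
  has_fix <- !is.na(idx)
  drop    <- has_fix & is.na(corrections$NewFragment[idx])
  fix     <- has_fix & !drop
  db$Fragment[fix] <- corrections$NewFragment[idx[fix]]
  db[!drop, ]
}

db_fixed <- apply_corrections(db, corrections)
print(db_fixed)
```

Matching on the old reading as well as the sample name matters because each sample can carry two alleles at a locus.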

For fragments below 251.7bp the binning limit should be set to 0.7; for fragments above this threshold it should be 0.8.

There are two outlier fragments associated with the bin ~ 428bp:

  res <- allCum(DB_orig, "MHC-I-UTR", ymin = 425, ymax = 430)
  print(res$plt)

Identify the two points:

  # small point
  DB_orig %>%
    filter(Marker == "MHC-I-UTR") %>%
    filter(Fragment >= 426.8 & Fragment <= 427)

     Marker    Sample Fragment       Date Plate
1 MHC-I-UTR KEL051207   426.85 2015-03-07   all

  # large point
  DB_orig %>%
    filter(Marker == "MHC-I-UTR") %>%
    filter(Fragment >= 428.8 & Fragment <= 429)

     Marker    Sample Fragment       Date Plate
1 MHC-I-UTR BRD040706   428.87 2015-03-07   all

KEL051207 = Fragment is an artifact. Fixed.
BRD040706 = Ambiguous allele peak. Discarded.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "MHC-I-UTR", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
BCC100137	264662	242.1	all
GVY100904	265081	277.8	all
GVY100905	265083	277.8	all

BCC100137 = Peak is legit.
GVY100904 = Peak is legit.
GVY100905 = Peak is legit.

DONE!

SsaD170

Calculate bin statistics for SsaD170

dat <- BinStats(DB_orig, "SsaD170")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
137	78	136.17	136.86	0.69	0.154	136.58	136.6
149	26	148.39	148.99	0.6	0.206	148.73	148.78
153	377	152.43	153.09	0.66	0.153	152.81	152.83
157	281	156.49	157.2	0.71	0.146	156.93	156.95
161	225	160.39	161.28	0.89	0.177	161.01	161.03
165	416	164.63	166.01	1.38	0.169	165.03	165.04
169	1288	168.56	170.02	1.46	0.254	169.1	169.08
170	53	170.06	170.21	0.15	0.05	170.13	170.12
173	1247	172.65	174.09	1.44	0.17	173.07	173.07
174	5	174.13	174.21	0.08	0.031	174.16	174.16
177	1386	176.69	178.16	1.47	0.167	177.09	177.09
178	2	178.19	178.19	0	0	178.19	178.19
181	1797	180.68	182.16	1.48	0.199	181.14	181.13
182	20	182.17	182.32	0.15	0.049	182.25	182.25
185	1380	184.76	186.22	1.46	0.187	185.2	185.2
186	9	186.25	186.35	0.1	0.032	186.29	186.28
189	1088	188.81	190.22	1.41	0.255	189.3	189.28
190	39	190.24	190.44	0.2	0.059	190.34	190.34
193	832	192.89	194.27	1.38	0.181	193.31	193.31
194	10	194.29	194.45	0.16	0.05	194.39	194.41
195	1	195.24	195.24	0	NA	195.24	195.24
197	515	196.94	198.22	1.28	0.173	197.36	197.38
198	17	198.25	198.53	0.28	0.076	198.42	198.45
201	216	201.05	201.63	0.58	0.153	201.38	201.4
202	59	201.98	202.53	0.55	0.124	202.33	202.34
205	23	205.01	205.55	0.54	0.159	205.29	205.34
206	53	205.56	206.58	1.02	0.27	206.3	206.41
209	26	209.09	210.11	1.02	0.203	209.41	209.43
210	2	210.45	210.56	0.11	0.078	210.5	210.5
213	4	213.12	213.56	0.44	0.193	213.39	213.44
218	41	217.15	217.73	0.58	0.179	217.48	217.52
222	1	221.67	221.67	0	NA	221.67	221.67

Generate cumulative plot for SsaD170

res <- allCum(DB_orig, "SsaD170")
print(res$plt)

Identify problems

There are a number of larger fragments associated with the bin ~ 165bp:

  res <- allCum(DB_orig, "SsaD170", ymin = 163, ymax = 167)
  print(res$plt)

Identify the points:

  DB_orig %>%
    filter(Marker == "SsaD170") %>%
    filter(Fragment >= 165.5 & Fragment <= 166.2)

   Marker    Sample Fragment       Date Plate
1 SsaD170 KEL051524   165.85 2015-03-07   all
2 SsaD170 CGN100701   166.01 2015-03-07   all
3 SsaD170  SXM13051   165.62 2015-03-07   all
4 SsaD170  SXM13131   165.75 2015-03-07   all

KEL051524 = Peak is legit.
CGN100701 = Peak is legit.
SXM13051 = Peak is legit.
SXM13131 = Peak is legit.

It appears that these fragments are separated from the main bin by 1bp. Setting the bin limit to 0.25 between 164bp and 167bp allows the algorithm to differentiate the alleles.

There seems to be another 1bp split in the bin ~ 169bp:

  res <- allCum(DB_orig, "SsaD170", ymin = 168, ymax = 173)
  print(res$plt)

Setting binning limits to 0.25 in this region allows the algorithm to differentiate alleles.

On closer inspection, most bins between 163bp and 200bp contain a group of fragments shifted by 1bp. Setting the binning limit to 0.25 accurately separates these alleles.
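
Assuming the cumulative plot orders fragments by size, a 1bp split shows up as a step between two runs of closely spaced readings. The sketch below locates the largest such step within a window. It is only an aid for reading the plot and says nothing about how MsatAllele assigns bins; `largest_gap` is a hypothetical helper, and apart from the four flagged readings (165.62, 165.75, 165.85, 166.01) the values are made up.

```r
# Four flagged SsaD170 readings plus hypothetical readings from the main bin.
x <- c(164.95, 165.00, 165.03, 165.04, 165.10, 165.62, 165.75, 165.85, 166.01)

# Find the widest gap between consecutive readings once they are sorted by size.
largest_gap <- function(x) {
  s    <- sort(x)
  gaps <- diff(s)
  i    <- which.max(gaps)
  list(lower = s[i], upper = s[i + 1], gap = gaps[i])
}

res_gap <- largest_gap(x)
print(res_gap)
```

The readings on either side of the reported gap are the ones to inspect when deciding whether a smaller bin limit is needed in that region.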

There is an odd fragment ~ 195.5bp:

  res <- allCum(DB_orig, "SsaD170", ymin = 192, ymax = 198)
  print(res$plt)

Identify the point:

  DB_orig %>%
    filter(Marker == "SsaD170") %>%
    filter(Fragment >= 195 & Fragment <= 196)

   Marker    Sample Fragment       Date Plate
1 SsaD170 RBW040504   195.24 2015-03-07   all

RBW040504 = Peak is legit.

Setting the binning limit to 0.25 does not accurately separate 1bp differences for alleles between 200bp and 208bp. Increasing the binning limit to 0.35 overcomes this issue.

Bin limit = 0.25

  res <- allCum(DB_orig, "SsaD170", limit = 0.25, ymin = 200, 
                ymax = 210)
  print(res$plt)

Bin limit = 0.35

  res <- allCum(DB_orig, "SsaD170", limit = 0.35, ymin = 200, 
                ymax = 210)
  print(res$plt)

The fragments ~ 210bp are not binned accurately with a bin limit of 0.35. Setting the bin limit to 0.45 for this region works. All fragments above this region should be binned with a bin limit of 0.8.

To summarise the binning pattern for this locus:

Fragments between 130bp - 162bp should have a bin limit of 0.8
Fragments between 163bp - 200bp should have a bin limit of 0.25
Fragments between 201bp - 208bp should have a bin limit of 0.35
Fragments between 209bp - 212bp should have a bin limit of 0.45
All larger fragments should have a bin limit of 0.8
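
The summary above maps onto a list of (upper bound, bin limit) pairs, which is the form passed to getLowFreq in the next step. Reading each pair as "use this limit up to and including this fragment size" is an inference from the summary, not a documented behaviour. Under that assumption, the following base R sketch looks up the limit that would apply to a given fragment; `limit_for` is a hypothetical helper.

```r
# (upper bound in bp, bin limit) pairs, copied from the getLowFreq call for SsaD170.
limits <- list(c(162, 0.8), c(200, 0.25), c(208, 0.35), c(212, 0.45), c(250, 0.8))

# Return the bin limit that applies to each fragment size.
limit_for <- function(fragment, limits) {
  tab <- do.call(rbind, limits)     # column 1: upper bound, column 2: limit
  # Intervals are open on the left, so a fragment equal to a bound uses that bound's limit.
  idx <- findInterval(fragment, tab[, 1], left.open = TRUE) + 1
  idx <- pmin(idx, nrow(tab))       # sizes above the last bound reuse the last limit
  tab[idx, 2]
}

lims <- limit_for(c(150, 165.03, 201.38, 210.5, 221.67), limits)
print(lims)
```

A lookup like this is a quick way to check that the thresholds in the list match the ranges written out in the summary before running the binning.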

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "SsaD170", 3, 
                  limit = list(c(162, 0.8), c(200, 0.25), c(208, 0.35),
                               c(212, 0.45), c(250, 0.8)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
RBW040504	268310	195.2	all
SXM13049	278228	210.1	all
IOM120253	275731	210.4	all
ALG100105	276215	210.6	all
BLD102202	274729	221.7	all

RBW040504 = Peak is legit.
SXM13049 = Peak is legit.
IOM120253 = Peak is legit.
ALG100105 = Peak is legit.
BLD102202 = Peak is legit.

DONE!

Ssa413UoS

Calculate bin statistics for Ssa413UoS

dat <- BinStats(DB_orig, "Ssa413UoS", limit = 0.9)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
224	3	223.15	224.03	0.88	0.443	223.56	223.5
226	3509	226	226.64	0.64	0.118	226.38	226.39
229	2693	228.77	229.72	0.95	0.125	229.44	229.45
232	2	232.42	232.48	0.06	0.042	232.45	232.45
236	553	235.3	235.77	0.47	0.106	235.58	235.59
242	6	241.52	241.81	0.29	0.118	241.64	241.59
245	731	244.52	245.06	0.54	0.131	244.82	244.83
248	22	247.63	248.12	0.49	0.157	247.81	247.74
251	150	250.75	251.25	0.5	0.132	251.03	251.01
254	7	253.83	254.19	0.36	0.133	254.09	254.15
257	2114	256.65	257.3	0.65	0.135	257.06	257.07
260	217	259.61	260.27	0.66	0.118	260.07	260.09
263	137	262.65	263.3	0.65	0.134	262.98	262.98
266	13	265.76	266.28	0.52	0.155	266.06	266.06
272	242	271.83	272.36	0.53	0.126	272.13	272.13
275	8	274.97	275.33	0.36	0.139	275.18	275.2

Generate cumulative plot for Ssa413UoS

res <- allCum(DB_orig, "Ssa413UoS", limit = 0.9)
print(res$plt)

Identify problems

There is an odd fragment ~ 228.8 bp:

  res <- allCum(DB_orig, "Ssa413UoS", limit = 0.9, ymin = 225, 
                ymax = 230)
  print(res$plt)

Identify the point:

  DB_orig %>% 
    filter(Marker == "Ssa413UoS") %>% 
    filter(Fragment >= 228 & Fragment <= 229)

     Marker    Sample Fragment       Date Plate
1 Ssa413UoS BLD101004   228.77 2015-03-07   all

BLD101004 = Peak is an artifact. Deleted.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa413UoS", 3, limit = 0.9)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
UBN140015	287974	223.2	all
RBW042903	280071	223.5	all

UBN140015 = Peak is legit.
RBW042903 = Peak is legit.
KEL040106 = Peak is artifact. Fixed.
ART050105 = Peak is artifact. Fixed.

DONE!

Ssa407UoS

Calculate bin statistics for Ssa407UoS

dat <- BinStats(DB_orig, "Ssa407UoS", limit = 0.75)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
203	2	203.24	203.48	0.24	0.17	203.36	203.36
208	28	207.41	208.11	0.7	0.191	207.84	207.87
209	9	209.09	209.62	0.53	0.199	209.37	209.43
212	3	212.11	212.23	0.12	0.062	212.18	212.2
214	8	213.45	213.64	0.19	0.068	213.54	213.54
218	82	217.4	217.94	0.54	0.14	217.72	217.72
220	2	220	220	0	0	220	220
222	930	221.55	222.16	0.61	0.133	221.88	221.89
224	25	223.76	224.32	0.56	0.154	224.03	224
226	220	225.68	226.26	0.58	0.129	226.01	226
228	3	227.94	228.32	0.38	0.206	228.18	228.27
230	246	229.79	230.39	0.6	0.151	230.13	230.15
232	12	231.98	232.47	0.49	0.209	232.27	232.38
234	386	233.96	234.49	0.53	0.145	234.29	234.32
236	71	236.09	236.59	0.5	0.143	236.38	236.39
238	746	238	238.62	0.62	0.134	238.4	238.43
240	12	240.3	240.73	0.43	0.149	240.51	240.46
243	986	242.28	242.99	0.71	0.136	242.6	242.6
247	364	246.54	247.12	0.58	0.145	246.82	246.83
249	29	248.72	249.21	0.49	0.141	249.03	249.07
251	699	250.75	251.27	0.52	0.125	251.04	251.05
253	23	252.9	253.35	0.45	0.129	253.16	253.17
255	901	254.82	255.37	0.55	0.125	255.13	255.11
257	60	256.99	257.47	0.48	0.124	257.23	257.22
259	383	258.94	259.42	0.48	0.132	259.21	259.21
261	117	261.05	261.51	0.46	0.121	261.29	261.28
263	549	262.78	263.59	0.81	0.137	263.3	263.31
265	89	265.13	265.64	0.51	0.121	265.43	265.43
267	688	267.13	267.77	0.64	0.136	267.45	267.48
270	241	269.26	269.87	0.61	0.14	269.59	269.61
272	803	271.24	271.83	0.59	0.139	271.59	271.61
274	89	273.39	273.98	0.59	0.164	273.76	273.84
276	765	275.41	275.98	0.57	0.138	275.73	275.75
278	11	277.61	278.07	0.46	0.194	277.86	277.96
280	482	279.51	280.17	0.66	0.143	279.88	279.91
282	25	281.74	282.2	0.46	0.135	281.97	282
284	464	283.45	284.26	0.81	0.141	284	284.01
286	3	285.82	285.88	0.06	0.032	285.84	285.83
288	149	287.78	288.34	0.56	0.15	288.08	288.11
292	49	291.89	292.43	0.54	0.139	292.19	292.18
294	1	294.49	294.49	0	NA	294.49	294.49
296	80	296.03	296.54	0.51	0.132	296.26	296.27
298	1	298	298	0	NA	298	298
300	12	300.19	300.61	0.42	0.191	300.38	300.27
304	21	304.14	304.75	0.61	0.158	304.39	304.38
308	10	308.32	308.75	0.43	0.154	308.51	308.48
313	22	312.38	312.95	0.57	0.137	312.66	312.64
317	4	316.64	317.09	0.45	0.206	316.79	316.72
321	3	320.88	321.06	0.18	0.092	320.98	321
325	6	325.15	325.59	0.44	0.182	325.34	325.31
329	2	329.29	329.3	0.01	0.007	329.3	329.3
330	14	329.22	329.71	0.49	0.166	329.52	329.54
334	7	333.29	333.71	0.42	0.141	333.53	333.56

Generate cumulative plot for Ssa407UoS

res <- allCum(DB_orig, "Ssa407UoS", limit = 0.75)
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa407UoS", 3, limit = 0.75)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
BRD042010	294535	203.2	all
RMN103104	296125	203.5	all
GGR100909	298096	212.1	all
GGR100905	298090	212.2	all
BLD101302	296139	212.2	all
RBW053904	290140	220	all
RBW051503	290709	220	all
RMN13001	299880	227.9	all
LCH100104	298396	228.3	all
LCH100105	298398	228.3	all
UBN140121	298945	285.8	all
UBN140125	298953	285.8	all
UBN140110	298925	285.9	all
BCC100119	296983	294.5	all
UBN140097	298900	298	all
RMN13191	300329	320.9	all
ART040205	294623	321	all
BRD040723	299005	321.1	all

BRD042010 = Peak is legit.
RMN103104 = Peak is legit.
GGR100909 = Peak is legit.
GGR100905 = Peak is legit.
BLD101302 = Peak is legit.
RBW051503 = Peak is legit.
RBW053904 = Peak is legit.
RMN13001 = Peak is legit.
LCH100104 = Peak is legit.
LCH100105 = Peak is legit.
UBN140121 = Peak is legit.
UBN140125 = Peak is legit.
UBN140110 = Peak is legit.
BCC100119 = Peak is legit.
UBN140097 = Peak is legit.
RMN13191 = Peak is legit.
ART040205 = Peak is legit.
BRD040723 = Peak is legit.
BRD100708 = Peak is legit.
LSN100401 = Peak is legit.

DONE!

SsaD48

Calculate bin statistics for SsaD48

dat <- BinStats(DB_orig, "SsaD48")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
193	1	192.87	192.87	0	NA	192.87	192.87
197	2	196.95	197.01	0.06	0.042	196.98	196.98
201	4	200.81	200.88	0.07	0.038	200.85	200.85
207	1	207.47	207.47	0	NA	207.47	207.47
240	8	239.52	239.71	0.19	0.087	239.62	239.62
287	1	287.37	287.37	0	NA	287.37	287.37
306	4	306.22	306.42	0.2	0.088	306.31	306.31
310	1	310.2	310.2	0	NA	310.2	310.2
318	1	318.48	318.48	0	NA	318.48	318.48
332	1	331.58	331.58	0	NA	331.58	331.58
335	18	334.98	335.78	0.8	0.268	335.42	335.46
339	1	338.85	338.85	0	NA	338.85	338.85
350	2	349.78	350.21	0.43	0.304	350	350
354	28	353.6	354.08	0.48	0.139	353.82	353.81
356	19	354.87	355.92	1.05	0.278	355.57	355.61
358	3	357.91	358.4	0.49	0.265	358.1	357.98
362	1	361.8	361.8	0	NA	361.8	361.8
363	12	362.29	362.89	0.6	0.205	362.58	362.57
367	7	366.63	367.08	0.45	0.166	366.85	366.9
370	17	369.94	370.13	0.19	0.049	370.06	370.08
371	147	370.17	371.59	1.42	0.267	370.61	370.62
372	1	371.99	371.99	0	NA	371.99	371.99
374	2	373.74	373.77	0.03	0.021	373.75	373.75
375	8	374.72	375.5	0.78	0.278	375.11	375.08
376	5	375.77	375.93	0.16	0.068	375.86	375.84
378	7	377.58	378.08	0.5	0.167	377.84	377.85
379	26	378.25	379.83	1.58	0.352	378.81	378.86
382	36	381.64	382	0.36	0.089	381.81	381.8
383	243	382.02	383.8	1.78	0.359	382.8	382.81
384	20	383.82	384.02	0.2	0.058	383.92	383.92
385	2	385.02	385.11	0.09	0.064	385.06	385.06
386	7	385.28	385.86	0.58	0.255	385.61	385.75
387	114	385.92	387.84	1.92	0.316	386.67	386.69
388	3	388.02	388.09	0.07	0.036	388.05	388.04
390	57	389.15	390.66	1.51	0.363	389.91	389.91
391	8	390.84	391.5	0.66	0.226	391.1	391.06
392	5	391.67	392.09	0.42	0.159	391.89	391.88
393	1	392.83	392.83	0	NA	392.83	392.83
394	125	393.51	394.67	1.16	0.248	394.34	394.43
395	54	394.7	395.58	0.88	0.278	394.98	394.88
396	26	395.68	396.36	0.68	0.182	395.95	395.94
398	7	397.41	398.01	0.6	0.198	397.81	397.89
399	39	398.11	399.1	0.99	0.28	398.7	398.71
400	69	399.12	400.36	1.24	0.227	399.72	399.74
402	2	401.38	401.5	0.12	0.085	401.44	401.44
403	112	402.11	403.54	1.43	0.326	403.04	403.07
404	85	403.56	404.23	0.67	0.188	403.84	403.77
406	9	405.11	406	0.89	0.342	405.74	405.94
407	126	406.08	407.71	1.63	0.404	407.19	407.3
408	15	407.72	408.22	0.5	0.122	407.81	407.8
409	26	408.53	409.38	0.85	0.295	409.11	409.24
410	15	409.68	410.01	0.33	0.114	409.88	409.93
411	246	410.05	411.8	1.75	0.439	411.09	411.12
412	3	412.23	412.52	0.29	0.145	412.38	412.38
414	133	413.23	414.47	1.24	0.267	414.13	414.19
415	249	414.51	415.83	1.32	0.393	415.15	415.23
416	25	415.88	416.95	1.07	0.303	416.23	416.17
418	30	417.54	417.96	0.42	0.098	417.86	417.9
419	418	418	419.66	1.66	0.428	418.87	418.82
420	93	419.73	420.98	1.25	0.344	420.09	419.91
421	2	421.11	421.18	0.07	0.049	421.14	421.14
422	7	421.49	421.95	0.46	0.178	421.77	421.84
423	209	421.98	423.46	1.48	0.418	422.86	422.79
424	243	423.47	425.01	1.54	0.417	423.99	423.91
425	6	425.12	425.29	0.17	0.071	425.18	425.15
426	35	425.38	426.11	0.73	0.199	425.91	425.99
427	260	426.13	427.51	1.38	0.368	426.77	426.72
428	142	427.52	428.94	1.42	0.348	427.98	427.96
429	2	429.19	429.19	0	0	429.19	429.19
430	43	429.3	430.06	0.76	0.248	429.77	429.88
431	425	430.08	431.56	1.48	0.305	430.7	430.72
432	236	431.59	433.05	1.46	0.297	432.16	432.1
433	4	433.12	433.31	0.19	0.086	433.19	433.15
434	15	433.42	433.96	0.54	0.16	433.82	433.88
435	348	434.01	435.53	1.52	0.335	434.67	434.68
436	230	435.54	436.98	1.44	0.395	436.23	436.25
437	6	437.02	437.29	0.27	0.107	437.16	437.16
438	118	437.59	438.62	1.03	0.252	438.28	438.36
439	61	438.63	439.33	0.7	0.176	438.82	438.78
440	246	439.38	440.8	1.42	0.366	440.21	440.26
441	45	440.81	441.44	0.63	0.172	440.98	440.92
442	293	441.54	442.92	1.38	0.275	442.41	442.42
443	31	442.93	443.49	0.56	0.205	443.26	443.32
444	147	443.5	444.94	1.44	0.354	444.27	444.25
445	15	444.98	445.16	0.18	0.06	445.03	445.01
446	4	445.84	445.85	0.01	0.006	445.85	445.85
447	150	446.03	447.41	1.38	0.26	446.61	446.59
448	149	447.48	448.76	1.28	0.332	448.25	448.33
449	36	448.77	449.31	0.54	0.151	448.97	448.95
451	77	449.98	451.13	1.15	0.229	450.61	450.63
452	64	451.42	452.2	0.78	0.219	451.89	451.96
453	194	452.21	453.73	1.52	0.328	452.81	452.83
454	6	453.81	453.95	0.14	0.066	453.9	453.93
455	119	454.02	455.12	1.1	0.239	454.59	454.63
456	29	455.51	455.98	0.47	0.122	455.82	455.83
457	165	455.99	457.54	1.55	0.334	456.71	456.71
459	227	457.81	459.41	1.6	0.229	458.62	458.6
460	111	459.5	461.07	1.57	0.385	460.29	460.35
461	12	461.15	461.45	0.3	0.088	461.24	461.21
462	3	461.82	461.95	0.13	0.067	461.89	461.91
463	128	462.08	463.37	1.29	0.277	462.66	462.69
464	141	463.49	465.11	1.62	0.346	464.15	464.18
465	5	465.18	465.34	0.16	0.071	465.24	465.2
466	5	465.6	465.92	0.32	0.124	465.75	465.76
467	119	466.04	467.36	1.32	0.284	466.63	466.64
468	180	467.37	469.12	1.75	0.372	468.15	468.21
469	6	469.2	469.43	0.23	0.087	469.3	469.32
470	14	469.94	470.22	0.28	0.094	470.07	470.04
471	33	470.28	471.05	0.77	0.24	470.7	470.72
472	177	471.14	473.07	1.93	0.419	472.18	472.16
473	11	473.1	473.39	0.29	0.095	473.28	473.27
474	12	473.87	474.11	0.24	0.067	473.99	473.99
475	116	474.18	475.43	1.25	0.296	474.66	474.6
476	86	475.45	477.01	1.56	0.351	476.14	476.13
477	5	477.21	477.29	0.08	0.031	477.24	477.23
478	4	477.39	477.93	0.54	0.259	477.69	477.73
479	111	477.98	479.27	1.29	0.291	478.57	478.61
480	146	479.4	480.65	1.25	0.317	480.1	480.1
481	80	480.72	481.94	1.22	0.256	481.1	481.09
482	5	482	482.05	0.05	0.018	482.02	482.02
483	47	482.24	483.13	0.89	0.248	482.74	482.75
484	227	483.16	484.95	1.79	0.357	484.09	484.1
485	34	485.02	485.47	0.45	0.101	485.21	485.21
486	1	485.96	485.96	0	NA	485.96	485.96
487	108	486.11	487.32	1.21	0.27	486.87	486.9
488	151	487.45	488.91	1.46	0.303	488.2	488.16
489	36	488.98	489.89	0.91	0.208	489.24	489.18
491	53	490.27	491.27	1	0.256	490.82	490.78
492	113	491.47	492.74	1.27	0.286	492.25	492.28
493	56	492.78	493.5	0.72	0.197	493.07	493.08
495	64	494.09	495.32	1.23	0.292	494.91	494.94
496	114	495.34	496.48	1.14	0.243	496.07	496.08
497	142	496.49	497.98	1.49	0.266	497.04	497.01
498	1	498.2	498.2	0	NA	498.2	498.2
499	23	498.28	499.27	0.99	0.315	498.85	498.94
500	100	499.34	500.84	1.5	0.371	500.13	500.1
501	17	500.92	501.66	0.74	0.205	501.16	501.13
502	1	501.9	501.9	0	NA	501.9	501.9
503	43	502.14	503.7	1.56	0.355	503.1	503.15
504	28	503.78	504.93	1.15	0.303	504.2	504.24
505	1	505.16	505.16	0	NA	505.16	505.16
506	8	505.55	506.35	0.8	0.291	506.02	506.12
507	12	506.4	507.32	0.92	0.256	507.05	507.1
508	32	507.78	509	1.22	0.263	508.29	508.23
511	49	510.36	511.53	1.17	0.292	511.09	511.07
512	52	511.58	512.87	1.29	0.282	512.16	512.12
513	4	513.02	513.25	0.23	0.103	513.1	513.06
515	42	514.1	515.35	1.25	0.395	514.92	515.14
516	96	515.52	517	1.48	0.29	516.29	516.26
517	3	517.24	517.62	0.38	0.209	517.38	517.28
519	91	518.34	519.8	1.46	0.357	519.2	519.21
520	17	520	520.58	0.58	0.173	520.3	520.35
521	20	520.63	521.43	0.8	0.204	520.88	520.83
522	1	521.91	521.91	0	NA	521.91	521.91
523	78	522.13	523.81	1.68	0.395	523.16	523.1
524	8	524.08	524.46	0.38	0.134	524.22	524.16
525	13	524.7	525.4	0.7	0.183	524.95	524.95
527	32	526.27	527.18	0.91	0.263	526.89	526.91
528	111	527.21	528.68	1.47	0.316	527.66	527.63
529	4	528.9	529.02	0.12	0.051	528.95	528.94
531	42	530.34	531.81	1.47	0.378	531.16	531.17
532	13	531.96	532.73	0.77	0.246	532.25	532.16
534	2	533.51	533.53	0.02	0.014	533.52	533.52
535	40	534.41	535.67	1.26	0.326	535.16	535.17
537	1	536.68	536.68	0	NA	536.68	536.68
539	110	538.19	540	1.81	0.353	539.11	539.13
542	1	542.17	542.17	0	NA	542.17	542.17
543	48	542.29	543.52	1.23	0.401	543.02	543.1
545	3	544.42	544.57	0.15	0.076	544.5	544.52
546	11	546.26	546.41	0.15	0.05	546.34	546.36
547	50	546.44	547.92	1.48	0.316	547.21	547.28
548	2	548.47	548.49	0.02	0.014	548.48	548.48
551	96	550.15	551.71	1.56	0.395	551.01	551.08
553	1	552.67	552.67	0	NA	552.67	552.67
555	118	554.14	556.18	2.04	0.411	555.03	555.04
556	1	556.3	556.3	0	NA	556.3	556.3
559	69	558.12	559.7	1.58	0.428	558.97	558.98
563	56	562.1	563.66	1.56	0.384	563.14	563.28
567	111	566.09	567.76	1.67	0.341	567.09	567.13
571	75	570.16	571.72	1.56	0.39	571.16	571.19
575	29	574.32	575.76	1.44	0.318	575.04	575.07
578	5	577.85	578.28	0.43	0.157	578.03	578
579	5	578.99	579.38	0.39	0.18	579.22	579.27
582	2	582.02	582.12	0.1	0.071	582.07	582.07
583	37	582.31	583.87	1.56	0.406	583.22	583.19
587	2	587.22	587.27	0.05	0.035	587.25	587.25
591	1	591.13	591.13	0	NA	591.13	591.13
599	4	598.56	599.39	0.83	0.412	599.18	599.38
600	5	599.58	600.42	0.84	0.374	599.94	599.9
605	2	605.3	605.53	0.23	0.163	605.41	605.41
614	7	614	614.57	0.57	0.185	614.35	614.31
615	5	614.97	615.57	0.6	0.229	615.19	615.15
616	1	616.32	616.32	0	NA	616.32	616.32
618	1	618.47	618.47	0	NA	618.47	618.47
620	7	619.28	620.1	0.82	0.315	619.82	620
622	2	621.6	622.43	0.83	0.587	622.01	622.01
624	2	623.39	624.04	0.65	0.46	623.71	623.71
627	2	626.18	626.9	0.72	0.509	626.54	626.54
628	3	627.21	627.65	0.44	0.246	627.49	627.62
632	3	631.6	631.69	0.09	0.046	631.65	631.66
634	1	633.51	633.51	0	NA	633.51	633.51
639	1	639.18	639.18	0	NA	639.18	639.18

Generate cumulative plot for SsaD48

res <- allCum(DB_orig, "SsaD48")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "SsaD48", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
GRB100104	309997	192.9	all
KEL101202	306756	196.9	all
CWT100103	309970	197	all
SXM13216	311629	207.5	all
RBW040504	301182	287.4	all
OOW040102	301943	310.2	all
RBW053907	301916	318.5	all
KEL050512	303785	331.6	all
BLD13033	312074	338.9	all
RBW041511	301372	349.8	all
SXM13002	311142	350.2	all
UBN140115	310570	357.9	all
LND14002	312301	358	all
CGN100801	309023	358.4	all
RMN13232	312020	361.8	all
BLD102304	307725	372	all
RBW053803	301070	373.7	all
FDB040209	302150	373.8	all
CLG050404	303176	385	all
CLG100504	307308	385.1	all
BLD100503	307521	388	all
BLD100509	307533	388	all
BLD100506	307527	388.1	all
KEL040311	305091	392.8	all
SXM13216	311630	401.4	all
LSN101204	309295	401.5	all
CLM050912	302909	412.2	all
RBW043207	301523	412.4	all
CLM100308	307068	412.5	all
SXM102403	308315	421.1	all
CLG050702	303215	421.2	all
BMT100203	309748	429.2	all
BYN100202	309766	429.2	all
BLD13071	312148	461.8	all
UBN140124	310587	461.9	all
UBN140086	310520	461.9	all
BLD13017	312052	486	all
UBN140113	310567	498.2	all
SXM13097	311320	501.9	all
GVY100304	309036	505.2	all
KEL042115	305553	517.2	all
RBW104301	309130	517.3	all
GVY100404	309044	517.6	all
SXM13001	311141	521.9	all
DGH100302	307834	533.5	all
DGH100307	307844	533.5	all
IOM120305	308787	536.7	all
RMN13152	311869	542.2	all
GVY101002	309064	544.4	all
GVY100402	309040	544.5	all
SXM100965	308121	544.6	all
GVY101004	309068	548.5	all
GVY100904	309058	548.5	all
SXM100704	307984	552.7	all
SKW040221	306064	556.3	all
UBN140069	310488	582	all
UBN140016	310391	582.1	all
CLG050107	303139	587.2	all
KEL040708	305249	587.3	all
PRB040228	306424	591.1	all
LND14048	312384	605.3	all
KEL051720	304336	605.5	all
BKB040223	310075	616.3	all
RBW102304	308944	618.5	all
RMN13140	311845	621.6	all
RMN13084	311739	622.4	all
CLG050115	303150	623.4	all
BKB040206	310042	624	all
KEL101908	306912	626.2	all
BRD040706	310611	626.9	all
BRD051508	302646	627.2	all
CWB110204	309961	627.6	all
BKB040227	310083	627.6	all
KGR10030102	309723	631.6	all
KGR10030101	309721	631.7	all
SHK100602	309536	631.7	all
KEL042410	305565	633.5	all
BLD101501	307681	639.2	all

CA054565a

Calculate bin statistics for CA054565a

dat <- BinStats(DB_orig, "CA054565a")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
101	3	101.46	101.68	0.22	0.124	101.54	101.47
106	17	105.7	106.11	0.41	0.117	105.9	105.9
110	6117	109.36	110.26	0.9	0.122	109.95	109.96
112	64	111.72	112.19	0.47	0.124	111.97	111.97
114	53	113.7	114.09	0.39	0.108	113.9	113.9
116	1	115.95	115.95	0	NA	115.95	115.95
118	2	117.91	117.95	0.04	0.028	117.93	117.93

Generate cumulative plot for CA054565a

res <- allCum(DB_orig, "CA054565a")
print(res$plt)

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA054565a", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
CLG040116	314955	101.5	all
LND14055	318724	101.5	all
BCC100124	316645	101.7	all
KEL050512	314103	116	all
KEL050641	314160	117.9	all
RBW040504	312707	118	all

CLG040116 = Peak is legit.
LND14055 = Peak is legit.
BCC100124 = Peak is legit.
KEL050512 = Peak is legit.
KEL050641 = Peak is legit.
RBW040504 = Peak is legit.

DONE!

CA054565b

Calculate bin statistics for CA054565b

dat <- BinStats(DB_orig, "CA054565b")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
126	2	125.57	125.63	0.06	0.042	125.6	125.6
129	129	129.2	129.73	0.53	0.16	129.46	129.43
131	1	131.28	131.28	0	NA	131.28	131.28
133	4	133.38	133.53	0.15	0.065	133.44	133.43
135	5113	134.82	135.53	0.71	0.116	135.26	135.27
137	25	136.98	137.35	0.37	0.113	137.2	137.22
139	2448	138.69	139.38	0.69	0.117	139.16	139.16
141	174	140.89	141.32	0.43	0.081	141.15	141.14
159	1	159.15	159.15	0	NA	159.15	159.15
161	145	160.95	161.47	0.52	0.108	161.25	161.26
163	3	163	163.32	0.32	0.161	163.17	163.19
164	1	163.89	163.89	0	NA	163.89	163.89
165	2	165.15	165.16	0.01	0.007	165.16	165.16
171	11	171.07	171.4	0.33	0.095	171.19	171.18

Generate cumulative plot for CA054565b

res <- allCum(DB_orig, "CA054565b")
print(res$plt)

Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_orig, "CA054565b", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
CLG051706	320546	125.6	all
CLG051907	320579	125.6	all
BRD040706	325400	131.3	all
KEL051527	321081	159.2	all
KEL051502	321043	163	all
KEL041307	321902	163.2	all
CLG040311	325634	163.3	all
GGR100804	324738	163.9	all
CLG051013	320456	165.2	all
KEL041310	321906	165.2	all

CLG051706 = Peak is legit.
CLG051907 = Peak is legit.
BRD040706 = Peak is legit.
KEL051527 = Peak is legit.
KEL051502 = Peak is legit.
KEL041307 = Peak is legit.
CLG040311 = Peak is legit.
GGR100804 = Peak is an artifact. Deleted.
CLG051013 = Peak is legit.
KEL041310 = Peak is legit.
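
The ten readings reviewed above correspond to the bins with an N of 3 or less in the CA054565b bin table (2 + 1 + 1 + 3 + 1 + 2 = 10), which suggests that the third argument to getLowFreq acts as an inclusive threshold. A minimal base R sketch of that filter, working from the bin counts in the table rather than from the database, is given below. The `bin_n` vector and the `threshold` name are illustrative, and the inclusive behaviour is an assumption inferred from the output, not taken from the package code:

```r
# Bin counts (N) copied from the CA054565b bin statistics table.
bin_n <- c("126" = 2, "129" = 129, "131" = 1, "133" = 4, "135" = 5113,
           "137" = 25, "139" = 2448, "141" = 174, "159" = 1, "161" = 145,
           "163" = 3, "164" = 1, "165" = 2, "171" = 11)

threshold <- 3  # assumed to be inclusive

# Keep bins observed `threshold` times or fewer.
low_freq <- bin_n[bin_n <= threshold]
print(low_freq)

# Total number of readings that would need manual review.
print(sum(low_freq))
```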

DONE!

One101

Calculate bin statistics for One101

dat <- BinStats(DB_orig, "One101")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
164	199	163.32	164	0.68	0.193	163.73	163.78
168	958	167.18	167.98	0.8	0.18	167.67	167.69
172	2004	170.97	172.01	1.04	0.181	171.67	171.68
176	4664	174.85	176.06	1.21	0.169	175.66	175.69
178	2	177.47	177.87	0.4	0.283	177.67	177.67
180	1645	179.02	180.09	1.07	0.171	179.63	179.63
184	126	183.27	183.98	0.71	0.194	183.69	183.75
192	19	191.41	191.95	0.54	0.197	191.7	191.68

Generate cumulative plot for One101

res <- allCum(DB_orig, "One101")
print(res$plt)
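
As a rough numerical cross-check on the cumulative plot, the bin table above can be turned into cumulative counts. This is only a sketch built from the summary table (the `one101` data frame and its column names are hypothetical), not a description of what allCum does internally:

```r
# Bin labels and counts (N) copied from the One101 bin statistics table.
one101 <- data.frame(
  bin = c(164, 168, 172, 176, 178, 180, 184, 192),
  n   = c(199, 958, 2004, 4664, 2, 1645, 126, 19)
)

# Running total of readings, and the same expressed as a proportion.
one101$cum_n    <- cumsum(one101$n)
one101$cum_prop <- round(one101$cum_n / sum(one101$n), 4)

print(one101)
```

In this table the 178 bin adds only 2 readings to the running total, which is why it barely registers as a step between the 176 and 180 bins.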

Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_orig, "One101", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
BLD101405	332309	177.5	all
BLD101602	332320	177.9	all

BLD101405 = Peak is an artifact. Fixed.
BLD101602 = Peak is an artifact. Fixed.

DONE!

CA060177

Calculate bin statistics for CA060177

dat <- BinStats(DB_orig, "CA060177", limit = 0.35)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)

Bins	N	Min	Max	Range	Sd	MEAN	MEDIAN
248	7	247.43	247.83	0.4	0.139	247.74	247.8
252	566	251.33	251.96	0.63	0.142	251.72	251.73
253	90	252.3	252.91	0.61	0.134	252.67	252.7
256	1	255.9	255.9	0	NA	255.9	255.9
260	2637	259.34	260	0.66	0.136	259.77	259.82
264	1439	263.4	264.13	0.73	0.142	263.83	263.84
268	3065	267.47	268.28	0.81	0.139	267.9	267.92
272	582	271.61	272.3	0.69	0.151	272.01	272.03
276	376	275.63	276.37	0.74	0.179	276.09	276.13
277	5	276.91	277.15	0.24	0.105	277.05	277.09
280	313	279.72	280.45	0.73	0.165	280.21	280.26
284	878	283.79	284.54	0.75	0.15	284.24	284.26
288	772	287.88	288.57	0.69	0.133	288.3	288.31
292	82	291.96	292.58	0.62	0.145	292.33	292.34
296	91	296.1	296.68	0.58	0.129	296.42	296.41

Generate cumulative plot for CA060177

res <- allCum(DB_orig, "CA060177", limit = 0.35)
print(res$plt)

Identify problems

There seems to be a 1 bp shift at around 253 bp, where the 252 and 253 bins lie only about 1 bp apart:

res <- allCum(DB_orig, "CA060177", limit = 0.35, ymin = 250,
              ymax = 260)
print(res$plt)

There is another 1 bp shift at around 277 bp, between the 276 and 277 bins:

res <- allCum(DB_orig, "CA060177", limit = 0.35, ymin = 270,
              ymax = 280)
print(res$plt)
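
The same two shifts can be picked out numerically from the bin table for this locus by looking at the gap between consecutive bin means. The sketch below is a base R illustration only: the `bin_mean` vector is copied from the MEAN column of the table, while the `min_gap` cut-off is a hypothetical value chosen for this example:

```r
# Bin labels and MEAN values copied from the CA060177 bin statistics table.
bin_mean <- c("248" = 247.74, "252" = 251.72, "253" = 252.67, "256" = 255.90,
              "260" = 259.77, "264" = 263.83, "268" = 267.90, "272" = 272.01,
              "276" = 276.09, "277" = 277.05, "280" = 280.21, "284" = 284.24,
              "288" = 288.30, "292" = 292.33, "296" = 296.42)

# Gap between each bin mean and the one below it, labelled by the upper bin.
gap <- setNames(diff(unname(bin_mean)), names(bin_mean)[-1])

min_gap <- 2  # hypothetical cut-off, in bp

# Bins that sit closer than `min_gap` to their lower neighbour.
shifted <- names(gap)[gap < min_gap]
print(shifted)
print(round(gap[shifted], 2))
```

The 2 bp cut-off is arbitrary. Every other gap in this table is larger than 3 bp, so any cut-off between roughly 1 and 3 bp flags the same two bins.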

Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_orig, "CA060177", 3, limit = 0.35)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}

samp	DFrow	Reading	Gel
BKB040205	345008	255.9	all

BKB040205 = Peak is legit.

DONE!

Citations

Alberto F (2009) MsatAllele_1.0: an R package to visualize the binning of microsatellite alleles. Journal of Heredity, 100, 394–397.