Analysis was done using compute, polydNdS, and HBKpermute from the analysis package of libsequence [1]. Alignments can be found on GitHub.

I am operating under the assumption that Ej29, S1033, Ms10, Ej15, and TeoParv are all parviglumis.
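The grouping assumption above can be made explicit when splitting an alignment into teosinte and maize files. This is a minimal sketch: the sample names come from this note, but matching them as substrings of FASTA headers, the helper names `read_fasta` and `split_by_taxon`, and the example headers are all assumptions for illustration. Any mexicana sequences would need to be dropped before calling it.

```python
# Sample names taken from the note above; matching them as substrings of
# FASTA headers is an assumption about how the alignments are labelled.
PARVIGLUMIS = ("Ej29", "S1033", "Ms10", "Ej15", "TeoParv")

def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def split_by_taxon(records, teo_names=PARVIGLUMIS):
    """Return (teosinte, maize) dicts; anything not matching teo_names is treated as maize."""
    teo, maize = {}, {}
    for name, seq in records.items():
        is_teo = any(t.lower() in name.lower() for t in teo_names)
        (teo if is_teo else maize)[name] = seq
    return teo, maize

# Hypothetical two-sequence alignment, for illustration only.
example = """>Ej29_hypothetical
ACGT
>Mo20W_hypothetical
ACGA
"""
teo, maize = split_by_taxon(read_fasta(example))
print(sorted(teo), sorted(maize))
```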

GRMZM2G109291

Data

  1. Sequences labelled as 32 were found, via BLAST against MaizeGDB, to be mislabelled; they are instead from GRMZM2G109291.
  2. Multiple parviglumis samples had a duplicated portion of the sequence (bad alignment? cloning error?), which was trimmed.
  3. mexicana was removed for initial comparison, as parviglumis is the direct wild ancestor of maize
  4. Realigned sequences

Results

\(F_{ST}\) between maize and parviglumis was very high at 0.6, with an associated p-value of \(p=0.0065\). Of the 5 total SNPs (see table), two are fixed differences (one nonsynonymous and one synonymous) between teosinte and maize. This is rare.

file                       nsam  nsites  S   Singletons  nhap  hapdiv    ThetaW       ThetaPi      TajD
GRMZM2G109291_final.fasta  10    811     5   2           4     0.777778  0.00313931   0.00331557   0.225145
GRMZM2G109291_teo.fasta    5     811     1   0           2     0.6       0.000852575  0.00106572   1.22474
GRMZM2G109291_maize.fasta  5     811     2   2           2     0.4       0.00118372   0.000986436  -0.972558
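As a sanity check on the TajD column, the values can be recomputed from nsam, S, ThetaW, and ThetaPi with the textbook formula from Tajima (1989). This is a sketch, not libsequence's code, and the function names are hypothetical. ThetaW and ThetaPi are per site, and the number of sites actually analysed is not listed (it appears to be smaller than nsites), so the sketch recovers per-locus pi from the ratio ThetaPi/ThetaW instead of multiplying by nsites.

```python
import math

def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S, and
    pi = mean pairwise differences summed over the locus (not per site)."""
    if n < 2 or S == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * S + e2 * S * (S - 1)
    if var < 1e-9:  # the variance term vanishes for n = 2 or n = 3
        return float("nan")
    return (pi - S / a1) / math.sqrt(var)

def pi_from_per_site(n, S, theta_w, theta_pi):
    """Recover per-locus pi from per-site ThetaW and ThetaPi.
    Per-locus ThetaW is S / a1, and the per-site scaling cancels in the ratio."""
    a1 = sum(1.0 / i for i in range(1, n))
    return theta_pi / theta_w * S / a1

# Rows from the table above: (nsam, S, ThetaW, ThetaPi)
rows = {
    "final": (10, 5, 0.00313931, 0.00331557),
    "teo": (5, 1, 0.000852575, 0.00106572),
    "maize": (5, 2, 0.00118372, 0.000986436),
}
for label, (n, S, tw, tp) in rows.items():
    print(label, round(tajimas_d(n, S, pi_from_per_site(n, S, tw, tp)), 4))
```

With only three sequences the variance term is zero, so D is undefined under this formula regardless of the data.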

Conclusions

Too little diversity to say much, but fixed differences and lower Tajima’s D in maize are consistent with selection.

GRMZM2G323553

Data

  1. Confirmed via BLAST against MaizeGDB that sequences labelled 10 are from GRMZM2G323553.
  2. Kept all non-mexicana sequences, but only 2 teosinte have >250 bp of sequence. PCR/cloning failure?
  3. Realigned sequences

Results

\(F_{ST}\) between maize and parviglumis was low at 0.0304709, with an associated p-value of \(p=0.3159\). Only one SNP is possibly fixed, but \(n_{teo}\) is low, so it is hard to say.

locus                      nsam  nsites  S   Singletons  nhap  hapdiv    ThetaW      ThetaPi     TajD
GRMZM2G323553_final.fasta  10    927     15  11          8     0.933333  0.0212943   0.0145471   -1.45945
GRMZM2G323553_teo.fasta    5     927     12  9           5     1         0.0231325   0.0216867   -0.45202
GRMZM2G323553_maize.fasta  5     927     4   4           3     0.7       0.00208243  0.00173536  -0.0938
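To make the \(F_{ST}\) and permutation p-values in these sections concrete, here is a rough sketch of the idea. It is not HBKpermute's implementation, and the statistic that program reports may differ. The sketch uses the estimator \(F_{ST} = 1 - H_w/H_b\) (Hudson, Slatkin and Maddison 1992), averages the two within-population terms without weighting, and does no special handling of gaps or missing data. Function names and the toy sequences are hypothetical.

```python
import random
from itertools import combinations

def mean_pairwise_diff(seqs_a, seqs_b=None):
    """Mean number of differing sites per pair: within seqs_a, or between seqs_a and seqs_b."""
    if seqs_b is None:
        pairs = list(combinations(seqs_a, 2))
    else:
        pairs = [(a, b) for a in seqs_a for b in seqs_b]
    if not pairs:
        return 0.0
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)

def fst(pop1, pop2):
    """F_ST = 1 - Hw/Hb, with Hw the unweighted mean of the two within-population values."""
    within = (mean_pairwise_diff(pop1) + mean_pairwise_diff(pop2)) / 2
    between = mean_pairwise_diff(pop1, pop2)
    return 1 - within / between if between > 0 else float("nan")

def permutation_pvalue(pop1, pop2, n_perm=1000, seed=1):
    """Share of label shuffles whose F_ST is at least as large as the observed value."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, hits / n_perm

# Toy aligned sequences, for illustration only.
maize_toy = ["AAAA", "AAAA", "AAAT"]
teo_toy = ["TTAA", "TTAA", "TTAA"]
observed, p = permutation_pvalue(maize_toy, teo_toy)
print(observed, p)
```

With samples as small as the ones here (5 vs. 5 or fewer), the number of distinct label assignments is limited, which bounds how small the p-value can get.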

Conclusions

No strong evidence from \(F_{ST}\), but the drop in diversity is striking: ThetaPi in maize is about 8% of that in teosinte (0.0017 vs. 0.0217).

GRMZM2G144581

Data

  1. Confirmed via BLAST that Sweet4b is GRMZM2G144581 (not shown).
  2. mexicana removed
  3. Realigned sequences

Results

\(F_{ST}\) between maize and parviglumis was very high at 0.58231, with an associated p-value of \(p=0.0198\). The large number of fixed SNPs (14) and indels (7) seems quite rare. The loss of diversity in maize and lower Tajima’s D in maize (relative to the pooled sample; D is reported as NA for the \(n=3\) teosinte sample) are again consistent with selection.

locus                      nsam  nsites  S   Singletons  nhap  hapdiv    ThetaW      ThetaPi    TajD
GRMZM2G144581_final.fasta  8     1010    33  15          5     0.785714  0.0134823   0.0139982  0.203515
GRMZM2G144581_teo.fasta    3     1010    16  16          3     1         0.0110079   0.0110079  NA
GRMZM2G144581_maize.fasta  5     1010    6   6           2     0.4       0.00295992  0.0024666  -1.14554
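Fixed differences like those counted in these sections can be tallied directly from the two aligned groups. The sketch below is one possible definition, not the method used to get the counts in this note: a column counts as fixed when each group has exactly one state and the two states differ, with gap and N characters ignored. Under that rule fixed indels are not counted. The function name and toy sequences are hypothetical.

```python
def fixed_differences(pop1, pop2, missing="-N?"):
    """Return 0-based alignment columns where each group is monomorphic for a different state.

    Assumes upper-case, equal-length aligned sequences.
    """
    pop1, pop2 = list(pop1), list(pop2)
    fixed = []
    for i, column in enumerate(zip(*(pop1 + pop2))):
        a = {c for c in column[:len(pop1)] if c not in missing}
        b = {c for c in column[len(pop1):] if c not in missing}
        if len(a) == 1 and len(b) == 1 and a != b:
            fixed.append(i)
    return fixed

# Toy aligned sequences, for illustration only.
maize_toy = ["ACGT", "ACGT"]
teo_toy = ["ACTT", "GCTA"]
print(fixed_differences(maize_toy, teo_toy))
```

With very few sequences per group, a site can look fixed by chance, which is the caveat raised for GRMZM2G323553.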

Conclusions

Clearer evidence: loss of diversity in maize, many fixed differences, and lower Tajima’s D are all consistent with selection.

C2

Data

  1. Confirmed via BLAST that this comes from GRMZM2G137954 (not shown)
  2. Removed mexicana
  3. Kept initial alignment – better than automated alignment I ran

This is a ton of diversity. Are we confident there are no sequencing errors or cloning issues?

Results

Testing the maize “haplotype” – excluding Mo20W – in addition to all maize sequences. \(F_{ST}\) between teosinte and the maize haplotype is 0.76 (\(p=0.0279\)). \(F_{ST}\) drops to 0.459016 (\(p=0.0478\)) if all maize sequences are used. Still very high!

locus               nsam  nsites  S   Singletons  nhap  hapdiv    ThetaW      ThetaPi     TajD
c2_final.fasta      9     609     7   3           4     0.75      0.0350416   0.0396825   0.601123
c2_teo.fasta        4     609     3   2           3     0.833333  0.012685    0.0129199   0.167656
c2_maize.fasta      5     609     29  29          5     1         0.0321478   0.0267898   -1.24515
c2_maize_hap.fasta  4     609     5   5           4     1         0.00589044  0.00539957  -0.796844

As expected from manual evaluation of the alignment, Mo20W affects things a ton. Perhaps come back and look at LD here, as in Wills et al. [2]? Also not surprising – the teosinte sample size is \(n=2\) for most of the sequence, so it is difficult to draw many conclusions.
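The effect of a single divergent sequence on S and the singleton count can be checked by recounting with and without it. The sketch below is illustrative only: it counts a site as a singleton when its rarest state occurs once, ignores gap and N characters, and may not match how compute treats sites with missing data. The function name and toy sequences (including the stand-in for a divergent line) are hypothetical.

```python
from collections import Counter

def seg_sites_and_singletons(seqs, missing="-N?"):
    """Count segregating sites and, among them, sites whose rarest state occurs exactly once."""
    n_seg = n_single = 0
    for column in zip(*seqs):
        counts = Counter(c for c in column if c not in missing)
        if len(counts) > 1:
            n_seg += 1
            if min(counts.values()) == 1:
                n_single += 1
    return n_seg, n_single

# Toy aligned sequences; "divergent" stands in for an outlier such as Mo20W.
toy = {
    "line_a": "AAAA",
    "line_b": "AAAA",
    "line_c": "AAAT",
    "divergent": "TTTA",
}
with_all = seg_sites_and_singletons(toy.values())
without = seg_sites_and_singletons(s for name, s in toy.items() if name != "divergent")
print(with_all, without)
```

In this toy case one sequence contributes most of the segregating sites, all as singletons, which is the same pattern as the c2_maize and c2_maize_hap rows above.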

Conclusions

Indel looks promising, but too few sequences to make good conclusions about diversity. Mo20W complicates things as well. Adaptation from standing variation? Partial sweep?

To Do

  1. Look at synonymous/nonsynonymous? Seems unnecessary unless we want to characterize fixed differences (as in GRMZM2G109291).
  2. Diversity along Sweet4d?
  3. Get physical location of genes. Estimate width of sweep? Compare with diversity stuff?
  4. Comparison to other loci outside of region? Would provide further evidence for a sweep, but not for a specific locus.
  5. More formal tests (require more data)?
  6. Add back in mexicana to increase sample size?

  1. Thornton, Kevin. “Libsequence: a C++ class library for evolutionary genetic analysis.” Bioinformatics 19.17 (2003): 2325-2327.

  2. Wills, David M., et al. “From many, one: Genetic control of prolificacy during maize domestication.” PLoS genetics 9.6 (2013): e1003604.