This document should provide you with the basic steps to do the ancestral state reconstruction of the FL DoH subtype B HIV dataset. To orient yourself, I'm providing a small South African cluster as an example. But for the FL analyses you will need the following:
NOTE! - Make sure that the IDs in the FASTA file, in the tree file and in the metadata table are exactly the same.
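A quick check along these lines can catch mismatched IDs before running anything. This is a minimal sketch and not part of TreeTime: the function names are made up for illustration, it assumes unquoted Newick tip labels, and it assumes the metadata ID column is called name (the column TreeTime reports using in the example run further down).

```python
import csv
import io
import re


def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first word after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def newick_tip_ids(newick_text):
    """Return tip labels from an unquoted Newick string.

    A tip label directly follows '(' or ','. Internal node labels and
    support values follow ')' and are therefore not matched.
    """
    # Remove bracketed comments such as [&date=1978.05] before matching.
    text = re.sub(r"\[[^\]]*\]", "", newick_text)
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", text))


def metadata_ids(csv_text, name_column="name"):
    """Return the IDs in the given column of a CSV table (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[name_column].strip() for row in reader}


def compare_ids(fasta_text, newick_text, csv_text, name_column="name"):
    """For each source, list the IDs it lacks that appear in another source."""
    sources = {
        "fasta": fasta_ids(fasta_text),
        "tree": newick_tip_ids(newick_text),
        "metadata": metadata_ids(csv_text, name_column),
    }
    union = set().union(*sources.values())
    return {label: sorted(union - ids) for label, ids in sources.items()}


# Toy inputs, invented for illustration only.
toy_fasta = ">A\nACGT\n>B\nACGT\n>C\nACGT\n"
toy_newick = "((A:0.1,B:0.2):0.3,C:0.4);"
toy_csv = "name,date,region\nA,2011.6,GT\nB,2012.3,GT\nD,2014.1,WC\n"
print(compare_ids(toy_fasta, toy_newick, toy_csv))
```

To use it on real files, read each file into a string with open(path).read() and pass the three strings to compare_ids. Three empty lists mean the IDs agree.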
We will perform the dating and ancestral state reconstruction in TreeTime from Richard Neher’s lab.
Download a zip copy from the GitHub link. Move the directory to where you want it, and then from inside the folder you can just run:
pip install .
This should install treetime and all dependencies. Once installed, you can run treetime to test that it works.
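If you prefer to check the install from Python, something like the sketch below works. It only uses the standard library, the function name treetime_status is made up for illustration, and it assumes the package installs a module and a command-line script that are both called treetime.

```python
import importlib.util
import shutil


def treetime_status():
    """Report whether the treetime module and command-line script can be found."""
    return {
        # True if Python can locate a top-level module named 'treetime'.
        "module_importable": importlib.util.find_spec("treetime") is not None,
        # True if an executable named 'treetime' is on the PATH.
        "cli_on_path": shutil.which("treetime") is not None,
    }


print(treetime_status())
```

If either value is False, the install most likely went into a different Python environment than the one you are running from.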
OK, now the first step is to turn your ML tree topology into a dated topology (i.e. branches are in calendar time). This is done with the base treetime command. Something like this:
treetime --aln <input.fasta> --tree <input.nwk> --dates <dates.csv>
I'm using a single metadata table containing both the dates and locations, so my example works like this:
treetime --aln 23106.fasta --tree 23106.phy.tbe.tree --dates 23106.metadata.csv
And then you should see something like this:
Attempting to parse dates...
Using column 'name' as name. This needs match the taxon names in the tree!!
Using column 'date' as date.
0.00 -TreeAnc: set-up
1.95 TreeTime.reroot: with method or node: least-squares
1.96 TreeTime.reroot: rerooting will ignore covariance and shared ancestry.
2.16 TreeTime.reroot: with method or node: least-squares
2.16 TreeTime.reroot: rerooting will ignore covariance and shared ancestry.
3.14 ###TreeTime.run: INITIAL ROUND
11.21 TreeTime.reroot: with method or node: least-squares
11.21 TreeTime.reroot: rerooting will ignore covariance and shared ancestry.
11.40 ###TreeTime.run: ITERATION 1 out of 2 iterations
20.41 ###TreeTime.run: ITERATION 2 out of 2 iterations
28.75 ###TreeTime.run: CONVERGED
Inferred GTR model:
Substitution rate (mu): 1.0
Equilibrium frequencies (pi_i):
A: 0.347
C: 0.1842
G: 0.2239
T: 0.2349
-: 0.01
Symmetrized rates from j->i (W_ij):
A C G T -
A 0 0.8202 3.0221 0.4694 0.6551
C 0.8202 0 0.4392 2.9248 0.8487
G 3.0221 0.4392 0 0.4261 0.7991
T 0.4694 2.9248 0.4261 0 0.7764
- 0.6551 0.8487 0.7991 0.7764 0
Actual rates from j->i (Q_ij):
A C G T -
A 0 0.2846 1.0487 0.1629 0.2273
C 0.1511 0 0.0809 0.5388 0.1563
G 0.6766 0.0983 0 0.0954 0.1789
T 0.1103 0.687 0.1001 0 0.1824
- 0.0065 0.0085 0.008 0.0077 0
Root-Tip-Regression:
--rate: 8.327e-04
--r^2: 0.09
--- saved tree as
2019-04-05_treetime/timetree.pdf
--- root-to-tip plot saved to
2019-04-05_treetime/root_to_tip_regression.pdf
--- alignment including ancestral nodes saved as
2019-04-05_treetime/ancestral_sequences.fasta
--- saved divergence times in
2019-04-05_treetime/dates.tsv
--- tree saved in nexus format as
2019-04-05_treetime/timetree.nexus
--- divergence tree saved in nexus format as
2019-04-05_treetime/divergence_tree.nexus
You can see that this runs pretty quickly on a small dataset of 110 pol sequences. Looking at the output you might think: wow, this is bad! An R^2 of 0.09 and a rate of 8.327e-4, which is way off the ~2.0e-3 commonly used for HIV-1 pol subtype C. But looking at the tree in timetree.pdf, I'm happy! A tMRCA of ~1978 corresponds with what I expect for this cluster based on Marco's and my last paper in Sci Reports. We might have to play around with some of the parameters if we find a weird tree.
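To make the Root-Tip-Regression numbers less of a black box: the rate is the slope of root-to-tip distance against sampling date, r^2 measures how clock-like the data are, and the x-intercept gives a rough date for the root. The sketch below is an ordinary least-squares illustration of that idea, not TreeTime's own code. TreeTime ties its regression to the rerooting step, so its numbers need not match exactly. The function name and the toy values are invented.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, r_squared, root_date), where root_date is the
    x-intercept of the fitted line.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = s_td / s_tt                      # slope: substitutions/site/year
    intercept = mean_d - rate * mean_t      # distance at date 0
    r_squared = s_td ** 2 / (s_tt * s_dd)   # squared correlation
    root_date = -intercept / rate           # date at which distance is 0
    return rate, r_squared, root_date


# Toy data, invented for illustration: a perfectly clock-like sample.
toy_dates = [2000.0, 2005.0, 2010.0]
toy_distances = [0.04, 0.05, 0.06]
rate, r_squared, root_date = root_to_tip_regression(toy_dates, toy_distances)
print(f"rate={rate:.4f} r^2={r_squared:.2f} root date={root_date:.1f}")
```

A low r^2 like the 0.09 above means the tips scatter widely around the fitted line. That alone does not make the dated tree useless, which is why checking the tMRCA against prior expectation is a sensible sanity check.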
So the dating produced a folder called treetime-
#NEXUS
Begin Taxa;
Dimensions NTax=104;
TaxLabels ZA_DR_GT1853_2011.663 ZA_DR_GT2320_2012.375 ZA_DR_WC404_2014.121...
End;
Begin Trees;
Tree tree1=(((ZA_DR_GT1853_2011.663:30.53422... ...0.10000[&date=1978.05];
End;
Copy everything after Tree tree1= to the end of that line into a new file and save it as, let's say, 23106.treetime.nwk in the working directory.
OK! Now we're ready for the ancestral state reconstruction. Here we are interested in a basic mugration analysis between two or more discrete states. In my example these states are different provinces of South Africa (i.e. KZN, GT, WC, etc.). For you it will be only two states (FL vs rest/outside). We do this with the mugration subcommand of treetime, like so:
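The copy-and-paste step can also be scripted. The sketch below mirrors the manual step and nothing more: it pulls out whatever follows Tree tree1= on that line, including any bracketed annotations. The function names are made up for illustration, and the toy NEXUS text is invented.

```python
import re


def nexus_to_newick(nexus_text, tree_name="tree1"):
    """Return the text after 'Tree <tree_name>=' on that line of a NEXUS file."""
    pattern = re.compile(
        r"^[ \t]*Tree[ \t]+" + re.escape(tree_name) + r"[ \t]*=[ \t]*(.*)$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(nexus_text)
    if match is None:
        raise ValueError(f"no tree named {tree_name!r} found")
    return match.group(1).strip()


def convert_file(nexus_path, newick_path, tree_name="tree1"):
    """Read a NEXUS file and write the extracted Newick string to a new file."""
    with open(nexus_path) as handle:
        newick = nexus_to_newick(handle.read(), tree_name)
    with open(newick_path, "w") as handle:
        handle.write(newick + "\n")


# Toy NEXUS text, invented for illustration only.
toy_nexus = (
    "#NEXUS\n"
    "Begin Trees;\n"
    " Tree tree1=((A:1.0,B:2.0):0.5,C:3.0):0.1[&date=1978.05];\n"
    "End;\n"
)
print(nexus_to_newick(toy_nexus))
```

For the example here the call would be convert_file("2019-04-05_treetime/timetree.nexus", "23106.treetime.nwk"), with the folder name adjusted to the date of your own run.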
treetime mugration --tree <input.nwk> --states <states.csv> --attribute <field>
So in my example the attribute field is region:
treetime mugration --tree 23106.treetime.nwk --states 23106.metadata.csv --attribute region
which produces this:
Completed mugration model inference of attribute 'region' for 23106.treetime.nwk
Saved inferred mugration model as: 2019-04-05_mugration/GTR.txt
Saved annotated tree as: 2019-04-05_mugration/annotated_tree.nexus
Now you have a dated tree with inferred ancestral states based on the locations of the external tips of your tree.
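For the two-state FL analysis mentioned above, the states table needs a column that only says FL or not FL. The sketch below shows one way to add such a column to a metadata CSV. The column names region and fl_state, the label non-FL, and the toy rows are all assumptions made for illustration, so adjust them to your own table. Blank locations are passed through unchanged, and how to treat them is left to you.

```python
import csv
import io


def collapse_to_two_states(csv_text, source_column="region", focal_value="FL",
                           other_label="non-FL", target_column="fl_state"):
    """Return CSV text with an extra column holding focal_value or other_label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [target_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row[source_column].strip()
        if value == "":
            row[target_column] = ""          # unknown location stays blank
        elif value == focal_value:
            row[target_column] = focal_value
        else:
            row[target_column] = other_label
        writer.writerow(row)
    return out.getvalue()


# Toy metadata, invented for illustration only.
toy_metadata = "name,date,region\nS1,2010.5,FL\nS2,2011.2,GA\nS3,2012.0,FL\n"
print(collapse_to_two_states(toy_metadata))
```

The new column name would then be passed to --attribute in place of region.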
OK! So David and I are working on a Python script that will read in the tree and write the time of each transition event from one state to another to a file. It should be done very shortly.