Bootstrap Support Values for Genome Distances.
Abstract: We have recently developed a distance metric for efficiently estimating the number of substitutions per site between unaligned genome sequences. These substitution rates are called “anchor distances” and can be used for phylogeny reconstruction. Most phylogenies come with bootstrap support values, which are computed by resampling, with replacement, the columns of homologous residues in the original alignment. Unfortunately, this method cannot be applied to anchor distances, as they are based on approximate pairwise local alignments rather than the full multiple sequence alignment necessary for the classical bootstrap. We explore two alternatives, pairwise bootstrap and quartet analysis, and compare them to the classical bootstrap. With simulated sequences and 53 human primate mitochondrial genomes, pairwise bootstrap gives better results than quartet analysis. However, when applied to 29 E. coli genomes, quartet analysis comes closer to the classical bootstrap.
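The three approaches named in the abstract can be illustrated with a small Python sketch. Classical bootstrap draws columns of the multiple sequence alignment with replacement. The pairwise variant shown here resamples the columns of a single pairwise alignment; this is an assumed setup for illustration, not necessarily how andi or afra implement it. Quartet analysis is illustrated with the standard four-point condition. All function names are hypothetical, the toy sequences are invented, and plain mismatch proportions (p-distances) stand in for anchor distances.

```python
import random


def resample_columns(alignment, rng):
    """Classical bootstrap replicate: draw alignment columns with replacement.

    `alignment` maps taxon names to aligned sequences of equal length.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def p_distance(s1, s2):
    """Proportion of mismatched positions between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def pairwise_bootstrap(s1, s2, rng):
    """Assumed pairwise bootstrap: resample the columns of one pairwise
    alignment only, so every pair of taxa gets its own independent replicate."""
    n = len(s1)
    cols = [rng.randrange(n) for _ in range(n)]
    r1 = "".join(s1[c] for c in cols)
    r2 = "".join(s2[c] for c in cols)
    return p_distance(r1, r2)


def quartet_split(dist, w, x, y, z):
    """Four-point condition: pick the split of four taxa whose within-pair
    distances have the smallest sum. `dist` is keyed by frozenset pairs."""
    def d(i, j):
        return dist[frozenset((i, j))]

    sums = {
        ((w, x), (y, z)): d(w, x) + d(y, z),
        ((w, y), (x, z)): d(w, y) + d(x, z),
        ((w, z), (x, y)): d(w, z) + d(x, y),
    }
    return min(sums, key=sums.get)


# Toy data, invented for illustration only.
aln = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACTTTC", "D": "TCGAACTTTG"}
rng = random.Random(1)
replicate = resample_columns(aln, rng)
dist = {frozenset((i, j)): p_distance(aln[i], aln[j])
        for i in aln for j in aln if i < j}
print(quartet_split(dist, "A", "B", "C", "D"))  # (('A', 'B'), ('C', 'D'))
```

Repeating the replicate step many times, rebuilding a tree from each replicate, and counting how often each split recurs yields the usual bootstrap support values. The key difference is that `resample_columns` needs one shared alignment for all taxa, while `pairwise_bootstrap` only ever sees two sequences at a time.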
Data Sets.
The following data sets were used for evaluation in the paper.
- Eco29: A set of 29 E. coli / Shigella genomes, together with their reference alignment.
- Human mitochondrial genomes
Programs and Tools.
To analyse the support values, we wrote a number of programs. We also used scripts to wrap other programs, such as PHYLIP, for convenience.
- clustDist: Cluster sequences according to a distance matrix
- dnaDist: Compute a distance matrix from an alignment
- andi: Efficiently estimate evolutionary distances (anchor distances) between unaligned genomes
- correlation.js: Correlate values from two consense runs
- afra: Compute alignment-free support values
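Two of the listed steps rest on standard computations, sketched here in Python. The sketch assumes that a tool like dnaDist applies the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - (4/3) p), to pairwise mismatch proportions p, and that correlating two consense runs means computing Pearson's correlation over the splits both runs contain. Neither assumption is confirmed by the tool descriptions above; the function names are hypothetical, and the code does not parse PHYLIP output files.

```python
import math


def jukes_cantor(s1, s2):
    """JC69-corrected substitutions per site; columns with a gap are skipped."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Return taxon names and a symmetric matrix of pairwise JC69 distances."""
    names = sorted(alignment)
    matrix = [[jukes_cantor(alignment[a], alignment[b]) for b in names] for a in names]
    return names, matrix


def pearson(xs, ys):
    """Pearson correlation coefficient; needs >= 2 points with nonzero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlate_support(run_a, run_b):
    """Correlate the support of splits found in both runs.

    Each run maps a split (frozenset of the taxa on one side) to its support.
    Splits present in only one run are ignored.
    """
    shared = sorted(set(run_a) & set(run_b), key=sorted)
    return pearson([run_a[s] for s in shared], [run_b[s] for s in shared])
```

For example, `correlate_support` applied to the support values from a classical bootstrap run and from an alignment-free run would quantify how closely the two agree on the splits they share. Representing a split by one of its two sides is a simplification; a real comparison would need a canonical choice of side.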