Bootstrap Support Values for Genome Distances.

Abstract: We have recently developed a distance metric for efficiently estimating the number of substitutions per site between unaligned genome sequences. These substitution rates are called “anchor distances” and can be used for phylogeny reconstruction. Most phylogenies come with bootstrap support values, which are computed by resampling with replacement columns of homologous residues from the original alignment. Unfortunately, this method cannot be applied to anchor distances, as they are based on approximate pairwise local alignments rather than the full multiple sequence alignment necessary for the classical bootstrap. We explore two alternatives: pairwise bootstrap and quartet analysis, which we compare to classical bootstrap. With simulated sequences and 53 human primate mitochondrial genomes, pairwise bootstrap gives better results than quartet analysis. However, when applied to 29 E. coli genomes, quartet analysis comes closer to the classical bootstrap.
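
To illustrate the classical bootstrap mentioned in the abstract, the following is a minimal Python sketch of one resampling step: columns of a multiple sequence alignment are drawn with replacement until the replicate has as many columns as the original. The function name `bootstrap_alignment` and the toy alignment are assumptions made for this illustration only; this is not one of the programs from the paper.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` is a list of equal-length strings (one per sequence).
    Columns are sampled with replacement, so some columns may appear
    several times in the replicate and others not at all.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence to keep homology.
    return ["".join(row[c] for c in cols) for row in alignment]


# Toy alignment, invented for the example.
alignment = ["ACGTAC", "ACGTTC", "ATGTAC"]
replicate = bootstrap_alignment(alignment, random.Random(42))
```

In a full analysis, a tree would be built from each of many such replicates, and the support value of a branch would be the fraction of replicate trees that contain it. Because the same column indices are applied to every sequence, this step requires a multiple sequence alignment, which is why it does not carry over directly to anchor distances.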

Data Sets.

Here follow all the data sets used for evaluation in the paper.

Programs and Tools.

For the analysis of the support values we have written a number of programs. We also used scripts to conveniently wrap other programs such as PHYLIP.