Monday, August 17, 2020

Rhizobium leguminosarum 5

 

Average Nucleotide Identity

In recent years, genome-based ANI has taken over from the old laboratory technique of DNA-DNA hybridisation (DDH) as the main criterion for deciding whether two strains belong to the same species. ANI was designed to mimic certain features of DDH (an approach that was developed in the 1960s, before DNA sequencing had been invented). The principle is to identify all stretches of DNA in one genome that are deemed to ‘match’ sequences in a second genome, and to calculate the percentage of identical nucleotides in these stretches. Like DDH, the set of sequences used is different for every pairwise comparison, because it is a mixture of universal core genes and those accessory genes that happen to be in common between the two strains. Like DDH, there is a threshold value above which two strains are considered to be conspecific – this is somewhere around 95-96%. Like DDH, there is also another arbitrary threshold that is hidden and seldom discussed, namely the threshold similarity that is used to decide whether to compare two stretches of sequence in the first place. Like DDH, there is no standard method for calculating ANI, and different methods may give slightly different results (Palmer et al. 2020, https://doi.org/10.1099/ijsem.0.004124). Like DDH, the results are not expected to be exactly symmetric (the ANI of A to B is not necessarily the same as the ANI of B to A).
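To make the principle concrete, here is a toy sketch in Python. It is not the algorithm of FastANI or of any other published tool: the function name `toy_ani`, the 70% cut-off and the numbers are all invented for illustration. Each stretch is given as (aligned length, number of identical nucleotides); stretches below the "hidden" threshold are not counted as matches, and the result is the percentage of identical nucleotides in the stretches that remain.

```python
def toy_ani(stretches, min_identity=0.70):
    """Percentage identity over the stretches deemed to 'match'.

    stretches: list of (aligned_length, identical_nucleotides) tuples.
    min_identity: the hidden threshold below which a stretch is ignored
                  (0.70 is an arbitrary illustrative value).
    Returns None if no stretch passes the threshold.
    """
    kept = [(n, m) for n, m in stretches if n > 0 and m / n >= min_identity]
    total = sum(n for n, _ in kept)
    if total == 0:
        return None
    return 100.0 * sum(m for _, m in kept) / total


# Invented data: two well-conserved stretches and one poorly matching stretch.
a_vs_b = [(1000, 960), (1000, 940), (1000, 600)]

strict = toy_ani(a_vs_b)                    # third stretch is ignored
lenient = toy_ani(a_vs_b, min_identity=0.50)  # third stretch is counted
print(strict, lenient)
```

With these made-up numbers, the same pair of genomes scores 95% or about 83% depending only on where the hidden threshold is set, which is why that threshold deserves more attention than it usually gets.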


In my view, a phylogeny based on a sufficiently large number of core genes is a safer basis for taxonomy than ANI, particularly in bacteria such as rhizobia that have large and variable accessory genomes, but it is important to look at ANI because it provides a different perspective and is widely used in bacterial taxonomy. I chose to use FastANI (Jain et al. 2018, https://doi.org/10.1038/s41467-018-07641-9) because it really is fast and gives comparable results to other methods in the critical 90-100% range (Palmer et al. 2020).
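For readers who want to repeat this kind of comparison, here is a minimal sketch of reading FastANI results in Python. FastANI writes one tab-separated line per genome pair: query, reference, ANI, the number of query fragments that mapped, and the total number of query fragments. The helper `parse_fastani` is an illustrative name, not part of FastANI. The file names and fragment counts are invented placeholders, and treating USDA2370 as the reference rather than the query is an assumption made for the example.

```python
import csv
import io


def parse_fastani(text):
    """Parse FastANI tab-separated output into dicts, highest ANI first."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        query, reference, ani, mapped, total = fields[:5]
        rows.append({
            "query": query,
            "reference": reference,
            "ani": float(ani),
            # Fraction of query fragments that contributed to the ANI value.
            "mapped_fraction": int(mapped) / int(total),
        })
    return sorted(rows, key=lambda r: r["ani"], reverse=True)


# Placeholder file names and fragment counts; only the ANI values
# (R. alamii 83.16, strain L43 90.18) are taken from this post.
example = (
    "R_alamii.fna\tUSDA2370.fna\t83.16\t1000\t2000\n"
    "L43.fna\tUSDA2370.fna\t90.18\t1500\t2000\n"
)
ranked = parse_fastani(example)
for row in ranked:
    print(row["query"], row["ani"], row["mapped_fraction"])
```

Keeping the mapped fraction alongside the ANI is a design choice worth making: it shows how much of the genome each ANI value is actually based on.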

 

 

 

Here is a plot of ANI, ranked from highest to lowest, for 765 genomes labelled “Rhizobium” in GenBank, compared to USDA2370, the type strain of R. leguminosarum (which, in turn, is the type species of the genus Rhizobium). ANI does a good job of separating different levels of relatedness within Rhizobium, reflecting exactly the nested clades we observed in the phylogeny in post 3. USDA2370 belongs to genospecies E, and all other gsE genomes have ANI > 97.40. Then there is a clear gap before the gsD genomes (ANI 94.54 – 95.00). We know that gsD and gsE are closely related (Cavassim et al. 2020, https://doi.org/10.1099/mgen.0.000351). After another gap, the rest of the genospecies of the R. leguminosarum species cluster (Rlc) form a continuous group (ANI 92.15 – 93.52). There is another gap followed by 11 genomes of R. anhuiense (ANI 91.33 – 91.55). We saw in the phylogeny that this species is the sister group of the Rlc. There is then a large gap punctuated by a single spot. This is Rhizobium sp. L43 (ANI 90.18), a strain with no close relatives that was also the sister taxon of the Rlc-anhuiense clade in the phylogeny.
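The gap-spotting described above can be sketched in a few lines of Python: rank the values and report every drop between neighbours that exceeds a chosen size. This is only an illustration, not how the plot was made. The function name `find_gaps` and the 0.55 cut-off are arbitrary choices, and the list contains just the range endpoints quoted in the text plus two invented filler values standing in for the continuous Rlc group.

```python
def find_gaps(ani_values, min_gap):
    """Return (upper, lower) pairs where ranked ANI drops by more than min_gap."""
    ranked = sorted(ani_values, reverse=True)
    return [
        (upper, lower)
        for upper, lower in zip(ranked, ranked[1:])
        if upper - lower > min_gap
    ]


ani_to_type_strain = [
    95.00, 94.54,   # gsD range endpoints (from the text)
    93.52, 92.15,   # rest of the Rlc, range endpoints (from the text)
    93.00, 92.60,   # INVENTED filler values inside the Rlc range
    91.55, 91.33,   # R. anhuiense range endpoints (from the text)
    90.18,          # Rhizobium sp. L43 (from the text)
]

for upper, lower in find_gaps(ani_to_type_strain, min_gap=0.55):
    print(f"gap between {upper:.2f} and {lower:.2f}")
```

With this toy list, the breaks reported are gsD/rest of Rlc, Rlc/R. anhuiense and R. anhuiense/L43. The result depends on the chosen cut-off, so a plot of the ranked values remains the better way to judge which gaps are real.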

Tong et al. (2018, https://doi.org/10.1016/j.syapm.2018.03.001) have previously shown that R. anhuiense is the sister taxon of the Rlc, based on a phylogeny of 1458 shared genes, and their ANI and dDDH values support this. Their highest ANI and dDDH values for strain L43 were with R. anhuiense and the Rlc, but in their phylogeny it is not the closest neighbour of these taxa, so the affinities of this strain need more investigation (but not now, since it is clearly outside the Rlc). Incidentally, the phylogeny that Tong et al. obtained from concatenated atpD-glnII-recA (their Fig. 3) shows L43 within the Rlc, illustrating the hazards of basing taxonomy on just a few genes.

Returning to our ANI plot, the rest of the leguminosarum/etli clade (with multiple species) has ANI 86.68 – 89.32. After that there is a large drop to ANI 83.16 for R. alamii, the sister taxon of the leguminosarum-etli clade, reflecting a long branch in the tree. After that, there are no more breaks in ANI, even at the boundary of the genus Rhizobium. Indeed, there is some overlap in ANI values between strains within Rhizobium and those in related genera. This seems a little surprising, considering that the phylogeny shows a long, well-supported branch at the base of the genus, but ANI (and especially FastANI) is not very sensitive at these lower values. ANI also suffers from the general problem with pairwise distance measures, that slowly evolving lineages (with short branches) can give high similarity scores even if they are phylogenetically distant.
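The short-branch problem is easy to see in a toy example. The Python sketch below uses an invented three-taxon tree in which A and B are sister taxa, but B sits on a long branch, while the outgroup O sits on a short one. The distance between two tips is simply the sum of the branch lengths on the path between them. All names and branch lengths are made up, and real sequence identity is not a simple function of this distance, so treat it purely as an illustration.

```python
# Toy rooted tree stored as child -> (parent, branch_length).
# Newick equivalent: ((A:0.02,B:0.10):0.03,O:0.02); all lengths are invented.
TREE = {
    "A": ("ingroup", 0.02),
    "B": ("ingroup", 0.10),
    "ingroup": ("root", 0.03),
    "O": ("root", 0.02),
}


def path_to_root(node):
    """Return {ancestor: distance from node}, including the node itself."""
    distance = 0.0
    distances = {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        distance += length
        distances[parent] = distance
        node = parent
    return distances


def tip_distance(x, y):
    """Sum of branch lengths on the path between x and y."""
    from_x, from_y = path_to_root(x), path_to_root(y)
    # The most recent common ancestor gives the shortest combined path.
    return min(from_x[n] + from_y[n] for n in from_x if n in from_y)


print("A to sister B:  ", round(tip_distance("A", "B"), 2))
print("A to outgroup O:", round(tip_distance("A", "O"), 2))
```

In this tree, A is closer in distance to the outgroup O than to its own sister B, so a ranking by pairwise similarity would put them in the wrong order even though the tree itself is unambiguous.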

In summary, ANI fully corroborates the phylogeny and can identify clear breakpoints that define genospecies, the Rlc, and higher groupings within the genus Rhizobium. FastANI is not, however, useful for defining the genus boundary. In the following posts, we will look at each of the genospecies of the Rlc and the strains that are included in them.
