Friday, August 28, 2020

Rhizobium leguminosarum 11

 


Genospecies B

 

We are still working our way through the previously described genospecies of the Rlc, and have reached gsB. Here is the part of the phylogeny that includes this genospecies.

 

 



In the centre is a tight clade on a long branch with a large number of strains that are definitely gsB. At the top are two strains, WSM1455 and WSM1481, that are the sister group of this clade, and it is not immediately clear whether we should include them within gsB or keep them separate. Let’s call them clade J. At the bottom is a clade of six strains whose relationship to gsB is neither very close nor very certain, so they probably need to be considered a new genospecies, but we’ll call them clade K for now. We need ANI values to help us with these decisions. Here are ANI values using strain 3841 as reference.

 



The core of gsB is very tight: all strains are above 98% ANI. The pair of strains in clade J, WSM1481 and WSM1455, have ANI of 95.99 and 95.96, right on the 96% boundary, so there is no clear verdict on whether to include them in gsB or not. Strains in clade K have ANI between 95.05 and 95.61, which could be used to justify a separate genospecies.
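The comparison against the 96% boundary can be sketched in a few lines of Python. This is only an illustration using the values quoted above. The `margin` that decides what counts as "right on the boundary" is a hypothetical tolerance chosen for this example, not part of any established rule, and the clade K entries are the two ends of the quoted range rather than named strains.

```python
# ANI (%) to reference strain 3841, as quoted in the text above.
ani_to_3841 = {
    "WSM1481": 95.99,
    "WSM1455": 95.96,
    "clade K (highest)": 95.61,
    "clade K (lowest)": 95.05,
}


def classify_by_ani(ani, threshold=96.0, margin=0.1):
    """Label an ANI value relative to a genospecies boundary.

    threshold is the 96% boundary mentioned in the text; margin is a
    hypothetical tolerance used only for this illustration.
    """
    if abs(ani - threshold) <= margin:
        return "borderline"
    return "within" if ani > threshold else "outside"


labels = {name: classify_by_ani(value) for name, value in ani_to_3841.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

With this tolerance the two clade J strains come out as borderline and the clade K values as outside, which mirrors the reading given above. A different margin would move that line, so the labels are a prompt for judgement rather than a verdict.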

 

Taking WSM1455 as the reference for clade J, WSM1481 has an ANI of 98.69, core gsB strains range from 95.78 to 96.06, clade K from 95.11 to 95.61. Again, we see that clade J is right at the boundary of gsB. My instinct is to exclude this clade from gsB, because the core gsB strains form such a tight cluster that they probably share many genospecies-specific characters that clade J strains do not have. For now, I will treat clade J as a new genospecies, gsJ.

 

For clade K, I took FA23 (first to be sequenced, sv. phaseoli) as the reference. The four JHI strains in clade K have ANI between 97.37 and 98.37, but for Vaf12 the value is just 95.68, lower than for some core gsB strains. This suggests that we should consider clade K as a new genospecies, gsK, but exclude Vaf12, which stands by itself.

 

Thus we have representatives of four potential genospecies in the phylogenetic clade that we are considering: gsB, gsJ, gsK and strain Vaf12. I am not going to give genospecies names if there is only a single representative. These strains already have a name, so little is gained by giving them an extra one.

 

The gsB strains are closely related, but they have been isolated in numerous different projects and places. The SM strains in gsB are all sv. trifolii from one site in western England. Strain 22B is also sv. trifolii, but from Russia. I think all the other strains are sv. viciae. The JHI strains (from the James Hutton Institute) are from the UK, especially Scotland, while the strains contributed by Stéphane Boivin and Marc Lepetit, with complicated names like RSP1E6, are from various countries in continental Europe. Then there are a couple of CCBAU strains from China. Finally, there are two strains that have supported important genetic studies for many years. VF39 is from Germany, and 3841 is the strain from eastern England that has served as the R. leguminosarum reference genome since we published the sequence in 2006.

 

Here are lists of the strains we have been discussing, ranked by ANI similarity to 3841. Potential type strains are in bold.

 

gsB

R._leguminosarum_3841.fna

R._leguminosarum_JHI960.fna

R._leguminosarum_JHI963.fna

R._leguminosarum_SM30_gsB.fna

R._leguminosarum_SM35_gsB.fna

R._leguminosarum_SM37_gsB.fna

R._leguminosarum_SM40_gsB.fna

R._leguminosarum_SM24_gsB.fna

R._leguminosarum_SM31_gsB.fna

R._leguminosarum_SM10_gsB.fna

R._leguminosarum_SM38_gsB.fna

R._leguminosarum_SM39_gsB.fna

R._leguminosarum_SM20_gsB.fna

R._leguminosarum_SM5_gsB.fna

R._leguminosarum_SM6_gsB.fna

R._leguminosarum_SM21_gsB.fna

R._leguminosarum_SM15_gsB.fna

R._leguminosarum_SM27_gsB.fna

R._leguminosarum_SM14_gsB.fna

R._leguminosarum_SM16_gsB.fna

R._leguminosarum_SM18_gsB.fna

R._leguminosarum_SM22_gsB.fna

R._leguminosarum_SM25_gsB.fna

R._leguminosarum_SM19_gsB.fna

R._leguminosarum_JHI974.fna

R._leguminosarum_JHI973.fna

R._leguminosarum_SM13_gsB.fna

R._leguminosarum_SM34_gsB.fna

R._leguminosarum_SM3_gsB.fna

R._leguminosarum_SPF4F3.fna

R._leguminosarum_JHI1600.fna

R._leguminosarum_JHI1587.fna

R._leguminosarum_P1NP2H.fna

R._leguminosarum_RSF2G1.fna

R._leguminosarum_SM36_gsB.fna

R._leguminosarum_JHI535.fna

R._leguminosarum_JHI585.fna

R._leguminosarum_P1NP1J.fna

R._leguminosarum_SM11_gsB.fna

R._leguminosarum_JHI1415.fna

R._leguminosarum_SM7_gsB.fna

R._leguminosarum_SM4_gsB.fna

R._leguminosarum_JHI13.fna

R._leguminosarum_VF39.fna

R._leguminosarum_SM12_gsB.fna

R._leguminosarum_SM32_gsB.fna

R._leguminosarum_SM9_gsB.fna

R._leguminosarum_SM17_gsB.fna

R._leguminosarum_22B.fna

R._leguminosarum_RSP1E6.fna

R._leguminosarum_P1NP2K.fna

R._leguminosarum_CCBAU65264.fna

R._leguminosarum_CCBAU03058.fna

 

gsJ

R._leguminosarum_WSM1481.fna

R._leguminosarum_WSM1455.fna

 

gsK

R._leguminosarum_JHI10.fna

R._leguminosarum_FA23.fna

R._leguminosarum_JHI2450.fna

R._leguminosarum_JHI2451.fna

R._leguminosarum_JHI54.fna

 

unique

R._leguminosarum_Vaf12.fna

 

 

Wednesday, August 26, 2020

Rhizobium leguminosarum 10

Genospecies A

Here is the part of the tree that includes genospecies A.

 


The majority of the strains are SM strains isolated by Sara Moeskjær in our NCHAIN project, and we already know that these are gsA (Cavassim et al. 2020, https://doi.org/10.1099/mgen.0.000351). The question is how many of the other strains should be included in the genospecies. What about WSM78? What about that clade of five strains at the top? ANI can help us to answer these questions. Here is a plot of ANI for the Rlc, using CC275e as the reference.

 


There is a large gap in the ANI values between WSM78 (96.44) and SRDI943 (94.74), so WSM78 is definitely in gsA and the other five strains are definitely out. It is clear that they form a new genospecies – let’s call it genospecies H and make WSM1325 the reference strain, since this was one of the earliest Rlc strains to be sequenced and has a finished genome sequence. I haven’t calculated ANI within gsH, but it is obvious from the phylogeny that the strains are highly similar and the values are going to be high.
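Looking for the widest drop in a ranked list of ANI values is easy to automate. The sketch below is one possible way to do it, not the method used to make the plot. The function name `largest_gap` is made up for this example, and only the two values quoted above are used, since the full ANI table is not reproduced here.

```python
def largest_gap(ani_by_strain):
    """Return (strain_above, strain_below, gap) for the widest drop
    between consecutive strains when they are ranked by ANI."""
    ranked = sorted(ani_by_strain.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two strains")
    # Pair each strain with the next one down the ranking and keep the
    # pair with the largest difference in ANI.
    best = max(zip(ranked, ranked[1:]), key=lambda pair: pair[0][1] - pair[1][1])
    (above, high), (below, low) = best
    return above, below, round(high - low, 2)


# ANI (%) to reference strain CC275e, as quoted in the text above.
ani_to_cc275e = {"WSM78": 96.44, "SRDI943": 94.74}

above, below, gap = largest_gap(ani_to_cc275e)
print(f"Largest gap: {gap} between {above} and {below}")
```

On a complete table, the value for the reference against itself would need to be left out, and a large gap inside a genospecies could still mislead, so the result is best checked against the phylogeny.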

 

One interesting observation about gsA is that all the genome sequences available so far are of symbiovar trifolii. All the SM strains in gsA were from white clover in Denmark. CC275e was isolated in Australia from white clover, 9B in Russia (red clover), T88 in Colombia (red clover), and WSM78 in Australia (unspecified host, but it has a trifolii nodD sequence).

 

Here is a list of the strains in gsA:

R._leguminosarum_CC275e.fna

R._leguminosarum_SM138A_gsA.fna

R._leguminosarum_SM145C_gsA.fna

R._leguminosarum_SM140B_gsA.fna

R._leguminosarum_SM128A_gsA.fna

R._leguminosarum_SM128B_gsA.fna

R._leguminosarum_SM131_gsA.fna

R._leguminosarum_9B.fna

R._leguminosarum_SM146A_gsA.fna

R._leguminosarum_SM123_gsA.fna

R._leguminosarum_SM154A_gsA.fna

R._leguminosarum_SM154B_gsA.fna

R._leguminosarum_SM151B_gsA.fna

R._leguminosarum_SM151A_gsA.fna

R._leguminosarum_SM163B_gsA.fna

R._leguminosarum_SM138B_gsA.fna

R._leguminosarum_SM152A_gsA.fna

R._leguminosarum_SM154C_gsA.fna

R._leguminosarum_SM144B_gsA.fna

R._leguminosarum_SM137B_gsA.fna

R._leguminosarum_SM152B_gsA.fna

R._leguminosarum_SM152C_gsA.fna

R._leguminosarum_SM155B_gsA.fna

R._leguminosarum_SM130B_gsA.fna

R._leguminosarum_SM155A_gsA.fna

R._leguminosarum_SM144A_gsA.fna

R._leguminosarum_SM136A_gsA.fna

R._leguminosarum_SM163A_gsA.fna

R._leguminosarum_SM145A_gsA.fna

R._leguminosarum_SM155C_gsA.fna

R._leguminosarum_SM146B_gsA.fna

R._leguminosarum_SM140A_gsA.fna

R._leguminosarum_SM145B_gsA.fna

R._leguminosarum_SM130A_gsA.fna

R._leguminosarum_WSM78.fna

 

Here are the strains in gsH:

R._leguminosarum_SRDI943.fna

R._leguminosarum_WSM1325.fna

R._leguminosarum_CB2179.fna

R._leguminosarum_WSM1328.fna

R._leguminosarum_WSM409.fna

 


Tuesday, August 25, 2020

Rhizobium leguminosarum 9

 

How do we define the Rhizobium leguminosarum species complex?

Before continuing our tour of the Rhizobium leguminosarum species complex (Rlc), I want to pause to consider what we are trying to achieve. I was prompted to address this important question by a comment that Stéphane Boivin and Marc Lepetit have made on my recent post Rhizobium leguminosarum 5. Here is their comment:

We are enthusiastic about the idea of clarifying/defining the boundaries of the leguminosarum complex species. This will certainly help the community. There is no doubt that you are the best expert to solve this question. We will be pleased to help, if necessary, although we are specialists in systematic.
We read you posts with great interest. Thank you to give us the historical perspective and for initiating the debate for a new rationale organization of R leguminosarum , based on our current knowledge.
In your first tree you suggest to restrict Rlc to the large green clade. What do you think about extending this complex to the Anhuiense Gs and more generally to all related bacteria that nodulate clover, pea, fababean and bean that share closely related symbiotic clusters on symbiotic plasmid? The general objective might be to define a large leguminosarum complex gathering all “leguminosarum” symbiovars? We know, for example, that bacteria of the symbiovar viciae may belong to R anuihense R pisi or R binae… It is apparently the same story for symbiovars trifoli or phaseoli… To our understanding, Anhuiense Gs were defined as separate species only because their ANI with R leguminosarum Gs are lowers than the arbitrary limit. But, up to now there is little evidence suggesting that their symbiotic characteristics may strongly differ from the other leguminosarum (ie nodulating clover, pea, faba etc and possibly associated as PGP with non legume plants). To our knowledge, there is no clear rule and/or ANI threshold to define a complex species boundaries. Is this aim completely heterodox or unrealistic?

Thank you for your comment, and for pointing out, quite rightly, that the nodulation genes that define symbiovars viciae, trifolii and phaseoli have a wider host range than just the “green clade”. Before discussing this, I want to make a clear distinction between phenotypic classifications and taxonomic classifications. Taxonomy is, or should be, based on phylogenetic clades – each consisting of an ancestral organism and all its descendants. Each species should be a clade within its genus, each genus a clade within its family, and so on. Fortunately, we know that bacteria do have a single true phylogeny – every bacterial cell arises by division of exactly one parent cell. If we could see all the cell divisions since the last common ancestor, we would know the true phylogeny. Unfortunately, we can’t, so we try to reconstruct it from the sequences of genes. We choose those genes that are most likely to have been handed down vertically from parent to offspring, namely core genes with essential functions. Even these may occasionally be replaced by versions that come into the cell from other bacteria, but such transfers usually only affect one or a few genes at a time, in one lineage at a time, so we hope that if we use a large number of core genes, such disturbances will be diluted out and we will have an approximation to the true phylogeny.

In bacteria, many important adaptations to the environment are not gained by mutation of the core genes, but by acquisition of functional modules encoded by ready-made sets of genes that are transferred from other bacteria. The evolutionary history of these genes is clearly not going to match that of the bacterial cells, so they cannot be used for phylogeny-based taxonomy. In rhizobia, the symbiosis genes are the best-known example of an accessory module of this kind, and their history of horizontal transfer has been extensively documented (see Andrews et al. 2018 https://doi.org/10.3390/genes9070321).  That is why the taxonomy subcommittee pointed out, in its Minimal Standards document (de Lajudie et al. 2019 https://doi.org/10.1099/ijsem.0.003426), that symbiotic properties, though certainly interesting and important phenotypes, cannot be used as taxonomic characters when defining new rhizobial taxa.

If I understand correctly, you are proposing that we should define the R. leguminosarum species complex as including all species in which one or more of the symbiovars viciae, trifolii or phaseoli have been found. This could potentially be the whole leguminosarum-etli clade, which includes R. leguminosarum, laguerreae, sophorae, indicum, etli, anhuiense, ecuadorense, acidisoli, hidalgonense, vallis, pisi (syn. fabae), bangladeshense, sophoriradices, phaseoli, esperanzae, aethiopicum (syn. aegypticum), binae, lentis, and probably others. This clade is one of the major subdivisions of the genus Rhizobium and could perhaps be given a formal name as a subgenus. It is the clade with blue branches in the phylogeny I showed in the Rhizobium leguminosarum 3 post, and is a very distinct group on a long, well-supported branch. There is no guarantee, of course, that all species in this clade will match your phenotypic criterion: there may be species not yet described that do not, or even cannot, form a root nodule symbiosis. From what we know so far, though, it seems that the ability to host and express these particular sets of symbiosis genes may be a shared property of the species within this clade.

This leguminosarum-etli clade is much wider than the grouping that I want to focus on at the moment, which mostly consists of strains that are still called R. leguminosarum. You are right that there is no clear rule to define a species complex, because this is not a formal taxonomic category, although the term has been used in other bacterial groups, most notably for the Burkholderia cepacia complex, known as the Bcc (e.g. Mahenthiralingam et al. 2005 https://www.nature.com/articles/nrmicro1085). The Rlc is a unit of a comparable size to the Bcc (I should probably drop the italics and just call it the Rlc). It consists of a number of genospecies that are closely related but diverged enough that they meet the customary criteria for defining separate species. Indeed, several of them have already been given their own species names, but so far this has been done one by one in a haphazard way with no overall view of their place within the Rlc.

To summarise, the group that I am currently concerned with and am defining as the Rlc is the clade on a green background in the phylogeny I showed in the Rhizobium leguminosarum 3 post. It is an appropriate size to be called a species complex. The larger group that you are interested in is the whole leguminosarum-etli clade, shown as blue branches in the phylogeny. This is of an appropriate size to be defined as a subgenus, though I am not proposing to do that right now (it would require that the rest of the genus was also split into named subgenera). This larger grouping definitely needs a lot of work to clarify its structure, but that is work for the future. My immediate aim is to tackle the smaller grouping first.

Friday, August 21, 2020

Rhizobium leguminosarum 8

 

Genospecies D

The sister group of gsE looks like this:


There is a tight clade that includes the 5 strains from clover in Denmark that were identified as genospecies D in the study by Cavassim et al. (2020, https://doi.org/10.1099/mgen.0.000351), as well as WSM80 and WSM448 (from unspecified root nodules in Australia) and L145 (lentil in France). Then there are two strains, CC278f (clover in USA) and Norway (Lotus in Norway) that are more distant from this group and from each other.


The ANI plot for all strains in the Rlc looks like this, using SM51 as the reference strain:

 

The lowest ANI in the tight clade is 98.77, whereas ANI for Norway is 95.62 and for CC278f 94.99 (lower than for some gsE strains). Although the phylogeny places them on the gsD branch, these two outlying strains are about as similar to gsE as they are to gsD: against the gsE reference USDA2370, ANI for Norway is 94.54 and for CC278f it is 95.00. Between the two strains, ANI is 95.81 or 95.74 (depending on direction). I conclude that these two strains cannot be placed in gsD or gsE, and possibly represent two separate species.
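The reasoning for the two outlying strains can be written out as a small Python sketch. It assumes a 96% ANI species boundary, which is not stated in this post and is used here only as an illustrative cut-off. The function names are invented for the example, and averaging the two directions is just one simple way to summarise a reciprocal comparison.

```python
THRESHOLD = 96.0  # assumed species boundary (%), not stated in this post

# ANI (%) of each outlying strain to the gsD and gsE reference strains,
# as quoted in the text above.
ani_to_refs = {
    "Norway": {"gsD (SM51)": 95.62, "gsE (USDA2370)": 94.54},
    "CC278f": {"gsD (SM51)": 94.99, "gsE (USDA2370)": 95.00},
}


def assign_genospecies(ani_by_ref, threshold=THRESHOLD):
    """Return the best-matching reference, or None if even the best
    match falls below the threshold."""
    best_ref = max(ani_by_ref, key=ani_by_ref.get)
    return best_ref if ani_by_ref[best_ref] >= threshold else None


def reciprocal_ani(a_to_b, b_to_a):
    """Average the two directions of a pairwise ANI comparison."""
    return (a_to_b + b_to_a) / 2


assignments = {strain: assign_genospecies(v) for strain, v in ani_to_refs.items()}
pair_ani = reciprocal_ani(95.81, 95.74)  # Norway vs CC278f, both directions

print(assignments)
print(f"Norway vs CC278f below threshold: {pair_ani < THRESHOLD}")
```

Under this assumed cut-off neither strain is assigned to gsD or gsE, and the pair also falls below the boundary with respect to each other, which is the basis for treating them as possibly two separate species.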