Sunday, August 2, 2020

Towards an understanding of Rhizobium leguminosarum

I have been trying to understand the diversity of Rhizobium leguminosarum for most of my career. When I first started work on R. leguminosarum in the 1980s, our concept of this species was in the middle of a radical change. Until then, Rhizobium species had been defined by their host range, so strains nodulating peas and related legumes were R. leguminosarum, those nodulating clover were R. trifolii, and those nodulating Phaseolus beans were R. phaseoli. However, the taxonomic evidence indicated that they were not really distinct apart from their host range, so D. C. Jordan proposed (in Bergey’s Manual, 1984) that they be amalgamated as three biovars within a single species, R. leguminosarum. My colleagues at the John Innes Institute had provided genetic evidence to support this, since they had converted “R. phaseoli” and “R. trifolii” into “R. leguminosarum” by transferring a symbiosis plasmid (Johnston et al. 1978. Nature 276, 634–636). My own contribution was to provide evidence from population genetics: I showed that isolates from pea, clover and bean shared the same pool of genetic variation (Young 1985, https://doi.org/10.1099/00221287-131-9-2399). The concept of biovars as symbiosis phenotypes that can be transferred horizontally between strains has become firmly established and we see that it applies to rhizobia in general, not just R. leguminosarum. They are now called ‘symbiovars’ because they are specifically concerned with symbiosis, and it is likely that other ‘adaptive packages’ exist.

Much has changed since then. In 1993, R. etli was split off from R. leguminosarum (Segovia et al. 1993, https://doi.org/10.1099/00207713-43-2-374), and many other species in the leguminosarum-etli clade have been defined since. In 2006, we published the first genome sequence of an R. leguminosarum strain (Young et al. 2006, https://doi.org/10.1186/gb-2006-7-4-r34). The sequence was determined by an expert team at the Sanger Institute, using Sanger sequencing (of course). It took three years and cost £350 000. We never really considered sequencing the type strain because it was geneticists who needed the genome sequence, not taxonomists, and nobody had ever used the type strain for any genetic study. We discussed whether to sequence the strain 8401pRL1 (now called A34), because this had been used most extensively for genetic studies, but it was one of the “artificial” strains created by substituting the symbiosis plasmid, and I argued that we needed something closer to “natural” as the representative of the species, so we settled on strain 3841, which is derived from the field isolate 300 with just a spontaneous mutation to streptomycin resistance.

That genome sequence has been a very important reference point, but at that high cost we were clearly not going to get a lot more genomes. The next game-changer was the development of next-generation sequencing. In late 2005, when the genome sequence had been submitted for publication, I wrote a grant proposal to explore the genes shared by different rhizobia that were living together, using microarrays. By the time this work was eventually funded, our university had acquired a 454 sequencer, and my postdoc Xavier Bailly was smart enough to suggest that we divert the project from microarrays to genome sequencing. We published the Sinorhizobium medicae part of the study in 2011 (Bailly et al. 2011, http://dx.doi.org/10.1038/ismej.2011.55), but the R. leguminosarum results were more complex and did not appear until 2015 (Kumar et al. 2015, http://dx.doi.org/10.1098/rsob.140133). Despite very low genome coverage (all we could afford!), they delivered a new view of R. leguminosarum as a complex of related but distinct genospecies. Later work has shown that the five genospecies that we found in a single square metre are, in fact, widespread in Europe (Cavassim et al. 2020, https://doi.org/10.1099/mgen.0.000351) and that, unsurprisingly, there are more than five genospecies in the complex (Boivin et al. 2020, https://doi.org/10.1111/nph.16392).

As a result of recent hard work by many different people, there are now more than 800 Rhizobium genomes in the public databases, of which about half are in the R. leguminosarum complex. We are in the midst of another revolution in our understanding of this species, and I think the time is right to develop a new definition of the R. leguminosarum complex (Rlc) and the genospecies within it, based on genome sequences.

I have already started some analyses of the available genome data, and I aim to share some of these in future posts to this blog. Eventually, I will assemble the more convincing analyses into a formal publication, but I see this as a joint enterprise by the R. leguminosarum research community, and I expect the publication to have many authors. This is, if you like, an experiment in “open research”, and I welcome comments, criticisms and suggestions, as well as offers of help, from anyone who is interested in joining in.

In my next post, I will look at the set of genomes that is available.

These posts are also available on my rhizobium blog at https://rhizobium.wordpress.com/
