Tuesday, September 15, 2020

Rhizobium leguminosarum 17

Questions for you

 

So far, I have identified the Rhizobium leguminosarum species complex (Rlc) as a clearly defined cluster of over 400 genomes that can be split into 18 putative genospecies plus 7 single strains that have no close relatives. I used a phylogeny of 120 core genes made with FastTree, and average nucleotide identity (ANI) values calculated from whole genomes with fastANI. What else should we do to make a convincing and useful description of the Rlc? The aim is to define a set of well-supported genospecies that others can readily assign new strains to, and to set clear criteria for defining additional genospecies in the future.

 

1.     Should we make a phylogeny using a different phylogenetic method, or a different set of core genes? If so, which?

2.     Should we calculate pairwise genome similarity using a different metric, or different software to calculate ANI?

3.     Should we look at all the non-core genes, to identify sets of genospecies-specific genes?

4.     Should we look at recombination rates, to see whether these are higher within than between species? If so, how?

5.     Should we look at plasmid distributions?

6.     Does “species complex” convey the right level of divergence to describe the Rlc? How is the term “species complex” used for other groups of species, and how closely related are the species within them?

7.     What about the single strains with no close relatives? Are they just the first known members of additional genospecies, or are they some kind of short-lived ‘hybrid’ between species, or are they genomes that were not well assembled for some reason? How can we tell?

8.     What other questions do we need to answer?
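For question 7, one way to make "no close relatives" concrete is to group genomes by pairwise ANI and see which ones end up alone. The sketch below is only an illustration, not the procedure used for the results described here: it applies single-linkage clustering to a list of pairwise ANI values, and the function name `cluster_by_ani`, the genome names and the threshold in the usage note are all assumptions made for the example.

```python
def cluster_by_ani(genomes, ani_pairs, threshold):
    """Single-linkage clustering of genomes from pairwise ANI values.

    genomes   : iterable of genome names (every name used in ani_pairs
                is assumed to be listed here)
    ani_pairs : iterable of (genome_a, genome_b, ani_percent) tuples
    threshold : ANI value at or above which two genomes are linked;
                this is a free parameter, not a value from the post

    Returns (clusters, singletons): clusters is a list of sorted member
    lists with at least two genomes, singletons is a list of names.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative,
        # shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, ani in ani_pairs:
        if a != b and ani >= threshold:
            parent[find(a)] = find(b)

    members = {}
    for g in parent:
        members.setdefault(find(g), []).append(g)

    groups = sorted((sorted(m) for m in members.values()),
                    key=lambda m: (-len(m), m))
    clusters = [m for m in groups if len(m) > 1]
    singletons = [m[0] for m in groups if len(m) == 1]
    return clusters, singletons
```

With four made-up genomes, `cluster_by_ani(["a", "b", "c", "d"], [("a", "b", 97.5), ("b", "c", 96.2), ("a", "c", 94.0), ("c", "d", 90.0)], threshold=96.0)` puts a, b and c in one cluster and leaves d as a singleton. Note that a and c are joined through b even though their direct ANI is below the threshold. This chaining is a known property of single linkage, so a stricter rule (for example, requiring every pair in a cluster to pass) might suit genospecies definitions better.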

 

The results so far are based on the genomes available from NCBI on 25 July 2020. Since then, I have kept an eye on new releases, and 30 more genomes labelled “R. leguminosarum” have appeared. I checked them with fastANI: 19 are in the Rlc, in genospecies A, B, C and E, so I will add them to the final analyses. The other 11 are outside the Rlc, so we can add them to the list of mislabelled strains and forget about them. Here is the list.

 

Genome file                                      Genospecies
R._leguminosarum_DSM_106839_GCF_014202125.1.fna  E
R._leguminosarum_DSM_30141_GCF_014138565.1.fna   E
R._leguminosarum_RCAM0610_GCA_014189555.1.fna    E
R._leguminosarum_RCAM0626_GCA_014189575.1.fna    C
R._leguminosarum_RCAM1365_GCA_014189635.1.fna    A
R._leguminosarum_RCAM2802_GCA_014189655.1.fna    C
R._leguminosarum_SEMIA_4011_GCF_014205785.1.fna  not in Rlc
R._leguminosarum_SEMIA_4016_GCF_014200035.1.fna  not in Rlc
R._leguminosarum_SEMIA_4022_GCF_014200055.1.fna  not in Rlc
R._leguminosarum_SEMIA_4024_GCF_014200075.1.fna  not in Rlc
R._leguminosarum_SEMIA_4025_GCF_014207035.1.fna  not in Rlc
R._leguminosarum_SEMIA_415_GCF_014197955.1.fna   not in Rlc
R._leguminosarum_SEMIA_416_GCF_014197975.1.fna   E
R._leguminosarum_SEMIA_421_GCF_014198005.1.fna   not in Rlc
R._leguminosarum_SEMIA_422_GCF_014198335.1.fna   not in Rlc
R._leguminosarum_SEMIA_430_GCF_014198015.1.fna   not in Rlc
R._leguminosarum_SEMIA_445_GCF_014198115.1.fna   E
R._leguminosarum_SEMIA_449_GCF_014198095.1.fna   E
R._leguminosarum_SEMIA_459_GCF_014198415.1.fna   E
R._leguminosarum_SEMIA_460_GCF_014138515.1.fna   E
R._leguminosarum_SEMIA_463_GCF_014198545.1.fna   E
R._leguminosarum_SEMIA_475_GCF_014198665.1.fna   B
R._leguminosarum_SEMIA_481_GCF_014198655.1.fna   E
R._leguminosarum_SEMIA_482_GCF_014198705.1.fna   not in Rlc
R._leguminosarum_SEMIA_483_GCF_014198695.1.fna   E
R._leguminosarum_SEMIA_485_GCF_014198735.1.fna   E
R._leguminosarum_SEMIA_488_GCF_014206965.1.fna   E
R._leguminosarum_SEMIA_491_GCF_014198795.1.fna   not in Rlc
R._leguminosarum_SEMIA_498_GCF_014198195.1.fna   E
R._leguminosarum_SEMIA_499_GCF_014198835.1.fna   E
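For anyone who wants to repeat this kind of check on their own strains, here is a rough sketch of how a genospecies could be assigned from fastANI results. It assumes the standard tab-separated fastANI output (query, reference, ANI, matched fragments, total fragments) and takes the genospecies of the best-matching reference genome, provided the ANI reaches a cut-off. The function names, the `reference_genospecies` lookup and the `min_ani` cut-off are assumptions made for the illustration; this is not necessarily how the list above was produced, and no particular cut-off is implied.

```python
import csv
from collections import Counter


def read_fastani(path):
    """Read fastANI's tab-separated output file into a list of rows."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def best_hits(rows):
    """Return {query: (reference, ani)} with the highest-ANI hit per query."""
    best = {}
    for query, reference, ani, *_ in rows:
        ani = float(ani)
        if query not in best or ani > best[query][1]:
            best[query] = (reference, ani)
    return best


def assign_genospecies(rows, reference_genospecies, min_ani):
    """Label each query with the genospecies of its best reference hit.

    reference_genospecies : hypothetical lookup, {reference file: label}
    min_ani               : cut-off chosen by the user (an assumption here)

    Queries whose best hit falls below min_ani, or whose best reference
    has no label, are reported as "unassigned" rather than guessed.
    """
    assignments = {}
    for query, (reference, ani) in best_hits(rows).items():
        label = reference_genospecies.get(reference)
        if label is None or ani < min_ani:
            label = "unassigned"
        assignments[query] = label
    return assignments


def tally(assignments):
    """Count how many queries received each label."""
    return Counter(assignments.values())
```

A typical use would be `tally(assign_genospecies(read_fastani("new_vs_refs.txt"), reference_genospecies, min_ani=...))`, where the file name is a placeholder and the cut-off is left for the user to choose. An "unassigned" genome is not necessarily outside the Rlc: it may simply be more distant from every labelled reference than the chosen cut-off allows.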

 

 

There is also a new R. laguerreae, but it is just another version of the type strain under a different name. There is a strain, R. sp. WYCCWR11317, that is a new member of gsS. There is a corrected UPM1135. If anybody knows of other new accessions within the Rlc, or is aware of important new genomes that are just about to be made public, please let me know.

 

I hope I still have some readers out there to answer these questions, because this is a project that is important for the whole community of researchers who study R. leguminosarum and its relatives, and I would like to create a publication that will have wide support. I look forward to being overwhelmed by all your comments!
