Questions for you
So far, I have identified the Rhizobium leguminosarum species complex (Rlc) as a clearly defined cluster of more than 400 genomes that can be split into 18 putative genospecies plus 7 single strains that have no close relatives. I used a phylogeny of 120 core genes made with FastTree, and Average Nucleotide Identity (ANI) values calculated from whole genomes with fastANI. What else should we do to make a convincing and useful description of the Rlc? The aim is to define a set of well-supported genospecies that others can readily assign new strains to, and to set clear criteria for defining additional genospecies in the future.
1. Should we make a phylogeny using a different phylogenetic method, or a different set of core genes? If so, which?
2. Should we calculate pairwise genome similarity using a different metric, or different software to calculate ANI?
3. Should we look at all the non-core genes, to identify sets of genospecies-specific genes?
4. Should we look at recombination rates, to see whether these are higher within than between species? If so, how?
5. Should we look at plasmid distributions?
6. Does “species complex” convey the right level of divergence to describe the Rlc? How is the term “species complex” used for other groups of species, and how closely related are the species within them?
7. What about the single strains with no close relatives? Are they just the first known members of additional genospecies, or are they some kind of short-lived ‘hybrid’ between species, or are they genomes that were not well assembled for some reason? How can we tell?
8. What other questions do we need to answer?
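To make the idea of "clear criteria" a little more concrete, here is a minimal Python sketch of one possible criterion: grouping genomes by single-linkage clustering of pairwise ANI values at a fixed cutoff. This is only an illustration, not the procedure used to define the 18 putative genospecies described here. The function name `cluster_by_ani`, the genome names and the 96.0 cutoff are all placeholders chosen for the example.

```python
def cluster_by_ani(pairs, genomes, threshold):
    """Single-linkage clustering: genomes joined by any ANI >= threshold.

    pairs     -- iterable of (genome_a, genome_b, ani_percent)
    genomes   -- iterable of all genome names (singletons included)
    threshold -- ANI cutoff in percent (a placeholder, not a recommended value)
    """
    parent = {g: g for g in genomes}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, ani in pairs:
        root_a, root_b = find(a), find(b)
        if ani >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for g in list(parent):
        clusters.setdefault(find(g), []).append(g)
    # Largest clusters first, members sorted for a stable result.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Invented values, purely for illustration.
example_pairs = [
    ("g1", "g2", 97.0),
    ("g2", "g3", 96.5),
    ("g1", "g3", 95.0),
    ("g4", "g1", 90.0),
]
example_clusters = cluster_by_ani(example_pairs, ["g1", "g2", "g3", "g4"], 96.0)
print(example_clusters)
```

Note that single linkage puts g1 and g3 in the same group even though their direct ANI is below the cutoff, because both are linked through g2. Whether that chaining behaviour is acceptable is exactly the kind of decision a published criterion would need to spell out.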
The results so far are based on the genomes available from NCBI on 25 July 2020. I have kept an eye on new releases, and an additional 30 genomes labelled “R. leguminosarum” have appeared since then. I checked them with fastANI: 19 are in the Rlc, in genospecies A, B, C and E, so I will add them to the final analyses. The other 11 are outside the Rlc, so we can add them to the list of mislabelled strains and forget about them. Here is the list.
Genome file | Genospecies
R._leguminosarum_DSM_106839_GCF_014202125.1.fna | E
R._leguminosarum_DSM_30141_GCF_014138565.1.fna | E
R._leguminosarum_RCAM0610_GCA_014189555.1.fna | E
R._leguminosarum_RCAM0626_GCA_014189575.1.fna | C
R._leguminosarum_RCAM1365_GCA_014189635.1.fna | A
R._leguminosarum_RCAM2802_GCA_014189655.1.fna | C
R._leguminosarum_SEMIA_4011_GCF_014205785.1.fna | not in Rlc
R._leguminosarum_SEMIA_4016_GCF_014200035.1.fna | not in Rlc
R._leguminosarum_SEMIA_4022_GCF_014200055.1.fna | not in Rlc
R._leguminosarum_SEMIA_4024_GCF_014200075.1.fna | not in Rlc
R._leguminosarum_SEMIA_4025_GCF_014207035.1.fna | not in Rlc
R._leguminosarum_SEMIA_415_GCF_014197955.1.fna | not in Rlc
R._leguminosarum_SEMIA_416_GCF_014197975.1.fna | E
R._leguminosarum_SEMIA_421_GCF_014198005.1.fna | not in Rlc
R._leguminosarum_SEMIA_422_GCF_014198335.1.fna | not in Rlc
R._leguminosarum_SEMIA_430_GCF_014198015.1.fna | not in Rlc
R._leguminosarum_SEMIA_445_GCF_014198115.1.fna | E
R._leguminosarum_SEMIA_449_GCF_014198095.1.fna | E
R._leguminosarum_SEMIA_459_GCF_014198415.1.fna | E
R._leguminosarum_SEMIA_460_GCF_014138515.1.fna | E
R._leguminosarum_SEMIA_463_GCF_014198545.1.fna | E
R._leguminosarum_SEMIA_475_GCF_014198665.1.fna | B
R._leguminosarum_SEMIA_481_GCF_014198655.1.fna | E
R._leguminosarum_SEMIA_482_GCF_014198705.1.fna | not in Rlc
R._leguminosarum_SEMIA_483_GCF_014198695.1.fna | E
R._leguminosarum_SEMIA_485_GCF_014198735.1.fna | E
R._leguminosarum_SEMIA_488_GCF_014206965.1.fna | E
R._leguminosarum_SEMIA_491_GCF_014198795.1.fna | not in Rlc
R._leguminosarum_SEMIA_498_GCF_014198195.1.fna | E
R._leguminosarum_SEMIA_499_GCF_014198835.1.fna | E
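For anyone who wants to run the same kind of check on their own strains, the following Python sketch shows one way a fastANI result table could be turned into assignments like those listed above. It relies on fastANI's tab-separated output (query, reference, ANI, matched fragments, total fragments). Everything else is an assumption made for illustration: the function names, the reference-to-genospecies mapping, the file names in the demo, and the 96.0 cutoff, which is a placeholder rather than a threshold stated in this post.

```python
import csv
import io

# Placeholder cutoff in percent; not a value taken from this post.
ANI_THRESHOLD = 96.0


def best_hits(fastani_tsv):
    """Return {query: (reference, ani)} with the highest-ANI hit per query."""
    best = {}
    for row in csv.reader(fastani_tsv, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, reference, ani = row[0], row[1], float(row[2])
        if query not in best or ani > best[query][1]:
            best[query] = (reference, ani)
    return best


def assign_genospecies(fastani_tsv, ref_genospecies, queries,
                       threshold=ANI_THRESHOLD):
    """Label each query with the genospecies of its best reference hit.

    Queries with no hit, a best hit below the threshold, or a best hit to
    an unlabelled reference are reported as "unassigned".
    """
    hits = best_hits(fastani_tsv)
    result = {}
    for q in queries:
        hit = hits.get(q)
        if hit and hit[1] >= threshold and hit[0] in ref_genospecies:
            result[q] = ref_genospecies[hit[0]]
        else:
            result[q] = "unassigned"
    return result


# Invented demo data: hypothetical file names and ANI values.
demo_tsv = (
    "new1.fna\trefA.fna\t97.5\t1500\t1600\n"
    "new1.fna\trefE.fna\t94.1\t1400\t1600\n"
    "new2.fna\trefE.fna\t88.0\t900\t1600\n"
)
demo_refs = {"refA.fna": "A", "refE.fna": "E"}
demo_result = assign_genospecies(
    io.StringIO(demo_tsv), demo_refs, ["new1.fna", "new2.fna", "new3.fna"]
)
print(demo_result)
```

The list of queries is passed in explicitly because fastANI may report nothing at all for very divergent genome pairs, so a query can be missing from the output file. "Unassigned" here only means that no labelled reference passed the cutoff. Deciding whether such a strain is outside the Rlc or is a singleton within it needs a separate check.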
There is also a new R. laguerreae genome, but it is just another version of the type strain under a different name. One strain, R. sp. WYCCWR11317, is a new member of gsS, and there is a corrected version of UPM1135. If anybody knows of other new accessions within the Rlc, or is aware of important new genomes that are just about to be made public, please let me know.
I hope I still have some readers out there to answer these questions, because this project is important for the whole community of researchers who study R. leguminosarum and its relatives, and I would like to create a publication that will have wide support. I look forward to being overwhelmed by all your comments!