16S: the full story
The 16S ribosomal RNA sequences of the type strains of Rhizobium laguerreae, R. sophorae, R. ruizarguesonis and R. indicum are all identical to that of the type strain of R. leguminosarum. Even the type strain of the sister taxon R. anhuiense has the same sequence. From this, it would be reasonable to guess that all members of the Rlc had this sequence, but the truth is very different. In fact, I found 18 distinct 16S sequences among the available genomes – though these certainly do not correspond to the 18 genospecies. That does not include a further 5 variants that were only found in a single strain and differed by a single nucleotide from a common variant, which I discounted on the grounds that they might be sequencing errors. There were also three genome assemblies that had no 16S sequence, and three more in which it was incomplete – clearly these are errors in the assembly, since 16S is essential.
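For anyone who wants to repeat this kind of tally, here is a minimal sketch of the counting and filtering logic described above. It is not the script used for these numbers. It assumes the 16S sequences have already been extracted and aligned to a common length, and the function names, the `min_common` threshold and the toy sequences are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count differing columns between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_variants(seqs_by_strain, min_common=2):
    """Split distinct sequences into kept variants and discounted ones.

    A variant is discounted when it occurs in exactly one strain AND differs
    by exactly one nucleotide from a variant seen in at least `min_common`
    strains (an assumed definition of "common").
    """
    counts = Counter(seqs_by_strain.values())
    common = [seq for seq, n in counts.items() if n >= min_common]
    kept, discounted = {}, {}
    for seq, n in counts.items():
        near_common = any(hamming(seq, c) == 1 for c in common)
        if n == 1 and near_common:
            discounted[seq] = n  # possible sequencing error
        else:
            kept[seq] = n
    return kept, discounted


# Toy data only: strain name -> aligned 16S sequence (hypothetical).
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGT",
    "s3": "ACGTACGT",
    "s4": "ACGAACGT",  # singleton, one mismatch from the common variant
    "s5": "TCGTACCA",
    "s6": "TCGTACCA",
    "s7": "GGGTACGA",  # singleton, but several mismatches from any common variant
}
kept, discounted = classify_variants(toy)
```

With the toy data, `kept` holds three variants and `discounted` holds the one singleton that sits a single mismatch away from a common variant. Assemblies with a missing or truncated 16S would need to be set aside before this step, since the comparison only makes sense for full-length aligned sequences.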
The 'type' sequence is certainly the predominant one, found in 286 of the 440 genomes, but there are three places in the 16S that have significant levels of polymorphism within the Rlc. Kumar et al. (2015, http://dx.doi.org/10.1098/rsob.140133) found a single polymorphic site in their sample (position 1069 in their numbering, 1151 in my alignment, which includes the IVS). They found this was T in gsA and gsB, C or A in gsC, A in gsD, C in gsE. With a much larger set of genomes, this remains broadly true, though the picture is less clear-cut and the fourth possible nucleotide, G, is also found. The type strains have the C variant. This nucleotide is in a loop, so is not paired in the 16S rRNA secondary structure. The second polymorphism is in a stem, so involves a complementary pair of nucleotides at positions 1023 and 1036 in the alignment. These are T and A in the type sequence, but C and G in all members of gsR (R. laguerreae) except, ironically, the type strain FB206. The C:G variant is also common in other F-clade genospecies, as well as in all gsM strains and one gsL.
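The two checks in this paragraph (which nucleotide each genospecies carries at one alignment column, and whether two stem columns stay complementary) can be sketched as below. This is an illustration under assumptions, not the analysis actually run: the function names, the strain-to-genospecies mapping and the short toy sequences are hypothetical, and the toy columns 3, 5 and 8 merely stand in for real alignment positions such as 1023, 1151 and 1036.

```python
from collections import Counter, defaultdict

# Watson-Crick pairs in DNA notation. G:T (G:U wobble in the RNA) is
# deliberately not counted as complementary in this simple check.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def site_by_group(aligned, groups, pos):
    """Tally the nucleotide at 1-based alignment column `pos` per group."""
    table = defaultdict(Counter)
    for strain, seq in aligned.items():
        table[groups[strain]][seq[pos - 1]] += 1
    return table


def stem_pairs(aligned, pos_a, pos_b):
    """Tally the (nucleotide at pos_a, nucleotide at pos_b) combinations."""
    return Counter((seq[pos_a - 1], seq[pos_b - 1]) for seq in aligned.values())


# Toy alignment (hypothetical): column 5 plays the loop site,
# columns 3 and 8 play the paired stem positions.
aligned = {
    "a1": "GGTCTAAAGG",
    "a2": "GGTCTAAAGG",
    "c1": "GGTCCAAAGG",
    "r1": "GGCCCAAGGG",
}
groups = {"a1": "gsA", "a2": "gsA", "c1": "gsC", "r1": "gsR"}

loop = site_by_group(aligned, groups, 5)
pairs = stem_pairs(aligned, 3, 8)
all_complementary = all(pair in WATSON_CRICK for pair in pairs)
```

In the toy alignment the stem columns are T:A in three sequences and C:G in one, so every observed combination is complementary, which is the pattern one would expect if a change on one side of a stem is compensated on the other.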
The third polymorphism is the long intervening sequence that I discussed in the last post. After publishing that post, I located the reference that had slipped my mind. It is a nice paper from Raúl Rivas’s group in Salamanca, published last year (Flores-Félix et al. 2019, https://doi.org/10.1016/j.syapm.2018.10.009). They found the extra sequence in a number of strains, including three of the eleven genomes that I have just rediscovered it in, and have a very nice discussion of this. If I understand the paper correctly, they found that the IVS is excised in the RNA and the molecule is rejoined – it does not remain split as I imagined. The paper also refers to the literature on IVS in rRNA genes, and reminded me that the first published report in rhizobia (in what is now R. leucaenae) was by Anne Willems and Dave Collins back in 1993 (https://doi.org/10.1099/00207713-43-2-305). I decided that I did not have enough material to write a paper about the IVS I had found in R. leguminosarum in 1991, so I just submitted the sequence to GenBank in 1994 (accession U09271). The 11 genomes that have the IVS are all in the F-clade, but they are not a monophyletic group. Two of the strains have a single nucleotide variant within the IVS, but these strains are not neighbours.
The variation I have just described accounts for 9 of the 18 variants I claimed at the start. The other 9 involve a variety of other locations in the sequence, but occur only in one or two strains each.
I have tried to capture the 16S variation by adding it to the phylogeny. The result may be rather complex, but I hope it is more informative than just showing 18 arbitrary symbols for the variants.

Next week, I plan to start writing all this up as a manuscript, so I may not have new analyses to share with you. If anyone wants to try their own analyses (whether or not for potential inclusion in the manuscript), I can provide a link to a folder with all 440 genome sequences.