Thursday, October 15, 2020

Rhizobium leguminosarum 21

16S: the full story

 

The 16S ribosomal RNA sequences of the type strains of Rhizobium laguerreae, R. sophorae, R. ruizarguesonis and R. indicum are all identical to that of the type strain of R. leguminosarum. Even the type strain of the sister taxon R. anhuiense has the same sequence. From this, it would be reasonable to guess that all members of the Rlc had this sequence, but the truth is very different. In fact, I found 18 distinct 16S sequences among the available genomes – though these certainly do not correspond to the 18 genospecies. That does not include a further 5 variants that were only found in a single strain and differed by a single nucleotide from a common variant, which I discounted on the grounds that they might be sequencing errors. There were also three genome assemblies that had no 16S sequence, and three more in which it was incomplete – clearly these are errors in the assembly, since 16S is essential.
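For anyone who wants to repeat this kind of count, here is a minimal sketch of the filtering rule described above. It is not the script I used. The function names, the toy sequences and the threshold for calling a variant 'common' (`min_common`) are all assumptions made for illustration, and the input is assumed to be a set of already-aligned sequences of equal length.

```python
from collections import Counter

def hamming(a, b):
    """Count differing positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def count_variants(aligned, min_common=2):
    """Split distinct sequences into kept variants and discounted singletons.

    aligned: dict mapping strain name -> aligned 16S sequence.
    A variant is discounted if it occurs in exactly one strain AND differs by
    exactly one nucleotide from a variant seen in at least `min_common` strains
    (min_common is an assumed threshold, not a value from the analysis).
    """
    counts = Counter(aligned.values())
    common = [seq for seq, n in counts.items() if n >= min_common]
    kept, discounted = {}, {}
    for seq, n in counts.items():
        near_common = any(hamming(seq, c) == 1 for c in common)
        if n == 1 and near_common:
            discounted[seq] = n   # possible sequencing error
        else:
            kept[seq] = n
    return kept, discounted

# Toy data, invented purely to exercise the function.
toy = {
    "s1": "ACGTAC", "s2": "ACGTAC", "s3": "ACGTAC",  # predominant variant
    "s4": "ACGTAT",                                  # singleton, 1 difference
    "s5": "TCGAAC", "s6": "TCGAAC",                  # second variant, 2 strains
    "s7": "GGGTAC",                                  # singleton, 2 differences
}
kept, discounted = count_variants(toy)
print(len(kept), "variants kept;", len(discounted), "discounted")
```

With the toy data, the singleton that is one change away from the predominant variant is set aside, while the singleton that differs at two positions is still counted as a variant in its own right.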

 

The 'type' sequence is certainly the predominant one, found in 286 of the 440 genomes, but there are three places in the 16S that have significant levels of polymorphism within the Rlc. Kumar et al. (2015, http://dx.doi.org/10.1098/rsob.140133) found a single polymorphic site in their sample (position 1069 in their numbering, 1151 in my alignment, which includes the intervening sequence, or IVS, described below). They found this was T in gsA and gsB, C or A in gsC, A in gsD, C in gsE. With a much larger set of genomes, this remains broadly true, though the picture is less clear-cut and the fourth possible nucleotide, G, is also found. The type strains have the C variant. This nucleotide is in a loop, so is not paired in the 16S rRNA secondary structure. The second polymorphism is in a stem, so involves a complementary pair of nucleotides at positions 1023 and 1036 in the alignment. These are T and A in the type sequence, but C and G in all members of gsR (R. laguerreae) except, ironically, the type strain FB206. The C:G variant is also common in other F-clade genospecies, as well as in all gsM strains and one gsL strain.
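Tabulating a polymorphic site, or a paired stem site, by genospecies is simple once the sequences are aligned. The sketch below shows one way to do it. The function names, the genospecies labels gsX and gsY, and the six-base toy sequences are invented for illustration and carry no real data. Positions are 1-based alignment columns, as in the text.

```python
from collections import Counter, defaultdict

def tabulate_site(aligned, genospecies, position):
    """Count the nucleotides at one 1-based alignment column, per genospecies."""
    table = defaultdict(Counter)
    for strain, seq in aligned.items():
        table[genospecies[strain]][seq[position - 1]] += 1
    return {gs: dict(c) for gs, c in table.items()}

def tabulate_pair(aligned, genospecies, pos_a, pos_b):
    """Count the nucleotide pairs at two 1-based columns (e.g. a stem pair)."""
    table = defaultdict(Counter)
    for strain, seq in aligned.items():
        pair = f"{seq[pos_a - 1]}:{seq[pos_b - 1]}"
        table[genospecies[strain]][pair] += 1
    return {gs: dict(c) for gs, c in table.items()}

# Toy alignment and hypothetical genospecies assignments.
toy_aligned = {"s1": "ATCGAG", "s2": "ATCGAG", "s3": "ACCGGG", "s4": "ACCGGT"}
toy_gs = {"s1": "gsX", "s2": "gsX", "s3": "gsY", "s4": "gsY"}

print(tabulate_site(toy_aligned, toy_gs, 6))     # single unpaired site
print(tabulate_pair(toy_aligned, toy_gs, 2, 5))  # paired stem positions
```

For the real alignment, the calls would use column 1151 for the loop site and columns 1023 and 1036 for the stem pair.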

 

The third polymorphism is the long intervening sequence that I discussed in the last post. After publishing that post, I located the reference that had slipped my mind. It is a nice paper from Raúl Rivas’s group in Salamanca, published last year (Flores-Félix et al. 2019, https://doi.org/10.1016/j.syapm.2018.10.009). They found the extra sequence in a number of strains, including three of the eleven genomes that I have just rediscovered it in, and have a very nice discussion of this. If I understand the paper correctly, they found that the IVS is excised in the RNA and the molecule is rejoined – it does not remain split as I imagined. The paper also refers to the literature on IVS in rRNA genes, and reminded me that the first published report in rhizobia (in what is now R. leucaenae) was by Anne Willems and Dave Collins back in 1993 (https://doi.org/10.1099/00207713-43-2-305). I decided that I did not have enough material to write a paper about the IVS I had found in R. leguminosarum in 1991, so I just submitted the sequence to GenBank in 1994 (accession U09271). The 11 genomes that have the IVS are all in the F-clade, but they are not a monophyletic group. Two of the strains have a single nucleotide variant within the IVS, but these strains are not neighbours.

 

The variation I have just described accounts for 9 of the 18 variants I claimed at the start. The other 9 involve a variety of other locations in the sequence, but occur only in one or two strains each.

 

I have tried to capture the 16S variation by adding it to the phylogeny. The result may be rather complex, but I hope it is more informative than just showing 18 arbitrary symbols for the variants.

 


 

 


Next week, I plan to start writing all this up as a manuscript, so I may not have new analyses to share with you. If anyone wants to try their own analyses (whether or not for potential inclusion in the manuscript), I can provide a link to a folder with all 440 genome sequences.

Thursday, October 8, 2020

Rhizobium leguminosarum 20

A 16S flashback

 

In November 1991, Helen Downer and I were sequencing 16S genes of rhizobia. We used a recently-invented process called PCR (Saiki et al. 1988 http://dx.doi.org/10.1126/science.239.4839.487) and primers Y1 and Y2 that I had designed to amplify the first part of the gene (Young et al. 1991 http://dx.doi.org/10.1128/jb.173.7.2271-2277.1991). Then we sequenced the products by hand using big gels, X-ray film and 32P radioisotope. The PCR product was normally 308-312 bp, but we were intrigued by one pea-nodulating strain, SP18, that gave a much longer product. When we sequenced it, we found that the extra DNA was in a region that was normally conserved. The first stem-loop in the secondary structure of Rhizobium 16S rRNA usually looks like this (taken from my 1991 lab book):




 

The CCCC….GGGG stem is found in most Rhizobium and in Sinorhizobium. The GCAA loop is even more conserved in most Alphaproteobacteria, but instead of GCAA, strain SP18 had:

TCCTTCAAGCAAGCTTGAAG-ATTTTTATCCTTGGAAAGGAAGATCAAGAAGAGCTTCTAAGAAGCTTTCTTGATGGA

 

A few months later, I left the John Innes Centre for the University of York and got involved in new projects, so I never published this strange sequence. Last week, I started to look at conservation of the 16S sequence in the 429 Rlc genomes, but was motivated to dig out my old lab records because I saw a similar ‘extra’ sequence in a few genomes. In fact, not just similar, but identical, apart from an additional ‘G’ where I have shown ‘-’ in the SP18 sequence (almost certainly, this was an error in our manual sequence, which was based on a single read). There are 11 genomes with the extra sequence; they are all in genospecies O, P and Q, but not all genomes in these genospecies have it.

 

The first 16 bases of this long ‘loop’ sequence are complementary to the last 16 (except for a couple of ‘bulges’), so they would be expected to extend the stem structure, but it is unclear what kind of secondary structure the rest of the sequence would adopt. This is what I got when I sent the sequence to an RNA structure prediction site (http://rna.urmc.rochester.edu/RNAstructureWeb/):

 

 



The red part at the bottom is the conserved stem shown in the previous figure; the rest of the structure is speculative.
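The complementarity between the two ends of the extra sequence can be checked without any structure prediction software. The sketch below is a simple ungapped comparison, not what the prediction site does. It reverse-complements the last 16 bases and compares them, position by position, with the first 16. The function names are my own, the sequence is the SP18 one given above with the ‘-’ removed, and G:U wobble pairs are not counted as matches.

```python
# SP18 extra sequence as given in the post; '-' marks the probable missing G.
SP18_EXTRA = (
    "TCCTTCAAGCAAGCTTGAAG-ATTTTTATCCTTGGAAAGGAAGATCAAGAAGAGCTTCTAAGAAGCTTTCTTGATGGA"
)

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def stem_matches(seq, arm=16):
    """Compare the first `arm` bases with the reverse complement of the last `arm`.

    Returns (number of Watson-Crick matches, 1-based mismatch positions in
    the 5' arm). Ungapped: a true bulge would appear here as a mismatch.
    """
    bases = seq.replace("-", "")
    five_prime = bases[:arm]
    three_prime_rc = reverse_complement(bases[-arm:])
    mismatches = [
        i + 1
        for i, (x, y) in enumerate(zip(five_prime, three_prime_rc))
        if x != y
    ]
    return arm - len(mismatches), mismatches

matches, mismatches = stem_matches(SP18_EXTRA)
print(f"{matches} of 16 positions pair; mismatches at {mismatches}")
```

On the sequence as printed, this gives 14 of 16 paired positions, with the exceptions at positions 4 and 10 of the 5′ arm, which fits the ‘couple of bulges’ mentioned above. Because the comparison is ungapped, it cannot tell a mismatch from a genuine bulge.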

 

I am hoping that you, my readers, can help me here. I think I have seen publications fairly recently that have described similar ‘long’ sequences in this location of 16S, but I cannot remember where. Can someone point us to relevant papers? I also have a suspicion that the 16S rRNA may be cleaved within this sequence and exist as two disconnected strands within the ribosome, but I can’t remember whether someone else showed that or it was our own unpublished observation of an unexpected pattern of rRNA bands in nucleic acid preps.

 

All this is something of a digression. I just wanted to record the 16S sequences of all the strains because this is something that taxonomists like to look at, and I thought the result was going to be boring and uninformative. It turns out that there is more 16S sequence variation than I expected. There are also a few genome assemblies with broken 16S sequences or no 16S at all (!), and it is taking me a while to sort those out, so the ‘boring’ consideration of 16S variation will have to wait until the next post.