Previous studies have reported intragenomic 16S gene polymorphisms as a problem that potentially confounds bacterial species richness estimates 25. By contrast, we demonstrate that, when handled correctly, the presence of such polymorphisms in full-length 16S reads has the potential to aid in taxonomic classification.
Finally, by extensive culturing of bacteria present in the human gut microbiome, we provide support for the observation that intragenomic 16S gene copy variants are present in a significant proportion of bacterial taxa 12. Phasing of 16S gene SNPs produced highly similar substitution profiles for closely related taxa, indicating that these profiles provide a robust method for species-level taxonomic identification.
In conclusion, our results demonstrate that appropriate handling of high-throughput, full-length 16S sequence data has the potential to enable accurate classification of individual organisms at very high taxonomic resolution.
The in-silico analysis was carried out separately on two non-redundant public databases: Greengenes v Only the results for the Greengenes database are reported in the main text. For the HOMD, a single sequence was randomly selected to represent each species present in the database. Supplementary Fig. In-silico amplicons demarcating different sub-regions of the 16S gene were generated by trimming regions defined by established primer sets Supplementary Table 1 using Cutadapt v1.
K substr. MG Fig. Accordingly, deletions within other 16S sequences are represented in entropy plots, whereas deletions within the reference sequence are not. To determine the taxonomic resolution afforded by different variable regions, each in-silico amplicon was classified against the filtered reference database from which it was generated using the mothur command classify. To create OTUs, in-silico amplicon datasets generated for each sub-region were filtered to remove non-unique sequences and re-ordered to correspond with the sequence order in the V1-V9 dataset.
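The entropy calculation described above can be illustrated with a minimal sketch. It assumes a plain multiple alignment held as equal-length strings with `-` for gaps; the function names `column_entropy` and `entropy_profile` are hypothetical and are not taken from the original analysis. Columns in which the reference carries a gap are skipped, so deletions in other sequences contribute to the profile while deletions in the reference do not.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(reference, aligned):
    """Entropy at every reference position.

    `reference` and each string in `aligned` are rows of the same alignment.
    Columns where the reference has a gap are skipped (an assumption that
    mirrors the behaviour described in the text).
    """
    profile = []
    for i, ref_base in enumerate(reference):
        if ref_base == "-":
            continue
        profile.append(column_entropy([seq[i] for seq in aligned]))
    return profile


if __name__ == "__main__":
    ref = "AC-G"
    rows = ["AC-G", "ACTG", "A--G", "TC-G"]
    print(entropy_profile(ref, rows))
```

A gap in a non-reference row is counted as its own symbol here, which is one simple way of letting deletions raise the entropy of a column.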
Based on data available from the Human Microbiome Project and Human Oral Microbiome database, 36 bacterial strains were selected to represent microbes prevalent in the human body sites including the airways, gut, oral cavity, skin, and vaginal tract Supplementary Table 3. The other 26 strains were cultured in appropriate media and environmental conditions until cultures reached late logarithmic phase Supplementary Table 3 35 , 36 , 37 , Genome mass was then normalized based on the predicted copy number of the 16S rRNA gene Supplementary Table 3 and the appropriate mass of DNA containing the required 16S copy number for each species was calculated.
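As a rough illustration of the copy-number normalization, the sketch below computes the DNA mass that carries a chosen number of 16S gene copies. The average mass of 650 g/mol per base pair is a common approximation, and the function names and example genome values are assumptions made for illustration, not figures from Supplementary Table 3.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # approximate average g/mol per base pair of dsDNA


def genome_mass_ng(genome_size_bp):
    """Approximate mass of one genome copy, in nanograms."""
    return genome_size_bp * BP_MASS / AVOGADRO * 1e9


def dna_mass_for_16s_copies(genome_size_bp, copies_per_genome, target_copies):
    """DNA mass (ng) that contains `target_copies` copies of the 16S gene."""
    genomes_needed = target_copies / copies_per_genome
    return genomes_needed * genome_mass_ng(genome_size_bp)


if __name__ == "__main__":
    # Hypothetical organism: 4.6 Mb genome with seven 16S copies.
    print(dna_mass_for_16s_copies(4.6e6, 7, 7e6))
```

An organism with more 16S copies per genome therefore needs proportionally less DNA to contribute the same number of gene copies to the pool.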
WGS sequencing was performed for 19 members of the mock community that did not have WGS sequence data publicly available. Genomes for sequenced organisms were assembled individually using SPAdes v3. Several reference gene sequences contained ambiguous base calls. Each sequence was therefore aligned to its respective WGS assembly and the aligned assembly region extracted to create an improved reference gene set containing a single representative 16S rRNA gene sequence for each member of the mock community.
Output alignments were parsed to determine the number and location of insertions, deletions, and substitutions in reads aligning to each reference 16S rRNA gene sequence. To determine the frequency and position of expected sequence variation—attributable to the presence of multiple, divergent copies of the 16S rRNA gene within a single genome—the seven gene copy variants known to exist in the E. To provide a second estimation of expected intra-genome sequence variation, Illumina WGS sequence reads were aligned to the single E.
Stool samples were collected from four healthy, competitive cyclists enrolled in the study described by Petersen et al. Fecal material was self-collected using polyethylene sample collection containers (Fisher Scientific) and was placed on freezer packs before shipping to the Jackson Laboratory for Genomic Medicine. Exact duplicate sequences were discarded on the assumption that they were PCR artifacts, and the remaining reads were screened against the human reference genome (GRCh38) using BMTagger. Adapters and low-quality bases were trimmed using Flexbar. Amplicon sequences from each sample were then reassigned to each OTU at the same similarity threshold used for clustering in order to obtain OTU relative abundance estimates.
V1—V3 and V1—V9 amplicons belonging to the genus Bacteroides were selected by directly classifying individual amplicon sequences using the RDP classifier. The suitability of the RTG database as a reference for discriminating different Bacteroides species was assessed by extracting the 16S rRNA gene sequences for each Bacteroides genome contained therein. The resulting tree Supplementary Fig. Sequences from each sample were therefore extracted and aligned to the single 16S rRNA gene reference sequence used in the mock community analysis.
Stool samples were again contributed by competitive cyclists enrolled in the study described by Petersen et al. Ethical oversight and sample collection were as described above. Bacteria were cultured on a variety of media and under anaerobic conditions, unless otherwise stated Supplementary Data 2.
A subset of multiplexed libraries were sequenced on multiple SMRT cells at varying loading concentrations (Supplementary Data 2), resulting in different numbers of total reads. Each repeated run was therefore treated as a technical replicate to determine (i) the measurement error for the estimation of intragenomic 16S gene SNP frequencies attributable to the sequencing platform and (ii) the relationship between measurement error and sequencing depth.
Sequence data for each isolate were quality filtered and adapters removed as described above. Filtered sequences were reoriented using the mothur command align. Gaps in alignments were subsequently removed with the mothur command degap. The most abundant unique sequence for each isolate was then extracted on the assumption it was the least likely to contain sequencing errors and was used as a reference against which to align all reads for that isolate.
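Selecting the most abundant unique sequence as a per-isolate reference can be sketched as follows. This is a simplified illustration on in-memory strings; the function name `most_abundant_sequence` is hypothetical, and the original work used mothur rather than this code.

```python
from collections import Counter


def most_abundant_sequence(reads):
    """Return (sequence, count) for the most frequent exact sequence.

    `reads` is an iterable of gap-free read strings for a single isolate.
    Ties are resolved in favour of the sequence seen first.
    """
    counts = Counter(reads)
    if not counts:
        raise ValueError("no reads supplied")
    return counts.most_common(1)[0]


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGT", "ACGT", "TCGT"]
    print(most_abundant_sequence(reads))
```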
Due to the prevalence of sequencing errors in processed reads e. Substitution errors in alignments were filtered in a multi-step process to separate true intragenomic SNPs from background error.
First, samples with fewer than aligned reads were discarded, because preliminary investigation indicated they had insufficient signal-to-noise ratio for the detection of true SNPs. Second, the distribution of the frequency of substitution errors was calculated across the entire aligned region of the 16S gene.
Base positions where the substitution error frequency was well outside instrument error (nine interquartile ranges above the upper quartile) were identified as true SNPs. We also took advantage of variation in sequencing depth between replicates to determine whether the measurement error was affected by the number of reads available for SNP phasing. Resulting hits were sorted first by e-value, then bitscore, and the taxonomy of the highest scoring sequence was reported. The phylogenetic relationship between isolates was determined by aligning the most abundant unique sequence for each isolate, then constructing a maximum-likelihood tree using FastTree v2.
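The outlier rule for calling true SNPs (nine interquartile ranges above the upper quartile) can be sketched as below. The quartile convention is an assumption: the text does not say which definition was used, so the sketch uses the inclusive method from Python's `statistics` module, and the function name `call_snp_positions` is hypothetical.

```python
import statistics


def call_snp_positions(substitution_freqs, n_iqr=9.0):
    """Return indices whose substitution frequency exceeds Q3 + n_iqr * IQR.

    `substitution_freqs` holds one substitution frequency per aligned
    base position of the 16S gene.
    """
    q1, _, q3 = statistics.quantiles(substitution_freqs, n=4, method="inclusive")
    threshold = q3 + n_iqr * (q3 - q1)
    return [i for i, f in enumerate(substitution_freqs) if f > threshold]


if __name__ == "__main__":
    freqs = [0.010, 0.012, 0.011, 0.013, 0.012, 0.011, 0.010, 0.40, 0.012]
    print(call_snp_positions(freqs))
```

Because the threshold is derived from the distribution of each sample, the rule adapts to the background error level of that sample rather than relying on a fixed frequency cutoff.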
To determine the total number of unique nucleotide substitution profiles generated from sequenced isolates, all isolates identified as belonging to the same OTU were compared with one another. Two isolates were considered different if the substitution frequency at one or more SNP loci differed more than 3 SDs above the mean measurement error i. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
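The comparison of substitution profiles between isolates can be sketched as follows, assuming each profile is a mapping from SNP locus to substitution frequency and that measurement error is summarized from replicate differences. The names `profiles_differ` and `replicate_errors`, and the treatment of a locus missing from one profile as frequency 0, are illustrative assumptions.

```python
import statistics


def profiles_differ(profile_a, profile_b, replicate_errors, n_sd=3.0):
    """True if any SNP locus differs by more than mean error + n_sd * SD.

    `profile_a` and `profile_b` map locus -> substitution frequency.
    `replicate_errors` lists absolute frequency differences observed
    between technical replicates (at least two values).
    """
    threshold = statistics.mean(replicate_errors) + n_sd * statistics.stdev(replicate_errors)
    loci = set(profile_a) | set(profile_b)
    return any(
        abs(profile_a.get(locus, 0.0) - profile_b.get(locus, 0.0)) > threshold
        for locus in loci
    )


if __name__ == "__main__":
    errors = [0.01, 0.02, 0.03]
    a = {100: 0.14, 200: 0.29}
    b = {100: 0.15, 200: 0.28}
    c = {100: 0.14, 200: 0.50}
    print(profiles_differ(a, b, errors), profiles_differ(a, c, errors))
```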
Data underlying Figs. All other data are available from the corresponding author upon reasonable request.
It should first be noted that the two methods applied here differ substantially: the DNA-targeting analysis scores abundance as essentially linked to rRNA gene numbers, which are influenced by ribosomal operon copy number in the genome and by cell number.
Conversely the quantitative data produced by the rRNA-seq are centered on potential protein synthesis activity as a function of the number of ribosomes per single cell and are not necessarily coupled to total cell number nor population growth.
Therefore a net comparison of the two strategies cannot be performed, since the two methods draw their data from independent principles. The amplicon pyrosequencing experiment was in fact included as a reference example of the standard procedure that has been used in the past decade for a majority of environmental microbial analyses. The data ought to be regarded as those from two different, independent, possibly complementary strategies.
Keeping these premises in mind and refraining from any directly comparative evaluation, we indeed observe that, as expected, the two sequencing strategies, amplicons vs transcribed 16S-rRNA, yielded marked differences in terms of community taxonomic proportions and deducible bacterial physiology in the environment of choice. Because this was an anammox bioreactor constantly monitored for nitrogen feeding and abatement, it offered the advantage of knowing a priori which main microbial activities were expected and, consequently, which microbial guilds could, at least in part, be expected to represent the assemblage.
While amplicon-based sequencing reported a predominant population of Proteobacteria outnumbering the sequences from the anammox Candidatus Brocadia, direct rRNA-seq instead indicated the overwhelming majority of reads as belonging to Planctomycetes, i. Such proportions were confirmed at both sampling times and are backed up by the large number of sequences obtained, since both methods exploit next generation sequencing technologies.
The premises thus appeared correct: the newly proposed approach directly targeting ribosomal transcription rate as simultaneous representation of bacterial presence and dynamic potential for physiological activity, proved suitable to the scope as the main expected metabolism in the chosen environment was indeed that of the Planctomycetes.
At the same time, as the method is independent of any primer annealing strength issues and PCR fidelity, it avoids all consequent biases related to amplification-based analyses that are currently still the standard in studies of this kind. Besides the differences in phyla proportions, the SOLiD sequencing approach and the associated bioinformatics pipeline also showed higher resolution in extracting the diversity of this system. Indeed, while from amplicon pyrosequencing a single type of Planctomycetales could be identified as representative of the phylum, the rRNA-seq reads revealed that two other anammox-related genera were present and actively maintained a high protein synthesis potential in this environment.
The higher diversity observed is in any event also in line with the higher throughput of the system itself, which provides deeper detection of rare taxa. Table 1 shows the results of the two approaches.
The method is indeed introducing the physiological aspect of taxon activity within the community besides that of presence assessment. As mentioned, inevitably this prevents the possibility of a net comparison of the two methods and any consideration in this respect needs to be kept in mind when following these data.
Planctomycetes and Armatimonadetes, which were the most abundant in the RNA sequencing at both sampling times, are apparently highly metabolically active groups. Moreover, an important consideration concerning Armatimonadetes is that they are the group that, among all taxa identified in this bioreactor, displayed the most severe predicted matching inefficiency in the in silico primer analyses.
This supports the value of the present method as suited to uncover and reveal the importance of bias-sensitive phyla in community analyses. The other phyla displayed PCR-dependent frequencies higher than those yielded by direct rRNA-seq and their ratios were not constant, spanning from extremely high disproportions e.
Actinobacteria, followed by Firmicutes, Acidobacteria, Bacteroidetes, Proteobacteria, to lower but appreciable differences. Besides the dominant Planctomycetes, Chlorobi-related bacteria were detected at high levels with the direct RNA-seq. This group has been reported as presumably related to granule formation 56, and these structures are thought to have a fundamental role in a high anammox reaction rate. The presence of Methyloversatilis could instead be related to concurrent denitrification processes. An increase of these taxa is noticeable from the first to the second sampling time point, although further replication would be necessary to establish the significance of such apparent trends.
Changes within anammox-related bacteria, also depending on different substrate affinities, have been described 56. In the present analysis the dominant taxa appear to maintain a relatively stable 16S-rRNA transcriptional rate across the two sampling points; the presence of the Chlorobi phylum was also confirmed by both sequencing approaches.
The sampled microenvironment was deemed an ideal setting for a hypothesis-testing approach, as the biochemical data reflected an overtly active anammox metabolism. This strengthens the suitability of the approach for estimating taxonomic groups that were rarely detected by techniques relying on PCR. In examining the data stemming from the two approaches, it should be considered that one drawback of existing PCR-based methods is that bacteria can have more than one copy of the ribosomal gene operon, which creates a further bias, as it leads to a proportional overestimation of cell counts for taxa with multiple operons.
With the rRNA-seq approach this issue is bypassed by directly targeting the transcribed rRNA molecules as a genuine function of that species' contribution to potential physiological activity at that time within that environment. Planctomycetes are in any case reported to possess a single copy of the rRNA operon, and their prominent position in the community is to be seen as related to the combination of high expression rate and high attained cell numbers.
Considering that their generation time involves a doubling every 9 to 11 days and that at the initial inoculum sampling time (day 1) they were at very low levels, as judged by quantitative PCR using Planctomycetes-specific primers (data not shown), we can estimate that their population dynamics unfolded steadily and that the rRNA sequencing at five and six months efficiently revealed their outcome.
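To give a sense of scale for this estimate, the sketch below computes the theoretical fold increase under uninterrupted exponential growth. The elapsed times of 150 and 180 days are assumed stand-ins for "five and six months", and unconstrained doubling is an idealization that a real bioreactor population would not sustain; the function name `fold_increase` is hypothetical.

```python
def fold_increase(elapsed_days, doubling_days):
    """Theoretical fold increase after `elapsed_days` of steady doubling."""
    return 2 ** (elapsed_days / doubling_days)


if __name__ == "__main__":
    # Assumed elapsed times; the 9 and 11 day doubling times come from the text.
    for days in (150, 180):
        slow = fold_increase(days, 11)
        fast = fold_increase(days, 9)
        print(f"{days} days: {slow:.3g}- to {fast:.3g}-fold")
```

Even at the slower doubling time, the idealized increase over five months spans several orders of magnitude, which is consistent with a group that starts at very low levels and later dominates the community.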
In general terms and for phyla in which one could not have information on the number of ribosomal gene operons, it can be taken as a reference point that the number of rRNA copies in each bacterial cell is bound to be always substantially higher than the number of starting rDNA genes.
PCR can exponentially multiply such rDNA, with a performance that depends on primer matching strength. For rRNA-seq, proportions instead inherently reflect only two factors: (a) taxon abundance and (b) the level of gene expression within the system, mirrored by the dynamically fluctuating number of ribosomes.
Essentially it is helpful to stress that in the PCR approach the number of sequences obtained for a given taxon obeys to the equation:. As concerns some technical considerations, the method was here coupled to a SOLiD platform, but the approach is amenable and compliant to any RNA-seq protocol and any type of high throughput Next Generation Sequencing platforms.
The following further details can be outlined: the native format of the ABI SOLiD technology yields data in color space, which need to be translated to base-space nucleotides. The output of the method presented here (conversion of reads into a merged and workable sequence) is also suitable for use with any of the available tools and utilities for sequence classification, as it is in the form of a plain base-encoded gene sequence and is even longer than the average read produced by any current sequencing machine.
In addition its abundance also entails a datum on gene expression within the sampled environment. The approach principle is such that it does not allow any chimera formation in the sequencing stage, thus avoiding another drawback of amplification-based methods. The analysis run on databases clusterized at various degrees of identity stringency Supplementary material Technical note 1 showed that, notwithstanding the decreasing size of the resulting numbers of subjects from over , to less than 14, the number of identified sequences stayed practically constant, indicating a strong degree of independence of the protocol from the database size.
Therefore any multi-FASTA assemblage of 16S references can be indicated as a valid substrate for the identification analysis as well as any online 16S amplicon annotation tool could be used. In analyzing defined ribosomal RNA bands obtained upon electrophoretic separation, it should be also considered that the presence of bacteria containing self-splicing introns and protein coding regions within their rRNA could give rise to transient transcripts of unconventional sizes, but when these are mapped as metatranscriptomic sequences to assembled 16S rRNA genes it appears that those insertions are not retained in transcribed RNAs and are probably rapidly degraded 65 ; their occurrence is therefore not affecting approaches as the one presented in our study.
Summarizing the evidence and issues emerging from the present analysis, rRNA gene amplification studies provide a deep level of information on the diversity of biological systems. Nevertheless, as underlined in many studies cited above, in relation to potential biases and limitations, data trustworthiness and results interpretation under current practices are still affected by considerable levels of uncertainty and complexity.
The use of 16S-rRNA as a prokaryotic universal barcoding remains a reference asset but its use is affected by critical issues mostly consisting in primer matching biases due to the degeneracy of the conserved regions and to the consequent non-universality of any currently known primer pair. This in turn influences dramatically the first PCR cycles and can exponentially carry over an error in the end point proportions of each taxon with respect to its original values in the sampled community.
Further critical issues concern chimera formation during PCR and sequencing. Another limit of any PCR-based census based on mere gene presence is that no information is conveyed on the actual level of activity or quiescence of a given cell. This prevents any inference about its contribution to ecosystem physiology and ultimately to the ecology of the environment under study.
Even in the presence of the most trustworthy data, a further critical aspect is the proper handling of the massive raw outputs of the sequencers and the correct probing of the available databases, to convert reads into exact and meaningfully annotated information. The method presented here addresses these critical issues and considers alternative solutions to reduce their impact. The high throughput of NGS machines has been coupled with a transcriptomics-driven approach in which taxonomical assignment is coupled to activity assessment.
A purposely-elaborated reads-processing workflow is followed by a taxonomical assignment pipeline which exploits jointly two major sequence databases. The high number of RNA sequences that did not align suggests considerations over the potentialities to further exploit this kind of approach.
Besides accounting for a rate of sequencing errors, the reason for not matching ribosomal databases could be twofold:
(a) belonging to taxa that have not yet been detected because their sequence diverges enough from known 16S primers, as well as from extant database subjects, under the alignment strategy and database clustering stringency used for annotation in the present rRNA sequencing approach; (b) belonging to a co-purified fraction of mRNA.
Moreover in stained RNA gels the two bands of 23S and 16S visually represent the strikingly dominant bands and are well resolved from each other as well as from the messenger which is mostly in a higher and rather faint smear pool. Besides, mRNAs, being destined to rapid turnover, are by nature more sensitive to degradation than their structural ribosomal counterparts. The direct rRNA-seq approach compared to amplification-based sequencing has shown unique results in the detected community composition, consistently with its simultaneous assessment of presences and physiological rate of metabolic activity potential of bacteria.
Supposedly this result is also achieved by virtue of its independence from possible PCR-related biases for given taxa. Seeking to mention possible limitations of the method, one could comment that, being based on rRNA molecules, if bacteria were very quiescent their signal could be low; however this is also the strength of the method, i. As mentioned, the reliability of all taxonomy-assigning methods rests upon the availability of a large and precisely annotated database. Overcoming these limits is one of the goals of microbial taxonomy.
The high number of low-identity matching sequences and of non-aligning reads found in the present analysis could in part be due to the peculiar bioreactor habitat, to possible non-ribosomal contaminants, or to limits in the read annotation pipeline. Nevertheless, dedicated direct approaches are envisaged as necessary to tap into the unknown reservoir of unculturable bacteria with un-amplifiable 16S rRNA genes that could inhabit environments that have hitherto been explored mainly with PCR-based methods.
Samples were collected from the anammox reactor at day and day after inoculation, corresponding, respectively, to a technical performance of and grams of total nitrogen abated per cubic meter per day. Three extractions were performed for each sample and nucleic acids were finally pooled together. Any potential pair of primers with a distance between and bases was considered and pairs amplifying hypervariable regions V3, V4 and V5 were favoured.
The primers are a slightly different version of the pair S-D-Bactb-S and S-D-Bacta-A 51 which were published after our initial set up. Three forward primers, degenerated in one position and three reverse primers, degenerated in three positions, were selected as the most adequate universal oligonucleotides.
The in vitro performance of the custom-synthesized fusion primers Invitrogen, Life Technologies was evaluated by PCR and amplicon cloning. Three replicate PCR-reactions were conducted for each sample. For each sample, two replicates were performed of the total RNA extraction protocol. The total RNA obtained was run in a 0. Slices corresponding to the 16S rRNA band were then cut from the gels.
Three Acid saturated with 0. The replicates of each sample were then pooled together and purified by RiboMinus Life Technologies. To preserve abundance of the transcripts in the final library, 12 amplification cycles were used after adaptor ligation. For the electrophoresis, solutions including water were prepared by using commercial reagents declared DNase-RNase-free by the manufacturers.
Electrophoretic apparati components and slicers were disinfected with 0. Samples were checked by Agilent Bioanalyzer Agilent Technologies before and after electrophoresis to ascertain absence of specific signatures of RNA degradation e. Only pair-end sequences with an estimated distance between 50 and bases were considered as correct. To correctly assign the SOLiD reads to their taxa, a two-steps procedure was designed. Firstly, only the uniquely-aligned reads, which correspond both to single or paired sequences that present a unique best hit against the reference dataset, were considered to obtain a preliminary group of putative subjects.
Since this dataset represents just a small subset of all the known 16S sequences, many of the initially multi-mapped SOLiD reads presented a unique alignment against the subjects.
In order to obtain a database that could provide the broadest span of representative biodiversity, the RDP and DDBJ databases were merged together and clustered at decreasing similarity levels of The percentage of reads uniquely-aligned on each subject was considered as an estimator of the bacterial activity in the bioreactor.
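The two-step assignment and the resulting activity estimate can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each read has already been reduced to the set of subjects tied for its best hit, and the function name `assign_reads` and the input layout are hypothetical.

```python
from collections import Counter


def assign_reads(read_hits):
    """Two-step assignment of reads to subjects.

    `read_hits` maps read id -> set of subject ids tied for the best hit.
    Step 1: subjects supported by at least one uniquely aligned read form
            the preliminary subject set.
    Step 2: every read is re-evaluated against that reduced set; reads with
            exactly one remaining candidate count as uniquely aligned.
    Returns subject -> percentage of uniquely aligned reads.
    """
    preliminary = {next(iter(hits)) for hits in read_hits.values() if len(hits) == 1}

    counts = Counter()
    for hits in read_hits.values():
        remaining = hits & preliminary
        if len(remaining) == 1:
            counts[next(iter(remaining))] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {subject: 100.0 * n / total for subject, n in counts.items()}


if __name__ == "__main__":
    hits = {
        "r1": {"A"}, "r2": {"A"}, "r3": {"B"},
        "r4": {"A", "C"},   # becomes unique once C is dropped
        "r5": {"A", "B"},   # still ambiguous, not counted
        "r6": {"C", "D"},   # no preliminary subject, not counted
    }
    print(assign_reads(hits))
```

Restricting the second pass to the preliminary subjects is what rescues reads such as `r4`, which were multi-mapped against the full database but align uniquely once unsupported subjects are removed.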
A flowchart of the rRNA-seq procedure is shown in Fig. Roche reads are available under the codes SRR and SRR for day and day sampling times, respectively. To assess a possible coverage-dependent bias (higher read numbers for longer genes), the correspondence between extracted subjects and the length of sequences covered by uniquely aligned reads was plotted.
However, despite sophisticated research tools for microbial detection, rapid and accurate molecular diagnostics for identification of infection in humans have not been extensively adopted. Time-consuming culture-based methods remain at the forefront of clinical microbial detection. The 16S rRNA gene, a molecular marker for identification of bacterial species, is ubiquitous among members of this domain and, thanks to ever-expanding databases of sequence information, a useful tool for bacterial identification.
This strain repository was used to systematically evaluate the ability of 16S rRNA for species level identification. We show that for species identification, a model-based approach is superior to an alignment based method. We point to multiple cases of probable clinical misidentification with traditional culture based identification across a wide range of gram-negative rods and gram-positive cocci as well as common gram-negative cocci.
Academic Editor: Markus M. This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
The work is made available under the Creative Commons CC0 public domain dedication. Data Availability: All relevant data are within the paper and its Supporting Information files. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist. Currently, one of the major challenges for clinical practice and public health surveillance is rapid and accurate identification of infectious agents.
Furthermore, several studies have demonstrated that rapid, appropriate, and adequate antibiotic treatment significantly improves patient outcomes, particularly in the setting of the intensive care unit and that in the absence of such treatment, patient mortality is approximately doubled [ 2 , 3 ]. Rapid and definitive pathogen identification can facilitate appropriate initiation of antibiotic therapy not only in clinical syndromes, such as sepsis, but also in cases of infections potentially caused by multiple pathogens such as in upper respiratory tract infections URIs.
For example, the common presenting clinical symptoms of URIs, cough and coryza, can not discriminate between bacterial and viral agents, but only the former is appropriately treated with antimicrobial administration. These clinical challenges have resulted in inappropriate antibiotic administration and contributed to a large extent to the relatively recent development of pan-resistant microorganisms [ 4 , 5 ]. Emerging infectious diseases such as SARS and reemergence of common bacterial agents exemplified by the Bordetella pertussis epidemic in , required intensive efforts over several weeks by public health personnel working with microbiology laboratory personnel to first detect, then contain, and prevent spread of these infectious agents.
In addition, hospital-acquired infections from emerging and reemerging pathogens are growing, leading to increased morbidity, mortality, and health care costs on a national level [ 6 , 7 ].
The application of genomic tools to identify etiological agents of acute disease was highlighted with the Escherichia coli outbreak in Hamburg, Germany. Whole genome sequencing of the etiological agent was achieved within a number of weeks of the outbreak, providing information on the super-toxicity of the strain.
Clearly, the first step in infectious disease curtailment is rapid and accurate identification of the pathogen(s) involved in the infection. Despite advances in technology, identification and antimicrobial resistance profiling of microbial species by the majority of public health and hospital microbiology laboratories is still largely reliant upon culture-based techniques [ 8 , 9 ]. Such approaches are time-consuming, requiring at least 16 hours and frequently substantially more, as in the case of fastidious organisms such as Mycobacterium and Legionella species.
Following culture, further biochemical and antimicrobial resistance testing may be performed which adds to the protracted nature of this process. Hence culture-based identification is time intensive and frequently fails to produce relevant data within the critical window of opportunity to permit rapid and appropriate therapeutic decisions to be made.
Molecular testing allows highly specific and sensitive identification of a large number of pathogens from clinical isolates and clinical specimens. The ability of molecular techniques to identify pathogens directly from clinical samples makes rapid identification possible without recourse to culture [ 10 – 12 ].
Such approaches are becoming more common for pathogenic species such as methicillin-resistant Staphylococcus aureus and Clostridium difficile, for which rapid identification is paramount to improving patient outcomes [ 13 ]. Over the past several decades, a number of molecular markers that permit identification of specific microbial taxa and their phylogenetic classification [ 8 , 14 – 18 ] have been identified.
Phylogenetic markers include the presence of specific protein-coding or structural genes, the combinations of such genes and their variants, and insertion and repeat elements.
Foremost, the functional constancy of the 16S rRNA gene assures it is a valid molecular chronometer, which is essential for a precise assessment of phylogenetic relatedness of organisms. It is present in all prokaryotic cells and has conserved and variable sequence regions evolving at very different rates, which is critical for the concurrent universal amplification and measurement of both close and distant phylogenetic relationships.
These characteristics allow the use of 16S rRNA in the assignment of close relationships at the genus level [ 8 ] and, in many cases, at the species level [ 19 – 21 ]. Moreover, dedicated 16S databases [ 22 – 24 ] exist that include near full-length sequences for a large number of strains together with their taxonomic placements.
The sequence from an unknown strain can be compared against these sequences. This last point is particularly relevant in an era where DNA sequencing is rapidly becoming a commodity. Tens to thousands of full-length 16S rRNA gene sequences can be generated using capillary sequencing of cloned PCR products while at least two orders of magnitude more short hypervariable regions to bp can be generated using next-generation sequencing technologies in a cost effective way [ 25 , 26 ]. While relying on non-full length 16S rRNA gene sequence limits the taxonomic resolution and the specific hypervariable region dictates taxonomic coverage [ 27 , 28 ], it is clear that recent advances in sequencing and 16S rRNA gene sequencing protocols [ 29 ] will make this molecular marker a more acceptable means for rapid identification.
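The comparison of an unknown 16S sequence against database sequences described above can be illustrated with a small sketch. It is not the procedure used in this study: it assumes a plain Needleman-Wunsch global alignment, hypothetical scoring values (match 1, mismatch -1, gap -2), and toy reference sequences. The names global_identity and best_hit are illustrative only.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity over a Needleman-Wunsch global alignment of a and b.

    The scoring values are illustrative assumptions, not taken from the study.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back one optimal path, counting identical columns and total columns.
    i, j, same, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += int(a[i - 1] == b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * same / length if length else 0.0


def best_hit(query, references):
    """Return (name, percent identity) of the most similar reference sequence.

    `references` maps a hypothetical strain or accession name to its sequence.
    """
    scored = ((name, global_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda pair: pair[1])
```

Calling best_hit(query, references) with a dictionary of named reference sequences returns the closest reference and its percent identity. The quadratic alignment is adequate for a handful of roughly 1,500 bp sequences; searching a full 16S database would normally rely on a heuristic search tool instead.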
Several studies have evaluated the usefulness of 16S rRNA gene sequencing for clinical microbiology. Historically, slow-growing mycobacteria have been a major group of organisms for which a plethora of 16S studies exist [ 30 , 31 ]. Drancourt et al. Bosshard et al. Spilker et al. Despite the existence of these studies, a systematic and broad evaluation of the 16S rRNA gene for the identification of clinically relevant organisms is lacking. Moreover, even in the existing studies with a limited breadth of organisms, the identification is based on sequence-alignment-based similarity against databases with very limited diversity i.
Toward these aims, we assembled a culture isolate collection of some of the most common hospital-associated bacterial pathogens as well as endemic community-acquired and less common organisms associated with increased disease burden to determine the accuracy of clinical vs.
The results of our investigation provide insight into the strengths and limitations of molecular identification using the 16S rRNA gene for microbiological identification of common bacterial pathogens. Overall, the isolates represented the most common bacterial pathogens, with the exception of two Neisseria lactamica isolates cultured by the UCSF clinical microbiology lab, as well as some less common species associated with severe disease burden such as Stenotrophomonas maltophilia and Burkholderia cepacia complex.
For Neisseria meningitidis. For each of the clinical identities represented in the repository, Table 1 summarizes the clinical identification method, the number of isolates, and the source of the isolate.
Isolates obtained from the Clinical Microbiology Laboratory at University of California, San Francisco had undergone culture on relevant selective media, had been further sub-cultured, and had their biochemical profile tested per clinical microbiology laboratory protocols based on current Clinical and Laboratory Standards Institute guidelines to provide a final culture-based identification.
Figure: Typical temporal workflow of a clinical microbiology laboratory to identify microbes from clinical samples based on phenotypic, biochemical, and culture-based techniques.
Neisseria spp. Streptococcus spp. Single colonies of each isolate were sub-cultured in liquid media for DNA extraction. The majority of species were sub-cultured in Luria-Bertani broth and grown at 37 °C and rpm for 24–48 hours, H.
A total of 2 ml of liquid culture of each isolate was centrifuged and DNA extracted using a combination of bead-beating 5. Previous studies have shown that the quality of 16S sequences is essential to accurate phylogenetic placement [ 44 ] and taxonomic classification [ 45 ].
To obtain the longest feasible high-quality sequences, forward and reverse reads corresponding to each isolate were assembled using Phrap version 0. Training set. This set included , sequences and was filtered to obtain a set of 35, sequences corresponding to the 89 medically important bacterial genera listed in the most current edition of the Manual of Clinical Microbiology [ 47 ] (S1 Dataset).
All the species (pathogens and commensals) under these genera were included. S2 and S3 Datasets list the GenBank accession numbers for the sequences in the training set and the number of sequences for all the genera and species in the training set, respectively. The assembled 16S rRNA sequences were classified to species level, and the bootstrap confidences for the genus- and species-level classifications were estimated based on iterations.
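The idea of a bootstrap confidence for a taxonomic assignment can be sketched as follows. This excerpt does not name the classifier or give the number of iterations, so the sketch assumes an RDP-style k-mer naive Bayesian classifier. The word size, the subsample size (one k-th of the query's words), and the default of 100 iterations are illustrative assumptions. The function names train and classify are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def kmers(seq, k):
    """Set of distinct overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training, k):
    """Build a word model from (taxon, sequence) pairs.

    Returns the list of taxa and a function log_prob(taxon, word).
    """
    n_total = len(training)
    word_count = Counter()             # training sequences containing each word
    taxon_word = defaultdict(Counter)  # same count, restricted to one taxon
    taxon_size = Counter()             # training sequences per taxon
    for taxon, seq in training:
        words = kmers(seq, k)
        word_count.update(words)
        taxon_word[taxon].update(words)
        taxon_size[taxon] += 1

    # Smoothed overall word frequency, used as a pseudocount per taxon.
    prior = {w: (c + 0.5) / (n_total + 1) for w, c in word_count.items()}
    unseen_prior = 0.5 / (n_total + 1)

    def log_prob(taxon, word):
        p = prior.get(word, unseen_prior)
        return math.log((taxon_word[taxon][word] + p) / (taxon_size[taxon] + 1))

    return list(taxon_size), log_prob


def classify(seq, taxa, log_prob, k, iterations=100, rng=None):
    """Assign seq to the best-scoring taxon and estimate a bootstrap confidence.

    iterations=100 is an assumed default; the source text omits the value.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    words = sorted(kmers(seq, k))

    def best(sample):
        return max(taxa, key=lambda t: sum(log_prob(t, w) for w in sample))

    assigned = best(words)
    subset = max(1, len(words) // k)
    # Resample words with replacement and count how often the call is repeated.
    votes = sum(
        best(rng.choices(words, k=subset)) == assigned for _ in range(iterations)
    )
    return assigned, votes / iterations
```

Training on species labels gives a species-level confidence. A genus-level confidence could be obtained in the same way by counting the bootstrap trials in which the genus part of the label agrees with the assigned genus.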