June 21, 2021


High-quality microbial reference data coupled with authenticated standards are vital to the successful implementation of next-generation sequencing (NGS)-based solutions for microbiome analysis. Shotgun metagenomic sequencing and targeted amplicon sequencing are two widely used methods for studying human and environmental microbiome samples. However, the accuracy and reproducibility of these methods depend on the quality of the reference genome databases used for data comparison. To further support microbiome research applications, the American Type Culture Collection (ATCC) Enhanced Authentication Initiative has sequenced and published over 1,200 complete genomes for materials held within our collections. The published data include genome assemblies, annotations, metadata, and quality control metrics. In addition, we have published the genomes for all bacterial, fungal, and viral components that make up the ATCC microbiome whole cell and nucleic acid standards. The published genomes are now accessible on the ATCC Genome Portal. Here, we compared our genomic data sets with publicly available genomes cited as "ATCC" and found that many of these genomes were incomplete or had considerable numbers of SNPs, indels, or structural rearrangements. All of these variations could potentially have significant impacts on the accuracy of metagenomic analyses. Additionally, we found significant variation in the reported 16S copy number and sequence identity for the 16S rRNA or ITS genomic regions. Interestingly, for some ATCC strains, public databases held several assemblies that varied in genome length and plasmid number and, in some cases, contained thousands of SNPs and indels.
Furthermore, we demonstrated that updating the ATCC Microbiome Standards data analysis modules with our new high-quality assemblies significantly improved the results, including the quantification of true positives and the relative abundance of individual species, and reduced both unclassified reads and reads mapping to false positives. The updated microbiome modules are available for use by the research community at One Codex. The ATCC Microbiome Standards are now paired with all of ATCC's high-quality genome reference sequences. The availability of these high-fidelity genomes will lead to improvements in standardized analyses, greater confidence in experimental outcomes, and better reproducibility in microbiome research.

Nikhita Puthuveetil, MS

Bioinformatician, Sequencing and Bioinformatics Center, ATCC

Nikhita Puthuveetil is a bioinformatician at ATCC who performs routine bioinformatics analysis on internal sequencing submissions, primarily SARS-CoV-2 samples as well as plasmid and bacterial samples. She also works with her team to aid in the development of the ATCC Genome Portal. She first joined ATCC as a bioinformatics intern in 2019, when she created an internal dashboard in R to track and manage sequencing at the Sequencing and Bioinformatics Center. She has an MS in Bioinformatics from Virginia Commonwealth University.