
The ATCC Genome Portal: Expanded Authenticated Microbial Reference Genomes with Data Provenance


ASM Microbe 2023

Houston, Texas, United States

June 16, 2023


The ATCC Genome Portal is a multi-year initiative aimed at producing high-quality microbial reference genomes representing the entire microbial collection at the American Type Culture Collection (ATCC). All data is publicly accessible, curated, and traceable to physical materials in ATCC's collection. New, annotated genome assemblies from ATCC's Sequencing & Bioinformatics Center are released monthly. As of January 2023, the ATCC Genome Portal includes fully authenticated genome assemblies and genome annotations for over 2,335 bacterial, 234 viral, 179 fungal, and 3 protist genomes—a 34% increase since June 2022. All sequencing data, assemblies, and annotations were produced in-house at ATCC. Many of these assemblies are for strains already represented in public databases, such as RefSeq, but importantly these public references are not required to be authenticated, nor even traceable to physical biomaterials in a biorepository or culture collection.

Over the last 20 years, the quality and reliability of microbial genomics data in the public domain have steadily declined. The causes include traceability issues, a changing landscape of sequencing technologies and bioinformatics methods, inconsistent application of genomics standards and metadata, and the near absence of requirements for metadata harmonization. This decline complicates many downstream bioinformatics applications and research outcomes because of unexpected, yet often substantial, discrepancies between the physical strains present in culture collections and the genomes that represent those strains in public databases (Yarmosh DA et al. 2022). The ATCC Genome Portal is intended to address this gap in data provenance and data quality for ATCC strains and is the only ISO 9000-compliant microbial genome reference database of authenticated ATCC strains. These resources are intended to help researchers identify the correct strains for their research, improve the success rate of new assay designs and developments, and reduce time spent re-sequencing or "second guessing" results obtained using data for the same strains found elsewhere. In this presentation, we present our progress in expanding the ATCC Genome Portal, review our existing bioinformatics pipelines and their performance, and provide additional details on novel strains not previously sequenced by any research group. The ATCC Genome Portal data is a resource for research use and is accessible via the web or via an authenticated REST API.

Download the poster to learn about our bioinformatics pipelines for generating reference-quality genomes




Jonathan Jacobs, PhD

Senior Director of Bioinformatics, ATCC

Dr. Jonathan Jacobs leads ATCC’s Sequencing & Bioinformatics Center and the development of the ATCC Genome Portal. He has over 20 years of experience in molecular genetics, bioinformatics, and microbial genomics, and he has worked throughout his career at the interface of academia, government, and industry. He holds a joint Research Professor appointment at Syracuse University’s Forensic & National Security Sciences Institute in support of microbial forensics graduate student training and research, and he actively collaborates with several US public health laboratories involved in pathogen genomics research and surveillance. Dr. Jacobs is also certified in Product Management from Pragmatic Institute, and he has led successful commercial launches of several bioinformatics products into the market.


Nikhita Puthuveetil, MS

Senior Bioinformatician, Sequencing and Bioinformatics Center, ATCC

Nikhita Puthuveetil is a bioinformatician at ATCC who performs routine bioinformatics analysis on internal sequencing submissions, primarily SARS-CoV-2 samples as well as plasmid and bacterial samples. She also works with her team to aid in the development of the ATCC Genome Portal. She first joined ATCC as a bioinformatics intern in 2019, when she created an internal sequencing dashboard in R to track and manage sequencing at the Sequencing and Bioinformatics Center. She has an MS in Bioinformatics from Virginia Commonwealth University.


Reference-quality sequences

Through the ATCC Genome Portal, you can easily search, access, and analyze thousands of reference-quality genome sequences. Our optimized methodology is designed to achieve complete, circularized (when biologically appropriate), and contiguous genomic elements by using short-read (virology collection) and hybrid (bacteriology, mycology, and protistology collections) assembly techniques. We then take our workflow one step further by accompanying each stage of the process with rigorous quality control analyses. Only data that passes all quality control criteria is published to the ATCC Genome Portal. Visit the portal today to find the high-quality data you need for your research.
