The ATCC Genome Portal – An Updated Resource for Authenticated Microbial Reference Genomes

Poster

ASM Microbe 2022

Washington, DC, United States

June 11, 2022

Abstract

The traceability and authentication of microbial genome assemblies to physical biological materials are typically poorly documented and are not required by many public genome databases. The availability and reliability of these data are, however, essential for most microbiological research. The tension between genomic data reliability and its traceability to source materials is a growing area of concern that has significant real-world impacts on public health epidemiology, drug discovery, and environmental biosurveillance research. While databases such as NCBI’s RefSeq have leveraged the scalability of crowdsourcing for growth, they have done so at the expense of including some unreliable, incomplete, or incorrect genomes, which have persisted in the database for decades. While the introduction of consequential data provenance policies could mitigate these issues, public databases largely do not have these requirements. This creates substantial risk to the trustworthiness of individual genome assemblies and, in aggregate, of several important genomic database resources. The ATCC Genome Portal was created to reestablish the trustworthiness, reliability, and accuracy of genome assemblies associated with ATCC microbial strains. Currently, it includes over 2,000 high-quality ATCC Standard Reference Genomes (ASRGs), all produced in-house by ATCC from authenticated materials sourced directly from ATCC’s biorepository. The collection of annotated ASRGs, updated monthly, currently represents strains for 1,706 bacteria, 119 fungi, and 175 viruses. Each bacterial and fungal genome is sequenced on both Illumina and Oxford Nanopore platforms, resulting in a “hybrid assembly” that incorporates both sequencing technologies. To date, viral genomes have been produced using only Illumina sequencing.
Each genome entry includes sequencing and assembly quality metrics, annotation metrics, and additional strain metadata such as antibiotic susceptibility, geographical origin, and phenotypic data. Here, we describe our laboratory and bioinformatics methods, the diversity of the current contents of the ATCC Genome Portal, its current capabilities, and new features for use by the research community. As we continue to carry out whole-genome sequencing of ATCC’s microbial collection, the number of reference genomes in the ATCC Genome Portal will continue to grow every month for the foreseeable future, and we encourage the research community to contact us with suggestions on taxa to prioritize in our pipeline. The ATCC Genome Portal and the data contained therein are freely available for research use and are accessible online (https://genomes.atcc.org) or via an authenticated REST API.

Download the poster to explore the generation of high-quality, curated reference data

Watch the poster presentation

Presenter

Jonathan Jacobs, PhD

Senior Director of Bioinformatics, ATCC

Dr. Jonathan Jacobs leads ATCC’s Sequencing & Bioinformatics Center and the development of the ATCC Genome Portal. He has over 20 years of experience in molecular genetics, bioinformatics, and microbial genomics, and he has worked throughout his career at the interface of academia, government, and industry. He holds a joint Research Professor appointment at Syracuse University’s Forensic & National Security Sciences Institute in support of microbial forensics graduate student training and research, and he actively collaborates with several US public health laboratories involved in pathogen genomics research and surveillance. Dr. Jacobs also holds a Product Management certification from Pragmatic Institute, and he has led the successful commercial launches of several bioinformatics products.

Reference-quality sequences

Through the ATCC Genome Portal, you can easily search, access, and analyze hundreds of reference-quality genome sequences. Our optimized methodology is designed to achieve complete, circularized (when biologically appropriate), and contiguous genomic elements by using short-read (viruses) and hybrid (bacteria and fungi) assembly techniques. We take our workflow one step further by accompanying each stage of the process with rigorous quality control analyses that ensure our data are of the highest quality possible. Only data that pass all quality control criteria are published to the ATCC Genome Portal. Visit the portal today to find the high-quality data you need for your research.

Visit the portal