
Building a Sequence Repository

Poster

2024 ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines (ASM NGS)

Washington, DC, United States

October 15, 2024

Abstract

Background 

In 2019, the ATCC® Genome Portal (AGP) was launched as part of an initiative to produce high-quality reference genomes for the entire ATCC microbial collection. Each genome published on the AGP is extensively documented, with detailed records spanning from sample deposition into the ATCC collection through final publication on the AGP. Example metadata include sample storage, culture conditions, and details of the sample's extraction, as well as the sequencing technology and genome assembly methods employed. As of May 2024, ATCC has published 4,500 microbial genomes, each with complete supporting metadata. Here, we present the bioinformatics workflow, as of 01 January 2024, that standardizes each genome assembly made available through the AGP. 

Methods

ATCC’s microbial assembly pipeline is composed of discrete modules for each category of organism: viruses, bacteria, and microbial eukaryotes. All categories share the same custom Python and Bash libraries, which contain global functions for read processing, read mapping, and quality control. All DNA-based microbes in the collection are assembled with a hybrid method that requires both Illumina short-read and Oxford Nanopore Technologies long-read sequencing data. RNA viruses are assembled from Illumina short-read sequencing data only. The pipeline departs from this shared structure only where kingdom-specific steps and software are required, such as MaSuRCA for eukaryotic assembly, Unicycler for prokaryotes, and SPAdes for viruses. All viruses undergo an additional de-hosting step to remove the host cell-line gDNA that is co-eluted during viral DNA/RNA extraction. Stringent quality control is performed on all items at multiple stages, and only passing assemblies are published to the AGP.  
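The routing described above (one shared core, with the assembler, read types, and de-hosting step chosen per organism category) can be illustrated with a minimal Python sketch. This is not ATCC's code: the `Category`, `Sample`, and `AssemblyPlan` types and the `plan_assembly` function are hypothetical names, and the sketch only records which assembler and reads would be used rather than invoking MaSuRCA, Unicycler, or SPAdes.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Organism categories named in the abstract (hypothetical enum)."""
    VIRUS = "virus"
    BACTERIUM = "bacterium"
    EUKARYOTE = "eukaryote"


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; real AGP metadata are far richer."""
    accession: str
    category: Category
    is_rna: bool = False


@dataclass(frozen=True)
class AssemblyPlan:
    """What the category-specific module would run (illustrative only)."""
    assembler: str
    reads: tuple[str, ...]
    dehost: bool


# Assembler per category, as listed in the abstract.
ASSEMBLERS = {
    Category.EUKARYOTE: "MaSuRCA",
    Category.BACTERIUM: "Unicycler",
    Category.VIRUS: "SPAdes",
}


def plan_assembly(sample: Sample) -> AssemblyPlan:
    """Choose assembler, read types, and de-hosting for one sample."""
    if sample.is_rna and sample.category is not Category.VIRUS:
        raise ValueError("only viruses are expected to have RNA genomes")
    # RNA viruses: Illumina short reads only; DNA microbes: hybrid assembly.
    reads = ("illumina",) if sample.is_rna else ("illumina", "nanopore")
    return AssemblyPlan(
        assembler=ASSEMBLERS[sample.category],
        reads=reads,
        # Every virus gets the extra host cell-line gDNA removal step.
        dehost=sample.category is Category.VIRUS,
    )


if __name__ == "__main__":
    print(plan_assembly(Sample("example-rna-virus", Category.VIRUS, is_rna=True)))
```

Keeping the category-specific choices in a single lookup table mirrors the idea of shared libraries with modules that diverge only where needed; in a real pipeline each plan would then drive shared read-processing, mapping, and QC functions.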

Results 

The ATCC microbial assembly pipeline has contributed to the 4,500 microbial genomes currently on the AGP, which include 1,431 type strains, 408 antimicrobial-resistant strains, and 51 strains that make up ATCC’s NGS standards. 

Conclusion 

ATCC’s microbial assembly pipeline is intended to standardize the assembly process to produce reproducible, high-quality genomes. Through this pipeline and its accompanying detailed metadata, ATCC can provide the scientific community with traceable, authenticated, high-quality assemblies.  

 


Presenter


Nikhita Puthuveetil, MS

Senior Bioinformatician, Sequencing and Bioinformatics Center, ATCC

Nikhita Puthuveetil is a bioinformatician at ATCC who performs routine bioinformatics analysis on internal sequencing submissions, primarily SARS-CoV-2 samples as well as plasmid and bacterial samples. She also works with her team to support the development of the ATCC Genome Portal. She first joined ATCC as a bioinformatics intern in 2019, when she created an internal sequencing dashboard in R to track and manage sequencing at the Sequencing and Bioinformatics Center. She has an MS in Bioinformatics from Virginia Commonwealth University.


Reference-quality sequences

Through the ATCC Genome Portal, you can easily search, access, and analyze thousands of reference-quality genome sequences. Our optimized methodology is designed to produce complete, contiguous genomic elements, circularized when biologically appropriate, using short-read (virology collection) and hybrid (bacteriology, mycology, and protistology collections) assembly techniques. We take the workflow one step further by accompanying each stage of the process with rigorous quality control analyses designed to ensure the highest-quality data. Only data that pass all quality control criteria are published to the ATCC Genome Portal. Visit the portal today to find the high-quality data you need for your research.
