Increasing ONT Throughput for the ATCC Genome Portal

Poster
ASM Microbe 2022

Washington, DC, United States

June 12, 2022

Abstract

The American Type Culture Collection (ATCC) uses a hybrid assembly method that combines Illumina and Oxford Nanopore Technologies (ONT) data to assemble the highest quality bacterial genomes, and we provide the data to the scientific community through the ATCC Genome Portal. To date, we have sequenced 1,706 bacterial genomes. Because ATCC currently holds over 18,000 strains of bacteria, there is a growing need for a higher throughput sequencing pipeline. To achieve this, the DNA input requirements for ONT sequencing need to be relaxed so that the current high molecular weight microbial extraction protocols can be shortened. ATCC currently requires DNA to have an average fragment length of 20 kb, 50% of fragments larger than 10 kb, and at least 1000 ng of starting material. Producing such high-quality DNA is time consuming but is in line with current ONT standards for native barcoding and ligation sequencing with an R9.4.1 flow cell.
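As a rough illustration, the three input requirements listed above can be written as a single check. This is a minimal sketch and not ATCC's actual QC procedure: the function name, the use of a per-fragment length list (real measurements would more likely come from a fragment analyzer trace), and the reading of each threshold as a minimum are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence

# Thresholds taken from the abstract; treating them as minimums is an assumption.
MIN_MEAN_LENGTH_BP = 20_000       # average fragment length of 20 kb
LENGTH_CUTOFF_BP = 10_000         # fragments must be larger than 10 kb ...
MIN_FRACTION_ABOVE_CUTOFF = 0.5   # ... for at least 50% of fragments
MIN_MASS_NG = 1000.0              # at least 1000 ng of starting material


def meets_input_requirements(fragment_lengths_bp: Sequence[int], mass_ng: float) -> bool:
    """Return True if a DNA sample satisfies all three input thresholds.

    Hypothetical helper: fragment_lengths_bp is an illustrative list of
    fragment sizes in base pairs, mass_ng is the available DNA mass.
    """
    if not fragment_lengths_bp:
        return False
    above_cutoff = sum(1 for n in fragment_lengths_bp if n > LENGTH_CUTOFF_BP)
    fraction_above = above_cutoff / len(fragment_lengths_bp)
    return (
        mean(fragment_lengths_bp) >= MIN_MEAN_LENGTH_BP
        and fraction_above >= MIN_FRACTION_ABOVE_CUTOFF
        and mass_ng >= MIN_MASS_NG
    )
```

Keeping the thresholds as named constants makes it easy to rerun the same check with relaxed values, such as a 500 ng minimum.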

We sought to evaluate these requirements to determine whether lower thresholds could generate bacterial assemblies of the same quality. We gathered nine bacterial samples and filtered the Nanopore reads to maximum length thresholds of 5, 7.5, 10, 15, 20, 25-200 kb, and no threshold. Next, a hybrid assembly was performed for each sample using the filtered ONT and unfiltered Illumina datasets. This constituted 63 separate assemblies (nine samples at seven filter settings). Each was evaluated against current genome portal publication requirements. Results showed that assemblies generated from reads with a maximum read length of 10 kb or higher met ATCC standards and had comparable completeness metrics.
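The read-length filtering step can be sketched as follows. This is a minimal illustration and not the tool or script used in the study: the function names are hypothetical, the input is assumed to be well-formed four-line FASTQ, and a single maximum length in base pairs is applied (passing `None` corresponds to the "no threshold" setting).

```python
from typing import Iterable, Iterator, Optional, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_max_length(
    records: Iterable[FastqRecord], max_len_bp: Optional[int]
) -> Iterator[FastqRecord]:
    """Keep reads no longer than max_len_bp; None keeps every read."""
    for record in records:
        if max_len_bp is None or len(record[1]) <= max_len_bp:
            yield record


def write_fastq(records: Iterable[FastqRecord]) -> str:
    """Serialize records back to FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)
```

For example, `filter_by_max_length(parse_fastq(open("reads.fastq")), 10_000)` would keep only reads of 10 kb or shorter, where `reads.fastq` is a placeholder file name. The filtered reads would then be passed, together with the unfiltered Illumina data, to whichever hybrid assembler is in use.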

To test sample concentration requirements, we gathered 13 bacterial DNA samples. For each sample, 500 ng and 1000 ng of DNA (the latter being the current ONT minimum) were put into separate ONT library preparations using LSK-109. Next, the half and full samples were prepared identically and sequenced on separate R9.4.1 flow cells. For each preparation, the ONT dataset and the corresponding Illumina data were hybrid assembled, and a comparative analysis was performed to evaluate the resulting assemblies. 84% of the half-input assemblies were comparable to or better than the full-input assemblies.

Our analysis shows that, at half the sample concentration or at shorter read lengths, assemblies still meet ATCC’s stringent requirements. It provides in silico evidence supporting assemblies derived from shorter reads and from half the sample concentration. While more in situ testing may be required, this opens the door for future work with faster extraction methods to further increase ATCC’s publication rate to the ATCC Genome Portal.

Download the poster to explore ATCC's efforts toward developing a higher throughput sequencing pipeline.

Presenter

Nikhita Puthuveetil, MS

Bioinformatician, Sequencing and Bioinformatics Center, ATCC

Nikhita Puthuveetil is a bioinformatician at ATCC who performs routine bioinformatics analysis on internal sequencing submissions, primarily SARS-CoV-2 samples as well as plasmid and bacterial samples. She also works with her team to aid in the development of the ATCC Genome Portal. She first joined ATCC as a bioinformatics intern in 2019, when she created an internal dashboard in R to track and manage sequencing at the Sequencing and Bioinformatics Center. She has an MS in Bioinformatics from Virginia Commonwealth University.

Samuel R. Greenfield, BS

Senior Biologist, ATCC

Samuel Greenfield is a Senior Biologist in the Sequencing and Bioinformatics Center at ATCC. His focus is next-generation sequencing, using Illumina but mainly Oxford Nanopore devices. Sam joined ATCC in 2019 and over the last few years has worked on increasing the throughput capabilities of the Sequencing and Bioinformatics Center. Before joining ATCC, he attended the University of Vermont, where he earned his BS.

Reference-quality sequences

Through the ATCC Genome Portal, you can easily search, access, and analyze hundreds of reference-quality genome sequences. Our optimized methodology is designed to achieve complete, circularized (when biologically appropriate), and contiguous genomic elements by using short-read (viruses) and hybrid (bacteria and fungi) assembly techniques. We take our workflow one step further by accompanying each stage of the process with rigorous quality control analyses that ensure our data are of the highest quality possible. Only data that pass all quality control criteria are published to the ATCC Genome Portal. Visit the portal today to find the high-quality data you need for your research.