A Bioinformatics Pipeline for Characterizing SARS-CoV-2 Viral Stocks

ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines (ASM NGS 2022)
Baltimore, Maryland, United States
October 18, 2022
Abstract
The SARS-CoV-2 pandemic has highlighted the need for thorough characterization of viral stocks; because vaccine and therapeutic efficacy differ between SARS-CoV-2 variants, a well-characterized viral stock is critical for downstream research. Therefore, viral stocks must be authenticated by next-generation sequencing (NGS) analysis of the consensus sequence, and they must be screened for genomic variants that arise from adaptive changes during propagation in cells. When determining the identity of an isolate in a clinical specimen, NGS reads are typically mapped to the ancestral SARS-CoV-2 sequence (AS), genomic variants are called, and a consensus sequence is generated; we refer to this sequence as the sample reference sequence (SRS). However, this process does not answer the question: has this viral stock deviated since its initial isolation and analysis?
The Sequencing and Bioinformatics Center (SBC) of the American Type Culture Collection (ATCC) has developed a pipeline to answer this question by comparing NGS results of the viral stock to the AS and the SRS. The pipeline begins with NGS of the viral stock produced from early passage seed virus deposited at BEI Resources. These reads are processed to remove adapters and low-quality reads and mapped to the SRS. Then, variants are called and a consensus is generated, which we refer to as the sample consensus sequence (SCS). Finally, the SCS, SRS, and AS are aligned. This allows the identification of mutations relative to the fully annotated AS by relating positions in the SCS and SRS to their corresponding AS positions.
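The last step, relating alignment columns back to positions in the annotated AS, can be sketched as follows. This is a minimal illustration and not the SBC's implementation: it assumes the three sequences have already been aligned into equal-length strings with `-` marking gaps, and the function name `as_coordinates` is hypothetical.

```python
from typing import List, Optional


def as_coordinates(aligned_as: str, gap: str = "-") -> List[Optional[int]]:
    """Map each alignment column to a 1-based position in the ungapped AS.

    Columns where the AS row holds a gap (an insertion relative to the AS)
    map to None, because they have no AS coordinate of their own.
    """
    coords: List[Optional[int]] = []
    position = 0  # number of AS bases consumed so far
    for base in aligned_as:
        if base == gap:
            coords.append(None)
        else:
            position += 1
            coords.append(position)
    return coords


# Toy AS row from an alignment (hypothetical data, not real viral sequence).
print(as_coordinates("AC-GT"))  # [1, 2, None, 3, 4]
```

With this mapping, any difference observed in the SCS or SRS at a given column can be reported at the corresponding AS position, which is where the annotation lives.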
There are five possible permutations of agreement (PoA) at each position in this alignment: (1) the SRS and SCS are identical, but the AS differs, (2) the AS and SRS are identical, but the SCS differs, (3) the AS and SCS are identical, but the SRS differs, (4) all three sequences differ, and (5) all three sequences agree. Positions of the first PoA are expected: if no mutations have arisen since the initial analysis, the SRS and SCS remain identical wherever the isolate differs from the AS. The second, third, and fourth PoA all indicate the potential presence of selective pressures. The second PoA signifies a mutation away from the SRS, the third PoA represents a reversion of the sample back towards the AS, and the fourth PoA suggests a new deviation. The fifth PoA covers regions of stability. With this approach, a sample that has not deviated since its initial isolation and analysis can be recognized by having only PoA of the first and fifth types. The quantity and frequency of the second, third, and fourth PoA indicate the amount of deviation that has occurred since the initial analysis. This pipeline is an important tool for quality control testing of SARS-CoV-2 variant identity and provides a means of analyzing deviation due to laboratory or natural selective pressures, helping to ensure a solid foundation for research.
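The five PoA can be expressed as a small per-column classifier. The sketch below is illustrative only and is not the SBC's code: it assumes the AS, SRS, and SCS are already aligned into equal-length uppercase strings, it compares characters literally (so gaps and ambiguity codes such as `N` are treated like any other symbol), and the function names are hypothetical.

```python
from typing import List


def classify_column(as_base: str, srs_base: str, scs_base: str) -> int:
    """Return the PoA type (1-5) for one alignment column."""
    if as_base == srs_base == scs_base:
        return 5  # all three sequences agree
    if srs_base == scs_base:
        return 1  # SRS and SCS identical, AS differs
    if as_base == srs_base:
        return 2  # AS and SRS identical, SCS differs
    if as_base == scs_base:
        return 3  # AS and SCS identical, SRS differs
    return 4  # all three sequences differ


def classify_alignment(aligned_as: str, aligned_srs: str, aligned_scs: str) -> List[int]:
    """Classify every column of a three-way alignment."""
    if not (len(aligned_as) == len(aligned_srs) == len(aligned_scs)):
        raise ValueError("aligned sequences must have equal length")
    return [
        classify_column(a, r, c)
        for a, r, c in zip(aligned_as, aligned_srs, aligned_scs)
    ]


def has_deviated(poa_types: List[int]) -> bool:
    """A stock shows deviation if any column is of PoA type 2, 3, or 4."""
    return any(t in (2, 3, 4) for t in poa_types)


# Toy alignment (hypothetical data) with one column of each PoA type.
poa = classify_alignment("ACGTA", "ATGAC", "ATCTG")
print(poa)                # [5, 1, 2, 3, 4]
print(has_deviated(poa))  # True
```

Counting how often types 2, 3, and 4 occur in the returned list gives a simple measure of the amount of deviation described above.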
Download the poster to learn about our bioinformatics pipeline for evaluating SARS-CoV-2 variant identity.
Presenter

Ford Combs, PhD
Bioinformatician, Sequencing and Bioinformatics Center, ATCC
Ford Combs is a new member of ATCC's Sequencing and Bioinformatics Center, having joined in January 2021. As a bioinformatician, he primarily works on ATCC's internal sequencing projects by either assembling and analyzing data or testing and improving bioinformatics pipelines. As the Audio Engineer on ATCC's Podcast, Behind the Biology, Ford performs sound design and audio editing. He holds an MS and PhD in bioinformatics and computational biology from George Mason University. His dissertation focused on topological and machine learning-based approaches to protein secondary structure assignment.