A Bioinformatics Pipeline for Characterizing SARS-CoV-2 Viral Stocks
ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines (ASM NGS 2022)
Baltimore, Maryland, United States
October 18, 2022
The SARS-CoV-2 pandemic has highlighted the need for thorough characterization of viral stocks; because vaccine and therapeutic efficacy differ between SARS-CoV-2 variants, a well-characterized viral stock is critical for downstream research. Therefore, viral stocks must be authenticated by next-generation sequencing (NGS) analysis of the consensus sequence, and they must be screened for genomic variants that arise from adaptive changes during propagation in cells. When determining the identity of an isolate in a clinical specimen, NGS reads are typically mapped to the ancestral SARS-CoV-2 sequence (AS), genomic variants are called, and a consensus sequence is generated; we refer to this sequence as the sample reference sequence (SRS). However, this process does not answer the question: has this viral stock deviated since its initial isolation and analysis?
The Sequencing and Bioinformatics Center (SBC) of the American Type Culture Collection (ATCC) has developed a pipeline to answer this question by comparing NGS results of the viral stock to the AS and the SRS. The pipeline begins with NGS of the viral stock produced from early passage seed virus deposited at BEI Resources. These reads are processed to remove adapters and low-quality reads and mapped to the SRS. Then, variants are called and a consensus is generated, which we refer to as the sample consensus sequence (SCS). Finally, the SCS, SRS, and AS are aligned. This allows the identification of mutations relative to the fully annotated AS by relating positions in the SCS and SRS to their corresponding AS positions.
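The coordinate step described above, relating alignment columns back to AS positions, can be sketched as follows. This is a minimal illustration and not the SBC pipeline itself: it assumes the three sequences have already been aligned by an external tool into equal-length strings with "-" as the gap character, and the function names (as_positions, differences_vs_as) are hypothetical.

```python
from typing import List, Optional, Tuple


def as_positions(aligned_as: str, gap: str = "-") -> List[Optional[int]]:
    """For each alignment column, return the 1-based AS position,
    or None where the AS row has a gap (an insertion relative to the AS)."""
    positions: List[Optional[int]] = []
    pos = 0
    for base in aligned_as:
        if base == gap:
            positions.append(None)
        else:
            pos += 1
            positions.append(pos)
    return positions


def differences_vs_as(
    aligned_as: str, aligned_srs: str, aligned_scs: str
) -> List[Tuple[Optional[int], str, str, str]]:
    """List every column where the three rows are not all identical,
    reported as (AS position, AS base, SRS base, SCS base)."""
    if not (len(aligned_as) == len(aligned_srs) == len(aligned_scs)):
        raise ValueError("aligned sequences must have equal length")
    coords = as_positions(aligned_as)
    return [
        (coord, a, r, c)
        for coord, a, r, c in zip(coords, aligned_as, aligned_srs, aligned_scs)
        if not (a == r == c)
    ]
```

Reporting differences in AS coordinates lets each one be looked up against the AS annotation, whatever insertions or deletions the SRS and SCS carry.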
There are five possible permutations of agreement (PoA) at each position in this alignment: (1) the SRS and SCS are identical, but the AS differs, (2) the AS and SRS are identical, but the SCS differs, (3) the AS and SCS are identical, but the SRS differs, (4) all three sequences differ, and (5) all three sequences agree. Mutations of the first PoA are expected because the SRS and SCS are identical if no mutations have arisen. The second, third, and fourth PoA all indicate the potential presence of selective pressures. The second PoA signifies a mutation away from the SRS, the third PoA represents a reversion of the sample back towards the AS, and the fourth PoA suggests a new deviation. The fifth PoA covers regions of stability. With this approach, a sample that has not deviated from its initial isolation and analysis can be recognized by only having PoA of the first and fifth types. The quantity and frequency of the second, third, and fourth PoA indicate the amount of deviation that has occurred since the initial analysis. This pipeline is an important tool for quality control testing of SARS-CoV-2 variant identity and provides a means for analyzing deviation due to laboratory or natural selective pressures, helping to ensure a solid foundation for research.
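The five PoA can be expressed as a small per-column classifier. The sketch below is illustrative only and makes several assumptions: the inputs are equal-length aligned strings, gap characters are compared like any other symbol, ambiguous bases such as N receive no special treatment, and the function names are hypothetical rather than taken from the SBC pipeline.

```python
from collections import Counter
from typing import Dict, List


def classify_column(as_base: str, srs_base: str, scs_base: str) -> int:
    """Return the PoA type (1-5) for one alignment column."""
    a, r, c = as_base.upper(), srs_base.upper(), scs_base.upper()
    if a == r == c:
        return 5  # all three sequences agree
    if r == c:
        return 1  # SRS and SCS identical, AS differs
    if a == r:
        return 2  # AS and SRS identical, SCS differs
    if a == c:
        return 3  # AS and SCS identical, SRS differs
    return 4      # all three sequences differ


def classify_alignment(aligned_as: str, aligned_srs: str, aligned_scs: str) -> List[int]:
    """Classify every column of a three-way alignment."""
    if not (len(aligned_as) == len(aligned_srs) == len(aligned_scs)):
        raise ValueError("aligned sequences must have equal length")
    return [
        classify_column(a, r, c)
        for a, r, c in zip(aligned_as, aligned_srs, aligned_scs)
    ]


def summarize(poa: List[int]) -> Dict[int, int]:
    """Count how many columns fall into each PoA type."""
    counts = Counter(poa)
    return {k: counts.get(k, 0) for k in range(1, 6)}


def has_deviated(poa: List[int]) -> bool:
    """True if any column is of PoA type 2, 3, or 4."""
    return any(p in (2, 3, 4) for p in poa)
```

For example, has_deviated(classify_alignment(aligned_as, aligned_srs, aligned_scs)) returns False for a stock whose columns are all of the first or fifth type, while summarize gives the per-type counts that indicate the amount of deviation.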
Download the poster to learn about our bioinformatics pipeline for evaluating SARS-CoV-2 variant identity.
Watch the poster presentation
Ford Combs, PhD
Bioinformatician, Sequencing and Bioinformatics Center, ATCC
Ford Combs is a new member of ATCC's Sequencing and Bioinformatics Center, having joined in January 2021. As a bioinformatician, he primarily works on ATCC's internal sequencing projects by either assembling and analyzing data or testing and improving bioinformatics pipelines. As the Audio Engineer on ATCC's Podcast, Behind the Biology, Ford performs sound design and audio editing. He holds an MS and PhD in bioinformatics and computational biology from George Mason University. His dissertation focused on topological and machine learning-based approaches to protein secondary structure assignment.
Explore our featured resources
Reproducible research starts with credible biological materials. That’s why ATCC has made a commitment to advance authentication in life science research. Through our Enhanced Authentication Initiative, we are enriching the characterization of our biological collections with reference-quality whole-genome sequencing data and making that data available to everyone.
We offer COVID-19 tools and resources to contain the impact and investigate the long-lasting effects of the coronavirus disease.