
A Bioinformatics Pipeline for Characterizing SARS-CoV-2 Viral Stocks


ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines (ASM NGS 2022)

Baltimore, Maryland, United States

October 18, 2022


The SARS-CoV-2 pandemic has highlighted the need for thorough characterization of viral stocks. Because vaccine and therapeutic efficacy differ between SARS-CoV-2 variants, a well-characterized viral stock is critical for downstream research. Viral stocks must therefore be authenticated by next-generation sequencing (NGS) analysis of the consensus sequence, and they must be screened for genomic variants that arise from adaptive changes during propagation in cells. When determining the identity of an isolate in a clinical specimen, NGS reads are typically mapped to the ancestral SARS-CoV-2 sequence (AS), genomic variants are called, and a consensus sequence is generated; we refer to this sequence as the sample reference sequence (SRS). However, this process does not answer the question: has this viral stock deviated since its initial isolation and analysis?

The Sequencing and Bioinformatics Center (SBC) of the American Type Culture Collection (ATCC) has developed a pipeline to answer this question by comparing NGS results of the viral stock to the AS and the SRS. The pipeline begins with NGS of the viral stock produced from early passage seed virus deposited at BEI Resources. These reads are processed to remove adapters and low-quality reads and mapped to the SRS. Then, variants are called and a consensus is generated, which we refer to as the sample consensus sequence (SCS). Finally, the SCS, SRS, and AS are aligned. This allows the identification of mutations relative to the fully annotated AS by relating positions in the SCS and SRS to their corresponding AS positions.
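The final step, relating positions in the SCS and SRS to their corresponding AS positions, can be sketched as a walk over the alignment columns. This is a minimal illustration rather than the SBC pipeline's code: it assumes the three sequences have already been aligned into equal-length strings that use `-` for gaps, and the function name `map_to_as_positions` and the toy sequences are hypothetical.

```python
def map_to_as_positions(aligned_as, aligned_srs, aligned_scs, gap="-"):
    """Yield (as_pos, srs_pos, scs_pos) for each alignment column.

    Positions are 1-based coordinates in the ungapped sequences.
    None means that sequence has a gap in the column.
    """
    if not (len(aligned_as) == len(aligned_srs) == len(aligned_scs)):
        raise ValueError("aligned sequences must have equal length")
    counters = [0, 0, 0]  # running ungapped position for AS, SRS, SCS
    for column in zip(aligned_as, aligned_srs, aligned_scs):
        positions = []
        for i, base in enumerate(column):
            if base == gap:
                positions.append(None)
            else:
                counters[i] += 1
                positions.append(counters[i])
        yield tuple(positions)


# Toy alignment (made-up sequences): the SRS and SCS both lack AS position 3.
rows = list(map_to_as_positions("ACGTA", "AC-TA", "AC-TA"))
```

In this toy case the third column maps to AS position 3 with no counterpart in the SRS or SCS, and the fourth column pairs AS position 4 with position 3 of the other two sequences. Because the AS is fully annotated, reporting every difference in AS coordinates lets it be tied to a gene or feature.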

There are five possible permutations of agreement (PoA) at each position in this alignment: (1) the SRS and SCS are identical, but the AS differs, (2) the AS and SRS are identical, but the SCS differs, (3) the AS and SCS are identical, but the SRS differs, (4) all three sequences differ, and (5) all three sequences agree. Mutations of the first PoA are expected: they are differences from the AS that were already present in the SRS, and the SRS and SCS remain identical at those positions if no new mutations have arisen. The second, third, and fourth PoA all indicate the potential presence of selective pressures. The second PoA signifies a mutation away from the SRS, the third PoA represents a reversion of the sample back towards the AS, and the fourth PoA suggests a new deviation. The fifth PoA covers regions of stability. With this approach, a sample that has not deviated since its initial isolation and analysis can be recognized by having only PoA of the first and fifth types. The quantity and frequency of the second, third, and fourth PoA indicate the amount of deviation that has occurred since the initial analysis. This pipeline is an important tool for quality control testing of SARS-CoV-2 variant identity, and it provides a means of analyzing deviation due to laboratory or natural selective pressures, helping to ensure a solid foundation for research.
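The five PoA can be expressed as a small per-column classifier. The sketch below is illustrative only and is not taken from the poster: the function names `classify_poa` and `summarize_poa` are hypothetical, the input is assumed to be three equal-length aligned strings, and gap characters and ambiguity codes such as `N` are compared like any other character, which a production pipeline would likely handle separately.

```python
from collections import Counter


def classify_poa(as_base, srs_base, scs_base):
    """Return the PoA type (1-5) for one alignment column."""
    if as_base == srs_base == scs_base:
        return 5  # all three sequences agree (stable position)
    if srs_base == scs_base:
        return 1  # SRS and SCS identical, AS differs (expected)
    if as_base == srs_base:
        return 2  # AS and SRS identical, SCS differs (mutation away from SRS)
    if as_base == scs_base:
        return 3  # AS and SCS identical, SRS differs (reversion towards AS)
    return 4  # all three sequences differ (new deviation)


def summarize_poa(aligned_as, aligned_srs, aligned_scs):
    """Count PoA types over an alignment and flag deviation from the SRS."""
    if not (len(aligned_as) == len(aligned_srs) == len(aligned_scs)):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(
        classify_poa(a, r, c)
        for a, r, c in zip(aligned_as, aligned_srs, aligned_scs)
    )
    # Only PoA types 2, 3, and 4 indicate change since the initial analysis.
    deviated = any(counts[t] > 0 for t in (2, 3, 4))
    return counts, deviated


# Toy alignment (made-up sequences): one expected difference, one new mutation.
counts, deviated = summarize_poa("ACGTAC", "ATGTAC", "ATGTCC")
```

Here the second column is type 1 (the SRS and SCS share `T` where the AS has `C`), the fifth column is type 2 (the SCS alone has `C`), and the remaining four columns are type 5, so the toy stock is flagged as deviated.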

Download the poster to learn about our bioinformatics pipeline for evaluating SARS-CoV-2 variant identity.





Ford Combs, PhD

Bioinformatician, Sequencing and Bioinformatics Center, ATCC

Ford Combs is a new member of ATCC's Sequencing and Bioinformatics Center, having joined in January 2021. As a bioinformatician, he primarily works on ATCC's internal sequencing projects by either assembling and analyzing data or testing and improving bioinformatics pipelines. As the Audio Engineer on ATCC's Podcast, Behind the Biology, Ford performs sound design and audio editing. He holds an MS and PhD in bioinformatics and computational biology from George Mason University. His dissertation focused on topological and machine learning-based approaches to protein secondary structure assignment.
